    feature(nyz): add H-PPO hybrid action space algorithm (#140) · 0b71fc4e
    Committed by Swain
    * feature(nyz): add hybrid ppo, unify the action_space field and use dict-type mu/sigma output (see the actor sketch after this list)
    
    * polish(nyz): polish ppo config continuous field, move it to the action_space field
    
    * fix(nyz): fix ppo action_space field compatibility bug
    
    * fix(nyz): fix ppg/sac/cql action_space field compatibility bug
    
    * demo(nyz): update gym hybrid hppo config
    
    * polish(pu): polish hppo hyper-parameters: use tanh and a fixed sigma of 0.3 in actor_action_args; in the collect phase, after sampling from the pi distribution, clamp acceleration_value to [0, 1] and rotation_value to [-1, 1] respectively (see the actor sketch after this list)
    
    * polish(pu): polish as reviewed
    
    * polish(pu): polish hppo config
    
    * polish(pu): an entropy weight of 0.03 performs best empirically (see the config sketch after this list)
    
    * fix(nyz): fix unittest compatibility bugs
    
    * polish(nyz): remove unused print in atari env (ci skip)
    Co-authored-by: puyuan1996 <2402552459@qq.com>
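
    Below is a minimal sketch of the hybrid actor output and collect-phase clamping described in the commits above, assuming PyTorch and a gym-hybrid-style action space (a discrete action type plus continuous acceleration_value/rotation_value arguments). `HybridActorHead` and `collect_action` are hypothetical names for illustration, not DI-engine's actual classes.

    ```python
    import torch
    import torch.nn as nn
    from torch.distributions import Categorical, Normal


    class HybridActorHead(nn.Module):
        """Hypothetical actor head for a hybrid (discrete + continuous) action space.

        Returns discrete action-type logits plus a dict-type {mu, sigma} for the
        continuous action arguments, mirroring the unified action_space output
        described in the commits above.
        """

        def __init__(self, obs_dim: int, discrete_dim: int, cont_dim: int,
                     fixed_sigma: float = 0.3):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
            self.action_type_head = nn.Linear(64, discrete_dim)
            self.action_args_mu = nn.Linear(64, cont_dim)
            # Fixed sigma of 0.3, the value tuned in the commits above.
            self.register_buffer('sigma', torch.full((cont_dim,), fixed_sigma))

        def forward(self, obs: torch.Tensor) -> dict:
            x = self.encoder(obs)
            logit = self.action_type_head(x)
            # tanh squashes mu into [-1, 1] before sampling.
            mu = torch.tanh(self.action_args_mu(x))
            return {'logit': logit,
                    'action_args': {'mu': mu, 'sigma': self.sigma.expand_as(mu)}}


    def collect_action(head: HybridActorHead, obs: torch.Tensor) -> dict:
        """Collect-phase sampling with the per-dimension clamping from the commits:
        acceleration_value -> [0, 1], rotation_value -> [-1, 1]."""
        out = head(obs)
        action_type = Categorical(logits=out['logit']).sample()
        args = Normal(out['action_args']['mu'], out['action_args']['sigma']).sample()
        # dim 0: acceleration_value, dim 1: rotation_value (gym-hybrid convention).
        args[..., 0].clamp_(0.0, 1.0)
        args[..., 1].clamp_(-1.0, 1.0)
        return {'action_type': action_type, 'action_args': args}


    # Usage: head = HybridActorHead(obs_dim=10, discrete_dim=3, cont_dim=2)
    #        act = collect_action(head, torch.randn(4, 10))
    ```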
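
    And a sketch of how the config fields from these commits might look. `action_space='hybrid'` and `entropy_weight=0.03` come from the commit messages; the surrounding layout and the `fixed_sigma_value` field name follow common DI-engine config conventions and are assumptions, not a verbatim copy of the repo's gym-hybrid hppo config file.

    ```python
    from easydict import EasyDict

    gym_hybrid_hppo_config = EasyDict(dict(
        policy=dict(
            # Unified action_space field introduced by this PR
            # ('discrete', 'continuous', or 'hybrid'),
            # replacing the older per-algorithm continuous flag.
            action_space='hybrid',
            model=dict(
                fixed_sigma_value=0.3,  # fixed sigma used in actor_action_args
            ),
            learn=dict(
                entropy_weight=0.03,  # value found to perform best empirically
            ),
        ),
    ))
    ```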