-
由 Swain 提交于
* feature(nyz): add hybrid ppo, unify action_space field and use dict type mu sigma * polish(nyz): polish ppo config continous field, move to action_space field * fix(nyz): fix ppo action_space field compatibility bug * fix(nyz): fix ppg/sac/cql action_space field compatibility bug * demo(nyz): update gym hybrid hppo config * polish(pu): polish hppo hyper-para, use tanh and fixed sigma 0.3 in actor_action_args, use clamp [0,1] and [-1,1] for acceleration_value and rotation_value correspondingly after sample from the pi distri. in collect phase * polish(pu):polish as review * polish(pu): polish hppo config * polish(pu): entropy weight=0.03 performs best empirically * fix(nyz): fix unittest compatibility bugs * polish(nyz): remove atari env unused print(ci skip) Co-authored-by: Npuyuan1996 <2402552459@qq.com>
0b71fc4e