feature(nyz): add H-PPO hybrid action space algorithm (#140)
* feature(nyz): add hybrid PPO, unify the action_space field, and use a dict-type mu/sigma output (see the head sketch after this list)
* polish(nyz): polish the PPO config's continuous field, moving it into the unified action_space field
* fix(nyz): fix ppo action_space field compatibility bug
* fix(nyz): fix ppg/sac/cql action_space field compatibility bugs
* demo(nyz): update the gym-hybrid HPPO config
* polish(pu): polish HPPO hyper-parameters: use tanh and a fixed sigma of 0.3 for actor_action_args; in the collect phase, after sampling from the pi distribution, clamp acceleration_value to [0, 1] and rotation_value to [-1, 1] respectively (see the collect-phase sketch below)
* polish(pu): polish as per review comments
* polish(pu): polish the HPPO config
* polish(pu): entropy_weight=0.03 performs best empirically (see the config sketch below)
* fix(nyz): fix unittest compatibility bugs
* polish(nyz): remove unused print in the Atari env (ci skip)
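As a reading aid for the dict-type mu/sigma change above, here is a minimal sketch of a hybrid actor head that returns discrete-action logits alongside a `{'mu': ..., 'sigma': ...}` dict for the continuous action arguments. The class name, output keys, and dimensions are illustrative assumptions, not DI-engine's exact API:

```python
import torch
import torch.nn as nn


class HybridActorHead(nn.Module):
    """Sketch of a hybrid-action head: discrete logits plus dict-type mu/sigma."""

    def __init__(self, hidden_size: int, discrete_dim: int, continuous_dim: int):
        super().__init__()
        self.logit_layer = nn.Linear(hidden_size, discrete_dim)  # discrete action type
        self.mu_layer = nn.Linear(hidden_size, continuous_dim)   # continuous action args
        # Fixed sigma of 0.3 for the continuous arguments, per the hyper-parameter commit.
        self.register_buffer('sigma', torch.full((continuous_dim,), 0.3))

    def forward(self, x: torch.Tensor) -> dict:
        logit = self.logit_layer(x)
        mu = torch.tanh(self.mu_layer(x))  # tanh keeps mu in [-1, 1] before sampling
        return {
            'action_type': logit,
            'action_args': {'mu': mu, 'sigma': self.sigma.expand_as(mu)},
        }
```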
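And the collect-phase behavior described in the hyper-parameter commit, as a sketch: sample the continuous arguments from a Normal with the tanh-squashed mu and fixed sigma, then clamp each component to its valid range. The function name and the slice layout (acceleration_value first, rotation_value second) are assumptions based on gym-hybrid's action format:

```python
import torch
from torch.distributions import Normal


def sample_action_args(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Collect-phase sampling sketch: draw from the pi distribution, then clamp."""
    action_args = Normal(mu, sigma).sample()
    action_args[..., 0:1] = action_args[..., 0:1].clamp(0.0, 1.0)   # acceleration_value in [0, 1]
    action_args[..., 1:2] = action_args[..., 1:2].clamp(-1.0, 1.0)  # rotation_value in [-1, 1]
    return action_args
```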
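Finally, a sketch of where the two config-level changes land: the unified action_space field that replaces the old continuous flag, and the empirically best entropy weight. The key paths mimic DI-engine's nested dict-config style but should be checked against the merged gym-hybrid HPPO config rather than taken as exact:

```python
# Hypothetical excerpt of the gym-hybrid HPPO config; key paths are assumptions.
hppo_config = dict(
    policy=dict(
        action_space='hybrid',    # unified field; replaces the old `continuous` flag
        learn=dict(
            entropy_weight=0.03,  # 0.03 performed best empirically (see commit above)
        ),
    ),
)
```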
Co-authored-by: puyuan1996 <2402552459@qq.com>