    feature(nyz): add H-PPO hybrid action space algorithm (#140) · 0b71fc4e
    Committed by Swain
    * feature(nyz): add hybrid ppo, unify the action_space field and use dict-type mu/sigma output (see the actor sketch after this list)
    
    * polish(nyz): polish ppo config continuous field, move it to the action_space field
    
    * fix(nyz): fix ppo action_space field compatibility bug
    
    * fix(nyz): fix ppg/sac/cql action_space field compatibility bug
    
    * demo(nyz): update gym hybrid hppo config
    
    * polish(pu): polish hppo hyper-parameters: use tanh and a fixed sigma of 0.3 in actor_action_args; in the collect phase, after sampling from the pi distribution, clamp acceleration_value to [0, 1] and rotation_value to [-1, 1] respectively (see the actor sketch after this list)
    
    * polish(pu): polish as reviewed
    
    * polish(pu): polish hppo config
    
    * polish(pu): an entropy weight of 0.03 performs best empirically (see the config sketch after this list)
    
    * fix(nyz): fix unittest compatibility bugs
    
    * polish(nyz): remove unused print in atari env (ci skip)
    Co-authored-by: puyuan1996 <2402552459@qq.com>
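
    Below is a minimal sketch of the hybrid actor output and collect-phase clamping described in the commits above, assuming PyTorch and a gym-hybrid-style action space (a discrete action type plus continuous acceleration_value/rotation_value arguments). `HybridActorHead` and `collect_action` are hypothetical names for illustration, not DI-engine's actual classes.

    ```python
    import torch
    import torch.nn as nn
    from torch.distributions import Categorical, Normal


    class HybridActorHead(nn.Module):
        """Hypothetical actor head for a hybrid (discrete + continuous) action space.

        Returns discrete action-type logits plus a dict-type {mu, sigma} for the
        continuous action arguments, mirroring the unified action_space output
        described in the commits above.
        """

        def __init__(self, obs_dim: int, discrete_dim: int, cont_dim: int,
                     fixed_sigma: float = 0.3):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
            self.action_type_head = nn.Linear(64, discrete_dim)
            self.action_args_mu = nn.Linear(64, cont_dim)
            # Fixed sigma of 0.3, the value tuned in the commits above.
            self.register_buffer('sigma', torch.full((cont_dim,), fixed_sigma))

        def forward(self, obs: torch.Tensor) -> dict:
            x = self.encoder(obs)
            logit = self.action_type_head(x)
            # tanh squashes mu into [-1, 1] before sampling.
            mu = torch.tanh(self.action_args_mu(x))
            return {'logit': logit,
                    'action_args': {'mu': mu, 'sigma': self.sigma.expand_as(mu)}}


    def collect_action(head: HybridActorHead, obs: torch.Tensor) -> dict:
        """Collect-phase sampling with the per-dimension clamping from the commits:
        acceleration_value -> [0, 1], rotation_value -> [-1, 1]."""
        out = head(obs)
        action_type = Categorical(logits=out['logit']).sample()
        args = Normal(out['action_args']['mu'], out['action_args']['sigma']).sample()
        # dim 0: acceleration_value, dim 1: rotation_value (gym-hybrid convention).
        args[..., 0].clamp_(0.0, 1.0)
        args[..., 1].clamp_(-1.0, 1.0)
        return {'action_type': action_type, 'action_args': args}


    # Usage: head = HybridActorHead(obs_dim=10, discrete_dim=3, cont_dim=2)
    #        act = collect_action(head, torch.randn(4, 10))
    ```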
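
    And a sketch of how the config fields from these commits might look. `action_space='hybrid'` and `entropy_weight=0.03` come from the commit messages; the surrounding layout and the `fixed_sigma_value` field name follow common DI-engine config conventions and are assumptions, not a verbatim copy of the repo's gym-hybrid hppo config file.

    ```python
    from easydict import EasyDict

    gym_hybrid_hppo_config = EasyDict(dict(
        policy=dict(
            # Unified action_space field introduced by this PR
            # ('discrete', 'continuous', or 'hybrid'),
            # replacing the older per-algorithm continuous flag.
            action_space='hybrid',
            model=dict(
                fixed_sigma_value=0.3,  # fixed sigma used in actor_action_args
            ),
            learn=dict(
                entropy_weight=0.03,  # value found to perform best empirically
            ),
        ),
    ))
    ```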