- 01 Jan, 2022 2 commits
- 31 Dec, 2021 1 commit
Committed by qiuqiu
* feature(gg15): add gumbel softmax * add ww * fix format
- 30 Dec, 2021 1 commit
Committed by niuyazhe
- 29 Dec, 2021 2 commits
Committed by Robin Chen
Committed by Xu Jingxin
- 27 Dec, 2021 1 commit
Committed by Xu Jingxin
- 24 Dec, 2021 1 commit
Committed by Swain
* feature(nyz): add hybrid ppo, unify action_space field and use dict type mu sigma * polish(nyz): polish ppo config continuous field, move to action_space field * fix(nyz): fix ppo action_space field compatibility bug * fix(nyz): fix ppg/sac/cql action_space field compatibility bug * demo(nyz): update gym hybrid hppo config * polish(pu): polish hppo hyper-para, use tanh and fixed sigma 0.3 in actor_action_args, use clamp [0,1] and [-1,1] for acceleration_value and rotation_value correspondingly after sampling from the pi distribution in collect phase * polish(pu): polish as review * polish(pu): polish hppo config * polish(pu): entropy weight=0.03 performs best empirically * fix(nyz): fix unittest compatibility bugs * polish(nyz): remove atari env unused print(ci skip) Co-authored-by: puyuan1996 <2402552459@qq.com>
- 23 Dec, 2021 2 commits
Committed by Xu Jingxin
Committed by Xu Jingxin
* Cli ditask * Import ditask in init * Add current path as default package path * Fix style * Add topology on ditask
- 22 Dec, 2021 2 commits
Committed by Xu Jingxin
* Fix recur reference in task and parallel, add distributed logger * Update logger * Clear ref list when exit task/parallel * Put task in with statement * Fix test * Fix test * Test is hard * More comments
Committed by 蒲源
* polish(pu): polish r2d3 * polish(pu): first abs then sum each item in td-error * fix(pu): fix dqfd compatibility
- 21 Dec, 2021 3 commits
- 19 Dec, 2021 1 commit
Committed by niuyazhe
- 17 Dec, 2021 2 commits
- 16 Dec, 2021 1 commit
Committed by Will-Nie
* add comments for r2d2 * sort style * revise according to the comments * fix style * add r2d2 residual link + comments * revise according to comments, add spaceinvader * add test for the model and fix test bugs
- 15 Dec, 2021 4 commits
Committed by Weiyuhong-1998
* ma mujoco env and masac code * env(wyh):ma mujoco agent id * feature(wyh):maqac continuous * fix(wyh):multi-mujoco add readme * fix(wyh): td error * fix(wyh)style * fix(wyh):multi agent mujoco test
Committed by niuyazhe
Committed by Xu Jingxin
* Init base buffer and storage * Use ratelimit as middleware * Pass style check * Keep the return original return value * Add buffer.view * Add replace flag on sample, rewrite middleware processing * Test slicing * Add buffer copy middleware * Add update/delete api in buffer, rename middleware * Implement update and delete api of buffer * add naive use time count middleware in buffer * Rename next to chain * feature(nyz): add staleness check middleware and polish buffer * feature(nyz): add naive priority experience replay * Sample by indices * Combine buffer and storage layers * Support indices when deleting items from the queue * Use dataclass to save buffered data, remove return_index and return_meta * Add ignore_insufficient * polish(nyz): add return index in push and copy same data in sample * Drop useless import * Fix sample with indices, ensure return size is equal to input size or indices size * Make sure sampled data in buffer is different from each other * Support sample by grouped meta key * Support sample by rolling window * Add import/export data in buffer * Padding after sampling from buffer * Polish use_time_check * Use buffer as dataset * Set collate_fn in buffer test * Init framework * Remove set_default, add keep * Move backward_stack to task * Fix total_step * Pydash pick is too slow * Add step records * Add async mode * Reuse forward and backward functions in sequence * Fix sample profile * demo(nyz): add atari pong runnable demo * Fix forward bug * Add task test * Test pong * feature(nyz): add deque buffer compatibility wrapper and demo * polish(nyz): polish code style and add pong dqn new deque buffer demo * Use sync mode * Config worker number * Init parallel mode * Add prev property on context * Mesh workers * First version of parallel mode * Make send rpc async * Don't pickle prev * Support tcp * More cleanup on system exit * Test parallel and task * Enable task copy * Test attach mode * Add with statement * Polish code * Raise exception when timeout in attach mode * Add event listeners * feature(nyz): add pendulum sac new pipeline demo * Fix main * Add profiler and step profiler * Rewrite parallel, cleanup res after task finished * Add comments * Remove ctx.prev * Enable standalone parallel mode * Remove hooks on ctx * Add max mean * demo(nyz): add pong dqn new pipeline demo * Ensure parallel sock closed before program exit * Fix parallel test * Fix pong * feature(zjow): add feature of profile in ding (#135) * add profiling feature in ding cli. * fix ding --profile cli. * reformat files. * reformat files again. * reformat files again. * Remove flameprof * Change kept_keys to set * Use finish as a property * Use wrapper * Reformat step timer output * Test random seed * Revert learning rate * Add topology on parallel * Use labels on task * Star in parallel mode * Don't use daemon process * Auto sync finish state * Return logvars * Fix test wrapper * Fix test profiler helper * Pass flake_check * Lazy launch * Reporter * Replace main with main_sac * Fix parallel ctx * Fix test * Fix merge issues Co-authored-by: niuyazhe <niuyazhe@sensetime.com> Co-authored-by: zjowowen <93968541+zjowowen@users.noreply.github.com>
Committed by Ke Li
* feature(lk): fix port conflict * polish(lk): polish code style and format * fix(lk): change to subprocess
- 14 Dec, 2021 4 commits
Committed by niuyazhe
Committed by Will-Nie
* add trex algorithm for pong * sort style * add atari, ll,cp; fix device, collision; add_ppo * add accuracy evaluation * correct style * add seed to make sure results are replicable * remove useless part in cum return of model part * add mujoco onppo training pipeline; ppo config * improve style * add sac training config for mujoco * add log, add save data; polish config * logger; hyperparameter;walker * correct style * modify else condition * change rnd to trex * revise according to comments, add episode collect * new collect mode for trex, fix all bugs, comments * final change * polish after the final comment * add readme/test * add test for serial entry of trex/gcl * sort style * change mujoco to cartpole for test for trex_onppo * remove files generated by testing * revise tests for entry * sort style * revise tests * modify pytest * fix(nyz): speed up ppg/ppo and marl algo unittest * polish(nyz): speed up trex unittest and fix trex entry default config bug * fix(nyz): fix same name bug * fix(nyz): fix remove conflict bug(ci skip) Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
Committed by niuyazhe
Committed by Will-Nie
* add comments for r2d2 * sort style * revise according to the comments * fix style
- 13 Dec, 2021 1 commit
Committed by Swain
* feature(nyz): add delay reward mujoco env * test(nyz): add delay reward mujoco env test and fix bug
- 12 Dec, 2021 1 commit
Committed by Ming Zhang
- 09 Dec, 2021 2 commits
Committed by niuyazhe
Committed by Xu Jingxin
* Init base buffer and storage * Use ratelimit as middleware * Pass style check * Keep the return original return value * Add buffer.view * Add replace flag on sample, rewrite middleware processing * Test slicing * Add buffer copy middleware * Add update/delete api in buffer, rename middleware * Implement update and delete api of buffer * add naive use time count middleware in buffer * Rename next to chain * feature(nyz): add staleness check middleware and polish buffer * feature(nyz): add naive priority experience replay * Sample by indices * Combine buffer and storage layers * Support indices when deleting items from the queue * Use dataclass to save buffered data, remove return_index and return_meta * Add ignore_insufficient * polish(nyz): add return index in push and copy same data in sample * Drop useless import * Fix sample with indices, ensure return size is equal to input size or indices size * Make sure sampled data in buffer is different from each other * Support sample by grouped meta key * Support sample by rolling window * Add import/export data in buffer * Padding after sampling from buffer * Polish use_time_check * Use buffer as dataset * Set collate_fn in buffer test * feature(nyz): add deque buffer compatibility wrapper and demo * polish(nyz): polish code style and add pong dqn new deque buffer demo * feature(nyz): add use_time_count compatibility in wrapper * feature(nyz): add priority replay buffer compatibility in wrapper * Improve performance of buffer.update * polish(nyz): add priority max limit and correct flake8 * Use __call__ to rewrite middleware * Rewrite buffer index * Fix buffer delete * Skip first item * Rewrite buffer delete * Use caller * Use caller in priority * Add group sample Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
- 08 Dec, 2021 4 commits
Committed by niuyazhe
Committed by niuyazhe
Committed by Will-Nie
* add trex algorithm for pong * sort style * add atari, ll,cp; fix device, collision; add_ppo * add accuracy evaluation * correct style * add seed to make sure results are replicable * remove useless part in cum return of model part * add mujoco onppo training pipeline; ppo config * improve style * add sac training config for mujoco * add log, add save data; polish config * logger; hyperparameter;walker * correct style * modify else condition * change rnd to trex * revise according to comments, add episode collect * new collect mode for trex, fix all bugs, comments * final change * polish after the final comment * add readme/test * add test for serial entry of trex/gcl * sort style
Committed by Weiyuhong-1998
* fix(wyh):masac * feature(wyh):single agent discrete sac * feature(wyh):single agent discrete sac td * fix(wyh):fix pong bug * fix(wyh):fix smac bug * fix(wyh):masac_5m6m best config * env(wyh):allow SMAC env return ippo/isac obs * fix(wyh):masac polish * fix(wyh):masac style * fix(wyh):masac test
- 06 Dec, 2021 1 commit
Committed by niuyazhe
- 03 Dec, 2021 4 commits
Committed by niuyazhe
Committed by niuyazhe
Committed by Ke Li
* feature(lk): add initial version of MP-PDQN * fix(lk): fix expand function bug * refactor(nyz): refactor mpdqn continuous args inputs module * fix(nyz): fix pdqn scatter index generation * fix(lk): fix pdqn scatter assignment bug * feature(lk): polish mpdqn code and style format * feature(lk): add mpdqn config and test file * feature(lk): polish mpdqn code and style format * fix(lk): fix import bug * polish(lk): add test for mpdqn * polish(lk): polish code style and format * polish(lk): rm print debug info * polish(lk): rm print debug info * polish(lk): polish code style and format * polish(lk): add MPDQN in readme.md Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
Committed by Davide Liu
* added r2d2 + a2c configs * changed convergence reward for some env * removed configs that don't converge * removed 'on_policy' param in r2d2 configs