OpenDILab Open-Source Decision Intelligence Platform / DI-engine
Commit 2b181eda
Authored Dec 03, 2021 by niuyazhe
fix(nyz): rename sum keepdims to keepdim for compatiblity and remove sql wrapper
Parent: f087d2c7
Showing 1 changed file with 2 additions and 48 deletions
ding/model/wrapper/model_wrappers.py (+2 -48)
@@ -338,7 +338,7 @@ class EpsGreedyMultinomialSampleWrapper(IModelWrapper):
         for i, l in enumerate(logit):
             if np.random.random() > eps:
                 prob = torch.softmax(output['logit'] / alpha, dim=-1)
-                prob = prob / torch.sum(prob, 1, keepdims=True)
+                prob = prob / torch.sum(prob, 1, keepdim=True)
                 pi_action = torch.zeros(prob.shape)
                 pi_action = Categorical(prob)
                 pi_action = pi_action.sample()
@@ -386,7 +386,7 @@ class HybridEpsGreedyMultinomialSampleWrapper(IModelWrapper):
         for i, l in enumerate(logit):
             if np.random.random() > eps:
                 prob = torch.softmax(l, dim=-1)
-                prob = prob / torch.sum(prob, 1, keepdims=True)
+                prob = prob / torch.sum(prob, 1, keepdim=True)
                 pi_action = Categorical(prob)
                 pi_action = pi_action.sample()
                 action.append(pi_action)
@@ -441,51 +441,6 @@ class EpsGreedySampleNGUWrapper(IModelWrapper):
         return output
 
 
-class EpsGreedySampleWrapperSql(IModelWrapper):
-    r"""
-    Overview:
-        Epsilon greedy sampler coupled with multinomial sample used in collector_model
-        to help balance exploration and exploitation.
-    Interfaces:
-        register
-    """
-
-    def forward(self, *args, **kwargs):
-        eps = kwargs.pop('eps')
-        alpha = kwargs.pop('alpha')
-        output = self._model.forward(*args, **kwargs)
-        assert isinstance(output, dict), "model output must be dict, but find {}".format(type(output))
-        logit = output['logit']
-        assert isinstance(logit, torch.Tensor) or isinstance(logit, list)
-        if isinstance(logit, torch.Tensor):
-            logit = [logit]
-        if 'action_mask' in output:
-            mask = output['action_mask']
-            if isinstance(mask, torch.Tensor):
-                mask = [mask]
-            logit = [l.sub_(1e8 * (1 - m)) for l, m in zip(logit, mask)]
-        else:
-            mask = None
-        action = []
-        for i, l in enumerate(logit):
-            if np.random.random() > eps:
-                prob = torch.softmax(output['logit'] / alpha, dim=-1)
-                prob = prob / torch.sum(prob, 1, keepdims=True)
-                pi_action = torch.zeros(prob.shape)
-                pi_action = Categorical(prob)
-                pi_action = pi_action.sample()
-                action.append(pi_action)
-            else:
-                if mask:
-                    action.append(sample_action(prob=mask[i].float()))
-                else:
-                    action.append(torch.randint(0, l.shape[-1], size=l.shape[:-1]))
-        if len(action) == 1:
-            action, logit = action[0], logit[0]
-        output['action'] = action
-        return output
-
-
 class ActionNoiseWrapper(IModelWrapper):
     r"""
     Overview:
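For readers skimming the removed class, the sampling rule it applies can be sketched in plain Python for a single row of logits. This is an illustration under stated assumptions, not DI-engine code: the function names, the `rng` parameter, and the list-based inputs are hypothetical, and the real wrappers operate on torch tensors with `Categorical`.

```python
import math
import random


def softmax(logits, alpha=1.0):
    """Softmax over a list of floats, with temperature alpha."""
    scaled = [x / alpha for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def eps_greedy_multinomial_sample(logits, eps, alpha, mask=None, rng=random):
    """Pick one action index for a single row of logits (hypothetical helper).

    mask is an optional list of 0/1 flags; 0 marks an unavailable action.
    """
    if mask is not None:
        # Same idea as the diff: push masked-out logits far down.
        logits = [x - 1e8 * (1 - m) for x, m in zip(logits, mask)]
    if rng.random() > eps:
        # With probability about 1 - eps, sample from softmax(logit / alpha).
        prob = softmax(logits, alpha)
        return rng.choices(range(len(logits)), weights=prob, k=1)[0]
    if mask is not None:
        # Otherwise explore uniformly among the allowed actions only.
        allowed = [i for i, m in enumerate(mask) if m]
        return rng.choice(allowed)
    # No mask: explore uniformly over all actions.
    return rng.randrange(len(logits))
```

A call such as `eps_greedy_multinomial_sample([1.0, 2.0, 3.0], eps=0.1, alpha=1.0, mask=[1, 0, 1])` should only ever return index 0 or 2, because the masked logit receives zero probability after the softmax.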
@@ -629,7 +584,6 @@ wrapper_name_map = {
     'hybrid_argmax_sample': HybridArgmaxSampleWrapper,
     'eps_greedy_sample': EpsGreedySampleWrapper,
     'eps_greedy_sample_ngu': EpsGreedySampleNGUWrapper,
-    'eps_greedy_sample_sql': EpsGreedySampleWrapperSql,
     'eps_greedy_multinomial_sample': EpsGreedyMultinomialSampleWrapper,
     'hybrid_eps_greedy_sample': HybridEpsGreedySampleWrapper,
     'hybrid_eps_greedy_multinomial_sample': HybridEpsGreedyMultinomialSampleWrapper,
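Removing the `'eps_greedy_sample_sql'` entry means that name no longer resolves through `wrapper_name_map`. The sketch below shows the general name-to-class registry pattern with stand-in classes; `lookup_wrapper` is a hypothetical helper introduced for illustration, and how DI-engine itself reports an unknown name is not shown in this diff.

```python
class IModelWrapper:
    """Stand-in base class; not the DI-engine implementation."""

    def __init__(self, model):
        self._model = model


class EpsGreedySampleWrapper(IModelWrapper):
    pass


class EpsGreedyMultinomialSampleWrapper(IModelWrapper):
    pass


# Trimmed-down registry mirroring the shape of wrapper_name_map after the change.
wrapper_name_map = {
    'eps_greedy_sample': EpsGreedySampleWrapper,
    'eps_greedy_multinomial_sample': EpsGreedyMultinomialSampleWrapper,
}


def lookup_wrapper(name):
    """Hypothetical helper: resolve a wrapper class by its registered name."""
    try:
        return wrapper_name_map[name]
    except KeyError:
        raise KeyError(
            "unknown wrapper {!r}; available: {}".format(name, sorted(wrapper_name_map))
        ) from None
```

Under this pattern, any configuration that still requests `'eps_greedy_sample_sql'` would need to switch to a name that remains registered, such as `'eps_greedy_multinomial_sample'`.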