AlphaGo [[1]](#ref1) uses a deep policy network during the rollout (simulation) phase, which makes rollouts far more realistic than purely random play. In a game as complex as Go, simulating every rollout to the end of the game is impractical, so AlphaGo terminates rollouts early and additionally uses a value network to estimate the probability of winning. More recently, AlphaGo Zero [[2]](#ref2) was proposed: it uses a single network that outputs both the policy and the value function, and it is trained purely through self-play, with no built-in expert knowledge. AlphaGo Zero's performance is even more impressive than AlphaGo's.
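The early-termination idea can be sketched in a few lines. The toy `CountdownState` game, the placeholder `policy_network` and `value_network`, and all parameter names below are assumptions for illustration, not AlphaGo's actual API; the mixing formula V(s) = (1 − λ)·v(s) + λ·z does follow the paper's leaf evaluation.

```python
import random

class CountdownState:
    """Toy game: start at n, each move subtracts 1 or 2; reaching 0 ends it.
    A stand-in for a real Go position (hypothetical, for illustration only)."""
    def __init__(self, n):
        self.n = n
    def legal_moves(self):
        return [1, 2] if self.n >= 2 else [1]
    def play(self, move):
        return CountdownState(self.n - move)
    def is_terminal(self):
        return self.n == 0
    def outcome(self):
        return 1.0  # placeholder terminal reward

def policy_network(state):
    """Hypothetical policy net: here just uniform over legal moves."""
    moves = state.legal_moves()
    return {m: 1.0 / len(moves) for m in moves}

def value_network(state):
    """Hypothetical value net: here a constant win-probability estimate."""
    return 0.5

def evaluate_leaf(state, max_depth=10, lam=0.5):
    """Truncated, policy-guided rollout mixed with the value net's estimate,
    mirroring AlphaGo's leaf evaluation V(s) = (1 - lam) * v(s) + lam * z."""
    v = value_network(state)       # value-net estimate of the leaf itself
    s = state
    for _ in range(max_depth):     # stop early instead of playing to the end
        if s.is_terminal():
            break
        probs = policy_network(s)
        moves, weights = zip(*probs.items())
        s = s.play(random.choices(moves, weights=weights)[0])
    # If the truncated rollout did not finish the game, fall back on the
    # value network to score the reached position.
    z = s.outcome() if s.is_terminal() else value_network(s)
    return (1 - lam) * v + lam * z
```

With the constant placeholders above, any rollout from `CountdownState(5)` reaches the terminal state within 10 moves, so `evaluate_leaf(CountdownState(5))` always returns 0.5 · 0.5 + 0.5 · 1.0 = 0.75; with learned networks the two terms would genuinely disagree and λ trades them off.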
## References
1. <span id="ref1">D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," *Nature*, 2016.</span>
2. <span id="ref2">D. Silver et al., "Mastering the game of Go without human knowledge," *Nature*, 2017.</span>
3. <span id="ref3">J. Bradberry, "[Introduction to Monte Carlo tree search](https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/)," 2015.</span>
4. <span id="ref4">S. Gelly and D. Silver, "Monte-Carlo tree search and rapid action value estimation in computer Go," *Artificial Intelligence*, 2011.</span>