Unverified commit 5612ecde, authored by Bo Zhou, committed by GitHub

Zhoubo01 es (#127)

* add learning curve for ES

* add learning curve for ES

* support new APIs of the cluster

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* rename learner.py

* Update README.md

* Update README.md

* Update README.cn.md

* Update README.md

* Update README.cn.md

* Update README.md
Parent: 60d68135
@@ -2,7 +2,8 @@
<img src=".github/PARL-logo.png" alt="PARL" width="500"/>
</p>
[English](./README.md) | 简体中文
[**Documentation**](https://parl.readthedocs.io)
> PARL is a high-performance and flexible reinforcement learning framework.
# Features
@@ -28,46 +29,11 @@ The goal of PARL is to build an agent that can complete complex tasks. Below are the
### Agent
`Agent` is responsible for the interaction between the algorithm and the environment. During this interaction it feeds the generated data to `Algorithm` to update the model (`Model`); data preprocessing is usually defined here as well.
Here is an example of building an agent with the DQN algorithm to play Atari games:
```python
import parl
from parl.algorithms import DQN

class AtariModel(parl.Model):
    """AtariModel
    This class defines the forward part for an algorithm,
    its input is the state observed in the environment.
    """
    def __init__(self, img_shape, action_dim):
        # define your layers
        self.cnn1 = layers.conv_2d(num_filters=32, filter_size=5,
                                   stride=1, padding=2, act='relu')
        ...
        self.fc1 = layers.fc(action_dim)

    def value(self, img):
        # define how to estimate the Q value based on the image of Atari games.
        img = img / 255.0
        l = self.cnn1(img)
        ...
        Q = self.fc1(l)
        return Q

"""
Three steps to define an agent:
1. Define the forward model, i.e. the value network above, which estimates the Q value from the input game image.
2. Update the model (Model) with the DQN algorithm; here we simply import the DQN algorithm already implemented in this repository.
3. Define the data-interaction part in AtariAgent, which feeds the data collected during interaction to the DQN algorithm to update the model.
"""
model = AtariModel(img_shape=(32, 32), action_dim=4)
algorithm = DQN(model)
agent = AtariAgent(algorithm)
```
Note: please visit the [tutorial](https://parl.readthedocs.io/en/latest/getting_started.html) and the [API documentation](https://parl.readthedocs.io/en/latest/model.html) for more information about the base classes.
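For a more concrete picture of where the assembled agent plugs in, here is a purely illustrative usage sketch (not from the README): it assumes the `agent` built above and an Atari environment from `gym`; `predict` is a hypothetical method name for the user-defined `AtariAgent`.

```python
import gym

# Hypothetical evaluation loop for the `agent` assembled above.
env = gym.make('PongNoFrameskip-v4')   # placeholder Atari task
obs = env.reset()
done = False
while not done:
    action = agent.predict(obs)        # assumed interface of the user-defined AtariAgent
    obs, reward, done, info = env.step(action)
```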
# Simple and Efficient Parallelization API
In PARL, a single **decorator** (`parl.remote_class`) is all users need to implement their own parallel algorithms.
The following `Hello World` example shows how easily PARL can schedule external computation resources for parallel computing. Please visit our [tutorial](https://parl.readthedocs.io/en/latest/parallel_training/setup.html) for more information about parallel training.
```python
#============Agent.py=================
@parl.remote_class
class Agent(object):

    def say_hello(self):
        print("Hello World!")

    def sum(self, a, b):
        return a + b

parl.connect('localhost:8037')
agent = Agent()
agent.say_hello()
ans = agent.sum(1, 5)  # runs remotely and does not consume any local computation resources
```
Two steps to schedule external computation resources:
1. Decorate a class with `parl.remote_class`; the class can then run on other CPUs or machines.
2. Call `parl.connect` to initialize the parallel communication. Instances obtained this way have the same methods as the original class, but because they run on other computation resources, calling these methods **no longer consumes the local thread's computation resources**.
<img src=".github/decorator.png" alt="PARL" width="450"/>
......
@@ -2,7 +2,8 @@
<img src=".github/PARL-logo.png" alt="PARL" width="500"/>
</p>
English | [简体中文](./README.cn.md)
[**Documentation**](https://parl.readthedocs.io)
> PARL is a flexible and highly efficient reinforcement learning framework.
@@ -28,47 +29,12 @@ The main abstractions introduced by PARL that are used to build an agent recursively
`Algorithm` describes the mechanism to update parameters in `Model` and often contains at least one model.
### Agent
`Agent`, a data bridge between the environment and the algorithm, is responsible for data I/O with the outside environment and describes data preprocessing before feeding data into the training process.
Here is an example of building an agent with the DQN algorithm for Atari games.
```python
import parl
from parl.algorithms import DQN, DDQN

class AtariModel(parl.Model):
    """AtariModel
    This class defines the forward part for an algorithm,
    its input is the state observed in the environment.
    """
    def __init__(self, img_shape, action_dim):
        # define your layers
        self.cnn1 = layers.conv_2d(num_filters=32, filter_size=5,
                                   stride=1, padding=2, act='relu')
        ...
        self.fc1 = layers.fc(action_dim)

    def value(self, img):
        # define how to estimate the Q value based on the image of Atari games.
        img = img / 255.0
        l = self.cnn1(img)
        ...
        Q = self.fc1(l)
        return Q

"""
Three steps to build an agent:
1. Define a forward model, AtariModel in this example.
2. a. To build a DQN algorithm, just pass the model to `DQN`.
   b. To build a DDQN algorithm, just replace DQN in the following line with DDQN.
3. Define the I/O part in AtariAgent so that it can update the algorithm based on the interactive data.
"""
model = AtariModel(img_shape=(32, 32), action_dim=4)
algorithm = DQN(model)
agent = AtariAgent(algorithm)
```
Note: For more information about base classes, please visit our [tutorial](https://parl.readthedocs.io/en/latest/getting_started.html) and [API documentation](https://parl.readthedocs.io/en/latest/model.html).
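As a purely illustrative aid (not the real PARL base classes), the plain-Python stubs below mirror the recursive layering described above: the agent owns an algorithm, the algorithm owns a model, and each layer only talks to the one directly beneath it. All class names, method names, and values here are placeholders.

```python
class Model:
    """Forward part only: maps an observation to Q values (placeholder logic)."""
    def value(self, obs):
        return [0.0, 0.0, 0.0, 0.0]           # one dummy Q value per action

class Algorithm:
    """Owns the model and defines how its parameters would be updated."""
    def __init__(self, model):
        self.model = model

    def predict(self, obs):
        return self.model.value(obs)

    def learn(self, batch):
        pass                                   # the update rule (e.g. a DQN loss) would live here

class Agent:
    """Owns the algorithm and handles data I/O with the environment."""
    def __init__(self, algorithm):
        self.algorithm = algorithm

    def sample(self, obs):
        q_values = self.algorithm.predict(obs)
        return q_values.index(max(q_values))   # greedy action, for illustration only

agent = Agent(Algorithm(Model()))
print(agent.sample(obs=[0.0] * 4))             # -> 0
```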
# Parallelization
PARL provides a compact API for distributed training, allowing users to convert their code into a parallelized version by simply adding a decorator. For more information about our APIs for parallel training, please visit our [documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html).
Here is a `Hello World` example to demonstrate how easy it is to leverage external computation resources.
```python
#============Agent.py=================
@parl.remote_class
class Agent(object):

    def say_hello(self):
        print("Hello World!")

    def sum(self, a, b):
        return a + b

parl.connect('localhost:8037')
agent = Agent()
agent.say_hello()
ans = agent.sum(1, 5)  # runs remotely and does not consume any local computation resources
```
Two steps to use external computation resources:
1. Decorate a class with `parl.remote_class`; the decorated class can then run on other CPUs or machines.
2. Call `parl.connect` to initialize the parallel communication before creating an object. Instances created afterwards have the same methods as the original class, but calling any of these methods **does not** consume local computation resources since they are executed elsewhere.
<img src=".github/decorator.png" alt="PARL" width="450"/>
As shown in the figure above, real actors (orange circles) run on the CPU cluster, while the learner (blue circle) runs on the local GPU together with several remote actors (yellow circles with dotted edges).
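To make the figure concrete, here is a minimal sketch (assuming a running xparl master at `localhost:8037` and the decorated `Agent` class from the example above; not part of the README): every decorated instance is dispatched to a remote worker, so driving several of them from local threads leaves the local process essentially idle.

```python
import threading
import parl

parl.connect('localhost:8037')

results = [None] * 4

def run(i):
    agent = Agent()               # each instance occupies one remote CPU
    results[i] = agent.sum(i, i)  # executed remotely; the local thread only waits

threads = [threading.Thread(target=run, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)                    # -> [0, 2, 4, 6]
```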
......
@@ -34,10 +34,10 @@ Note that if you have started a master before, you don't have to run the above
command. For more information about the cluster, please refer to our
[documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html)
Then we can start the distributed training by running:
```bash
python train.py
```
### Reference
......
## Reproduce ES with PARL
Based on PARL, we have implemented the Evolution Strategies (ES) algorithm and evaluated it on the Mujoco benchmarks, where it matches the results reported in the paper.
+ ES in
[Evolution Strategies as a Scalable Alternative to Reinforcement Learning](https://arxiv.org/abs/1703.03864)
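For readers unfamiliar with the method, the core ES update is simple: perturb the policy parameters with Gaussian noise, evaluate each perturbation, and move the parameters along the reward-weighted noise. The NumPy sketch below is a deliberately simplified illustration of that idea, not PARL's implementation; the toy objective and all hyperparameters are placeholders.

```python
import numpy as np

def evaluate(theta):
    # Placeholder for an episode rollout; a toy objective that is maximized at theta == 3.
    return -np.sum((theta - 3.0) ** 2)

theta = np.zeros(5)
sigma, alpha, population = 0.1, 0.02, 50

for step in range(200):
    noise = np.random.randn(population, theta.size)
    rewards = np.array([evaluate(theta + sigma * eps) for eps in noise])
    # Normalize the rewards and estimate the gradient as a reward-weighted sum of the noise.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += alpha / (population * sigma) * noise.T.dot(advantages)

print(theta.round(1))  # each entry ends up close to 3.0
```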
@@ -8,7 +8,7 @@ Based on PARL, the Evolution Strategies (ES) algorithm has been reproduced, reaching
Please see [here](https://github.com/openai/mujoco-py) to learn more about the Mujoco environments.
### Benchmark result
![learning_curve](learning_curve.png)
## How to use
### Dependencies
@@ -20,18 +20,21 @@ TODO
### Distributed Training
To replicate the performance reported above, we encourage you to train with 96 CPUs.
If you haven't created a cluster before, enter the following command to create a cluster. For more information about the cluster, please refer to our [documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html).
```bash
xparl start --port 8037 --cpu_num 96
```
Then we can start the distributed training by running:
```bash
python train.py
```
You can change the training settings (e.g. `env_name`, `actor_num`) in `es_config.py`.

Training results will be saved in `train_log`, together with a training curve that can be visualized in TensorBoard.
### Reference
+ [Ray](https://github.com/ray-project/ray)
......
@@ -14,8 +14,7 @@
config = {
    #========== remote config ==========
    'master_address': 'localhost:8037',

    #========== env config ==========
    'env_name': 'Humanoid-v1',
......
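As a hypothetical illustration (not part of the repository), the entries above are plain dictionary values, so switching the task or the degree of parallelism is just an edit of `es_config.py`; the environment name and numbers below are placeholders.

```python
# Hypothetical edits to the config dict in es_config.py.
config.update({
    'env_name': 'HalfCheetah-v1',        # any Mujoco task available through gym/mujoco-py
    'actor_num': 48,                     # should not exceed the CPUs given to `xparl start`
    'master_address': 'localhost:8037',  # must match the port used in `xparl start --port`
})
```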
#!/bin/bash
export CPU_NUM=1
actor_num=96
for i in $(seq 1 $actor_num); do
python actor.py &
done;
wait
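For readers more comfortable in Python, here is a rough equivalent of the launcher script above (a sketch, not part of the repository): it exports the same `CPU_NUM` setting, starts `actor.py` the same number of times in the background, and waits for all of the processes to finish.

```python
import os
import subprocess

os.environ['CPU_NUM'] = '1'   # inherited by the child processes, like `export CPU_NUM=1`
actor_num = 96

procs = [subprocess.Popen(['python', 'actor.py']) for _ in range(actor_num)]
for p in procs:
    p.wait()                  # equivalent of the trailing `wait`
```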
@@ -23,10 +23,10 @@ from obs_filter import MeanStdFilter
from mujoco_agent import MujocoAgent
from mujoco_model import MujocoModel
from noise import SharedNoiseTable
from parl.utils import logger, tensorboard
from parl.utils.window_stat import WindowStat
from six.moves import queue
from actor import Actor
class Learner(object):
@@ -53,40 +53,37 @@ class Learner(object):
        self.actors_signal_input_queues = []
        self.actors_output_queues = []

        self.create_actors()

        self.eval_rewards_stat = WindowStat(self.config['report_window_size'])
        self.eval_lengths_stat = WindowStat(self.config['report_window_size'])

    def create_actors(self):
        """ Create actors for parallel training.
        """
        parl.connect(self.config['master_address'])

        self.remote_count = 0
        for i in range(self.config['actor_num']):
            signal_queue = queue.Queue()
            output_queue = queue.Queue()
            self.actors_signal_input_queues.append(signal_queue)
            self.actors_output_queues.append(output_queue)

            self.remote_count += 1
            logger.info('Remote actor count: {}'.format(self.remote_count))

            remote_thread = threading.Thread(
                target=self.run_remote_sample,
                args=(signal_queue, output_queue))
            remote_thread.setDaemon(True)
            remote_thread.start()

        logger.info('All remote actors are ready, begin to learn.')

    def run_remote_sample(self, signal_queue, output_queue):
        """ Sample data from the remote actor or get filters of the remote actor.
        """
        remote_actor = Actor(self.config)

        while True:
            info = signal_queue.get()
            if info['signal'] == 'sample':
@@ -211,6 +208,9 @@ class Learner(object):
if __name__ == '__main__':
    from es_config import config

    logger.info(
        "Before training, it takes a few minutes to initialize a noise table for exploration"
    )

    learner = Learner(config)

    while True:
......
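The `Learner` above drives each remote actor from a dedicated local thread and talks to it through a pair of queues: a signal queue to request work and an output queue to collect results. The self-contained sketch below illustrates that pattern with a dummy worker in place of the real `Actor`; everything in it is illustrative only.

```python
import queue
import threading

def run_worker(signal_queue, output_queue):
    # Stand-in for run_remote_sample: wait for a signal, do the work, return the result.
    while True:
        info = signal_queue.get()
        if info['signal'] == 'sample':
            output_queue.put({'reward': 1.0})   # dummy result instead of a real rollout
        elif info['signal'] == 'stop':
            break

signal_queues, output_queues = [], []
for _ in range(4):                              # one thread and one queue pair per actor
    sq, oq = queue.Queue(), queue.Queue()
    signal_queues.append(sq)
    output_queues.append(oq)
    threading.Thread(target=run_worker, args=(sq, oq), daemon=True).start()

for sq in signal_queues:                        # broadcast a sampling request
    sq.put({'signal': 'sample'})
results = [oq.get() for oq in output_queues]    # gather one result from each worker
print(len(results))                             # -> 4
```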
@@ -33,10 +33,10 @@ Note that if you have started a master before, you don't have to run the above
command. For more information about the cluster, please refer to our
[documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html)
Then we can start the distributed training by running:
```bash
python train.py
```
[Tips] The performance can be influenced dramatically in a slower computational
......
@@ -37,10 +37,10 @@ Note that if you have started a master before, you don't have to run the above
command. For more information about the cluster, please refer to our
[documentation](https://parl.readthedocs.io/en/latest/parallel_training/setup.html)
Then we can start the distributed training by running:
```bash
python train.py
```
### Reference
......