<p align="center">
<img src=".github/PARL-logo.png" alt="PARL" width="500"/>
</p>

# Features
**Reproducible**. We provide implementations that stably reproduce the results of many influential reinforcement learning algorithms.

**Large Scale**. Supports high-performance parallel training with thousands of CPUs and multiple GPUs.

**Reusable**. Algorithms provided in the repository can be directly adapted to a new task by defining a forward network; the training mechanism is built automatically.

**Extensible**. Build new algorithms quickly by inheriting the abstract classes in the framework.


# Abstractions
<img src=".github/abstractions.png" alt="abstractions" width="400"/>  
PARL aims to build an agent that can be trained to perform complex tasks.  
The main abstractions introduced by PARL, which are used to build an agent recursively, are the following:

### Model
`Model` is the abstraction for constructing the forward network: it defines a policy network or a critic network that takes the state as input.

### Algorithm
`Algorithm` describes the mechanism for updating the parameters in a `Model`; it often contains at least one model.
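
New algorithms are built by inheriting PARL's abstract `Algorithm` class, as noted under the Extensible feature above. The sketch below is only a schematic illustration: the constructor arguments and the `predict`/`learn` method names are assumptions here and may differ across PARL versions; see the implementations in `parl.algorithms` for the actual interface.

```python
import parl


class MyAlgorithm(parl.Algorithm):
    """A schematic custom algorithm holding one model (illustrative sketch only)."""

    def __init__(self, model, lr=0.001):
        self.model = model  # the forward network whose parameters will be updated
        self.lr = lr

    def predict(self, obs):
        # use the model's forward pass to estimate Q values / choose an action
        return self.model.value(obs)

    def learn(self, obs, action, reward, next_obs, terminal):
        # compute a loss from the interaction data and update self.model here
        ...
```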

### Agent
`Agent` is the data bridge between the environment and the algorithm. It is responsible for data I/O with the outside world and for preprocessing the data before it is fed into the training process.

Here is an example of building an agent with the DQN algorithm for Atari games.
```python
import parl
from parl import layers  # PARL's layer helpers (assumed import; provides layers.conv_2d / layers.fc below)
from parl.algorithms import DQN, DDQN

class AtariModel(parl.Model):
	"""AtariModel
	This class defines the forward part for an algorithm,
	its input is state observed on environment.
	"""
	def __init__(self, img_shape, action_dim):
		# define your layers
		self.cnn1 = layers.conv_2d(num_filters=32, filter_size=5,
					stride=[1, 1], padding=[2, 2], act='relu')
		...
		self.fc1 = layers.fc(action_dim)
	def value(self, img):
		# define how to estimate the Q value based on the image of the Atari game.
		img = img / 255.0
		l = self.cnn1(img)
		...
		Q = self.fc1(l)
		return Q
"""
three steps to build an agent
   1.  define a forward model which is critic_model is this example
   2.  a. to build a DQN algorithm, just pass the critic_model to `DQN`
       b. to build a DDQN algorithm, just replace DQN in following line with DDQN
   3.  define the I/O part in AtariAgent so that it could update the algorithm based on the interactive data
"""

model = AtariModel(img_shape=(32, 32), action_dim=4)
algorithm = DQN(model)
agent = AtariAgent(algorithm)
```
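
The example above references an `AtariAgent` that is not defined in the snippet. A minimal sketch of what such an agent could look like is given below; it follows the `Agent` abstraction described earlier, but the constructor and the `sample`/`learn` signatures are assumptions and the data preprocessing is elided. See [DQN](examples/DQN/) for a complete, working agent.

```python
import numpy as np
import parl


class AtariAgent(parl.Agent):
    """Schematic data bridge between the Atari environment and the algorithm.

    Illustrative sketch only; the exact base-class interface depends on the
    PARL version.
    """

    def __init__(self, algorithm, action_dim, epsilon=0.1):
        super(AtariAgent, self).__init__(algorithm)
        self.action_dim = action_dim
        self.epsilon = epsilon  # exploration rate for epsilon-greedy sampling

    def sample(self, obs):
        # epsilon-greedy: random action with probability epsilon, otherwise the
        # greedy action derived from the algorithm's Q-value prediction (elided)
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.action_dim)
        ...

    def learn(self, obs, action, reward, next_obs, terminal):
        # preprocess the interaction data and forward it to the algorithm's update
        ...
```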

# Install
### Dependencies
- Python 2.7 or 3.5+. 
- PaddlePaddle >= 1.0 (we try to keep the repository compatible with the newest version of PaddlePaddle)


```
pip install --upgrade git+https://github.com/PaddlePaddle/PARL.git
```
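
After installing, a quick import check like the one below confirms the package is visible to your Python environment; `parl.__version__` is assumed to be available but is not documented in this README.

```python
# Minimal post-install sanity check (assumes a standard pip install of parl).
import parl

print(parl.__version__)  # assumed attribute; prints the installed PARL version
```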

# Examples
- [QuickStart](examples/QuickStart/)
- [DQN](examples/DQN/)
- [DDPG](examples/DDPG/)
- PPO
- [Winning Solution for NIPS2018: AI for Prosthetics Challenge](examples/NeurIPS2018-AI-for-Prosthetics-Challenge/)