README.md 16.6 KB
Newer Older
T
tangwei 已提交
1 2 3
<p align="center">
<img align="center" src="doc/imgs/logo.png">
<p>
C
fix  
chengmo 已提交
4 5 6
<p align="center">
<img align="center" src="doc/imgs/structure.png">
<p>
C
fix  
chengmo 已提交
7 8 9
<p align="center">
<img align="center" src="doc/imgs/overview.png">
<p>
C
fix  
chengmo 已提交
10

T
tangwei 已提交
11

C
fix  
chengmo 已提交
12
<h2 align="center">什么是推荐系统?</h2>
C
chengmo 已提交
13 14 15 16 17 18 19 20 21 22
<p align="center">
<img align="center" src="doc/imgs/rec-overview.png">
<p>

- 推荐系统是在互联网信息爆炸式增长的时代背景下,帮助用户高效获得感兴趣信息的关键;

- 推荐系统也是帮助产品最大限度吸引用户、留存用户、增加用户粘性、提高用户转化率的银弹。

- 有无数优秀的产品依靠用户可感知的推荐系统建立了良好的口碑,也有无数的公司依靠直击用户痛点的推荐系统在行业中占领了一席之地。

C
fix  
chengmo 已提交
23 24
  > 可以说,谁能掌握和利用好推荐系统,谁就能在信息分发的激烈竞争中抢得先机。
  > 但与此同时,有着许多问题困扰着推荐系统的开发者,比如:庞大的数据量,复杂的模型结构,低效的分布式训练环境,波动的在离线一致性,苛刻的上线部署要求,以上种种,不胜枚举。
C
chengmo 已提交
25

C
fix  
chengmo 已提交
26
<h2 align="center">什么是PaddleRec?</h2>
T
tangwei 已提交
27 28


C
chengmo 已提交
29 30
- 源于飞桨生态的搜索推荐模型 **一站式开箱即用工具** 
- 适合初学者,开发者,研究者的推荐系统全流程解决方案
C
fix  
chengmo 已提交
31
- 包含内容理解、匹配、召回、排序、 多任务、重排序等多个任务的完整推荐搜索算法库
T
tangwei 已提交
32 33


C
Chengmo 已提交
34 35
    |   方向   |                                   模型                                    | 单机CPU | 单机GPU | 分布式CPU | 分布式GPU | 论文                                                                                                                                                                                                        |
    | :------: | :-----------------------------------------------------------------------: | :-----: | :-----: | :-------: | :-------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
C
Chengmo 已提交
36 37 38 39
    | 内容理解 | [Text-Classifcation](models/contentunderstanding/classification/model.py) |    ✓    |    ✓    |     ✓     |     x     | [EMNLP 2014][Convolutional neural networks for sentence classication](https://www.aclweb.org/anthology/D14-1181.pdf)                                                                                        |
    | 内容理解 |         [TagSpace](models/contentunderstanding/tagspace/model.py)         |    ✓    |    ✓    |     ✓     |     x     | [EMNLP 2014][TagSpace: Semantic Embeddings from Hashtags](https://www.aclweb.org/anthology/D14-1194.pdf)                                                                                                    |
    |   匹配   |                    [DSSM](models/match/dssm/model.py)                     |    ✓    |    ✓    |     ✓     |     x     | [CIKM 2013][Learning Deep Structured Semantic Models for Web Search using Clickthrough Data](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf)             |
    |   匹配   |        [MultiView-Simnet](models/match/multiview-simnet/model.py)         |    ✓    |    ✓    |     ✓     |     x     | [WWW 2015][A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)             |
C
Chengmo 已提交
40
    |   召回   |                   [TDM](models/treebased/tdm/model.py)                    |    ✓    | >=1.8.0 |     ✓     |  >=1.8.0  | [KDD 2018][Learning Tree-based Deep Model for Recommender Systems](https://arxiv.org/pdf/1801.02294.pdf)                                                                                                    |
C
Chengmo 已提交
41 42
    |   召回   |                [fasttext](models/recall/fasttext/model.py)                |    ✓    |    ✓    |     x     |     x     | [EACL 2017][Bag of Tricks for Efficient Text Classification](https://www.aclweb.org/anthology/E17-2068.pdf)                                                                                                 |
    |   召回   |                [Word2Vec](models/recall/word2vec/model.py)                |    ✓    |    ✓    |     ✓     |     x     | [NIPS 2013][Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) |
C
Chengmo 已提交
43 44 45 46 47
    |   召回   |                     [SSR](models/recall/ssr/model.py)                     |    ✓    |    ✓    |     ✓     |     ✓     | [SIGIR 2016][Multi-Rate Deep Learning for Temporal Recommendation](http://sonyis.me/paperpdf/spr209-song_sigir16.pdf)                                                                                       |
    |   召回   |                 [Gru4Rec](models/recall/gru4rec/model.py)                 |    ✓    |    ✓    |     ✓     |     ✓     | [2015][Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)                                                                                                      |
    |   召回   |             [Youtube_dnn](models/recall/youtube_dnn/model.py)             |    ✓    |    ✓    |     ✓     |     ✓     | [RecSys 2016][Deep Neural Networks for YouTube Recommendations](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf)                                               |
    |   召回   |                     [NCF](models/recall/ncf/model.py)                     |    ✓    |    ✓    |     ✓     |     ✓     | [WWW 2017][Neural Collaborative Filtering](https://arxiv.org/pdf/1708.05031.pdf)                                                                                                                            |
    |   召回   |                     [GNN](models/recall/gnn/model.py)                     |    ✓    |    ✓    |     ✓     |     ✓     | [AAAI 2019][Session-based Recommendation with Graph Neural Networks](https://arxiv.org/abs/1811.00855)                                                                                                      |
C
Chengmo 已提交
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
    |   排序   |      [Logistic Regression](models/rank/logistic_regression/model.py)      |    ✓    |    x    |     ✓     |     x     | /                                                                                                                                                                                                           |
    |   排序   |                      [Dnn](models/rank/dnn/model.py)                      |    ✓    |    ✓    |     ✓     |     ✓     | /                                                                                                                                                                                                           |
    |   排序   |                       [FM](models/rank/fm/model.py)                       |    ✓    |    x    |     ✓     |     x     | [IEEE Data Mining 2010][Factorization machines](https://analyticsconsultores.com.mx/wp-content/uploads/2019/03/Factorization-Machines-Steffen-Rendle-Osaka-University-2010.pdf)                             |
    |   排序   |                      [FFM](models/rank/ffm/model.py)                      |    ✓    |    x    |     ✓     |     x     | [RECSYS 2016][Field-aware Factorization Machines for CTR Prediction](https://dl.acm.org/doi/pdf/10.1145/2959100.2959134)                                                                                    |
    |   排序   |                      [FNN](models/rank/fnn/model.py)                      |    ✓    |    x    |     ✓     |     x     | [ECIR 2016][Deep Learning over Multi-field Categorical Data](https://arxiv.org/pdf/1601.02376.pdf)                                                                                                          |
    |   排序   |            [Deep Crossing](models/rank/deep_crossing/model.py)            |    ✓    |    x    |     ✓     |     x     | [ACM 2016][Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features](https://www.kdd.org/kdd2016/papers/files/adf0975-shanA.pdf)                                                   |
    |   排序   |                      [Pnn](models/rank/pnn/model.py)                      |    ✓    |    x    |     ✓     |     x     | [ICDM 2016][Product-based Neural Networks for User Response Prediction](https://arxiv.org/pdf/1611.00144.pdf)                                                                                               |
    |   排序   |                      [DCN](models/rank/dcn/model.py)                      |    ✓    |    x    |     ✓     |     x     | [KDD 2017][Deep & Cross Network for Ad Click Predictions](https://dl.acm.org/doi/pdf/10.1145/3124749.3124754)                                                                                               |
    |   排序   |                      [NFM](models/rank/nfm/model.py)                      |    ✓    |    x    |     ✓     |     x     | [SIGIR 2017][Neural Factorization Machines for Sparse Predictive Analytics](https://dl.acm.org/doi/pdf/10.1145/3077136.3080777)                                                                             |
    |   排序   |                      [AFM](models/rank/afm/model.py)                      |    ✓    |    x    |     ✓     |     x     | [IJCAI 2017][Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks](https://arxiv.org/pdf/1708.04617.pdf)                                                  |
    |   排序   |                   [DeepFM](models/rank/deepfm/model.py)                   |    ✓    |    x    |     ✓     |     x     | [IJCAI 2017][DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/pdf/1703.04247.pdf)                                                                                 |
    |   排序   |                  [xDeepFM](models/rank/xdeepfm/model.py)                  |    ✓    |    x    |     ✓     |     x     | [KDD 2018][xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/3219819.3220023)                                                       |
    |   排序   |                      [DIN](models/rank/din/model.py)                      |    ✓    |    x    |     ✓     |     x     | [KDD 2018][Deep Interest Network for Click-Through Rate Prediction](https://dl.acm.org/doi/pdf/10.1145/3219819.3219823)                                                                                     |
    |   排序   |                [Wide&Deep](models/rank/wide_deep/model.py)                |    ✓    |    x    |     ✓     |     x     | [DLRS 2016][Wide & Deep Learning for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/2988450.2988454)                                                                                               |
    |   排序   |                    [FGCNN](models/rank/fgcnn/model.py)                    |    ✓    |    ✓    |     ✓     |     ✓     | [WWW 2019][Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1904.04447.pdf)                                                                      |
C
Chengmo 已提交
63 64 65
    |  多任务  |                  [ESMM](models/multitask/esmm/model.py)                   |    ✓    |    ✓    |     ✓     |     ✓     | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931)                                                              |
    |  多任务  |                  [MMOE](models/multitask/mmoe/model.py)                   |    ✓    |    ✓    |     ✓     |     ✓     | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007)                                                       |
    |  多任务  |           [ShareBottom](models/multitask/share-bottom/model.py)           |    ✓    |    ✓    |     ✓     |     ✓     | [1998][Multitask learning](http://reports-archive.adm.cs.cmu.edu/anon/1997/CMU-CS-97-203.pdf)                                                                                                               |
C
Chengmo 已提交
66
    |  重排序  |                [Listwise](models/rerank/listwise/model.py)                |    ✓    |    ✓    |     ✓     |     x     | [2019][Sequential Evaluation and Generation Framework for Combinatorial Recommender System](https://arxiv.org/pdf/1902.00245.pdf)                                                                           |
T
tangwei 已提交
67 68 69 70 71





C
chengmo 已提交
72
<h2 align="center">快速安装</h2>
T
tangwei 已提交
73 74 75 76 77

### 环境要求
* Python 2.7/ 3.5 / 3.6 / 3.7
* PaddlePaddle  >= 1.7.2
* 操作系统: Windows/Mac/Linux
C
chengmo 已提交
78

C
fix  
chengmo 已提交
79
  > Windows下目前仅提供单机训练,建议分布式使用Linux
T
tangwei 已提交
80 81 82
  
### 安装命令

C
fix  
chengmo 已提交
83
- 安装方法一 **PIP源直接安装**
T
tangwei 已提交
84 85 86
  ```bash
  python -m pip install paddle-rec
  ```
C
fix  
chengmo 已提交
87
    > 该方法会默认下载安装`paddlepaddle v1.7.2 cpu版本`,若提示`PaddlePaddle`无法安装,则依照下述方法首先安装`PaddlePaddle`,再安装`PaddleRec`:
C
fix  
chengmo 已提交
88
    > - 可以在[该地址](https://pypi.org/project/paddlepaddle/1.7.2/#files),下载PaddlePaddle后手动安装whl包
C
fix  
chengmo 已提交
89
    > - 可以先pip安装`PaddlePaddle`,`python -m pip install paddlepaddle==1.7.2 -i https://mirror.baidu.com/pypi/simple`
C
fix  
chengmo 已提交
90
    > - 其他安装问题可以在[Paddle Issue](https://github.com/PaddlePaddle/Paddle/issues)或[PaddleRec Issue](https://github.com/PaddlePaddle/PaddleRec/issues)提出,会有工程师及时解答
T
tangwei 已提交
91

C
fix  
chengmo 已提交
92 93
- 安装方法二 **源码编译安装**
  
C
fix  
chengmo 已提交
94
  - 安装飞桨  **注:需要用户安装版本 == 1.7.2 的飞桨**
T
tangwei 已提交
95 96

    ```shell
C
fix  
chengmo 已提交
97
    python -m pip install paddlepaddle==1.7.2 -i https://mirror.baidu.com/pypi/simple
T
tangwei 已提交
98 99
    ```

C
fix  
chengmo 已提交
100
  - 源码安装PaddleRec
T
tangwei 已提交
101 102 103 104 105 106 107

    ```
    git clone https://github.com/PaddlePaddle/PaddleRec/
    cd PaddleRec
    python setup.py install
    ```

C
fix  
chengmo 已提交
108 109 110 111
- PaddleRec-GPU安装方法

  在使用方法一或方法二完成PaddleRec安装后,需再手动安装`paddlepaddle-gpu`,并根据自身环境(Cuda/Cudnn)选择合适的版本,安装教程请查阅[飞桨-开始使用](https://www.paddlepaddle.org.cn/install/quick)

T
tangwei 已提交
112

C
chengmo 已提交
113
<h2 align="center">一键启动</h2>
C
chengmo 已提交
114

C
chengmo 已提交
115
我们以排序模型中的`dnn`模型为例介绍PaddleRec的一键启动。训练数据来源为[Criteo数据集](https://www.kaggle.com/c/criteo-display-ad-challenge/),我们从中截取了100条数据:
T
tangwei 已提交
116 117 118

```bash
# 使用CPU进行单机训练
C
chengmo 已提交
119
python -m paddlerec.run -m paddlerec.models.rank.dnn  
T
tangwei 已提交
120 121 122
```


C
chengmo 已提交
123
<h2 align="center">帮助文档</h2>
T
tangwei 已提交
124

C
chengmo 已提交
125
### 项目背景
T
tangwei 已提交
126 127 128
* [推荐系统介绍](doc/rec_background.md)
* [分布式深度学习介绍](doc/ps_background.md)

C
chengmo 已提交
129
### 快速开始
C
Chengmo 已提交
130
* [十分钟上手PaddleRec](https://aistudio.baidu.com/aistudio/projectdetail/559336)
T
tangwei 已提交
131

C
chengmo 已提交
132 133 134 135 136 137
### 入门教程
* [数据准备](doc/slot_reader.md)
* [模型调参](doc/model.md)
* [启动训练](doc/train.md)
* [启动预测](doc/predict.md)
* [快速部署](doc/serving.md)
T
tangwei 已提交
138

C
chengmo 已提交
139 140 141

### 进阶教程
* [自定义Reader](doc/custom_reader.md)
C
Chengmo 已提交
142 143
* [自定义模型](doc/model_develop.md)
* [自定义流程](doc/trainer_develop.md)
C
chengmo 已提交
144
* [yaml配置说明](doc/yaml.md)
C
chengmo 已提交
145 146
* [PaddleRec设计文档](doc/design.md)

C
chengmo 已提交
147
### Benchmark
C
chengmo 已提交
148 149
* [Benchmark](doc/benchmark.md)

T
tangwei 已提交
150 151 152 153 154 155
### FAQ
* [常见问题FAQ](doc/faq.md)


<h2 align="center">社区</h2>

C
fix  
chengmo 已提交
156 157 158 159 160 161 162
<p align="center">
    <br>
    <img alt="Release" src="https://img.shields.io/badge/Release-0.1.0-yellowgreen">
    <img alt="License" src="https://img.shields.io/github/license/PaddlePaddle/Serving">
    <img alt="Slack" src="https://img.shields.io/badge/Join-Slack-green">
    <br>
<p>
T
tangwei 已提交
163 164 165 166 167 168

### 版本历史
- 2020.5.14 - PaddleRec v0.1
  
### 许可证书
本项目的发布受[Apache 2.0 license](LICENSE)许可认证。
C
chengmo 已提交
169 170 171 172 173 174 175 176

### 联系我们

如有意见、建议及使用中的BUG,欢迎在[GitHub Issue](https://github.com/PaddlePaddle/PaddleRec/issues)提交

亦可通过以下方式与我们沟通交流:

<p align="center"><img width="200" height="200"  src="doc/imgs/wechet.png"/>&#8194;&#8194;&#8194;&#8194;&#8194;<img width="200" height="200" margin="500" src="./doc/imgs/QQ_group.png"/>&#8194;&#8194;&#8194;&#8194;&#8194<img width="200" height="200"  src="doc/imgs/weixin_supporter.png"/></p>
C
fix  
chengmo 已提交
177
<p align="center">  &#8194;&#8194;&#8194;&#8194;&#8194;&#8194;微信公众号&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;PaddleRec交流QQ群&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;PaddleRec微信小助手</p>