Commit 2d707c56 authored by: T TeslaZhao

update doc

Parent commit: bbe8fbaf
......@@ -28,7 +28,7 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
- Integrate the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite. Models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to Paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
- There are two frameworks: the high-performance C++ Serving and the easy-to-use Python pipeline. C++ Serving is built on the bRPC network framework to create high-throughput, low-latency inference services, and its performance indicators lead competing products. The Python pipeline is built on the gRPC/gRPC-Gateway network framework and the Python language to provide a highly usable, high-throughput inference service. For guidance on which one to choose, see [Technical Selection]()
- There are two frameworks: the high-performance C++ Serving and the easy-to-use Python pipeline. C++ Serving is built on the bRPC network framework to create high-throughput, low-latency inference services, and its performance indicators lead competing products. The Python pipeline is built on the gRPC/gRPC-Gateway network framework and the Python language to provide a highly usable, high-throughput inference service. For guidance on which one to choose, see [Technical Selection](doc/Serving_Design_EN.md)
- Support multiple [protocols]() such as HTTP, gRPC and bRPC, and provide C++, Python and Java SDKs.
- Design and implement a high-performance asynchronous pipeline inference service framework based on a directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference.
- Adapt to a variety of commonly used computing hardware, such as x86 (Intel) CPU, ARM CPU, Nvidia GPU and Kunlun XPU; integrate the Intel MKLDNN and Nvidia TensorRT acceleration libraries, as well as low-precision and quantized inference.
- Provide a model security deployment solution, including encrypted model deployment, an authentication mechanism, and an HTTPS security gateway, which has been used in practice.
......@@ -54,46 +54,46 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
This chapter guides you through the installation and deployment steps. It is strongly recommended to use Docker to deploy Paddle Serving; if you do not use Docker, ignore the Docker-related steps. Paddle Serving can be deployed on cloud servers using Kubernetes and runs on common hardware such as ARM CPU, Intel CPU, Nvidia GPU and Kunlun XPU. The latest development kit of the develop branch is compiled and published every day for developers to use.
- [Install Paddle Serving using docker](doc/Install.md)
- [Build Paddle Serving from Source with Docker](doc/COMPILE.md)
- [Deploy Paddle Serving on Kubernetes](doc/PADDLE_SERVING_ON_KUBERNETES.md)
- [Deploy Paddle Serving with Security gateway](doc/SERVING_AUTH_DOCKER.md)
- [Deploy Paddle Serving on more hardwares](doc/BAIDU_KUNLUN_XPU_SERVING.md)
- [Latest Wheel packages](doc/LATEST_PACKAGES.md)(Update everyday on branch develop)
- [Install Paddle Serving using docker](doc/Install_EN.md)
- [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_EN.md)
- [Deploy Paddle Serving with Security gateway](doc/Serving_Auth_Docker.md)
- [Deploy Paddle Serving on more hardware](doc/Run_On_XPU_EN.md)
- [Latest Wheel packages](doc/Latest_Packages_CN.md)(updated every day on branch develop)
> Use
The first step is to call the model-saving interface to generate the model parameter configuration files (.prototxt), which are used on both the client and the server. The second step is to read the configuration and startup parameters and start the service. The third step is to write client requests based on the SDK according to the API documents and your use case, and then test the inference service; a minimal end-to-end sketch is shown after the document list below.
- [Quick Start](doc/QuickStart.md)
- [Save a servable model](doc/SAVE.md)
- [Description of configuration and startup parameters](doc/SERVING_CONFIGURE.md)
- [Guide for RESTful/gRPC/bRPC APIs](doc/HTTP_SERVICE_CN.md)
- [Infer on quantized models](doc/LOW_PRECISION_DEPLOYMENT_CN.md)
- [Data format of classic models](doc/PROCESS_DATA.md)
- [C++ Serving](doc/C++Serving/Introduction_EN.md)
- [Hot loading models](doc/C++Serving/Hot_Loading_EN.md)
- [A/B Test](doc/C++Serving/ABTest_EN.md)
- [Encryption](doc/C++Serving/Encryption_EN.md)
- [Analyze and optimize performance(Chinese)](doc/C++Serving/Performance_Tuning_CN.md)
- [Benchmark(Chinese)](doc/C++Serving/Benchmark_CN.md)
- [Python Pipeline](doc/python_server/PIPELINE_SERVING_CN.md)
- [Analyze and optimize performance](doc/python_server/PIPELINE_SERVING_CN.md)
- [Benchmark(Chinese)](doc/python_server/BENCHMARKING_GPU.md)
- [Client SDK]()
- [Python SDK](doc/PYTHON_SDK_CN.md)
- [JAVA SDK](doc/JAVA_SDK.md)
- [C++ SDK](doc/C++_SDK_CN.md)
- [Large-scale sparse parameter server](doc/CUBE_LOCAL_EN.md)
- [Quick Start](doc/Quick_Start_EN.md)
- [Save a servable model](doc/Save_EN.md)
- [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Http_Service_EN.md)
- [Infer on quantized models](doc/Low_Precision_CN.md)
- [Data format of classic models](doc/Process_Data_CN.md)
- [C++ Serving](doc/C++_Serving/Introduction_EN.md)
- [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
- [A/B Test](doc/C++_Serving/ABTest_EN.md)
- [Encryption](doc/C++_Serving/Encryption_EN.md)
- [Analyze and optimize performance(Chinese)](doc/C++_Serving/Performance_Tuning_CN.md)
- [Benchmark(Chinese)](doc/C++_Serving/Benchmark_CN.md)
- [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
- [Analyze and optimize performance](doc/Python_Pipeline/Pipeline_Design_EN.md)
- [Benchmark(Chinese)](doc/Python_Pipeline/Benchmark_CN.md)
- Client SDK
- [Python SDK(Chinese)](doc/C++_Serving/Http_Service_CN.md)
- [JAVA SDK](doc/Java_SDK_EN.md)
- [C++ SDK(Chinese)](doc/C++_Serving/Creat_C++Serving_CN.md)
- [Large-scale sparse parameter server](doc/Cube_Local_EN.md)
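The three steps above map onto only a few lines of code. The sketch below is a minimal illustration assuming the uci_housing (fit_a_line) example model from this repository, with the server started separately via `python3 -m paddle_serving_server.serve --model uci_housing_model --port 9292`; the config path and feed/fetch names come from the generated .prototxt files of that example.

```python
# Minimal client-side sketch of the three-step workflow described above.
# Assumes the uci_housing example model and a server already listening on port 9292.
import numpy as np
from paddle_serving_client import Client

client = Client()
# Step 1 produced serving_client_conf.prototxt (feed/fetch definitions).
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
# Step 2 started the server; here we only connect to it.
client.connect(["127.0.0.1:9292"])

# Step 3: build a request matching the feed variables and read the fetch results.
x = np.zeros((1, 13, 1), dtype="float32")  # replace with a real 13-feature sample
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
print(fetch_map)
```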
<br>
> Developers
For Paddle Serving developers, we provide extended documents on topics such as custom OPs and level-of-detail (LoD) processing.
- [Custom Operators](doc/OP_EN.md)
- [Custom Operators](doc/C++_Serving/OP_EN.md)
- [Processing LOD Data](doc/LOD_EN.md)
- [FAQ(Chinese)](doc/FAQ.md)
- [FAQ(Chinese)](doc/FAQ_CN.md)
<h2 align="center">Model Zoo</h2>
......@@ -108,7 +108,7 @@ Paddle Serving works closely with the Paddle model suite, and implements a large
</center>
For more model examples, read [Model zoo](doc/Model_Zoo.md)
For more model examples, read [Model zoo](doc/Model_Zoo_EN.md)
<center class="half">
<img src="https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg?raw=true" width="280"/> <img src="https://github.com/PaddlePaddle/PaddleDetection/raw/release/2.3/docs/images/road554.png" width="160"/>
......@@ -132,7 +132,7 @@ If you want to communicate with developers and other users? Welcome to join us,
> Contribution
If you want to contribute code to Paddle Serving, please reference [Contribution Guidelines](doc/CONTRIBUTE.md)
If you want to contribute code to Paddle Serving, please reference [Contribution Guidelines](doc/Contribute_EN.md)
- Special thanks to [@BeyondYourself](https://github.com/BeyondYourself) for complementing the gRPC tutorial, updating the FAQ doc and fixing the mkdir command
- Special thanks to [@mcl-stone](https://github.com/mcl-stone) for updating the faster_rcnn benchmark
......
......@@ -27,13 +27,13 @@
Backed by the deep learning framework PaddlePaddle, Paddle Serving aims to provide deep learning developers and enterprises with high-performance, flexible and easy-to-use industrial-grade online inference services. Paddle Serving supports multiple protocols such as RESTful, gRPC and bRPC, provides inference solutions for a variety of heterogeneous hardware and operating systems, and ships many classic pre-trained model examples. The core features are as follows:
- Integrate the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite; models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated via the [x2paddle](https://github.com/PaddlePaddle/X2Paddle) tool
- Provide two frameworks: high-performance C++ Serving and easy-to-use Python. The C++ framework builds high-throughput, low-latency inference services on the high-performance bRPC network framework and leads competing products in performance; the Python framework builds an easy-to-use, high-throughput inference service framework on gRPC/gRPC-Gateway and the Python language. For guidance on choosing between them, see [Technical Selection]()
- Provide two frameworks: high-performance C++ Serving and easy-to-use Python. The C++ framework builds high-throughput, low-latency inference services on the high-performance bRPC network framework and leads competing products in performance; the Python framework builds an easy-to-use, high-throughput inference service framework on gRPC/gRPC-Gateway and the Python language. For guidance on choosing between them, see [Technical Selection](doc/Serving_Design_CN.md)
- Support multiple [protocols](链接protocol文档) such as HTTP, gRPC and bRPC; provide C++, Python and Java SDKs
- Design and implement a high-performance asynchronous pipeline inference framework based on a directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference
- Adapt to a variety of hardware such as x86 (Intel) CPU, ARM CPU, Nvidia GPU and Kunlun XPU; integrate the Intel MKLDNN and Nvidia TensorRT acceleration libraries, as well as low-precision and quantized inference
- Provide a model security deployment solution, including encrypted model deployment, authentication, and an HTTPS security gateway, which has been applied in real projects
- Support cloud deployment, with an example of deploying Paddle Serving on a Baidu Intelligent Cloud Kubernetes cluster
- Provide rich deployment examples of classic pre-trained models from suites such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP and PaddleRec, covering 40 curated pre-trained models in total, with more being added continuously
- Provide rich deployment examples of classic pre-trained models from suites such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP and PaddleRec, covering 40+ curated pre-trained models in total, with more being added continuously
- Support distributed deployment of large-scale sparse-parameter indexing models, with features such as multiple tables, multiple shards, multiple replicas and a local high-frequency cache, deployable on a single machine or in the cloud
......@@ -54,41 +54,42 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
This chapter guides you through the installation and deployment steps. It is strongly recommended to deploy Paddle Serving with Docker; if you do not use Docker, skip the Docker-related steps. On cloud servers, Paddle Serving can be deployed with Kubernetes. To compile or use Paddle Serving on heterogeneous hardware such as ARM CPU or Kunlun XPU, refer to the documents below. The latest development kit of the develop branch is compiled and published every day for developers.
- [Install Paddle Serving using Docker](doc/Install_CN.md)
- [Build Paddle Serving from source](doc/COMPILE_CN.md)
- [Deploy Paddle Serving on Kubernetes](doc/PADDLE_SERVING_ON_KUBERNETES.md)
- [Deploy Paddle Serving with a security gateway](doc/SERVING_AUTH_DOCKER.md)
- [Deploy Paddle Serving on heterogeneous hardware](doc/BAIDU_KUNLUN_XPU_SERVING_CN.md)
- [Latest Wheel packages](doc/LATEST_PACKAGES.md)(updated daily on the develop branch)
- [Build Paddle Serving from source](doc/Compile_CN.md)
- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes.md)
- [Deploy Paddle Serving with a security gateway](doc/Serving_Auth_Docker.md)
- [Deploy Paddle Serving on heterogeneous hardware](doc/Run_On_XPU_CN.md)
- [Latest Wheel packages](doc/Latest_Packages_CN.md)(updated daily on the develop branch)
> Usage
After installing Paddle Serving, the Quick Start will guide you through running Serving. The first step is to call the model-saving interface to generate the model parameter configuration files (.prototxt) used on the client and server. The second step is to read the configuration and startup parameters and start the service. The third step is to write client requests based on the SDK according to the APIs and your use case, and then test the inference service. To learn about more features and how to use them, read the documents below in detail.
- [Quick Start](doc/QuickStart_CN.md)
- [Quick Start](doc/Quick_Start_CN.md)
- [Save models and configurations for Paddle Serving](doc/SAVE_CN.md)
- [Description of configuration and startup parameters](doc/SERVING_CONFIGURE.md)
- [Guide for RESTful/gRPC/bRPC APIs](doc/HTTP_SERVICE_CN.md)
- [Low-precision inference](doc/LOW_PRECISION_DEPLOYMENT_CN.md)
- [Data processing for common models](doc/PROCESS_DATA.md)
- [Introduction to C++ Serving](doc/C++Serving/Introduction_CN.md)
- [Hot loading models](doc/C++Serving/Hot_Loading_CN.md)
- [A/B Test](doc/C++Serving/ABTest_CN.md)
- [Encrypted model inference service](doc/C++Serving/Encryption_CN.md)
- [Performance tuning guide](doc/C++Serving/Performance_Tuning_CN.md)
- [Benchmarks](doc/C++Serving/Benchmark_CN.md)
- [Introduction to Python Pipeline](doc/python_server/PIPELINE_SERVING_CN.md)
- [Performance tuning guide](doc/python_server/PIPELINE_SERVING_CN.md)
- [Client SDK]()
- [Python SDK](doc/PYTHON_SDK_CN.md)
- [JAVA SDK](doc/JAVA_SDK_CN.md)
- [C++ SDK](doc/C++_SDK_CN.md)
- [Large-scale sparse parameter indexing service](doc/CUBE_LOCAL_CN.md)
- [Description of configuration and startup parameters](doc/Serving_Configure_CN.md)
- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Http_Service_CN.md)
- [Low-precision inference](doc/Low_Precision_CN.md)
- [Data processing for common models](doc/Process_data_CN.md)
- [Introduction to C++ Serving](doc/C++_Serving/Introduction_CN.md)
- [Hot loading models](doc/C++_Serving/Hot_Loading_CN.md)
- [A/B Test](doc/C++_Serving/ABTest_CN.md)
- [Encrypted model inference service](doc/C++_Serving/Encryption_CN.md)
- [Performance tuning guide](doc/C++_Serving/Performance_Tuning_CN.md)
- [Benchmarks](doc/C++_Serving/Benchmark_CN.md)
- [Introduction to Python Pipeline](doc/Python_Pipeline/Pipeline_Design_CN.md)
- [Performance tuning guide](doc/Python_Pipeline/Pipeline_Design_CN.md)
- [Benchmarks](doc/Python_Pipeline/Benchmark_CN.md)
- Client SDK
- [Python SDK](doc/C++_Serving/Http_Service_CN.md)
- [JAVA SDK](doc/Java_SDK_CN.md)
- [C++ SDK](doc/C++_Serving/Creat_C++Serving_CN.md)
- [Large-scale sparse parameter indexing service](doc/Cube_Local_CN.md)
> Developers
For Paddle Serving developers, we provide documents on custom OPs and variable-length (LoD) data processing.
- [Custom OP](doc/OP_CN.md)
- [Custom OP](doc/C++_Serving/OP_CN.md)
- [Variable-length (LoD) data processing](doc/LOD_CN.md)
- [FAQ](doc/FAQ.md)
- [FAQ](doc/FAQ_CN.md)
<h2 align="center">Model Zoo</h2>
......@@ -126,7 +127,7 @@ Paddle Serving与Paddle模型套件紧密配合,实现大量服务化部署,
> Contribute code
If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/CONTRIBUTE.md)
If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/Contribute.md)
- Special thanks to [@BeyondYourself](https://github.com/BeyondYourself) for providing the gRPC tutorial, updating the FAQ tutorial and organizing the file directory.
- Special thanks to [@mcl-stone](https://github.com/mcl-stone) for providing the faster rcnn benchmark script
......
......@@ -75,7 +75,7 @@ service ImageClassifyService {
#### 2.2.2 Example configuration
For details about the Serving-side configuration, refer to [Serving-side configuration](../SERVING_CONFIGURE_CN.md)
For details about the Serving-side configuration, refer to [Serving-side configuration](../Serving_Configure_CN.md)
The following configuration file chains ReaderOP, ClassifyOP and WriteJsonOP into one workflow (for concepts such as OP and workflow, refer to [OP Introduction](OP_CN.md) and [DAG Introduction](DAG_CN.md))
......
......@@ -32,7 +32,7 @@ Server端的核心是一个由项目代码编译产生的名称为serving的二
To make it easier for users to start the C++ Serving server quickly, besides editing the configuration files yourself and running the serving binary with command-line arguments, we also provide a way to start it via a Python script. The Python script launch still runs the serving binary under the hood, but the script automatically does two things: 1) generates the configuration files; 2) assembles the command line from the parameters to be configured, passes the parameter information on the command line and runs the serving binary.
For more details and examples, refer to [Detailed description of C++ Serving parameter configuration and startup](../SERVING_CONFIGURE_CN.md)
For more details and examples, refer to [Detailed description of C++ Serving parameter configuration and startup](../Serving_Configure_CN.md)
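As an illustration of the script-based launch described above, the example services elsewhere in this repository are typically started with a one-liner such as `python3 -m paddle_serving_server.serve --model serving_server --port 9292 --thread 10`; the script writes the configuration files and then invokes the serving binary with the corresponding flags (the `--thread` value here is illustrative).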
### 3.2 Synchronous/Asynchronous mode
The synchronous mode is simple and direct; it is suitable when the model prediction time is short and the batch of a single request is already fairly large.
......
# How to compile PaddleServing
(简体中文|[English](./COMPILE.md))
(简体中文|[English](./Compile_EN.md))
## Compilation environment setup
......
# How to compile PaddleServing
([简体中文](./COMPILE_CN.md)|English)
([简体中文](./Compile_CN.md)|English)
## Compilation environment requirements
......@@ -23,7 +23,7 @@
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you, see [this document](DOCKER_IMAGES.md).
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you, see [this document](Docker_Images_EN.md).
## Get Code
......@@ -159,8 +159,7 @@ cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR/ \
-DSERVER=ON ..
make -j10
```
**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path, see the following [Notes](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Notes).
**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path, see the following [Notes](Compile_EN.md#Notes).
## Compile Client
......
......@@ -68,7 +68,7 @@ Paddle Serving uses this [Git branching model](http://nvie.com/posts/a-successfu
1. Build and test
Users can build Paddle Serving natively on Linux, see the [BUILD steps](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md).
Users can build Paddle Serving natively on Linux, see the [BUILD steps](Compile_EN.md).
1. Keep pulling
......
# Guide to the standalone Cube sparse parameter indexing service
(简体中文|[English](./CUBE_LOCAL.md))
(简体中文|[English](./Cube_Local_EN.md))
## Introduction
Under python/examples there are two CTR examples, criteo_ctr and criteo_ctr_with_cube. The former saves the whole model, including the sparse parameters, at training time. The latter carves out the sparse parameters and saves the model in two parts, one for the sparse parameters and the other for the dense parameters. In industrial scenarios the scale of sparse parameters is very large, on the order of 10^9, so serving large-scale sparse parameters on one machine is impractical. We therefore introduce Cube, Baidu's industrial-grade product built from years of work on sparse parameter indexing, to provide a distributed sparse parameter service.
<!--The standalone Cube is a stripped-down version of distributed Cube, intended for developers' experiments and demos. If you need a distributed sparse parameter service, after reading this document please continue with the [Cube sparse parameter indexing service guide](CUBE_LOCAL_CN.md) (under construction).-->
<!--The standalone Cube is a stripped-down version of distributed Cube, intended for developers' experiments and demos. If you need a distributed sparse parameter service, after reading this document please continue with the [Cube sparse parameter indexing service guide](Cube_Local_CN.md) (under construction).-->
This document only uses original models that have not been processed by any compression algorithm. If you need to bring a quantized model online, please read the [Guide to quantized storage for Cube sparse parameter indexing](./CUBE_QUANT_CN.md)
This document only uses original models that have not been processed by any compression algorithm. If you need to bring a quantized model online, please read the [Guide to quantized storage for Cube sparse parameter indexing](./Cube_Quant_CN.md)
## Example
......
# Cube: Sparse Parameter Indexing Service (Local Mode)
([简体中文](./CUBE_LOCAL_CN.md)|English)
([简体中文](./Cube_Local_CN.md)|English)
## Overview
There are two CTR examples under python/examples: criteo_ctr and criteo_ctr_with_cube. The former saves the entire model during training, including the sparse parameters. The latter cuts out the sparse parameters and saves the model in two parts, one for the sparse parameters and the other for the dense parameters. Because the scale of sparse parameters in industrial cases is very large, reaching the order of 10^9, it is not practical to serve large-scale sparse parameters on a single machine. We therefore introduce Cube, Baidu's industrial-grade sparse parameter indexing product built over many years, to provide a distributed sparse parameter service.
The local mode of Cube is a simplified version of distributed Cube; it is designed to be convenient for developers to use in experiments and demos.
<!--If there is a demand for distributed sparse parameter service, please continue reading [Quantization Storage on Cube Sparse Parameter Indexing](./CUBE_QUANT.md) after reading this document (still developing).-->
<!--If there is a demand for distributed sparse parameter service, please continue reading [Quantization Storage on Cube Sparse Parameter Indexing](./Cube_Quant_EN.md) after reading this document (still developing).-->
This document uses the original model without any compression algorithm. If you need to bring a quantized model online, please read [Quantization Storage on Cube Sparse Parameter Indexing](./CUBE_QUANT.md)
This document uses the original model without any compression algorithm. If you need to bring a quantized model online, please read [Quantization Storage on Cube Sparse Parameter Indexing](./Cube_Quant_EN.md)
## Example
In the directory python/examples/criteo_ctr_with_cube, run
......
# Guide to quantized storage for Cube sparse parameter indexing
(简体中文|[English](./CUBE_QUANT.md))
(简体中文|[English](./Cube_Quant_EN.md))
## Overview
......@@ -9,7 +9,7 @@
## Prerequisites
Please first read the [Guide to the standalone Cube sparse parameter indexing service](./CUBE_LOCAL_CN.md)
Please first read the [Guide to the standalone Cube sparse parameter indexing service](./Cube_Local_CN.md)
## Components
......
# Quantization Storage on Cube Sparse Parameter Indexing
([简体中文](./CUBE_QUANT_CN.md)|English)
([简体中文](./Cube_Quant_CN.md)|English)
## Overview
......@@ -8,7 +8,7 @@ In our previous article, we know that the sparse parameter is a series of floati
## Precondition
Please read [Cube: Sparse Parameter Indexing Service (Local Mode)](./CUBE_LOCAL_CN.md)
Please read [Cube: Sparse Parameter Indexing Service (Local Mode)](./Cube_Local_EN.md)
## Components
......
......@@ -13,7 +13,7 @@
### Prerequisites
- You need to know how to compile Paddle Serving; see the [compilation document](./COMPILE.md)
- You need to know how to compile Paddle Serving; see the [compilation document](./Compile_EN.md)
### Usage
......
# Docker Images
(简体中文|[English](DOCKER_IMAGES.md))
(简体中文|[English](Docker_Images_EN.md))
This document maintains the list of Docker images provided by Paddle Serving.
......
# Docker Images
([简体中文](DOCKER_IMAGES_CN.md)|English)
([简体中文](Docker_Images_CN.md)|English)
This document maintains a list of docker images provided by Paddle Serving.
......
......@@ -142,7 +142,7 @@ make: *** [all] Error 2
#### Q: A CXXABI error occurs during use.
This happens because the gcc version used to build Python does not match the gcc version required by Serving. For Docker users, we recommend using the [Docker container](./RUN_IN_DOCKER_CN.md); the Python version inside the container has been matched with Serving before release, so this kind of error will not occur. For other development environments, first make sure GCC 8.2 is available in the environment; if gcc 8.2 is not installed, refer to the installation steps
This happens because the gcc version used to build Python does not match the gcc version required by Serving. For Docker users, we recommend using the [Docker container](./Run_In_Docker_CN.md); the Python version inside the container has been matched with Serving before release, so this kind of error will not occur. For other development environments, first make sure GCC 8.2 is available in the environment; if gcc 8.2 is not installed, refer to the installation steps
```bash
wget -q https://paddle-ci.gz.bcebos.com/gcc-8.2.0.tar.xz
......@@ -198,7 +198,7 @@ wget https://paddle-serving.bj.bcebos.com/others/centos_ssl.tar && \
(1) Cuda GPU driver: the file name is usually `libcuda.so.$DRIVER_VERSION`; for example, for driver version 440.10.15 the file name is `libcuda.so.440.10.15`.
(2) Cuda and Cudnn dynamic libraries: the file names are usually `libcudart.so.$CUDA_VERSION` and `libcudnn.so.$CUDNN_VERSION`; for example, Cuda 9 is `libcudart.so.9.0` and Cudnn 7 is `libcudnn.so.7`. For matching Cuda/Cudnn versions with Serving, see the [list of all Serving images](DOCKER_IMAGES_CN.md#%E9%99%84%E5%BD%95%E6%89%80%E6%9C%89%E9%95%9C%E5%83%8F%E5%88%97%E8%A1%A8).
(2) Cuda and Cudnn dynamic libraries: the file names are usually `libcudart.so.$CUDA_VERSION` and `libcudnn.so.$CUDNN_VERSION`; for example, Cuda 9 is `libcudart.so.9.0` and Cudnn 7 is `libcudnn.so.7`. For matching Cuda/Cudnn versions with Serving, see the [list of all Serving images](Docker_Images_CN.md#%E9%99%84%E5%BD%95%E6%89%80%E6%9C%89%E9%95%9C%E5%83%8F%E5%88%97%E8%A1%A8).
(3) Cuda 10.1 and later require TensorRT. For a script that installs the TensorRT-related files, refer to [install_trt.sh](../tools/dockerfiles/build_scripts/install_trt.sh).
......@@ -232,15 +232,15 @@ InvalidArgumentError: Device id must be less than GPU count, but received id is:
#### Q: Which image environments does Paddle Serving currently support?
**A:** Currently (0.4.0) only CentOS is supported; see the full list [here](https://github.com/PaddlePaddle/Serving/blob/develop/doc/DOCKER_IMAGES.md)
**A:** Currently (0.4.0) only CentOS is supported; see the full list [here](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Docker_Images_CN.md)
#### Q: The GCC version used to build Python does not match the version required by Serving
**A:** 1) Use the [GPU docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md#gpunvidia-docker) to resolve the environment issue; 2) change the gcc version of the Python installed in the Anaconda virtual environment, see [changing Python's GCC build environment](https://www.jianshu.com/p/c498b3d86f77)
**A:** 1) Use the [GPU docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Run_In_Docker_CN.md#gpunvidia-docker) to resolve the environment issue; 2) change the gcc version of the Python installed in the Anaconda virtual environment, see [changing Python's GCC build environment](https://www.jianshu.com/p/c498b3d86f77)
#### Q: Does paddle-serving support local offline installation?
**A:** Offline deployment is supported; the relevant [dependency packages](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md) need to be prepared and installed in advance
**A:** Offline deployment is supported; the relevant [dependency packages](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Compile_CN.md) need to be prepared and installed in advance
#### Q: Difference between the server IP addresses 127.0.0.1 and 0.0.0.0 when starting in Docker
**A:** You must bind the container's main process to the special 0.0.0.0 "all interfaces" address, otherwise it will not be reachable from outside the container. Inside Docker, 127.0.0.1 means "this container", not "this machine". If you make an outbound connection to 127.0.0.1 from the container, it goes back to the same container; if you bind the server to 127.0.0.1, it cannot receive connections from outside.
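As a concrete illustration (assumed, not from the FAQ itself): publish the port when starting the container, e.g. `docker run -p 9292:9292 <serving-image>`, and make sure the server inside the container listens on 0.0.0.0 rather than 127.0.0.1, so that requests from the host can reach it.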
......@@ -280,7 +280,7 @@ client.connect(["127.0.0.1:9393"])
#### Q: How to use the multi-language client
**A:** The multi-language client must be used together with the multi-language server. In the current version (0.4.0), the server side needs to replace Server with MultiLangServer (if started from the command line, just add the --use_multilang flag), and the Python client needs to replace Client with MultiLangClient and drop the load_client_config step. [Java client reference document](https://github.com/PaddlePaddle/Serving/blob/develop/doc/JAVA_SDK_CN.md)
**A:** The multi-language client must be used together with the multi-language server. In the current version (0.4.0), the server side needs to replace Server with MultiLangServer (if started from the command line, just add the --use_multilang flag), and the Python client needs to replace Client with MultiLangClient and drop the load_client_config step. [Java client reference document](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Java_SDK_CN.md)
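A minimal Python sketch of the client-side change described in this answer is given below; it assumes a multi-language server already running on port 9393 and uses placeholder feed/fetch names, so adjust them to your model.

```python
# Sketch: switching from Client to MultiLangClient (no load_client_config step).
import numpy as np
from paddle_serving_client import MultiLangClient

client = MultiLangClient()
client.connect(["127.0.0.1:9393"])  # multi-language server started with --use_multilang

# Feed/fetch names must match the deployed model; "x" and "price" are placeholders here.
fetch_map = client.predict(feed={"x": np.ones((1, 13, 1), dtype="float32")}, fetch=["price"])
print(fetch_map)
```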
#### Q: How to use Paddle Serving on Windows
......
# How to use the Go Client with Paddle Serving
(简体中文|[English](./IMDB_GO_CLIENT.md))
(简体中文|[English](./Imdb_GO_Client_EN.md))
This document shows how to use Go as the client language. For the Go client in Paddle Serving, a simple client package is provided at https://github.com/PaddlePaddle/Serving/tree/develop/go/serving_client, and users can import this package as needed. This is a simple example of a sentiment analysis task based on the IMDB dataset.
......
# How to use Go Client of Paddle Serving
([简体中文](./IMDB_GO_CLIENT_CN.md)|English)
([简体中文](./Imdb_GO_Client_CN.md)|English)
This document shows how to use Go as your client language. For Go client in Paddle Serving, a simple client package is provided https://github.com/PaddlePaddle/Serving/tree/develop/go/serving_client, a user can import this package as needed. Here is a simple example of sentiment analysis task based on IMDB dataset.
......
# Install Paddle Serving using Docker
(简体中文|[English](./Install_CN.md))
(简体中文|[English](./Install_EN.md))
It is **strongly recommended** that you build Paddle Serving **inside Docker**; see [How to run PaddleServing in Docker](RUN_IN_DOCKER_CN.md). For more images, see the [Docker image list](DOCKER_IMAGES_CN.md)
It is **strongly recommended** that you build Paddle Serving **inside Docker**; see [How to run PaddleServing in Docker](Run_In_Docker_CN.md). For more images, see the [Docker image list](Docker_Images_CN.md)
**Tip**: The default GPU environment of paddlepaddle 2.1 is Cuda 10.2, so the GPU Docker sample code is based on Cuda 10.2. Images and pip packages for other GPU environments are also provided; if you use a different environment, please check carefully and choose the appropriate version.
......@@ -41,7 +41,7 @@ pip3 install paddle-serving-server-gpu==0.6.2.post11 # GPU with CUDA10.1 + Tenso
You may need to use a domestic mirror source (for example the Tsinghua mirror: add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.
If you need packages built from the develop branch, get the download URLs from the [latest package list](LATEST_PACKAGES.md) and install them with `pip install`. If you want to compile by yourself, refer to the [Paddle Serving compilation document](COMPILE_CN.md)
If you need packages built from the develop branch, get the download URLs from the [latest package list](Latest_Packages_CN.md) and install them with `pip install`. If you want to compile by yourself, refer to the [Paddle Serving compilation document](Compile_CN.md)
The paddle-serving-server and paddle-serving-server-gpu packages support CentOS 6/7, Ubuntu 16/18 and Windows 10.
......@@ -65,8 +65,8 @@ pip3 install paddlepaddle-gpu==2.1.0
```
pip3 install https://paddle-wheel.bj.bcebos.com/with-trt/2.1.0-gpu-cuda10.1-cudnn7-mkl-gcc8.2/paddlepaddle_gpu-2.1.0.post101-cp36-cp36m-linux_x86_64.whl
```
Because the default `paddlepaddle-gpu==2.1.0` targets Cuda 10.2 and is not built with TensorRT, if you need to use TensorRT with `paddlepaddle-gpu` you have to find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above and download the build for your Python version. For more information, see [How to use TensorRT?](TENSOR_RT_CN.md)
Because the default `paddlepaddle-gpu==2.1.0` targets Cuda 10.2 and is not built with TensorRT, if you need to use TensorRT with `paddlepaddle-gpu` you have to find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above and download the build for your Python version
For other environments and Python versions, find the corresponding link in the table and install it with pip.
**Windows 10 users**, please refer to the document [Guide to using Paddle Serving on Windows](WINDOWS_TUTORIAL_CN.md)
**Windows 10 users**, please refer to the document [Guide to using Paddle Serving on Windows](Windows_Tutorial_CN.md)
......@@ -2,8 +2,7 @@
([简体中文](Install_CN.md)|English)
We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more docker images.
We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](Run_In_Docker_EN.md). See the [document](Docker_Images_EN.md) for more docker images.
**Attention:** Currently, the default GPU environment of paddlepaddle 2.1 is Cuda 10.2, so the sample code of GPU Docker is based on Cuda 10.2. We also provide docker images and whl packages for other GPU environments. If users use other environments, they need to carefully check and select the appropriate version.
......@@ -41,7 +40,7 @@ pip install paddle-serving-server-gpu==0.6.0.post11 # GPU with CUDA10.1 + Tensor
You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.
If you need to install modules compiled from the develop branch, please download packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command. If you want to compile by yourself, please refer to [How to compile Paddle Serving?](./doc/COMPILE.md)
If you need to install modules compiled from the develop branch, please download packages from the [latest packages list](Latest_Packages_CN.md) and install them with the `pip install` command. If you want to compile by yourself, please refer to [How to compile Paddle Serving?](Compile_EN.md)
Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7, Ubuntu 16/18, Windows 10.
......@@ -69,11 +68,11 @@ The url corresponding to `cuda10.1-cudnn7-mkl-gcc8.2-avx-trt6.0.1.5`, copy it an
pip install https://paddle-wheel.bj.bcebos.com/with-trt/2.1.0-gpu-cuda10.1-cudnn7-mkl-gcc8.2/paddlepaddle_gpu-2.1.0.post101-cp36-cp36m-linux_x86_64.whl
```
the default `paddlepaddle-gpu==2.1.0` is built for Cuda 10.2 without TensorRT. If you want to install PaddlePaddle with TensorRT, please also check the multi-version whl package list in the documentation and find the key word `cuda10.2-cudnn8.0-trt7.1.3`. More info: [Paddle Serving uses TensorRT](./doc/TENSOR_RT.md)
the default `paddlepaddle-gpu==2.1.0` is built for Cuda 10.2 without TensorRT. If you want to install PaddlePaddle with TensorRT, please also check the multi-version whl package list in the documentation and find the key word `cuda10.2-cudnn8.0-trt7.1.3`.
If it is other environment and Python version, please find the corresponding link in the table and install it with pip.
For **Windows Users**, please read the document [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md)
For **Windows Users**, please read the document [Paddle Serving for Windows Users](Windows_Tutorial_EN.md)
<h2 align="center">Quick Start Example</h2>
......
# Paddle Serving Client Java SDK
(简体中文|[English](JAVA_SDK.md))
(简体中文|[English](Java_SDK_EN.md))
Paddle Serving provides a Java SDK that supports prediction on the client side in Java. This document explains how to use the Java SDK.
......
# Paddle Serving Client Java SDK
([简体中文](JAVA_SDK_CN.md)|English)
([简体中文](Java_SDK_CN.md)|English)
Paddle Serving provides a Java SDK, which supports prediction on the client side in the Java language. This document shows how to use the Java SDK.
......
# LoD field description
(简体中文|[English](LOD.md))
(简体中文|[English](LOD_EN.md))
## Concept
......
# Low-precision deployment with Paddle Serving
(简体中文|[English](./LOW_PRECISION_DEPLOYMENT.md))
## Low-precision deployment with Paddle Serving
(简体中文|[English](./Low_Precision_EN.md))
For low-precision deployment, int8 and bfloat16 models are supported on Intel CPU, and int8 and float16 models are supported with Nvidia TensorRT.
## Generate a low-precision model through PaddleSlim quantization
### Generate a low-precision model through PaddleSlim quantization
See [PaddleSlim quantization](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html) for details
## Deploy a PaddleSlim int8 quantized model with TensorRT int8
### Deploy a PaddleSlim int8 quantized model with TensorRT int8
First download the ResNet50 [PaddleSlim quantized model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to the deployment model format supported by Paddle Serving.
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
......@@ -40,7 +41,7 @@ fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
```
## Reference documents
### Reference documents
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* Paddle Inference doc on deploying quantized models on Intel CPU: [doc](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* Paddle Inference doc on deploying quantized models on Nvidia GPU: [doc](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
# Low-Precision Deployment for Paddle Serving
(English|[简体中文](./LOW_PRECISION_DEPLOYMENT_CN.md))
## Low-Precision Deployment for Paddle Serving
(English|[简体中文](./Low_Precision_CN.md))
Intel CPU supports int8 and bfloat16 models, NVIDIA TensorRT supports int8 and float16 models.
## Obtain the quantized model through PaddleSlim tool
### Obtain the quantized model through PaddleSlim tool
To train low-precision models, please refer to [PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html).
## Deploy the quantized model from PaddleSlim using Paddle Serving with Nvidia TensorRT int8 mode
### Deploy the quantized model from PaddleSlim using Paddle Serving with Nvidia TensorRT int8 mode
Firstly, download the [Resnet50 int8 model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to Paddle Serving's saved model format.
```
......@@ -41,7 +42,7 @@ fetch_map = client.predict(feed={"image": img}, fetch=["save_infer_model/scale_0
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
```
## Reference
### Reference
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* [Deploy the quantized model Using Paddle Inference on Intel CPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* [Deploy the quantized model Using Paddle Inference on Nvidia GPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
# Model Zoo
([English](./Model_Zoo.md)|简体中文)
([English](./Model_Zoo_EN.md)|简体中文)
This page lists the pre-trained models currently supported by Paddle Serving and their download links
If you want to contribute a new model to Paddle Serving, you can submit a [pull request](https://github.com/PaddlePaddle/Serving/pulls)
......
# Pipeline Serving
(简体中文|[English](PIPELINE_SERVING.md))
- [Architecture design](PIPELINE_SERVING_CN.md#1架构设计)
- [Detailed design](PIPELINE_SERVING_CN.md#2详细设计)
- [Classic examples](PIPELINE_SERVING_CN.md#3典型示例)
- [Advanced usage](PIPELINE_SERVING_CN.md#4高阶用法)
- [Log tracing](PIPELINE_SERVING_CN.md#5日志追踪)
- [Performance analysis and optimization](PIPELINE_SERVING_CN.md#6性能分析与优化)
(简体中文|[English](Pipeline_Design_EN.md))
- [Architecture design](Pipeline_Design_CN.md#1架构设计)
- [Detailed design](Pipeline_Design_CN.md#2详细设计)
- [Classic examples](Pipeline_Design_CN.md#3典型示例)
- [Advanced usage](Pipeline_Design_CN.md#4高阶用法)
- [Log tracing](Pipeline_Design_CN.md#5日志追踪)
- [Performance analysis and optimization](Pipeline_Design_CN.md#6性能分析与优化)
In many deep learning frameworks, Serving is usually used for one-click deployment of a single model. Against the backdrop of industrial-scale AI production, end-to-end deep learning models cannot yet solve all problems, and combining multiple deep learning models is still the usual way to solve real-world problems. However, multi-model applications are complicated to design; to reduce development and maintenance difficulty while keeping the service available, serial or simple parallel approaches are usually adopted, but in that case the throughput generally only reaches a usable level and GPU utilization is low.
......
# Pipeline Serving
([简体中文](PIPELINE_SERVING_CN.md)|English)
- [Architecture Design](PIPELINE_SERVING.md#1architecture-design)
- [Detailed Design](PIPELINE_SERVING.md#2detailed-design)
- [Classic Examples](PIPELINE_SERVING.md#3classic-examples)
- [Advanced Usages](PIPELINE_SERVING.md#4advanced-usages)
- [Log Tracing](PIPELINE_SERVING.md#5log-tracing)
- [Performance Analysis And Optimization](PIPELINE_SERVING.md#6performance-analysis-and-optimization)
([简体中文](Pipeline_Design_CN.md)|English)
- [Architecture Design](Pipeline_Design_EN.md#1architecture-design)
- [Detailed Design](Pipeline_Design_EN.md#2detailed-design)
- [Classic Examples](Pipeline_Design_EN.md#3classic-examples)
- [Advanced Usages](Pipeline_Design_EN.md#4advanced-usages)
- [Log Tracing](Pipeline_Design_EN.md#5log-tracing)
- [Performance Analysis And Optimization](Pipeline_Design_EN.md#6performance-analysis-and-optimization)
In many deep learning frameworks, Serving is usually used for the deployment of a single model, but in the context of industrial AI, end-to-end deep learning models cannot solve all problems at present; multiple deep learning models are usually needed to solve practical problems. However, the design of multi-model applications is complicated. In order to reduce the difficulty of development and maintenance and to ensure the availability of services, serial or simple parallel methods are usually used; in general, the throughput only reaches a usable level and the GPU utilization rate is low.
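To make the DAG idea above concrete, the sketch below shows the general shape of composing two model Ops into one service with the Python Pipeline API. It is only an outline: the Op hook signatures and return values follow the pipeline examples shipped with recent paddle_serving_server releases and may differ slightly between versions, and a prepared `config.yml` (ports, model paths, concurrency) is assumed.

```python
# Rough sketch of a two-stage pipeline (e.g. detection -> classification), assuming
# the Op/WebService pipeline API from paddle_serving_server and a prepared config.yml.
from paddle_serving_server.web_service import WebService, Op

class DetOp(Op):
    def preprocess(self, input_dicts, data_id, log_id):
        (_, input_dict), = input_dicts.items()
        # build the feed dict for the detection model here
        return input_dict, False, None, ""

class ClsOp(Op):
    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        # turn raw fetch tensors into a JSON-serializable response here
        return fetch_dict, None, ""

class DemoPipelineService(WebService):
    def get_pipeline_response(self, read_op):
        det_op = DetOp(name="det", input_ops=[read_op])
        cls_op = ClsOp(name="cls", input_ops=[det_op])  # cls depends on det -> a simple DAG
        return cls_op

service = DemoPipelineService(name="demo")
service.prepare_pipeline_config("config.yml")
service.run_service()
```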
......
## Paddle Serving Quick Start Examples
([English](./QuickStart.md)|简体中文)
([English](./Quick_Start_EN.md)|简体中文)
This quick start example is mainly for users who already have a model to deploy, and we also provide a model that can be used for deployment. If you want to know how to complete the whole process from offline training to online service, please refer to the AIStudio tutorial above.
......
## Paddle Serving Quick Start Examples
(English|[简体中文](./QuickStart_CN.md))
(English|[简体中文](./Quick_Start_CN.md))
This quick start example is mainly for users who already have a model to deploy, and we also provide a model that can be used for deployment. If you want to know how to complete the process from offline training to online service, please refer to the AIStudio tutorial above.
......
# How to run PaddleServing in Docker
(简体中文|[English](RUN_IN_DOCKER.md))
(简体中文|[English](Run_In_Docker_EN.md))
One of the biggest benefits of Docker is portability: it can be deployed on multiple operating systems and mainstream cloud computing platforms. Using the Paddle Serving Docker image, you can deploy on Linux, Mac and Windows.
......@@ -14,7 +14,7 @@ Docker(GPU版本需要在GPU机器上安装nvidia-docker)
### Get the image
Refer to [this document](DOCKER_IMAGES_CN.md) to get the image:
Refer to [this document](Docker_Images_CN.md) to get the image:
Take the CPU build image as an example
......@@ -59,9 +59,9 @@ docker exec -it test bash
### Install PaddleServing
Follow the instructions on the homepage to download the pip package of the corresponding version. [Latest package collection](LATEST_PACKAGES.md)
Follow the instructions on the homepage to download the pip package of the corresponding version. [Latest package collection](Latest_Packages_CN.md)
## Notes
- Runtime images cannot be used for development or compilation. If you want to compile from source, see [How to compile PaddleServing](COMPILE.md)
- Runtime images cannot be used for development or compilation. If you want to compile from source, see [How to compile PaddleServing](Compile_CN.md)
- Because the Cuda 10 and Cuda 9 environments are limited by the GCC version, they cannot also run the CPU version of `paddle_serving_server`. If you want to use the CPU version of `paddle_serving_server` in a GPU environment, please choose the Cuda 10.1, Cuda 10.2 or Cuda 11 images.
# How to run PaddleServing in Docker
([简体中文](RUN_IN_DOCKER_CN.md)|English)
([简体中文](Run_In_Docker_CN.md)|English)
One of the biggest benefits of Docker is portability, which can be deployed on multiple operating systems and mainstream cloud computing platforms. The Paddle Serving Docker image can be deployed on Linux, Mac and Windows platforms.
......@@ -14,7 +14,7 @@ This document takes Python2 as an example to show how to run Paddle Serving in d
### Get docker image
Refer to [this document](DOCKER_IMAGES.md) for a docker image:
Refer to [this document](Docker_Images_EN.md) for a docker image:
```shell
docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
......@@ -41,7 +41,7 @@ The GPU version is basically the same as the CPU version, with only some differe
### Get docker image
Refer to [this document](DOCKER_IMAGES.md) for a docker image, the following is an example of an `cuda9.0-cudnn7` image:
Refer to [this document](Docker_Images_EN.md) for a docker image, the following is an example of an `cuda9.0-cudnn7` image:
```shell
docker pull registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
......@@ -67,9 +67,9 @@ The `-p` option is to map the `9292` port of the container to the `9292` port of
The image comes with `paddle_serving_server_gpu`, `paddle_serving_client`, and `paddle_serving_app` matching the image tag version. If users do not need to change the version, they can use these directly, which is suitable for environments without external network access.
If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version. [LATEST_PACKAGES](./LATEST_PACKAGES.md)
If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version. [LATEST_PACKAGES](./Latest_Packages_CN.md)
## Precautions
- Runtime images cannot be used for compilation. If you want to compile from source, refer to [COMPILE](COMPILE.md).
- Runtime images cannot be used for compilation. If you want to compile from source, refer to [COMPILE](Compile_EN.md).
- If you use Cuda9 and Cuda10 docker images, you cannot use `paddle_serving_server` CPU version at the same time, due to the limitation of gcc version. If you want to use both in one docker image, please choose images of Cuda10.1, Cuda10.2 and Cuda11.
# Deploying Paddle Serving on Baidu Kunlun chips
(简体中文|[English](./BAIDU_KUNLUN_XPU_SERVING.md))
## Deploying Paddle Serving on Baidu Kunlun chips
(简体中文|[English](./Run_On_XPU_EN.md))
Paddle Serving supports inference deployment on Baidu Kunlun chips. Deployment is currently supported on ARM servers with Baidu Kunlun chips (such as Phytium FT-2000+/64) or on Intel CPU servers with Baidu Kunlun chips; support for other heterogeneous hardware servers will be improved later.
# Compilation and installation
For the basic environment setup, refer to [this document](COMPILE_CN.md). The following uses a Phytium FT-2000+/64 machine as an example.
## Compilation
## Compilation and installation
For the basic environment setup, refer to [this document](Compile_CN.md). The following uses a Phytium FT-2000+/64 machine as an example.
### Compilation
* Compile the server part
```
cd Serving
......@@ -50,23 +50,23 @@ cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
make -j10
```
## Install the wheel packages
### Install the wheel packages
After the compilation steps above, whl packages are generated under the respective build directory $build_dir/python/dist; install each of them. For example, the server step produces a whl package under the server-build-arm/python/dist directory, which can be installed with ```pip install -U xxx.whl```.
# Request parameter description
## Request parameter description
To deploy an arm+xpu service and use the Paddle-Lite acceleration capability, the following parameters are required at deployment time.
| Parameter | Description | Notes |
| :------- | :-------------------------- | :--------------------------------------------------------------- |
| use_lite | Use the Paddle-Lite engine | Uses Paddle-Lite CPU inference capability |
| use_xpu | Use Baidu Kunlun for inference | This option must be used together with use_lite |
| ir_optim | Enable Paddle-Lite subgraph optimization | See [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for details |
# Deployment example
## Download the model
## Deployment example
### Download the model
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
## Start the RPC service
### Start the RPC service
There are three main startup configurations:
* Deploy with CPU+XPU, using Paddle-Lite XPU optimization and acceleration;
* Deploy with CPU only, using Paddle-Lite optimization and acceleration;
......@@ -86,7 +86,7 @@ python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --po
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
```
## Client invocation
### Client invocation
```
from paddle_serving_client import Client
import numpy as np
......@@ -98,9 +98,9 @@ data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
# Other notes
## Other notes
## Model examples and notes
### Model examples and notes
Some examples are provided below; other models can be adapted by analogy.
| Example name | Example link |
| :--------- | :---------------------------------------------------------- |
......@@ -108,6 +108,7 @@ print(fetch_map)
| resnet | [resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu) |
Note: For the list of models supported on Kunlun chips, see the [link](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html). Adaptation differs between models and some may not be supported; if you run into problems during deployment, please file a [Github issue](https://github.com/PaddlePaddle/Serving/issues) and we will follow up promptly.
## Reference materials on Kunlun chip support
### Reference materials on Kunlun chip support
* [Running PaddlePaddle on Kunlun XPU chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
* [Inference deployment on Baidu XPU with PaddleLite](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
# Paddle Serving Using Baidu Kunlun Chips
(English|[简体中文](./BAIDU_KUNLUN_XPU_SERVING_CN.md))
## Paddle Serving Using Baidu Kunlun Chips
(English|[简体中文](./Run_On_XPU_CN.md))
Paddle Serving supports deployment on Baidu Kunlun chips. Currently, deployment is supported on ARM CPU servers with Baidu Kunlun chips
(such as Phytium FT-2000+/64) or on Intel CPU servers with Baidu Kunlun chips. We will improve
the deployment capability on other heterogeneous hardware servers in the future.
# Compilation and installation
Refer to the [compile](COMPILE.md) document to set up the compilation environment. The following is based on the Phytium FT-2000+/64 platform.
## Compilation
## Compilation and installation
Refer to the [compile](Compile_EN.md) document to set up the compilation environment. The following is based on the Phytium FT-2000+/64 platform.
### Compilation
* Compile the Serving Server
```
cd Serving
......@@ -52,11 +53,11 @@ cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
make -j10
```
## Install the wheel package
### Install the wheel package
After the compilation stages above, the whl packages will be generated in ```python/dist/``` under the corresponding build directories.
For example, after the server compilation step, the whl package will be produced under the server-build-arm/python/dist directory, and you can run ```pip install -U python/dist/*.whl``` to install the package.
# Request parameters description
## Request parameters description
In order to deploy the serving
service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.
| param | param description | about |
......@@ -64,13 +65,13 @@ In order to deploy serving
| use_lite | using Paddle-Lite Engine | use the inference capability of Paddle-Lite |
| use_xpu | using Baidu Kunlun for inference | need to be used with the use_lite option |
| ir_optim | enable graph optimization | refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) |
# Deployment examples
## Download the model
## Deployment examples
### Download the model
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
## Start RPC service
### Start RPC service
There are mainly three deployment methods:
* deploy on a CPU server with Baidu XPU, using the acceleration capability of Paddle-Lite and XPU;
* deploy on a CPU server alone, using the acceleration capability of Paddle-Lite;
......@@ -90,7 +91,7 @@ Start the rpc service, deploying on cpu server.
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
```
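The command above is the CPU-only variant; for the Kunlun XPU variants, the flags from the parameter table above are presumably appended to the same command, e.g. `python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim` for CPU+XPU acceleration (flag combination inferred from the table; check your installed version's `--help` output).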
## Client invocation
### Client invocation
```
from paddle_serving_client import Client
import numpy as np
......@@ -102,8 +103,8 @@ data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
# Others
## Model example and explanation
## Others
### Model example and explanation
Some examples are provided below, and other models can be modified with reference to these examples.
| sample name | sample links |
......@@ -113,6 +114,6 @@ Some examples are provided below, and other models can be modifed with reference
Note: For the list of supported models, refer to the [doc](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html). Adaptation differs between models and some may not be supported; if you have any problem, please submit a [Github issue](https://github.com/PaddlePaddle/Serving/issues) and we will follow up promptly.
## Kunlun chip related reference materials
### Kunlun chip related reference materials
* [PaddlePaddle on Baidu Kunlun xpu chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
* [Deployment on Baidu Kunlun xpu chips using PaddleLite](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
# How to save a model for use with Paddle Serving?
(简体中文|[English](./SAVE.md))
(简体中文|[English](./Save_EN.md))
## Export from saved model files
If you have already saved the inference model with Paddle's `save_inference_model` interface, you can convert it with the built-in module `paddle_serving_client.convert` provided by Paddle Serving.
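As a rough illustration of that conversion step, the sketch below uses the programmatic counterpart of the `convert` module available in recent releases; the directory names are placeholders, and the exact keyword arguments may vary between versions.

```python
# Sketch: convert a model saved with save_inference_model into Serving's server/client format.
import paddle_serving_client.io as serving_io

feed_names, fetch_names = serving_io.inference_model_to_serving(
    dirname="./inference_model",          # directory produced by save_inference_model (placeholder)
    serving_server="serving_server",      # output dir with the server-side model and config
    serving_client="serving_client",      # output dir with serving_client_conf.prototxt
    model_filename=None,                  # set these if model/params were saved as single files
    params_filename=None)
print(feed_names, fetch_names)
```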
......
# How to save a servable model of Paddle Serving?
([简体中文](./SAVE_CN.md)|English)
([简体中文](./Save_CN.md)|English)
## Export from saved model files
......
......@@ -63,7 +63,7 @@ ee59a3dd4806 registry.baidubce.com/serving_dev/serving-runtime:cpu-py36
### Step 1: Start the Serving service
We still use the [UCI house price prediction](../python/examples/fit_a_line) service as the example; the image-building process is omitted here, see [Deploy Paddle Serving on a Kubernetes cluster](./PADDLE_SERVING_ON_KUBERNETES.md) for details
We still use the [UCI house price prediction](../python/examples/fit_a_line) service as the example; the image-building process is omitted here, see [Deploy Paddle Serving on a Kubernetes cluster](./Run_On_Kubernetes.md) for details
Here we directly execute
```
......
# Serving Configuration
(简体中文|[English](SERVING_CONFIGURE.md))
(简体中文|[English](Serving_Configure_EN.md))
## Introduction
......
# Serving Configuration
([简体中文](SERVING_CONFIGURE_CN.md)|English)
([简体中文](Serving_Configure_CN.md)|English)
## Overview
......@@ -12,7 +12,7 @@ This guide focuses on Paddle C++ Serving and Python Pipeline configuration:
## Model Configuration
The model configuration is generated when converting a Paddle Serving model and is named serving_client_conf.prototxt/serving_server_conf.prototxt. It specifies the input/output info so that users can fill in parameters easily. The model configuration file should not be modified. See the [Saving guide](SAVE.md) for model conversion. The model configuration file provided must follow `core/configure/proto/general_model_config.proto`.
The model configuration is generated when converting a Paddle Serving model and is named serving_client_conf.prototxt/serving_server_conf.prototxt. It specifies the input/output info so that users can fill in parameters easily. The model configuration file should not be modified. See the [Saving guide](Save_EN.md) for model conversion. The model configuration file provided must follow `core/configure/proto/general_model_config.proto`.
Example:
......@@ -39,7 +39,7 @@ fetch_var {
- fetch_var: model output
- name: node name
- alias_name: alias name
- is_lod_tensor: LoD tensor, refer to [Lod Introduction](LOD.md)
- is_lod_tensor: LoD tensor, refer to [Lod Introduction](LOD_EN.md)
- feed_type/fetch_type: data type
|feed_type|Type|
......
# Paddle Serving Design Document
(简体中文|[English](./DESIGN_DOC.md))
(简体中文|[English](./Serving_Design_EN.md))
## 1. Design objectives
......@@ -55,15 +55,15 @@ Paddle Serving从做顶层设计时考虑到不同团队在工业级场景中会
> Cross-platform operation
Cross-platform means not depending on the operating system or the hardware environment: an application developed on one operating system can still run on another. The design therefore has to make the development languages and components cross-platform, and also account for differences in how compilers interpret code on different systems.
Docker is an open-source application container engine that lets developers package their applications and dependencies into a portable container and publish it to any popular Linux or Windows machine. We package the Paddle Serving framework into multiple Docker images; see the image list in 《[Docker images](DOCKER_IMAGES_CN.md)》 and choose an image according to your scenario. To make Docker easier to use, we provide the help document 《[How to run PaddleServing in Docker](RUN_IN_DOCKER_CN.md)》. Currently, the Python webservice mode can be deployed and run natively on both Linux and Windows. 《[Guide to using Paddle Serving on Windows](WINDOWS_TUTORIAL_CN.md)》
Docker is an open-source application container engine that lets developers package their applications and dependencies into a portable container and publish it to any popular Linux or Windows machine. We package the Paddle Serving framework into multiple Docker images; see the image list in 《[Docker images](Docker_Images_CN.md)》 and choose an image according to your scenario. To make Docker easier to use, we provide the help document 《[How to run PaddleServing in Docker](Run_In_Docker_CN.md)》. Currently, the Python webservice mode can be deployed and run natively on both Linux and Windows. 《[Guide to using Paddle Serving on Windows](Windows_Tutorial_CN.md)》
> Support for multiple development-language SDKs
Paddle Serving provides SDKs in 4 development languages: Python, C++, Java, and Golang. The Golang SDK is under construction; interested open-source developers are welcome to submit PRs.
Paddle Serving provides SDKs in 3 development languages: Python, C++, and Java. The Golang SDK is under construction; interested open-source developers are welcome to submit PRs.
+ Python: refer to the client examples under python/examples or the web service example in section 4.2
+ C++: refer to 《[Write an inference service from scratch](CREATING.md)》
+ Java: refer to 《[Paddle Serving Client Java SDK](JAVA_SDK_CN.md)》
+ Golang: refer to 《[How to use the Go Client with Paddle Serving](deprecated/IMDB_GO_CLIENT_CN.md)》
+ C++: refer to 《[Write an inference service from scratch](C++_Serving/Creat_C++Serving_CN.md)》
+ Java: refer to 《[Paddle Serving Client Java SDK](Java_SDK_CN.md)》
> Support for multiple hardware devices
......@@ -76,7 +76,7 @@ Paddle Serving提供了4种开发语言SDK,包括Python、C++、Java、Golang
Taking the IMDB review sentiment-analysis task as an example, the full workflow of Paddle Serving from model training to deploying the inference service is shown in 9 steps: 《[AIStudio tutorial - Paddle Serving deployment framework](https://www.paddlepaddle.org.cn/tutorials/projectdetail/1555945)》
Because the feed and fetch parameter information cannot be viewed directly in the model files, it is inconvenient for users to assemble the parameters. Paddle Serving therefore developed a tool that converts a Paddle model into Serving's format and generates a prototxt file containing the feed and fetch parameter information. The snippet below is the prototxt file generated for the uci_housing example; for more conversion methods see 《[How to save a model for Paddle Serving](SAVE_CN.md)》.
Because the feed and fetch parameter information cannot be viewed directly in the model files, it is inconvenient for users to assemble the parameters. Paddle Serving therefore developed a tool that converts a Paddle model into Serving's format and generates a prototxt file containing the feed and fetch parameter information. The snippet below is the prototxt file generated for the uci_housing example; for more conversion methods see 《[How to save a model for Paddle Serving](Save_CN.md)》.
```
feed_var {
name: "x"
......@@ -124,15 +124,15 @@ C++ Serving的核心执行引擎是一个有向无环图,图中的每个节点
### 3.3 Model management and hot loading
Paddle Serving's C++ engine supports model management, including managing multiple models and multiple model versions. To keep the inference service available while models are being replaced, models need to be hot-loaded without interrupting the service. Paddle Serving supports this feature and provides a tool that monitors newly produced models and updates the local model; for a concrete example see 《[Hot loading models in Paddle Serving](HOT_LOADING_IN_SERVING_CN.md)》.
Paddle Serving's C++ engine supports model management, including managing multiple models and multiple model versions. To keep the inference service available while models are being replaced, models need to be hot-loaded without interrupting the service. Paddle Serving supports this feature and provides a tool that monitors newly produced models and updates the local model; for a concrete example see 《[Hot loading models in Paddle Serving](C++_Serving/Hot_Loading_CN.md)》.
### 3.4 Model encryption and decryption
Paddle Serving encrypts models with a symmetric encryption algorithm and decrypts them in memory while the service loads the model. This currently provides basic model security and does not guarantee absolute security; users can build on our design to achieve a higher level of security. See 《[Encrypted model inference](ENCRYPTION_CN.md)》 for documentation
Paddle Serving encrypts models with a symmetric encryption algorithm and decrypts them in memory while the service loads the model. This currently provides basic model security and does not guarantee absolute security; users can build on our design to achieve a higher level of security. See 《[Encrypted model inference](C++_Serving/Encryption_CN.md)》 for documentation
### 3.5 A/B Test
After sufficient offline evaluation of a model, an online A/B test is usually needed to decide whether to launch the service at scale. The figure below shows the basic structure of an A/B test with Paddle Serving: once the client is configured accordingly, traffic is automatically distributed to different servers, completing the A/B test. For a concrete example see 《[How to do an A/B test with Paddle Serving](ABTEST_IN_PADDLE_SERVING_CN.md)》.
After sufficient offline evaluation of a model, an online A/B test is usually needed to decide whether to launch the service at scale. The figure below shows the basic structure of an A/B test with Paddle Serving: once the client is configured accordingly, traffic is automatically distributed to different servers, completing the A/B test. For a concrete example see 《[How to do an A/B test with Paddle Serving](C++_Serving/ABTest_CN.md)》.
<p align="center">
<br>
......@@ -193,7 +193,7 @@ Pipeline Serving的网络框架采用gRPC和gPRC gateway。gRPC service接收RPC
</center>
### 5.2 Core design and use cases
The core of Pipeline Serving is a graph execution engine whose basic processing units are OPs and Channels; combining them produces a directed acyclic graph. For design and usage documentation see 《[Pipeline Serving design and implementation](PIPELINE_SERVING_CN.md)》
The core of Pipeline Serving is a graph execution engine whose basic processing units are OPs and Channels; combining them produces a directed acyclic graph. For design and usage documentation see 《[Pipeline Serving design and implementation](Python_Pipeline/Pipeline_Design_CN.md)》
<center>
<img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
</center>
......@@ -201,11 +201,8 @@ Pipeline Serving核心设计是图执行引擎,基本处理单元是OP和Chann
## 6. Future plans
### 6.1 Automatic cloud deployment
To make it easier for users to deploy Paddle inference models online, upcoming versions of Paddle Serving will provide task-orchestration tools for the Kubernetes ecosystem.
### 6.2 Vector retrieval and tree-structured retrieval
### 6.1 Vector retrieval and tree-structured retrieval
In recall systems for recommendation and advertising scenarios, fast vector-based retrieval or fast tree-structured retrieval is usually required; Paddle Serving will integrate or extend retrieval engines in this area.
### 6.3 Service monitoring
### 6.2 Service monitoring
Integrate Prometheus monitoring, an open-source combination of monitoring, alerting and a time-series database that is well suited to monitoring k8s and Docker.
# Paddle Serving Design Doc
([简体中文](./DESIGN_DOC_CN.md)|English)
([简体中文](./Serving_Design_CN.md)|English)
## 1. Design Objectives
......@@ -53,16 +53,16 @@ Paddle Serving takes into account a series of issues such as different operating
Cross-platform is not dependent on the operating system, nor on the hardware environment. Applications developed under one operating system can still run under another operating system. Therefore, the design should consider not only the development language and the cross-platform components, but also the interpretation differences of the compilers on different systems.
Docker is an open source application container engine that allows developers to package their applications and dependencies into a portable container, and then publish it to any popular Linux machine or Windows machine. We have packaged a variety of Docker images for the Paddle Serving framework. Refer to the image list 《[Docker Images](DOCKER_IMAGES.md)》 and select an image according to your usage scenario. We provide Docker usage documentation 《[How to run PaddleServing in Docker](RUN_IN_DOCKER.md)》. Currently, the Python webservice mode can be deployed and run natively on both Linux and Windows. 《[Paddle Serving for Windows Users](WINDOWS_TUTORIAL.md)》
Docker is an open source application container engine that allows developers to package their applications and dependencies into a portable container, and then publish it to any popular Linux machine or Windows machine. We have packaged a variety of Docker images for the Paddle Serving framework. Refer to the image list 《[Docker Images](Docker_Images_EN.md)》 and select an image according to your usage scenario. We provide Docker usage documentation 《[How to run PaddleServing in Docker](Run_In_Docker_EN.md)》. Currently, the Python webservice mode can be deployed and run natively on both Linux and Windows. 《[Paddle Serving for Windows Users](Windows_Tutorial_EN.md)》
> Support multiple development-language client SDKs
Paddle Serving provides client SDKs in 4 development languages: Python, C++, Java, and Golang. The Golang SDK is under construction; we hope interested open-source developers can help by submitting PRs.
Paddle Serving provides client SDKs in 3 development languages: Python, C++, and Java; we hope interested open-source developers can help by submitting PRs.
+ Python: refer to the client examples under python/examples or the web service example in section 4.2.
+ C++: refer to 《[Write an inference service from scratch](CREATING.md)》
+ Java: refer to 《[Paddle Serving Client Java SDK](JAVA_SDK.md)》
+ Golang: refer to 《[How to use Go Client of Paddle Serving](deprecated/IMDB_GO_CLIENT.md)》
+ C++: refer to 《[Write an inference service from scratch](C++_Serving/Creat_C++Serving_CN.md)》
+ Java: refer to 《[Paddle Serving Client Java SDK](Java_SDK_EN.md)》
> Support multiple hardware devices
......@@ -72,7 +72,7 @@ The inference framework of the well-known deep learning platform only supports C
Models trained on other deep learning platforms can be converted through the 《[PaddlePaddle/X2Paddle tool](https://github.com/PaddlePaddle/X2Paddle)》. We have converted multiple mainstream CV models to Paddle models; TensorFlow, Caffe, ONNX and PyTorch model conversion has been tested. 《[AIStudio tutorial - Paddle Serving deployment framework](https://www.paddlepaddle.org.cn/tutorials/projectdetail/1555945)》
Because it is impossible to directly view the feed and fetch parameter information in the model file, it is not convenient for users to assemble the parameters. Therefore, Paddle Serving developed a tool to convert the Paddle model into Serving format and generate a prototxt file containing the feed and fetch parameter information. The following snippet is the generated prototxt file of the uci_housing example. For more conversion methods, refer to the document 《[How to save a servable model of Paddle Serving?](SAVE.md)》.
Because it is impossible to directly view the feed and fetch parameter information in the model file, it is not convenient for users to assemble the parameters. Therefore, Paddle Serving developed a tool to convert the Paddle model into Serving format and generate a prototxt file containing the feed and fetch parameter information. The following snippet is the generated prototxt file of the uci_housing example. For more conversion methods, refer to the document 《[How to save a servable model of Paddle Serving?](Save_EN.md)》.
```
feed_var {
name: "x"
......@@ -121,14 +121,14 @@ The core execution engine of Paddle Serving is a Directed acyclic graph(DAG). In
<p>
### 3.3 Model Management and Hot Reloading
C++ Serving supports model management functions, including management of multiple models and multiple model versions. In order to ensure the availability of services, the model needs to be hot-loaded without service interruption. Paddle Serving supports this feature and provides a tool that monitors newly produced models and updates the local model. Please refer to [Hot loading in Paddle Serving](HOT_LOADING_IN_SERVING.md) for specific examples.
C++ Serving supports model management functions, including management of multiple models and multiple model versions. In order to ensure the availability of services, the model needs to be hot-loaded without service interruption. Paddle Serving supports this feature and provides a tool that monitors newly produced models and updates the local model. Please refer to [Hot loading in Paddle Serving](C++_Serving/Hot_Loading_EN.md) for specific examples.
### 3.4 MODEL ENCRYPTION INFERENCE
Paddle Serving uses a symmetric encryption algorithm to encrypt the model and decrypts it in memory while the service loads the model. This currently provides basic model security and does not guarantee absolute model security; users can improve on our design to achieve a higher level of security. Documentation reference 《[MODEL ENCRYPTION INFERENCE](ENCRYPTION.md)》
Paddle Serving uses a symmetric encryption algorithm to encrypt the model and decrypts it in memory while the service loads the model. This currently provides basic model security and does not guarantee absolute model security; users can improve on our design to achieve a higher level of security. Documentation reference 《[MODEL ENCRYPTION INFERENCE](C++_Serving/Encryption_EN.md)》
### 3.5 A/B Test
After sufficient offline evaluation of the model, online A/B test is usually needed to decide whether to enable the service on a large scale. The following figure shows the basic structure of A/B test with Paddle Serving. After the client is configured with the corresponding configuration, the traffic will be automatically distributed to different servers to achieve A/B test. Please refer to [ABTEST in Paddle Serving](ABTEST_IN_PADDLE_SERVING.md) for specific examples.
After sufficient offline evaluation of the model, online A/B test is usually needed to decide whether to enable the service on a large scale. The following figure shows the basic structure of A/B test with Paddle Serving. After the client is configured with the corresponding configuration, the traffic will be automatically distributed to different servers to achieve A/B test. Please refer to [ABTEST in Paddle Serving](C++_Serving/ABTEST_EN.md) for specific examples.
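In code form (the figure below shows the same structure), a minimal client-side sketch of such an A/B configuration looks roughly like this; the variant tags, addresses and 50/50 weights are placeholders, and it assumes the `add_variant` helper used in the A/B test example referenced above.

```python
# Sketch: splitting client traffic between two server groups for an A/B test.
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client_conf.prototxt")
# Each variant gets a tag, a list of server endpoints, and a traffic weight.
client.add_variant("model_A", ["127.0.0.1:9393"], 50)
client.add_variant("model_B", ["127.0.0.1:9494"], 50)
client.connect()

# Responses can carry the serving variant tag, so hits per variant can be counted client-side,
# e.g. fetch_map = client.predict(feed={...}, fetch=[...], need_variant_tag=True)
```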
<p align="center">
<br>
......@@ -193,7 +193,7 @@ The network framework of Pipeline Serving uses gRPC and gPRC gateway. The gRPC s
### 5.2 Core Design And Use Cases
The core design of Pipeline Serving is a graph execution engine, and the basic processing units are OPs and Channels. A set of directed acyclic graphs can be realized through combination. For design and usage documentation, see 《[Pipeline Serving](PIPELINE_SERVING.md)》
The core design of Pipeline Serving is a graph execution engine, and the basic processing units are OPs and Channels. A set of directed acyclic graphs can be realized through combination. For design and usage documentation, see 《[Pipeline Serving](Python_Pipeline/Pipeline_Design_EN.md)》
<center>
<img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
......
## Paddle Serving uses TensorRT
(English|[简体中文](./TENSOR_RT_CN.md))
### Background
Deploying models trained on mainstream frameworks through Nvidia's TensorRT tool can greatly increase inference speed, often making it at least twice as fast as the original framework, while also using less device memory. Mastering how to deploy deep learning models with TensorRT is therefore very useful for anyone who needs to deploy models. Paddle Serving provides comprehensive TensorRT ecosystem support.
### Environment
The Cuda 10.1, Cuda 10.2 and Cuda 11 versions of Serving support TensorRT.
#### Install Paddle
In [Development using the Docker environment](./RUN_IN_DOCKER.md) and the [Docker image list](./DOCKER_IMAGES.md), we provide development images with TensorRT. After starting from the image, you need to install the Paddle whl package that supports TensorRT; refer to the documentation on the home page
```
# GPU Cuda10.2 environment please execute
pip install paddlepaddle-gpu==2.0.0
```
**Note:** If your Cuda version is not 10.2, please do not execute the above command directly; instead refer to the [Paddle official documentation - multi-version whl package list
](https://www.paddlepaddle.org.cn/documentation/docs/en/install/Tables_en.html#multi-version-whl-package-list-release)
and select the URL of the matching GPU environment to install. For example, Python 2.7 users on Cuda 10.1 should select the url corresponding to `cp27-cp27mu` and
`cuda10.1-cudnn7.6-trt6.0.1.5`, copy it and execute
```
pip install https://paddle-wheel.bj.bcebos.com/with-trt/2.0.0-gpu-cuda10.1-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post101-cp27-cp27mu-linux_x86_64.whl
```
Since the default `paddlepaddle-gpu==2.0.0` is built for Cuda 10.2 without TensorRT, if you need to use TensorRT with `paddlepaddle-gpu`, you need to find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above and download the build for your Python version.
#### Install Paddle Serving
```
# Cuda10.2
pip install paddle-serving-server-gpu==${VERSION}.post102
# Cuda 10.1
pip install paddle-serving-server-gpu==${VERSION}.post101
# Cuda 11
pip install paddle-serving-server-gpu==${VERSION}.post11
```
### Use TensorRT
#### RPC mode
In [Serving model example](../python/examples), we have given models that can be accelerated using TensorRT, such as [Faster_RCNN model](../python/examples/detection/faster_rcnn_r50_fpn_1x_coco) under detection
We only need to run
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
tar xf faster_rcnn_r50_fpn_1x_coco.tar
python -m paddle_serving_server.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```
and the TensorRT version of the faster_rcnn model server is started.
#### Local Predictor mode
In [local_predictor](../python/paddle_serving_app/local_predict.py#L52), users can explicitly specify `use_trt=True` and pass it to `load_model_config`.
Everything else is the same as the usual Local Predictor usage; you only need to pay attention to the model's compatibility with TensorRT.
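A rough sketch of this usage is given below; it follows the `local_predict.py` file referenced above, but the exact parameter names (for example `gpu_id`) may vary between releases, so treat them as assumptions.

```python
# Sketch: enabling TensorRT in Local Predictor mode.
import numpy as np
from paddle_serving_app.local_predict import LocalPredictor

predictor = LocalPredictor()
# use_trt=True is the switch described above; use_gpu/gpu_id are assumed companion options.
predictor.load_model_config("serving_server", use_gpu=True, gpu_id=0, use_trt=True)

# Feed/fetch names must match the converted model; "image" and "score" are placeholders.
fetch_map = predictor.predict(feed={"image": np.zeros((1, 3, 224, 224), dtype="float32")},
                              fetch=["score"])
print(fetch_map)
```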
#### Pipeline Mode
In [Pipeline mode](./PIPELINE_SERVING.md), our [imagenet example](../python/examples/pipeline/imagenet/config.yml#L23) shows how to set up TensorRT.
## Paddle Serving uses TensorRT
([English](./TENSOR_RT.md)|简体中文)
### Background
Deploying models trained on mainstream frameworks through Nvidia's TensorRT tool can greatly increase inference speed, often making it at least twice as fast as the original framework, while also using less device memory. Mastering how to deploy deep learning models with TensorRT is therefore very useful for anyone who needs to deploy models. Paddle Serving provides comprehensive TensorRT ecosystem support.
### Environment
The Cuda 10.1, Cuda 10.2 and Cuda 11 versions of Serving support TensorRT.
#### Install Paddle
In [Development using the Docker environment](./RUN_IN_DOCKER_CN.md) and the [Docker image list](./DOCKER_IMAGES_CN.md), we provide development images with TensorRT. After starting from the image, you need to install the Paddle whl package that supports TensorRT; refer to the documentation on the home page
```
# For a GPU Cuda 10.2 environment, execute
pip install paddlepaddle-gpu==2.0.0
```
**Note:** If your Cuda version is not 10.2, please do not execute the above command directly; instead refer to the [Paddle official documentation - multi-version whl package list
](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release)
and select the URL of the matching GPU environment to install. For example, Python 2.7 users on Cuda 10.1 should select the url in the table corresponding to `cp27-cp27mu` and
`cuda10.1-cudnn7.6-trt6.0.1.5`, copy it and execute
```
pip install https://paddle-wheel.bj.bcebos.com/with-trt/2.0.0-gpu-cuda10.1-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post101-cp27-cp27mu-linux_x86_64.whl
```
Because the default `paddlepaddle-gpu==2.0.0` targets Cuda 10.2 and is not built with TensorRT, if you need to use TensorRT with `paddlepaddle-gpu` you have to find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above and download the build for your Python version.
#### Install Paddle Serving
```
# Cuda10.2
pip install paddle-serving-server-gpu==${VERSION}.post102
# Cuda 10.1
pip install paddle-serving-server-gpu==${VERSION}.post101
# Cuda 11
pip install paddle-serving-server-gpu==${VERSION}.post11
```
### Use TensorRT
#### RPC mode
In the [Serving model examples](../python/examples), we provide models that can be accelerated with TensorRT, such as the [Faster_RCNN model](../python/examples/detection/faster_rcnn_r50_fpn_1x_coco) under detection
We only need to run
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
tar xf faster_rcnn_r50_fpn_1x_coco.tar
python -m paddle_serving_server.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```
and the TensorRT version of the faster_rcnn model server is started.
#### Local Predictor mode
In [local_predictor](../python/paddle_serving_app/local_predict.py#L52), users can explicitly specify `use_trt=True` and pass it to `load_model_config`.
Everything else is the same as the usual Local Predictor usage; you only need to pay attention to the model's compatibility with TensorRT.
#### Pipeline mode
In [Pipeline mode](./PIPELINE_SERVING_CN.md), our [imagenet example](../python/examples/pipeline/imagenet/config.yml#L23) shows how to set up TensorRT.
## Guide to using Paddle Serving on Windows
([English](./WINDOWS_TUTORIAL.md)|简体中文)
([English](./Windows_Tutorial_EN.md)|简体中文)
### Overview
......@@ -97,7 +97,7 @@ r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
Users only need to follow the instructions above and implement the relevant content in the corresponding functions. For more information, see [How to develop a new Web Service?](./NEW_WEB_SERVICE_CN.md)
Users only need to follow the instructions above and implement the relevant content in the corresponding functions. For more information, see [How to develop a new Web Service?](./C++_Serving/Http_Service_CN.md)
After development, execute
......
## Paddle Serving for Windows Users
(English|[简体中文](./WINDOWS_TUTORIAL_CN.md))
(English|[简体中文](./Windows_Tutorial_CN.md))
### Summary
......@@ -97,7 +97,7 @@ r = requests.post(url=url, headers=headers, data=json.dumps(data))
print(r.json())
```
The user only needs to follow the above instructions and implement the relevant content in the corresponding functions. For more information, please refer to [How to develop a new Web Service?](./NEW_WEB_SERVICE.md)
The user only needs to follow the above instructions and implement the relevant content in the corresponding functions. For more information, please refer to [How to develop a new Web Service?](./C++_Serving/Http_Service_EN.md)
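For orientation, the skeleton below sketches what such a Web Service subclass typically looks like; the class name, model path and hook signatures are illustrative and may need adjusting to the installed paddle_serving_server version.

```python
# Sketch of a custom Web Service: fill in preprocess/postprocess for your own model.
from paddle_serving_server.web_service import WebService

class UciService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        # reshape / normalize the incoming JSON payload here
        return feed, fetch

    def postprocess(self, feed=[], fetch=[], fetch_map=None):
        # convert raw tensors into a JSON-serializable response here
        return fetch_map

uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")   # path to the converted model (placeholder)
uci_service.prepare_server(workdir="workdir", port=9292)
uci_service.run_rpc_service()
uci_service.run_web_service()
```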
After development, execute
......