## Paddle Serving Using Baidu Kunlun Chips

(English|[简体中文](./Run_On_XPU_CN.md))

Paddle Serving supports deployment on servers equipped with Baidu Kunlun chips. Currently, it supports deployment on ARM CPU servers with Baidu Kunlun chips (such as Phytium FT-2000+/64) and on Intel CPU servers with Baidu Kunlun chips. We will improve the deployment capability on various heterogeneous hardware servers in the future.

## Install Docker images

We recommend using Docker to deploy the service. In the XPU environment, you can refer to the [Docker image document](Docker_Images_EN.md) to install the XPU image and then complete tasks such as compilation, installation, and deployment.

## Compilation and installation

Refer to the [compile](./Compile_EN.md) document to set up the compilation environment. The following is based on the Phytium FT-2000+/64 platform.

### Compilation

* Compile the Serving Server
```
cd Serving
mkdir -p server-build-arm && cd server-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DSERVER=ON ..
make -j10
```
You can run `make install` to produce the target in the `./output` directory. Add `-DCMAKE_INSTALL_PREFIX=./output` to the CMake command shown above to specify the output path. On an Intel CPU platform with AVX2 support, please specify `-DWITH_MKL=ON`.

* Compile the Serving Client
```
mkdir -p client-build-arm && cd client-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DCLIENT=ON ..
make -j10
```

* Compile the App
```
cd Serving
mkdir -p app-build-arm && cd app-build-arm

cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
    -DPYTHON_EXECUTABLE=/usr/bin/python \
    -DWITH_PYTHON=ON \
    -DWITH_LITE=ON \
    -DWITH_XPU=ON \
    -DAPP=ON ..
make -j10
```

### Install the wheel package

After the compilation steps above, the whl packages are generated in `python/dist/` under the corresponding build directories. For example, after the Server compilation step, the whl package is produced under the `server-build-arm/python/dist` directory, and you can run `pip install -U python/dist/*.whl` to install it.
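As a quick sanity check after installation, you can confirm that the packages import cleanly. This is a minimal sketch: the `paddle_serving_server` and `paddle_serving_client` module names come from the examples below, while `paddle_serving_app` is assumed to be the module installed by the App wheel.

```
python3 -c "import paddle_serving_server; print(paddle_serving_server.__file__)"
python3 -c "import paddle_serving_client; print(paddle_serving_client.__file__)"
python3 -c "import paddle_serving_app; print(paddle_serving_app.__file__)"
```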
## Request parameters description

In order to deploy the serving service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.

| param    | param description                | about                                                               |
| :------- | :------------------------------- | :------------------------------------------------------------------ |
| use_lite | use the Paddle-Lite engine       | use the inference capability of Paddle-Lite                          |
| use_xpu  | use Baidu Kunlun for inference   | must be used together with the use_lite option                       |
| ir_optim | enable graph optimization        | refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)  |

## Deployment examples

### Download the model
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```

### Start RPC service

There are mainly three deployment methods:

* deploy on a CPU server with Baidu Kunlun XPU, using the acceleration capability of Paddle-Lite and XPU;
* deploy on a CPU server standalone with Paddle-Lite;
* deploy on a CPU server standalone without Paddle-Lite.

The first two deployment methods are recommended.

Start the RPC service, deploying on a CPU server with Baidu Kunlun chips and accelerating with Paddle-Lite and Baidu Kunlun XPU:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deploying on a CPU server and accelerating with Paddle-Lite:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deploying on a CPU server:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
```

### Client prediction
```
from paddle_serving_client import Client
import numpy as np

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1, 13, 1)}, fetch=["price"])
print(fetch_map)
```

## Others

### Model examples and explanation

Some examples are provided below; other models can be adapted with reference to these examples.

| sample name | sample links                                             |
| :---------- | :------------------------------------------------------- |
| fit_a_line  | [fit_a_line_xpu](../examples/C++/xpu/fit_a_line_xpu)     |
| resnet      | [resnet_v2_50_xpu](../examples/C++/xpu/resnet_v2_50_xpu) |

Note: for the list of supported models, refer to the [doc](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html). Adaptation differs from model to model, and there may be some unsupported cases. If you run into any problem, please submit a [Github issue](https://github.com/PaddlePaddle/Serving/issues) and we will follow up promptly.

### Kunlun chip related reference materials

* [PaddlePaddle on Baidu Kunlun XPU chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
* [Deployment on Baidu Kunlun XPU chips using PaddleLite](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
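### HTTP service example

For completeness, the same model can also be exposed over HTTP instead of RPC. The sketch below follows the general Paddle Serving web-service pattern from the project's quickstart; the `--name` flag, the `/uci/prediction` URL layout, and the compatibility of the XPU flags with the web service are assumptions to verify against your installed version.

```
# Start a web service for the same model; the --name value determines the URL path
# (assumed quickstart pattern; XPU flag compatibility with --name is unverified).
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --name uci --use_lite --use_xpu --ir_optim

# Send one sample as JSON; "x" matches the feed name used by the RPC client above.
curl -H "Content-Type:application/json" -X POST \
    -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' \
    http://127.0.0.1:9292/uci/prediction
```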