Unverified · Commit 15558f7d · authored by Houjiang Chen · committed by GitHub

Update xrt readme (#2611)

Parent 3fdb1229
@@ -6,13 +6,13 @@
Building OneFlow from source requires a `BLAS library` installed. On CentOS, if you have `Intel MKL` installed, please update the environment variable:
```shell
export LD_LIBRARY_PATH=/opt/intel/lib/intel64_lin:/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
```
Or you can install OpenBLAS and other tools through:
```shell
sudo yum -y install epel-release && sudo yum -y install git gcc-c++ cmake3 openblas-devel kernel-devel-$(uname -r) nasm
```
@@ -20,26 +20,26 @@ Or you can install OpenBLAS and other tools through:
> note: use the `--recursive` flag to clone the third_party submodules
```shell
git clone https://github.com/Oneflow-Inc/oneflow --recursive
```
Or you can clone the source code and the submodules step by step:
```shell
git clone https://github.com/Oneflow-Inc/oneflow
git submodule update --init --recursive
```
#### build third party from source
```shell
cmake -DTHIRD_PARTY=ON .. && make -j
```
#### build oneflow
```shell
cmake -DTHIRD_PARTY=OFF .. && make -j
```
@@ -55,7 +55,7 @@ or you can just clone source code and submodules step by step
- Update cmake

It is needed only if the installed CMake does not support downloading .tgz files from a URL over the https protocol. You can skip this step and come back here to reinstall CMake if you encounter a download error while building the third-parties.

Download CMake (>= 3.7) from [here](https://cmake.org/download/), then configure and install it with the following command:
@@ -90,18 +90,14 @@ or you can just clone source code and submodules step by step
make -j$(nproc)
```
### Build with TensorRT

- Build third-parties
Download the TensorRT (>= 6.0) .tgz package and extract it, then run the following commands to build the third-parties.
```shell
cd build && cmake -DWITH_TENSORRT=ON -DTENSORRT_ROOT=your_tensorrt_path -DTHIRD_PARTY=ON ..
make -j$(nproc)
```
- Build OneFlow
@@ -109,9 +105,17 @@ or you can just clone source code and submodules step by step
```shell
cmake .. \
  -DWITH_TENSORRT=ON \
  -DTENSORRT_ROOT=your_tensorrt_path \
  -DPYTHON_LIBRARY=your_python_lib_path \
  -DPYTHON_INCLUDE_DIR=your_python_include_dir \
  -DPython_NumPy_INCLUDE_DIRS=your_numpy_include_dir
make -j$(nproc)
```
### Documents
- XRT documents
You can check this [doc](./oneflow/xrt/README.md) to obtain more details about how to use XLA and TensorRT with OneFlow.
@@ -2,16 +2,21 @@
XRT is a runtime acceleration library that supports multiple computing engines at once; it currently integrates two backend engines, TensorFlow XLA and NVIDIA TensorRT. XLA fully supports both training and inference, while TensorRT supports inference, with training supported for a subset of operators. For a given computation graph, XRT allows multiple engines to be used jointly to achieve better speedups.
Different backend engines support different hardware: for example, XLA supports CPU and NVIDIA GPU, while TensorRT supports NVIDIA GPU only.
For any backend engine, XRT's execution is divided into the following five steps:

1. Computation graph conversion
2. Computation subgraph partitioning
3. Engine-independent optimization
4. Generation of engine-specific executables
5. Execution of the executables
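The five steps above can be sketched in a few lines of plain Python. This is a toy illustration only: the function names, the op representation, and the backend tags are assumptions made for the sketch, not OneFlow's actual XRT classes.

```python
# Toy sketch of XRT's five-step pipeline; all names are illustrative.

def convert_to_xrt_graph(job):
    # 1. Convert the job into a flat op list (stand-in for XrtGraph).
    return list(job)

def partition_subgraphs(graph):
    # 2. Cluster consecutive ops that share a backend into subgraphs.
    subgraphs, current = [], []
    for op in graph:
        if current and current[-1]["backend"] != op["backend"]:
            subgraphs.append(current)
            current = []
        current.append(op)
    if current:
        subgraphs.append(current)
    return subgraphs

def compile_subgraph(subgraph):
    # 3 + 4. Engine-independent passes would run here, after which the
    # subgraph is "compiled" into an executable (here: a simple closure).
    def executable(x):
        for op in subgraph:
            x = op["fn"](x)
        return x
    return executable

def run(job, x):
    # 5. Execute the compiled executables in graph order.
    for sg in partition_subgraphs(convert_to_xrt_graph(job)):
        x = compile_subgraph(sg)(x)
    return x

# Example: two XLA ops followed by one TensorRT op -> two subgraphs.
job = [
    {"backend": "XLA", "fn": lambda x: x + 1},
    {"backend": "XLA", "fn": lambda x: x * 2},
    {"backend": "TensorRT", "fn": lambda x: x - 3},
]
```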
### Computation graph conversion

The OneFlow Job is first converted into XRT's dataflow graph (XrtGraph). After a series of transformations, this graph is finally compiled into engine-specific executables.
### Computation subgraph partitioning

Nodes in the computation graph are clustered according to a set of attributes, such as whether each node is compilable, its device, and its SBP policy. The clustered nodes are folded into a new node (a Launch node), the subgraph is rebuilt inside that node, and the backend engine that will execute the subgraph is determined at the same time.
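A minimal sketch of this attribute-based clustering, assuming a toy node representation and a hypothetical `Launch` wrapper (neither is OneFlow's real data structure):

```python
from itertools import groupby

# Toy node: (name, compilable, device, sbp). Real XRT considers more
# attributes than these; they are illustrative only.
nodes = [
    ("matmul", True,  "gpu", "S0"),
    ("relu",   True,  "gpu", "S0"),
    ("print",  False, "cpu", "B"),
    ("add",    True,  "gpu", "S0"),
]

def cluster_key(node):
    _, compilable, device, sbp = node
    return (compilable, device, sbp)

# Fold consecutive nodes with identical attributes into Launch nodes;
# non-compilable nodes are kept as-is.
launched = []
for key, group in groupby(nodes, key=cluster_key):
    group = list(group)
    if key[0]:  # compilable cluster -> fold into a Launch node
        launched.append(("Launch", [n[0] for n in group]))
    else:
        launched.extend(group)
```

Note that the non-compilable `print` node splits the graph, so `matmul`/`relu` and `add` end up in two separate Launch nodes even though their attributes match.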
@@ -43,11 +48,13 @@
Note that FLAGS_strict_clustering=true leads to smaller merged subgraphs, which may cost the backend engine some optimization opportunities. FLAGS_strict_clustering defaults to true.
### Engine-independent optimization

Not provided yet; graph-optimization passes may be added here later.
### Executable generation

At runtime, each computation subgraph can be compiled into an engine-specific executable.

For subgraphs with static shapes, the caching mechanism ensures each subgraph is compiled only once at runtime. A subgraph containing dynamic shapes, however, may need to be recompiled on every run, so XRT is currently not recommended if the computation graph contains dynamic-shape nodes.
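The caching behavior described above can be modeled as follows. This is a toy sketch assuming the cache key is the subgraph plus its concrete input shapes; the real key and the engine compiler are OneFlow internals not shown here.

```python
# Toy model of executable caching: compile once per (subgraph, shapes) key.
compile_count = 0
cache = {}

def compile_executable(subgraph, input_shapes):
    global compile_count
    compile_count += 1           # stands in for an expensive engine compile
    return f"executable<{subgraph}{tuple(input_shapes)}>"

def get_executable(subgraph, input_shapes):
    key = (subgraph, tuple(input_shapes))
    if key not in cache:         # static shapes -> cache hit after first call
        cache[key] = compile_executable(subgraph, input_shapes)
    return cache[key]

# Static shapes: compiled once, then served from the cache.
get_executable("sg0", [(8, 224, 224, 3)])
get_executable("sg0", [(8, 224, 224, 3)])
# Dynamic shapes: every new shape forces a fresh compilation.
get_executable("sg0", [(4, 224, 224, 3)])
```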
@@ -86,13 +93,13 @@
```python
import oneflow as flow

config = flow.function_config()

# Enable XLA
config.use_xla_jit()

# Enable TensorRT
config.use_tensorrt()
```
- Configuration via environment variables
@@ -103,6 +110,19 @@
export FLAGS_use_tensorrt=true  # true to enable, false to disable
```
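Boolean flags like the ones above can be read from the environment with a few lines of plain Python. The parsing logic below is an illustrative assumption, not OneFlow's actual code; only the variable names FLAGS_use_xla_jit and FLAGS_use_tensorrt come from the text above.

```python
import os

def read_bool_flag(name, default=None):
    # Unset flags fall back to the given default; parsing is illustrative.
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "on", "yes")

os.environ["FLAGS_use_tensorrt"] = "true"
os.environ.pop("FLAGS_use_xla_jit", None)   # ensure the flag is unset

use_tensorrt = read_bool_flag("FLAGS_use_tensorrt")
use_xla_jit = read_bool_flag("FLAGS_use_xla_jit")  # unset -> None
```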
- Low-precision configuration

```python
# XLA automatic mixed precision (float16)
config.enable_auto_mixed_precision()

# TensorRT float16
config.tensorrt.use_fp16()

# TensorRT int8 (not supported yet)
config.tensorrt.use_int8()
```
### Benchmark

- Bert base (batch size = 60)
......