Unverified · Commit 15558f7d · authored by Houjiang Chen · committed by GitHub

Update xrt readme (#2611)

Parent 3fdb1229
@@ -6,13 +6,13 @@
Building OneFlow from source requires a `BLAS library` installed. On CentOS, if you have `Intel MKL` installed, please update the environment variable:
```shell
export LD_LIBRARY_PATH=/opt/intel/lib/intel64_lin:/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
```
Or you can install OpenBLAS and other tools through:
```shell
sudo yum -y install epel-release && sudo yum -y install git gcc-c++ cmake3 openblas-devel kernel-devel-$(uname -r) nasm
```
@@ -20,26 +20,26 @@ Or you can install OpenBLAS and other tools through:
> note: use the `--recursive` flag to clone the third_party submodules
```shell
git clone https://github.com/Oneflow-Inc/oneflow --recursive
```
Or you can clone the source code and the submodules step by step:
```shell
git clone https://github.com/Oneflow-Inc/oneflow
git submodule update --init --recursive
```
#### build third party from source
```shell
cmake -DTHIRD_PARTY=ON .. && make -j
```
#### build oneflow
```shell
cmake -DTHIRD_PARTY=OFF .. && make -j
```
@@ -55,7 +55,7 @@ or you can just clone source code and submodules step by step
- Update cmake

It is needed only if the installed CMake does not support downloading .tgz files from a URL over the https protocol. You can skip this step and come back here to reinstall CMake if you encounter a download error while building the third-parties.

Download CMake (>= 3.7) from [here](https://cmake.org/download/), then configure and install it with the following command:
@@ -90,18 +90,14 @@ or you can just clone source code and submodules step by step
make -j$(nproc)
```
### Build with TensorRT

- Build third-parties
Download the TensorRT (>= 6.0) .tgz package and extract it, then run the following commands to build the third-parties.
```shell
cd build && cmake -DWITH_TENSORRT=ON -DTENSORRT_ROOT=your_tensorrt_path -DTHIRD_PARTY=ON ..
make -j$(nproc)
```
- Build OneFlow
@@ -109,9 +105,17 @@ or you can just clone source code and submodules step by step
```shell
cmake .. \
  -DWITH_TENSORRT=ON \
  -DTENSORRT_ROOT=your_tensorrt_path \
  -DPYTHON_LIBRARY=your_python_lib_path \
  -DPYTHON_INCLUDE_DIR=your_python_include_dir \
  -DPython_NumPy_INCLUDE_DIRS=your_numpy_include_dir
make -j$(nproc)
```
### Documents
- XRT documents
You can check this [doc](./oneflow/xrt/README.md) to obtain more details about how to use XLA and TensorRT with OneFlow.
@@ -2,16 +2,21 @@
XRT is a runtime acceleration library that supports multiple computing engines at once; it currently integrates two backend engines, TensorFlow XLA and NVIDIA TensorRT. XLA fully supports both training and inference, while TensorRT supports inference, with training supported for a subset of operators. For a given computation graph, XRT allows multiple engines to be used jointly to achieve better speedups.
Different backend engines support different hardware: for example, XLA supports CPU and NVIDIA GPU, while TensorRT supports NVIDIA GPU only.
For any backend engine, XRT's execution is divided into the following five steps:

1. Computation graph conversion
2. Computation subgraph partitioning
3. Engine-independent optimization
4. Generation of engine-specific executables
5. Execution of the executables
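The five steps above can be sketched in a few lines of plain Python. This is a toy illustration only: the function names, the op representation, and the backend tags are assumptions made for the sketch, not OneFlow's actual XRT classes.

```python
# Toy sketch of XRT's five-step pipeline; all names are illustrative.

def convert_to_xrt_graph(job):
    # 1. Convert the job into a flat op list (stand-in for XrtGraph).
    return list(job)

def partition_subgraphs(graph):
    # 2. Cluster consecutive ops that share a backend into subgraphs.
    subgraphs, current = [], []
    for op in graph:
        if current and current[-1]["backend"] != op["backend"]:
            subgraphs.append(current)
            current = []
        current.append(op)
    if current:
        subgraphs.append(current)
    return subgraphs

def compile_subgraph(subgraph):
    # 3 + 4. Engine-independent passes would run here, after which the
    # subgraph is "compiled" into an executable (here: a simple closure).
    def executable(x):
        for op in subgraph:
            x = op["fn"](x)
        return x
    return executable

def run(job, x):
    # 5. Execute the compiled executables in graph order.
    for sg in partition_subgraphs(convert_to_xrt_graph(job)):
        x = compile_subgraph(sg)(x)
    return x

# Example: two XLA ops followed by one TensorRT op -> two subgraphs.
job = [
    {"backend": "XLA", "fn": lambda x: x + 1},
    {"backend": "XLA", "fn": lambda x: x * 2},
    {"backend": "TensorRT", "fn": lambda x: x - 3},
]
```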
### Computation graph conversion

The OneFlow Job is first converted into XRT's dataflow graph (XrtGraph). After a series of transformations, this graph is finally compiled into engine-specific executables.
### Computation subgraph partitioning

Nodes in the computation graph are clustered according to a set of attributes, such as whether each node is compilable, its device, and its SBP policy. The clustered nodes are folded into a new node (a Launch node), the subgraph is rebuilt inside that node, and the backend engine that will execute the subgraph is determined at the same time.
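A minimal sketch of this attribute-based clustering, assuming a toy node representation and a hypothetical `Launch` wrapper (neither is OneFlow's real data structure):

```python
from itertools import groupby

# Toy node: (name, compilable, device, sbp). Real XRT considers more
# attributes than these; they are illustrative only.
nodes = [
    ("matmul", True,  "gpu", "S0"),
    ("relu",   True,  "gpu", "S0"),
    ("print",  False, "cpu", "B"),
    ("add",    True,  "gpu", "S0"),
]

def cluster_key(node):
    _, compilable, device, sbp = node
    return (compilable, device, sbp)

# Fold consecutive nodes with identical attributes into Launch nodes;
# non-compilable nodes are kept as-is.
launched = []
for key, group in groupby(nodes, key=cluster_key):
    group = list(group)
    if key[0]:  # compilable cluster -> fold into a Launch node
        launched.append(("Launch", [n[0] for n in group]))
    else:
        launched.extend(group)
```

Note that the non-compilable `print` node splits the graph, so `matmul`/`relu` and `add` end up in two separate Launch nodes even though their attributes match.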
@@ -43,11 +48,13 @@
Note that FLAGS_strict_clustering=true leads to smaller merged subgraphs, which may cost the backend engine some optimization opportunities. FLAGS_strict_clustering defaults to true.
### Engine-independent optimization

Not provided yet; graph-optimization passes may be added here later.
### Executable generation

At runtime, each computation subgraph can be compiled into an engine-specific executable.

For subgraphs with static shapes, the caching mechanism ensures each subgraph is compiled only once at runtime. A subgraph containing dynamic shapes, however, may need to be recompiled on every run, so XRT is currently not recommended if the computation graph contains dynamic-shape nodes.
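The caching behavior described above can be modeled as follows. This is a toy sketch assuming the cache key is the subgraph plus its concrete input shapes; the real key and the engine compiler are OneFlow internals not shown here.

```python
# Toy model of executable caching: compile once per (subgraph, shapes) key.
compile_count = 0
cache = {}

def compile_executable(subgraph, input_shapes):
    global compile_count
    compile_count += 1           # stands in for an expensive engine compile
    return f"executable<{subgraph}{tuple(input_shapes)}>"

def get_executable(subgraph, input_shapes):
    key = (subgraph, tuple(input_shapes))
    if key not in cache:         # static shapes -> cache hit after first call
        cache[key] = compile_executable(subgraph, input_shapes)
    return cache[key]

# Static shapes: compiled once, then served from the cache.
get_executable("sg0", [(8, 224, 224, 3)])
get_executable("sg0", [(8, 224, 224, 3)])
# Dynamic shapes: every new shape forces a fresh compilation.
get_executable("sg0", [(4, 224, 224, 3)])
```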
@@ -86,13 +93,13 @@
```python
import oneflow as flow

config = flow.function_config()

# Enable XLA
config.use_xla_jit()

# Enable TensorRT
config.use_tensorrt()
```
- Configuration via environment variables
@@ -103,6 +110,19 @@
export FLAGS_use_tensorrt=true  # true to enable, false to disable
```
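Boolean flags like the ones above can be read from the environment with a few lines of plain Python. The parsing logic below is an illustrative assumption, not OneFlow's actual code; only the variable names FLAGS_use_xla_jit and FLAGS_use_tensorrt come from the text above.

```python
import os

def read_bool_flag(name, default=None):
    # Unset flags fall back to the given default; parsing is illustrative.
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "on", "yes")

os.environ["FLAGS_use_tensorrt"] = "true"
os.environ.pop("FLAGS_use_xla_jit", None)   # ensure the flag is unset

use_tensorrt = read_bool_flag("FLAGS_use_tensorrt")
use_xla_jit = read_bool_flag("FLAGS_use_xla_jit")  # unset -> None
```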
- Low-precision configuration

```python
# XLA automatic mixed precision (float16)
config.enable_auto_mixed_precision()

# TensorRT float16
config.tensorrt.use_fp16()

# TensorRT int8 (not supported yet)
config.tensorrt.use_int8()
```
### Benchmark

- Bert base (batch size = 60)
......