From 15558f7dca828da9beec357599c20c50116df194 Mon Sep 17 00:00:00 2001
From: Houjiang Chen
Date: Wed, 5 Feb 2020 11:27:28 +0800
Subject: [PATCH] Update xrt readme (#2611)

---
 README.md             | 30 +++++++++++++++++-------------
 oneflow/xrt/README.md | 42 +++++++++++++++++++++++++++++++-----------
 2 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 5ce519d730..e51f7bddfa 100644
--- a/README.md
+++ b/README.md
@@ -6,13 +6,13 @@
 Building OneFlow from source requires a `BLAS library` installed. On CentOS, if you have `Intel MKL` installed, please update the environment variable.

-```
+```shell
 export LD_LIBRARY_PATH=/opt/intel/lib/intel64_lin:/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
 ```

 Or you can install OpenBLAS and other tools through:

-```
+```shell
 sudo yum -y install epel-release && sudo yum -y install git gcc-c++ cmake3 openblas-devel kernel-devel-$(uname -r) nasm
 ```

@@ -20,26 +20,26 @@

 > note: with `--recursive` flag to clone third_party submodules

-```
+```shell
 git clone https://github.com/Oneflow-Inc/oneflow --recursive
 ```

 or you can just clone the source code and submodules step by step

-```
+```shell
 git clone https://github.com/Oneflow-Inc/oneflow
 git submodule update --init --recursive
 ```

 #### build third party from source

-```
+```shell
 cmake -DTHIRD_PARTY=ON .. && make -j
 ```

 #### build oneflow

-```
+```shell
 cmake -DTHIRD_PARTY=OFF .. && make -j
 ```

@@ -55,7 +55,7 @@
 - Update cmake

-  It is needed only if CMake installed does not support downloading .tgz file from URL with https protocol. Skip this step, just go back here to reinstall CMake if you encountered a downloading error while building the third-parties.
+  It is needed only if the installed cmake does not support downloading .tgz files from URLs with the https protocol. Skip this step, and just come back here to reinstall cmake if you encounter a downloading error while building the third-parties.

 Download cmake(>=3.7) from [here](https://cmake.org/download/), then configure and install it with the following command:

@@ -90,18 +90,14 @@
   make -j$(nproc)
   ```

-- XLA documents
-
-  You can check this [doc](./oneflow/xrt/README.md) to obtain more details about how to use XLA.
-
 ### Build with TensorRT

 - Build third-parties

-  Run the following command to build third-parties.
+  Download the TensorRT(>=6.0) .tgz package and unzip it, then run the following command to build the third-parties.

   ```shell
-  cd build && cmake -DWITH_TENSORRT=ON -DTHIRD_PARTY=ON ..
+  cd build && cmake -DWITH_TENSORRT=ON -DTENSORRT_ROOT=your_tensorrt_path -DTHIRD_PARTY=ON ..
   make -j$(nproc)
   ```
 - Build OneFlow
@@ -109,9 +105,17 @@
   ```shell
   cmake .. \
   -DWITH_TENSORRT=ON \
+  -DTENSORRT_ROOT=your_tensorrt_path \
   -DPYTHON_LIBRARY=your_python_lib_path \
   -DPYTHON_INCLUDE_DIR=your_python_include_dir \
   -DPython_NumPy_INCLUDE_DIRS=your_numpy_include_dir
   make -j$(nproc)
   ```
+
+### Documents
+
+- XRT documents
+
+  You can check this [doc](./oneflow/xrt/README.md) to obtain more details about how to use XLA and TensorRT with OneFlow.
+
diff --git a/oneflow/xrt/README.md b/oneflow/xrt/README.md
index bd59a82d35..9ed3cc2ba2 100644
--- a/oneflow/xrt/README.md
+++ b/oneflow/xrt/README.md
@@ -2,16 +2,21 @@
 XRT is a runtime acceleration library that supports multiple computing engines at the same time. TensorFlow XLA and NVIDIA TensorRT have been integrated as backend engines so far: XLA fully supports both training and inference, while TensorRT supports inference (and training for some operators). For the same computation graph, XRT allows multiple engines to be used jointly to obtain a better speedup.

+Different backend engines support different hardware. For example, XLA supports both CPU and NVIDIA GPU, while TensorRT supports NVIDIA GPU only.
+
 For any backend engine, XRT's execution is divided into the following steps:

 1. Computation graph conversion
-2. Engine-independent optimization
-3. Generation of the engine-specific Executable
-4. Execution of the Executable
+2. Subgraph partitioning
+3. Engine-independent optimization
+4. Generation of the engine-specific Executable
+5. Execution of the Executable

-### Engine-independent optimization
+### Computation graph conversion
+
+The OneFlow Job is converted into XRT's computation graph (XrtGraph). After a series of transformations, this graph is finally compiled into an Executable specific to the backend engine.

-- Subgraph partitioning
+### Subgraph partitioning

 Computation nodes are aggregated according to a series of properties, such as whether each node is compilable, its device, and its SBP policy. The aggregated nodes are folded into a new node (a Launch node), the subgraph is rebuilt inside that node, and the backend engine for the subgraph is determined at the same time.

@@ -43,11 +48,13 @@
 Meanwhile, FLAGS_strict_clustering=true makes the merged subgraphs smaller, which may cost the backend engine some optimization opportunities. FLAGS_strict_clustering defaults to true.

-- ...
+### Engine-independent optimization
+
+Not provided yet; graph-optimization passes may be added here later.

 ### Generating the Executable

-In the runtime phase, each subgraph can be compiled into an engine-specific Executable.
+In the runtime phase, each computation subgraph can be compiled into an engine-specific Executable.

 For subgraphs with static shapes, each subgraph only needs to be compiled once at runtime thanks to the caching mechanism. Subgraphs with dynamic shapes may need to be recompiled on every run, so XRT is not recommended for now if the computation graph contains dynamic-shape nodes.

@@ -86,13 +93,13 @@ XRT is disabled by default in OneFlow; it can be enabled through the front-end Python interface and
 ```python
 import oneflow as flow
+config = flow.function_config()
+
 # Configure XLA
-# True enables XLA, False disables it; unset by default
-flow.config.use_xla_jit(True)
+config.use_xla_jit()

 # Configure TensorRT
-# True enables TensorRT, False disables it; unset by default
-flow.config.use_tensorrt(True)
+config.use_tensorrt()
 ```

 - Configure from environment variables

@@ -103,6 +110,19 @@
 export FLAGS_use_xla_jit=true    # true to enable, false to disable
 export FLAGS_use_tensorrt=true   # true to enable, false to disable
 ```

+- Low-precision configuration
+
+  ```python
+  # XLA automatic mixed precision (float16)
+  config.enable_auto_mixed_precision()
+
+  # TensorRT float16
+  config.tensorrt.use_fp16()
+
+  # TensorRT int8 (not supported yet)
+  config.tensorrt.use_int8()
+  ```
+
 ### Benchmark

 - Bert base (batch size = 60)
--
GitLab
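
The subgraph-partitioning step that this patch documents in `oneflow/xrt/README.md` can be sketched in a few lines of plain Python. This is only an illustrative model under simplifying assumptions, not OneFlow's implementation: `Node`, `cluster_nodes`, and `min_cluster_size` are hypothetical names, and real XRT clustering also considers SBP policies and avoids creating cyclic dependencies between clusters.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    compilable: bool   # whether the backend engine can compile this op
    device: str        # nodes on different devices are never merged

def cluster_nodes(nodes, edges, min_cluster_size=2):
    """Greedily merge adjacent compilable, same-device nodes into clusters.

    `edges` is a list of (producer, consumer) name pairs. Returns the clusters
    as sorted name lists; unclustered nodes stay on the default engine.
    """
    by_name = {n.name: n for n in nodes}
    cluster_of = {}   # node name -> cluster id
    clusters = {}     # cluster id -> list of node names
    next_id = 0
    for src, dst in edges:
        a, b = by_name[src], by_name[dst]
        # Only aggregate nodes the engine can compile, on the same device.
        if not (a.compilable and b.compilable and a.device == b.device):
            continue
        ca, cb = cluster_of.get(src), cluster_of.get(dst)
        if ca is None and cb is None:
            clusters[next_id] = [src, dst]
            cluster_of[src] = cluster_of[dst] = next_id
            next_id += 1
        elif cb is None:
            clusters[ca].append(dst)
            cluster_of[dst] = ca
        elif ca is None:
            clusters[cb].append(src)
            cluster_of[src] = cb
        elif ca != cb:
            # Merge the two existing clusters.
            for name in clusters.pop(cb):
                clusters[ca].append(name)
                cluster_of[name] = ca
    # Subgraphs below the minimum size are not worth handing to a backend.
    return [sorted(c) for c in clusters.values() if len(c) >= min_cluster_size]
```

For example, a non-compilable op in the middle of a chain splits the graph, leaving only the compilable prefix as a cluster, which mirrors how a Launch node would fold just that subgraph.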
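
The xrt README's note about static versus dynamic shapes comes down to a compilation cache keyed by the input-shape signature: a static-shape subgraph compiles once and hits the cache on every later run, while changing shapes force a fresh compile per new signature. A minimal sketch of that caching behavior (hypothetical names; not OneFlow's actual API):

```python
class ExecutableCache:
    """Compile a subgraph once per input-shape signature and cache the result."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self.cache = {}
        self.compile_count = 0   # counts how often a real compile would run

    def get(self, subgraph_id, input_shapes):
        # The cache key is the subgraph plus the concrete shapes of its inputs.
        key = (subgraph_id, tuple(tuple(s) for s in input_shapes))
        if key not in self.cache:
            self.compile_count += 1
            self.cache[key] = self.compile_fn(subgraph_id, input_shapes)
        return self.cache[key]
```

This is why dynamic-shape graphs are discouraged: every distinct shape signature pays the compilation cost again.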