From 15558f7dca828da9beec357599c20c50116df194 Mon Sep 17 00:00:00 2001
From: Houjiang Chen
Date: Wed, 5 Feb 2020 11:27:28 +0800
Subject: [PATCH] Update xrt readme (#2611)

---
 README.md             | 30 +++++++++++++++++-------------
 oneflow/xrt/README.md | 42 +++++++++++++++++++++++++++++++-----------
 2 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 5ce519d730..e51f7bddfa 100644
--- a/README.md
+++ b/README.md
@@ -6,13 +6,13 @@
 Building OneFlow from source requires a `BLAS library` installed. On CentOS, if you have `Intel MKL` installed, please update the environment variable.

-```
+```shell
 export LD_LIBRARY_PATH=/opt/intel/lib/intel64_lin:/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
 ```

 Or you can install OpenBLAS and other tools through:

-```
+```shell
 sudo yum -y install epel-release && sudo yum -y install git gcc-c++ cmake3 openblas-devel kernel-devel-$(uname -r) nasm
 ```

@@ -20,26 +20,26 @@

 > note: with `--recursive` flag to clone third_party submodules

-```
+```shell
 git clone https://github.com/Oneflow-Inc/oneflow --recursive
 ```

 or you can just clone the source code and submodules step by step

-```
+```shell
 git clone https://github.com/Oneflow-Inc/oneflow
 git submodule update --init --recursive
 ```

 #### build third party from source

-```
+```shell
 cmake -DTHIRD_PARTY=ON .. && make -j
 ```

 #### build oneflow

-```
+```shell
 cmake -DTHIRD_PARTY=OFF .. && make -j
 ```

@@ -55,7 +55,7 @@
 - Update cmake

-  It is needed only if CMake installed does not support downloading .tgz file from URL with https protocol. Skip this step, just go back here to reinstall CMake if you encountered a downloading error while building the third-parties.
+  It is needed only if the installed cmake does not support downloading .tgz files from URLs with the https protocol. Skip this step, and just come back here to reinstall cmake if you encounter a downloading error while building the third-parties.

 Download cmake(>=3.7) from [here](https://cmake.org/download/), then configure and install it with the following command:

@@ -90,18 +90,14 @@
   make -j$(nproc)
   ```

-- XLA documents
-
-  You can check this [doc](./oneflow/xrt/README.md) to obtain more details about how to use XLA.
-
 ### Build with TensorRT

 - Build third-parties

-  Run the following command to build third-parties.
+  Download the TensorRT(>=6.0) .tgz package and unzip it, then run the following command to build the third-parties.

   ```shell
-  cd build && cmake -DWITH_TENSORRT=ON -DTHIRD_PARTY=ON ..
+  cd build && cmake -DWITH_TENSORRT=ON -DTENSORRT_ROOT=your_tensorrt_path -DTHIRD_PARTY=ON ..
   make -j$(nproc)
   ```
 - Build OneFlow
@@ -109,9 +105,17 @@
   ```shell
   cmake .. \
   -DWITH_TENSORRT=ON \
+  -DTENSORRT_ROOT=your_tensorrt_path \
   -DPYTHON_LIBRARY=your_python_lib_path \
   -DPYTHON_INCLUDE_DIR=your_python_include_dir \
   -DPython_NumPy_INCLUDE_DIRS=your_numpy_include_dir
   make -j$(nproc)
   ```
+
+### Documents
+
+- XRT documents
+
+  You can check this [doc](./oneflow/xrt/README.md) to obtain more details about how to use XLA and TensorRT with OneFlow.
+
diff --git a/oneflow/xrt/README.md b/oneflow/xrt/README.md
index bd59a82d35..9ed3cc2ba2 100644
--- a/oneflow/xrt/README.md
+++ b/oneflow/xrt/README.md
@@ -2,16 +2,21 @@
 XRT is a runtime acceleration library that supports multiple computing engines at the same time. TensorFlow XLA and NVIDIA TensorRT have been integrated as backend engines so far: XLA fully supports both training and inference, while TensorRT supports inference (and training for some operators). For the same computation graph, XRT allows multiple engines to be used jointly to obtain a better speedup.

+Different backend engines support different hardware. For example, XLA supports both CPU and NVIDIA GPU, while TensorRT supports NVIDIA GPU only.
+
 For any backend engine, XRT's execution is divided into the following steps:

 1. Computation graph conversion
-2. Engine-independent optimization
-3. Generation of the engine-specific Executable
-4. Execution of the Executable
+2. Subgraph partitioning
+3. Engine-independent optimization
+4. Generation of the engine-specific Executable
+5. Execution of the Executable

-### Engine-independent optimization
+### Computation graph conversion
+
+The OneFlow Job is converted into XRT's computation graph (XrtGraph). After a series of transformations, this graph is finally compiled into an Executable specific to the backend engine.

-- Subgraph partitioning
+### Subgraph partitioning

 Computation nodes are aggregated according to a series of properties, such as whether each node is compilable, its device, and its SBP policy. The aggregated nodes are folded into a new node (a Launch node), the subgraph is rebuilt inside that node, and the backend engine for the subgraph is determined at the same time.

@@ -43,11 +48,13 @@
 Meanwhile, FLAGS_strict_clustering=true makes the merged subgraphs smaller, which may cost the backend engine some optimization opportunities. FLAGS_strict_clustering defaults to true.

-- ...
+### Engine-independent optimization
+
+Not provided yet; graph-optimization passes may be added here later.

 ### Generating the Executable

-In the runtime phase, each subgraph can be compiled into an engine-specific Executable.
+In the runtime phase, each computation subgraph can be compiled into an engine-specific Executable.

 For subgraphs with static shapes, each subgraph only needs to be compiled once at runtime thanks to the caching mechanism. Subgraphs with dynamic shapes may need to be recompiled on every run, so XRT is not recommended for now if the computation graph contains dynamic-shape nodes.

@@ -86,13 +93,13 @@ XRT is disabled by default in OneFlow; it can be enabled through the front-end Python interface and
 ```python
 import oneflow as flow
+config = flow.function_config()
+
 # Configure XLA
-# True enables XLA, False disables it; unset by default
-flow.config.use_xla_jit(True)
+config.use_xla_jit()

 # Configure TensorRT
-# True enables TensorRT, False disables it; unset by default
-flow.config.use_tensorrt(True)
+config.use_tensorrt()
 ```

 - Configure from environment variables

@@ -103,6 +110,19 @@
 export FLAGS_use_xla_jit=true    # true to enable, false to disable
 export FLAGS_use_tensorrt=true   # true to enable, false to disable
 ```

+- Low-precision configuration
+
+  ```python
+  # XLA automatic mixed precision (float16)
+  config.enable_auto_mixed_precision()
+
+  # TensorRT float16
+  config.tensorrt.use_fp16()
+
+  # TensorRT int8 (not supported yet)
+  config.tensorrt.use_int8()
+  ```
+
 ### Benchmark

 - Bert base (batch size = 60)
--
GitLab
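
The subgraph-partitioning step that this patch documents in `oneflow/xrt/README.md` can be sketched in a few lines of plain Python. This is only an illustrative model under simplifying assumptions, not OneFlow's implementation: `Node`, `cluster_nodes`, and `min_cluster_size` are hypothetical names, and real XRT clustering also considers SBP policies and avoids creating cyclic dependencies between clusters.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    compilable: bool   # whether the backend engine can compile this op
    device: str        # nodes on different devices are never merged

def cluster_nodes(nodes, edges, min_cluster_size=2):
    """Greedily merge adjacent compilable, same-device nodes into clusters.

    `edges` is a list of (producer, consumer) name pairs. Returns the clusters
    as sorted name lists; unclustered nodes stay on the default engine.
    """
    by_name = {n.name: n for n in nodes}
    cluster_of = {}   # node name -> cluster id
    clusters = {}     # cluster id -> list of node names
    next_id = 0
    for src, dst in edges:
        a, b = by_name[src], by_name[dst]
        # Only aggregate nodes the engine can compile, on the same device.
        if not (a.compilable and b.compilable and a.device == b.device):
            continue
        ca, cb = cluster_of.get(src), cluster_of.get(dst)
        if ca is None and cb is None:
            clusters[next_id] = [src, dst]
            cluster_of[src] = cluster_of[dst] = next_id
            next_id += 1
        elif cb is None:
            clusters[ca].append(dst)
            cluster_of[dst] = ca
        elif ca is None:
            clusters[cb].append(src)
            cluster_of[src] = cb
        elif ca != cb:
            # Merge the two existing clusters.
            for name in clusters.pop(cb):
                clusters[ca].append(name)
                cluster_of[name] = ca
    # Subgraphs below the minimum size are not worth handing to a backend.
    return [sorted(c) for c in clusters.values() if len(c) >= min_cluster_size]
```

For example, a non-compilable op in the middle of a chain splits the graph, leaving only the compilable prefix as a cluster, which mirrors how a Launch node would fold just that subgraph.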
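
The xrt README's note about static versus dynamic shapes comes down to a compilation cache keyed by the input-shape signature: a static-shape subgraph compiles once and hits the cache on every later run, while changing shapes force a fresh compile per new signature. A minimal sketch of that caching behavior (hypothetical names; not OneFlow's actual API):

```python
class ExecutableCache:
    """Compile a subgraph once per input-shape signature and cache the result."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self.cache = {}
        self.compile_count = 0   # counts how often a real compile would run

    def get(self, subgraph_id, input_shapes):
        # The cache key is the subgraph plus the concrete shapes of its inputs.
        key = (subgraph_id, tuple(tuple(s) for s in input_shapes))
        if key not in self.cache:
            self.compile_count += 1
            self.cache[key] = self.compile_fn(subgraph_id, input_shapes)
        return self.cache[key]
```

This is why dynamic-shape graphs are discouraged: every distinct shape signature pays the compilation cost again.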