新特性
- 集成Intel MKLDNN加速推理 #1264,#1266, #1277
- C++ Serving支持HTTP 请求 #1321
- C++ Serving支持gRPC 和HTTP + Proto请求 #1345
- 新增C++ Client SDK #1370
性能优化
- C++ Serving优化Pybind传递数据方法 #1268, #1269
- C++ Serving增加GPU多流、异步任务队列,删除冗余加锁 #1289
- C++ Serving WebServer使用连接池和数据压缩 #1348
- C++ Serving框架新增异步批量合并,支持变长LOD输入 #1366
- C++ Serving stage并发执行 #1376
- C++ Serving增加各阶段处理耗时日志 #1390
功能变更
- 重写模型保存方案和命名规则,兼容旧版本 #1354,#1358
- 支持更多数据类型float64, int16, float16, uint16, uint8, int8, bool, complex64, complex128 #1338
- 重写GPU id设置device的逻辑 #1303
- 指定Fetch list返回部分推理结果 #1359
- 设置XPU ID #1436
- 服务优雅关闭 #1470
- C++ Serving Client端pybind支持uint8、int8数据读写 #1378
- C++ Serving Client端pybind支持uint16、int16数据读写 #1420
- C++ Serving支持异步参数设置 #1483
- Python Pipeline增加While OP控制循环 #1338
- Python pipeline之间可使用gRPC交互 #1358
- Python Pipeline 支持Proto结构Tensor数据格式交互 #1369, #1384
- Python Pipeline仅获取最快的前置OP结果 #1380
- Python Pipeline 支持LoD类型输入 #1472
- Cube服务新增python http方式请求样例 #1399
- Cube服务增加读取RecordFile工具 #1336
- Cube-server和Cube-transfer上线部署优化 #1337
- 删除multi-lang相关代码 #1321
文档和示例变更
- 修改Doc目录结构,新增子目录 #1473, #1475
- 迁移Serving/python/examples路径到Serving/examples,重新设计目录 #1487
- 修改doc文件名称 #1487
- 新增C++ Serving Benchmark #1176
- 新增PaddleClas/DarkNet 加密模型部署示例 #1352
- 新增Model Zoo文档 #1492
- 新增Install文档 #1473
- 新增Quick_Start文档 #1473
- 新增Serving_Configure文档 #1495
- 新增C++_Serving/Inference_Protocols_CN.md #1500
- 新增C++_Serving/Introduction_CN.md #1497
- 新增C++_Serving/Performance_Tuning_CN.md #1497
- 新增Python_Pipeline/Performance_Tuning_CN.md #1503
- 更新Java SDK文档 #1357
- 更新Compile文档 #1502
- 更新Readme文档 #1473
- 更新Latest_Package_CN.md #1513
- 更新Run_On_Kubernetes_CN.md #1520
Bug修复
- 修复内存池使用问题 #1283
- 修复多线程中错误加锁问题 #1289
- 修复C++ Serving多模型组合场景,无法加载第二个模型问题 #1294
- 修复请求数据大时越界问题 #1308
- 修复Detection模型结果偏离问题 #1413
- 修复use_calib设置错误问题 #1414
- 修复C++ OCR示例结果不正确问题 #1415
- 修复并行推理出core问题 #1417
For English:
New Features
- Integrate Intel MKLDNN to accelerate inference #1264, #1266, #1277
- C++ Serving supports HTTP requests #1321
- C++ Serving supports gRPC and HTTP + Proto requests #1345
- Add C++ Client SDK #1370
Performance optimization
- C++ Serving optimizes how data is passed through Pybind #1268, #1269
- C++ Serving adds GPU multi-stream support and an asynchronous task queue, and removes redundant locking #1289
- C++ Serving WebServer uses a connection pool and data compression #1348
- C++ Serving framework adds asynchronous batch merging and supports variable-length LoD input #1366
- C++ Serving executes stages concurrently #1376
- C++ Serving logs the processing time of each stage #1390
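As an illustration of the batch merge item above (#1366), here is a minimal sketch of how variable-length inputs from several requests can be packed into one flat batch with LoD-style offsets and split back afterwards. It is not the C++ Serving implementation: the function names `merge_requests` and `split_results` are hypothetical, and the asynchronous queueing part is left out.

```python
from typing import List, Sequence, Tuple


def merge_requests(requests: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Concatenate variable-length inputs and record LoD-style offsets.

    Hypothetical helper for illustration only. lod[i] and lod[i + 1]
    delimit the slice of the flat batch that belongs to request i.
    """
    flat: List[float] = []
    lod: List[int] = [0]
    for seq in requests:
        flat.extend(seq)
        lod.append(len(flat))
    return flat, lod


def split_results(flat: Sequence[float], lod: Sequence[int]) -> List[List[float]]:
    """Cut a flat batch back into per-request pieces using the offsets."""
    return [list(flat[lod[i]:lod[i + 1]]) for i in range(len(lod) - 1)]


if __name__ == "__main__":
    # Three requests of different lengths merged into one batch.
    batch, offsets = merge_requests([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
    print(batch, offsets)
    print(split_results(batch, offsets))
```

The offsets let one inference call serve several requests without padding each input to a common length.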
Function changes
- Rewrite the model saving scheme and naming rules while remaining compatible with older versions #1354, #1358
- Support more data types: float64, int16, float16, uint16, uint8, int8, bool, complex64, complex128 #1338
- Rewrite the logic that sets the device from the GPU ID #1303
- Support specifying a Fetch list to return only part of the inference results #1359
- Support setting the XPU ID #1436
- Support graceful service shutdown #1470
- C++ Serving Client pybind supports reading and writing uint8 and int8 data #1378
- C++ Serving Client pybind supports reading and writing uint16 and int16 data #1420
- C++ Serving supports asynchronous parameter setting #1483
- Python Pipeline adds a While OP for loop control #1338
- Python Pipelines can interact with each other over gRPC #1358
- Python Pipeline supports exchanging data as Proto-structured Tensors #1369, #1384
- Python Pipeline supports taking only the result of the fastest preceding OP #1380
- Python Pipeline supports LoD type input #1472
- Cube service adds a Python HTTP request example #1399
- Cube service adds a tool to read RecordFile #1336
- Optimize production deployment of Cube-server and Cube-transfer #1337
- Delete multi-lang related code #1321
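To illustrate the Python Pipeline item above about taking only the fastest preceding OP's result (#1380), the following is a generic sketch built on the standard library's `concurrent.futures`. It does not use the Python Pipeline API: `slow_op`, `fast_op`, and `first_result` are hypothetical names, and the real scheduling inside the pipeline may differ.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def slow_op(request):
    """Hypothetical predecessor OP that takes a while to answer."""
    time.sleep(0.3)
    return ("slow", request)


def fast_op(request):
    """Hypothetical predecessor OP that answers immediately."""
    return ("fast", request)


def first_result(ops, request):
    """Run all predecessor OPs in parallel and return the first result."""
    pool = ThreadPoolExecutor(max_workers=len(ops))
    futures = [pool.submit(op, request) for op in ops]
    # Block only until at least one predecessor has finished.
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    # Do not wait for the slower OPs; their results are simply ignored.
    pool.shutdown(wait=False)
    return next(iter(done)).result()


if __name__ == "__main__":
    print(first_result([slow_op, fast_op], 7))
```

The slower OPs still run to completion in the background here; a production design would also need a way to cancel them or discard their output.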
Documentation and example changes
- Modify the Doc directory structure and add subdirectories #1473, #1475
- Move Serving/python/examples to Serving/examples and reorganize the directory #1487
- Rename doc files #1487
- Add C++ Serving Benchmark #1176
- Add a PaddleClas/DarkNet encrypted model deployment example #1352
- Add Model Zoo doc #1492
- Add Install doc #1473
- Add Quick Start doc #1473
- Add Serving Configure doc #1495
- Add C++_Serving/Inference_Protocols_CN.md #1500
- Add C++_Serving/Introduction_CN.md #1497
- Add C++_Serving/Performance_Tuning_CN.md #1497
- Add Python_Pipeline/Performance_Tuning_CN.md #1503
- Update Java SDK doc #1357
- Update Compile doc #1502
- Update Readme doc #1473
- Update Latest_Package_CN.md #1513
- Update Run_On_Kubernetes_CN.md #1520
Bug fixes
- Fix a memory pool usage problem #1283
- Fix incorrect locking in multi-threaded code #1289
- Fix C++ Serving failing to load the second model in multi-model combination scenarios #1294
- Fix an out-of-bounds error when the request data is large #1308
- Fix inaccurate prediction results of the Detection model #1413
- Fix use_calib being set incorrectly #1414
- Fix incorrect results in the C++ OCR example #1415
- Fix a core dump during parallel inference #1417