
New Features

  • Integrated Intel MKLDNN to accelerate inference #1264, #1266, #1277
  • C++ Serving supports HTTP requests #1321
  • C++ Serving supports gRPC and HTTP + Proto requests #1345
  • Added a C++ Client SDK #1370

Performance Optimizations

  • C++ Serving optimizes the Pybind data-passing method #1268, #1269
  • C++ Serving adds GPU multi-stream support and an asynchronous task queue, and removes redundant locking #1289
  • C++ Serving WebServer uses a connection pool and data compression #1348
  • C++ Serving framework adds asynchronous batch merging and supports variable-length LoD input #1366
  • C++ Serving stages execute concurrently #1376
  • C++ Serving logs the processing time of each stage #1390
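
To illustrate what asynchronous batch merging with variable-length (LoD) input means in #1366, here is a minimal Python sketch of the general technique only; it is not the actual C++ Serving implementation, and all names (`merge_batch`, `task_queue`) are hypothetical.

```python
# Sketch: pending requests of different lengths are drained from a queue,
# concatenated along the batch axis, and a LoD offset vector records where
# each request's data begins and ends inside the merged batch.
from queue import Queue, Empty

def merge_batch(task_queue, max_batch=8, timeout=0.001):
    """Drain up to max_batch pending requests and merge them into one batch."""
    merged, lod = [], [0]
    for _ in range(max_batch):
        try:
            seq = task_queue.get(timeout=timeout)  # one variable-length request
        except Empty:
            break
        merged.extend(seq)              # concatenate along the batch axis
        lod.append(lod[-1] + len(seq))  # offsets delimit each request
    return merged, lod

# Usage: three requests of lengths 2, 3 and 1 merge into one batch of 6.
q = Queue()
for seq in ([1, 2], [3, 4, 5], [6]):
    q.put(seq)
batch, lod = merge_batch(q)
# batch == [1, 2, 3, 4, 5, 6]; lod == [0, 2, 5, 6]
```

After inference, the same offsets let the server split the merged result back into per-request responses.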

Functional Changes

  • Reworked the model saving scheme and naming rules, keeping compatibility with older versions #1354, #1358
  • Support more data types: float64, int16, float16, uint16, uint8, int8, bool, complex64, complex128 #1338
  • Reworked the logic that maps GPU IDs to devices #1303
  • Specify a fetch list to return only part of the inference results #1359
  • Support setting the XPU ID #1436
  • Graceful service shutdown #1470
  • C++ Serving Client pybind supports reading and writing uint8 and int8 data #1378
  • C++ Serving Client pybind supports reading and writing uint16 and int16 data #1420
  • C++ Serving supports asynchronous parameter settings #1483
  • Python Pipeline adds a While OP for loop control #1338
  • Python Pipelines can interact with each other over gRPC #1358
  • Python Pipeline supports the Proto Tensor data format for interaction #1369, #1384
  • Python Pipeline keeps only the result of the fastest preceding OP #1380
  • Python Pipeline supports LoD-type input #1472
  • Cube service adds a Python HTTP request example #1399
  • Cube service adds a tool for reading RecordFiles #1336
  • Optimized online deployment of Cube-server and Cube-transfer #1337
  • Removed multi-lang related code #1321
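
The fetch-list change (#1359) means a client can name just the outputs it needs and the server returns only those, instead of every output the model produces. A minimal Python sketch of the idea, using a hypothetical `predict`/`run_inference` pair rather than the real client API:

```python
# Sketch: the model produces all of its outputs, but only the names listed
# in `fetch` are copied into the response, keeping the payload small.

def run_inference(feed):
    # Stand-in for a real model: every output the graph can produce.
    return {
        "label": [1],
        "score": [0.92],
        "feature_map": [0.1] * 1024,  # large intermediate output
    }

def predict(feed, fetch):
    """Return only the outputs named in the fetch list."""
    outputs = run_inference(feed)
    return {name: outputs[name] for name in fetch if name in outputs}

# Usage: request only "score", skipping the bulky feature map.
result = predict(feed={"x": [0.5]}, fetch=["score"])
# result == {"score": [0.92]}
```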

Documentation and Example Changes

  • Restructured the doc directory and added subdirectories #1473, #1475
  • Moved Serving/python/examples to Serving/examples and redesigned the directory layout #1487
  • Renamed doc files #1487
  • Added a C++ Serving benchmark #1176
  • Added a PaddleClas/DarkNet encrypted-model deployment example #1352
  • Added the Model Zoo doc #1492
  • Added the Install doc #1473
  • Added the Quick Start doc #1473
  • Added the Serving Configure doc #1495
  • Added C++_Serving/Inference_Protocols_CN.md #1500
  • Added C++_Serving/Introduction_CN.md #1497
  • Added C++_Serving/Performance_Tuning_CN.md #1497
  • Added Python_Pipeline/Performance_Tuning_CN.md #1503
  • Updated the Java SDK doc #1357
  • Updated the Compile doc #1502
  • Updated the Readme #1473
  • Updated Latest_Package_CN.md #1513
  • Updated Run_On_Kubernetes_CN.md #1520

Bug Fixes

  • Fixed a memory pool usage issue #1283
  • Fixed incorrect locking in multi-threaded code #1289
  • Fixed a failure to load the second model in C++ Serving multi-model combination scenarios #1294
  • Fixed an out-of-bounds error when the request payload is large #1308
  • Fixed inaccurate Detection model results #1413
  • Fixed an incorrect use_calib setting #1414
  • Fixed incorrect results in the C++ OCR example #1415
  • Fixed a core dump during parallel inference #1417

About

A flexible, high-performance serving framework for machine learning models (the PaddlePaddle serving deployment framework).

Source repository: https://github.com/PaddlePaddle/Serving

Release v0.9.0

