New Features

  • Add bool dtype support for IndexingMultiAxisVec
  • Add Windows and macOS packaging support
  • Add adaptive pooling operator
  • Add llvm-lit support for the MLIR JIT
  • Add a weights preprocess switch that controls whether preprocessed weights are cached during inference to improve performance
  • Add the MGB_USE_ATLAS_ASYNC_API macro to control whether asynchronous API calls are enabled
  • Update the dtype promotion rules to match common usage habits
  • Add parameter checks to the tensor broadcast method
  • Update the device __repr__ method to show physical placement information
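
For readers unfamiliar with adaptive pooling, the idea can be sketched in a few lines of plain Python. This is a minimal 1-D illustration of the commonly used window formula (start = floor(i * n / out), end = ceil((i + 1) * n / out)). It is an assumption about the general technique, not MegEngine's implementation, and the name adaptive_avg_pool1d is hypothetical.

```python
def adaptive_avg_pool1d(values, output_size):
    """Average-pool `values` into exactly `output_size` bins.

    Hypothetical helper for illustration only; it does not call MegEngine.
    """
    n = len(values)
    pooled = []
    for i in range(output_size):
        # Window bounds are derived from the desired output size,
        # so no kernel size or stride has to be chosen by the caller.
        start = (i * n) // output_size
        end = ((i + 1) * n + output_size - 1) // output_size  # integer ceil
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


even_split = adaptive_avg_pool1d([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
uneven_split = adaptive_avg_pool1d([1.0, 2.0, 3.0, 4.0, 5.0], 2)
```

When the input length is not divisible by the output size, as in the second call, neighbouring windows overlap by one element so that every output bin still covers a contiguous slice of the input.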

Bug Fixes

  • Fix a cpuinfo compile warning on ARM Linux
  • Fix a hang in the Imperative Runtime caused by the exception not being set correctly when an error occurs
  • Fix a crash in the dump_with_testcase script when --output-strip-info is enabled and the file does not exist
  • Fix the NCHW → NCHW4 pass when handling the float type
  • Handle the deconv operator in the float to io16c32 pass
  • Disable thread_local support on iOS due to an Xcode problem
  • Fix a refcnt counting problem in ParamPackSplit during multi-machine training
  • Fix a crash when multiple models use the same compnode with the record function enabled under multithreading
  • Fix the NCHW → NCHWxx pass when processing a conv_bias operator whose bias is empty
  • Fix jit.trace becoming completely unusable for subsequent traces after an error occurs
  • Fix bool.sum()
  • Fix graph binding error handling that caused the graph to be collected incorrectly
  • Fix jit.trace handling of the topk/warp/nms ops
  • Fix group support in the LocalConv2d operator
  • Fix a bug when using optimize_for_inference in dump
  • Fix bugs in NMSKeep, topk, and warp_perspective when they are traced

Breaking Changes

  • Adjust the names, parameters, or import paths of some functional APIs; delete duplicated APIs
