Bug Fixes

  • Fixed static shape inference in trace to allow training larger models
  • Fixed the execution order of io and opr in trace to avoid deadlock
  • Fixed cd4 conversion errors for convolution with group=1 and for some elemwise cases
  • Fixed a shape mismatch during inference when the bias shape is fixed in the fuse conv bias optpass
  • Fixed abnormal behavior of the elemwise LOG mode when computed with MKL
  • Fixed error handling in load_and_run when --input does not specify the correct input name for a single-input model

New Features

  • Support scalar-type Tensors
  • Allow users to control error checking during asynchronous execution via the async_level parameter
  • Add operators including group_norm, instance_norm, layer_norm, conv1d, and remap
  • GradManager.attach now holds weak references to Tensors
  • Support distributed quantization-aware training
  • Release the original weight memory after weight preprocessing during inference
  • Fully support the Elemwise and DimShuffle operators in the JIT MLIR backend
  • Add support for the DCT operator in cv
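
The switch to weak references in attach means that attaching an object no longer keeps it alive. The sketch below illustrates that idea using only Python's standard weakref module. FakeTensor and ToyManager are hypothetical stand-ins invented for this example. They are not MegEngine classes and do not reflect how the library implements attach internally.

```python
import gc
import weakref


class FakeTensor:
    """Hypothetical stand-in for a tensor; not a MegEngine class."""

    def __init__(self, name):
        self.name = name


class ToyManager:
    """Toy registry that tracks attached objects through weak references."""

    def __init__(self):
        self._attached = []

    def attach(self, obj):
        # Store only a weak reference, so the registry never keeps obj alive.
        self._attached.append(weakref.ref(obj))

    def live_names(self):
        # Dereference each weak reference; dead ones return None and are skipped.
        return [t.name for t in (r() for r in self._attached) if t is not None]


manager = ToyManager()
w = FakeTensor("w")
b = FakeTensor("b")
manager.attach(w)
manager.attach(b)
names_before = manager.live_names()

del b         # drop the only strong reference to "b"
gc.collect()  # make collection explicit instead of relying on refcounting
names_after = manager.live_names()

print(names_before, names_after)
```

In this toy model, an attached object disappears from the registry once the caller drops its last strong reference, so the caller is responsible for keeping alive anything it still wants tracked.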

Optimization

  • Reduce host overhead for operators including batch normalization, elementwise, and broadcast
  • Improve performance of the step function in optimizers
  • Improve performance of quantization training
  • Optimize arm64 int8X8X16_mk4_k8x8x8 matmul operator

Breaking Changes

  • None
