v0.5rc1

Changelog

v0.5rc1 (13/09/2021)

Highlights

First-class support for eager execution. The deprecated APIs have moved to oneflow.compatible.single_client.
Drop-in replacement of import torch for existing PyTorch projects. You can test this by swapping the import lines: use import oneflow as torch in a PyTorch project, or import torch as flow in a OneFlow project.
nn.Module for eager execution
nn.Graph for lazy execution
DDP for data parallel
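
The import swap mentioned in the highlights can be tried with a small sketch like the one below. It assumes that at least one of oneflow or torch is installed and that both expose nn.Linear and ones with the same call shape; load_backend is a hypothetical helper written for this illustration, not part of either library.

```python
import importlib


def load_backend(candidates=("oneflow", "torch")):
    """Return the first importable module from `candidates`, or None.

    `load_backend` is a hypothetical helper for this sketch, not a OneFlow API.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Bind whichever framework is available to the name `torch`, mirroring the
# effect of `import oneflow as torch` when OneFlow is installed.
torch = load_backend()

if torch is not None:
    layer = torch.nn.Linear(3, 1)  # same call under either framework
    x = torch.ones(2, 3)
    y = layer(x)
    print(tuple(y.shape))
else:
    print("neither oneflow nor torch is installed")
```

In a real project the swap is usually just the one-line import change; the helper only makes it easy to run the same script against both frameworks and compare results.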

A sneak peek of the new API

Here is a minimal example showing how to wrap an nn.Module in an nn.Graph and run it in lazy mode.

import oneflow as flow

class NeuralGraph(flow.nn.Graph):
    def __init__(self, model):
        super().__init__()
        self.model = model  # model is an nn.Module instance

    def build(self, x):
        y_pred = self.model(x)
        return y_pred

model = flow.nn.Linear(3, 2)  # illustrative only; any nn.Module can be used
x = flow.ones(4, 3)  # illustrative input
graph = NeuralGraph(model)  # create an nn.Graph instance
y_pred = graph(x)  # run the created nn.Graph

New in Python API:

[feature][eager][op][test][python][interface] Add test for convtranspose2d #5239
[enhancement][python][interface] Add GroupNorm #5175
[enhancement][eager][python][interface] [Add] avgpool1d avgpool3d #5165
[feature][eager][op][python][interface] Add deconv cpu impl #5224
[bug][eager][api][python][interface] Fix acosh bug #5221
[feature][eager][op][python][interface] Dev modules ctc loss #5168
[bottleneck][bug][documentation][python][interface] Fix meshgrid test bug #5208
[eager][documentation][python][interface] Rename CosineScheduler to CosineAnnealingLR #5112
[feature][eager][python][interface] Add meshgrid module #5205
[enhancement][feature][bug][op][python] support bias in conv2d's parameter list #5322
[eager][documentation][api][python][interface] add not_equal, greater_equal and less_equal module #5350
[enhancement][eager][python] refine pow module and its test #5319
[enhancement][eager][op][python] Add triu op #5329
[enhancement][bug][python] Fix optimizer for not supporting all kinds of iterables #5355
[bug][python][interface] raise IndexError in get_canonical_index to support for loop #5345
[bug][python][interface] tensor slice assign supports broadcasting #5344
[enhancement][op][python] add cpu group conv logic #5314
[enhancement][python] Add 'nn.Mish' module and corresponding functions #5310
[enhancement][build][python] Remove ONNX from setup py #5297
[enhancement][python][interface] [add] zeropad2d #5278
[feature][system][python][interface] Lazy nn.Graph FeedInputOpExpr #5458
[feature][python][interface] integrate nn.image.flip #5411
[bug][python] Fix issues in point of MultiClientSession #5469
[enhancement][bug][python] update HasAllMultiClientEnvVars() #5459
[enhancement][python] Add in_top_k function #5428
[enhancement][python] Dev add docstring #5449
[feature][api][python] MultiClientSession #5407
[documentation][python] remove --user #5431
[feature][python][interface] nn.Graph python #5309
[feature][python][interface] Fea/nn graph/graph name #5413
[bug][python][interface] rm nn.Graph.train #5424
[op][documentation][api][python][interface] add bernoulli module #5353
[enhancement][python] flow.S/B/P #5306
[enhancement][documentation][python] Add instruction on upgrade pip #5400
[enhancement][python] Rm oneflow export and experimental #5589
[bug][python] Fix nn.graph.utils module conflict #5598
[feature][ci][python] Update autotest framework #5520
[enhancement][python] copy of_proto_python_dir to compatible_single_client_python #5539
[enhancement][api][python] del default env init #5537
[enhancement][python] Fix single client using same glog file #5535
[bug][api][python] Fix Session TryClose #5531
[enhancement][feature][python] split vector-matrix norm #5478
[feature][eager][op][python][interface] Add more upsample kernel #5382
[enhancement][feature][test][python] add torchstyle unittest #5489
[feature][system][python] nn.Graph with training #5662
[enhancement][feature][python] Fea/nn graph/block proxy func #5727
[enhancement][api][python] consistent_tensor_to_api #5703
[feature][eager][op][python] Dev Align torch avgpool #5610
[enhancement][python] fix circular deps of sbp python module #5706
[documentation][python] [part5]Remove singleclient outdated api #5674
[enhancement][python] [part4]Remove singleclient outdated api #5672
[bug][op][python] remove outdated code in conv3d #5696
[enhancement][test][python] enlarge tolerance of dataloader test #5689
[enhancement][test][python] add autotest for some math ops #5646
[feature][python] nn.Graph optimizer part 2: add L2, pass job complete, refactor #5604
[enhancement][python] Add clip_grad_norm #5299
[purge][python] Remove Single-Client API in oneflow default python #5827
[bug][python] Fix ddp grad size #5834
[enhancement][feature][python] Dev RMSprop graph conf #5768
[enhancement][purge][eager][python] remove scale arg in optimizer #5821
[enhancement][feature][python] graph/block io check #5803
[enhancement][feature][python] Dev adam graph conf #5709
[purge][python] [part10]Remove singleclient outdated api #5756
[feature][api][python] better repr of nn.Graph for debug #5762
[bug][python] fix weight decay in RMSprop #5755
[purge][python] [part9]Remove singleclient outdated api #5752
[purge][python] [part8]Remove singleclient outdated api #5750
[documentation][python] add first batch of methods in oneflow.nn.functional namespace #5693
[purge][python] [part6]Remove singleclient outdated api #5704
[bug][python] use default_generator.seed() as random_seed in init #5721
[bug][system][python] ddp broadcast params and buffers #5913
[enhancement][test][python] Add consistent tensor requires grad test #5925
[bug][python] wrap flow.nn.init.* with flow.no_grad() #5932
[feature][api][python][interface] add clip_grad to optimizer #5817
[enhancement][ci][op][test][python] add randperm with test and docs #5680
[feature][api][python] Fea/nn graph/ lr_schedule(and cosine lr_sch) and opt_group #5846
[bug][python] fix bug of SyncOnMasterFn atexit #5909
[purge][python] Delete single client nn modules #6061
[enhancement][python] Move framework.distribute to env #6022
[bug][python] skip sync when abnormally exiting #6025
[feature][python] Fea/nn graph/warmup amp config #5969
[documentation][python] add optimizer api docs #6131
[documentation][python] add_tensor_api_doc #6127
[bug][python] Fix test_grid_sample.py and test_affine_grid.py threshold #6125
[documentation][api][python] add doc of graph #6093
[bug][python] Fix make of_format fail in ubuntu #6120
[feature][api][python][interface] Fea/graph helpers #6088
[enhancement][eager][python][interface] Use flow.randint in dataloader #6086
[feature][eager][api][python][interface] Import oneflow as torch #6076
[enhancement][test][api][python][refactor] rename OfrecordReader to OFRecordReader #6090
[purge][python][need-single-client-tests] Delete single client nn modules #6082
[enhancement][python] flow.load tolerates FileNotFound fault #6083
[feature][python] Fea/pipeline in graph #6105
[enhancement][test][python] graph activation checkpointing #6192
[enhancement][feature][op][python] rnn test #6165

New in Ops:

[enhancement][op][api][refactor] [Functional] Part2: Add partial unary and math functional apis #5218
[enhancement][bug][op][interface] Refine deconv kernel #5229
[enhancement][op][api][interface] add ReflectionPad2d #5172
[feature][eager][op][api][interface] crossentropyloss and nllloss support ignore_index #5195
[feature][eager][op][api][interface] Yejiaojiao/dev bcewithlogitsloss #5173
[bug][ci][op] Dev user op set default is_dynamic #5223
[enhancement][op] add magic method for pow #5199
[enhancement][op][interface] add cpu version of upsampling #5194
[enhancement][bug][op][api][interface] add ReplicationPad2d #5148
[feature][eager][op][api][interface] add kldivloss module #5155
[feature][eager][op][documentation][build][api][interface] Add floor module and the corresponding testcases #4964
[enhancement][feature][op] Dev conv1d module #5280
[enhancement][op] Add ctc_greedy_decoder op #5294
[enhancement][op][system] Dev remove default grad func #5320
[enhancement][op][system] Add pad grad func. #5354
[enhancement][op][system] Add gradient funcs. #5348
[feature][purge][bug][eager][op][interface] fix upsample nearest bug #5347
[enhancement][op][system] [Functional] Part7: Migrate pooling ops #5253
[enhancement][op] nvjpeg hardware acc #5240
[enhancement][feature][ci][eager][op][api][interface] Add bmm module #5334
[enhancement][eager][op] Dev image decode eager #5333
[enhancement][op] Optimize softmax warp impl #4977
[enhancement][eager][op] Dev tensor buffer eager #5317
[enhancement][op][api][refactor] [Functional] Part6: Migrate conv op #5252
[enhancement][eager][op] Dev sort eager #5284
[enhancement][bug][op][api] fix bceloss bug in default weight and reduction #5303
[bug][eager][op] remove redundant assert and check #5264
[enhancement][bug][ci][op] fix bceloss bug about weight #5269
[enhancement][op][api][refactor] [Functional] Part5: Migrate nn ops #5249
[enhancement][eager][op] Dev argsort eager #5273
[enhancement][op][api][refactor] [Functional] Part4: Migrate array ops #5247
[enhancement][op][api][refactor] [Functional] Part3: Migrate binary and activation ops #5246
[bug][ci][op][test] Dev fix rmsprop ci fail #5481
[enhancement][op] add inplace method: Tensor.sin_ #5471
[bug][op] hotfix image_batch_align #5461
[enhancement][eager][op][interface] Dev maxpool series op 123d #5244
[bug][op] fix pool gpu kernel #5446
[feature][eager][op][api][interface] add pixelshufflev2 module #5383
[enhancement][feature][ci][eager][op][documentation][api][interface] Add flow xxx and tensor xxx autotest #5386
[enhancement][feature][eager][op][api][interface] Modules chunk #5324
[enhancement][eager][op] add image normalize for eager #5402
[enhancement][eager][op] Dev batch align module #5401
[enhancement][eager][op] add coco reader module #5391
[enhancement][wip][op] Restruct Elementwise kernel #4130
[bug][op] Fix DecodeRandom reuse mem #5606
[enhancement][op] Align pytorch maxpool #5525
[enhancement][bottleneck][eager][op][api] implementation of constantpad-3d op #5529
[enhancement][eager][op] Add scale size for resize #5509
[enhancement][op][api][refactor] Dev optimize tensor setitem #5501
[enhancement][op] register uint8 dtype to support dataloader #5499
[enhancement][op] Add unique.cuh #5487
[enhancement][op][api][interface] Dev ofrecord auto truncating #5412
[feature][op][system][interface] Feat: LazyInterpret::ApplyImpl support SourceUserOpExpr and Copy #5711
[enhancement][op][interface] Dev logical_and/or modules #5636
[enhancement][op] support any number positional arguments for ones and zeros op #5698
[enhancement][feature][eager][op] Add conv3d Module #5327
[feature][eager][op][api][interface] add batchnorm3d module #5631
[bug][eager][op] fix reduce min max backward bug #5651
[enhancement][op] Debug dim scatter #5371
[enhancement][op][interface] Dev eye #5583
[enhancement][eager][op] Dev minimum maximum #5576
[enhancement][op] Restruct activation grad op #5669
[enhancement][feature][eager][op] Rewrite activation function #5465
[bug][op][documentation] add oneflow.cat for documentation #5621
[enhancement][op] Lcy logsoftmax #5746
[feature][op][need-simple-ci] Feat empty op #5659
[enhancement][eager][op] Dev split #5714
[enhancement][op][interface] add index_select op #5661
[bug][op] fix nvjpeg hw acc #5851
[enhancement][op] Remove move in conv_cudnn #5828
[enhancement][op][interface] Dev logical_xor module #5694
[bug][eager][op] fix squeeze #5808
[enhancement][op] Get parallel_id and parallel_num through rank and world size in DDP #5717
[bug][eager][op] delete interpolate int type #5805
[bug][op] Fix bug in scatter #5743
[enhancement][op] Refactor: remove module not required, call function directly #5754
[enhancement][op] Remove modules not required(tan, erfc, log1p, scatter_nd) #5791
[enhancement][op] Refactor scatter, clamp and pow in cpp instead of in python #5715
[enhancement][op] Rm useless code in gather files #5687
[enhancement][eager][op] change flip_code to scalar #5786
[enhancement][bug][op][interface] fix upsample bug #5753
[bug][op][interface] Quick fix Lazy nn.Graph input/output OpConf.BlobConf.is_dynamic #5767
[enhancement][bug][eager][op] fix argwhere 0-dim bug #5760
[enhancement][eager][op] delete unused code #5744
[feature][op] Export fused_scale_tril op #5933
[bug][op] Fix backward bug in 3d #5908
[bug][op] Fix one_hot api limit #5927
[enhancement][eager][op] Dev where scalar #5797
[bug][op] fix grad error #5914
[feature][bug][op] Fix inplace op circle reference bug #5910
[enhancement][op] Move the judgment content to c++, and add scalar fmod #5854
[enhancement][op] Support combined_margin_loss op in flow.nn.modules #5830
[enhancement][op][api][interface] functional_one_hot #5315
[enhancement][op] Dev scalar op #5778
[bug][eager][op] fix gather kernel 0 shape #5888
[enhancement][op] add l2_normalize for multi-client interfaces #5859
[feature][op] Export function softmax_cross_entropy #6056
[enhancement][op] Add int attr for functional adaptive average pool #6059
[enhancement][op][interface] dev full op #5955
[bug][eager][op] fix 0dim inplace add #6029
[feature][op][system][interface] Feat: nn.Graph image gpu decoder #6014
[enhancement][op][interface] dev optim_optim_lr_scheduler_multisteplr #5975
[enhancement][op] NopKernel #6035
[enhancement][eager][op][api] Dev tril op #6005
[enhancement][op] dev unfold and fold #5675
[enhancement][op] ResNet CUDA Graphs #6018
[enhancement][feature][op] add broadcast pow #6013
[enhancement][op][interface] init of op diag #5298
[op][documentation][api] Fix api document bug #6009
[enhancement][op] Dev fused functional #5954
[bug][op][build] Add nvcc flag -Werror cross-execution-space-call #6002
[bug][op] Fix Normalization grad function #5993
[enhancement][feature][eager][op][test][interface] Add fused self attention #5966
[enhancement][bug][ci][eager][op][api][interface] Try to fix var bug #5973
[enhancement][feature][eager][op][interface] add prod op #5867
[enhancement][eager][op][api] add glu op #6065
[enhancement][op] Align Torch.nn.functional poolXd #6184
[bug][eager][op] fix backward index for gamma beta #6149
[bug][op][system] Fix BroadcastMatmulGrad bug #6168
[enhancement][op][api] Add Int support for functional.avg/maxpool #6174
[bug][eager][op][api][interface] align dropout api name with pytorch #6170
[enhancement][op] support inplace operation for hardsigmoid #6137
[enhancement][bug][op] Fix do bias correction in Adam/AdamW #5960
[bug][eager][op][api][interface] fix repeat 0-dim tensor bug #6150
[enhancement][bug][op] Fix select_first_grad bug #6142
[bug][ci][eager][op][documentation][interface] Add clipgrad doc and contiguous #6130
[bug][op] Fix eager optim dynamic attr bug #6111
[enhancement][op] Support grid_sample and affine_grid operator #6038
[op][documentation] Export apis for documentation #6068
[enhancement][feature][bug][ci][eager][op][documentation][interface] transfer python function to c++ method #6114
[op][documentation] Dev functional batch_gather #6233
[enhancement][op][test] fix cross_entropy_loss and its test #5799
[bug][op] Use attr nd_sbp to check consistent #6222
[enhancement][op] Dev fused bn functional #6077
[enhancement][op] support default value in intlist #6201
[bug][op] fix sparse_softmax get_nd_sbp #6203
[bug][op] Fix bug in model fused update #6197
[enhancement][op][system][refactor] Optimize tensor getitem. #5433

New in Eager:

[enhancement][eager][interface] Reconstruct module files #5251
[bug][eager][documentation][interface] Fix conv module bug #5245
[bug][ci][eager][interface] Fix bce withlogitloss ci error #5237
[feature][eager][api][interface] module BCELoss #5144
[enhancement][feature][eager][api][interface] Dev norm op #5178
[enhancement][bug][eager] Fix stack module #5222
[enhancement][feature][eager] Support different dtype of equal module #5214
[enhancement][bug][eager][documentation][api][interface] Add nllloss backward #5210
[enhancement][eager][api][upload-core] Decouple FileSystem and IOConf #5162
[enhancement][ci][eager] Set lower precision avoid ci failing #5200
[eager][documentation] Add hint when apply FunctionNode second time #5369
[enhancement][feature][bug][ci][eager][documentation][api] Fix upsample bilinear bug #5366
[bug][eager] Fix not contiguous ndarray to tensor bug #5351
[enhancement][eager][system] Infer consistent tensor meta #5118
[feature][eager] Feat graph autograd engine #5296
[enhancement][eager][interface] Dev type as module #5349
[feature][eager][documentation][api][interface] Add new ones module #5342
[enhancement][bug][eager] Fix logical slice assign dtype #5339
[bug][ci][eager][documentation][api][interface] Fix where module bug #5300
[bug][ci][eager][documentation][api] Fix l1loss ci error #5307
[enhancement][bug][eager][documentation][api][interface] Qi's First Edit of deleting "print" and ".numpy" #5129
[feature][eager][refactor] Separate autograd meta to tensor #5267
[feature][eager][api][interface] add tile module #5234
[enhancement][eager] Release lambda function to reuse tensor memory #5266
[feature][bug][eager][documentation] Fix default value not set bug #5483
[enhancement][eager][interface] [Add] gather_nd scatter_nd #5422
[enhancement][bug][eager] fix param #5473
[bug][eager] Fix Tensor.grad setter bug #5462
[enhancement][eager] Rename now_grad_arg to current_grad #5466
[eager][test][documentation][interface] Add autotest part1 #5436
[enhancement][eager] Use functional copy instead of op_builder #5460
[bottleneck][bug][eager][interface] fix -1 index not support bug #5448
[bug][ci][eager][documentation][api] Fix concat backward bug #5443
[enhancement][bug][ci][eager] Add autograd engine warning #5444
[feature][eager][api][interface] Smoothl1loss #5256
[enhancement][bottleneck][eager] remove device dtype params #5434
[bug][ci][eager][documentation][interface] Delete maxpool failed test #5409
[enhancement][eager][api] Add tensor grad assignment #5379
[enhancement][bug][eager] fix-abs #5398
[enhancement][bug][eager][interface] Fix bn track running stats #5393
[enhancement][bug][eager] Support uint dtype of constant op #5396
[enhancement][bug][eager][documentation][interface] Delete useless code upsample #5392
[enhancement][ci][eager][interface] add flow.view #5301
[enhancement][bug][ci][eager][api][interface] Add masked select module #5356
[bug][eager][interface] Fix batchnorm backward bug #5602
[enhancement][eager] Support weight_decay (l2 actually) #5587
[feature][eager][documentation][api] Add new autotest #5588
[enhancement][eager][documentation][api] Dev fmod #5404
[feature][eager] Support inplace add #5432
[feature][eager][interface] Feat tensor stride property #5543
[enhancement][feature][eager][documentation][api] Add flip module #5541
[feature][eager] Feat module repr #5486
[enhancement][bottleneck][bug][eager][interface] Fix maxpool1d params #5493
[enhancement][feature][eager][interface] Dev flow.utils.data part1 #5406
[bug][eager][api] Fix tensor getitem bug #5474
[enhancement][eager][need-simple-ci] export datasets interface #5691
[enhancement][eager][system] rebase #5601
[enhancement][eager][test] added nn.RecordBytesDecoder with its test #5475
[enhancement][feature][eager][need-simple-ci] 0-dim tensor support #5552
[enhancement][bug][eager] rewrite slice_update backward #5677
[enhancement][bug][eager][interface] align view input style with torch #5676
[enhancement][eager][interface][need-simple-ci] add autotests for modules #5666
[enhancement][bottleneck][eager][interface] Dev constantpad1d op #5579
[enhancement][eager][api][interface] Restruct MathOps AutoTest #5654
[enhancement][bug][ci][eager] Fix flip bug #5657
[bug][eager][api][interface] Fix expand module bug #5650
[enhancement][bug][eager][documentation][api] Fix repeat bug #5633
[enhancement][eager][test][api][interface] Add new autotest #5617
[enhancement][eager][api][interface] Dev flow.utils.data part2 #5500
[enhancement][bug][eager] make setitem device match #5835
[bug][eager][api][interface] align reshape input param with pytorch #5804
[feature][bug][eager][api] Align where op with torch #5850
[enhancement][bug][eager][api] Restruct prelu op #5829
[bug][eager][need-simple-ci] fix pooling ceil_mode bug #5818
[enhancement][eager] stateful local kernel supports consistent #5789
[bug][eager][api][interface] Fix argwhere bug #5816
[enhancement][eager][documentation][api] dev-nonzero #5809
[enhancement][feature][eager][api] Add fake quantize op #5690
[enhancement][bug][eager][documentation][api] Add api #5663
[enhancement][eager] Refactor consistent infer result #5790
[bug][eager][need-simple-ci] skip dataloader test #5780
[bug][eager][need-simple-ci] fix 0-dim tensor.fill_ #5771
[enhancement][eager] Cpu mpi broadcast #5726
[feature][eager] Feat grad mode classes #5956
[enhancement][bug][eager] fix wrong names #5951
[enhancement][eager][system] Local dep object pool #5953
[enhancement][eager][interface] rename OpExprInterpState to AutoGradCaptureState #5918
[bug][eager] Fix linear bug #5945
[bug][eager] Fix tensor_meta update bug #5924
[enhancement][eager] use flow.randperm #5928
[enhancement][eager] consistent init/save/load #5896
[enhancement][bug][eager][documentation][interface] Restruct sort and argsort op #5911
[enhancement][bug][eager][interface] Try to fix the problem that the insightface cannot converge. #5906
[enhancement][bug][eager][interface] Add autotest #5899
[enhancement][eager] The scheduler thread joins worker threads #5893
[enhancement][eager] Bugfix async callback #5881
[feature][eager] Feat tensor to bool #5836
[bug][eager] Remove inplace broadcast_add #5551
[enhancement][eager] Broadcast consistent shape and dtype #5784
[enhancement][eager] Fix optimizer list parameters input bug #5848
[enhancement][eager][interface] Dev flow.utils.data part3 #5644
[enhancement][eager][api] Normalize naming of modules #6066
[enhancement][feature][eager][api][interface] add truncnormal #6051
[enhancement][bug][eager] AutoMatedTest support test module.parameter.grad #6043
[enhancement][feature][bug][eager] add module call kwargs #6069
[enhancement][eager][api][interface] add tensor.item tensor.tolist #6021
[enhancement][eager][api][interface] Export pool ops api #6047
[enhancement][bug][eager][test][documentation][interface] Add more autotest sample #6039
[enhancement][bug][eager][system] disable cuda_h2d stream #6020
[feature][eager][test][api][interface] Add autotest codegen #6019
[feature][eager][documentation] Refactor cosine lr scheduler #6000
[enhancement][eager][interface] tensor.cpu/tensor.cuda #5894
[enhancement][eager][api] Support consistent_tensor.to(dtype) #5991
[bug][eager][interface] remove redundant codes in ModuleDict #5961
[bug][eager] Fix LayerNorm check bug #6196
[enhancement][eager][api] Change dropout api #6182
[enhancement][good for pr][eager][api][interface] add: test convert dependency #6023
[enhancement][bug][eager][interface] Fix autotest codegen bug #6171
[bug][eager] restore instr_local_dep_object_pool_size for nccl #6160
[enhancement][eager][api][interface] Align pooling op functional api names with torch #6163
[feature][bug][eager][api][interface] delete file #6162
[bug][eager] Fix optim load_state_dict bug #6152
[enhancement][eager][api] add is_training to dropout functor #6148
[enhancement][eager] Decompose nd sbp boxing #5800
[enhancement][eager] support consistent_tensor.to(copy=True) #6122
[feature][eager] Static grad scaler #6135
[bug][eager] Fix LayerNorm expr bug #6121
[bug][eager][api] move numpy c api init in numpy.cpp, make np array contiguous before copying #6117
[enhancement][eager][refactor] Remove params from ParamGroup getitem #6096
[enhancement][feature][eager] Support tensor and optimizer serialization #6087
[enhancement][bug][eager] fix bug about tensor str in nonsymmetric cast and getitem in consist… #6239
[enhancement][eager] Cpu all reduce #5849
[feature][eager] Support assign copy interface #6228
[enhancement][eager][api][interface] Dev reconstruct pad ops #6223
[enhancement][eager][api][interface] support flow.cuda.is_available #6124
[bug][eager] make flow._C.local_all_reduce sync launched #6175
[enhancement][eager] Rename flow to oneflow in user hint #6190
[bug][eager][tooling][test][api][interface] Autotest generate input tensor #6206
[enhancement][eager] consistent tensor zeros_() #6202
[enhancement][eager] Cpu mpi #5865

Build enhancements:

[bug][build] Fix GRPC compilation failure on CMake 3.20 #5255
[bug][build] Refine header file copy #5254
[bug][build] Fix older version CMake doesn't support multiple targets in CLI #5248
[bug][build] Turn off NCCL_STATIC/CUDNN_STATIC when CUDA_STATIC is OFF #5243
[feature][build] Fix support for Ninja and add Ninja build in Simple CI #5236
[enhancement][build] Add cmake option CUDA_STATIC #5164
[bug][build] Fix protobuf debug postfix #5233
[enhancement][ci][build] Move default third party dir into build dir #5230
[enhancement][build] Refine protobuf cmake #5216
[enhancement][ci][build] Remove transport test main #5215
[enhancement][ci][build] Speedup opencv build #5213
[enhancement][build] Support clang #5015
[enhancement][documentation][build] Add prefix when creating git archive #5201
[enhancement][build] Add cmake option NCCL_STATIC #5160
[enhancement][build] Refine CMake CUDA version handling #5192
[enhancement][build] Use clang plugin to check Maybe variables are used #5358
[enhancement][build] Add BUILD_BYPRODUCTS for ExternalProject_Add #5316
[enhancement][build] Add cmake init cache to simplify user onboarding #5311
[feature][bug][build] Fix macOS support and run macOS build in Simple CI #4947
[enhancement][build] flatbuffers use mirror #5295
[enhancement][build] Don't build test by default #5302
[enhancement][build] Prevent building from scratch when toggle flag BUILD_GIT_VERSION #5259
[enhancement][build] Refine gRPC, glog, gflags cmake for conda #5276
[feature][build] Support XLA with CPU-only #5260
[enhancement][ci][onnx][build] Remove ONNX from CI #5257
[enhancement][build] Refactor build_wheel to support oneflowinc images #5427
[enhancement][build] Add arg skip_audit in build wheel #5423
[bug][build] hwloc disable shared #5388
[documentation][build] Update readme for autoconf and libtool #5376
[enhancement][build] remove dir python and compatible_single_client_python #5609
[bug][build][system] Fix pyyaml version #5594
[enhancement][ci][build] force release flags #5574
[bug][build] prevent endless loop #5534
[enhancement][build] Support sccache #5528
[enhancement][build] Add definition for CMAKE_BUILD_TYPE and print cmake_build_type in oneflow doctor #5505
[enhancement][ci][build][need-simple-ci] Fix macOS for recent changes #5705
[bug][build] fix return type error on gcc 4.8.5 #5660
[enhancement][build] Check CMAKE_BUILD_TYPE #5656
[enhancement][build] add -Werror=return-type #5655
[enhancement][build] Clean and fix for new py dir #5618
[enhancement][build] cmake: disable array-bounds check & treat warnings as errors for pyextobj and oneflow_internal & fix warnings #5838
[bug][build] set CMAKE_BUILD_TYPE to Release if undefined #5842
[enhancement][build][need-simple-ci] Fix all warnings & Add option TREAT_WARNINGS_AS_ERRORS to cmake #5751
[enhancement][build] add CMAKE_INTERPROCEDURAL_OPTIMIZATION in fast cmake cache #5970
[enhancement][build] add clang tidy target #5957
[bug][build] cmake: fix cmake cache args in opencv #5959
[enhancement][build] Add cmake option USE_SYSTEM_NCCL #5897
[enhancement][build] cmake: include third party headers as system headers to avoid warnings #5879
[enhancement][build] Ignore opencv-python on machine aarch64 #5884
[enhancement][build] enable CMake first class cuda support #5858
[bug][build] Fix compile warning (strict-aliasing) #5872
[enhancement][bug][build][need-simple-ci] Upgrade gtest and fix some errors raised by clang #6079
[bug][ci][build] cmake: fix ninja build in CI #6072
[bug][build] fix files not actually removed when building for multiple python versions #6060
[bug][build][api] functional_api: fix build error in mac os #6010
[bug][build][need-simple-ci][need-single-client-tests] Fix recompile from scratch #6036
[bug][build] Turn on NVCC's warnings #6011
[bug][build][need-single-client-tests] fix bundle .so of other python version #6034
[bug][ci][build][need-single-client-tests] use copy_all_files_in_dir to replace copy_files #6033
[enhancement][build] check compiler version in cmake #6026
[enhancement][build] Add CUDA_NVCC_THREADS_NUMBER #6017
[enhancement][build][need-simple-ci] optimize of_include_copy #5978
[enhancement][ci][build][need-single-client-tests] CI: remove -DTREAT_WARNINGS_AS_ERRORS=OFF #6008
[enhancement][build][xla] xrt: fix all warnings #5915
[enhancement][build] Prevent opencv compile failure with std 17 #5997
[enhancement][build] Use bundled cub #5998
[enhancement][ci][build] update clang tidy diff warnings-as-errors option #5989
[enhancement][build] Update run_clang_tidy.py to set return code and add warning-as-errors #5977
[enhancement][build] check: fix clang-tidy-diff commands #5972
[bug][build] Suppress NVCC warning #177-D #6094

XLA enhancements:

[bug][xla] Make the blob header memory aligned. #5286

System:

[enhancement][system] Refactor Memory Zone #5072
[enhancement][system] Add interface InferContext::OutputTensorDesc #5219
[bug][system] Lazy construct functor to make sure that the operators have already been registered. #5225
[enhancement][system] Refactor infer ctx output isdynamic #5220
[enhancement][system] Refactor infer ctx input isdynamic #5211
[enhancement][system] Wake up the heartbeat thread immediately #5081
[enhancement][system] Fix xla test case fail #5203
[enhancement][system] Add interface InferContext::InputDType #5153
[purge][system] delete const_cast in Output #5196
[feature][system] Add hwloc for topology detection #5291
[enhancement][system] fix registry may segment #5336
[enhancement][system] Use functional api instead of op_expr_helper::XXXOp. #5364
[enhancement][system] move btob to op #5274
[documentation][system] Add Latest News section in README #5361
[enhancement][bug][system] fix dropout module: return directly if not training #5346
[bug][system] add missing JUST #5357
[documentation][system] Add more communication outlets on README #5359
[enhancement][feature][system] CommNet dynamic register memory #5281
[enhancement][system] Use symbol device #5341
[enhancement][system] fix multithread bug in env #5283
[bug][system][api] fix bug in cfg_replacement #5335
[bug][system] Fix create log directory thread-unsafe #5326
[bug][system] fix_bug_in_make_parallel #5328
[enhancement][system][cfg] replace train_conf, job_conf using cfg::xx #5263
[enhancement][system][quantization] support tensorrt in qat #5287
[enhancement][system][api] Export functional apis for oneflow.experimental. #5313
[enhancement][system] fix bug check between cfg enum and proto enum #5285
[enhancement][system] replace CHECK_EQ using CHECK_EQ_OR_RETURN #5279
[enhancement][system] Refactor SbpXXX to cfg::SbpXXX #5120
[enhancement][system][api] add detach for LazyMirroredtensorImpl #5270
[enhancement][system] shorten XXIsDynamic4ArgNameAndIndex to be xxIsDynamic #5265
[enhancement][system][cfg] job_config to cfg #5235
[feature][system] Multi-Client LogicalRun degenerate to PhysicalRun #5479
[enhancement][system] fix ConstructOp without JUST #5480
[enhancement][system] Output arg modifier return maybe part 1 #5451
[feature][system][interface] Fea/nn graph/graph build ctx #5420
[enhancement][system] Throw exception if check failed #5457
[feature][system] multi client launch #5372
[enhancement][system][api] Optimize reduce mean #5452
[enhancement][system] export Tensor only to python #5440
[enhancement][system] Output arg modifier return maybe part_0 #5447
[enhancement][system] ThreadMgr support AddPlan #5450
[enhancement][system] Refactor infer ctx input tensordesc #5226
[enhancement][system][api] instruction builder return maybe #5442
[feature][system][interface] MultiClientSessionContext #5421
[enhancement][feature][system] add launcher, update multi client launch and exit #5414
[purge][system][refactor] Remove IOConf #5419
[enhancement][system] Dev refine generator #5426
[enhancement][system] Support inplace operations #5204
[enhancement][system][refactor] Dev refactor generator #5397
[enhancement][system] Add new placement init func #5408
[enhancement][system] NNGraphIf #5387
[enhancement][system][refactor] Cast explicitly in unpack call to avoid conflict with Optional. #5380
[enhancement][system][interface] [Random Generator] Part2: Migrate functional dropout #5378
[enhancement][system] replace ForeignJobInstance with JobInstance #5374
[enhancement][system][refactor] Speedup reshape module by 5x. #5381
[feature][system][interface] [Random Generator] Part1: Dev random generator #5360
[enhancement][system] Add ONEFLOW_STREAM_CUDA_EVENT_FLAG_BLOCKING_SYNC #5612
[enhancement][system] [part2]Remove singleclient outdated api #5568
[feature][system][interface] nn.Graph call and launch impl #5580
[enhancement][system] remove outdated doctest api and "@experimental_api" #5564
[feature][system][interface] Register ForeignCallback and Watcher in Multi-Client #5591
[enhancement][system] [Part-1]remove outdated api and files of multi-client on master branch #5556
[feature][system][interface] LazyInterpret build LocalTensor if input is local #5582
[enhancement][system] add job_pass MultiClientAutoSourceAndSinkTick #5507
[feature][system] Fea/nn graph/optimizer #5533
[feature][system][interface] New/CloseRuntimeBuffers and RunLazyJob impl #5571
[feature][system][refactor][interface] NNGraph interface and implement for CompileAndRuntime #5558
[feature][system] Fea/nn graph/forward graph #5516
[enhancement][system] Lazy job stream type #5389
[enhancement][system] Refactor single client autotick #5506
[enhancement][system] replace underline using dot in single client #5547
[bug][system] fix return type #5548
[feature][system][interface] LazyInterpret for UserOpExpr #5544
[enhancement][system] Add ProfilerStart/ProfilerStop API #5542
[feature][system][interface] LazyInterpreter for FetchOutputOpExpr and set op parallel_distribution #5527
[enhancement][system] Multi client push pull #5492
[enhancement][system] registry_callback_fn return maybe #5456
[enhancement][system] bw_gen_fn return maybe #5455
[enhancement][system] gen_bw_fn return maybe #5454
[enhancement][system] Compatible single client #5417
[feature][system][interface] GlobalMultiClientEnv and refine EagerExecution #5523
[enhancement][system] Job pass maybe system #5503
[enhancement][system] Remove Plan::net_topo #5502
[feature][system][interface] LazyInterpret for FeedVariableOpExpr #5490
[enhancement][system] Input arg modifier return maybe #5453
[feature][system][interface] Fea/nn graph/block scope #5498
[feature][system] jit_fuse_cast_scale #5332
[enhancement][system] Remove obsolete Profiler #5747
[enhancement][system][api] Dev fix batch norm not stats #5733
[enhancement][system] rename rpc_token to TransportToken #5735
[enhancement][system][api] Refactor maximum minimum py2cpp #5724
[enhancement][system] Replace piece_id with comm_net_sequence_number #5731
[enhancement][system] beautify stack frame #5686
[enhancement][system] Add env ONEFLOW_KERNEL_DISABLE_BLOB_ACCESS_CHECKER #5728
[enhancement][system] Add env ONEFLOW_THREAD_ENABLE_LOCAL_MESSAGE_QUEUE #5720
[enhancement][system][api][refactor] Refactor functional sub, mul and div apis #5713
[feature][system] ddp #5008
[enhancement][system][api][refactor] Refactor functional matmul and add apis. #5697
[bug][system] Fix ClearKV("plan") #5710
[enhancement][system] Rename cpu to async cpu #5712
[enhancement][system] Support tensor.to()/to_local() #5271
[feature][system][refactor][interface] Multi-Runtime for multi nn.Graph #5683
[bug][system][refactor] Add tag for Optional inplace constructor #5619
[enhancement][system] Move Global to env scope #5670
[enhancement][system] add JUST wrapper #5681
[enhancement][system] New sync consistent meta info #5634
[enhancement][system][refactor][interface] Refactor RuntimeCtx for multi-runtime #5664
[feature][system][interface] Feat: memory shared between EagerTensor with VariableRegst #5649
[enhancement][system] Use functional call directly instead of construct a module and then call-Add #5613
[enhancement][system] disable eager_op consistent mode #5647
[enhancement][system] add msg_penddin_list in ibverbs_qp to optimize qp_init_attr.cap.max_send_wr #5485
[enhancement][system] IBVerbsCommNet add knobs #5626
[enhancement][system] Prune python tensor #5596
[feature][system][interface] Feat: LazyInterpret infer op / tensor ParallelDescScope #5625
[enhancement][system] Replace src tick with wait and send ids #5603
[enhancement][system] Support symbol placement type in functional. #5627
[enhancement][system][api][refactor][interface] Dev advanced indexing #5559
[enhancement][system] Optimize maybe. #5839
[enhancement][system] Decorator 4 disable recursive boxing call #5796
[enhancement][system] add_eager_boxing_and_op_interpreter_dispatch_error_info #5819
[enhancement][system] Kernel CUDA Graphs Support #5725
[bug][system] Fix placement print bug #5853
[bug][system] when error msg formatting fails, return error->DebugString #5844
[enhancement][system][refactor] Rename variables named *parallel_distribution* to *nd_sbp* (1) #5815
[feature][system][interface] Support Free EagerTensor caught in nn.Graph build #5777
[enhancement][system] Reuse CUDA event / Refine BnInOp2Blob / Refine channel #5837
[enhancement][system][serving] fix bug in AddInputOutputOpsPass: check existence of key in HashMap(inferface_lbi2scope_sym_id) #5653
[enhancement][system][api] unpack_call: impl new unpack_call_dispatcher for better performance #5820
[feature][system] Feat consistent tensor python constructor #5812
[feature][system] Support 0shape tensor #5620
[documentation][system] fix launcher description #5770
[feature][system][interface] Multi-nn.Graph memory reuse by Chunk manager #5658
[bug][system] Fix naive b2p error #5806
[enhancement][system] set created generator with default rng seed #5801
[enhancement][system] enhance_local_to_consistent #5761
[feature][system] add flow.randn #5736
[enhancement][system] Refactor hierarchical parallel cast autograd #5764
[enhancement][system] Collective boxing executor add_plan delete_plan #5495
[enhancement][system] Fix throw abort #5795
[enhancement][system] DECORATE #5794
[enhancement][system] Interface eager boxing #5682
[enhancement][system] extract_consistent_to_consistent_op_expr #5870
[enhancement][system] disable backward pass consistent tensor meta check. #5871
[enhancement][system] Add CudaStreamIndexGenerator::GenerateNamedStreamIndex #5940
[bug][system] Only query PCI bus id when CUDA version >= 11 #5937
[enhancement][system] maybe: add JUST_MSG and CHECK_JUST_MSG #5904
[bug][system] Fix bug scalar #5950
[enhancement][system] framework: fix rvalue reference warnings #5948
[purge][system] Remove CudaWorkType #5942
[enhancement][system] refactor_symbol #5941
[bug][system] consistent_tensor_infer_cache: fix memory leak #5938
[feature][system] support to print gpu #5936
[enhancement][system] Bugfix static check #5935
[bug][system] fix nccl_version log #5934
[bug][system] Fix bug of multi-GPU train nn.Graph extra mem cost in rank 0 #5930
[enhancement][system] Only gradient acc be scheduled in parallel. #5926
[enhancement][bug][system] fix_ddp_bug_on_8_process #5929
[enhancement][system] Fix bug error msg format #5866
[feature][system] print consistent tensor data #5902
[bug][system] Move parse env to the constructor #5922
[enhancement][system] Remove GlobalWorkStreamId/GlobalThrdId #5917
[bug][system] shared_or_scalar: fix alias warnings #5916
[purge][system] Remove CompActor #5919
[enhancement][system] Use symbol dtype #5641
[enhancement][feature][system] Control Graph / Session / Env's python c++ object destruction #5845
[enhancement][bug][system] Sync access and assign indexing tensor. #5907
[enhancement][system][api][refactor] Dev consistent arange #5883
[enhancement][system] Lazy interpreter for new ConsistentToConsistentOpExpr #5903
[bug][system] Fix BUG of LazyInterpret FreeEagerTensor memory shared with regst #5891
[bug][system] fix typo in raise RuntimeError #5890
[enhancement][system][refactor] Rename the ParallelDistribution class to NdSbp #5814
[feature][system] add flow.rand #5722
[feature][system] Lazy Interpret support infer default device cpu #5880
[enhancement][system] Tensor str #5783
[feature][system][interface] Lazy to_consistent #5774
[enhancement][system] wait vm empty before exiting #5860
[enhancement][system] Eager boxing n to 1 #5949
[enhancement][system] add kernel observer #6052
[enhancement][ci][system] Optimize ddp broadcast and add speed/memory test in ci #6044
[enhancement][system] add var to control only print warning once when blocked #6045
[enhancement][system][refactor] Rewrite pow and logical functional apis #6032
[enhancement][system] Token seq id #5964
[enhancement][documentation][system] Remove python function wrapper. #6012
[feature][system] Add timeout and loc for blocking calls #6007
[enhancement][system] Eager boxing 1 to n #5943
[enhancement][system] Boxing expr #6015
[enhancement][system] new_X_to_B #5987
[enhancement][system] Add unimplemented return information #5952
[enhancement][system] Revert "Faster decorator" #6006
[enhancement][system] Throw exception if using advanced indexing for tensor setitem #6001
[enhancement][system] Support eager boxing sm 2 sn #5869
[enhancement][system] Move framework/local_dep_object.* to the eager directory #5988
[enhancement][system] Fix builtin op arg tuple. #5464
[feature][system][refactor] Dev functional multiple signatures #5982
[enhancement][system] Faster decorator #5996
[enhancement][system] Placed nd sbp #5995
[feature][system] Support asymmetric input/output/variable tensors in nn.Graph #5983
[enhancement][system] LightActor #5868
[bug][system] Prevent running oneflow in forked subprocess #5976
[bug][system] common/error: fix build error in mac os #5971
[bug][system] fix_bug_test_tensor_str #5958
[enhancement][system] Refine StreamContext #6191
[enhancement][system] container_util: fix VectorAt, remove useless MutMapAt #6172
[enhancement][system] Typesafe KernelState #6198
[enhancement][system] Primitive based copy task node #6195
[feature][system][interface] Lazy support Scalar #6181
[enhancement][system] Disable implicit boxing when parallel num eq one #6188
[enhancement][system] Primitive #6183
[enhancement][system] Remove IDMgr::GetGpuPhyIdFromThrdId/IDMgr::GetDeviceTypeFromThrdId #6169
[enhancement][system] remove op_expr_helper inside gradient_funcs #6057
[feature][system][api] Add tensor yaml, support export tensor functional api. #6099
[feature][system] Plan memory log #6151
[feature][system] Add dtype bfloat16 #5304
[enhancement][system] StreamContext #6129
[bug][system] Fix wrong inplace acc grad #6146
[enhancement][system] UserKernel remove job_desc #6144
[enhancement][system][api] Fea/graph/add outputs buffer to enable pipeline #6126
[enhancement][system] not fuse request for nccl 2.10.3 #6136
[bug][system] NewUniqueId thread safe #6141
[enhancement][system] XRT remove job_desc #6139
[enhancement][system] SystemOpFillJobNamePass #6138
[enhancement][system] mv_boxing_folder_to_core #6140
[enhancement][system] Refactor boxing interpreter to boxing expr #6134
[enhancement][system] Eager boxing one to one #6048
[enhancement][system] Vm cpu efficiency #6110
[enhancement][system] Naive generic boxing #6116
[feature][system] send/recv #5992
[enhancement][system] disable_print_stack_in_tensor_numpy #6123
[feature][system] add all_reduce by to_consistent #5963
[enhancement][system] KernelContext #6084
[enhancement][bug][system] Fix sync nccl and async nccl deadlock #6071
[bug][system][refactor] Refactor to local #6098
[enhancement][system] Replace xor with hash combine (part 1) #6078
[enhancement][system] Optimize error message #6073
[enhancement][system] Rename Error::xx to Error::xxError #6049
[enhancement][system] send formatted msg to glog #5999
[feature][bottleneck][bug][system][interface] [Feat.] NNGraph new eager tensor for new variable created in JobPass #6091
[bug][system] Fix bug of multi-GPU eager copy D2H extra mem cost in rank 0 #6092
[enhancement][system][api] Rename module flow.F to flow._C #6053
[feature][system][interface] [Feat.] Eager consistent OFRecordReader #6089
[enhancement][system][api] Dev fix and align interface #6075
[feature][bottleneck][bug][system][interface] NNGraph input/output valid by register tensors #6240
[bug][system][interface] Fix bug of Multi-Client src tick output order #6221
[enhancement][bug][system] Add cast primitive #6234
[feature][bottleneck][system][interface] Auto FixPipelineStageIdPass #6204
[enhancement][system] move scalar to oneflow namespace. #6235
[enhancement][system] UserKernel init CUDA Graphs with state #6230
[feature][system] Comm broadcast #6213
[enhancement][system][refactor] Rename opname to optype_name in AutogradEngine #6154
[enhancement][system] Add memset primitive #6218
[enhancement][system] Add StreamContext::device_type()/DeviceCtx::device_type() #6217
[feature][system] add all_gather and fix bug of multi rank doctest #6189
[feature][system][interface] [Feat.] Lazy interpreter skip hierarchical_parallel_cast #6208
[purge][system] Cleanup KernelUtil #6212
[enhancement][system] StreamContextAdapter #6205
[enhancement][system] Dev eliminate gcc warnings #6199
[feature][bottleneck][system][interface] [Feat.] nn.Graph support grad acc with input/output tensor #6155
[enhancement][system] Cpu symmetric s to s #6153
[enhancement][system][upload-core] Op expr infer tensor meta #5064
[enhancement][system] Infer consistent tensor meta #5362

CI enhancements:

[bug][ci][api][interface] Refine module test #5232
[enhancement][ci] Add Simple CI, runs CPU-only on GitHub hosted servers #5207
[enhancement][ci] Run exe test in CPU-only #5202
[enhancement][ci] Cancel all workflow runs but the latest #5206
[enhancement][ci] Fix master not running Simple CI #5368
[enhancement][ci] Refine Simple CI and Clang analysis #5367
[enhancement][feature][bug][ci][documentation][interface] Fix upsample bilinear bug #5363
[enhancement][ci] Build nightly for py39 #5318
[enhancement][ci] Try distributed run for 3 times to prevent failure #5305
[enhancement][ci] Upload Simple CI logs to cloud #5268
[enhancement][ci] Remove cpu_op_eager and cuda_op_eager #5470
[bug][ci] fix segfault in clang plugin #5437
[enhancement][ci] Refine Simple CI error output #5435
[enhancement][ci] Add conda env to Simple CI #5385
[enhancement][ci] Fix clang plugin core file not found #5390
[bug][ci] upload core when build with clang plugin #5384
[bug][ci] clang plugin skip more files #5373
[enhancement][ci] Use gh-action-scheduler-v2 #5370
[enhancement][ci] relax speed threshold #5569
[bug][ci] Fix wrong test path under compatible #5567
[enhancement][ci][need-simple-ci] Prevent upload logs automatically #5560
[enhancement][ci][interface] Add nn.AdaptiveAvgPool1d and nn.AdaptiveAvgPool3d #5445
[feature][ci] add speed test in ci #5496
[enhancement][ci] Reduce usage of Simple CI #5546
[feature][bug][ci][api] Restructure upsample module #5524
[feature][ci] multi client launcher test #5488
[enhancement][ci] Remove automerge if cuda_new_interface failed #5519
[enhancement][ci] Prevent adding subdir in python/test #5514
[enhancement][ci] piprepo->pipindex #5517
[enhancement][ci] add dynamic_loss_scale in ci tests #5337
[enhancement][ci] Add timeout for wait_gpu_slot #5497
[enhancement][feature][ci] new static check based on clang-tidy #5476
[enhancement][ci] Fix url not downloadable in some browsers #5701
[feature][ci] multi client multi machine test #5685
[enhancement][ci] Add cpu new interface CI #5639
[enhancement][ci][need-simple-ci] Mv clangtidy to simple ci #5667
[enhancement][ci][need-simple-ci] use clang tidy appimage in ci #5841
[enhancement][ci] Use gcc 7 in release to prevent error #5840
[enhancement][ci] bn tol 1e-4 => 1e-3 #5811
[enhancement][ci] fix distributed run on built dir #5810
[enhancement][ci] fix third party mirror check_sum #5802
[ci][documentation] find more accurately which files need to be doctested #5782
[enhancement][ci] Print stack unconditionally #5779
[enhancement][ci][need-simple-ci] Enable more checkers for clang-tidy in CI #5738
[enhancement][ci] CI: add clang-tidy check to test.yaml #5920
[ci][documentation] fix docstring in oneflow.nn.functional namespace #5807
[enhancement][ci] disable TREAT_WARNINGS_AS_ERRORS in Release CI #5886
[enhancement][ci] Skip ci jobs by git diff #5863
[bug][ci] quick fix #5978 #6030
[enhancement][bug][ci] fix clang tidy diff options and file format #5990
[enhancement][ci] add flow.relu #5847
[enhancement][ci] equal => allclose #6164
[bug][ci][need-simple-ci] CI: fix clang tidy checks in simple ci #6161
[enhancement][bug][ci][documentation][api] add interpolate and layer_norm docs #6157
[bug][ci] update speed test #6113
[enhancement][bug][ci][documentation][api] speed import oneflow #6107
[bug][ci] Also try install dev deps for CODEGEN_PYTHON_EXECUTABLE #6115
[bug][ci][need-simple-ci] set gtest_CMAKE_DEBUG_POSTFIX "d" #6085
[enhancement][ci] add cache init file for clang and CI build with clang #6062
[enhancement][ci] add emoji in speed test output, make it continue-on-error #6214

Test enhancements:

[bug][test][interface] Fix acos ci bug #5217
[feature][test] implement automated test #5321
[enhancement][test] move generator test into ops folder to accelerate tests #5472
[feature][test][api] Add autotest part2 #5467
[enhancement][test][api][interface] Add some tests with the new framework for auto testing #5561
[bug][test] fix test error when running multi-case tests on graph #5590
[enhancement][test] Refine module test using auto test by yaochi #5484
[enhancement][test] Add autotest for BatchNorm2d #5734
[enhancement][test] RTH_update_op_test #5823
[enhancement][test] dev adamw graph config #5745
[feature][test][api][interface] Add new autotest #5562
[bug][test] restore test of alexnet graph #5798
[enhancement][test][interface] add zhangshen op-test #5600
[feature][bug][tooling][test][interface] Record autotest wrong code #5923
[enhancement][feature][test][api] add randint #5718
[bug][test] fix multi machine test #5984
[enhancement][test][interface] some op test #6095

Tooling enhancements:

[bug][tooling] user/summary: fix memory leak in FillImageInSummary #5742
[enhancement][tooling][cfg] cfg: add move assignment operator for performance #5962
[enhancement][tooling][api][refactor] refactor_all_device_placement_api #6080