Created by: wangxicoding
Compress fp32 gradients to fp16 for communication, reducing communication volume and bandwidth usage. Suitable for cards that do not support Tensor Cores, such as the P4.
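The idea can be illustrated with a minimal NumPy sketch. This is not the code in this PR: the function names (`compress`, `decompress`, `simulated_allreduce_mean`) are hypothetical, and the allreduce is simulated in-process over a list of per-worker gradients rather than performed over a real communication backend.

```python
import numpy as np


def compress(grad_fp32):
    # Cast the fp32 gradient to fp16 so the payload to send is half the size.
    return grad_fp32.astype(np.float16)


def decompress(grad_fp16):
    # Cast the received fp16 buffer back to fp32 for the optimizer update.
    return grad_fp16.astype(np.float32)


def simulated_allreduce_mean(worker_grads):
    # Hypothetical stand-in for a real allreduce: each worker contributes
    # its fp16-compressed gradient and the sum is accumulated in fp16.
    compressed = [compress(g) for g in worker_grads]
    total = compressed[0].copy()
    for g in compressed[1:]:
        total = total + g  # fp16 + fp16 stays fp16
    # Convert back to fp32 before averaging over the number of workers.
    return decompress(total) / len(worker_grads)


if __name__ == "__main__":
    grads = [np.full(8, 0.5, dtype=np.float32),
             np.full(8, 1.5, dtype=np.float32)]
    print("fp32 bytes:", grads[0].nbytes)
    print("fp16 bytes:", compress(grads[0]).nbytes)
    print("averaged gradient dtype:", simulated_allreduce_mean(grads).dtype)
```

Because fp16 has a much smaller range and precision than fp32 (its largest finite value is 65504), very large or very small gradient values can overflow or lose precision when compressed this way.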