---
template: hub1
title: BERT for Finetune
summary:
    en_US: Bidirectional Encoder Representation from Transformers (BERT)
    zh_CN: BERT
author: MegEngine Team
tags: [nlp]
github-link: https://github.com/megengine/models
---

```python
import megengine.hub as hub

model = hub.load("megengine/models", "wwm_cased_L-24_H-1024_A-16", pretrained=True)
# or any of these variants
# model = hub.load("megengine/models", "wwm_uncased_L-24_H-1024_A-16", pretrained=True)
# model = hub.load("megengine/models", "cased_L-12_H-768_A-12", pretrained=True)
# model = hub.load("megengine/models", "cased_L-24_H-1024_A-16", pretrained=True)
# model = hub.load("megengine/models", "uncased_L-12_H-768_A-12", pretrained=True)
# model = hub.load("megengine/models", "uncased_L-24_H-1024_A-16", pretrained=True)
# model = hub.load("megengine/models", "chinese_L-12_H-768_A-12", pretrained=True)
# model = hub.load("megengine/models", "multi_cased_L-12_H-768_A-12", pretrained=True)
```

This repository reimplements Google's open-source BERT with MegEngine. We provide the following pre-trained models for users to finetune on different downstream tasks:

* `wwm_cased_L-24_H-1024_A-16`
* `wwm_uncased_L-24_H-1024_A-16`
* `cased_L-12_H-768_A-12`
* `cased_L-24_H-1024_A-16`
* `uncased_L-12_H-768_A-12`
* `uncased_L-24_H-1024_A-16`
* `chinese_L-12_H-768_A-12`
* `multi_cased_L-12_H-768_A-12`

The weights come from Google's pre-trained models and keep the same meaning. Users can load a pre-trained bert model directly through `megengine.hub` and download the corresponding `vocab.txt` and `bert_config.json`. We also provide a more convenient script in [models](https://github.com/megengine/models/official/nlp/bert), which fetches the matching vocabulary, configuration, and pre-trained model by name:

```python
import os
import urllib.request

import megengine.hub as hub

DATA_URL = 'https://data.megengine.org.cn/models/weights/bert'
CONFIG_NAME = 'bert_config.json'
VOCAB_NAME = 'vocab.txt'

MODEL_NAME = {
    'wwm_cased_L-24_H-1024_A-16': 'wwm_cased_L_24_H_1024_A_16',
    'wwm_uncased_L-24_H-1024_A-16': 'wwm_uncased_L_24_H_1024_A_16',
    'cased_L-12_H-768_A-12': 'cased_L_12_H_768_A_12',
    'cased_L-24_H-1024_A-16': 'cased_L_24_H_1024_A_16',
    'uncased_L-12_H-768_A-12': 'uncased_L_12_H_768_A_12',
    'uncased_L-24_H-1024_A-16': 'uncased_L_24_H_1024_A_16',
    'chinese_L-12_H-768_A-12': 'chinese_L_12_H_768_A_12',
    'multi_cased_L-12_H-768_A-12': 'multi_cased_L_12_H_768_A_12',
}


def download_file(url, filename):
    # urllib.URLopener was removed in Python 3; urlretrieve handles the download directly.
    urllib.request.urlretrieve(url, filename)


def create_hub_bert(model_name, pretrained):
    assert model_name in MODEL_NAME, '{} not in the valid models {}'.format(
        model_name, MODEL_NAME)

    data_dir = './{}'.format(model_name)
    if not os.path.exists(data_dir):
        os.makedirs(data_dir)

    vocab_url = '{}/{}/{}'.format(DATA_URL, model_name, VOCAB_NAME)
    config_url = '{}/{}/{}'.format(DATA_URL, model_name, CONFIG_NAME)

    vocab_file = os.path.join(data_dir, VOCAB_NAME)
    config_file = os.path.join(data_dir, CONFIG_NAME)

    download_file(vocab_url, vocab_file)
    download_file(config_url, config_file)

    # BertConfig is provided by the BERT example under official/nlp/bert in the models repository.
    config = BertConfig(config_file)

    model = hub.load(
        "megengine/models",
        MODEL_NAME[model_name],
        pretrained=pretrained,
    )

    return model, config, vocab_file
```
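The loaded model behaves like any other MegEngine module. The snippet below is a minimal sanity-check sketch, not part of the repository: the random token ids, the batch and sequence sizes, and the `mge.tensor` construction are illustrative assumptions (adjust them to your MegEngine version), while the argument names and the two return values mirror the fine-tuning example further down this page.

```python
import numpy as np
import megengine as mge

# Bare BertModel plus its config and vocabulary path (see create_hub_bert above).
bert, config, vocab_file = create_hub_bert('uncased_L-12_H-768_A-12', pretrained=True)

batch_size, seq_len = 2, 16  # illustrative sizes only

# Random token ids standing in for properly tokenized text (see the
# pre-processing notes below), a single segment, and an all-ones mask.
input_ids = mge.tensor(np.random.randint(0, 1000, (batch_size, seq_len)).astype(np.int32))
token_type_ids = mge.tensor(np.zeros((batch_size, seq_len), dtype=np.int32))
attention_mask = mge.tensor(np.ones((batch_size, seq_len), dtype=np.int32))

sequence_output, pooled_output = bert(
    input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)

print(sequence_output.shape)  # expected: (batch_size, seq_len, config.hidden_size)
print(pooled_output.shape)    # expected: (batch_size, config.hidden_size)
```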
To make the pre-trained models easier to reuse, we keep only the `BertModel` part of the original BERT. In practice, the `bert` model with pre-trained weights can be used as a component of another model by passing it in through the constructor:

```python
from megengine.functional import cross_entropy_with_softmax
from megengine.module import Dropout, Linear, Module


class BertForSequenceClassification(Module):
    def __init__(self, config, num_labels, bert):
        super().__init__()
        self.bert = bert
        self.num_labels = num_labels
        self.dropout = Dropout(config.hidden_dropout_prob)
        self.classifier = Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None):
        # pooled_output is BertModel's representation of the [CLS] token.
        _, pooled_output = self.bert(
            input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        if labels is not None:
            loss = cross_entropy_with_softmax(
                logits.reshape(-1, self.num_labels), labels.reshape(-1))
            return logits, loss
        else:
            return logits, None


bert, config, vocab_file = create_hub_bert('uncased_L-12_H-768_A-12', pretrained=True)
model = BertForSequenceClassification(config, num_labels=2, bert=bert)
```

All pre-trained models expect correctly pre-processed input. The requirements are the same as for Google's open-source BERT; for details, refer to the original [bert](https://github.com/google-research/bert) repository or to the examples provided in [models](https://github.com/megengine/models/official/nlp/bert). The sketch after this paragraph illustrates the expected input layout.
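As a rough illustration of that layout, the following sketch packs a tokenized sentence pair into `input_ids`, `token_type_ids`, and `attention_mask` using the downloaded `vocab.txt`. It is a minimal sketch under stated assumptions: the text is assumed to be already split into WordPiece tokens, and the helpers `load_vocab` and `convert_pair` as well as the `max_length` of 32 are hypothetical; the example scripts in the models repository use the full tokenizer from Google's BERT release instead.

```python
import numpy as np


def load_vocab(vocab_file):
    # vocab.txt stores one WordPiece token per line; the line index is the token id.
    with open(vocab_file, encoding='utf-8') as f:
        return {token.rstrip('\n'): idx for idx, token in enumerate(f)}


def convert_pair(tokens_a, tokens_b, vocab, max_length=32):
    # BERT sentence-pair layout: [CLS] A [SEP] B [SEP], with segment ids 0 for A and 1 for B.
    tokens = ['[CLS]'] + tokens_a + ['[SEP]'] + tokens_b + ['[SEP]']
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    assert len(tokens) <= max_length

    input_ids = [vocab.get(t, vocab['[UNK]']) for t in tokens]
    attention_mask = [1] * len(input_ids)

    # Pad to a fixed length; padded positions are masked out by attention_mask.
    pad = max_length - len(input_ids)
    input_ids += [0] * pad
    segment_ids += [0] * pad
    attention_mask += [0] * pad

    return (np.array(input_ids, dtype=np.int32),
            np.array(segment_ids, dtype=np.int32),
            np.array(attention_mask, dtype=np.int32))


vocab = load_vocab(vocab_file)
input_ids, token_type_ids, attention_mask = convert_pair(
    ['the', 'cat', 'sat'], ['on', 'the', 'mat'], vocab)
```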
### Model Description

We provide simple example code in [models](https://github.com/megengine/models/official/nlp/bert). It fine-tunes the pre-trained `uncased_L-12_H-768_A-12` model on the Microsoft Research Paraphrase Corpus (MRPC) dataset. Fine-tuning with the original hyper-parameters gives an evaluation accuracy between 84% and 88%.

### References

- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805), Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova