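{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [
"# E2. SST2 classification with BERT + prompt\n",
"\n",
"1. Background: an introduction to `prompt-based model`s and how they combine with `fastNLP`\n",
"\n",
"2. Preparation: an overview of `P-Tuning v2` and building the `P-Tuning v2` model\n",
"\n",
"3. Training: loading the `tokenizer`, preprocessing the `dataset`, training and analysis"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"### 1. Background: an introduction to prompt-based models and how they combine with fastNLP\n",
"\n",
"This example uses the `SST2` dataset from the `GLUE` benchmark and applies `prompt-based tuning`\n",
"to a BERT-family model (the code below loads `distilbert-base-uncased`, with `bert-base-uncased`\n",
"noted as an alternative) for binary sentiment classification. Before that, it briefly reviews\n",
"research on prompt-based learning and the advantages of combining it with `fastNLP v0.8`.\n",
"\n",
"**`prompt`** is usually traced back to the **`PET` model** in the paper [Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference](https://arxiv.org/pdf/2001.07676.pdf).\n",
"`PET` stands for **`Pattern-Exploiting Training`**; although the paper never uses the word `prompt`, it is still regarded as the founding work.\n",
"\n",
"Its rough idea, for text classification: given an input text, a pattern rewrites it into a cloze-style sentence\n",
"with a masked position (what was later called a `prompt`), and a mapping from each label to a word that can fill\n",
"that position (what was later called a `verbalizer`) turns the masked-word prediction into a class prediction.\n",
"\n",
"Its main contribution:\n",
"\n",
"**`prompt-based tuning`**: xxxx; for more, see the [prompt survey](https://arxiv.org/pdf/2107.13586.pdf).\n",
"\n",
"A few classic cases are listed below to sketch how `prompt-based tuning` developed.\n",
"\n",
"Case 1: **`P-Tuning v1`**; for details see the [P-Tuning v1 paper](https://arxiv.org/pdf/2103.10385.pdf).\n",
"\n",
"Its main contribution:\n",
"\n",
"Its method, roughly:\n",
"\n",
"Case 2: **`PromptTuning`**; for details see the [PromptTuning paper](https://arxiv.org/pdf/2104.08691.pdf).\n",
"\n",
"Its main contribution:\n",
"\n",
"Its method, roughly:\n",
"\n",
"Case 3: **`PrefixTuning`**; for details see the [PrefixTuning paper](https://arxiv.org/pdf/2101.00190.pdf).\n",
"\n",
"Its main contribution:\n",
"\n",
"Its method, roughly:\n",
"\n",
"As the cases above suggest, `prompt-based tuning` is only a way of fine-tuning a model and is independent of the pretrained `backbone`.\n",
"\n",
"Today the mainstream way to load a pretrained model is the `transformers` package, while the fine-tuning framework\n",
"may be `pytorch`, `paddle`, `jittor` and so on, and these frameworks are not compatible with one another.\n",
"\n",
"Therefore, **implementing `prompt-based tuning` with `fastNLP v0.8`** can **bridge frameworks such as `paddle`**\n",
"**and the `transformers` package** (the `transformers` models used here are implemented in `pytorch`).\n",
"\n",
"This example reuses the `SST2` dataset and the `distilbert-base-uncased` model from `tutorial-E1` (to make comparison easy).\n",
"It uses the `pytorch` framework and solves the `SST2` binary classification task by concatenating a continuous `prompt` with the input of the `model`."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "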
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "4.18.0\n" ] } ], "source": [
"import torch\n",
"import torch.nn as nn\n",
"from torch.optim import AdamW\n",
"from torch.utils.data import DataLoader, Dataset\n",
"\n",
"import transformers\n",
"from transformers import AutoTokenizer\n",
"from transformers import AutoModelForSequenceClassification\n",
"\n",
"import sys\n",
"sys.path.append('..')\n",
"\n",
"import fastNLP\n",
"from fastNLP import Trainer\n",
"from fastNLP.core.metrics import Accuracy\n",
"\n",
"print(transformers.__version__)\n",
"\n",
"task = 'sst2'\n",
"model_checkpoint = 'distilbert-base-uncased' # 'bert-base-uncased'"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"### 2. Preparation: an overview of P-Tuning v2 and building the P-Tuning v2 model\n",
"\n",
"This example uses `P-Tuning v2` as the case for combining `prompt-based tuning` with `fastNLP v0.8`.\n",
"The following first summarizes the idea of the `P-Tuning v2` paper and then moves on to the code with `fastNLP v0.8`.\n",
"\n",
"**`P-Tuning v2`** comes from the paper [Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/pdf/2110.07602.pdf).\n",
"\n",
"Its main contribution is that, **building on deep prompt learning such as `PrefixTuning`**, it **improves performance on `NLU` tasks such as classification and tagging**\n",
"and makes prompt tuning on medium-sized models, mainly **models with `100M-1B` parameters**, **comparable to full-parameter fine-tuning**.\n",
"\n",
"As the figure shows, a **prefix sequence is added before the `[CLS]` token of the input sequence** (**the embeddings of the prefix indices are continuous vectors to be trained**),\n",
"which **prompts the model, under the new task**, to **produce output suited to the fine-tuning task** at the `[CLS]` position, so that the model adapts to that task.\n",
"\n",
"\n",
"\n",
"This example uses `distilbert-base-uncased` (or `bert-base-uncased`, the alternative noted in the code) as the `backbone` of `P-Tuning v2`\n",
"and sets `requires_grad=False` to freeze its parameters. **`prefix_tokens` of length `pre_seq_len` serve as the prompt prefix sequence of the input.**\n",
"\n",
"**A `prefix_encoder` based on `nn.Embedding` embeds the prompt prefix**; `get_prompt` returns these embeddings, which are\n",
"concatenated in front of every sample in the batch to give `inputs_embeds`, and the self-attention mask `attention_mask` is extended to match.\n",
"\n",
"`inputs_embeds`, `attention_mask` and `labels` are then passed to the `backbone`, **whose output includes `loss` and `logits`**."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [
"class SeqClsModel(nn.Module):\n",
"    def __init__(self, model_checkpoint, num_labels, pre_seq_len):\n",
"        nn.Module.__init__(self)\n",
"        self.num_labels = num_labels\n",
"        self.back_bone = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, \n",
"                                                                            num_labels=num_labels)\n",
"        self.embeddings = self.back_bone.get_input_embeddings()\n",
"\n",
"        # freeze every backbone parameter; only the prefix embeddings are trained\n",
"        for param in self.back_bone.parameters():\n",
"            param.requires_grad = False\n",
"        \n",
"        self.pre_seq_len = pre_seq_len\n",
"        self.prefix_tokens = torch.arange(self.pre_seq_len).long()\n",
"        self.prefix_encoder = nn.Embedding(self.pre_seq_len, self.embeddings.embedding_dim)\n",
"    \n",
"    def get_prompt(self, batch_size):\n",
"        # (pre_seq_len,) -> (batch_size, pre_seq_len) -> (batch_size, pre_seq_len, embedding_dim)\n",
"        prefix_tokens = self.prefix_tokens.unsqueeze(0).expand(batch_size, -1).to(self.back_bone.device)\n",
"        prompts = self.prefix_encoder(prefix_tokens)\n",
"        return prompts\n",
"\n",
"    def forward(self, input_ids, attention_mask, labels=None):\n",
"        \n",
"        batch_size = input_ids.shape[0]\n",
"        raw_embedding = self.embeddings(input_ids)\n",
"        \n",
"        # prepend the prompt embeddings and extend the attention mask to cover them\n",
"        prompts = self.get_prompt(batch_size=batch_size)\n",
"        inputs_embeds = torch.cat((prompts, raw_embedding), dim=1)\n",
"        prefix_attention_mask = torch.ones(batch_size, self.pre_seq_len, \n",
"                                           dtype=attention_mask.dtype, device=attention_mask.device)\n",
"        attention_mask = torch.cat((prefix_attention_mask, attention_mask), dim=1)\n",
"\n",
"        outputs = self.back_bone(inputs_embeds=inputs_embeds, \n",
"                                 attention_mask=attention_mask, labels=labels)\n",
"        return outputs\n",
"\n",
"    def train_step(self, input_ids, attention_mask, labels):\n",
"        loss = self(input_ids, attention_mask, labels).loss\n",
"        return {'loss': loss}\n",
"\n",
"    def evaluate_step(self, input_ids, attention_mask, labels):\n",
"        pred = self(input_ids, attention_mask, labels).logits\n",
"        pred = torch.max(pred, dim=-1)[1]\n",
"        return {'pred': pred, 'target': labels}"
] }, { "cell_type": "markdown", "metadata": {}, "source": [
"Next, the model instance is created with the number of classes, and `torch.optim.AdamW` is used to create the optimizer.\n",
"\n",
"According to the `P-Tuning v2` paper: *`Generally, simple classification tasks prefer 
shorter prompts (less than 20)`*\n", "\n", "Here `pre_seq_len` is set to `20` and the learning rate is adjusted accordingly; everything else matches `tutorial-E1`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_layer_norm.weight', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_projector.bias']\n", "- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", "- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", "Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'pre_classifier.bias', 'classifier.bias']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] } ], "source": [ "model = SeqClsModel(model_checkpoint=model_checkpoint, num_labels=2, pre_seq_len=20)\n", "\n", "optimizers = AdamW(params=model.parameters(), lr=1e-2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. 
Training: loading the tokenizer, preprocessing the dataset, training and analysis\n", "\n", "This example keeps the dataset of `tutorial-E1`, that is, the `SST2` dataset from the `GLUE` benchmark,\n", "and tunes a BERT-family backbone (`distilbert-base-uncased` in the code above, `bert-base-uncased` as the alternative) with `P-Tuning v2`.\n", "The data loading code follows; it is essentially the same as in `tutorial-E1`.\n", "\n", "First, `datasets.load_dataset` loads the dataset and `transformers.AutoTokenizer` builds the `tokenizer` instance;\n", "`dataset.map` then applies the `tokenizer` to replace each text with its sequence of token ids." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Reusing dataset glue (/remote-home/xrliu/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "21cbd92c3397497d84dc10f017ec96f4", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/3 [00:00[22:53:00] INFO Running evaluator sanity check for 2 batches. trainer.py:592\n", "\n" ], "text/plain": [ "\u001b[2;36m[22:53:00]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Running evaluator sanity check for \u001b[1;36m2\u001b[0m batches. \u001b]8;id=406635;file://../fastNLP/core/controllers/trainer.py\u001b\\\u001b[2mtrainer.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=951504;file://../fastNLP/core/controllers/trainer.py#592\u001b\\\u001b[2m592\u001b[0m\u001b]8;;\u001b\\\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:1, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m1\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.540625,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 173.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.540625\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m173.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:2, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m2\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.5,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 160.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.5\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m160.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:3, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m3\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.509375,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 163.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.509375\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m163.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:4, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m4\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.634375,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 203.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.634375\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m203.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:5, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m5\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.6125,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 196.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.6125\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m196.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:6, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m6\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.675,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 216.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.675\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m216.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:7, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m7\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.64375,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 206.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.64375\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m206.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:8, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m8\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.665625,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 213.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.665625\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m213.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
----------------------------- Eval. results on Epoch:9, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "----------------------------- Eval. results on Epoch:\u001b[1;36m9\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.659375,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 211.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.659375\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m211.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
---------------------------- Eval. results on Epoch:10, Batch:0 -----------------------------\n",
       "
\n" ], "text/plain": [ "---------------------------- Eval. results on Epoch:\u001b[1;36m10\u001b[0m, Batch:\u001b[1;36m0\u001b[0m -----------------------------\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\n",
       "  \"acc#acc\": 0.696875,\n",
       "  \"total#acc\": 320.0,\n",
       "  \"correct#acc\": 223.0\n",
       "}\n",
       "
\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\n", " \u001b[1;34m\"acc#acc\"\u001b[0m: \u001b[1;36m0.696875\u001b[0m,\n", " \u001b[1;34m\"total#acc\"\u001b[0m: \u001b[1;36m320.0\u001b[0m,\n", " \u001b[1;34m\"correct#acc\"\u001b[0m: \u001b[1;36m223.0\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "
\n",
       "
\n" ], "text/plain": [ "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "trainer.run(num_eval_batch_per_dl=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is far worse than `fine-tuning`. One likely reason is that, although `P-Tuning v2` adapts to models\n", "with `100M-1B` parameters, **`distilbert-base` has only `66M` parameters**, which is below the lower end of that range.\n", "\n", "On the other hand, **`fastNLP v0.8` does not support multi-GPU training inside `jupyter`**, so the author could not train\n", "suitably sized models on the available computer/server, such as the `110M` `bert-base` model or the `340M` `bert-large` model." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n"
      ],
      "text/plain": []
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "{'acc#acc': 0.737385, 'total#acc': 872.0, 'correct#acc': 643.0}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trainer.evaluator.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.13"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}