 Dev0.4.0 (#149)
* 1. Add support for BMESO-type tags in CRF 2. Add comments in vocabulary
* Add an error check to BucketSampler
* 1. Fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (a pbar should be passed in later for printing); 2. Add comments to MLP
* update MLP module
* Add comments to metrics; fix a bug in the Trainer save process
* Update README.md
fix tutorial link
* Add ENAS (Efficient Neural Architecture Search)
* add ignore_type in DataSet.add_field
* * AutoPadder will not pad when dtype is None
* add ignore_type in DataSet.apply
* Fix a potential padder bug in FieldArray
* Fix a typo in CRF, plus a spot that could cause numerical instability
* Fix a possible bug in CRF
* change two default init arguments of Trainer into None
* Changes to Callbacks:
* Add several read-only properties to Callback
* Set these properties through the manager
* Code cleanup to lighten the load on @transfer
* * Move the ENAS-related code into the automl directory
* Fix a bug in fast_param_mapping
* Trainer now automatically creates the save directory
* Printing a Vocabulary now shows its contents
* * Add an iteration method to Vocabulary
* Fix a bug where CRF values could be negative
* add SQuAD metric
* add sigmoid activate function in MLP
* - add star transformer model
- add ConllLoader, for all kinds of conll-format files
- add JsonLoader, for json-format files
- add SSTLoader, for SST-2 & SST-5
- change Callback interface
- fix batch multi-process when killed
- add README to list models and their performance
* - fix test
* - fix callback & tests
* - update README
* Fix some bugs; adjust callbacks
* Prepare to release version 0.4.0
* update readme
* support parallel loss
* Prevent multi-GPU setups from computing the loss incorrectly
* update advance_tutorial jupyter notebook
* 1. Add new loading functions load_with_vocab() and load_without_vocab to embedding_loader; compared with the previous functions, the main changes are that (1) embed_dim no longer needs to be passed in and (2) word2vec vs. glove format is detected automatically.
2. Add from_dataset() and index_dataset() to Vocabulary, so indexing a dataset no longer takes several lines of code (a short sketch follows the commit list).
3. Add a cache_result() decorator to utils for caching a function's return value.
4. Add an update_every attribute to Callback
* 1. DataSet.apply() now reports the offending index when it raises an error
2. Vocabulary.from_dataset() and index_dataset() report the vocab order when raising an error
3. EmbedLoader skips a line when it encounters malformed data while reading embeddings.
* update attention
* doc tools
* fix some doc errors
* Switch to Chinese comments; add a Viterbi decoding method
* Example version
* - add pad sequence for lstm
- add csv, conll, json filereader
- update dataloader
- remove useless dataloader
- fix trainer loss print
- fix tests
* - fix test_tutorial
* Add comments
* Test the documentation
* Local work-in-progress snapshot
* Local work-in-progress snapshot
* Reorder the documentation
* - add document
* Local work-in-progress snapshot
* update pooling
* update bert
* update documents in MLP
* update documents in snli
* combine self attention module to attention.py
* update documents on losses.py
* Update the DataSet documentation
* update documents on metrics
* 1. Remove the print statements in LSTM; 2. Change use_cuda to device in Trainer and Tester; 3. Expand the Trainer documentation
* Add comments to Trainer
* Improve the documentation for trainer, callback, etc.; rename parts of the code so they are hidden from the documentation
* update char level encoder
* update documents on embedding.py
* - update doc
* Add comments and revise some code
* - update doc
- add get_embeddings
* Change the documentation configuration options
* Change embedding to be initialized via init_embed
* 1. Add multi-GPU support to Trainer and Tester;
* - add test
- fix jsonloader
* Remove the commented-out tutorial
* Add get_field_names to DataSet
* Fix bugs
* - add Const
- fix bugs
* Revise some comments
* - add model runner for easier test models
- add model tests
* Change the docs configuration and structure
* Revise a large part of the core documentation. TODO:
1. Complete the trainer and tester documentation
2. Investigate docstring examples and tests
* The review of comments in the core package is mostly complete
* Revise the comments in the io package
* Switch everything to relative-path imports
* Switch everything to relative-path imports
* small change
* 1. Remove the api/automl packages from the install files
2. A seq_len bug exists in metrics
3. A naming error in sampler has been fixed
* Fix a bug: stay compatible with the CPU-only build of PyTorch
TODO: similar bugs may exist elsewhere
* Revise the references in the documentation
* Replace tqdm.autonotebook with tqdm.auto
* - fix batch & vocab
* Upload the *.rst documentation files
* Upload documentation files and several TODOs
* Discuss and consolidate several modules
* Tests for the core package and some small changes
* Remove some redundant documentation
* update init files
* update const files
* update const files
* Add CNN tests
* fix a little bug
* - update attention
- fix tests
* Improve tests
* Finish the quick-start tutorial
* Update the documentation for the rename of sequence_modeling to sequence_labeling
* Re-run apidoc to clean up leftovers from the rename
* Fix documentation formatting
* Unify the seq_len_to_mask variants in different places; they are now consolidated in core.utils.seq_len_to_mask (sketched after the commit list)
* Add a one-line hint
* Show dataset_loader in the documentation
* Note that Dataset.read_csv will be replaced by CSVLoader
* Finish the documentation linking Callback and Trainer
* Partially update the index
* Remove redundant print statements
* Remove the word-segmentation metric because it may cause errors
* Revise the Chinese names in the documentation
* Finish the detailed introduction documents
* The .ipynb file for the tutorial
* Revise some introduction documents
* Revise the home-page introductions for models and modules
* Add the titlesonly setting
* Revise the titles shown in the module documentation
* Revise the opening introductions of core and io
* Revise the opening introductions of modules and models
* Use .. todo:: to hide TODO comments that might otherwise be pulled into the documentation (example after the commit list)
* Revise some comments
* delete an old metric in test
* Revise the tutorials test files
* Move features not yet ready for release into the legacy folder
* Remove tests that cannot run
* Revise the callback test file
* Remove outdated tutorials and test files
* Change the parameters of cache_results
* Revise the io test files; remove some outdated tests
* Fix bugs
* Fix the tests that failed in test_utils.py
* Fix a compatibility issue with pad_sequence in PyTorch 1.1; adjust the Trainer pbar
* 1. Fix a bug in metrics; 2. Add metric tests
* add model summary
* Add aliases
* Remove the nested layers in encoder
* Change the import order and the contents exposed via __all__ in core
* Change the import order and the contents exposed via __all__ in models
* Rename files
* Change __all__ and the imports in the modules package
* fix var runn
* Add a clear method to Vocabulary
* Minor tweaks for PEP 8 compliance
* Update the cache_results example
* 1. Warn about potentially-None indices in callback; 2. DataSet now supports indexing with a List (see the example after the commit list)
* Fix a typo
* Revise README.md
* update documents on bert
* update documents on encoder/bert
* Add a fitlog callback for recording experiments with fitlog
* typo
* - update dataset_loader
* Add a link to the fitlog documentation.
* Add documentation for DataSet Loader
* - add star-transformer reproduction
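Below are a few hedged sketches for the API changes noted above; they assume the fastNLP 0.4.0 API roughly as described in these notes, so exact argument names may differ.

A minimal sketch of Vocabulary.from_dataset()/index_dataset() and the cache_results decorator:

    # sketch only; argument names are best guesses from the notes above
    from fastNLP import DataSet, Vocabulary
    from fastNLP.core.utils import cache_results

    data = DataSet({'words': [['this', 'is', 'fine'], ['another', 'sentence']]})

    # build the vocabulary directly from one or more DataSets ...
    vocab = Vocabulary().from_dataset(data, field_name='words')
    # ... then index those DataSets in one call instead of several apply()s
    vocab.index_dataset(data, field_name='words', new_field_name='words')

    # cache_results stores a function's return value on disk (here in a
    # hypothetical prepare.pkl) so expensive preprocessing runs only once
    @cache_results('prepare.pkl')
    def prepare():
        return data, vocab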
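A sketch of the new DataSet helpers, get_field_names and indexing with a List; the exact return types are assumptions:

    from fastNLP import DataSet

    ds = DataSet({'x': [1, 2, 3, 4], 'y': [0, 1, 0, 1]})
    print(ds.get_field_names())   # expected: ['x', 'y']
    subset = ds[[0, 2]]           # indexing with a List picks those instances
    print(len(subset))            # 2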
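A sketch of the consolidated core.utils.seq_len_to_mask, which turns a batch of sequence lengths into a padding mask (the exact return dtype is an assumption):

    import torch
    from fastNLP.core.utils import seq_len_to_mask

    seq_len = torch.LongTensor([3, 1, 2])
    mask = seq_len_to_mask(seq_len)   # shape (3, 3); row i has seq_len[i] leading ones
    # e.g. [[1, 1, 1],
    #       [1, 0, 0],
    #       [1, 1, 0]]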
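And a sketch of the .. todo:: usage: wrapping a TODO in the Sphinx todo directive keeps it out of the rendered API docs unless todo_include_todos is enabled in conf.py (the function below is only a placeholder):

    def encode(words):
        """Encode a list of words into indices.

        .. todo::
            document the padding behaviour here
        """
        raise NotImplementedError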
- # Code Modified from https://github.com/carpedm20/ENAS-pytorch
-
- """Module containing the shared RNN model."""
- import collections
-
- import numpy as np
- import torch
- import torch.nn.functional as F
- from torch import nn
- from torch.autograd import Variable
-
- import fastNLP.automl.enas_utils as utils
- from fastNLP.models.base_model import BaseModel
-
-
- def _get_dropped_weights(w_raw, dropout_p, is_training):
- """Drops out weights to implement DropConnect.
-
- Args:
- w_raw: Full, pre-dropout, weights to be dropped out.
- dropout_p: Proportion of weights to drop out.
- is_training: True iff _shared_ model is training.
-
- Returns:
- The dropped weights.
-
-     Note: torch.nn.functional.dropout() returns a `torch.autograd.Variable()`
-     on the training loop but a `torch.nn.Parameter()` on the controller or
-     eval loop (when training = False), even though the call to `_setweights`
-     in the Smerity repo's `weight_drop.py` does not show this behaviour and
-     `F.dropout` always returns `torch.autograd.Variable` there, even when
-     `training=False`. This unresolved question is the reason for the hacky
-     check for `torch.nn.Parameter` below.
- """
- dropped_w = F.dropout(w_raw, p=dropout_p, training=is_training)
-
- if isinstance(dropped_w, torch.nn.Parameter):
- dropped_w = dropped_w.clone()
-
- return dropped_w
-
- class EmbeddingDropout(torch.nn.Embedding):
-     """Class for dropping out embeddings by zeroing out parameters in the
-     embedding matrix.
-
- This is equivalent to dropping out particular words, e.g., in the sentence
- 'the quick brown fox jumps over the lazy dog', dropping out 'the' would
- lead to the sentence '### quick brown fox jumps over ### lazy dog' (in the
- embedding vector space).
-
- See 'A Theoretically Grounded Application of Dropout in Recurrent Neural
- Networks', (Gal and Ghahramani, 2016).
- """
- def __init__(self,
- num_embeddings,
- embedding_dim,
- max_norm=None,
- norm_type=2,
- scale_grad_by_freq=False,
- sparse=False,
- dropout=0.1,
- scale=None):
- """Embedding constructor.
-
- Args:
- dropout: Dropout probability.
- scale: Used to scale parameters of embedding weight matrix that are
- not dropped out. Note that this is _in addition_ to the
- `1/(1 - dropout)` scaling.
-
- See `torch.nn.Embedding` for remaining arguments.
- """
- torch.nn.Embedding.__init__(self,
- num_embeddings=num_embeddings,
- embedding_dim=embedding_dim,
- max_norm=max_norm,
- norm_type=norm_type,
- scale_grad_by_freq=scale_grad_by_freq,
- sparse=sparse)
- self.dropout = dropout
- assert (dropout >= 0.0) and (dropout < 1.0), ('Dropout must be >= 0.0 '
- 'and < 1.0')
- self.scale = scale
-
- def forward(self, inputs): # pylint:disable=arguments-differ
- """Embeds `inputs` with the dropped out embedding weight matrix."""
- if self.training:
- dropout = self.dropout
- else:
- dropout = 0
-
- if dropout:
- mask = self.weight.data.new(self.weight.size(0), 1)
- mask.bernoulli_(1 - dropout)
- mask = mask.expand_as(self.weight)
- mask = mask / (1 - dropout)
- masked_weight = self.weight * Variable(mask)
- else:
- masked_weight = self.weight
- if self.scale and self.scale != 1:
- masked_weight = masked_weight * self.scale
-
- return F.embedding(inputs,
- masked_weight,
- max_norm=self.max_norm,
- norm_type=self.norm_type,
- scale_grad_by_freq=self.scale_grad_by_freq,
- sparse=self.sparse)
-
-
- class LockedDropout(nn.Module):
- # code from https://github.com/salesforce/awd-lstm-lm/blob/master/locked_dropout.py
- def __init__(self):
- super().__init__()
-
- def forward(self, x, dropout=0.5):
- if not self.training or not dropout:
- return x
- m = x.data.new(1, x.size(1), x.size(2)).bernoulli_(1 - dropout)
- mask = Variable(m, requires_grad=False) / (1 - dropout)
- mask = mask.expand_as(x)
- return mask * x
-
-
- class ENASModel(BaseModel):
- """Shared RNN model."""
- def __init__(self, embed_num, num_classes, num_blocks=4, cuda=False, shared_hid=1000, shared_embed=1000):
- super(ENASModel, self).__init__()
-
- self.use_cuda = cuda
-
- self.shared_hid = shared_hid
- self.num_blocks = num_blocks
- self.decoder = nn.Linear(self.shared_hid, num_classes)
- self.encoder = EmbeddingDropout(embed_num,
- shared_embed,
- dropout=0.1)
- self.lockdrop = LockedDropout()
- self.dag = None
-
- # Tie weights
- # self.decoder.weight = self.encoder.weight
-
- # Since W^{x, c} and W^{h, c} are always summed, there
- # is no point duplicating their bias offset parameter. Likewise for
- # W^{x, h} and W^{h, h}.
- self.w_xc = nn.Linear(shared_embed, self.shared_hid)
- self.w_xh = nn.Linear(shared_embed, self.shared_hid)
-
- # The raw weights are stored here because the hidden-to-hidden weights
- # are weight dropped on the forward pass.
- self.w_hc_raw = torch.nn.Parameter(
- torch.Tensor(self.shared_hid, self.shared_hid))
- self.w_hh_raw = torch.nn.Parameter(
- torch.Tensor(self.shared_hid, self.shared_hid))
- self.w_hc = None
- self.w_hh = None
-
- self.w_h = collections.defaultdict(dict)
- self.w_c = collections.defaultdict(dict)
-
- for idx in range(self.num_blocks):
- for jdx in range(idx + 1, self.num_blocks):
- self.w_h[idx][jdx] = nn.Linear(self.shared_hid,
- self.shared_hid,
- bias=False)
- self.w_c[idx][jdx] = nn.Linear(self.shared_hid,
- self.shared_hid,
- bias=False)
-
- self._w_h = nn.ModuleList([self.w_h[idx][jdx]
- for idx in self.w_h
- for jdx in self.w_h[idx]])
- self._w_c = nn.ModuleList([self.w_c[idx][jdx]
- for idx in self.w_c
- for jdx in self.w_c[idx]])
-
- self.batch_norm = None
- # if args.mode == 'train':
- # self.batch_norm = nn.BatchNorm1d(self.shared_hid)
- # else:
- # self.batch_norm = None
-
- self.reset_parameters()
- self.static_init_hidden = utils.keydefaultdict(self.init_hidden)
-
- def setDAG(self, dag):
- if self.dag is None:
- self.dag = dag
-
- def forward(self, word_seq, hidden=None):
- inputs = torch.transpose(word_seq, 0, 1)
-
- time_steps = inputs.size(0)
- batch_size = inputs.size(1)
-
-
- self.w_hh = _get_dropped_weights(self.w_hh_raw,
- 0.5,
- self.training)
- self.w_hc = _get_dropped_weights(self.w_hc_raw,
- 0.5,
- self.training)
-
- # hidden = self.static_init_hidden[batch_size] if hidden is None else hidden
- hidden = self.static_init_hidden[batch_size]
-
- embed = self.encoder(inputs)
-
- embed = self.lockdrop(embed, 0.65 if self.training else 0)
-
- # The norm of hidden states are clipped here because
- # otherwise ENAS is especially prone to exploding activations on the
- # forward pass. This could probably be fixed in a more elegant way, but
- # it might be exposing a weakness in the ENAS algorithm as currently
- # proposed.
- #
- # For more details, see
- # https://github.com/carpedm20/ENAS-pytorch/issues/6
- clipped_num = 0
- max_clipped_norm = 0
- h1tohT = []
- logits = []
- for step in range(time_steps):
- x_t = embed[step]
- logit, hidden = self.cell(x_t, hidden, self.dag)
-
- hidden_norms = hidden.norm(dim=-1)
- max_norm = 25.0
- if hidden_norms.data.max() > max_norm:
- # Just directly use the torch slice operations
- # in PyTorch v0.4.
- #
- # This workaround for PyTorch v0.3.1 does everything in numpy,
- # because the PyTorch slicing and slice assignment is too
- # flaky.
- hidden_norms = hidden_norms.data.cpu().numpy()
-
- clipped_num += 1
- if hidden_norms.max() > max_clipped_norm:
- max_clipped_norm = hidden_norms.max()
-
- clip_select = hidden_norms > max_norm
- clip_norms = hidden_norms[clip_select]
-
- mask = np.ones(hidden.size())
- normalizer = max_norm/clip_norms
- normalizer = normalizer[:, np.newaxis]
-
- mask[clip_select] = normalizer
-
- if self.use_cuda:
- hidden *= torch.autograd.Variable(
- torch.FloatTensor(mask).cuda(), requires_grad=False)
- else:
- hidden *= torch.autograd.Variable(
- torch.FloatTensor(mask), requires_grad=False)
- logits.append(logit)
- h1tohT.append(hidden)
-
- h1tohT = torch.stack(h1tohT)
- output = torch.stack(logits)
- raw_output = output
-
- output = self.lockdrop(output, 0.4 if self.training else 0)
-
- #Pooling
- output = torch.mean(output, 0)
-
- decoded = self.decoder(output)
-
- extra_out = {'dropped': decoded,
- 'hiddens': h1tohT,
- 'raw': raw_output}
- return {'pred': decoded, 'hidden': hidden, 'extra_out': extra_out}
-
- def cell(self, x, h_prev, dag):
- """Computes a single pass through the discovered RNN cell."""
- c = {}
- h = {}
- f = {}
-
- f[0] = self.get_f(dag[-1][0].name)
- c[0] = torch.sigmoid(self.w_xc(x) + F.linear(h_prev, self.w_hc, None))
- h[0] = (c[0]*f[0](self.w_xh(x) + F.linear(h_prev, self.w_hh, None)) +
- (1 - c[0])*h_prev)
-
- leaf_node_ids = []
- q = collections.deque()
- q.append(0)
-
- # Computes connections from the parent nodes `node_id`
- # to their child nodes `next_id` recursively, skipping leaf nodes. A
- # leaf node is a node whose id == `self.num_blocks`.
- #
- # Connections between parent i and child j should be computed as
- # h_j = c_j*f_{ij}{(W^h_{ij}*h_i)} + (1 - c_j)*h_i,
- # where c_j = \sigmoid{(W^c_{ij}*h_i)}
- #
- # See Training details from Section 3.1 of the paper.
- #
- # The following algorithm does a breadth-first (since `q.popleft()` is
- # used) search over the nodes and computes all the hidden states.
- while True:
- if len(q) == 0:
- break
-
- node_id = q.popleft()
- nodes = dag[node_id]
-
- for next_node in nodes:
- next_id = next_node.id
- if next_id == self.num_blocks:
- leaf_node_ids.append(node_id)
- assert len(nodes) == 1, ('parent of leaf node should have '
- 'only one child')
- continue
-
- w_h = self.w_h[node_id][next_id]
- w_c = self.w_c[node_id][next_id]
-
- f[next_id] = self.get_f(next_node.name)
- c[next_id] = torch.sigmoid(w_c(h[node_id]))
- h[next_id] = (c[next_id]*f[next_id](w_h(h[node_id])) +
- (1 - c[next_id])*h[node_id])
-
- q.append(next_id)
-
- # Instead of averaging loose ends, perhaps there should
- # be a set of separate unshared weights for each "loose" connection
- # between each node in a cell and the output.
- #
- # As it stands, all weights W^h_{ij} are doing double duty by
- # connecting both from i to j, as well as from i to the output.
-
- # average all the loose ends
- leaf_nodes = [h[node_id] for node_id in leaf_node_ids]
- output = torch.mean(torch.stack(leaf_nodes, 2), -1)
-
- # stabilizing the Updates of omega
- if self.batch_norm is not None:
- output = self.batch_norm(output)
-
- return output, h[self.num_blocks - 1]
-
- def init_hidden(self, batch_size):
- zeros = torch.zeros(batch_size, self.shared_hid)
- return utils.get_variable(zeros, self.use_cuda, requires_grad=False)
-
- def get_f(self, name):
- name = name.lower()
- if name == 'relu':
- f = torch.relu
- elif name == 'tanh':
- f = torch.tanh
- elif name == 'identity':
- f = lambda x: x
-         elif name == 'sigmoid':
-             f = torch.sigmoid
-         else:
-             # fail fast on an unknown activation name instead of hitting an
-             # UnboundLocalError at the return below
-             raise ValueError('unknown activation function: {}'.format(name))
-         return f
-
-
- @property
- def num_parameters(self):
- def size(p):
- return np.prod(p.size())
- return sum([size(param) for param in self.parameters()])
-
-
- def reset_parameters(self):
- init_range = 0.025
- # init_range = 0.025 if self.args.mode == 'train' else 0.04
- for param in self.parameters():
- param.data.uniform_(-init_range, init_range)
- self.decoder.bias.data.fill_(0)
-
- def predict(self, word_seq):
- """
-
- :param word_seq: torch.LongTensor, [batch_size, seq_len]
- :return predict: dict of torch.LongTensor, [batch_size, seq_len]
- """
- output = self(word_seq)
- _, predict = output['pred'].max(dim=1)
- return {'pred': predict}