.. _nas:

Neural Architecture Search
============================

We support different neural architecture search (NAS) algorithms in various search spaces.
A NAS process usually consists of three modules: the search space, the search strategy and the estimation strategy.

The search space describes all possible architectures to be searched. It mainly consists of two parts: the operations (e.g. GCNConv, GATConv) and the input-output relations.
A larger space may contain a better optimal architecture but demands more effort to explore.
Human knowledge can help design a reasonable search space and reduce the effort required of the search strategy.

The search strategy controls how the search space is explored.
It involves the classical exploration-exploitation trade-off:
on the one hand, it is desirable to find well-performing architectures quickly,
while on the other hand, premature convergence to a region of suboptimal architectures should be avoided.

The estimation strategy gives the performance of an architecture when it is explored.
The simplest option is to perform standard training and validation of the architecture on data.
Since many architectures need to be estimated during the whole search process, the estimation strategy should be very efficient to save computational resources.

.. image:: ../../../resources/nas.svg
   :align: center

To be more flexible, we modularize the NAS process into three parts: algorithm, space and estimator, corresponding to the search strategy, search space and estimation strategy respectively.
Models from different parts can be combined under certain constraints.
If you want to design your own NAS process, you can change any of these parts according to your needs.

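
As a rough illustration of how the three parts interact, here is a toy sketch in plain Python.
It does not use AutoGL: ``SPACE``, ``sample_architecture``, ``estimate`` and ``search`` are hypothetical names,
and the score is a made-up formula standing in for real training and validation.

.. code-block:: python

    import random

    # Hypothetical toy search space: one list of options per decision
    SPACE = {"op": ["gcn", "gat"], "hidden": [16, 64], "layers": [1, 2, 3]}

    def sample_architecture(space, rng):
        """Search strategy: pick one option per decision uniformly at random."""
        return {key: rng.choice(options) for key, options in space.items()}

    def estimate(arch):
        """Estimation strategy: a made-up score replacing train + validate."""
        score = 0.5
        score += 0.1 if arch["op"] == "gat" else 0.0
        score += 0.05 * arch["layers"]
        score += 0.001 * arch["hidden"]
        return score

    def search(space, n_trials, seed=0):
        """Glue code: sample architectures, estimate them, keep the best."""
        rng = random.Random(seed)
        best_arch, best_score = None, float("-inf")
        for _ in range(n_trials):
            arch = sample_architecture(space, rng)
            score = estimate(arch)
            if score > best_score:
                best_arch, best_score = arch, score
        return best_arch, best_score

    best_arch, best_score = search(SPACE, n_trials=20)

Swapping ``sample_architecture`` or ``estimate`` for another implementation leaves ``search`` unchanged, which is the point of keeping the three parts separate.
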
Usage
-----

You can directly enable architecture search for node classification tasks by passing the algorithms, spaces and estimators to
the solver. The following shows an example:

.. code-block:: python

    # Use graphnas to solve cora
    from autogl.datasets import build_dataset_from_name
    from autogl.solver import AutoNodeClassifier

    solver = AutoNodeClassifier(
        feature='PYGNormalizeFeatures',
        graph_models=(),
        hpo='tpe',
        ensemble=None,
        nas_algorithms=['rl'],
        nas_spaces='graphnasmacro',
        nas_estimators=['scratch']
    )

    cora = build_dataset_from_name('cora')
    solver.fit(cora)

The code above first finds the best architecture in the ``graphnasmacro`` space using the ``rl`` search algorithm.
The searched architecture is then further optimized through hyperparameter optimization with ``tpe``.

.. note:: The ``graph_models`` argument does not conflict with the NAS module. You can set ``graph_models`` to
    other hand-crafted models besides the ones found by NAS. Once the architectures are derived from the NAS module,
    they act in the same way as hand-crafted models passed directly through ``graph_models``.

Search Space
------------

The space definition is based on the mutable fashion used in NNI: a space is defined as a model inheriting from ``BaseSpace``.
There are mainly two ways to define your search space: one supports the one-shot fashion while the other does not.
Currently, we support the following search spaces:

+------------------------+-----------------------------------------------------------------+
| Space                  | Description                                                     |
+========================+=================================================================+
| ``singlepath`` [4]_    | Architectures with several sequential layers with each layer    |
|                        | choosing only one path                                          |
+------------------------+-----------------------------------------------------------------+
| ``graphnas`` [1]_      | The GraphNAS micro search space designed for fully supervised   |
|                        | node classification models                                      |
+------------------------+-----------------------------------------------------------------+
| ``graphnasmacro`` [1]_ | The GraphNAS macro search space designed for semi-supervised    |
|                        | node classification models                                      |
+------------------------+-----------------------------------------------------------------+

You can also define your own NAS search space.
If you need the one-shot fashion, you should use the functions ``setLayerChoice`` and ``setInputChoice`` to construct the super network.
Here is an example.

.. code-block:: python

    # For example, create a NAS search space by yourself
    import torch.nn.functional as F
    # Assumed import path, needed only for the return annotation of ``parse_model``
    from autogl.module.model import BaseModel
    from autogl.module.nas.space.base import BaseSpace
    from autogl.module.nas.space.operation import gnn_map

    class YourOneShotSpace(BaseSpace):
        # Get essential parameters at initialization
        def __init__(self, input_dim=None, output_dim=None):
            super().__init__()
            # must contain input_dim and output_dim in space, or you can initialize these two parameters in function `instantiate`
            self.input_dim = input_dim
            self.output_dim = output_dim

        # Instantiate the super network
        def instantiate(self, input_dim=None, output_dim=None):
            # must call super in this function
            super().instantiate()
            self.input_dim = input_dim or self.input_dim
            self.output_dim = output_dim or self.output_dim
            # define two layers with order 0 and 1
            setattr(self, 'layer0', self.setLayerChoice(0, [gnn_map(op, self.input_dim, self.output_dim) for op in ['gcn', 'gat']], key='layer0'))
            setattr(self, 'layer1', self.setLayerChoice(1, [gnn_map(op, self.input_dim, self.output_dim) for op in ['gcn', 'gat']], key='layer1'))
            # define an input choice to choose from the results of the two layers
            setattr(self, 'input_layer', self.setInputChoice(2, choose_from=['layer0', 'layer1'], n_chosen=1, return_mask=False, key='input_layer'))
            self._initialized = True

        # Define the forward process
        def forward(self, data):
            x, edges = data.x, data.edge_index
            x_0 = self.layer0(x, edges)
            x_1 = self.layer1(x, edges)
            y = self.input_layer([x_0, x_1])
            y = F.log_softmax(y, dim=1)
            return y

        # For one-shot fashion, you can directly use following scheme in ``parse_model``
        def parse_model(self, selection, device) -> BaseModel:
            return self.wrap().fix(selection)

Alternatively, you can define a space in a way that does not support the one-shot fashion.
In this way, you can directly reuse your model with few changes,
but you can only use sample-based search strategies.

.. code-block:: python

    # For example, create a NAS search space by yourself
    # Assumed import path, needed only for the return annotation of ``parse_model``
    from autogl.module.model import BaseModel
    from autogl.module.nas.space.base import BaseSpace, map_nn
    from autogl.module.nas.space.operation import gnn_map
    # here we search from three types of graph convolution with `heads` as a parameter
    # we should search `heads` at the same time with the convolution
    from torch_geometric.nn import GATConv, FeaStConv, TransformerConv

    class YourNonOneShotSpace(BaseSpace):
        # Get essential parameters at initialization
        def __init__(self, input_dim=None, output_dim=None):
            super().__init__()
            # must contain input_dim and output_dim in space, or you can initialize these two parameters in function `instantiate`
            self.input_dim = input_dim
            self.output_dim = output_dim

        # Instantiate the super network
        def instantiate(self, input_dim=None, output_dim=None):
            # must call super in this function
            super().instantiate()
            self.input_dim = input_dim or self.input_dim
            self.output_dim = output_dim or self.output_dim
            # set your choices as LayerChoices
            self.choice0 = self.setLayerChoice(0, map_nn(["gat", "feast", "transformer"]), key="conv")
            self.choice1 = self.setLayerChoice(1, map_nn([1, 2, 4, 8]), key="head")

        # You do not need to define forward process here
        # For non-one-shot fashion, you can directly return your model based on the choices
        # ``YourModel`` must inherit BaseSpace.
        def parse_model(self, selection, device) -> BaseModel:
            model = YourModel(selection, self.input_dim, self.output_dim).wrap()
            return model

    # YourModel can be defined as follows
    class YourModel(BaseSpace):
        def __init__(self, selection, input_dim, output_dim):
            # initialize the parent module before registering any layer
            super().__init__()
            self.input_dim = input_dim
            self.output_dim = output_dim
            if selection["conv"] == "gat":
                conv = GATConv
            elif selection["conv"] == "feast":
                conv = FeaStConv
            elif selection["conv"] == "transformer":
                conv = TransformerConv
            self.layer = conv(input_dim, output_dim, heads=selection["head"])

        def forward(self, data):
            x, edges = data.x, data.edge_index
            y = self.layer(x, edges)
            return y

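
To get a feeling for the size of such a space, the sketch below enumerates every combination of the two choices defined above (``conv`` and ``head``) in plain Python.
It assumes a selection is a dict mapping each key to the chosen value, as ``YourModel`` reads it; ``choices`` and ``selections`` are hypothetical names and no AutoGL code is involved.

.. code-block:: python

    from itertools import product

    # Same options as the two LayerChoices in the space above
    choices = {"conv": ["gat", "feast", "transformer"], "head": [1, 2, 4, 8]}

    keys = list(choices)
    # One dict per candidate architecture, in the order product() yields them
    selections = [
        dict(zip(keys, combo))
        for combo in product(*(choices[key] for key in keys))
    ]

    print(len(selections))  # 3 convolutions x 4 head counts = 12 candidates

Because every candidate is trained independently in the non-one-shot setting, the number of combinations directly bounds the cost of an exhaustive search.
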
Performance Estimator
---------------------

The performance estimator estimates the performance of an architecture. Currently we support the following estimators:

+-------------------------+-------------------------------------------------------+
| Estimator               | Description                                           |
+=========================+=======================================================+
| ``oneshot``             | Directly evaluating the given models without training |
+-------------------------+-------------------------------------------------------+
| ``scratch``             | Train the models from scratch and then evaluate them  |
+-------------------------+-------------------------------------------------------+

You can also write your own estimator. Here is an example of estimating an architecture without training (used in one-shot spaces).

.. code-block:: python

    # For example, create a NAS estimator by yourself
    import torch.nn.functional as F
    from autogl.module.nas.estimator.base import BaseEstimator
    from autogl.module.nas.space.base import BaseSpace

    class YourOneShotEstimator(BaseEstimator):
        # The only thing you should do is defining ``infer`` function
        def infer(self, model: BaseSpace, dataset, mask="train"):
            device = next(model.parameters()).device
            dset = dataset[0].to(device)
            # Forward the architecture and keep only the nodes selected by the mask
            pred = model(dset)[getattr(dset, f"{mask}_mask")]
            y = dset.y[getattr(dset, f"{mask}_mask")]
            # Use default loss function and metrics to evaluate the architecture
            loss = getattr(F, self.loss_f)(pred, y)
            probs = F.softmax(pred, dim=1)
            metrics = [eva.evaluate(probs, y) for eva in self.evaluation]
            return metrics, loss

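
The core of such an estimator is restricting predictions and labels to the nodes selected by a mask before computing a metric.
The following plain-Python sketch shows that step for accuracy. It is only a stand-in for the tensor indexing in ``infer``: ``masked_accuracy`` and the toy data are hypothetical.

.. code-block:: python

    def masked_accuracy(scores, labels, mask):
        """Accuracy over the nodes whose mask entry is True.

        scores: per-node lists of class scores, labels: per-node class ids,
        mask: per-node booleans (e.g. a validation mask).
        """
        selected = [(s, y) for s, y, m in zip(scores, labels, mask) if m]
        if not selected:
            raise ValueError("mask selects no nodes")
        correct = 0
        for node_scores, label in selected:
            # predicted class = index of the highest score
            predicted = max(range(len(node_scores)), key=node_scores.__getitem__)
            correct += int(predicted == label)
        return correct / len(selected)

    scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
    labels = [0, 1, 1, 1]
    val_mask = [True, True, True, False]
    acc = masked_accuracy(scores, labels, val_mask)

Here the last node is ignored because its mask entry is ``False``, so the accuracy is computed over the first three nodes only.
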
Search Strategy
---------------

The search strategy defines how to find an architecture. We currently support the following search strategies:

+-------------------------+-------------------------------------------------------+
| Strategy                | Description                                           |
+=========================+=======================================================+
| ``random``              | Random search by uniform sampling                     |
+-------------------------+-------------------------------------------------------+
| ``rl`` [1]_             | Use an RL agent as the architecture generator         |
+-------------------------+-------------------------------------------------------+
| ``enas`` [2]_           | Efficient neural architecture search                  |
+-------------------------+-------------------------------------------------------+
| ``darts`` [3]_          | Differentiable neural architecture search             |
+-------------------------+-------------------------------------------------------+

Sample-based strategies without weight sharing are simpler than strategies with weight sharing.
We show how to define your own strategy here, using DFS as an example.
If you want to define a more complex strategy, you can refer to DARTS, ENAS or other strategies in NNI.

.. code-block:: python

    import numpy as np
    from autogl.module.nas.algorithm.base import BaseNAS
    from autogl.module.nas.space.base import BaseSpace
    # Assumed import path for the mutable helpers used below; adjust to your AutoGL version
    from autogl.module.nas.utils import (
        get_module_order,
        replace_layer_choice,
        replace_input_choice,
        sort_replaced_module,
        PathSamplingLayerChoice,
        PathSamplingInputChoice,
    )

    # Despite its name, this example enumerates every architecture with DFS
    class RandomSearch(BaseNAS):
        # Get the number of samples at initialization
        def __init__(self, n_sample):
            super().__init__()
            self.n_sample = n_sample

        # The key process in NAS algorithm, search for an architecture given space, dataset and estimator
        def search(self, space: BaseSpace, dset, estimator):
            self.estimator = estimator
            self.dataset = dset
            self.space = space

            self.nas_modules = []
            k2o = get_module_order(self.space)
            # collect all mutables in the space
            replace_layer_choice(self.space, PathSamplingLayerChoice, self.nas_modules)
            replace_input_choice(self.space, PathSamplingInputChoice, self.nas_modules)
            # sort all mutables with given orders
            self.nas_modules = sort_replaced_module(k2o, self.nas_modules)
            # get a dict containing the number of choices of every mutable
            selection_range = {}
            for k, v in self.nas_modules:
                selection_range[k] = len(v)
            self.selection_dict = selection_range

            arch_perfs = []
            # define DFS process
            self.selection = {}
            last_k = list(self.selection_dict.keys())[-1]

            def dfs():
                for k, v in self.selection_dict.items():
                    if k not in self.selection:
                        for i in range(v):
                            self.selection[k] = i
                            if k == last_k:
                                # evaluate an architecture
                                self.arch = space.parse_model(self.selection, self.device)
                                # `_infer` is assumed to be a helper that runs self.estimator on self.arch
                                metric, loss = self._infer(mask='val')
                                arch_perfs.append([metric, self.selection.copy()])
                            else:
                                dfs()
                        del self.selection[k]
                        break

            dfs()

            # get the architecture with the best performance
            selection = arch_perfs[np.argmax([x[0] for x in arch_perfs])][1]
            arch = space.parse_model(selection, self.device)
            return arch

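
For comparison with the exhaustive DFS above, a sample-based strategy that draws architectures uniformly at random can be sketched in plain Python as follows.
This is not AutoGL's ``random`` implementation: ``random_search``, ``evaluate`` and the toy metric are hypothetical, and ``selection_range`` plays the same role as the dict built in ``search`` above (mutable key to number of options).

.. code-block:: python

    import random

    def random_search(selection_range, evaluate, n_sample, seed=0):
        """Draw n_sample selections uniformly and return the best one."""
        rng = random.Random(seed)
        best_selection, best_metric = None, float("-inf")
        for _ in range(n_sample):
            # pick one option index per mutable, uniformly at random
            selection = {k: rng.randrange(n) for k, n in selection_range.items()}
            metric = evaluate(selection)
            if metric > best_metric:
                best_selection, best_metric = selection, metric
        return best_selection, best_metric

    # Toy setup: three mutables with two options each and a made-up metric
    selection_range = {"layer0": 2, "layer1": 2, "input_layer": 2}

    def toy_metric(selection):
        return sum(selection.values())

    best_selection, best_metric = random_search(selection_range, toy_metric, n_sample=10)

Unlike DFS, the cost here is fixed by ``n_sample`` rather than by the size of the space, at the price of possibly missing the best architecture.
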
Each search strategy can only be combined with certain search spaces and estimators. The supported combinations are as follows.

+----------------+-------------+---------------+---------------------+
| Space          | single path | GraphNAS [1]_ | GraphNAS-macro [1]_ |
+================+=============+===============+=====================+
| Random         | ✓           | ✓             | ✓                   |
+----------------+-------------+---------------+---------------------+
| RL             | ✓           | ✓             | ✓                   |
+----------------+-------------+---------------+---------------------+
| GraphNAS [1]_  | ✓           | ✓             | ✓                   |
+----------------+-------------+---------------+---------------------+
| ENAS [2]_      | ✓           |               |                     |
+----------------+-------------+---------------+---------------------+
| DARTS [3]_     | ✓           |               |                     |
+----------------+-------------+---------------+---------------------+

+----------------+-------------+-------------+
| Estimator      | one-shot    | Train       |
+================+=============+=============+
| Random         |             | ✓           |
+----------------+-------------+-------------+
| RL             |             | ✓           |
+----------------+-------------+-------------+
| GraphNAS [1]_  |             | ✓           |
+----------------+-------------+-------------+
| ENAS [2]_      | ✓           |             |
+----------------+-------------+-------------+
| DARTS [3]_     | ✓           |             |
+----------------+-------------+-------------+

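
If you want to check a combination before launching a search, the two tables can be encoded as a small lookup, as in the sketch below.
This helper is not part of AutoGL. The names are hypothetical, the ``"graphnas"`` strategy key is assumed, and the "Train" column is assumed to correspond to the ``scratch`` estimator.

.. code-block:: python

    ALL_SPACES = {"singlepath", "graphnas", "graphnasmacro"}

    # Rows of the strategy/space table
    COMPATIBLE_SPACES = {
        "random": ALL_SPACES,
        "rl": ALL_SPACES,
        "graphnas": ALL_SPACES,   # assumed key for the GraphNAS strategy
        "enas": {"singlepath"},
        "darts": {"singlepath"},
    }

    # Rows of the strategy/estimator table ("Train" assumed to mean ``scratch``)
    COMPATIBLE_ESTIMATORS = {
        "random": {"scratch"},
        "rl": {"scratch"},
        "graphnas": {"scratch"},
        "enas": {"oneshot"},
        "darts": {"oneshot"},
    }

    def is_supported(strategy, space, estimator):
        """Return True if the tables list this combination as supported."""
        return (
            space in COMPATIBLE_SPACES.get(strategy, set())
            and estimator in COMPATIBLE_ESTIMATORS.get(strategy, set())
        )

    print(is_supported("rl", "graphnasmacro", "scratch"))  # combination from the usage example
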
.. [1] Gao, Yang, et al. "Graph neural architecture search." IJCAI. Vol. 20. 2020.
.. [2] Pham, Hieu, et al. "Efficient neural architecture search via parameters sharing." International Conference on Machine Learning. PMLR, 2018.
.. [3] Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "DARTS: Differentiable Architecture Search." International Conference on Learning Representations. 2018.
.. [4] Guo, Zichao, et al. "Single Path One-Shot Neural Architecture Search with Uniform Sampling." European Conference on Computer Vision, 2019, pp. 544-560.