<!--TOC -->
- [Graph Attention Networks Description](#graph-attention-networks-description)
- [Model architecture](#model-architecture)
- [Dataset](#dataset)
- [Data Preparation](#data-preparation)
- [Features](#features)
- [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Structure](#structure)
- [Parameter configuration](#parameter-configuration)
- [Running the example](#running-the-example)
- [Usage](#usage)
- [Result](#result)
- [Description of random situation](#description-of-random-situation)
- [Others](#others)
<!--TOC -->
# Graph Attention Networks Description

Graph Attention Networks (GAT) was proposed in 2017 by Petar Veličković et al. By leveraging masked self-attentional layers to address the shortcomings of prior graph-based methods, GAT achieved or matched state-of-the-art performance on both transductive datasets such as Cora and inductive datasets such as PPI. This is an example of training GAT on the Cora dataset with MindSpore.

[Paper](https://arxiv.org/abs/1710.10903): Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
# Model architecture

An illustration of multi-head attention (with K = 3 heads) by node 1 on its neighborhood is shown below:

![](https://camo.githubusercontent.com/4fe1a90e67d17a2330d7cfcddc930d5f7501750c/68747470733a2f2f7777772e64726f70626f782e636f6d2f732f71327a703170366b37396a6a6431352f6761745f6c617965722e706e673f7261773d31)

Note that depending on whether the attention layer is the output layer of the network, the node update function is either a concatenation or an average of the per-head outputs.
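To make the update rule concrete, here is a minimal NumPy sketch of a single attention head and of how several heads are combined. This is an illustration only, not the MindSpore implementation in src/gat.py; the weight shapes and the toy graph are made up:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_head(h, adj, W, a):
    """One GAT attention head over a dense adjacency matrix.

    h:   (N, F)   input node features
    adj: (N, N)   adjacency with self-loops (nonzero where an edge exists)
    W:   (F, F')  shared linear transform
    a:   (2*F',)  attention vector
    """
    z = h @ W                                       # (N, F') transformed features
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j])
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    e = np.where(adj > 0, e, -1e9)                  # attend only over the neighborhood
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)       # softmax over neighbors
    return alpha @ z                                # aggregate neighbor features

rng = np.random.default_rng(0)
N, F, F_out, K = 5, 8, 4, 3
h = rng.normal(size=(N, F))
adj = np.eye(N) + (rng.random((N, N)) > 0.5)        # toy graph with self-loops
heads = [gat_head(h, adj, rng.normal(size=(F, F_out)), rng.normal(size=2 * F_out))
         for _ in range(K)]
hidden = np.concatenate(heads, axis=1)              # hidden layers: concatenate the K heads
output = np.mean(heads, axis=0)                     # output layer: average them instead
```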
# Dataset

Statistics of the datasets used are summarized below:

|                    |           Cora |       Citeseer |
| ------------------ | -------------: | -------------: |
| Task               |   Transductive |   Transductive |
| # Nodes            | 2708 (1 graph) | 3327 (1 graph) |
| # Edges            |           5429 |           4732 |
| # Features/Node    |           1433 |           3703 |
| # Classes          |              7 |              6 |
| # Training Nodes   |            140 |            120 |
| # Validation Nodes |            500 |            500 |
| # Test Nodes       |           1000 |           1000 |
## Data Preparation

Download the Cora or Citeseer dataset provided by the kimiyoung/planetoid repository on GitHub.

> Place the dataset at any path you like. The folder should contain the following files (using the Cora dataset as an example):
```
.
└─data
    ├─ind.cora.allx
    ├─ind.cora.ally
    ├─ind.cora.graph
    ├─ind.cora.test.index
    ├─ind.cora.tx
    ├─ind.cora.ty
    ├─ind.cora.x
    └─ind.cora.y
```
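If you want to sanity-check the download before converting it, the files can be inspected with a short script. This is only a sketch, assuming the standard kimiyoung/planetoid format (pickled feature/label objects plus a plain-text `test.index` file); it is not part of this repository:

```python
import pickle
import numpy as np

# Assumes the standard planetoid pickle format (scipy is needed to
# unpickle the sparse feature matrices).
names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
objects = {}
for name in names:
    with open('./data/ind.cora.{}'.format(name), 'rb') as f:
        objects[name] = pickle.load(f, encoding='latin1')

test_idx = np.loadtxt('./data/ind.cora.test.index', dtype=np.int64)

print('labeled training features:', objects['x'].shape)
print('all training features:', objects['allx'].shape)
print('test features:', objects['tx'].shape)
print('test index range:', test_idx.min(), '-', test_idx.max())
```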
> Generate the dataset in MindRecord format for Cora or Citeseer.
>> Usage

```buildoutcfg
cd ./scripts
# SRC_PATH is the path of the dataset you downloaded, DATASET_NAME is cora or citeseer
sh run_process_data.sh [SRC_PATH] [DATASET_NAME]
```

>> Launch

```
# Generate dataset in MindRecord format for cora
./run_process_data.sh ./data cora
# Generate dataset in MindRecord format for citeseer
./run_process_data.sh ./data citeseer
```
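After the script finishes, you can quickly verify the generated file with MindSpore's dataset API. The output path below is an assumption; use whatever file run_process_data.sh actually writes:

```python
import mindspore.dataset as ds

# Assumption: the script writes a MindRecord file such as ./data_mr/cora_mr;
# adjust the path to match the script's actual output.
dataset = ds.MindDataset("./data_mr/cora_mr")
print("number of records:", dataset.get_dataset_size())
for record in dataset.create_dict_iterator():
    print("columns:", list(record.keys()))
    break
```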
# Features

## Mixed Precision

To utilize the strong computation power of the Ascend chip and accelerate training, mixed precision training is used. MindSpore can handle FP32 inputs together with FP16 operators. In the GAT example, the model runs in FP16 mode except for the loss calculation.
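As an illustration of this pattern (a sketch, not the exact code in src/gat.py; the Dense layer only stands in for the GAT network), a MindSpore cell can be cast to FP16 while the loss function stays in FP32:

```python
import mindspore.nn as nn
from mindspore import dtype as mstype

network = nn.Dense(1433, 7)        # placeholder for the GAT network
network.to_float(mstype.float16)   # run the network's operators in FP16
# The loss calculation is kept in FP32 for numerical stability.
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
```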
# Environment Requirements

- Hardware (Ascend)
- Install [MindSpore](https://www.mindspore.cn/install/en).
# Structure

```shell
.
└─gat
  ├─README.md
  ├─scripts
  | ├─run_process_data.sh  # Generate dataset in mindrecord format
  | └─run_train.sh         # Launch training
  |
  ├─src
  | ├─config.py            # Training configurations
  | ├─dataset.py           # Data preprocessing
  | ├─gat.py               # GAT model
  | └─utils.py             # Utils for training gat
  |
  └─train.py               # Train net
```
## Parameter configuration

Parameters for training can be set in src/config.py.

```
"learning_rate": 0.005,    # Learning rate
"num_epochs": 200,         # Number of training epochs
"hid_units": [8],          # Hidden units per attention head in each layer
"n_heads": [8, 1],         # Number of attention heads in each layer
"early_stopping": 100,     # Early-stopping patience
"l2_coeff": 0.0005,        # L2 regularization coefficient
"attn_dropout": 0.6,       # Attention dropout ratio
"feature_dropout": 0.6     # Feature dropout ratio
```
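To see how a couple of these values are typically consumed, here is a hedged MindSpore sketch (the optimizer choice and the stand-in network are assumptions for illustration, not the exact code in train.py):

```python
import mindspore.nn as nn

learning_rate = 0.005   # from config: "learning_rate"
l2_coeff = 0.0005       # from config: "l2_coeff"

network = nn.Dense(1433, 7)  # placeholder for the GAT network
optimizer = nn.Adam(network.trainable_params(),
                    learning_rate=learning_rate,
                    weight_decay=l2_coeff)  # L2 coefficient applied as weight decay
```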
# Running the example

## Usage

After the dataset has been generated correctly, start training as follows.

```
# run training with the cora dataset, DATASET_NAME is cora
sh run_train.sh [DATASET_NAME]
```
## Result

The training result will be stored in the scripts path, in a folder whose name begins with "train". You will find output like the following in the log.
```
Epoch:0, train loss=1.98498 train acc=0.17143 | val loss=1.97946 val acc=0.27200
Epoch:1, train loss=1.98345 train acc=0.15000 | val loss=1.97233 val acc=0.32600
Epoch:2, train loss=1.96968 train acc=0.21429 | val loss=1.96747 val acc=0.37400
Epoch:3, train loss=1.97061 train acc=0.20714 | val loss=1.96410 val acc=0.47600
Epoch:4, train loss=1.96864 train acc=0.13571 | val loss=1.96066 val acc=0.59600
...
Epoch:195, train loss=1.45111 train_acc=0.56429 | val_loss=1.44325 val_acc=0.81200
Epoch:196, train loss=1.52476 train_acc=0.52143 | val_loss=1.43871 val_acc=0.81200
Epoch:197, train loss=1.35807 train_acc=0.62857 | val_loss=1.43364 val_acc=0.81400
Epoch:198, train loss=1.47566 train_acc=0.51429 | val_loss=1.42948 val_acc=0.81000
Epoch:199, train loss=1.56411 train_acc=0.55000 | val_loss=1.42632 val_acc=0.80600
Test loss=1.5366285, test acc=0.84199995
...
```
Results on the Cora dataset are shown in the table below:

|                                       | MindSpore + Ascend910 | Tensorflow + V100 |
| ------------------------------------- | --------------------: | ----------------: |
| Accuracy                              |           0.830933271 |       0.828649968 |
| Training Cost (200 epochs)            |          27.62298311s |        36.711862s |
| End-to-End Training Cost (200 epochs) |               39.074s |           50.894s |
# Description of random situation

The GAT model contains many dropout operations. If you want to disable dropout, set attn_dropout and feature_dropout to 0 in src/config.py. Note that doing so causes the accuracy to drop to approximately 80%.
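The change amounts to editing the two dropout entries listed in the parameter configuration above (illustrative values, shown in the same format as that snippet):

```
"attn_dropout": 0.0,      # disable attention dropout
"feature_dropout": 0.0    # disable feature dropout
```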
# Others

The GAT model has been verified on the Ascend environment only, not on CPU or GPU.