.. _dataset:

AutoGL Dataset
==============

We provide a range of common datasets based on ``PyTorch-Geometric``, ``Deep Graph Library`` and ``OGB``.
In addition, AutoGL offers a unified abstraction, ``GeneralStaticGraph``, which represents both static homogeneous graphs and static heterogeneous graphs.

A basic example of constructing an instance of ``GeneralStaticGraph`` is shown below.

.. code-block:: python

    import torch
    from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator

    ''' Construct a custom homogeneous graph '''
    # First argument: node data keyed by name; second argument: a (2, num_edges) tensor of node indices
    custom_static_homogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_homogeneous_static_graph(
        {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556))
    )

    ''' Construct a custom heterogeneous graph '''
    # First argument: node data per node type; second argument: edges per (source type, relation, target type)
    custom_static_heterogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph(
        {
            'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)},
            'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)}
        },
        {
            ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)),
            ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)),
        }
    )

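In the example above, every edge tensor has shape ``(2, num_edges)``. Assuming the first row holds source node indices and the second row holds target node indices (the PyTorch-Geometric convention; the snippet above does not state it explicitly), each index must be smaller than the number of nodes of its node type. The framework-free sketch below makes that constraint explicit with plain lists. ``check_edge_index`` is a hypothetical helper for illustration and is not part of AutoGL.

```python
from typing import Sequence


def check_edge_index(
    edge_index: Sequence[Sequence[int]],
    num_source_nodes: int,
    num_target_nodes: int,
) -> bool:
    """Return True if edge_index is a valid (2, num_edges) endpoint list.

    Hypothetical helper, not part of AutoGL. Assumes row 0 holds source
    indices and row 1 holds target indices.
    """
    if len(edge_index) != 2:
        return False
    sources, targets = edge_index
    if len(sources) != len(targets):
        return False
    return (
        all(0 <= s < num_source_nodes for s in sources)
        and all(0 <= t < num_target_nodes for t in targets)
    )


# Three 'author' -> 'paper' edges with 1024 authors and 2048 papers,
# mirroring the node counts used in the heterogeneous example.
edges = [[0, 5, 1023], [7, 2047, 12]]
is_valid = check_edge_index(edges, num_source_nodes=1024, num_target_nodes=2048)

# An author index of 1024 is out of range when there are only 1024 authors.
is_invalid = check_edge_index([[1024], [0]], num_source_nodes=1024, num_target_nodes=2048)
```

For a homogeneous graph, the same check applies with ``num_source_nodes`` equal to ``num_target_nodes``.
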
Supported datasets
------------------

AutoGL currently supports the following benchmarks for different tasks:

Semi-supervised node classification: Cora, Citeseer, Pubmed, Amazon Computers, Amazon Photo, Coauthor CS, Coauthor Physics, Reddit, etc.

+------------------+------------+-----------+--------------------------------+
| Dataset          | PyG        | DGL       | Default train/val/test split   |
+==================+============+===========+================================+
| Cora             | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Citeseer         | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Pubmed           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Amazon Computers | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Amazon Photo     | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor CS      | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor Physics | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Reddit           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-products    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-proteins    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-arxiv       | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-papers100M  | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+

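For datasets that the table lists without a default train/val/test split (for example Amazon Computers), a split has to be chosen before training. The following is a minimal, framework-free sketch of a seeded random split over node indices. ``random_split_indices`` and its default ratios are illustrative assumptions, not AutoGL's own API; AutoGL may provide its own utilities for this purpose.

```python
import random
from typing import List, Tuple


def random_split_indices(
    num_nodes: int,
    train_ratio: float = 0.6,
    val_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle node indices and cut them into train/val/test parts.

    Hypothetical helper for illustration; the ratios are arbitrary defaults.
    """
    if train_ratio < 0 or val_ratio < 0 or train_ratio + val_ratio > 1.0:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    num_train = int(num_nodes * train_ratio)
    num_val = int(num_nodes * val_ratio)
    train = indices[:num_train]
    val = indices[num_train:num_train + num_val]
    test = indices[num_train + num_val:]  # whatever remains goes to test
    return train, val, test


train_idx, val_idx, test_idx = random_split_indices(1000)
```

Fixing the seed keeps the split identical across runs, which matters when results on these datasets are compared.
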
Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, etc.

+-------------+------------+------------+--------------+------------+--------------------+
| Dataset     | PyG        | DGL        | Node Feature | Label      | Edge Features      |
+=============+============+============+==============+============+====================+
| MUTAG       | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Binary | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Multi  | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| PROTEINS    | ✓          | ✓          | ✓            | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| COLLAB      | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molhiv | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molpcba| ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-ppa    | ✓          | ✓          |              | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-code2  | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+

Link prediction: At present, AutoGL reuses the homogeneous graph datasets intended for node classification to conduct automatic link prediction.

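Link prediction on a graph that was built for node classification generally needs both positive and negative edge examples. A common, generic recipe is to hold out a fraction of the existing edges as positives and to sample the same number of unconnected node pairs as negatives. The framework-free sketch below illustrates that recipe under stated assumptions: ``split_edges_with_negatives`` is a hypothetical helper, edges are treated as directed pairs, and nothing here is claimed to be how AutoGL implements link prediction internally.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def split_edges_with_negatives(
    edges: List[Edge],
    num_nodes: int,
    test_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Edge], List[Edge], List[Edge]]:
    """Hold out test edges and sample an equal number of non-edges.

    Hypothetical helper, not AutoGL's implementation. Edges are treated
    as directed pairs, and self-loops are never sampled as negatives.
    """
    rng = random.Random(seed)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    num_test = int(len(shuffled) * test_ratio)
    test_pos = shuffled[:num_test]
    train_pos = shuffled[num_test:]

    existing: Set[Edge] = set(edges)
    # Guard against an endless sampling loop on very dense graphs.
    num_possible = num_nodes * (num_nodes - 1)
    num_existing = sum(1 for (u, v) in existing if u != v)
    if num_test > num_possible - num_existing:
        raise ValueError("not enough non-edges to sample negatives")

    negatives: Set[Edge] = set()
    while len(negatives) < num_test:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in existing:
            negatives.add((u, v))
    return train_pos, test_pos, sorted(negatives)


# A 6-node directed ring as a toy graph.
ring = [(i, (i + 1) % 6) for i in range(6)]
train_pos, test_pos, test_neg = split_edges_with_negatives(
    ring, num_nodes=6, test_ratio=0.5
)
```
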
Construct a custom dataset from instances of GeneralStaticGraph
---------------------------------------------------------------

The following example shows how to compose a custom dataset from a sequence of ``GeneralStaticGraph`` instances.

.. code-block:: python

    from autogl.data import InMemoryDataset

    ''' Suppose graphs is a sequence of instances of GeneralStaticGraph '''
    graphs = [ ... ]
    custom_dataset = InMemoryDataset(graphs)