
Revise tutorial for backend and datasets

develop/0.4/predevelop
CoreLeader 4 years ago
parent
commit
dde3d3aef5
4 changed files with 221 additions and 134 deletions
  1. docs/docfile/tutorial/t_backend-cn.rst (+33, -0)
  2. docs/docfile/tutorial/t_backend.rst (+2, -2)
  3. docs/docfile/tutorial/t_dataset-cn.rst (+100, -0)
  4. docs/docfile/tutorial/t_dataset.rst (+86, -132)

docs/docfile/tutorial/t_backend-cn.rst (+33, -0)

@@ -0,0 +1,33 @@
.. _backend:

Backend Support
===============

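Currently, AutoGL supports using PyTorch-Geometric or Deep Graph Library as the backend, so that users familiar with either one can benefit from automated graph learning.

To select a specific backend, declare it with the environment variable ``AUTOGL_BACKEND``, for example: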

.. code-block:: shell

    AUTOGL_BACKEND=pyg python xxx.py

or

.. code-block:: python

    import os
    # Declare the backend first, then import autogl
    os.environ["AUTOGL_BACKEND"] = "pyg"
    import autogl

    ...


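If the environment variable ``AUTOGL_BACKEND`` is not declared, AutoGL chooses a backend automatically according to the graph learning libraries installed in the user's Python environment.
If both PyTorch-Geometric and Deep Graph Library are installed, Deep Graph Library is used as the default backend.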
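
The selection rule described above can be sketched in a few lines of plain Python. This is only an illustration of the stated priority (declared variable first, then Deep Graph Library, then PyTorch-Geometric), not AutoGL's actual implementation. The function ``resolve_backend``, the mapping ``BACKEND_MODULES`` and the backend name ``"dgl"`` are assumptions made for this sketch; only ``"pyg"`` appears in the tutorial itself.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

# Assumed mapping from backend name to the Python module that provides it.
BACKEND_MODULES = {"pyg": "torch_geometric", "dgl": "dgl"}


def _module_available(module_name: str) -> bool:
    """A module counts as installed if Python can locate it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_backend(
    environ: Optional[Mapping[str, str]] = None,
    is_installed: Callable[[str], bool] = _module_available,
) -> str:
    """Return the declared backend, or fall back to DGL first, then PyG."""
    environ = os.environ if environ is None else environ
    declared = environ.get("AUTOGL_BACKEND")
    if declared:
        name = declared.lower()
        if name not in BACKEND_MODULES:
            raise ValueError(f"unknown backend: {declared!r}")
        return name
    # Nothing declared: prefer DGL when both libraries are installed.
    for name in ("dgl", "pyg"):
        if is_installed(BACKEND_MODULES[name]):
            return name
    raise ImportError("neither Deep Graph Library nor PyTorch-Geometric was found")
```

Passing the environment and the installation probe as arguments keeps the rule easy to check without touching the real environment, for example ``resolve_backend({}, lambda module: True)``.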

The backend currently in use can be obtained programmatically:

.. code-block:: python

    from autogl.backend import DependentBackend
    print(DependentBackend.get_backend_name())

docs/docfile/tutorial/t_backend.rst (+2, -2)

@@ -9,13 +9,13 @@ enable users from both end benefiting the automation of graph learning.
To specify one specific backend, you can declare the backend using environment variables
``AUTOGL_BACKEND``. For example:


.. code-block :: shell
.. code-block:: shell


    AUTOGL_BACKEND=pyg python xxx.py


or


.. code-block :: python
.. code-block:: python


    import os
    os.environ["AUTOGL_BACKEND"] = "pyg"


docs/docfile/tutorial/t_dataset-cn.rst (+100, -0)

@@ -0,0 +1,100 @@
.. _dataset:

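AutoGL Dataset
==============

We provide a variety of commonly used datasets based on graph learning libraries such as PyTorch-Geometric (PyG), Deep Graph Library (DGL), and Open Graph Benchmark (OGB).
In addition, users can define custom static homogeneous and heterogeneous graphs with ``GeneralStaticGraph``, the unified static graph container provided by AutoGL, for example: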

.. code-block:: python

    import torch
    from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator

    # Create a homogeneous graph
    custom_static_homogeneous_graph = GeneralStaticGraphGenerator.create_homogeneous_static_graph(
        {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556))
    )

    # Create a heterogeneous graph
    custom_static_heterogeneous_graph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph(
        {
            'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)},
            'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)}
        },
        {
            ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)),
            ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)),
        }
    )


Supported datasets
------------------
AutoGL currently provides the following commonly used benchmark datasets:

Semi-supervised node classification:

+------------------+------------+-----------+--------------------------------+
| Dataset          | PyG        | DGL       | default train/val/test split   |
+==================+============+===========+================================+
| Cora             | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Citeseer         | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Pubmed           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Amazon Computers | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Amazon Photo     | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor CS      | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor Physics | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Reddit           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-products    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-proteins    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-arxiv       | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-papers100M  | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+


Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, etc.

+-------------+------------+------------+--------------+------------+--------------------+
| Dataset     | PyG        | DGL        | Node Feature | Label      | Edge Features      |
+=============+============+============+==============+============+====================+
| MUTAG       | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Binary | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Multi  | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| PROTEINS    | ✓          | ✓          | ✓            | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| COLLAB      | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molhiv | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molpcba| ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-ppa    | ✓          | ✓          |              | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-code2  | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+


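Link prediction: AutoGL can currently perform automatic link prediction on the various graph datasets provided for node classification.

Construct a custom dataset from a sequence of GeneralStaticGraph
----------------------------------------------------------------
The following code snippet shows how to build a custom dataset from a sequence of ``GeneralStaticGraph`` instances.

.. code-block:: python

    from autogl.data import InMemoryDataset
    # graphs is a sequence of GeneralStaticGraph instances
    graphs = [ ... ]
    custom_dataset = InMemoryDataset(graphs)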

docs/docfile/tutorial/t_dataset.rst (+86, -132)

@@ -3,144 +3,98 @@
AutoGL Dataset
==============


We import the module of datasets from `CogDL` and `PyTorch Geometric` and add support for datasets from `OGB`. One can refer to the usage of creating and building datasets via the tutorial of `CogDL`_, `PyTorch Geometric`_, and `OGB`_.
We provide various common datasets based on ``PyTorch-Geometric``, ``Deep Graph Library`` and ``OGB``.
In addition, users can leverage ``GeneralStaticGraph``, a unified abstraction provided by AutoGL that represents both static homogeneous graphs and static heterogeneous graphs.


.. _CogDL: https://cogdl.readthedocs.io/en/latest/tutorial.html
.. _PyTorch Geometric: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html
.. _OGB: https://ogb.stanford.edu/docs/dataset_overview/


A basic example of constructing an instance of ``GeneralStaticGraph`` is shown below.

.. code-block:: python

    import torch
    from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator

    # Construct a custom homogeneous graph
    custom_static_homogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_homogeneous_static_graph(
        {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556))
    )

    # Construct a custom heterogeneous graph
    custom_static_heterogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph(
        {
            'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)},
            'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)}
        },
        {
            ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)),
            ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)),
        }
    )
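
The heterogeneous example keys node data by node type and edge data by a (source type, relation, target type) triple. The sketch below mirrors that layout with plain Python lists and checks that every edge endpoint refers to an existing node of the right type. The helper ``check_hetero_edges`` and the list-based layout are assumptions made for illustration and are not part of the AutoGL API.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]  # (source node type, relation, target node type)


def check_hetero_edges(
    num_nodes: Dict[str, int],
    edges: Dict[EdgeType, Tuple[List[int], List[int]]],
) -> Dict[EdgeType, int]:
    """Return the edge count per edge type, raising if an endpoint is out of range."""
    counts = {}
    for (src_type, relation, dst_type), (src_ids, dst_ids) in edges.items():
        if len(src_ids) != len(dst_ids):
            raise ValueError(f"{relation}: source and target lists differ in length")
        if any(not 0 <= i < num_nodes[src_type] for i in src_ids):
            raise ValueError(f"{relation}: source index outside '{src_type}' nodes")
        if any(not 0 <= i < num_nodes[dst_type] for i in dst_ids):
            raise ValueError(f"{relation}: target index outside '{dst_type}' nodes")
        counts[(src_type, relation, dst_type)] = len(src_ids)
    return counts


# Same node types and sizes as the tutorial example, with tiny hand-written edge lists.
num_nodes = {"author": 1024, "paper": 2048}
edges = {
    ("author", "writing", "paper"): ([0, 1, 2], [10, 11, 2047]),
    ("author", "reading", "paper"): ([5, 5], [0, 1]),
}
edge_counts = check_hetero_edges(num_nodes, edges)
```

A check of this kind is useful before building a graph from randomly generated indices, since an index beyond the node count of its type would otherwise only surface later during training.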


Supporting datasets
-------------------
AutoGL now supports the following benchmarks for different tasks:


Semi-supervised node classification: Cora, Citeseer, Pubmed, Amazon Computers\*, Amazon Photo\*, Coauthor CS\*, Coauthor Physics\*, Reddit (\*: using `utils.random_splits_mask_class` for splitting dataset is recommended.).
For detailed information for supporting datasets, please kindly refer to `PyTorch Geometric Dataset`_.

.. _PyTorch Geometric Dataset: https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html

+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Dataset          | PyG        | CogDL     | x          | y          | edge_index| edge_attr  | train/val/test node| train/val/test mask |
+==================+============+===========+============+============+===========+============+====================+=====================+
| Cora             | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    | ✓                   |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Citeseer         | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    | ✓                   |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Pubmed           | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    | ✓                   |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Amazon Computers | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    |                     |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Amazon Photo     | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    |                     |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Coauthor CS      | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    |                     |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Coauthor Physics | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    |                     |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Reddit           | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    | ✓                   |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+

Graph classification: MUTAG, IMDB-B, IMDB-M, PROTEINS, COLLAB

+-----------+------------+------------+-----------+------------+------------+-----------+
| Dataset   | PyG        | CogDL      | x         | y          | edge_index | edge_attr |
+===========+============+============+===========+============+============+===========+
| MUTAG     | ✓          |            | ✓         | ✓          | ✓          | ✓         |
+-----------+------------+------------+-----------+------------+------------+-----------+
| IMDB-B    | ✓          |            |           | ✓          | ✓          |           |
+-----------+------------+------------+-----------+------------+------------+-----------+
| IMDB-M    | ✓          |            |           | ✓          | ✓          |           |
+-----------+------------+------------+-----------+------------+------------+-----------+
| PROTEINS  | ✓          |            | ✓         | ✓          | ✓          |           |
+-----------+------------+------------+-----------+------------+------------+-----------+
| COLLAB    | ✓          |            |           | ✓          | ✓          |           |
+-----------+------------+------------+-----------+------------+------------+-----------+

TODO: Supporting all datasets from `PyTorch Geometric`.

OGB datasets
------------
AutoGL also supports the popular `OGB` benchmarks for node classification and graph classification tasks. For a summary of the `OGB` datasets, please refer to their `docs`_.

.. _docs: https://ogb.stanford.edu/docs/nodeprop/

Since the loss and evaluation metric used for `OGB` datasets vary among different tasks, we also add `string` properties of datasets for identification:

+-----------------+----------------+-------------------+
| Dataset         | dataset.metric | datasets.loss     |
+=================+================+===================+
| ogbn-products   | Accuracy       | nll_loss          |
+-----------------+----------------+-------------------+
| ogbn-proteins   | ROC-AUC        | BCEWithLogitsLoss |
+-----------------+----------------+-------------------+
| ogbn-arxiv      | Accuracy       | nll_loss          |
+-----------------+----------------+-------------------+
| ogbn-papers100M | Accuracy       | nll_loss          |
+-----------------+----------------+-------------------+
| ogbn-mag        | Accuracy       | nll_loss          |
+-----------------+----------------+-------------------+
| ogbg-molhiv     | ROC-AUC        | BCEWithLogitsLoss |
+-----------------+----------------+-------------------+
| ogbg-molpcba    | AP             | BCEWithLogitsLoss |
+-----------------+----------------+-------------------+
| ogbg-ppa        | Accuracy       | CrossEntropyLoss  |
+-----------------+----------------+-------------------+
| ogbg-code       | F1 score       | CrossEntropyLoss  |
+-----------------+----------------+-------------------+


Create a dataset via URL
------------------------

If your dataset has the same format as the 'ppi' dataset, which contains two matrices, 'network' and 'group', you can register it directly using the code below. The default root for downloaded datasets is `~/.cache-autogl`; you can also specify the root by passing a string as the `path` argument of `build_dataset(args, path)` or `build_dataset_from_name(dataset, path)`.

.. code-block:: python

    # following code-snippet is from autogl/datasets/matlab_matrix.py

    @register_dataset("ppi")
    class PPIDataset(MatlabMatrix):
        def __init__(self, path):
            dataset, filename = "ppi", "Homo_sapiens"
            url = "http://snap.stanford.edu/node2vec/"
            super(PPIDataset, self).__init__(path, filename, url)

You should declare the name of the dataset, the name of the file, and the URL, where our script can download the resource. Then you can use either `build_dataset(args, path)` or `build_dataset_from_name(dataset, path)` in your task to build a dataset with corresponding parameters.

Create a dataset locally
------------------------

If you want to test your local dataset, we recommend you to refer to the docs on `creating PyTorch Geometric dataset`_.

.. _creating PyTorch Geometric dataset: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html


You can inherit from `torch_geometric.data.InMemoryDataset` to create an empty `dataset`, create `torch_geometric.data.Data` objects for your data, collect them in a regular Python list, and then pass that list to `torch_geometric.data.Dataset` or `torch_geometric.data.DataLoader`.
Let’s see this process in a simplified example:
Semi-supervised node classification: Cora, Citeseer, Pubmed, Amazon Computers, Amazon Photo, Coauthor CS, Coauthor Physics, Reddit, etc.

+------------------+------------+-----------+--------------------------------+
| Dataset          | PyG        | DGL       | default train/val/test split   |
+==================+============+===========+================================+
| Cora             | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Citeseer         | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Pubmed           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Amazon Computers | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Amazon Photo     | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor CS      | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor Physics | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Reddit           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-products    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-proteins    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-arxiv       | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-papers100M  | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
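
The datasets in this table without a default train/val/test split (Amazon Computers, Amazon Photo, Coauthor CS, Coauthor Physics) need a split before training. One common option is a random split with a fixed number of training and validation nodes per class. The sketch below shows that idea in plain Python. The function ``random_class_split`` and its parameters are hypothetical and only illustrate the technique; they are not AutoGL's own splitting utility.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def random_class_split(
    labels: Sequence[int],
    train_per_class: int,
    val_per_class: int,
    seed: int = 0,
) -> Dict[str, List[bool]]:
    """Build boolean train/val/test masks with a fixed node budget per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for node, label in enumerate(labels):
        by_class[label].append(node)

    masks = {name: [False] * len(labels) for name in ("train", "val", "test")}
    for nodes in by_class.values():
        rng.shuffle(nodes)
        for position, node in enumerate(nodes):
            if position < train_per_class:
                masks["train"][node] = True
            elif position < train_per_class + val_per_class:
                masks["val"][node] = True
            else:
                # Every remaining node of the class goes to the test set.
                masks["test"][node] = True
    return masks


labels = [0, 1, 2] * 10  # 30 nodes, 10 per class
masks = random_class_split(labels, train_per_class=2, val_per_class=3)
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing models on datasets that ship without an official split.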

Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, etc.

+-------------+------------+------------+--------------+------------+--------------------+
| Dataset     | PyG        | DGL        | Node Feature | Label      | Edge Features      |
+=============+============+============+==============+============+====================+
| MUTAG       | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Binary | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Multi  | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| PROTEINS    | ✓          | ✓          | ✓            | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| COLLAB      | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molhiv | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molpcba| ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-ppa    | ✓          | ✓          |              | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-code2  | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+

Link prediction: At present, AutoGL performs automatic link prediction on the homogeneous graphs provided for node classification.
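
Reusing a node classification graph for link prediction usually means holding out part of its edges as positive examples and sampling non-edges as negatives. The sketch below shows that preparation step in plain Python. The functions ``split_edges`` and ``sample_negative_edges`` and the chosen ratios are assumptions for illustration; AutoGL's actual procedure may differ.

```python
import random
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def split_edges(edges: List[Edge], val_ratio: float, test_ratio: float, seed: int = 0):
    """Shuffle the edges and split them into train/val/test positive sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def sample_negative_edges(num_nodes: int, existing: Iterable[Edge], count: int, seed: int = 0) -> List[Edge]:
    """Draw distinct node pairs that are not edges (undirected, no self-loops)."""
    known = {(min(u, v), max(u, v)) for u, v in existing}
    available = num_nodes * (num_nodes - 1) // 2 - len(known)
    if count > available:
        raise ValueError("not enough non-edges to sample from")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < count:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        if u != v and pair not in known:
            negatives.add(pair)
    return sorted(negatives)


edges = [(i, i + 1) for i in range(9)]  # a path graph on 10 nodes
train, val, test = split_edges(edges, val_ratio=0.2, test_ratio=0.2)
negatives = sample_negative_edges(10, edges, count=len(test))
```

Sampling as many negatives as held-out positives keeps the evaluation set balanced, which is a common but not mandatory convention.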

Construct a custom dataset from instances of GeneralStaticGraph
----------------------------------------------------------------
The following example shows how to compose a custom dataset from a sequence of ``GeneralStaticGraph`` instances.


.. code-block:: python

    from typing import Iterable
    from torch_geometric.data.data import Data
    from autogl.datasets import build_dataset_from_name
    from torch_geometric.data import InMemoryDataset

    class MyDataset(InMemoryDataset):
        def __init__(self, datalist) -> None:
            super().__init__()
            self.data, self.slices = self.collate(datalist)

    # Create your own Data objects

    # For example, if you have edge_index, features and labels,
    # you can create a Data object as follows.
    # See the PyTorch Geometric docs for more information on Data.
    data = Data()
    data.edge_index = edge_index
    data.x = features
    data.y = labels

    # Create a list of Data objects
    data_list = [data, Data(...), ..., Data(...)]

    # Initialize AutoGL Dataset with your own data
    myData = MyDataset(data_list)
    from autogl.data import InMemoryDataset
    # Suppose graphs is a sequence of GeneralStaticGraph instances
    graphs = [ ... ]
    custom_dataset = InMemoryDataset(graphs)
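
Conceptually, a dataset built this way behaves like a read-only sequence of graphs held in memory. The sketch below illustrates that idea with a hypothetical ``SimpleInMemoryDataset``. It is not AutoGL's ``InMemoryDataset`` and only shows the container behaviour one might expect (length, indexing, iteration), with strings standing in for graph objects.

```python
from collections.abc import Sequence
from typing import Iterable, List, TypeVar

GraphT = TypeVar("GraphT")


class SimpleInMemoryDataset(Sequence):
    """Hypothetical stand-in: holds every graph in a list kept in memory."""

    def __init__(self, graphs: Iterable[GraphT]) -> None:
        self._graphs: List[GraphT] = list(graphs)

    def __len__(self) -> int:
        return len(self._graphs)

    def __getitem__(self, index):
        # An integer returns one graph; a slice returns a smaller dataset.
        if isinstance(index, slice):
            return SimpleInMemoryDataset(self._graphs[index])
        return self._graphs[index]


# Strings stand in for GeneralStaticGraph instances in this sketch.
dataset = SimpleInMemoryDataset(["graph_a", "graph_b", "graph_c"])
```

Deriving from ``collections.abc.Sequence`` provides iteration and membership tests from just ``__len__`` and ``__getitem__``, so ``for graph in dataset`` works without extra code.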
