Revise tutorial for backend and datasets

4 years ago · dde3d3aef5
--- a/docs/docfile/tutorial/t_backend-cn.rst
+++ b/docs/docfile/tutorial/t_backend-cn.rst
@@ -0,0 +1,33 @@
 .. _backend:

 Backend Support
 ===============

 Currently, AutoGL supports either PyTorch-Geometric or Deep Graph Library as its backend, so that users familiar with either library can benefit from automated graph learning.

 To specify a particular backend, declare it with the environment variable ``AUTOGL_BACKEND``, for example:

 .. code-block:: shell

    AUTOGL_BACKEND=pyg python xxx.py

 or

 .. code-block:: python

    import os
    os.environ["AUTOGL_BACKEND"] = "pyg"
    import autogl

    ...


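 If ``AUTOGL_BACKEND`` is not declared, AutoGL selects the backend automatically according to the graph learning libraries installed in the user's Python environment.
 If both PyTorch-Geometric and Deep Graph Library are installed, Deep Graph Library is used as the default backend.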
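
 The selection rule above can be summarized with a short, hypothetical sketch. It is not AutoGL's implementation. The helper ``resolve_backend`` is an illustrative name, and the backend values ``"pyg"`` and ``"dgl"`` are assumed. Installed libraries are detected here through ``importlib.util.find_spec`` on the standard module names ``dgl`` and ``torch_geometric``.

 .. code-block:: python

    import importlib.util
    import os
    from typing import Callable, Mapping, Optional


    def _module_installed(name: str) -> bool:
        # find_spec returns None when a top-level module cannot be located
        return importlib.util.find_spec(name) is not None


    def resolve_backend(
        environ: Optional[Mapping[str, str]] = None,
        is_installed: Callable[[str], bool] = _module_installed,
    ) -> str:
        """Hypothetical sketch of the documented backend-selection rule."""
        environ = os.environ if environ is None else environ
        # 1. An explicit AUTOGL_BACKEND declaration takes priority
        declared = environ.get("AUTOGL_BACKEND")
        if declared:
            return declared
        # 2. Otherwise inspect the installed libraries; DGL wins when both are present
        if is_installed("dgl"):
            return "dgl"
        if is_installed("torch_geometric"):
            return "pyg"
        raise RuntimeError("Neither Deep Graph Library nor PyTorch-Geometric is installed")


    # Simulated environments, so the example does not depend on what is installed
    print(resolve_backend({}, lambda name: name in {"dgl", "torch_geometric"}))  # dgl
    print(resolve_backend({}, lambda name: name == "torch_geometric"))           # pyg
    print(resolve_backend({"AUTOGL_BACKEND": "pyg"}, lambda name: True))         # pyg
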

 The backend currently in use can be obtained programmatically:

 .. code-block:: python

    from autogl.backend import DependentBackend
    print(DependentBackend.get_backend_name())
--- a/docs/docfile/tutorial/t_backend.rst
+++ b/docs/docfile/tutorial/t_backend.rst
@@ -9,13 +9,13 @@ enable users from both end benifiting the automation of graph learning.
 To specify a particular backend, declare it with the environment variable
 ``AUTOGL_BACKEND``. For example:

 .. code-block:: shell

    AUTOGL_BACKEND=pyg python xxx.py

 or

 .. code-block:: python

    import os
    os.environ["AUTOGL_BACKEND"] = "pyg"
--- a/docs/docfile/tutorial/t_dataset-cn.rst
+++ b/docs/docfile/tutorial/t_dataset-cn.rst
@@ -0,0 +1,100 @@
 .. _dataset:

 AutoGL Dataset
 ==============

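 We provide a variety of commonly used datasets based on PyTorch-Geometric (PyG), Deep Graph Library (DGL) and Open Graph Benchmark (OGB).
 In addition, users can define custom static homogeneous and heterogeneous graphs with ``GeneralStaticGraph``, the unified static graph container provided by AutoGL, for example: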

 .. code-block:: python

    import torch
    from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator

    # Create a homogeneous graph: node-level tensors keyed by name,
    # plus edges given as an index tensor of shape (2, num_edges)
    custom_static_homogeneous_graph = GeneralStaticGraphGenerator.create_homogeneous_static_graph(
        {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556))
    )

    # Create a heterogeneous graph: node data grouped by node type,
    # edges keyed by (source node type, relation, target node type)
    custom_static_heterogeneous_graph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph(
        {
            'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)},
            'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)}
        },
        {
            # edge index tensor together with a per-edge tensor (5120 rows for 5120 edges)
            ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)),
            # edge index tensor only
            ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)),
        }
    )
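
 In the homogeneous example, edges are passed as an index tensor of shape ``(2, num_edges)``. The stdlib sketch below shows how to read that layout. It assumes the common PyG-style COO convention: row 0 holds source node indices and row 1 holds target node indices, each in ``[0, num_nodes)``. Plain lists stand in for tensors, and ``coo_to_adjacency`` is a hypothetical helper rather than part of AutoGL.

 .. code-block:: python

    from typing import Dict, List, Sequence


    def coo_to_adjacency(edge_index: Sequence[Sequence[int]], num_nodes: int) -> Dict[int, List[int]]:
        """Convert a 2 x num_edges COO edge list into an out-neighbour adjacency list."""
        if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
            raise ValueError("edge_index must have two rows of equal length")
        adjacency: Dict[int, List[int]] = {node: [] for node in range(num_nodes)}
        # Column j describes one directed edge: edge_index[0][j] -> edge_index[1][j]
        for src, dst in zip(edge_index[0], edge_index[1]):
            if not (0 <= src < num_nodes and 0 <= dst < num_nodes):
                raise ValueError(f"edge ({src}, {dst}) refers to a node outside [0, {num_nodes})")
            adjacency[src].append(dst)
        return adjacency


    edge_index = [[0, 0, 1, 2], [1, 2, 2, 0]]
    print(coo_to_adjacency(edge_index, num_nodes=3))  # {0: [1, 2], 1: [2], 2: [0]}

 Under this convention, the example above draws edge indices from ``[0, 1024)`` while declaring 2708 nodes. That is still valid: nodes that no edge refers to are simply isolated.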


 Supported datasets
 ------------------
 AutoGL currently provides the following commonly used benchmark datasets:

 Semi-supervised node classification:

 +------------------+------------+-----------+--------------------------------+
 |  Dataset         |  PyG       |  DGL      |  default train/val/test split  |
 +==================+============+===========+================================+
 | Cora             | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | Citeseer         | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | Pubmed           | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | Amazon Computers | ✓          | ✓         |                                |
 +------------------+------------+-----------+--------------------------------+
 | Amazon Photo     | ✓          | ✓         |                                |
 +------------------+------------+-----------+--------------------------------+
 | Coauthor CS      | ✓          | ✓         |                                |
 +------------------+------------+-----------+--------------------------------+
 | Coauthor Physics | ✓          | ✓         |                                |
 +------------------+------------+-----------+--------------------------------+
 | Reddit           | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | ogbn-products    | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | ogbn-proteins    | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | ogbn-arxiv       | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | ogbn-papers100M  | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
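
 Datasets without a default split (Amazon Computers, Amazon Photo, Coauthor CS and Coauthor Physics) need a split generated by the user. One common choice is a class-balanced random split. It draws a fixed number of nodes per class for training and validation, and the remaining nodes form the test set. The stdlib sketch below illustrates that idea. ``random_split_per_class`` and its parameters are hypothetical names, not AutoGL utilities, and the split AutoGL applies internally may differ.

 .. code-block:: python

    import random
    from collections import defaultdict
    from typing import Dict, List, Sequence, Tuple


    def random_split_per_class(
        labels: Sequence[int], train_per_class: int, val_per_class: int, seed: int = 0
    ) -> Tuple[List[int], List[int], List[int]]:
        """Return sorted (train, val, test) node indices with a fixed number of nodes per class."""
        rng = random.Random(seed)  # local RNG keeps the split reproducible
        nodes_by_class: Dict[int, List[int]] = defaultdict(list)
        for node, label in enumerate(labels):
            nodes_by_class[label].append(node)

        train: List[int] = []
        val: List[int] = []
        test: List[int] = []
        for label in sorted(nodes_by_class):
            nodes = nodes_by_class[label]
            if len(nodes) < train_per_class + val_per_class:
                raise ValueError(f"class {label} has only {len(nodes)} nodes")
            rng.shuffle(nodes)
            train += nodes[:train_per_class]
            val += nodes[train_per_class:train_per_class + val_per_class]
            test += nodes[train_per_class + val_per_class:]  # all remaining nodes
        return sorted(train), sorted(val), sorted(test)


    labels = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
    train_idx, val_idx, test_idx = random_split_per_class(labels, train_per_class=1, val_per_class=1)
    print(len(train_idx), len(val_idx), len(test_idx))  # 3 3 4
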


 Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, etc.

 +-------------+------------+------------+--------------+------------+--------------------+
 |  Dataset    |  PyG       |  DGL       | Node Feature | Label      |  Edge Features     |
 +=============+============+============+==============+============+====================+
 | MUTAG       | ✓          | ✓          |  ✓           | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
 | IMDB-Binary | ✓          | ✓          |              | ✓          |                    |
 +-------------+------------+------------+--------------+------------+--------------------+
 | IMDB-Multi  | ✓          | ✓          |              | ✓          |                    |
 +-------------+------------+------------+--------------+------------+--------------------+
 | PROTEINS    | ✓          | ✓          |  ✓           | ✓          |                    |
 +-------------+------------+------------+--------------+------------+--------------------+
 | COLLAB      | ✓          | ✓          |              | ✓          |                    |
 +-------------+------------+------------+--------------+------------+--------------------+
 | ogbg-molhiv | ✓          | ✓          |  ✓           | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
 | ogbg-molpcba| ✓          | ✓          |  ✓           | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
 | ogbg-ppa    | ✓          | ✓          |              | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
 | ogbg-code2  | ✓          | ✓          |  ✓           | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
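
 Several graph classification datasets above (IMDB-Binary, IMDB-Multi, COLLAB) provide no node features. A common workaround is to use a one-hot encoding of each node's degree as its feature. PyTorch-Geometric also offers this as the ``OneHotDegree`` transform. The stdlib sketch below illustrates the idea with plain lists. ``one_hot_degree_features`` is a hypothetical helper. Whether AutoGL applies such a transform automatically is not covered here. Clipping large degrees into the last bucket is a choice made only for this sketch.

 .. code-block:: python

    from typing import List, Sequence


    def one_hot_degree_features(
        edge_index: Sequence[Sequence[int]], num_nodes: int, max_degree: int
    ) -> List[List[float]]:
        """One-hot encode the (out-)degree of every node in a featureless graph."""
        degrees = [0] * num_nodes
        for src in edge_index[0]:  # count the edges leaving each node
            degrees[src] += 1
        features: List[List[float]] = []
        for degree in degrees:
            row = [0.0] * (max_degree + 1)
            # Degrees above max_degree are clipped into the last bucket
            row[min(degree, max_degree)] = 1.0
            features.append(row)
        return features


    # Undirected path 0 - 1 - 2, stored with both edge directions
    print(one_hot_degree_features([[0, 1, 1, 2], [1, 0, 2, 1]], num_nodes=3, max_degree=2))
    # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
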


 Link prediction: at present, AutoGL can conduct automatic link prediction on the various graph datasets provided for node classification.

 Construct a custom dataset from GeneralStaticGraph instances
 ------------------------------------------------------------
 The following code snippet shows how to build a custom dataset from a sequence of ``GeneralStaticGraph`` instances.

 .. code-block:: python

    from autogl.data import InMemoryDataset

    # `graphs` is a sequence of GeneralStaticGraph instances
    graphs = [ ... ]
    custom_dataset = InMemoryDataset(graphs)
--- a/docs/docfile/tutorial/t_dataset.rst
+++ b/docs/docfile/tutorial/t_dataset.rst
@@ -3,144 +3,98 @@
 AutoGL Dataset
 ==============

 We provide a variety of commonly used datasets based on ``PyTorch-Geometric``, ``Deep Graph Library`` and ``OGB``.
 In addition, users can leverage ``GeneralStaticGraph``, a unified abstraction provided by AutoGL that covers both static homogeneous graphs and static heterogeneous graphs.

 The following basic example constructs one homogeneous and one heterogeneous instance of ``GeneralStaticGraph``.

 .. code-block:: python

    import torch
    from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator

    # Construct a custom homogeneous graph: node-level tensors keyed by name,
    # plus edges given as an index tensor of shape (2, num_edges)
    custom_static_homogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_homogeneous_static_graph(
        {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556))
    )

    # Construct a custom heterogeneous graph: node data grouped by node type,
    # edges keyed by (source node type, relation, target node type)
    custom_static_heterogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph(
        {
            'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)},
            'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)}
        },
        {
            # edge index tensor together with a per-edge tensor (5120 rows for 5120 edges)
            ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)),
            # edge index tensor only
            ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)),
        }
    )
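
 In the heterogeneous example, each edge type is keyed by a ``(source node type, relation, target node type)`` triple. The stdlib sketch below checks such a specification before a graph is built. It assumes the PyG-style convention: row 0 of an edge type's index tensor refers to nodes of the source type, and row 1 refers to nodes of the target type. Plain lists stand in for tensors, and a Python ``tuple`` marks the ``(edge_index, per-edge data)`` form. ``check_heterogeneous_edges`` is a hypothetical helper rather than part of AutoGL.

 .. code-block:: python

    from typing import Dict, Sequence, Tuple, Union

    EdgeIndex = Sequence[Sequence[int]]
    EdgeSpec = Union[EdgeIndex, Tuple[EdgeIndex, Sequence[Sequence[float]]]]


    def check_heterogeneous_edges(
        num_nodes: Dict[str, int], edges: Dict[Tuple[str, str, str], EdgeSpec]
    ) -> None:
        """Raise ValueError if an edge type does not match the declared node types and counts."""
        for (src_type, relation, dst_type), spec in edges.items():
            # A tuple carries (edge_index, per-edge data); anything else is a bare edge_index
            edge_index, edge_data = spec if isinstance(spec, tuple) else (spec, None)
            sources, targets = edge_index
            if len(sources) != len(targets):
                raise ValueError(f"{relation}: source and target rows differ in length")
            for node_type, row in ((src_type, sources), (dst_type, targets)):
                if node_type not in num_nodes:
                    raise ValueError(f"{relation}: unknown node type {node_type!r}")
                if any(not 0 <= index < num_nodes[node_type] for index in row):
                    raise ValueError(f"{relation}: index out of range for {node_type!r}")
            if edge_data is not None and len(edge_data) != len(sources):
                raise ValueError(f"{relation}: expected {len(sources)} rows of per-edge data")


    num_nodes = {"author": 3, "paper": 2}
    check_heterogeneous_edges(num_nodes, {
        ("author", "writing", "paper"): ([[0, 2], [1, 0]], [[0.5], [0.1]]),
        ("author", "reading", "paper"): [[1], [1]],
    })  # passes silently
    try:
        check_heterogeneous_edges(num_nodes, {("author", "reading", "paper"): [[0], [2]]})
    except ValueError as error:
        print(error)  # reading: index out of range for 'paper'
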

 Supported datasets
 ------------------
 AutoGL now supports the following benchmarks for different tasks:

 Semi-supervised node classification: Cora, Citeseer, Pubmed, Amazon Computers, Amazon Photo, Coauthor CS, Coauthor Physics, Reddit, etc.

 +------------------+------------+-----------+--------------------------------+
 |  Dataset         |  PyG       |  DGL      |  default train/val/test split  |
 +==================+============+===========+================================+
 | Cora             | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | Citeseer         | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | Pubmed           | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | Amazon Computers | ✓          | ✓         |                                |
 +------------------+------------+-----------+--------------------------------+
 | Amazon Photo     | ✓          | ✓         |                                |
 +------------------+------------+-----------+--------------------------------+
 | Coauthor CS      | ✓          | ✓         |                                |
 +------------------+------------+-----------+--------------------------------+
 | Coauthor Physics | ✓          | ✓         |                                |
 +------------------+------------+-----------+--------------------------------+
 | Reddit           | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | ogbn-products    | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | ogbn-proteins    | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | ogbn-arxiv       | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+
 | ogbn-papers100M  | ✓          | ✓         | ✓                              |
 +------------------+------------+-----------+--------------------------------+

 Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, etc.

 +-------------+------------+------------+--------------+------------+--------------------+
 |  Dataset    |  PyG       |  DGL       | Node Feature | Label      |  Edge Features     |
 +=============+============+============+==============+============+====================+
 | MUTAG       | ✓          | ✓          |  ✓           | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
 | IMDB-Binary | ✓          | ✓          |              | ✓          |                    |
 +-------------+------------+------------+--------------+------------+--------------------+
 | IMDB-Multi  | ✓          | ✓          |              | ✓          |                    |
 +-------------+------------+------------+--------------+------------+--------------------+
 | PROTEINS    | ✓          | ✓          |  ✓           | ✓          |                    |
 +-------------+------------+------------+--------------+------------+--------------------+
 | COLLAB      | ✓          | ✓          |              | ✓          |                    |
 +-------------+------------+------------+--------------+------------+--------------------+
 | ogbg-molhiv | ✓          | ✓          |  ✓           | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
 | ogbg-molpcba| ✓          | ✓          |  ✓           | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
 | ogbg-ppa    | ✓          | ✓          |              | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+
 | ogbg-code2  | ✓          | ✓          |  ✓           | ✓          | ✓                  |
 +-------------+------------+------------+--------------+------------+--------------------+

 Link prediction: at present, AutoGL conducts automatic link prediction on the various homogeneous graphs provided for node classification.
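
 Reusing a node classification graph for link prediction typically works in two steps. First, hold out a fraction of its edges as positive validation and test examples. Second, sample an equal number of unconnected node pairs as negatives. The stdlib sketch below shows one such procedure for an undirected graph. It illustrates the idea under those assumptions and is not AutoGL's actual splitting code. ``split_edges_for_link_prediction`` and its ratio parameters are hypothetical.

 .. code-block:: python

    import random
    from typing import Dict, List, Sequence, Set, Tuple

    Edge = Tuple[int, int]


    def split_edges_for_link_prediction(
        edges: Sequence[Edge], num_nodes: int, val_ratio: float, test_ratio: float, seed: int = 0
    ) -> Dict[str, List[Edge]]:
        """Hold out positive edges and sample as many negative node pairs for val/test."""
        rng = random.Random(seed)  # local RNG keeps the split reproducible
        # Undirected graph: keep one canonical (smaller, larger) copy of each edge, drop self-loops
        positives = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
        rng.shuffle(positives)
        num_val = int(len(positives) * val_ratio)
        num_test = int(len(positives) * test_ratio)

        existing = set(positives)
        num_free_pairs = num_nodes * (num_nodes - 1) // 2 - len(existing)
        if num_val + num_test > num_free_pairs:
            raise ValueError("not enough unconnected node pairs to sample negatives")
        negatives: Set[Edge] = set()
        while len(negatives) < num_val + num_test:
            u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes
            pair = (min(u, v), max(u, v))
            if pair not in existing:
                negatives.add(pair)
        negative_list = sorted(negatives)
        rng.shuffle(negative_list)

        return {
            # Only train_pos should remain in the graph used for message passing during training
            "train_pos": positives[num_val + num_test:],
            "val_pos": positives[:num_val],
            "test_pos": positives[num_val:num_val + num_test],
            "val_neg": negative_list[:num_val],
            "test_neg": negative_list[num_val:],
        }


    split = split_edges_for_link_prediction(
        [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4)], num_nodes=5, val_ratio=0.2, test_ratio=0.2
    )
    print({name: len(pairs) for name, pairs in split.items()})
    # {'train_pos': 4, 'val_pos': 1, 'test_pos': 1, 'val_neg': 1, 'test_neg': 1}

 Removing the held-out positive edges from the training graph prevents the model from trivially seeing the links it is asked to predict.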

 Construct a custom dataset from GeneralStaticGraph instances
 ------------------------------------------------------------
 The following example shows how to compose a custom dataset from a sequence of ``GeneralStaticGraph`` instances.

 .. code-block:: python

    from autogl.data import InMemoryDataset

    # Suppose `graphs` is a sequence of GeneralStaticGraph instances
    graphs = [ ... ]
    custom_dataset = InMemoryDataset(graphs)