
Revise tutorial for backend and datasets

develop/0.4/predevelop
CoreLeader 4 years ago
parent
commit
dde3d3aef5
4 changed files with 221 additions and 134 deletions
  1. +33 -0 docs/docfile/tutorial/t_backend-cn.rst
  2. +2 -2 docs/docfile/tutorial/t_backend.rst
  3. +100 -0 docs/docfile/tutorial/t_dataset-cn.rst
  4. +86 -132 docs/docfile/tutorial/t_dataset.rst

+33 -0 docs/docfile/tutorial/t_backend-cn.rst

@@ -0,0 +1,33 @@
.. _backend:

Backend Support
===============

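Currently, AutoGL supports either PyTorch-Geometric or Deep Graph Library as its backend, so that users familiar with either library can benefit from automated graph learning.

To specify a particular backend, users can declare it through the environment variable ``AUTOGL_BACKEND``, for example: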

.. code-block:: shell

    AUTOGL_BACKEND=pyg python xxx.py

or

.. code-block:: python

    import os
    os.environ["AUTOGL_BACKEND"] = "pyg"
    import autogl

    ...


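If ``AUTOGL_BACKEND`` is not declared, AutoGL selects the backend automatically according to the graph learning libraries installed in the user's Python environment.
If both PyTorch-Geometric and Deep Graph Library are installed, Deep Graph Library is used as the default backend.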
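
The selection rule above can be sketched in plain Python as follows. This is a hypothetical illustration of the documented behaviour, not AutoGL's internal code. The names ``is_installed`` and ``resolve_backend`` are made up. The value ``"dgl"`` for Deep Graph Library is an assumption, since only ``"pyg"`` appears in this tutorial. ``torch_geometric`` and ``dgl`` are the import names of the two libraries.

.. code-block:: python

    import importlib.util
    import os
    from typing import Callable, Mapping, Optional

    # Assumed backend identifiers; only "pyg" is confirmed by this tutorial.
    SUPPORTED = ("dgl", "pyg")


    def is_installed(module_name: str) -> bool:
        """Return True if the module can be located without importing it."""
        return importlib.util.find_spec(module_name) is not None


    def resolve_backend(
        environ: Optional[Mapping[str, str]] = None,
        installed: Callable[[str], bool] = is_installed,
    ) -> str:
        """Hypothetical re-statement of the rule: declaration first, then DGL, then PyG."""
        environ = os.environ if environ is None else environ
        declared = environ.get("AUTOGL_BACKEND")
        if declared:  # an empty value is treated as "not declared"
            if declared not in SUPPORTED:
                raise ValueError(f"unsupported backend: {declared!r}")
            return declared
        # Nothing declared: DGL takes precedence when both libraries are present.
        if installed("dgl"):
            return "dgl"
        if installed("torch_geometric"):
            return "pyg"
        raise RuntimeError("neither Deep Graph Library nor PyTorch-Geometric is installed")

The environment mapping and the installation check are passed in as parameters. This keeps the rule easy to exercise without installing either library.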

The backend currently in use can be obtained programmatically:

.. code-block:: python

    from autogl.backend import DependentBackend
    print(DependentBackend.get_backend_name())

+2 -2 docs/docfile/tutorial/t_backend.rst

@@ -9,13 +9,13 @@ enable users from both ends to benefit from the automation of graph learning.
To specify a particular backend, declare it using the environment variable
``AUTOGL_BACKEND``. For example:

.. code-block :: shell
.. code-block:: shell

    AUTOGL_BACKEND=pyg python xxx.py

or

.. code-block :: python
.. code-block:: python

    import os
    os.environ["AUTOGL_BACKEND"] = "pyg"


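The shell form sets ``AUTOGL_BACKEND`` for a single command. A driver script that launches several runs can get the same effect another way: copy the current environment and override only that variable for each child process. The helper below is a hypothetical sketch built on the standard ``subprocess`` module. ``run_with_backend`` and ``PYTHON`` are illustrative names and are not part of AutoGL.

.. code-block:: python

    import os
    import subprocess
    import sys
    from typing import Sequence

    # Interpreter of the current environment, so the child sees the same packages.
    PYTHON = sys.executable


    def run_with_backend(argv: Sequence[str], backend: str) -> subprocess.CompletedProcess:
        """Run ``argv`` with AUTOGL_BACKEND set for the child process only."""
        env = dict(os.environ)  # a copy: the parent's environment stays unchanged
        env["AUTOGL_BACKEND"] = backend
        return subprocess.run(list(argv), env=env, capture_output=True, text=True, check=True)

For example, ``run_with_backend([PYTHON, "xxx.py"], "pyg")`` runs the placeholder script ``xxx.py`` with the PyTorch-Geometric backend. ``check=True`` raises ``subprocess.CalledProcessError`` if the script fails.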
+100 -0 docs/docfile/tutorial/t_dataset-cn.rst

@@ -0,0 +1,100 @@
.. _dataset:

AutoGL Dataset
==============

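We provide a wide variety of common datasets based on graph learning libraries such as PyTorch-Geometric (PyG), Deep Graph Library (DGL), and Open Graph Benchmark (OGB).
In addition, users can leverage ``GeneralStaticGraph``, the unified static graph container provided by AutoGL, to define custom static homogeneous and heterogeneous graphs, for example: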

.. code-block:: python

    import torch
    from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator

    # Construct a custom homogeneous graph: node data and a (2, E) edge index
    custom_static_homogeneous_graph = GeneralStaticGraphGenerator.create_homogeneous_static_graph(
        {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556))
    )

    # Construct a custom heterogeneous graph
    custom_static_heterogeneous_graph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph(
        {
            # node type -> node data
            'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)},
            'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)}
        },
        {
            # (source type, relation, target type) -> edge index, optionally with edge features
            ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)),
            ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)),
        }
    )
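
The heterogeneous example implies a few layout conventions:

* all tensors of one node type share their first dimension (the number of nodes);
* an edge index has shape ``(2, E)``;
* an edge entry may pair the edge index with one feature row per edge.

Assuming those conventions, the hypothetical checker below verifies them before a graph is built. Tensors are modelled as nested Python lists so the sketch runs without PyTorch. ``count_nodes`` and ``check_heterogeneous`` are illustrative names, not AutoGL functions.

.. code-block:: python

    from typing import Dict, Mapping, Sequence, Tuple

    EdgeType = Tuple[str, str, str]  # (source type, relation, target type)


    def count_nodes(node_data: Mapping[str, Sequence]) -> int:
        """All tensors of one node type must share their first dimension."""
        sizes = {len(rows) for rows in node_data.values()}
        if len(sizes) != 1:
            raise ValueError(f"inconsistent node counts: {sorted(sizes)}")
        return sizes.pop()


    def check_heterogeneous(
        nodes: Mapping[str, Mapping[str, Sequence]], edges: Mapping[EdgeType, object]
    ) -> Dict[EdgeType, int]:
        """Return the number of edges per edge type after basic sanity checks."""
        counts = {ntype: count_nodes(data) for ntype, data in nodes.items()}
        num_edges: Dict[EdgeType, int] = {}
        for (src_type, relation, dst_type), value in edges.items():
            # A tuple is read as (edge index, edge features); anything else as an edge index.
            edge_index, edge_attr = value if isinstance(value, tuple) else (value, None)
            if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
                raise ValueError(f"{relation}: edge index must have shape (2, E)")
            src, dst = edge_index
            if not all(0 <= i < counts[src_type] for i in src):
                raise ValueError(f"{relation}: source index out of range")
            if not all(0 <= j < counts[dst_type] for j in dst):
                raise ValueError(f"{relation}: target index out of range")
            if edge_attr is not None and len(edge_attr) != len(src):
                raise ValueError(f"{relation}: one feature row per edge is required")
            num_edges[(src_type, relation, dst_type)] = len(src)
        return num_edges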


Supported Datasets
------------------
AutoGL currently provides the following common benchmark datasets:

Semi-supervised node classification:

+------------------+------------+-----------+--------------------------------+
| Dataset          | PyG        | DGL       | Default train/val/test split   |
+==================+============+===========+================================+
| Cora             | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Citeseer         | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Pubmed           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Amazon Computers | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Amazon Photo     | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor CS      | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor Physics | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Reddit           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-products    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-proteins    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-arxiv       | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-papers100M  | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+


Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, etc.

+-------------+------------+------------+--------------+------------+--------------------+
| Dataset     | PyG        | DGL        | Node Feature | Label      | Edge Features      |
+=============+============+============+==============+============+====================+
| MUTAG       | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Binary | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Multi  | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| PROTEINS    | ✓          | ✓          | ✓            | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| COLLAB      | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molhiv | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molpcba| ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-ppa    | ✓          | ✓          |              | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-code2  | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+


Link prediction: currently, AutoGL can perform automatic link prediction on the various graph datasets used for node classification.

Construct a custom dataset from GeneralStaticGraph instances
------------------------------------------------------------
The following code snippet shows how to construct a custom dataset from a sequence of ``GeneralStaticGraph`` instances.

.. code-block:: python

    from autogl.data import InMemoryDataset

    # ``graphs`` is a sequence of GeneralStaticGraph instances
    graphs = [ ... ]
    custom_dataset = InMemoryDataset(graphs)

+86 -132 docs/docfile/tutorial/t_dataset.rst

@@ -3,144 +3,98 @@
AutoGL Dataset
==============

We import the module of datasets from `CogDL` and `PyTorch Geometric` and add support for datasets from `OGB`. For creating and building datasets, refer to the tutorials of `CogDL`_, `PyTorch Geometric`_, and `OGB`_.
We provide various common datasets based on ``PyTorch-Geometric``, ``Deep Graph Library`` and ``OGB``.
In addition, users can leverage ``GeneralStaticGraph``, a unified abstraction provided by AutoGL that covers both static homogeneous graphs and static heterogeneous graphs.

.. _CogDL: https://cogdl.readthedocs.io/en/latest/tutorial.html
.. _PyTorch Geometric: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html
.. _OGB: https://ogb.stanford.edu/docs/dataset_overview/

A basic example of constructing ``GeneralStaticGraph`` instances is shown below.

.. code-block:: python

    import torch
    from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator

    # Construct a custom homogeneous graph: node data and a (2, E) edge index
    custom_static_homogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_homogeneous_static_graph(
        {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556))
    )

    # Construct a custom heterogeneous graph
    custom_static_heterogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph(
        {
            # node type -> node data
            'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)},
            'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)}
        },
        {
            # (source type, relation, target type) -> edge index, optionally with edge features
            ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)),
            ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)),
        }
    )
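
The second argument of ``create_homogeneous_static_graph`` above is a ``(2, E)`` edge index. Assume it follows the PyTorch Geometric convention: the first row holds source nodes and the second row holds target nodes. Under that assumption, a plain edge list can be converted as sketched below. The result would then be wrapped with ``torch.tensor`` before being passed in. ``edge_list_to_index`` is a hypothetical helper, and plain lists are used so the sketch runs without PyTorch.

.. code-block:: python

    from typing import Iterable, List, Optional, Tuple


    def edge_list_to_index(
        pairs: Iterable[Tuple[int, int]], num_nodes: Optional[int] = None
    ) -> Tuple[List[List[int]], int]:
        """Convert (source, target) pairs into a 2 x E edge index.

        Returns the edge index and the node count. The count is inferred as
        ``max index + 1`` when ``num_nodes`` is not given.
        """
        pairs = list(pairs)
        sources = [s for s, _ in pairs]
        targets = [t for _, t in pairs]
        if any(i < 0 for i in sources + targets):
            raise ValueError("node indices must be non-negative")
        largest = max(sources + targets, default=-1)
        if num_nodes is None:
            num_nodes = largest + 1
        elif largest >= num_nodes:
            raise ValueError(f"node index {largest} >= num_nodes={num_nodes}")
        return [sources, targets], num_nodes

Inferring the node count from the largest index misses isolated nodes with higher indices. For that reason the helper also accepts an explicit ``num_nodes``.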

Supporting datasets
-------------------
AutoGL now supports the following benchmarks for different tasks:

Semi-supervised node classification: Cora, Citeseer, Pubmed, Amazon Computers\*, Amazon Photo\*, Coauthor CS\*, Coauthor Physics\*, Reddit (\*: using `utils.random_splits_mask_class` to split the dataset is recommended).
For detailed information on the supported datasets, please refer to `PyTorch Geometric Dataset`_.

.. _PyTorch Geometric Dataset: https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html

+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Dataset          | PyG        | CogDL     | x          | y          | edge_index| edge_attr  | train/val/test node| train/val/test mask |
+==================+============+===========+============+============+===========+============+====================+=====================+
| Cora             | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    | ✓                   |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Citeseer         | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    | ✓                   |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Pubmed           | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    | ✓                   |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Amazon Computers | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    |                     |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Amazon Photo     | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    |                     |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Coauthor CS      | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    |                     |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Coauthor Physics | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    |                     |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+
| Reddit           | ✓          |           | ✓          | ✓          | ✓         | ✓          |                    | ✓                   |
+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+

Graph classification: MUTAG, IMDB-B, IMDB-M, PROTEINS, COLLAB

+-----------+------------+------------+-----------+------------+------------+-----------+
| Dataset   | PyG        | CogDL      | x         | y          | edge_index | edge_attr |
+===========+============+============+===========+============+============+===========+
| MUTAG     | ✓          |            | ✓         | ✓          | ✓          | ✓         |
+-----------+------------+------------+-----------+------------+------------+-----------+
| IMDB-B    | ✓          |            |           | ✓          | ✓          |           |
+-----------+------------+------------+-----------+------------+------------+-----------+
| IMDB-M    | ✓          |            |           | ✓          | ✓          |           |
+-----------+------------+------------+-----------+------------+------------+-----------+
| PROTEINS  | ✓          |            | ✓         | ✓          | ✓          |           |
+-----------+------------+------------+-----------+------------+------------+-----------+
| COLLAB    | ✓          |            |           | ✓          | ✓          |           |
+-----------+------------+------------+-----------+------------+------------+-----------+

TODO: Supporting all datasets from `PyTorch Geometric`.

OGB datasets
------------
AutoGL also supports the popular `OGB` benchmarks for node classification and graph classification tasks. For a summary of the `OGB` datasets, please refer to their `docs`_.

.. _docs: https://ogb.stanford.edu/docs/nodeprop/

Since the loss and evaluation metric used for `OGB` datasets vary among different tasks, we also add `string` properties of datasets for identification:

+-----------------+----------------+-------------------+
| Dataset         | dataset.metric | dataset.loss      |
+=================+================+===================+
| ogbn-products   | Accuracy       | nll_loss          |
+-----------------+----------------+-------------------+
| ogbn-proteins   | ROC-AUC        | BCEWithLogitsLoss |
+-----------------+----------------+-------------------+
| ogbn-arxiv      | Accuracy       | nll_loss          |
+-----------------+----------------+-------------------+
| ogbn-papers100M | Accuracy       | nll_loss          |
+-----------------+----------------+-------------------+
| ogbn-mag        | Accuracy       | nll_loss          |
+-----------------+----------------+-------------------+
| ogbg-molhiv     | ROC-AUC        | BCEWithLogitsLoss |
+-----------------+----------------+-------------------+
| ogbg-molpcba    | AP             | BCEWithLogitsLoss |
+-----------------+----------------+-------------------+
| ogbg-ppa        | Accuracy       | CrossEntropyLoss  |
+-----------------+----------------+-------------------+
| ogbg-code       | F1 score       | CrossEntropyLoss  |
+-----------------+----------------+-------------------+
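
Because the metric is exposed as a string, training code can look up an evaluation function by name. The registry below is a hypothetical sketch. ``METRICS`` and ``evaluate`` are illustrative names, and only two of the listed metrics are implemented. ROC-AUC is computed for a single binary task through its pairwise-ranking definition. OGB's official ``Evaluator`` additionally handles multi-task labels.

.. code-block:: python

    from typing import Callable, Dict, Sequence


    def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
        """Fraction of predictions equal to the true label."""
        return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


    def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
        """Probability that a random positive scores above a random negative
        (ties count one half). O(P * N), fine for a small illustration."""
        pos = [s for s, y in zip(scores, y_true) if y == 1]
        neg = [s for s, y in zip(scores, y_true) if y == 0]
        if not pos or not neg:
            raise ValueError("ROC-AUC needs both positive and negative samples")
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
        return wins / (len(pos) * len(neg))


    # Hypothetical registry keyed by the strings in the ``dataset.metric`` column.
    METRICS: Dict[str, Callable[[Sequence, Sequence], float]] = {
        "Accuracy": accuracy,
        "ROC-AUC": roc_auc,
    }


    def evaluate(metric_name: str, y_true: Sequence, y_score: Sequence) -> float:
        """Dispatch on the metric name and compute the score."""
        try:
            metric = METRICS[metric_name]
        except KeyError:
            raise ValueError(f"no implementation registered for {metric_name!r}") from None
        return metric(y_true, y_score)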


Create a dataset via URL
------------------------

If your dataset has the same format as the 'ppi' dataset, which contains two matrices, 'network' and 'group', you can register it directly using the code below. The default root for downloading datasets is `~/.cache-autogl`; you can also specify the root by passing a string as the `path` argument of `build_dataset(args, path)` or `build_dataset_from_name(dataset, path)`.

.. code-block:: python

    # following code-snippet is from autogl/datasets/matlab_matrix.py

    @register_dataset("ppi")
    class PPIDataset(MatlabMatrix):
        def __init__(self, path):
            dataset, filename = "ppi", "Homo_sapiens"
            url = "http://snap.stanford.edu/node2vec/"
            super(PPIDataset, self).__init__(path, filename, url)

You should declare the name of the dataset, the name of the file, and the URL, where our script can download the resource. Then you can use either `build_dataset(args, path)` or `build_dataset_from_name(dataset, path)` in your task to build a dataset with corresponding parameters.
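
The ``@register_dataset`` decorator follows the common registry pattern. A class is stored under a name, and a builder later looks the name up and instantiates the class with a root path. The toy ``DatasetRegistry`` below sketches that pattern only. It is not AutoGL's implementation, and its method names and signatures are assumptions.

.. code-block:: python

    from typing import Callable, Dict, Type


    class DatasetRegistry:
        """Maps dataset names to classes whose constructor takes a root path."""

        def __init__(self) -> None:
            self._classes: Dict[str, Type] = {}

        def register(self, name: str) -> Callable[[Type], Type]:
            # Used as a class decorator, in the same style as @register_dataset("ppi").
            def decorator(cls: Type) -> Type:
                if name in self._classes:
                    raise KeyError(f"dataset {name!r} is already registered")
                self._classes[name] = cls
                return cls
            return decorator

        def build(self, name: str, path: str = "~/.cache-autogl"):
            # Look the class up by name and instantiate it with the download root.
            if name not in self._classes:
                raise KeyError(f"unknown dataset {name!r}")
            return self._classes[name](path)

Rejecting duplicate names is a design choice: it prevents one module from silently replacing a dataset registered by another.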

Create a dataset locally
------------------------

If you want to test your local dataset, we recommend referring to the docs on `creating PyTorch Geometric dataset`_.

.. _creating PyTorch Geometric dataset: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html


You can inherit from `torch_geometric.data.InMemoryDataset` to create an empty dataset, create `torch_geometric.data.Data` objects for your data, collect them in a regular Python list, and then pass that list to `torch_geometric.data.Dataset` or `torch_geometric.data.DataLoader`.
Let’s see this process in a simplified example:
Semi-supervised node classification: Cora, Citeseer, Pubmed, Amazon Computers, Amazon Photo, Coauthor CS, Coauthor Physics, Reddit, etc.

+------------------+------------+-----------+--------------------------------+
| Dataset          | PyG        | DGL       | Default train/val/test split   |
+==================+============+===========+================================+
| Cora             | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Citeseer         | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Pubmed           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| Amazon Computers | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Amazon Photo     | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor CS      | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Coauthor Physics | ✓          | ✓         |                                |
+------------------+------------+-----------+--------------------------------+
| Reddit           | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-products    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-proteins    | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-arxiv       | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
| ogbn-papers100M  | ✓          | ✓         | ✓                              |
+------------------+------------+-----------+--------------------------------+
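
Datasets without a default split in the table above need one to be drawn; examples include Amazon Computers and Coauthor CS. A common choice is a class-balanced random split. It takes a fixed number of training nodes per class and samples validation and test nodes from the rest. The helper below is a hypothetical plain-Python sketch of that technique. It is not AutoGL's own splitting utility, and the split sizes are parameters you choose.

.. code-block:: python

    import random
    from collections import defaultdict
    from typing import Dict, List, Sequence


    def class_balanced_split(
        labels: Sequence[int], train_per_class: int, num_val: int, num_test: int, seed: int = 0
    ) -> Dict[str, List[int]]:
        """Pick ``train_per_class`` nodes per class for training, then ``num_val``
        and ``num_test`` nodes from the remaining ones. Returns sorted node indices."""
        rng = random.Random(seed)  # seeded, so the split is reproducible
        by_class: Dict[int, List[int]] = defaultdict(list)
        for node, label in enumerate(labels):
            by_class[label].append(node)
        train: List[int] = []
        rest: List[int] = []
        for label in sorted(by_class):
            nodes = list(by_class[label])
            if len(nodes) < train_per_class:
                raise ValueError(f"class {label} has only {len(nodes)} nodes")
            rng.shuffle(nodes)
            train += nodes[:train_per_class]
            rest += nodes[train_per_class:]
        rng.shuffle(rest)
        if len(rest) < num_val + num_test:
            raise ValueError("not enough nodes left for validation and test")
        return {
            "train": sorted(train),
            "val": sorted(rest[:num_val]),
            "test": sorted(rest[num_val:num_val + num_test]),
        }


    def to_mask(indices: Sequence[int], num_nodes: int) -> List[bool]:
        """Turn index lists into boolean masks of length ``num_nodes``."""
        chosen = set(indices)
        return [i in chosen for i in range(num_nodes)]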

Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, etc.

+-------------+------------+------------+--------------+------------+--------------------+
| Dataset     | PyG        | DGL        | Node Feature | Label      | Edge Features      |
+=============+============+============+==============+============+====================+
| MUTAG       | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Binary | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| IMDB-Multi  | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| PROTEINS    | ✓          | ✓          | ✓            | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| COLLAB      | ✓          | ✓          |              | ✓          |                    |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molhiv | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-molpcba| ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-ppa    | ✓          | ✓          |              | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+
| ogbg-code2  | ✓          | ✓          | ✓            | ✓          | ✓                  |
+-------------+------------+------------+--------------+------------+--------------------+

Link prediction: at present, AutoGL performs automatic link prediction on the various homogeneous graphs used for node classification.
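
Reusing a node classification graph for link prediction requires two steps: hiding some edges, and sampling node pairs that are not connected. The function below sketches that standard procedure (random positive-edge hold-out plus uniform negative sampling) on an undirected edge list. It is a generic illustration under those assumptions, not a description of how AutoGL performs the split internally. The function name and the keys of the returned dictionary are hypothetical.

.. code-block:: python

    import random
    from typing import Dict, List, Set, Tuple

    Edge = Tuple[int, int]


    def split_edges_for_link_prediction(
        edges: List[Edge],
        num_nodes: int,
        val_ratio: float = 0.1,
        test_ratio: float = 0.1,
        seed: int = 0,
    ) -> Dict[str, List[Edge]]:
        """Hold out positive edges and sample equally many negative node pairs.

        Edges are treated as undirected: (u, v) and (v, u) count as one edge,
        and self-loops are dropped.
        """
        rng = random.Random(seed)
        positives = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
        rng.shuffle(positives)
        n_val = int(len(positives) * val_ratio)
        n_test = int(len(positives) * test_ratio)

        existing: Set[Edge] = set(positives)
        max_pairs = num_nodes * (num_nodes - 1) // 2
        if max_pairs - len(existing) < n_val + n_test:
            raise ValueError("graph too dense to sample enough negative pairs")
        # Rejection sampling: draw random pairs until enough unseen non-edges are found.
        negatives: List[Edge] = []
        seen: Set[Edge] = set()
        while len(negatives) < n_val + n_test:
            u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
            pair = (min(u, v), max(u, v))
            if u != v and pair not in existing and pair not in seen:
                seen.add(pair)
                negatives.append(pair)

        return {
            "train_pos": positives[n_val + n_test:],
            "val_pos": positives[:n_val],
            "val_neg": negatives[:n_val],
            "test_pos": positives[n_val:n_val + n_test],
            "test_neg": negatives[n_val:],
        }

Rejection sampling is simple and fast on sparse graphs, which covers the citation and co-purchase graphs listed above. On very dense graphs it slows down as free pairs become rare.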

Construct a custom dataset from GeneralStaticGraph instances
------------------------------------------------------------
The following example shows how to compose a custom dataset from a sequence of ``GeneralStaticGraph`` instances.

.. code-block:: python

    from typing import Iterable
    from torch_geometric.data.data import Data
    from autogl.datasets import build_dataset_from_name
    from torch_geometric.data import InMemoryDataset

    class MyDataset(InMemoryDataset):
        def __init__(self, datalist) -> None:
            super().__init__()
            self.data, self.slices = self.collate(datalist)

    # Create your own Data objects

    # for example, if you have edge_index, features and labels
    # you can create a Data as follows
    # See the PyTorch Geometric documentation for more information on Data
    data = Data()
    data.edge_index = edge_index
    data.x = features
    data.y = labels

    # create a list of Data objects
    data_list = [data, Data(...), ..., Data(...)]

    # Initialize AutoGL Dataset with your own data
    myData = MyDataset(data_list)
    from autogl.data import InMemoryDataset
    # Suppose ``graphs`` is a sequence of GeneralStaticGraph instances
    graphs = [ ... ]
    custom_dataset = InMemoryDataset(graphs)
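
Conceptually, such a dataset behaves like a read-only sequence of graphs. The class below is a hypothetical stand-in, not AutoGL's ``InMemoryDataset``. It shows the minimal sequence behaviour one may expect: length, integer indexing, iteration, and slicing into a smaller dataset. Slicing is a convenient way to carve out training and test subsets.

.. code-block:: python

    from typing import Generic, Iterable, Iterator, TypeVar, Union

    T = TypeVar("T")


    class SequenceDataset(Generic[T]):
        """Read-only, in-memory sequence of graph objects (hypothetical stand-in)."""

        def __init__(self, graphs: Iterable[T]) -> None:
            # Copy into a tuple so later edits to the caller's list do not leak in.
            self._graphs = tuple(graphs)

        def __len__(self) -> int:
            return len(self._graphs)

        def __getitem__(self, index: Union[int, slice]):
            # A slice yields a new dataset; an integer yields a single graph.
            if isinstance(index, slice):
                return SequenceDataset(self._graphs[index])
            return self._graphs[index]

        def __iter__(self) -> Iterator[T]:
            return iter(self._graphs)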
