diff --git a/docs/docfile/tutorial/t_backend-cn.rst b/docs/docfile/tutorial/t_backend-cn.rst new file mode 100644 index 0000000..574abe8 --- /dev/null +++ b/docs/docfile/tutorial/t_backend-cn.rst @@ -0,0 +1,33 @@ +.. _backend: + +Backend Support +=============== + +目前,AutoGL支持使用PyTorch-Geometric或Deep Graph Library作为后端,以便熟悉两者之一的用户均可受益于自动图学习。 + +为指定特定的后端,用户可以使用环境变量 ``AUTOGL_BACKEND`` 进行声明,例如: + +.. code-block:: shell + + AUTOGL_BACKEND=pyg python xxx.py + +或 + +.. code-block:: python + + import os + os.environ["AUTOGL_BACKEND"] = "pyg" + import autogl + + ... + + +如果环境变量 ``AUTOGL_BACKEND`` 未声明,AutoGL会根据用户的Python运行环境中所安装的图学习库自动选择。 +如果PyTorch-Geometric和Deep Graph Library均已安装,则Deep Graph Library将被作为默认的后端。 + +可以以编程方式获得当前使用的后端: + +.. code-block:: python + + from autogl.backend import DependentBackend + print(DependentBackend.get_backend_name()) diff --git a/docs/docfile/tutorial/t_backend.rst b/docs/docfile/tutorial/t_backend.rst index ae68c6e..4ec39ec 100644 --- a/docs/docfile/tutorial/t_backend.rst +++ b/docs/docfile/tutorial/t_backend.rst @@ -9,13 +9,13 @@ enable users from both end benifiting the automation of graph learning. To specify one specific backend, you can declare the backend using environment variables ``AUTOGL_BACKEND``. For example: -.. code-block :: shell +.. code-block:: shell AUTOGL_BACKEND=pyg python xxx.py or -.. code-block :: python +.. code-block:: python import os os.environ["AUTOGL_BACKEND"] = "pyg" diff --git a/docs/docfile/tutorial/t_dataset-cn.rst b/docs/docfile/tutorial/t_dataset-cn.rst new file mode 100644 index 0000000..a2106b5 --- /dev/null +++ b/docs/docfile/tutorial/t_dataset-cn.rst @@ -0,0 +1,100 @@ +.. _dataset: + +AutoGL 数据集 +============== + +我们基于PyTorch-Geometric (PyG),Deep Graph Library (DGL)及Open Graph Benchmark (OGB)等图学习库提供了多种多样的常用数据集。 +同时,用户可以使用AutoGL所提供的统一静态图容器 ``GeneralStaticGraph`` 自定义静态同构图及异构图,例如: + +.. 
code-block:: python + from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator + + ''' 创建同构图 ''' + custom_static_homogeneous_graph = GeneralStaticGraphGenerator.create_homogeneous_static_graph( + {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556)) + ) + + ''' 创建异构图 ''' + custom_static_heterogeneous_graph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph( + { + 'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)}, + 'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)} + }, + { + ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)), + ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)), + } + ) + + +提供的常用数据集 +---------------- +AutoGL目前提供如下多种常用基准数据集: + +半监督节点分类: + ++------------------+------------+-----------+--------------------------------+ +| 数据集 | PyG | DGL | 默认train/val/test划分 | ++==================+============+===========+================================+ +| Cora | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| Citeseer | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| Pubmed | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| Amazon Computers | ✓ | ✓ | | ++------------------+------------+-----------+--------------------------------+ +| Amazon Photo | ✓ | ✓ | | ++------------------+------------+-----------+--------------------------------+ +| Coauthor CS | ✓ | ✓ | | ++------------------+------------+-----------+--------------------------------+ +| Coauthor Physics | ✓ | ✓ | | ++------------------+------------+-----------+--------------------------------+ +| Reddit | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| ogbn-products | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| 
ogbn-proteins | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| ogbn-arxiv | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| ogbn-papers100M | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ + + +图分类任务: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB等 + ++-------------+------------+------------+--------------+------------+--------------------+ +| 数据集 | PyG | DGL | 节点特征 | 标签 | 边特征 | ++=============+============+============+==============+============+====================+ +| MUTAG | ✓ | ✓ | ✓ | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ +| IMDB-Binary | ✓ | ✓ | | ✓ | | ++-------------+------------+------------+--------------+------------+--------------------+ +| IMDB-Multi | ✓ | ✓ | | ✓ | | ++-------------+------------+------------+--------------+------------+--------------------+ +| PROTEINS | ✓ | ✓ | ✓ | ✓ | | ++-------------+------------+------------+--------------+------------+--------------------+ +| COLLAB | ✓ | ✓ | | ✓ | | ++-------------+------------+------------+--------------+------------+--------------------+ +| ogbg-molhiv | ✓ | ✓ | ✓ | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ +| ogbg-molpcba| ✓ | ✓ | ✓ | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ +| ogbg-ppa | ✓ | ✓ | | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ +| ogbg-code2 | ✓ | ✓ | ✓ | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ + + +链接预测任务:目前AutoGL可以使用针对节点分类任务的多种图数据进行自动链接预测。 + +通过GeneralStaticGraph序列构建自定义数据集 +---------------------------------------------------------------- +如下代码片段展示了通过一个由 ``GeneralStaticGraph`` 序列构建自定义数据集的方法。 + +.. 
code-block:: python + from autogl.data import InMemoryDataset + ''' graphs变量是一个由GeneralStaticGraph实例所构成的序列 ''' + graphs = [ ... ] + custom_dataset = InMemoryDataset(graphs) diff --git a/docs/docfile/tutorial/t_dataset.rst b/docs/docfile/tutorial/t_dataset.rst index 6cb6bd4..7ec0df7 100644 --- a/docs/docfile/tutorial/t_dataset.rst +++ b/docs/docfile/tutorial/t_dataset.rst @@ -3,144 +3,98 @@ AutoGL Dataset ============== -We import the module of datasets from `CogDL` and `PyTorch Geometric` and add support for datasets from `OGB`. One can refer to the usage of creating and building datasets via the tutorial of `CogDL`_, `PyTorch Geometric`_, and `OGB`_. +We provide various common datasets based on ``PyTorch-Geometric``, ``Deep Graph Library`` and ``OGB``. +In addition, AutoGL provides a unified abstraction, ``GeneralStaticGraph``, that represents both static homogeneous and static heterogeneous graphs. -.. _CogDL: https://cogdl.readthedocs.io/en/latest/tutorial.html -.. _PyTorch Geometric: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html -.. _OGB: https://ogb.stanford.edu/docs/dataset_overview/ +A basic example of constructing a ``GeneralStaticGraph`` instance is shown below. + +.. 
code-block:: python + from autogl.data.graph import GeneralStaticGraph, GeneralStaticGraphGenerator + + ''' Construct a custom homogeneous graph ''' + custom_static_homogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_homogeneous_static_graph( + {'x': torch.rand(2708, 3), 'y': torch.rand(2708, 1)}, torch.randint(0, 1024, (2, 10556)) + ) + + ''' Construct a custom heterogeneous graph ''' + custom_static_heterogeneous_graph: GeneralStaticGraph = GeneralStaticGraphGenerator.create_heterogeneous_static_graph( + { + 'author': {'x': torch.rand(1024, 3), 'y': torch.rand(1024, 1)}, + 'paper': {'feat': torch.rand(2048, 10), 'z': torch.rand(2048, 13)} + }, + { + ('author', 'writing', 'paper'): (torch.randint(0, 1024, (2, 5120)), torch.rand(5120, 10)), + ('author', 'reading', 'paper'): torch.randint(0, 1024, (2, 3840)), + } + ) Supporting datasets ------------------- AutoGL now supports the following benchmarks for different tasks: -Semi-supervised node classification: Cora, Citeseer, Pubmed, Amazon Computers\*, Amazon Photo\*, Coauthor CS\*, Coauthor Physics\*, Reddit (\*: using `utils.random_splits_mask_class` for splitting dataset is recommended.). -For detailed information for supporting datasets, please kindly refer to `PyTorch Geometric Dataset`_. - -.. 
_PyTorch Geometric Dataset: https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html - -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ -| Dataset | PyG | CogDL | x | y | edge_index| edge_attr | train/val/test node | train/val/test mask | -+==================+============+===========+============+============+===========+============+====================+=====================+ -| Cora | ✓ | | ✓ | ✓ | ✓ | ✓ | | ✓ | -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ -| Citeseer | ✓ | | ✓ | ✓ | ✓ | ✓ | | ✓ | -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ -| Pubmed | ✓ | | ✓ | ✓ | ✓ | ✓ | | ✓ | -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ -| Amazon Computers | ✓ | | ✓ | ✓ | ✓ | ✓ | | | -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ -| Amazon Photo | ✓ | | ✓ | ✓ | ✓ | ✓ | | | -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ -| Coauthor CS | ✓ | | ✓ | ✓ | ✓ | ✓ | | | -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ -| Coauthor Physics | ✓ | | ✓ | ✓ | ✓ | ✓ | | | -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ -| Reddit | ✓ | | ✓ | ✓ | ✓ | ✓ | | ✓ | -+------------------+------------+-----------+------------+------------+-----------+------------+--------------------+---------------------+ - -Graph classification: MUTAG, IMDB-B, 
IMDB-M, PROTEINS, COLLAB - -+-----------+------------+------------+-----------+------------+------------+-----------+ -| Dataset | PyG | CogDL | x | y | edge_index | edge_attr | -+===========+============+============+===========+============+============+===========+ -| MUTAG | ✓ | | ✓ | ✓ | ✓ | ✓ | -+-----------+------------+------------+-----------+------------+------------+-----------+ -| IMDB-B | ✓ | | | ✓ | ✓ | | -+-----------+------------+------------+-----------+------------+------------+-----------+ -| IMDB-M | ✓ | | | ✓ | ✓ | | -+-----------+------------+------------+-----------+------------+------------+-----------+ -| PROTEINS | ✓ | | ✓ | ✓ | ✓ | | -+-----------+------------+------------+-----------+------------+------------+-----------+ -| COLLAB | ✓ | | | ✓ | ✓ | | -+-----------+------------+------------+-----------+------------+------------+-----------+ - -TODO: Supporting all datasets from `PyTorch Geometric`. - -OGB datasets ------------- -AutoGL also supports the popular benchmark on `OGB` for node classification and graph classification tasks. For the summary of `OGB` datasets, please kindly refer to the their `docs`_. - -.. 
_docs: https://ogb.stanford.edu/docs/nodeprop/ - -Since the loss and evaluation metric used for `OGB` datasets vary among different tasks, we also add `string` properties of datasets for identification: - -+-----------------+----------------+-------------------+ -| Dataset | dataset.metric | datasets.loss | -+=================+================+===================+ -| ogbn-products | Accuracy | nll_loss | -+-----------------+----------------+-------------------+ -| ogbn-proteins | ROC-AUC | BCEWithLogitsLoss | -+-----------------+----------------+-------------------+ -| ogbn-arxiv | Accuracy | nll_loss | -+-----------------+----------------+-------------------+ -| ogbn-papers100M | Accuracy | nll_loss | -+-----------------+----------------+-------------------+ -| ogbn-mag | Accuracy | nll_loss | -+-----------------+----------------+-------------------+ -| ogbg-molhiv | ROC-AUC | BCEWithLogitsLoss | -+-----------------+----------------+-------------------+ -| ogbg-molpcba | AP | BCEWithLogitsLoss | -+-----------------+----------------+-------------------+ -| ogbg-ppa | Accuracy | CrossEntropyLoss | -+-----------------+----------------+-------------------+ -| ogbg-code | F1 score | CrossEntropyLoss | -+-----------------+----------------+-------------------+ - - -Create a dataset via URL ------------------------- - -If your dataset is the same as the 'ppi' dataset, which contains two matrices: 'network' and 'group', you can register your dataset directly use the above code. The default root for downloading dataset is `~/.cache-autogl`, you can also specify the root by passing the string to the `path` in `build_dataset(args, path)` or `build_dataset_from_name(dataset, path)`. - -.. 
code-block:: python - - # following code-snippet is from autogl/datasets/matlab_matrix.py - - @register_dataset("ppi") - class PPIDataset(MatlabMatrix): - def __init__(self, path): - dataset, filename = "ppi", "Homo_sapiens" - url = "http://snap.stanford.edu/node2vec/" - super(PPIDataset, self).__init__(path, filename, url) - -You should declare the name of the dataset, the name of the file, and the URL, where our script can download the resource. Then you can use either `build_dataset(args, path)` or `build_dataset_from_name(dataset, path)` in your task to build a dataset with corresponding parameters. - -Create a dataset locally ------------------------- - -If you want to test your local dataset, we recommend you to refer to the docs on `creating PyTorch Geometric dataset`_. - -.. _creating PyTorch Geometric dataset: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html - - -You can simply inherit from `torch_geometric.data.InMemoryDataset` to create an empty `dataset`, then create some `torch_geometric.data.Data` objects for your data and pass a regular python list holding them, then pass them to `torch_geometric.data.Dataset` or `torch_geometric.data.DataLoader`. -Let’s see this process in a simplified example: +Semi-supervised node classification: Cora, Citeseer, Pubmed, Amazon Computers, Amazon Photo, Coauthor CS, Coauthor Physics, Reddit, etc. 
+ ++------------------+------------+-----------+--------------------------------+ +| Dataset | PyG | DGL | default train/val/test split | ++==================+============+===========+================================+ +| Cora | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| Citeseer | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| Pubmed | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| Amazon Computers | ✓ | ✓ | | ++------------------+------------+-----------+--------------------------------+ +| Amazon Photo | ✓ | ✓ | | ++------------------+------------+-----------+--------------------------------+ +| Coauthor CS | ✓ | ✓ | | ++------------------+------------+-----------+--------------------------------+ +| Coauthor Physics | ✓ | ✓ | | ++------------------+------------+-----------+--------------------------------+ +| Reddit | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| ogbn-products | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| ogbn-proteins | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| ogbn-arxiv | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ +| ogbn-papers100M | ✓ | ✓ | ✓ | ++------------------+------------+-----------+--------------------------------+ + +Graph classification: MUTAG, IMDB-Binary, IMDB-Multi, PROTEINS, COLLAB, etc. 
+ ++-------------+------------+------------+--------------+------------+--------------------+ +| Dataset | PyG | DGL | Node Feature | Label | Edge Features | ++=============+============+============+==============+============+====================+ +| MUTAG | ✓ | ✓ | ✓ | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ +| IMDB-Binary | ✓ | ✓ | | ✓ | | ++-------------+------------+------------+--------------+------------+--------------------+ +| IMDB-Multi | ✓ | ✓ | | ✓ | | ++-------------+------------+------------+--------------+------------+--------------------+ +| PROTEINS | ✓ | ✓ | ✓ | ✓ | | ++-------------+------------+------------+--------------+------------+--------------------+ +| COLLAB | ✓ | ✓ | | ✓ | | ++-------------+------------+------------+--------------+------------+--------------------+ +| ogbg-molhiv | ✓ | ✓ | ✓ | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ +| ogbg-molpcba| ✓ | ✓ | ✓ | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ +| ogbg-ppa | ✓ | ✓ | | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ +| ogbg-code2 | ✓ | ✓ | ✓ | ✓ | ✓ | ++-------------+------------+------------+--------------+------------+--------------------+ + +Link Prediction: At present, AutoGL can run automatic link prediction on various homogeneous graph datasets intended for node classification. + +Construct custom dataset by instances of GeneralStaticGraph +------------------------------------------------------------ +The following example shows how to compose a custom dataset from a sequence of ``GeneralStaticGraph`` instances. .. 
code-block:: python - - from typing import Iterable - from torch_geometric.data.data import Data - from autogl.datasets import build_dataset_from_name - from torch_geometric.data import InMemoryDataset - - class MyDataset(InMemoryDataset): - def __init__(self, datalist) -> None: - super().__init__() - self.data, self.slices = self.collate(datalist) - - # Create your own Data objects - - # for example, if you have edge_index, features and labels - # you can create a Data as follows - # See pytorch geometric more info of Data - data = Data() - data.edge_index = edge_index - data.x = features - data.y = labels - - # create a list of Data object - data_list = [data, Data(...), ..., Data(...)] - - # Initialize AutoGL Dataset with your own data - myData = MyDataset(data_list) + from autogl.data import InMemoryDataset + ''' Suppose graphs is a sequence of GeneralStaticGraph instances ''' + graphs = [ ... ] + custom_dataset = InMemoryDataset(graphs)
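
The backend selection rule described in t_backend-cn.rst in this diff (use ``AUTOGL_BACKEND`` if it is set, otherwise detect the installed graph library, preferring Deep Graph Library when both are present) can be sketched as follows. This is a minimal illustration under stated assumptions, not AutoGL's actual implementation: ``resolve_backend``, ``_BACKEND_MODULES`` and the ``"dgl"`` value are hypothetical names chosen for the example; only ``"pyg"`` appears in the tutorial itself.

```python
import importlib.util
import os

# Hypothetical mapping for this sketch (not part of the AutoGL API):
# backend name -> importable module name. Only "pyg" is shown in the
# tutorial; "dgl" is an assumed value. Dict order encodes the default
# preference: Deep Graph Library first.
_BACKEND_MODULES = {"dgl": "dgl", "pyg": "torch_geometric"}


def _is_installed(module_name):
    """Return True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_backend(environ=None, is_installed=_is_installed):
    """Pick a backend name following the rule described in the tutorial."""
    environ = os.environ if environ is None else environ

    # 1. An explicit AUTOGL_BACKEND declaration wins.
    requested = environ.get("AUTOGL_BACKEND")
    if requested is not None:
        if requested not in _BACKEND_MODULES:
            raise ValueError(f"unknown backend: {requested!r}")
        return requested

    # 2. Otherwise fall back to whichever library is installed,
    #    checking Deep Graph Library before PyTorch-Geometric.
    for name, module_name in _BACKEND_MODULES.items():
        if is_installed(module_name):
            return name

    raise ImportError("neither Deep Graph Library nor PyTorch-Geometric is installed")
```

Passing ``environ`` and ``is_installed`` explicitly keeps the rule checkable without installing either library; in real use, ``DependentBackend.get_backend_name()`` from the tutorial remains the way to query the backend AutoGL actually chose.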