- {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Node Classification\n",
- "AutoGL supports multiple graph-related tasks, including node classification.\n",
- "\n",
- "This notebook walks through a simple example of using AutoGL for node classification.\n",
- "\n",
- "## Import libraries\n",
- "First, import the required libraries and set the random seeds so that dataset splitting and model training are reproducible."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "import yaml\n",
- "import random\n",
- "import torch.backends.cudnn\n",
- "import numpy as np\n",
- "\n",
- "from autogl.datasets import build_dataset_from_name\n",
- "from autogl.solver import AutoNodeClassifier\n",
- "from autogl.module import Acc\n",
- "from autogl.backend import DependentBackend\n",
- "\n",
- "# set random seed\n",
- "random.seed(2022)\n",
- "np.random.seed(2022)\n",
- "torch.manual_seed(2022)\n",
- "if torch.cuda.is_available():\n",
- "    torch.cuda.manual_seed(2022)\n",
- "    torch.backends.cudnn.deterministic = True\n",
- "    torch.backends.cudnn.benchmark = False"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Load Dataset\n",
- "AutoGL provides a convenient interface for obtaining and partitioning common datasets such as cora, citeseer, and amazon_computers.\n",
- "\n",
- "Pass the name of the dataset you want, and AutoGL returns it.\n",
- "\n",
- "In this example, we evaluate the model on the Cora dataset in a semi-supervised node classification task."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- " NumNodes: 2708\n",
- " NumEdges: 10556\n",
- " NumFeats: 1433\n",
- " NumClasses: 7\n",
- " NumTrainingSamples: 140\n",
- " NumValidationSamples: 500\n",
- " NumTestSamples: 1000\n",
- "Done loading data from cached files.\n"
- ]
- }
- ],
- "source": [
- "dataset = build_dataset_from_name('cora')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Initialize Solver\n",
- "After obtaining the dataset, the next step is to initialize the model.\n",
- "\n",
- "Because AutoGL can tune the model with hyperparameter optimization (HPO), we do not build the model by hand. Instead, we train it through the solver class that AutoGL provides.\n",
- "\n",
- "A solver in AutoGL is usually initialized lazily from a config file. Example config files are in the `../configs` folder, and the tutorial describes the format in more detail."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [],
- "source": [
- "label = dataset[0].nodes.data['y' if DependentBackend.is_pyg() else 'label']\n",
- "num_classes = len(np.unique(label.numpy()))\n",
- "\n",
- "with open('../configs/nodeclf_gcn_benchmark_small.yml', \"r\") as f:\n",
- "    configs = yaml.load(f, Loader=yaml.FullLoader)\n",
- "autoClassifier = AutoNodeClassifier.from_config(configs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Train\n",
- "Once the solver is initialized, call its `fit` method to optimize the model through HPO."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[2022-10-24 18:55:27] INFO (NodeClassifier/MainThread) Use the default train/val/test ratio in given dataset\n",
- "HPO Search Phase:\n",
- "\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:09<00:00, 1.40s/it]\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[2022-10-24 18:56:37] INFO (HPO/MainThread) Best Parameter:\n",
- "[2022-10-24 18:56:37] INFO (HPO/MainThread) Parameter: {'trainer': {'max_epoch': 165, 'early_stopping_round': 18, 'lr': 0.014545893271287733, 'weight_decay': 0.0001682578213292401}, 'encoder': {'num_layers': 2, 'hidden': [42], 'dropout': 0.6019468841551312, 'act': 'tanh'}, 'decoder': {}} acc: 0.806 higher_better\n"
- ]
- },
- {
- "data": {
- "text/plain": [
- "<autogl.solver.classifier.node_classifier.AutoNodeClassifier at 0x7fa319a65cd0>"
- ]
- },
- "execution_count": 4,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "# time_limit: maximum number of seconds allowed for training the model\n",
- "# evaluation_method: the metric(s) used to evaluate performance\n",
- "autoClassifier.fit(dataset, time_limit=3600, evaluation_method=[Acc])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Evaluation\n",
- "After training, you can evaluate the model."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "+-------------------------------------------------------------------------+-------+\n",
- "| name | acc |\n",
- "+=========================================================================+=======+\n",
- "| decoder: None | 0.806 |\n",
- "| early_stopping_round: 18 | |\n",
- "| encoder: <autogl.module.model.dgl.gcn.AutoGCN object at 0x7fa3186d0b50> | |\n",
- "| learning_rate: 0.014545893271287733 | |\n",
- "| max_epoch: 165 | |\n",
- "| optimizer: !!python/name:torch.optim.adam.Adam '' | |\n",
- "| trainer_name: NodeClassificationFullTrainer | |\n",
- "| _idx0 | |\n",
- "+-------------------------------------------------------------------------+-------+\n",
- "[2022-10-24 18:56:50] INFO (NodeClassifier/MainThread) Ensemble argument on, will try using ensemble model.\n",
- "[2022-10-24 18:56:50] WARNING (NodeClassifier/MainThread) Cannot use ensemble because no ensebmle module is given. Will use best model instead.\n",
- "test acc: 0.8060\n"
- ]
- }
- ],
- "source": [
- "autoClassifier.get_leaderboard().show()\n",
- "# you can also provide the metric here\n",
- "acc = autoClassifier.evaluate(metric=\"acc\")\n",
- "print(\"test acc: {:.4f}\".format(acc))"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.7.11"
- },
- "vscode": {
- "interpreter": {
- "hash": "ceaf47f872914ebc119c31eaf5650b5ee907a61565d128a6607ed80bbe5b2670"
- }
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
- }