jjfraaa
/
AutoGL

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Node Classification\n",
    "AutoGL supports multiple graph related tasks, including node classification. \n",
    "\n",
    "In this file we will give you a simple example to show how to use AutoGL to do the node classification task.\n",
    "\n",
    "## Import libraries\n",
    "First, you should import some libraries and you can set the random seed before you split the dataset and train the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import yaml\n",
    "import random\n",
    "import torch.backends.cudnn\n",
    "import numpy as np\n",
    "\n",
    "from autogl.datasets import build_dataset_from_name\n",
    "from autogl.solver import AutoNodeClassifier\n",
    "from autogl.module import Acc\n",
    "from autogl.backend import DependentBackend\n",
    "\n",
    "# set random seed\n",
    "random.seed(2022)\n",
    "np.random.seed(2022)\n",
    "torch.manual_seed(2022)\n",
    "if torch.cuda.is_available():\n",
    "    torch.cuda.manual_seed(2022)\n",
    "    torch.backends.cudnn.deterministic = True\n",
    "    torch.backends.cudnn.benchmark = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Dataset\n",
    "AutoGL provides a very convenient interface to obtain and partition common datasets, such as cora, citeseer, and amazon_computers, etc.\n",
    "\n",
    "You just need to give the name of the dataset you want and AutoGL will return the dataset.\n",
    "\n",
    "In this example, we evaluate model on Cora dataset in the semi-supervised node classification task."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  NumNodes: 2708\n",
      "  NumEdges: 10556\n",
      "  NumFeats: 1433\n",
      "  NumClasses: 7\n",
      "  NumTrainingSamples: 140\n",
      "  NumValidationSamples: 500\n",
      "  NumTestSamples: 1000\n",
      "Done loading data from cached files.\n"
     ]
    }
   ],
   "source": [
    "dataset = build_dataset_from_name('cora')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize Solver\n",
    "After obtaining the dataset, we need to initialize the model.\n",
    "\n",
    "However, as AutoGL provides a convenient method to use HPO to better optimize the model, we can train the model through the solver class provided by AutoGL.\n",
    "\n",
    "Solver in AutoGL usually uses a config file for lazy initialization. The format of the config file can be found in the `../config` folder for examples, or you can read our tutorial for some help."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "label = dataset[0].nodes.data['y' if DependentBackend.is_pyg() else 'label']\n",
    "num_classes = len(np.unique(label.numpy()))\n",
    "\n",
    "configs = yaml.load(open('../configs/nodeclf_gcn_benchmark_small.yml', \"r\").read(), Loader=yaml.FullLoader)\n",
    "autoClassifier = AutoNodeClassifier.from_config(configs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train\n",
    "After the initialization is finished, you can use the interface provided by AutoGL to optimize the model through HPO."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2022-10-24 18:55:27] INFO (NodeClassifier/MainThread) Use the default train/val/test ratio in given dataset\n",
      "HPO Search Phase:\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:09<00:00,  1.40s/it]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2022-10-24 18:56:37] INFO (HPO/MainThread) Best Parameter:\n",
      "[2022-10-24 18:56:37] INFO (HPO/MainThread) Parameter: {'trainer': {'max_epoch': 165, 'early_stopping_round': 18, 'lr': 0.014545893271287733, 'weight_decay': 0.0001682578213292401}, 'encoder': {'num_layers': 2, 'hidden': [42], 'dropout': 0.6019468841551312, 'act': 'tanh'}, 'decoder': {}} acc: 0.806 higher_better\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<autogl.solver.classifier.node_classifier.AutoNodeClassifier at 0x7fa319a65cd0>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# time limit is the seconds limited for training the model\n",
    "# evaluation method is the metric to evaluate the performance\n",
    "autoClassifier.fit(dataset, time_limit=3600, evaluation_method=[Acc])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation\n",
    "After training, you can evaluate the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+-------------------------------------------------------------------------+-------+\n",
      "| name                                                                    |   acc |\n",
      "+=========================================================================+=======+\n",
      "| decoder: None                                                           | 0.806 |\n",
      "| early_stopping_round: 18                                                |       |\n",
      "| encoder: <autogl.module.model.dgl.gcn.AutoGCN object at 0x7fa3186d0b50> |       |\n",
      "| learning_rate: 0.014545893271287733                                     |       |\n",
      "| max_epoch: 165                                                          |       |\n",
      "| optimizer: !!python/name:torch.optim.adam.Adam ''                       |       |\n",
      "| trainer_name: NodeClassificationFullTrainer                             |       |\n",
      "| _idx0                                                                   |       |\n",
      "+-------------------------------------------------------------------------+-------+\n",
      "[2022-10-24 18:56:50] INFO (NodeClassifier/MainThread) Ensemble argument on, will try using ensemble model.\n",
      "[2022-10-24 18:56:50] WARNING (NodeClassifier/MainThread) Cannot use ensemble because no ensebmle module is given. Will use best model instead.\n",
      "test acc: 0.8060\n"
     ]
    }
   ],
   "source": [
    "autoClassifier.get_leaderboard().show()\n",
    "# you can also provided the metric here!\n",
    "acc = autoClassifier.evaluate(metric=\"acc\")\n",
    "print(\"test acc: {:.4f}\".format(acc))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  },
  "vscode": {
   "interpreter": {
    "hash": "ceaf47f872914ebc119c31eaf5650b5ee907a61565d128a6607ed80bbe5b2670"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}