{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "accelerator": "GPU", "colab": { "name": "SHARE MLSpring2021 - HW2-1.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "OYlaRwNu7ojq" }, "source": [ "# **Homework 2-1 Phoneme Classification**" ] }, { "cell_type": "markdown", "metadata": { "id": "emUd7uS7crTz" }, "source": [ "## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)\n", "The TIMIT corpus of reading speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.\n", "\n", "This homework is a multiclass classification task, \n", "we are going to train a deep neural network classifier to predict the phonemes for each frame from the speech corpus TIMIT.\n", "\n", "link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3" ] }, { "cell_type": "markdown", "metadata": { "id": "KVUGfWTo7_Oj" }, "source": [ "## Download Data\n", "Download data from google drive, then unzip it.\n", "\n", "You should have `timit_11/train_11.npy`, `timit_11/train_label_11.npy`, and `timit_11/test_11.npy` after running this block.

\n", "`timit_11/`\n", "- `train_11.npy`: training data
\n", "- `train_label_11.npy`: training label
\n", "- `test_11.npy`: testing data

\n", "\n", "**notes: if the google drive link is dead, you can download the data directly from Kaggle and upload it to the workspace**\n", "\n", "\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OzkiMEcC3Foq", "outputId": "4308c64c-6885-4d1c-8eb7-a2d9b8038401" }, "source": [ "!gdown --id '1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR' --output data.zip\n", "!unzip data.zip\n", "!ls " ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Downloading...\n", "From: https://drive.google.com/uc?id=1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR\n", "To: /content/data.zip\n", "372MB [00:03, 121MB/s]\n", "Archive: data.zip\n", " creating: timit_11/\n", " inflating: timit_11/train_11.npy \n", " inflating: timit_11/test_11.npy \n", " inflating: timit_11/train_label_11.npy \n", "data.zip sample_data timit_11\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "_L_4anls8Drv" }, "source": [ "## Preparing Data\n", "Load the training and testing data from the `.npy` file (NumPy array)." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "IJjLT8em-y9G", "outputId": "8edc6bfe-7511-447f-f239-00b96dba6dcf" }, "source": [ "import numpy as np\n", "\n", "print('Loading data ...')\n", "\n", "data_root='./timit_11/'\n", "train = np.load(data_root + 'train_11.npy')\n", "train_label = np.load(data_root + 'train_label_11.npy')\n", "test = np.load(data_root + 'test_11.npy')\n", "\n", "print('Size of training data: {}'.format(train.shape))\n", "print('Size of testing data: {}'.format(test.shape))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Loading data ...\n", "Size of training data: (1229932, 429)\n", "Size of testing data: (451552, 429)\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "us5XW_x6udZQ" }, "source": [ "## Create Dataset" ] }, { "cell_type": "code", "metadata": { "id": "Fjf5EcmJtf4e" }, "source": [ "import torch\n", "from torch.utils.data import Dataset\n", "\n", "class TIMITDataset(Dataset):\n", " def __init__(self, X, y=None):\n", " self.data = torch.from_numpy(X).float()\n", " if y is not None:\n", " y = y.astype(np.int)\n", " self.label = torch.LongTensor(y)\n", " else:\n", " self.label = None\n", "\n", " def __getitem__(self, idx):\n", " if self.label is not None:\n", " return self.data[idx], self.label[idx]\n", " else:\n", " return self.data[idx]\n", "\n", " def __len__(self):\n", " return len(self.data)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "otIC6WhGeh9v" }, "source": [ "Split the labeled data into a training set and a validation set, you can modify the variable `VAL_RATIO` to change the ratio of validation data." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "sYqi_lAuvC59", "outputId": "13dabe63-4849-47ee-fe04-57427b9d601c" }, "source": [ "VAL_RATIO = 0.2\n", "\n", "percent = int(train.shape[0] * (1 - VAL_RATIO))\n", "train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]\n", "print('Size of training set: {}'.format(train_x.shape))\n", "print('Size of validation set: {}'.format(val_x.shape))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Size of training set: (983945, 429)\n", "Size of validation set: (245987, 429)\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "nbCfclUIgMTX" }, "source": [ "Create a data loader from the dataset, feel free to tweak the variable `BATCH_SIZE` here." ] }, { "cell_type": "code", "metadata": { "id": "RUCbQvqJurYc" }, "source": [ "BATCH_SIZE = 64\n", "\n", "from torch.utils.data import DataLoader\n", "\n", "train_set = TIMITDataset(train_x, train_y)\n", "val_set = TIMITDataset(val_x, val_y)\n", "train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) #only shuffle the training data\n", "val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "_SY7X0lUgb50" }, "source": [ "Cleanup the unneeded variables to save memory.
\n", "\n", "**notes: if you need to use these variables later, then you may remove this block or clean up unneeded variables later
the data size is quite huge, so be aware of memory usage in colab**" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "y8rzkGraeYeN", "outputId": "dc790996-a43c-4a99-90d4-e7928892a899" }, "source": [ "import gc\n", "\n", "del train, train_label, train_x, train_y, val_x, val_y\n", "gc.collect()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "50" ] }, "metadata": { "tags": [] }, "execution_count": 6 } ] }, { "cell_type": "markdown", "metadata": { "id": "IRqKNvNZwe3V" }, "source": [ "## Create Model" ] }, { "cell_type": "markdown", "metadata": { "id": "FYr1ng5fh9pA" }, "source": [ "Define model architecture, you are encouraged to change and experiment with the model architecture." ] }, { "cell_type": "code", "metadata": { "id": "lbZrwT6Ny0XL" }, "source": [ "import torch\n", "import torch.nn as nn\n", "\n", "class Classifier(nn.Module):\n", " def __init__(self):\n", " super(Classifier, self).__init__()\n", " self.layer1 = nn.Linear(429, 1024)\n", " self.layer2 = nn.Linear(1024, 512)\n", " self.layer3 = nn.Linear(512, 128)\n", " self.out = nn.Linear(128, 39) \n", "\n", " self.act_fn = nn.Sigmoid()\n", "\n", " def forward(self, x):\n", " x = self.layer1(x)\n", " x = self.act_fn(x)\n", "\n", " x = self.layer2(x)\n", " x = self.act_fn(x)\n", "\n", " x = self.layer3(x)\n", " x = self.act_fn(x)\n", "\n", " x = self.out(x)\n", " \n", " return x" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "VRYciXZvPbYh" }, "source": [ "## Training" ] }, { "cell_type": "code", "metadata": { "id": "y114Vmm3Ja6o" }, "source": [ "#check device\n", "def get_device():\n", " return 'cuda' if torch.cuda.is_available() else 'cpu'" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "sEX-yjHjhGuH" }, "source": [ "Fix random seeds for reproducibility." 
] }, { "cell_type": "code", "metadata": { "id": "88xPiUnm0tAd" }, "source": [ "# fix random seed\n", "def same_seeds(seed):\n", " torch.manual_seed(seed)\n", " if torch.cuda.is_available():\n", " torch.cuda.manual_seed(seed)\n", " torch.cuda.manual_seed_all(seed) \n", " np.random.seed(seed) \n", " torch.backends.cudnn.benchmark = False\n", " torch.backends.cudnn.deterministic = True" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "KbBcBXkSp6RA" }, "source": [ "Feel free to change the training parameters here." ] }, { "cell_type": "code", "metadata": { "id": "QTp3ZXg1yO9Y" }, "source": [ "# fix random seed for reproducibility\n", "same_seeds(0)\n", "\n", "# get device \n", "device = get_device()\n", "print(f'DEVICE: {device}')\n", "\n", "# training parameters\n", "num_epoch = 20 # number of training epoch\n", "learning_rate = 0.0001 # learning rate\n", "\n", "# the path where checkpoint saved\n", "model_path = './model.ckpt'\n", "\n", "# create model, define a loss function, and optimizer\n", "model = Classifier().to(device)\n", "criterion = nn.CrossEntropyLoss() \n", "optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "CdMWsBs7zzNs", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "c5ed561e-610d-4a35-d936-fd97adf342a0" }, "source": [ "# start training\n", "\n", "best_acc = 0.0\n", "for epoch in range(num_epoch):\n", " train_acc = 0.0\n", " train_loss = 0.0\n", " val_acc = 0.0\n", " val_loss = 0.0\n", "\n", " # training\n", " model.train() # set the model to training mode\n", " for i, data in enumerate(train_loader):\n", " inputs, labels = data\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " optimizer.zero_grad() \n", " outputs = model(inputs) \n", " batch_loss = criterion(outputs, labels)\n", " _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest 
probability\n", " batch_loss.backward() \n", " optimizer.step() \n", "\n", " train_acc += (train_pred.cpu() == labels.cpu()).sum().item()\n", " train_loss += batch_loss.item()\n", "\n", " # validation\n", " if len(val_set) > 0:\n", " model.eval() # set the model to evaluation mode\n", " with torch.no_grad():\n", " for i, data in enumerate(val_loader):\n", " inputs, labels = data\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " outputs = model(inputs)\n", " batch_loss = criterion(outputs, labels) \n", " _, val_pred = torch.max(outputs, 1) \n", " \n", " val_acc += (val_pred.cpu() == labels.cpu()).sum().item() # get the index of the class with the highest probability\n", " val_loss += batch_loss.item()\n", "\n", " print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(\n", " epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)\n", " ))\n", "\n", " # if the model improves, save a checkpoint at this epoch\n", " if val_acc > best_acc:\n", " best_acc = val_acc\n", " torch.save(model.state_dict(), model_path)\n", " print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))\n", " else:\n", " print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(\n", " epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)\n", " ))\n", "\n", "# if not validating, save the last epoch\n", "if len(val_set) == 0:\n", " torch.save(model.state_dict(), model_path)\n", " print('saving model at last epoch')\n" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "[001/020] Train Acc: 0.467390 Loss: 1.812880 | Val Acc: 0.564884 loss: 1.440870\n", "saving model with acc 0.565\n", "[002/020] Train Acc: 0.594031 Loss: 1.332670 | Val Acc: 0.629594 loss: 1.209077\n", "saving model with acc 0.630\n", "[003/020] Train Acc: 0.644419 Loss: 1.154247 | Val Acc: 0.658295 loss: 1.102313\n", "saving model with acc 
0.658\n", "[004/020] Train Acc: 0.672767 Loss: 1.051355 | Val Acc: 0.675568 loss: 1.040186\n", "saving model with acc 0.676\n", "[005/020] Train Acc: 0.691564 Loss: 0.982245 | Val Acc: 0.683853 loss: 1.004628\n", "saving model with acc 0.684\n", "[006/020] Train Acc: 0.705731 Loss: 0.930892 | Val Acc: 0.691707 loss: 0.977562\n", "saving model with acc 0.692\n", "[007/020] Train Acc: 0.716722 Loss: 0.890210 | Val Acc: 0.691016 loss: 0.973670\n", "[008/020] Train Acc: 0.726312 Loss: 0.856612 | Val Acc: 0.690207 loss: 0.971627\n", "[009/020] Train Acc: 0.734965 Loss: 0.827445 | Val Acc: 0.698561 loss: 0.942904\n", "saving model with acc 0.699\n", "[010/020] Train Acc: 0.741926 Loss: 0.801676 | Val Acc: 0.698854 loss: 0.946376\n", "saving model with acc 0.699\n", "[011/020] Train Acc: 0.748191 Loss: 0.779319 | Val Acc: 0.700944 loss: 0.938454\n", "saving model with acc 0.701\n", "[012/020] Train Acc: 0.754672 Loss: 0.758071 | Val Acc: 0.699423 loss: 0.940523\n", "[013/020] Train Acc: 0.759725 Loss: 0.739450 | Val Acc: 0.699728 loss: 0.951068\n", "[014/020] Train Acc: 0.765137 Loss: 0.721372 | Val Acc: 0.701903 loss: 0.938658\n", "saving model with acc 0.702\n", "[015/020] Train Acc: 0.769828 Loss: 0.704748 | Val Acc: 0.701761 loss: 0.937079\n", "[016/020] Train Acc: 0.774698 Loss: 0.688990 | Val Acc: 0.702293 loss: 0.938634\n", "saving model with acc 0.702\n", "[017/020] Train Acc: 0.779358 Loss: 0.674498 | Val Acc: 0.702492 loss: 0.943941\n", "saving model with acc 0.702\n", "[018/020] Train Acc: 0.783076 Loss: 0.660028 | Val Acc: 0.695195 loss: 0.966189\n", "[019/020] Train Acc: 0.787432 Loss: 0.646340 | Val Acc: 0.700708 loss: 0.958220\n", "[020/020] Train Acc: 0.791536 Loss: 0.633378 | Val Acc: 0.700643 loss: 0.957066\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "1Hi7jTn3PX-m" }, "source": [ "## Testing" ] }, { "cell_type": "markdown", "metadata": { "id": "NfUECMFCn5VG" }, "source": [ "Create a testing dataset, and load model from 
the saved checkpoint." ] }, { "cell_type": "code", "metadata": { "id": "1PKjtAScPWtr", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "8c17272b-536a-4692-a95f-a3292766c698" }, "source": [ "# create testing dataset\n", "test_set = TIMITDataset(test, None)\n", "test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)\n", "\n", "# create model and load weights from checkpoint\n", "model = Classifier().to(device)\n", "model.load_state_dict(torch.load(model_path))" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 12 } ] }, { "cell_type": "markdown", "metadata": { "id": "940TtCCdoYd0" }, "source": [ "Make predictions." ] }, { "cell_type": "code", "metadata": { "id": "84HU5GGjPqR0" }, "source": [ "predict = []\n", "model.eval()  # set the model to evaluation mode\n", "with torch.no_grad():\n", "    for i, data in enumerate(test_loader):\n", "        inputs = data\n", "        inputs = inputs.to(device)\n", "        outputs = model(inputs)\n", "        _, test_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability\n", "\n", "        for y in test_pred.cpu().numpy():\n", "            predict.append(y)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "AWDf_C-omElb" }, "source": [ "Write the predictions to a CSV file.\n", "\n", "After this block finishes running, download the file `prediction.csv` from the files section on the left-hand side and submit it to Kaggle." ] }, { "cell_type": "code", "metadata": { "id": "GuljYSPHcZir" }, "source": [ "with open('prediction.csv', 'w') as f:\n", "    f.write('Id,Class\\n')\n", "    for i, y in enumerate(predict):\n", "        f.write('{},{}\\n'.format(i, y))" ], "execution_count": null, "outputs": [] } ] }