{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "accelerator": "GPU", "colab": { "name": "SHARE MLSpring2021 - HW2-1.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "OYlaRwNu7ojq" }, "source": [ "# **Homework 2-1 Phoneme Classification**" ] }, { "cell_type": "markdown", "metadata": { "id": "emUd7uS7crTz" }, "source": [ "## The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT)\n", "The TIMIT corpus of reading speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.\n", "\n", "This homework is a multiclass classification task, \n", "we are going to train a deep neural network classifier to predict the phonemes for each frame from the speech corpus TIMIT.\n", "\n", "link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3" ] }, { "cell_type": "markdown", "metadata": { "id": "KVUGfWTo7_Oj" }, "source": [ "## Download Data\n", "Download data from google drive, then unzip it.\n", "\n", "You should have `timit_11/train_11.npy`, `timit_11/train_label_11.npy`, and `timit_11/test_11.npy` after running this block.

\n", "`timit_11/`\n", "- `train_11.npy`: training data
\n", "- `train_label_11.npy`: training label
\n", "- `test_11.npy`: testing data

\n", "\n", "**notes: if the google drive link is dead, you can download the data directly from Kaggle and upload it to the workspace**\n", "\n", "\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OzkiMEcC3Foq", "outputId": "4308c64c-6885-4d1c-8eb7-a2d9b8038401" }, "source": [ "!gdown --id '1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR' --output data.zip\n", "!unzip data.zip\n", "!ls " ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Downloading...\n", "From: https://drive.google.com/uc?id=1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR\n", "To: /content/data.zip\n", "372MB [00:03, 121MB/s]\n", "Archive: data.zip\n", " creating: timit_11/\n", " inflating: timit_11/train_11.npy \n", " inflating: timit_11/test_11.npy \n", " inflating: timit_11/train_label_11.npy \n", "data.zip sample_data timit_11\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "_L_4anls8Drv" }, "source": [ "## Preparing Data\n", "Load the training and testing data from the `.npy` file (NumPy array)." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "IJjLT8em-y9G", "outputId": "8edc6bfe-7511-447f-f239-00b96dba6dcf" }, "source": [ "import numpy as np\n", "\n", "print('Loading data ...')\n", "\n", "data_root='./timit_11/'\n", "train = np.load(data_root + 'train_11.npy')\n", "train_label = np.load(data_root + 'train_label_11.npy')\n", "test = np.load(data_root + 'test_11.npy')\n", "\n", "print('Size of training data: {}'.format(train.shape))\n", "print('Size of testing data: {}'.format(test.shape))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Loading data ...\n", "Size of training data: (1229932, 429)\n", "Size of testing data: (451552, 429)\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "us5XW_x6udZQ" }, "source": [ "## Create Dataset" ] }, { "cell_type": "code", "metadata": { "id": "Fjf5EcmJtf4e" }, "source": [ "import torch\n", "from torch.utils.data import Dataset\n", "\n", "class TIMITDataset(Dataset):\n", " def __init__(self, X, y=None):\n", " self.data = torch.from_numpy(X).float()\n", " if y is not None:\n", " y = y.astype(np.int)\n", " self.label = torch.LongTensor(y)\n", " else:\n", " self.label = None\n", "\n", " def __getitem__(self, idx):\n", " if self.label is not None:\n", " return self.data[idx], self.label[idx]\n", " else:\n", " return self.data[idx]\n", "\n", " def __len__(self):\n", " return len(self.data)\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "otIC6WhGeh9v" }, "source": [ "Split the labeled data into a training set and a validation set, you can modify the variable `VAL_RATIO` to change the ratio of validation data." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "sYqi_lAuvC59", "outputId": "13dabe63-4849-47ee-fe04-57427b9d601c" }, "source": [ "VAL_RATIO = 0.2\n", "\n", "percent = int(train.shape[0] * (1 - VAL_RATIO))\n", "train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]\n", "print('Size of training set: {}'.format(train_x.shape))\n", "print('Size of validation set: {}'.format(val_x.shape))" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Size of training set: (983945, 429)\n", "Size of validation set: (245987, 429)\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "nbCfclUIgMTX" }, "source": [ "Create a data loader from the dataset, feel free to tweak the variable `BATCH_SIZE` here." ] }, { "cell_type": "code", "metadata": { "id": "RUCbQvqJurYc" }, "source": [ "BATCH_SIZE = 64\n", "\n", "from torch.utils.data import DataLoader\n", "\n", "train_set = TIMITDataset(train_x, train_y)\n", "val_set = TIMITDataset(val_x, val_y)\n", "train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) #only shuffle the training data\n", "val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "_SY7X0lUgb50" }, "source": [ "Cleanup the unneeded variables to save memory.
\n", "\n", "**notes: if you need to use these variables later, then you may remove this block or clean up unneeded variables later
the data size is quite huge, so be aware of memory usage in colab**" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "y8rzkGraeYeN", "outputId": "dc790996-a43c-4a99-90d4-e7928892a899" }, "source": [ "import gc\n", "\n", "del train, train_label, train_x, train_y, val_x, val_y\n", "gc.collect()" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "50" ] }, "metadata": { "tags": [] }, "execution_count": 6 } ] }, { "cell_type": "markdown", "metadata": { "id": "IRqKNvNZwe3V" }, "source": [ "## Create Model" ] }, { "cell_type": "markdown", "metadata": { "id": "FYr1ng5fh9pA" }, "source": [ "Define model architecture, you are encouraged to change and experiment with the model architecture." ] }, { "cell_type": "code", "metadata": { "id": "lbZrwT6Ny0XL" }, "source": [ "import torch\n", "import torch.nn as nn\n", "\n", "class Classifier(nn.Module):\n", " def __init__(self):\n", " super(Classifier, self).__init__()\n", " self.layer1 = nn.Linear(429, 1024)\n", " self.layer2 = nn.Linear(1024, 512)\n", " self.layer3 = nn.Linear(512, 128)\n", " self.out = nn.Linear(128, 39) \n", "\n", " self.act_fn = nn.Sigmoid()\n", "\n", " def forward(self, x):\n", " x = self.layer1(x)\n", " x = self.act_fn(x)\n", "\n", " x = self.layer2(x)\n", " x = self.act_fn(x)\n", "\n", " x = self.layer3(x)\n", " x = self.act_fn(x)\n", "\n", " x = self.out(x)\n", " \n", " return x" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "VRYciXZvPbYh" }, "source": [ "## Training" ] }, { "cell_type": "code", "metadata": { "id": "y114Vmm3Ja6o" }, "source": [ "#check device\n", "def get_device():\n", " return 'cuda' if torch.cuda.is_available() else 'cpu'" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "sEX-yjHjhGuH" }, "source": [ "Fix random seeds for reproducibility." 
] }, { "cell_type": "code", "metadata": { "id": "88xPiUnm0tAd" }, "source": [ "# fix random seed\n", "def same_seeds(seed):\n", " torch.manual_seed(seed)\n", " if torch.cuda.is_available():\n", " torch.cuda.manual_seed(seed)\n", " torch.cuda.manual_seed_all(seed) \n", " np.random.seed(seed) \n", " torch.backends.cudnn.benchmark = False\n", " torch.backends.cudnn.deterministic = True" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "KbBcBXkSp6RA" }, "source": [ "Feel free to change the training parameters here." ] }, { "cell_type": "code", "metadata": { "id": "QTp3ZXg1yO9Y" }, "source": [ "# fix random seed for reproducibility\n", "same_seeds(0)\n", "\n", "# get device \n", "device = get_device()\n", "print(f'DEVICE: {device}')\n", "\n", "# training parameters\n", "num_epoch = 20 # number of training epoch\n", "learning_rate = 0.0001 # learning rate\n", "\n", "# the path where checkpoint saved\n", "model_path = './model.ckpt'\n", "\n", "# create model, define a loss function, and optimizer\n", "model = Classifier().to(device)\n", "criterion = nn.CrossEntropyLoss() \n", "optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "CdMWsBs7zzNs", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "c5ed561e-610d-4a35-d936-fd97adf342a0" }, "source": [ "# start training\n", "\n", "best_acc = 0.0\n", "for epoch in range(num_epoch):\n", " train_acc = 0.0\n", " train_loss = 0.0\n", " val_acc = 0.0\n", " val_loss = 0.0\n", "\n", " # training\n", " model.train() # set the model to training mode\n", " for i, data in enumerate(train_loader):\n", " inputs, labels = data\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " optimizer.zero_grad() \n", " outputs = model(inputs) \n", " batch_loss = criterion(outputs, labels)\n", " _, train_pred = torch.max(outputs, 1) # get the index of the class with the highest 
probability\n", " batch_loss.backward() \n", " optimizer.step() \n", "\n", " train_acc += (train_pred.cpu() == labels.cpu()).sum().item()\n", " train_loss += batch_loss.item()\n", "\n", " # validation\n", " if len(val_set) > 0:\n", " model.eval() # set the model to evaluation mode\n", " with torch.no_grad():\n", " for i, data in enumerate(val_loader):\n", " inputs, labels = data\n", " inputs, labels = inputs.to(device), labels.to(device)\n", " outputs = model(inputs)\n", " batch_loss = criterion(outputs, labels) \n", " _, val_pred = torch.max(outputs, 1) \n", " \n", " val_acc += (val_pred.cpu() == labels.cpu()).sum().item() # get the index of the class with the highest probability\n", " val_loss += batch_loss.item()\n", "\n", " print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(\n", " epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)\n", " ))\n", "\n", " # if the model improves, save a checkpoint at this epoch\n", " if val_acc > best_acc:\n", " best_acc = val_acc\n", " torch.save(model.state_dict(), model_path)\n", " print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))\n", " else:\n", " print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(\n", " epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)\n", " ))\n", "\n", "# if not validating, save the last epoch\n", "if len(val_set) == 0:\n", " torch.save(model.state_dict(), model_path)\n", " print('saving model at last epoch')\n" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "[001/020] Train Acc: 0.467390 Loss: 1.812880 | Val Acc: 0.564884 loss: 1.440870\n", "saving model with acc 0.565\n", "[002/020] Train Acc: 0.594031 Loss: 1.332670 | Val Acc: 0.629594 loss: 1.209077\n", "saving model with acc 0.630\n", "[003/020] Train Acc: 0.644419 Loss: 1.154247 | Val Acc: 0.658295 loss: 1.102313\n", "saving model with acc 
0.658\n", "[004/020] Train Acc: 0.672767 Loss: 1.051355 | Val Acc: 0.675568 loss: 1.040186\n", "saving model with acc 0.676\n", "[005/020] Train Acc: 0.691564 Loss: 0.982245 | Val Acc: 0.683853 loss: 1.004628\n", "saving model with acc 0.684\n", "[006/020] Train Acc: 0.705731 Loss: 0.930892 | Val Acc: 0.691707 loss: 0.977562\n", "saving model with acc 0.692\n", "[007/020] Train Acc: 0.716722 Loss: 0.890210 | Val Acc: 0.691016 loss: 0.973670\n", "[008/020] Train Acc: 0.726312 Loss: 0.856612 | Val Acc: 0.690207 loss: 0.971627\n", "[009/020] Train Acc: 0.734965 Loss: 0.827445 | Val Acc: 0.698561 loss: 0.942904\n", "saving model with acc 0.699\n", "[010/020] Train Acc: 0.741926 Loss: 0.801676 | Val Acc: 0.698854 loss: 0.946376\n", "saving model with acc 0.699\n", "[011/020] Train Acc: 0.748191 Loss: 0.779319 | Val Acc: 0.700944 loss: 0.938454\n", "saving model with acc 0.701\n", "[012/020] Train Acc: 0.754672 Loss: 0.758071 | Val Acc: 0.699423 loss: 0.940523\n", "[013/020] Train Acc: 0.759725 Loss: 0.739450 | Val Acc: 0.699728 loss: 0.951068\n", "[014/020] Train Acc: 0.765137 Loss: 0.721372 | Val Acc: 0.701903 loss: 0.938658\n", "saving model with acc 0.702\n", "[015/020] Train Acc: 0.769828 Loss: 0.704748 | Val Acc: 0.701761 loss: 0.937079\n", "[016/020] Train Acc: 0.774698 Loss: 0.688990 | Val Acc: 0.702293 loss: 0.938634\n", "saving model with acc 0.702\n", "[017/020] Train Acc: 0.779358 Loss: 0.674498 | Val Acc: 0.702492 loss: 0.943941\n", "saving model with acc 0.702\n", "[018/020] Train Acc: 0.783076 Loss: 0.660028 | Val Acc: 0.695195 loss: 0.966189\n", "[019/020] Train Acc: 0.787432 Loss: 0.646340 | Val Acc: 0.700708 loss: 0.958220\n", "[020/020] Train Acc: 0.791536 Loss: 0.633378 | Val Acc: 0.700643 loss: 0.957066\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "1Hi7jTn3PX-m" }, "source": [ "## Testing" ] }, { "cell_type": "markdown", "metadata": { "id": "NfUECMFCn5VG" }, "source": [ "Create a testing dataset, and load model from 
the saved checkpoint." ] }, { "cell_type": "code", "metadata": { "id": "1PKjtAScPWtr", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "8c17272b-536a-4692-a95f-a3292766c698" }, "source": [ "# create testing dataset\n", "test_set = TIMITDataset(test, None)\n", "test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)\n", "\n", "# create model and load weights from checkpoint\n", "model = Classifier().to(device)\n", "model.load_state_dict(torch.load(model_path))" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": { "tags": [] }, "execution_count": 12 } ] }, { "cell_type": "markdown", "metadata": { "id": "940TtCCdoYd0" }, "source": [ "Make predictions." ] }, { "cell_type": "code", "metadata": { "id": "84HU5GGjPqR0" }, "source": [ "predict = []\n", "model.eval()  # set the model to evaluation mode\n", "with torch.no_grad():\n", "    for i, data in enumerate(test_loader):\n", "        inputs = data\n", "        inputs = inputs.to(device)\n", "        outputs = model(inputs)\n", "        _, test_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability\n", "\n", "        for y in test_pred.cpu().numpy():\n", "            predict.append(y)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "AWDf_C-omElb" }, "source": [ "Write the predictions to a CSV file.\n", "\n", "After this block finishes running, download the file `prediction.csv` from the files section on the left-hand side and submit it to Kaggle." ] }, { "cell_type": "code", "metadata": { "id": "GuljYSPHcZir" }, "source": [ "with open('prediction.csv', 'w') as f:\n", "    f.write('Id,Class\\n')\n", "    for i, y in enumerate(predict):\n", "        f.write('{},{}\\n'.format(i, y))" ], "execution_count": null, "outputs": [] } ] }