{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# k-Means" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Method\n", "\n", "Because of the excellent speed and good expandability,K-Means cluster method is regarded as the most famous cluster method。***K-Means algorthms is a process which repeatly moves the center point,moving the center point of the class which is called centroids to the average position of all other members,and redivide the members of it.***\n", "\n", "K is the hyper-parameter which has been calculated, represents the numbers of class. K-means can distribute sample into different class automatically, without deciding the numbers of the class.\n", "\n", "K must be a positive integer smaller than the number of samples in the training set. Sometimes, the number of classes is specified by the content of the question. For example, a shoe factory has three new styles. It wants to know which potential customers each new style has, so it investigates customers and then finds out three types from the data. \n", "\n", "The parameter of K-Means is the centriod positon of class and the position of its internal observation. Similar with generalized linear models and decision tree, the optimal solution of k-means parameter is also the goal of minimizing the cost function. The cost function of K-Means is\n", ":\n", "$$\n", "J = \\sum_{k=1}^{K} \\sum_{i \\in C_k} | x_i - u_k|^2\n", "$$\n", "\n", "$u_k$is the centriod poisition of samples from type $C_k$ with the definition of:\n", "$$\n", "u_k = \\frac{1}{|C_k|} \\sum_{x \\in C_k} x\n", "$$\n", "\n", "Cost is the sum of each class distortions. Every class distortion equal to the sum of quare between centroids of this class and its inner members. The more compact the members inside the class are, the less the class distorts. On the contrary, the more disperse the members are more distort. 
\n", "\n", "The argument for minimizing the cost function is a process of repeatedly configuring the observations contained in each class and constantly moving the class's ctriod.\n", "1. Firstly, class centriod is a random determined poisition. In fact, the poisition of centriod equal to observed value which being determined radomly.首\n", "2. At each iteration, K-Means will assigns the observations to the class closest to them and move the centriod to the average value of all class members.\n", "3. If the maximum number of iteration steps is reached or the difference between two iterations is less than the set threshold, the algorithm is finished, otherwise repeat step 2.\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAD8CAYAAABXe05zAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAADgNJREFUeJzt3U+I3Pd5x/HPZ1cZZaSEJOCwpZKpdAgpIlCcFcFT0zB0ekhIqC8tOOAUsoe9JI6TpgQ7UHLUJYT4kBaMPbl4SKBKDiE1ccp251BmENEfQyIpAeM6thybOAcnWRd+U2mfHrTbUY2q/cman77zzL5fMKBd764fnp197+i3O/o6IgQAyGOp9AAAgNtDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJHOgiQ96zz33xLFjx5r40LW99dZbOnz4cNEZ5gW7mGIXU+xiah52ce7cud9GxAfrvG0j4T527JjOnj3bxIeubTgcqtvtFp1hXrCLKXYxxS6m5mEXtn9V9225VAIAyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7gBIBnCDQDJEG4ASIZwA0AyhBsAkiHcAJAM4QaAZAg3ACRDuAEgmVrhtv1l2xdt/9z2d22/u+nBAAA3t2e4bR+R9EVJJyPiI5KWJT3U9GAAgJure6nkgKS27QOSDkn6dXMjAWjaeDzWYDDQeDwuPQregT3DHRGvSvqGpJclvSbpdxHxk6YHA9CM8XisXq+nfr+vXq9HvBPa87Bg2x+Q9KCk45LelPQvth+OiGfe9nbrktYlaWVlRcPhcPbT3oatra3iM8wLdjHFLqTBYKCqqrS9va2qqtTv91VVVemxikp3v4iIW94k/a2kp294+e8k/dOt3md1dTVK29zcLD3C3GAXU+wiYjQaRbvdjqWlpWi32zEajUqPVNw83C8knY09erx7q3ON+2VJ99s+ZNuSepIuN/R9BEDDOp2ONjY2tLa2po2NDXU6ndIj4TbteakkIs7YPi3pvKSrki5IerLpwQA0p9PpqKoqop3UnuGWpIj4uqSvNzwLAKAGnjkJAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7jRuPF4rFOnTnEordjFjeZl
FxlPvK91kALwTu2eKD6ZTNRqtfb1UVnsYmpedrE7R1VVGgwGaT4nPOJGo4bDoSaTia5du6bJZJLrJO0ZYxdT87KL3Tm2t7dTfU4INxrV7XbVarW0vLysVqulbrdbeqRi2MXUvOxid46lpaVUnxMulaBRuyeKD4dDdbvdFH8NbQq7mJqXXezO0e/3tba2luZzQrjRuE6nk+YLomnsYmpedpHxxHsulQBAMoQbAJIh3ACQDOEGgGQINwAkQ7gBIBnCDQDJEG4ASIZwA0AyhBsAkiHcAJAM4QaAZAg3ACRDuAEgmVrhtv1+26dt/8L2Zdt5/v1DAFgwdf897ick/Tgi/sZ2S9KhBmcCANzCno+4bb9P0sclPS1JETGJiDebHgyYtYyneQM3U+dSyXFJb0j6ju0Ltp+yfbjhuYCZ2j3Nu9/vq9frEW+kVudSyQFJH5X0SEScsf2EpMck/eONb2R7XdK6JK2srBQ/LXlra6v4DPOCXUiDwUBVVWl7e1tVVanf76uqqtJjFcX9YirdLiLiljdJfyTppRte/gtJ/3qr91ldXY3SNjc3S48wN9hFxGg0ina7HUtLS9Fut2M0GpUeqTjuF1PzsAtJZ2OPHu/e9rxUEhGvS3rF9od3XtWTdKmZbyNAM3ZP815bW9PGxkaqg2GBt6v7WyWPSBrs/EbJi5I+19xIQDMynuYN3EytcEfE85JONjwLAKAGnjkJAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQIN3AXjcdjnTp1ijMvxS7uRN2DFADcod0DiyeTiVqt1r4+iYdd3BkecQN3yXA41GQy0bVr1zSZTHIdTjtj7OLOEG7gLul2u2q1WlpeXlar1VK32y09UjHs4s5wqQS4S3YPLB4Oh+p2u/v60gC7uDOEG7iLOp0OkdrBLt45LpUAQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7gBIJna4ba9bPuC7R81ORAA4NZu5xH3o5IuNzUIAKCeWuG2fVTSpyQ91ew4i4VTrAE0oe4JON+S9FVJ721wloXCKdYAmrJnuG1/WtJvIuKc7e4t3m5d0rokraysFD+1eWtrq+gMg8FAVVVpe3tbVVWp3++rqqois5TexTxhF1PsYirdLiLiljdJpyRdkfSSpNcl/ZekZ271Pqurq1Ha5uZm0f//aDSKdrsdy8vL0W63YzQaFZul9C7mCbuYYhdT87ALSWdjjx7v3vZ8xB0Rj0t6XJJ2HnH/Q0Q83My3kcXBKdYAmsIp7w3iFGsATbitcEfEUNKwkUkAALXwzEkASIZwA0AyhBsAkiHcAJAM4QaAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwo3Gcdo9MFucgINGcdo9MHs84kajhsOhJpOJrl27pslkkuskbWBOEW40qtvtqtVqaXl5Wa1WS91ut/RIQHpcKkGjOO0emD3CjcZx2j0wW1wqAYBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7gBIBnCDQDJEG4ASIZwA0Aye4bb9r22N21fsn3R9qN3YzAAwM3V+fe4r0r6SkSct/1eSeds/1tEXGp4NgDATez5iDsiXouI8zt//oOky5KOND0YZmM8HmswGHDCOrBAbusat+1jku6TdKaJYTBbuyes9/t99Xo94g0siNpHl9l+j6TvS/pSRPz+Jv99XdK6JK2srBQ/zXtra6v4DKUNBgNVVaXt7W1VVaV+v6+qqkqPVRT3iyl2MZVuFxGx503SuyQ9J+nv67z96upqlLa5uVl6hOJGo1G02+1YWlqKdrsdo9Go9EjFcb+YYhdT87ALSWejRl8jotZvlVjS05IuR8Q3G/0ugpnaPWF9bW1NGxsbHNgLLIg6
l0oekPRZST+z/fzO674WEc82NxZmpdPpqKoqog0skD3DHRH/Icl3YRYAQA08cxIAkiHcAJAM4QaAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQDOEGgGQINwAkQ7gBIBnCDQDJEG4ASIZwA0AyhBsAkiHcAJAM4QaAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkEytcNv+hO1f2n7B9mNNDwUA+P/tGW7by5K+LemTkk5I+oztE00PBgC4uTqPuD8m6YWIeDEiJpK+J+nBZse6M+PxWIPBQOPxuPQoADBzdcJ9RNIrN7x8Zed1c2k8HqvX66nf76vX6xFvAAvnwKw+kO11SeuStLKyouFwOKsPfVsGg4GqqtL29raqqlK/31dVVUVmmRdbW1vFPh/zhl1MsYupbLuoE+5XJd17w8tHd173f0TEk5KelKSTJ09Gt9udxXy37eDBg/8b74MHD2ptbU2dTqfILPNiOByq1Odj3rCLKXYxlW0XdS6V/FTSh2wft92S9JCkHzY71jvX6XS0sbGhtbU1bWxs7PtoA1g8ez7ijoirtr8g6TlJy5L6EXGx8cnuQKfTUVVVRBvAQqp1jTsinpX0bMOzAABq4JmTAJAM4QaAZAg3ACRDuAEgGcINAMkQbgBIhnADQDKEGwCSIdwAkAzhBoBkCDcAJEO4ASAZwg0AyRBuAEiGcANAMoQbAJIh3ACQjCNi9h/UfkPSr2b+gW/PPZJ+W3iGecEuptjFFLuYmodd/ElEfLDOGzYS7nlg+2xEnCw9xzxgF1PsYopdTGXbBZdKACAZwg0AySxyuJ8sPcAcYRdT7GKKXUyl2sXCXuMGgEW1yI+4AWAhLWS4bX/C9i9tv2D7sdLzlGL7Xtubti/Zvmj70dIzlWR72fYF2z8qPUtJtt9v+7TtX9i+bLtTeqZSbH9552vj57a/a/vdpWeqY+HCbXtZ0rclfVLSCUmfsX2i7FTFXJX0lYg4Iel+SZ/fx7uQpEclXS49xBx4QtKPI+JPJf2Z9ulObB+R9EVJJyPiI5KWJT1Udqp6Fi7ckj4m6YWIeDEiJpK+J+nBwjMVERGvRcT5nT//Qde/QI+UnaoM20clfUrSU6VnKcn2+yR9XNLTkhQRk4h4s+xURR2Q1LZ9QNIhSb8uPE8tixjuI5JeueHlK9qnsbqR7WOS7pN0puwkxXxL0lclbZcepLDjkt6Q9J2dy0ZP2T5ceqgSIuJVSd+Q9LKk1yT9LiJ+UnaqehYx3Hgb2++R9H1JX4qI35ee526z/WlJv4mIc6VnmQMHJH1U0j9HxH2S3pK0L38OZPsDuv638eOS/ljSYdsPl52qnkUM96uS7r3h5aM7r9uXbL9L16M9iIgflJ6nkAck/bXtl3T90tlf2n6m7EjFXJF0JSJ2/+Z1WtdDvh/9laT/jIg3IuK/Jf1A0p8XnqmWRQz3TyV9yPZx2y1d/2HDDwvPVIRt6/q1zMsR8c3S85QSEY9HxNGIOKbr94d/j4gUj6xmLSJel/SK7Q/vvKon6VLBkUp6WdL9tg/tfK30lOQHtQdKDzBrEXHV9hckPafrPyXuR8TFwmOV8oCkz0r6me3nd173tYh4tuBMKO8RSYOdBzYvSvpc4XmKiIgztk9LOq/rv4F1QUmeQckzJwEgmUW8VAIAC41wA0AyhBsAkiHcAJAM4QaAZAg3ACRDuAEgGcINAMn8DzWXEr0zzEqRAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "% matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "X0 = np.array([7, 5, 7, 3, 4, 1, 0, 2, 8, 6, 5, 3])\n", "X1 = np.array([5, 7, 7, 3, 6, 4, 0, 2, 7, 8, 5, 7])\n", "plt.figure()\n", "plt.axis([-1, 9, -1, 9])\n", "plt.grid(True)\n", "plt.plot(X0, X1, 'k.');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we intiate K-Means, set the centriod of the first class at the fifth sample and the centriod of the second class at the eleventh sample. Then we can calcualte the distance between each instance and two centriod, assigning them to the nearest class. The results are showing in the following talbe:\n", "![data_0](images/data_0.png)\n", "\n", "New centriod position and initial cluster result are shown in the following graph. The fist class are shown in X and the second are represented in dot. The position of centriod are indicated in a larger dot.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAW4AAAEICAYAAAB/Dx7IAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFYlJREFUeJzt3X+U3XV95/Hnm0kGCbHQ3bhpgQmD1UUpHIWE1pHanXHctayi5+w5ZW2RHM160vasgkU2qyLUVSmttVTtWntYTVlg1mwOup6CWHUnc/donbJJkF1+RM6hMGQAXbHKj4CdScJ7//je4U5CkrkDc/O9n5nn45x7Zr7f+73f+7qf3Lzudz73znwjM5EkleOYugNIkubH4pakwljcklQYi1uSCmNxS1JhLG5JKozFrY6KiDdExH01Z/hwRHyhzgwvVkRkRLyi7hzqDha3AIiI90bEjoiYiojr53G7iYh40+Guz8xvZ+bp7W7/YkXEYEQ8fFCGP8zM93TqPo+2iLg+Ij5Rdw7VZ1ndAdQ1HgU+AbwZOK7mLIcUEQFEZj5bd5ZDiYhlmbmv7hxa/DziFgCZ+ZXM/CrwDwdfFxGrIuLWiHg8In4SEd+OiGMi4kZgDXBLROyJiE2HuO1zR8CH2z4iXhcR323u//9ExOCs2zci4uqI+FvgGeDlEfHuiNgVEU9FxAMR8TvNbY8Hvg6c1Nz/nog4KSI+GhE3zdrn2yLinub9NSLi1bOum4iIyyPi/0bEExHx3yPiJYcas4h4V0T8bUT8WUT8A/DR5voNzXw/jYhvRMSpzfXR3PZHEfFkRNwVEWfOepzvOWjf3znEfW4ELgI2NR/fLc31/zEiHmmOyX0RMXyozFokMtOLl+cuVEfd1x+07hrgL4HlzcsbqI58ASaANx1hf4PAw7OWD9geOJnqxeJfUx1I/Mvm8sua1zeA3cAvU/2EuBx4C/BLQAD/gqrQzznU/TXXfRS4qfn9Pweebt7PcmATcD/QOyvf/wZOAv4JsAv43cM8tncB+4D3NbMdB7y9ub9XN9d9BPhuc/s3AzuBE5vZXw384qzH+Z6D9v2dWcsJvKL5/fXAJ2ZddzowCZzUXO4Hfqnu55KXzl084lY79gK/CJyamXuzmrdeqD9y807gtsy8LTOfzcxvATuoinzG9Zl5T2bua97/1zLz77Pyv4BvUr2YtOPfAl/LzG9l5l7gU1SF+/pZ23w2Mx/NzJ8AtwCvPcL+Hs3MP29m+xnwu8A1mbkrq2mTPwRe2zzq3gu8FHgV1Qvfrsz8QZu5j2Q/cCxwRkQsz8yJzPz7BdivupTFrXb8CdVR5DebUxMfXMB9nwr8ZnPa4vGIeBz4NaoXihmTs28QEedHxN81p20epyr5VW3e30nAQzMLWc2XT1Id+c/44azvnwFWHmF/kwctnwp8ZtZj+QnV0fXJmbkN+M/A54AfRcR1EfFzbeY+rMy8H3g/1U8WP4qILRFx0ovdr7qXxa05ZeZTmfmBzHw58DbgsllzqPM98j54+0ngxsw8cdbl+Mz8o0PdJiKOBb5MdaS8OjNPBG6jKsd28jxKVa4z+wugD3hkno/jedmaJoHfOejxHJeZ3wXIzM9m5lrgDKppm//QvN3TwIpZ+/mFedwnmfnfMvPXqB5bAn/8wh6OSmBxC6g+EdF8E64H6ImIl0TEsuZ1b42IVzRL7gmqH81nPtnx/4CXz+OuDt7+JuCCiHhzRMzc72BEnHKY2/dSTQs8BuyLiPOBf3XQ/v9pRJxwmNtvBd4SEcMRsRz4ADAFfHcej+FI/hL4UET8MkBEnBARv9n8/tyI+NXm/T4N/COtcbwT+DcRsSKqz2v/uyPcxwFjGBGnR8Qbmy9q/wj8bNZ+tQhZ3JrxEar/8B+kmnf+WXMdwCuB/wnsAcaBv8jMseZ11wAfaU4NXN7G/RywfWZOUr2h92GqMp6kOgo95HMzM58CLqEq4J8Cvw389azrvw9
8CXigeR8nHXT7+5qP78+BHwMXABdk5nQb2eeUmf+D6mh3S0Q8CdwNnN+8+ueA/9LM/RDVm7B/0rzuz4BpqlL+r8DIEe7mi1Tz2Y9HxFepXsj+qPl4fgj8M+BDC/F41J1mPhkgSSqER9ySVBiLW5IKY3FLUmEsbkkqTEf+yNSqVauyv7+/E7tu29NPP83xxx9fa4Zu4Vi0OBYtjkVLN4zFzp07f5yZL2tn244Ud39/Pzt27OjErtvWaDQYHBysNUO3cCxaHIsWx6KlG8YiIh6ae6uKUyWSVBiLW5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa3JBXG4pakwljcklQYi1uSCmNxS1JhLG5JKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUmLaKOyJ+PyLuiYi7I+JLEfGSTgeT1AGf/CSMjR24bmysWq9izFncEXEycAmwLjPPBHqAd3Q6mKQOOPdcuPDCVnmPjVXL555bby7NS7vnnFwGHBcRe4EVwKOdiySpY4aGYOtWuPBC+s8/H77+9Wp5aKjuZJqHyMy5N4q4FLga+Bnwzcy86BDbbAQ2AqxevXrtli1bFjjq/OzZs4eVK1fWmqFbOBYtjkWlf/Nm+m+8kYmLL2Ziw4a649SuG54XQ0NDOzNzXVsbZ+YRL8DPA9uAlwHLga8C7zzSbdauXZt1GxsbqztC13AsWhyLzNy2LXPVqnzw4oszV62qlpe4bnheADtyjj6eubTz5uSbgAcz87HM3At8BXj9C3hBkVS3mTntrVurI+3mtMnz3rBUV2unuHcDr4uIFRERwDCwq7OxJHXE9u0HzmnPzHlv315vLs3LnG9OZubtEXEzcAewD/gecF2ng0nqgE2bnr9uaMg3JwvT1qdKMvMPgD/ocBZJUhv8zUlJKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNzqHM8o3uJYaAFZ3Ooczyje4lg8z/jkONd8+xrGJ8drzzGye6T2HPPR7lnepfmbdUZxfu/34POfX7pnFHcsDjA+Oc7wDcNM75+mt6eX0fWjDPQN1JZjat8UI5MjteWYL4+41VlDQ1VRffzj1dclWlSAYzFLY6LB9P5p9ud+pvdP05ho1JrjWZ6tNcd8WdzqrLGx6ujyyiurr0v5pLSOxXMG+wfp7emlJ3ro7ellsH+w1hzHcEytOebLqRJ1zqwzij93XsPZy0uJY3GAgb4BRteP0phoMNg/WNv0xEyOzWOb2TC0oYhpErC41UlHOqP4Uisrx+J5BvoGuqIoB/oGmFoz1RVZ2mVxq3M8o3iLY6EF5By3JBXG4pakwljcklQYi1uSCmNxS1JhLG5JKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUGItbi9OhThV2OJ5CTIWxuLU4HXyqsMPxFGIqUFvFHREnRsTNEfH9iNgVEeX8/UMtTbNPFXa48j74b2RLhWj3iPszwN9k5quA1wC7OhdJWiAz5X3BBXDttQded+211XpLWwWa8+9xR8QJwK8D7wLIzGlgurOxpAUyNAQf+xhcfnm1fM45VWlffjl86lOWtorUzokUTgMeA/4qIl4D7AQuzcynO5pMWiiXXVZ9vfxyXnvmmXD33VVpz6yXChOZeeQNItYBfwecl5m3R8RngCcz88qDttsIbARYvXr12i1btnQocnv27NnDypUra83QLRyLymsvuYQT77qLx886izs/+9m649TO50VLN4zF0NDQzsxc19bGmXnEC/ALwMSs5TcAXzvSbdauXZt1GxsbqztC13AsMvNP/zQzIn961lmZEdXyEufzoqUbxgLYkXP08cxlzqmSzPxhRExGxOmZeR8wDNz7Ql9VpKNu1pz2neecw+Add7TmvJ0uUYHaPVnw+4CRiOgFHgDe3blI0gIaG4OrrmrNaTcarbK+6io4+2zfoFRx2iruzLwTaG/uReoWM5/TvuWW55fzZZdVpe3
nuFUgf3NSi1M7v1zTzi/pSF3I4tbitH17e0fSM+W9ffvRySUtgHbnuKWybNrU/rZDQ06VqCgecUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVBiLW5IKY3FLUmEsbkkqjMUtHSUjd43Q/+l+jvlPx9D/6X5G7hqpO5IK5d8qkY6CkbtG2HjLRp7Z+wwADz3xEBtv2QjARWddVGe02oxPjtOYaDDYP8hA30DdcYpicUtHwRWjVzxX2jOe2fsMV4xesSSLe3xynOEbhpneP01vTy+j60ct73lwqkQ6CnY/sXte6xe7xkSD6f3T7M/9TO+fpjHRqDtSUSxu6ShYc8Kaea1f7Ab7B+nt6aUneujt6WWwf7DuSEWxuKWj4Orhq1mxfMUB61YsX8HVw1fXlKheA30DjK4f5eNDH3ea5AVwjls6Cmbmsa8YvYLdT+xmzQlruHr46iU5vz1joG/Awn6BLG7pKLnorIuWdFFr4ThVIkmFsbglqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVBiLW5IK03ZxR0RPRHwvIm7tZKBF4ZOfhLGxA9eNjVXrJelFms8R96XArk4FWVTOPRcuvLBV3mNj1fK559abS9Ki0FZxR8QpwFuAL3Q2ziIxNARbt1ZlfdVV1detW6v1kvQiRWbOvVHEzcA1wEuByzPzrYfYZiOwEWD16tVrt2zZssBR52fPnj2sXLmy1gz9mzfTf+ONTFx8MRMbNtSWoxvGols4Fi2ORUs3jMXQ0NDOzFzX1saZecQL8FbgL5rfDwK3znWbtWvXZt3GxsbqDbBtW+aqVZlXXll93battii1j0UXcSxaHIuWbhgLYEfO0a0zl3amSs4D3hYRE8AW4I0RcdP8X0+WkJk57a1b4WMfa02bHPyGpSS9AHMWd2Z+KDNPycx+4B3Atsx8Z8eTlWz79gPntGfmvLdvrzeXpEXBs7x3wqZNz183NOSbk5IWxLyKOzMbQKMjSSRJbfE3JyWpMBa3JBXG4pakwljcklQYi1uSCmNxS1JhLG5JKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuddz45DjXfPsaxifH644iLQqeSEEdNT45zvANw0zvn6a3p5fR9aMM9A3UHUsqmkfc6qjGRIPp/dPsz/1M75+mMdGoO5JUPItbHTXYP0hvTy890UNvTy+D/YN1R5KK51SJOmqgb4DR9aM0JhoM9g86TSItAItbHTfQN2BhSwvIqRJJKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVJg5izsi+iJiLCLujYh7IuLSoxFMknRo7fw97n3ABzLzjoh4KbAzIr6Vmfd2OJsk6RDmPOLOzB9k5h3N758CdgEndzqYFsb45Dgju0c8w7q0iMxrjjsi+oGzgds7EUYLa+YM65sf3MzwDcOWt7RItH3qsohYCXwZeH9mPnmI6zcCGwFWr15No9FYqIwvyJ49e2rPULeR3SNM7ZviWZ5lat8Um8c2M7Vmqu5YtfJ50eJYtJQ2FpGZc28UsRy4FfhGZl471/br1q3LHTt2LEC8F67RaDA4OFhrhrrNHHFP7Zvi2GXHMrp+dMmf+9HnRYtj0dINYxEROzNzXTvbtvOpkgC+COxqp7TVPWbOsL7htA2WtrSItDNVch5wMXBXRNzZXPfhzLytc7G0UAb6BphaM2VpS4vInMWdmd8B4ihkkSS1wd+clKTCWNySVBiLW5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa3JBXG4pakwljcklQYi1uSCmNxS1JhLG5JKozFLUmFsbglqTAWtyQVxuKWpMJY3JJUGItbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFcbilqTCWNySVBiLW5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa
3JBXG4pakwljcklQYi1uSCtNWcUfEb0TEfRFxf0R8sNOhJEmHN2dxR0QP8DngfOAM4Lci4oxOB3sxxifHGdk9wvjkeN1RJGnBtXPE/SvA/Zn5QGZOA1uAt3c21gs3PjnO8A3DbH5wM8M3DFvekhadZW1sczIwOWv5YeBXD94oIjYCGwFWr15No9FYiHzzNrJ7hKl9UzzLs0ztm2Lz2Gam1kzVkqVb7Nmzp7Z/j27jWLQ4Fi2ljUU7xd2WzLwOuA5g3bp1OTg4uFC7npdjJ49lZLIq72OXHcuGoQ0M9A3UkqVbNBoN6vr36DaORYtj0VLaWLQzVfII0Ddr+ZTmuq400DfA6PpRNpy2gdH1o0u+tCUtPu0ccW8HXhkRp1EV9juA3+5oqhdpoG+AqTVTlrakRWnO4s7MfRHxXuAbQA+wOTPv6XgySdIhtTXHnZm3Abd1OIskqQ3+5qQkFcbilqTCWNySVBiLW5IKY3FLUmEsbkkqjMUtSYWxuCWpMBa3JBXG4pakwljcklQYi1uSCmNxS1JhLG5JKozFLUmFsbglqTCRmQu/04jHgIcWfMfzswr4cc0ZuoVj0eJYtDgWLd0wFqdm5sva2bAjxd0NImJHZq6rO0c3cCxaHIsWx6KltLFwqkSSCmNxS1JhFnNxX1d3gC7iWLQ4Fi2ORUtRY7Fo57glabFazEfckrQoWdySVJhFWdwR8RsRcV9E3B8RH6w7T10ioi8ixiLi3oi4JyIurTtTnSKiJyK+FxG31p2lThFxYkTcHBHfj4hdETFQd6a6RMTvN/9v3B0RX4qIl9SdqR2Lrrgjogf4HHA+cAbwWxFxRr2parMP+EBmngG8Dvj3S3gsAC4FdtUdogt8BvibzHwV8BqW6JhExMnAJcC6zDwT6AHeUW+q9iy64gZ+Bbg/Mx/IzGlgC/D2mjPVIjN/kJl3NL9/iuo/6Mn1pqpHRJwCvAX4Qt1Z6hQRJwC/DnwRIDOnM/PxelPVahlwXEQsA1YAj9acpy2LsbhPBiZnLT/MEi2r2SKiHzgbuL3eJLX5NLAJeLbuIDU7DXgM+KvmtNEXIuL4ukPVITMfAT4F7AZ+ADyRmd+sN1V7FmNx6yARsRL4MvD+zHyy7jxHW0S8FfhRZu6sO0sXWAacA3w+M88GngaW5PtAEfHzVD+NnwacBBwfEe+sN1V7FmNxPwL0zVo+pbluSYqI5VSlPZKZX6k7T03OA94WERNUU2dvjIib6o1Um4eBhzNz5ievm6mKfCl6E/BgZj6WmXuBrwCvrzlTWxZjcW8HXhkRp0VEL9WbDX9dc6ZaRERQzWXuysxr685Tl8z8UGaekpn9VM+HbZlZxJHVQsvMHwKTEXF6c9UwcG+Nkeq0G3hdRKxo/l8ZppA3apfVHWChZea+iHgv8A2qd4k3Z+Y9Nceqy3nAxcBdEXFnc92HM/O2GjOpfu8DRpoHNg8A7645Ty0y8/aIuBm4g+oTWN+jkF9991feJakwi3GqRJIWNYtbkgpjcUtSYSxuSSqMxS1JhbG4JakwFrckFeb/AyaUIWRb0bIhAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "C1 = [1, 4, 5, 9, 11]\n", "C2 = list(set(range(12)) - set(C1))\n", "X0C1, X1C1 = X0[C1], X1[C1]\n", "X0C2, X1C2 = X0[C2], X1[C2]\n", "plt.figure()\n", "plt.title('1st iteration results')\n", "plt.axis([-1, 9, -1, 9])\n", "plt.grid(True)\n", "plt.plot(X0C1, X1C1, 'rx')\n", "plt.plot(X0C2, X1C2, 'g.')\n", "plt.plot(4,6,'rx',ms=12.0)\n", "plt.plot(5,5,'g.',ms=12.0);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we recalculate the centiod of two class, move centriod to the new poisition, recalculate the distance between each sample and new centriod and reclassify the sample according the distacne.\n", "\n", "![data_1](images/data_1.png)\n", "\n", "The result of drawing are shown as follows:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEICAYAAAB/Dx7IAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFQZJREFUeJzt3X+U3Hdd7/Hnu5sfNAkWNLBIm7DRo60RxZoUuvSqu271UKhyz9FbfpT0Qg43VxQt3t6DFm6lFirq8XjAg/ZeLKk0rOTWwrlirVJNd/VCY23SVkubopWkSUtLA9gfm8Juk7zvH/PdO0PYzc4mO/nOZ/b5OGfO7nfmO9/ve967+9rvfL4z84nMRJJUjtPqLkCSND8GtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuLZiIeGtEfG6W29ZGxERE9J3qulpquDQibqtr/wshIvZFxIV116F6GdyLWEQsj4iPRcTDEfFMRNwbERd1Yl+ZuT8zV2XmkWrf4xHx9k7sq9r+QERkRCxpqWE0M3+6U/s81SLi6oj4RN116NQzuBe3JcAB4CeAM4D/AdwUEQM11tSWOo/c59L6z0LqBIN7EcvMQ5l5dWbuy8yjmXkLsBfYABARQxHxSERcERFPRMRjEfG26ftHxHdFxGci4umI+Efge2fbV+sRcERcC/wY8JFq+OQj1TrnRMTfRMTXI+KLEXFJy/3/JCKui4hbI+IQMBwRr4uIe6r9H4iIq1t2+ffV1yerfQweO5QTEa+OiLsi4qnq66tbbhuPiPdHxOerZyO3RcTqWR7bdJ9+LSIeB26orr+4ehbzZETcERE/3HKfX4uIR6ttfzEiRloe5weO3fYM+3wN8B7gDdXj+6fq+rdGxJeq7e6NiEtn+5moYJnpxQuZCdAPfBM4p1oeAg4D1wBLgdcCzwIvrG7fDtwErAReDjwKfG6WbQ8ACSyplseBt7fcvpLG0f/baDwTOBf4KrC+uv1PgKeAC2gccDyvqu+HquUfBr4C/MeZ9ldd99bp+oDvBP4d2FTt703V8ne11PdvwPcDp1fLv
z3LY5vu0+8Ay6v1zwWeAF4F9AH/GdhX3X529Vhf2lLr97Y8zg8cs+1HWpb3ARdW318NfOKYHj4NnF0tfzfwg3X/XnlZ+ItH3AIgIpYCo8DHM/PBlpueA67JzOcy81ZgAji7Gqr4OeA3snHk/gXg4ydRwsXAvsy8ITMPZ+Y9wKeA/9Syzp9n5uez8ezgm5k5npn3Vcv/DHySxrBPO14H/Gtmbqv290ngQeBnWta5ITP/JTO/QeMf1I8cZ3tHgfdl5mS1/hbgf2XmnZl5JDM/DkwC5wNHaAT4+ohYmo1nPP/WZt1zOQq8PCJOz8zHMvP+BdquuojBLSLiNGAbMAW885ibv5aZh1uWnwVWAS+iOUY+7eGTKONlwKuqYYUnI+JJ4FLgJS3rtO6LiHhVRIxFxMGIeAr4BWDG4YwZvHSGeh8GzmxZfrzl++nHPZuDmfnNluWXAVcc83jW0DjKfgh4F40j5iciYntEvLTNumeVmYeAN9Dow2MR8ZcRcc7Jblfdx+Be5CIigI/RGCb5ucx8rs27HqQxPLCm5bq189j1sR9LeQD4u8x8QctlVWa+4zj3+VPgM8CazDwD+J9AzLLusb5MI1xbraUx3HMiZno81x7zeFZUR/Zk5p9m5n+oakgawywAh4AVLdt5CbP7tseYmZ/NzJ+iMUzyIPDHJ/Zw1M0Mbl0H/ADwM9VT/LZk42V9nwaujogVEbGexjhuu74CfE/L8i3A90fEpohYWl3Oi4gfOM42ng98PTO/GRGvBN7ccttBGsMG3zPjPeHWan9vrk6YvgFYX9WxEP4Y+IXqWUFExMrqZOrzI+LsiPjJiFhO45zCN6paAe4FXhsR3xkRL6FxZD6brwAD1TMmIqI/Il4fEStpDMtMtGxXPcTgXsQi4mXAf6Uxdvt49eqEiXm8EuGdNIYPHqdxUu2Geez+w8DPR8S/R8QfZOYzwE8Db6RxNPw4zZN9s/lF4JqIeAb4DRrj0ABk5rPAtcDnq6GK81vvmJlfozGufgXwNeDdwMWZ+dV5PIZZZeYu4L8AH6Fx0vMhGidHqR7Tb9M4+fo48GLgyuq2bcA/0TgJeRvwv4+zmz+rvn4tIu6m8ff832j07+s0xvvfMct9VbDIdCIFSSqJR9ySVBiDW5IKY3BLUmEMbkkqTEc+DGf16tU5MDDQiU237dChQ6xcubLWGrqFvWiyF032oqkberF79+6vZuaL2lm3I8E9MDDArl27OrHpto2PjzM0NFRrDd3CXjTZiyZ70dQNvYiItt957FCJJBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFaat4I6IX42I+yPiCxHxyYh4XqcLk9QBv/u7MDb2rdeNjTWuVzHmDO6IOBP4FWBjZr4c6APe2OnCJHXAeefBJZc0w3tsrLF83nn11qV5aXfOySXA6RHxHLAC+HLnSpLUMcPDcNNNcMklDFx0EfzVXzWWh4frrkzzEJk590oRlwPXAt8AbsvMS2dYZwuwBaC/v3/D9u3bF7jU+ZmYmGDVqlW11tAt7EWTvWgY2LqVgW3b2LdpE/s2b667nNp1w+/F8PDw7szc2NbKmXncC/BC4HbgRcBS4P8AbznefTZs2JB1Gxsbq7uErmEvmuxFZt5+e+bq1bl306bM1asby4tcN/xeALtyjjyevrRzcvJCYG9mHszM54BPA68+gX8okuo2PaZ9002NI+1q2OTbTliqq7UT3PuB8yNiRUQEMALs6WxZkjrirru+dUx7esz7rrvqrUvzMufJycy8MyJuBu4GDgP3AB/tdGGSOuDd7/7264aHPTlZmLZeVZKZ7wPe1+FaJElt8J2TklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG41TnOKN5kL5q6pRfdUscJMLjVO
c4o3mQvmrqlF91Sxwlod5Z3af5aZhTnHe+A665bvDOK24umbulFwTPee8Stzhoebvxxvv/9ja8F/FF0jL1o6pZeVHUMbNtW1M/E4FZnjY01jqiuuqrxdTFPSmsvmrqlF1Ud+zZtKupnYnCrc1pmFOeaaxb3jOL2oqlbelHwjPcGtzrHGcWb7EVTt/SiW+o4AZ6cVOc4o3iTvWjqll50Sx0nwCNuSSqMwS1JhTG4Va6Z3vk2m0LeESe1w+BWuY5959tsCnpHnNQOg1vlan0H3mzh3frSswJOOkntMLhVtuOFt6GtHmVwq3wzhbehrR7m67jVG7rlg4ukU8AjbvWObvngIqnDDG71jm754CKpwwxu9YZu+eAi6RQwuFW+mU5EtvNSQalQBrfKdrxXjxje6lFtBXdEvCAibo6IByNiT0QMdrowaU7tvOTP8FYPaveI+8PAX2fmOcArgD2dK0lq07Gfp3y89a688ls/Z9nPLlHB5nwdd0ScAfw48FaAzJwCpjpbltSGmT5PeSbTn2ly002N5dYjdalA7bwBZx1wELghIl4B7AYuz8xDHa1MWigFz+YtzSQy8/grRGwE/gG4IDPvjIgPA09n5lXHrLcF2ALQ39+/Yfv27R0quT0TExOsWrWq1hq6hb1oGNi6lYFt29i3aVNjjsFFzt+Lpm7oxfDw8O7M3NjWypl53AvwEmBfy/KPAX95vPts2LAh6zY2NlZ3CV3DXmTm7bdnrl6dezdtyly9urG8yPl70dQNvQB25Rx5PH2Z8+RkZj4OHIiIs6urRoAHTuAfilSPgmfzlmbS7qtKfhkYjYh/Bn4E+K3OlSQtsIJn85Zm0tanA2bmvUB7Yy9Styl4Nm9pJr5zUpIKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWTqGdB3bywf/7QXYe2Fl3KbWzFyeurc/jlnTydh7YyciNI0wdmWJZ3zJ2XLaDwTWDdZdVC3txcjzilk6R8X3jTB2Z4kgeYerIFOP7xusuqTb24uQY3NIpMjQwxLK+ZfRFH8v6ljE0MFR3SbWxFyfHoRLpFBlcM8iOy3Ywvm+coYGhRT00YC9OjsEtnUKDawYNqYq9OHEOlUhSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3NE+j940y8KEBTvvN0xj40ACj943WXZIWGT8dUJqH0ftG2fIXW3j2uWcBePiph9nyF1sAuPSHLq2zNC0iHnFL8/DeHe/9/6E97dnnnuW9O95bU0VajNoO7ojoi4h7IuKWThYkdbP9T+2f1/VSJ8zniPtyYE+nCulFzmLde9aesXZe10ud0FZwR8RZwOuA6ztbTu+YnsX6qrGrGLlxxPDuEdeOXMuKpSu+5boVS1dw7ci1NVWkxSgyc+6VIm4GPgg8H/jvmXnxDOtsAbYA9Pf3b9i+ffsClzo/ExMTrFq1qrb9j+4fZeverRzlKKdxGpvXbebStfWcvKq7F91kIXrxt1/5W67fez1PTD7Bi5e/mLevezsX9l+4QBWeOv5eNHVDL4aHh3dn5sa2Vs7M416Ai4E/qr4fAm6Z6z4bNmzIuo2NjdW6/zv235Gnf+D07PvNvjz9A6fnHfvvqK2WunvRTexFk71o6oZeALtyjmydvrTzcsALgJ+NiNcCzwO+IyI+kZlvOYF/KouGs1hL6pQ5gzszrwSuBIiIIRpDJYZ2G5zFWlIn+DpuSSrMvN45mZnjwHhHKpEktcUjbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbHeds99LCmtfncUvzNT3b/dSRKZb1LWPHZTucFUg6SR5xq6PG940zdWSKI3mEqSNTjO8br7skqXgGtzpqaGCIZX3L6
Is+lvUtY2hgqO6SpOI5VKKOcrZ7aeEZ3Oo4Z7uXFpZDJZJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMHMGd0SsiYixiHggIu6PiMtPRWGSpJm183nch4ErMvPuiHg+sDsi/iYzH+hwbZKkGcx5xJ2Zj2Xm3dX3zwB7gDM7XZgWxs4DOxndP+oM61IPmdcYd0QMAOcCd3aiGC2s6RnWt+7dysiNI4a31CPanrosIlYBnwLelZlPz3D7FmALQH9/P+Pj4wtV4wmZmJiovYa6je4fZfLwJEc5yuThSbaObWVy7WTdZdXK34sme9FUWi8iM+deKWIpcAvw2cz8/bnW37hxY+7atWsByjtx4+PjDA0N1VpD3aaPuCcPT7J8yXJ2XLZj0c/96O9Fk71o6oZeRMTuzNzYzrrtvKokgI8Be9oJbXWP6RnWN6/bbGhLPaSdoZILgE3AfRFxb3XdezLz1s6VpYUyuGaQybWThrbUQ+YM7sz8HBCnoBZJUht856QkFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklSYtoI7Il4TEV+MiIci4tc7XZQkaXZzBndE9AF/CFwErAfeFBHrO13Yydh5YCej+0fZeWBn3aVI0oJr54j7lcBDmfmlzJwCtgOv72xZJ27ngZ2M3DjC1r1bGblxxPCW1HOWtLHOmcCBluVHgFcdu1JEbAG2APT39zM+Pr4Q9c3b6P5RJg9PcpSjTB6eZOvYVibXTtZSS7eYmJio7efRbexFk71oKq0X7QR3WzLzo8BHATZu3JhDQ0MLtel5WX5gOaMHGuG9fMlyNg9vZnDNYC21dIvx8XHq+nl0G3vRZC+aSutFO0MljwJrWpbPqq7rSoNrBtlx2Q42r9vMjst2LPrQltR72jnivgv4vohYRyOw3wi8uaNVnaTBNYNMrp00tCX1pDmDOzMPR8Q7gc8CfcDWzLy/45VJkmbU1hh3Zt4K3NrhWiRJbfCdk5JUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwkRmLvxGIw4CDy/4hudnNfDVmmvoFvaiyV402YumbujFyzLzRe2s2JHg7gYRsSszN9ZdRzewF032osleNJXWC4dKJKkwBrckFaaXg/ujdRfQRexFk71oshdNRfWiZ8e4JalX9fIRtyT1JINbkgrTk8EdEa+JiC9GxEMR8et111OXiFgTEWMR8UBE3B8Rl9ddU50ioi8i7omIW+qupU4R8YKIuDkiHoyIPRExWHdNdYmIX63+Nr4QEZ+MiOfVXVM7ei64I6IP+EPgImA98KaIWF9vVbU5DFyRmeuB84FfWsS9ALgc2FN3EV3gw8BfZ+Y5wCtYpD2JiDOBXwE2ZubLgT7gjfVW1Z6eC27glcBDmfmlzJwCtgOvr7mmWmTmY5l5d/X9MzT+QM+st6p6RMRZwOuA6+uupU4RcQbw48DHADJzKjOfrLeqWi0BTo+IJcAK4Ms119OWXgzuM4EDLcuPsEjDqlVEDADnAnfWW0ltPgS8GzhadyE1WwccBG6oho2uj4iVdRdVh8x8FPg9YD/wGPBUZt5Wb1Xt6cXg1jEiYhXwKeBdmfl03fWcahFxMfBEZu6uu5YusAT4UeC6zDwXOAQsyvNAEfFCGs/G1wEvBVZGxFvqrao9vRjcjwJrWpbPqq5blCJiKY3QHs3MT9ddT00uAH42IvbRGDr7yYj4RL0l1eYR4JHMnH7md
TONIF+MLgT2ZubBzHwO+DTw6ppraksvBvddwPdFxLqIWEbjZMNnaq6pFhERNMYy92Tm79ddT10y88rMPCszB2j8PtyemUUcWS20zHwcOBARZ1dXjQAP1FhSnfYD50fEiupvZYRCTtQuqbuAhZaZhyPincBnaZwl3pqZ99dcVl0uADYB90XEvdV178nMW2usSfX7ZWC0OrD5EvC2muupRWbeGRE3A3fTeAXWPRTy1nff8i5JhenFoRJJ6mkGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSrM/wNZ1XFVcoOSCQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "C1 = [1, 2, 4, 8, 9, 11]\n", "C2 = list(set(range(12)) - set(C1))\n", "X0C1, X1C1 = X0[C1], X1[C1]\n", "X0C2, X1C2 = X0[C2], X1[C2]\n", "plt.figure()\n", "plt.title('2nd iteration results')\n", "plt.axis([-1, 9, -1, 9])\n", "plt.grid(True)\n", "plt.plot(X0C1, X1C1, 'rx')\n", "plt.plot(X0C2, X1C2, 'g.')\n", "plt.plot(3.8,6.4,'rx',ms=12.0)\n", "plt.plot(4.57,4.14,'g.',ms=12.0);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we move the center of mass to the new position, recalculate the distance between each sample and the new center of mass, and reclassify the samples according to the distance. The results are shown in the table below:\n", "![data_2](images/data_2.png)\n", "\n", "The result of drawing are shown as follows:\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEICAYAAAB/Dx7IAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAFKhJREFUeJzt3X+Q3HV9x/Hn2wQiIQjaYCwk4VColuqoJagn1d41dgoVdabTMlAM1bTNFKvir+IPpFpptONYC1aLjXKM4FXKANNRC2oNd1U7EUnAiiFqGRJyICi08uNAL4S8+8d+jz3DXW4vt5vvfu6ej5mby+5+9/t9f9/Ze91nP7u3n8hMJEnleErdBUiSZsbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMGttoiIjIjjprjt+oj4kwNd0141jEbEs+usYTYi4oMR8fm661B3MLhFRHw+Iu6JiIci4kcR8Wft3H9mnpqZn6uO9YaI+FY797+3iBje+xwyc0lm3tHJ4x4oEdFT/aJcWHctqofBLYCPAD2Z+TTgtcDfRsSJk21Yd1jUffx96ebaNLcY3CIzt2bm2PjF6us5ABHRFxF3RcS7I+Je4LLq+r+qRuk/joi1+9r/+Ag4In4d+DTQW01dPFDdvigiPhYROyPiJxHx6Yg4ZKrjR8TTI+LLEXFfRPys+vfyavv1wCuAT1bH+GR1/RNTORFxeERcXt3/zoh4f0Q8pbrtDRHxraqen0XE9og4dR/ntqOq7XvAIxGxMCKOiohrqv1vj4i3Ttj+JRGxuXp285OI+PjE85xk36+a5LDfqL4/UJ1jb0QcFxH/GREPRsT9EfGv+/o/UdkMbgEQEf8UEY8CPwDuAa6bcPOzgGcAxwDrIuIU4F3A7wLHA5OFy5Nk5jbgL4BN1dTFEdVNfwf8GvAi4DjgaOCvpzo+jcftZdXllcDPgU9Wxzgf+Cbw5uoYb56klH8EDgeeDfw2cDbwxgm3vxT4IbAU+ChwaUTEPk7tTODVwBHAHuBLwH
9X57EaeFtE/F617cXAxdWzm+cAV+1jv1N5ZfX9iOocNwEXAl8Dng4sr85Rc5TBLQAy803AYTRGq9cCYxNu3gN8IDPHMvPnwOnAZZn5/cx8BPjg/h63CsR1wNsz8/8y82Hgw8AZUx0/M/83M6/JzEer7dfTCOBWjreg2vd7M/PhzNwB/D2wZsJmd2bmZzLzceBzwK8Cy/ax209k5kjVm5OAIzPzQ5m5q5pX/8yE83kMOC4ilmbmaGZ+u5W6W/AYjV9kR2XmLzKzo68jqF4Gt56QmY9XP/DLgXMm3HRfZv5iwuWjgJEJl++cxWGPBBYDWyLigWr65CvV9ZMePyIWR8Q/V9McD9GYOjiiCuXpLAUO2qvmO2mMjsfdO/6PzHy0+ueSfexzYi+OAY4aP5fqfN5HM/j/lMazix9ExE0RcVoLNbfiPCCA70TE1ummr1Q2X0zRZBZSzXFX9v4IyXuAFRMur5zBvvfe1/00pjp+IzPvbvE+7wSeC7w0M++NiBcBt9AIrsm23/t446PT26rrVgJTHbsVE483AmzPzOMn3TDzf4Azqzn1PwCujohfAR6h8QsMeOKZwZGT7YNJzi8z7wX+vLrvbwFfj4hvZObt+3E+6nKOuOe5iHhmRJwREUsiYkE1F3smsHEfd7sKeENEnBARi4EPzOCQPwGWR8TBAJm5h8ZUwj9ExDOrmo6eMCc8mcNohP0DEfGMSY7/Exrz109STX9cBayPiMMi4hjgHUC73iP9HeDh6gXLQ6qePj8iTgKIiNdHxJHVeT9Q3WcP8CPgqRHx6og4CHg/sGiKY9xX3eeJc4yIPxp/gRb4GY1w39Omc1KXMbiVNKZF7qLxA/8x4G2Z+cUp75B5PXARcANwe/W9VTcAW4F7I+L+6rp3V/v5djX18XUaI+qpXAQcQmP0/G0aUysTXQz8YfWukE9Mcv+30Bjh3gF8C/gXYGAG5zCl6hfDaTReaN1e1fhZGi+GApwCbI2I0arOM6p5+weBN1Xb3l3VdxeTqKZv1gP/VU3HvIzG3PqN1X6/CJw7V963ricLF1KQpLI44pakwhjcklQYg1uSCmNwS1JhOvI+7qVLl2ZPT08ndt2yRx55hEMPPbTWGrqFvWiyF032oqkberFly5b7M3Oq9+7/ko4Ed09PD5s3b+7Erls2PDxMX19frTV0C3vRZC+a7EVTN/QiIlr+C2SnSiSpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwLQV3RLw9IrZGxPcj4gsR8dROFyapAz76URga+uXrhoYa16sY0wZ3RBwNvBVYlZnPBxYAZ3S6MEkdcNJJcPrpzfAeGmpcPumkeuvSjLS65uRC4JCIeAxYDPy4cyVJ6pj+frjqKjj9dHpOPRWuv75xub+/7so0A5GZ028UcS6wHvg58LXMPGuSbdYB6wCWLVt24pVXXtnmUmdmdHSUJUuW1FpDt7AXTfaioWdggJ4rrmDHmjXsWLu27nJq1w2Pi/7+/i2ZuaqljTNzn1/A04EbgCOBg4B/A16/r/uceOKJWbehoaG6S+ga9qLJXmTmDTdkLl2a29esyVy6tHF5nuuGxwWwOafJ4/GvVl6cfBWwPTPvy8zHgGuBl+/HLxRJdRuf077qqsZIu5o2edILlupqrQT3TuBlEbE4IgJYDWzrbFmSOuKmm355Tnt8zvumm+qtSzMy7YuTmXljRFwN3AzsBm4BNnS6MEkdcN55T76uv98XJwvT0rtKMvMDwAc6XIskqQX+5aQkFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcUjebbKmxqbgE2bxhcEvdbO+lxqbiEmTzisEtdbMJS41NGd4TPmPbT/mbHwxudY4rijfNphf7Cu8SQ7tbHhfdUsd+MLjVOa4o3jTbXkwW3iWGNnTP46
Jb6tgfra5xNpMv15zsLrX2olrfMC+4oCvWNyy+F23sZ/G9aGMd3bD+Jm1ec1Laf/39cM45cOGFje8ljQzbrR29mCv97JbzqOroueKKovppcKuzhobgkkvgggsa3+fzorTt6MVc6We3nEdVx441a8rqZ6tD85l8OVXSXWrrxfjT4fGnn3tfrkHRvWhzP4vuRZvrGBoaqv3xiVMl6gquKN40215M9kJkK28V7Ebd8rjoljr2R6sJP5MvR9zdxV40FdmL6UaC+zlSLLIXHdINvcARtzRHtPKWv1JH3tpvBrfUzfZ+Oj+Vkp7ma9YW1l2ApH0477zWt+3vL+btbJodR9ySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCtNScEfEERFxdUT8ICK2RURvpwuTJE2u1RH3xcBXMvN5wAuBbZ0rSWqzglfzliYzbXBHxOHAK4FLATJzV2Y+0OnCpLYpeTVvaRKtfDrgscB9wGUR8UJgC3BuZj7S0cqkdpnwedU9p54K11/f2kelSl0qGgsv7GODiFXAt4GTM/PGiLgYeCgzL9hru3XAOoBly5adeOWVV3ao5NaMjo6yZMmSWmvoFvaioWdggJ4rrmDHmjXsWLu27nJq5+OiqRt60d/fvyUzV7W08XRL5ADPAnZMuPwK4N/3dR+XLusu9iKfWN5r+5o1tS9Y3C18XDR1Qy9o59JlmXkvMBIRz62uWg3cth+/UKR6TFj+a8fatS7zpeK1+q6StwCDEfE94EXAhztXktRmJa/mLU2ipaXLMvO7QGtzL1K3mWz5L5f5UsH8y0lJKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNzSgeCCxU32YtYMbulAcMHiJnsxay19HrekWZqwYDHnnAOXXDJ/Fyy2F7PmiFs6UPr7G0F14YWN7/M5qOzFrBjc0oEyNNQYXV5wQeP7fF7z0l7MisEtHQgTFizmQx+a3wsW24tZM7ilA8EFi5vsxaz54qR0ILhgcZO9mDVH3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMC0Hd0QsiIhbIuLLnSxIkrRvMxlxnwts61Qhc9GmkU185JsfYdPIprpLkTSHtLSQQkQsB14NrAfe0dGK5ohNI5tYfflqdj2+i4MXHMzGszfSu6K37rIkzQGtroBzEXAecNhUG0TEOmAdwLJlyxgeHp51cbMxOjpaaw2DOwcZ2z3GHvYwtnuMgaEBxlaO1VJL3b3oJvaiyV40ldaLaYM7Ik4DfpqZWyKib6rtMnMDsAFg1apV2dc35aYHxPDwMHXWsGhkEYMjg0+MuNf2r61txF13L7qJvWiyF02l9aKVEffJwGsj4veBpwJPi4jPZ+brO1ta2XpX9LLx7I0M7ximr6fPaRJJbTNtcGfme4H3AlQj7ncZ2q3pXdFrYEtqO9/HLUmFafXFSQAycxgY7kglkqSWOOKWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuNVxrnYvtdeMPo9bmilXu5fazxG3Omp4xzC7Ht/F4/k4ux7fxfCO4bpLkopncM9Tg7cO0nNRD0/5m6fQc1EPg7cOduQ4fT19HLzgYBbEAg5ecDB9PX0dOY40nzhVMg8N3jrIui+t49HHHgXgzgfvZN2X1gFw1gvOauuxXO1eaj+Dex46f+P5T4T2uEcfe5TzN57f9uAGV7uX2s2pknlo54M7Z3S9pO5icM9DKw9fOaPrJXUXg3seWr96PYsPWvxL1y0+aDHrV6+vqSJJM2Fwz0NnveAsNrxmA8ccfgxBcMzhx7DhNRs6Mr8tqf18cXKeOusFZxnUUqEccUtSYQxuSSqMwS
1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmGmDe6IWBERQxFxW0RsjYhzD0RhkqTJtfIn77uBd2bmzRFxGLAlIv4jM2/rcG2SpElMO+LOzHsy8+bq3w8D24CjO12Y2mPTyCYGdw66wro0h8xojjsieoAXAzd2ohi11/gK6wPbB1h9+WrDW5ojWv50wIhYAlwDvC0zH5rk9nXAOoBly5YxPDzcrhr3y+joaO011G1w5yBju8fYwx7Gdo8xMDTA2MqxusuqlY+LJnvRVFovIjOn3yjiIODLwFcz8+PTbb9q1arcvHlzG8rbf8PDw/T19dVaQ93GR9xju8dYtHARG8/eOO/XfvRx0WQvmrqhFxGxJTNXtbJtK+8qCeBSYFsroa3uMb7C+tpj1xra0hzSylTJycAa4NaI+G513fsy87rOlaV26V3Ry9jKMUNbmkOmDe7M/BYQB6AWSVIL/MtJSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMAa3JBXG4JakwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFcbglqTCGNySVBiDW5IKY3BLUmEMbkkqjMEtSYUxuCWpMC0Fd0ScEhE/jIjbI+I9nS5KkjS1aYM7IhYAnwJOBU4AzoyIEzpd2GxsGtnE4M5BNo1sqrsUSWq7VkbcLwFuz8w7MnMXcCXwus6Wtf82jWxi9eWrGdg+wOrLVxvekuachS1sczQwMuHyXcBL994oItYB6wCWLVvG8PBwO+qbscGdg4ztHmMPexjbPcbA0ABjK8dqqaVbjI6O1vb/0W3sRZO9aCqtF60Ed0sycwOwAWDVqlXZ19fXrl3PyKKRRQyONMJ70cJFrO1fS++K3lpq6RbDw8PU9f/RbexFk71oKq0XrUyV3A2smHB5eXVdV+pd0cvGszey9ti1bDx747wPbUlzTysj7puA4yPiWBqBfQbwxx2tapZ6V/QytnLM0JY0J00b3Jm5OyLeDHwVWAAMZObWjlcmSZpUS3PcmXkdcF2Ha5EktcC/nJSkwhjcklQYg1uSCmNwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpMIY3JJUGINbkgpjcEtSYQxuSSqMwS1JhTG4JakwBrckFSYys/07jbgPuLPtO56ZpcD9NdfQLexFk71oshdN3dCLYzLzyFY27Ehwd4OI2JyZq+quoxvYiyZ70WQvmkrrhVMlklQYg1uSCjOXg3tD3QV0EXvRZC+a7EVTUb2Ys3PckjRXzeURtyTNSQa3JBVmTgZ3RJwSET+MiNsj4j1111OXiFgREUMRcVtEbI2Ic+uuqU4RsSAibomIL9ddS50i4oiIuDoifhAR2yKit+6a6hIRb69+Nr4fEV+IiKfWXVMr5lxwR8QC4FPAqcAJwJkRcUK9VdVmN/DOzDwBeBnwl/O4FwDnAtvqLqILXAx8JTOfB7yQedqTiDgaeCuwKjOfDywAzqi3qtbMueAGXgLcnpl3ZOYu4ErgdTXXVIvMvCczb67+/TCNH9Cj662qHhGxHHg18Nm6a6lTRBwOvBK4FCAzd2XmA/VWVauFwCERsRBYDPy45npaMheD+2hgZMLlu5inYTVRRPQALwZurLeS2lwEnAfsqbuQmh0L3AdcVk0bfTYiDq27qDpk5t3Ax4CdwD3Ag5n5tXqras1cDG7tJSKWANcAb8vMh+qu50CLiNOAn2bmlrpr6QILgd8ELsnMFwOPAPPydaCIeDqNZ+PHAkcBh0bE6+utqjVzMbjvBlZMuLy8um5eioiDaIT2YGZeW3c9NTkZeG1E7KAxdfY7EfH5ekuqzV3AXZk5/szrahpBPh+9Ctiemfdl5mPAtcDLa66pJXMxuG8Cjo+IYyPiYBovNnyx5ppqERFBYy5zW2Z+vO566pKZ783M5ZnZQ+
PxcENmFjGyarfMvBcYiYjnVletBm6rsaQ67QReFhGLq5+V1RTyQu3Cugtot8zcHRFvBr5K41XigczcWnNZdTkZWAPcGhHfra57X2ZeV2NNqt9bgMFqYHMH8Maa66lFZt4YEVcDN9N4B9YtFPKn7/7JuyQVZi5OlUjSnGZwS1JhDG5JKozBLUmFMbglqTAGtyQVxuCWpML8P42o419LPfFMAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "C1 = [0, 1, 2, 4, 8, 9, 10, 11]\n", "C2 = list(set(range(12)) - set(C1))\n", "X0C1, X1C1 = X0[C1], X1[C1]\n", "X0C2, X1C2 = X0[C2], X1[C2]\n", "plt.figure()\n", "plt.title('3rd iteration results')\n", "plt.axis([-1, 9, -1, 9])\n", "plt.grid(True)\n", "plt.plot(X0C1, X1C1, 'rx')\n", "plt.plot(X0C2, X1C2, 'g.')\n", "plt.plot(5.5,7.0,'rx',ms=12.0)\n", "plt.plot(2.2,2.8,'g.',ms=12.0);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The centriod of class will remain the same when repeat the method, K-Means will stop cluster process when the condition are satisfied. Usually, the condition is that the difference value between two cost value of iteration are reaching the set value, or the change of the center of gravity position of the two iterations before and after reaches the limit value. If these stop conditions are small enough, k-means will find the optimal solution. But this is not necessarily the global optimal solution.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Program" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\">\n",
 "      <th></th>\n",
 "      <th>sepal-length</th>\n",
 "      <th>sepal-width</th>\n",
 "      <th>petal-length</th>\n",
 "      <th>petal-width</th>\n",
 "      <th>class</th>\n",
 "    </tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>0</th><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
 "    <tr><th>1</th><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
 "    <tr><th>2</th><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
 "    <tr><th>3</th><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
 "    <tr><th>4</th><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " sepal-length sepal-width petal-length petal-width class\n", "0 5.1 3.5 1.4 0.2 Iris-setosa\n", "1 4.9 3.0 1.4 0.2 Iris-setosa\n", "2 4.7 3.2 1.3 0.2 Iris-setosa\n", "3 4.6 3.1 1.5 0.2 Iris-setosa\n", "4 5.0 3.6 1.4 0.2 Iris-setosa" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This line configures matplotlib to show figures embedded in the notebook, \n", "# instead of opening a new window for each figure. More about that later. \n", "# If you are using an old version of IPython, try using '%pylab inline' instead.\n", "%matplotlib inline\n", "\n", "# import librarys\n", "from numpy import *\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import random\n", "\n", "# Load dataset\n", "names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']\n", "dataset = pd.read_csv(\"iris.csv\", header=0, index_col=0)\n", "dataset.head()\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "lines_to_next_cell": 2 }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\lenovo\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:3: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " This is separate from the ipykernel package so we can avoid doing imports until\n", "C:\\Users\\lenovo\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:4: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " after removing the cwd from sys.path.\n", "C:\\Users\\lenovo\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:5: SettingWithCopyWarning: \n", "A value is trying to be set on a copy 
of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " \"\"\"\n" ] } ], "source": [ "#Fixme-解决赋值的问题,参考https://www.jb51.net/article/138045.htm\n", "#Coding the class and assign value 0, 1, 2 to each class.\n", "dataset['class'][dataset['class']=='Iris-setosa']=0\n", "dataset['class'][dataset['class']=='Iris-versicolor']=1\n", "dataset['class'][dataset['class']=='Iris-virginica']=2" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "def originalDatashow(dataSet):\n", " #Draw original sample point.\n", " num,dim=shape(dataSet)\n", " marksamples=['ob'] #Sample graphic marking\n", " for i in range(num):\n", " plt.plot(datamat.iat[i,0],datamat.iat[i,1],marksamples[0],markersize=5)\n", " plt.title('original dataset')\n", " plt.xlabel('sepal length')\n", " plt.ylabel('sepal width') \n", " plt.show()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "lines_to_end_of_cell_marker": 2, "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Get sample data\n", "datamat = dataset.loc[:, ['sepal-length', 'sepal-width']]\n", "#True label\n", "labels = dataset.loc[:, ['class']]\n", "#Show original data\n", "originalDatashow(datamat)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "def randChosenCent(dataSet,k):\n", " \"\"\"Initialize cluster center:By randomly generating a value over the interval as a new central point \"\"\"\n", "\n", " # Sample numb\n", " m=shape(dataSet)[0]\n", " # initialize list\n", " centroidsIndex=[]\n", " \n", " #Generate a list similar to the sample index\n", " dataIndex=list(range(m))\n", " if False:\n", " for i in range(k):\n", " #Generate random number\n", " randIndex=random.randint(0,len(dataIndex))\n", " #Put the sample index that generate randomly into centroidsIndex\n", " centroidsIndex.append(dataIndex[randIndex])\n", " #Delete the sample that has been choosen\n", " del dataIndex[randIndex]\n", " else:\n", " random.shuffle(dataIndex)\n", " centroidsIndex = dataIndex[:k]\n", " \n", " #Get the sample by index\n", " centroids = dataSet.iloc[centroidsIndex]\n", " return mat(centroids)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "\n", "def distEclud(vecA, vecB):\n", " \"\"\"Calculate the Euclidean distance between two vector\"\"\"\n", " return sqrt(sum(power(vecA - vecB, 2))) #la.norm(vecA-vecB)\n", "\n", "\n", "def kMeans(dataSet, k):\n", " # The total number of sample\n", " m = shape(dataSet)[0]\n", " # Allocate sample to the nearest culster: save as [cluster number, square of distance](m_row x 2_column)\n", " clusterAssment = mat(zeros((m, 2)))\n", "\n", " # step1: Initialize cluster center by the sample point that generate randomly\n", " centroids = randChosenCent(dataSet, k)\n", " print('Original centers=', centroids)\n", "\n", " # Flag bit,if the result of sample classification 
before and after iteration has changed, the value is True\n", " clusterChanged = True\n", " # View the number of iterations\n", " iterTime = 0\n", " \n", " # All sample assignment results are no longer changed and the iteration terminates\n", " while clusterChanged:\n", " clusterChanged = False\n", " \n", " # step2: Allocate to the nearest cluster corresponding to the nearest cluster center\n", " for i in range(m):\n", " # Initially define distance as infinite\n", " minDist = inf;\n", " # Initialize index value\n", " minIndex = -1\n", " # Calculate the distance of each sample and k centriods\n", " for j in range(k):\n", " # Calculate the distance between the ith smaple and jth centriods\n", " distJI = distEclud(centroids[j, :], dataSet.values[i, :])\n", " # Judeg if the distance if the minimum\n", " if distJI < minDist:\n", " # Update to get the minimum distance\n", " minDist = distJI\n", " # Get corresponding cluster numbers\n", " minIndex = j\n", " # If the result of sample classification is not the same,mark clusterChanged to True\n", " if clusterAssment[i, 0] != minIndex:\n", " clusterChanged = True\n", " clusterAssment[i, :] = minIndex, minDist ** 2 # Allocate smaple to nearest cluster\n", " \n", " iterTime += 1\n", " sse = sum(clusterAssment[:, 1])\n", " print('the SSE of %d' % iterTime + 'th iteration is %f' % sse)\n", " \n", " # step3:Update cluster center\n", " for cent in range(k): # When finished sample classification ,recalculate cluster center\n", " # Get all sample point of this cluster,nonzero[0] represent the column of A == cent \n", " #Without [0], column will also be shown\n", " ptsInClust = dataSet.iloc[nonzero(clusterAssment[:, 0].A == cent)[0]]\n", " # Update cluster center: calculate average value according to column direction, axis=0.\n", " centroids[cent, :] = mean(ptsInClust, axis=0)\n", " return centroids, clusterAssment\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [
 "Original centers= [[4.8 3.4]\n",
 " [5.5 2.4]\n",
 " [5.3 3.7]]\n",
 "the SSE of 1th iteration is 136.220000\n",
 "the SSE of 2th iteration is 58.811902\n",
 "the SSE of 3th iteration is 53.100129\n",
 "the SSE of 4th iteration is 49.715722\n",
 "the SSE of 5th iteration is 47.874761\n",
 "the SSE of 6th iteration is 46.133064\n",
 "the SSE of 7th iteration is 44.593439\n",
 "the SSE of 8th iteration is 44.384855\n",
 "the SSE of 9th iteration is 43.591498\n",
 "the SSE of 10th iteration is 41.904928\n",
 "the SSE of 11th iteration is 39.066514\n",
 "the SSE of 12th iteration is 38.316500\n",
 "the SSE of 13th iteration is 37.912536\n",
 "the SSE of 14th iteration is 37.423306\n",
 "the SSE of 15th iteration is 37.136261\n",
 "the SSE of 16th iteration is 37.123702\n" ] } ], "source": [
 "# Perform k-means clustering\n",
 "k = 3 # Number of clusters, chosen by the user\n",
 "mycentroids, clusterAssment = kMeans(datamat, k)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [
 "def datashow(dataSet, k, centroids, clusterAssment): # Show the clustering result in two-dimensional space\n",
 "    from matplotlib import pyplot as plt\n",
 "    num, dim = shape(dataSet) # num: number of samples, dim: number of dimensions\n",
 "\n",
 "    if dim != 2:\n",
 "        print('sorry,the dimension of your dataset is not 2!')\n",
 "        return 1\n",
 "    marksamples = ['or', 'ob', 'og', 'ok', '^r', '^b'] # Sample marker styles\n",
 "    if k > len(marksamples):\n",
 "        print('sorry,your k is too large,please add length of the marksample!')\n",
 "        return 1\n",
 "    # Draw all samples\n",
 "    for i in range(num):\n",
 "        markindex = int(clusterAssment[i, 0]) # Cluster index, converted to int\n",
 "        # The two features map to x and y; the marker style follows the cluster index\n",
 "        plt.plot(dataSet.iat[i, 0], dataSet.iat[i, 1], marksamples[markindex], markersize=6)\n",
 "\n",
 "    # Draw the cluster centers\n",
 "    markcentroids = ['o', '*', '^'] # Center marker styles\n",
 "    label = ['0', '1', '2']\n",
 "    c = ['yellow', 'pink', 'red']\n",
 "    for i in range(k):\n",
 "        plt.plot(centroids[i, 0], centroids[i, 1], markcentroids[i], markersize=15, label=label[i], c=c[i])\n",
 "    plt.legend(loc='upper left')\n",
 "    plt.xlabel('sepal length')\n",
 "    plt.ylabel('sepal width')\n",
 "\n",
 "    plt.title('k-means cluster result') # Title\n",
 "    plt.show()\n",
 "\n",
 "\n",
 "# Draw the samples with their true labels\n",
 "def trgartshow(dataSet, k, labels):\n",
 "    from matplotlib import pyplot as plt\n",
 "\n",
 "    num, dim = shape(dataSet)\n",
 "    label = ['0', '1', '2']\n",
 "    marksamples = ['ob', 'or', 'og', 'ok', '^r', '^b', '" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# The drawing shows\n", "datashow(datamat, k, mycentroids, clusterAssment)\n", "trgartshow(datamat, 3, labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to use sklearn to do the classifiction\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP4AAAECCAYAAADesWqHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAC9pJREFUeJzt3V+IXPUZxvHn6Zr4L5HEakUSMV0pARFq/hAqAWmTKLFKelNDAgqVluSiFUMLGntTvPNK7EURQtQKxoiJBoq01gQVEVptNsYaTSwaIm6irpJIjIUE49uLOSkxpO7Z7f5+OzPv9wNLZndn5/ntbp45Z2bPnNcRIQC5fGuyFwCgPooPJETxgYQoPpAQxQcSovhAQl1RfNvLbb9j+13b6wtnPWJ7xPaekjmn5V1h+0Xbe22/Zfuuwnnn2X7N9htN3n0l85rMAduv2362dFaTd8D2m7Z3295ZOGuG7a229zW/w+sKZs1tvqdTb0dtrysSFhGT+iZpQNJ7kgYlTZX0hqSrC+ZdL2m+pD2Vvr/LJc1vLk+X9K/C358lTWsuT5H0qqQfFP4efy3pCUnPVvqZHpB0SaWsxyT9ork8VdKMSrkDkj6SdGWJ2++GLf4iSe9GxP6IOCHpSUk/KRUWES9LOlzq9s+S92FE7Goufy5pr6RZBfMiIo41705p3oodpWV7tqSbJW0slTFZbF+kzobiYUmKiBMR8Vml+KWS3ouI90vceDcUf5akD057f1gFizGZbM+RNE+drXDJnAHbuyWNSNoeESXzHpR0t6SvCmacKSQ9b3vI9pqCOYOSPpH0aPNQZqPtCwvmnW6VpM2lbrwbiu+zfKzvjiO2PU3S05LWRcTRklkRcTIirpU0W9Ii29eUyLF9i6SRiBgqcfvfYHFEzJd0k6Rf2r6+UM456jwsfCgi5kn6QlLR56AkyfZUSSskbSmV0Q3FH5Z0xWnvz5Z0aJLWUoTtKeqUflNEPFMrt9ktfUnS8kIRiyWtsH1AnYdoS2w/XijrvyLiUPPviKRt6jxcLGFY0vBpe0xb1bkjKO0mSbsi4uNSAd1Q/H9I+p7t7zb3dKsk/WmS1zRhbFudx4h7I+KBCnmX2p7RXD5f0jJJ+0pkRcS9ETE7Iuao83t7ISJuK5F1iu0LbU8/dVnSjZKK/IUmIj6S9IHtuc2Hlkp6u0TWGVar4G6+1NmVmVQR8aXtX0n6qzrPZD4SEW+VyrO9WdIPJV1ie1jS7yLi4VJ56mwVb5f0ZvO4W5J+GxF/LpR3uaTHbA+oc8f+VERU+TNbJZdJ2ta5P9U5kp6IiOcK5t0paVOzUdov6Y6CWbJ9gaQbJK0tmtP86QBAIt2wqw+gMooPJETxgYQoPpAQxQcS6qriFz78ctKyyCOv2/K6qviSav5wq/4iySOvm/K6rfgAKihyAI/tvj4qaObMmWP+muPHj+vcc88dV96sWWN/seLhw4d18cUXjyvv6NGxv4bo2LFjmjZt2rjyDh48OOaviQg1R++N2cmTJ8f1db0iIkb9wUz6Ibu9aNmyZVXz7r///qp5O3bsqJq3fn3xF7x9zZEjR6rmdSN29YGEKD6QEMUHEqL4QEIUH0iI4gMJUXwgIYoPJNSq+DVHXAEob9TiNydt/IM6p/y9WtJq21eXXhiActps8auOuAJQXpvipxlxBWTR5kU6rUZcNScOqP2aZQDj0Kb4rUZcRcQGSRuk/n9ZLtDr2uzq9/WIKyCjUbf4tUdcASiv1Yk4mjlvpWa9AaiMI/eAhCg+kBDFBxKi+EBCFB9IiOIDCVF8ICGKDyTEJJ1xqD3ZZnBwsGreeEaE/T8OHz5cNW/lypVV87Zs2VI1rw22+EBCFB9IiOIDCVF8ICGKDyRE8YGEKD6QEMUHEqL4QEIUH0iozQ
itR2yP2N5TY0EAymuzxf+jpOWF1wGgolGLHxEvS6r7KgoARfEYH0howl6Wy+w8oHdMWPGZnQf0Dnb1gYTa/Dlvs6S/SZpre9j2z8svC0BJbYZmrq6xEAD1sKsPJETxgYQoPpAQxQcSovhAQhQfSIjiAwlRfCChvpidt2DBgqp5tWfZXXXVVVXz9u/fXzVv+/btVfNq/39hdh6ArkDxgYQoPpAQxQcSovhAQhQfSIjiAwlRfCAhig8kRPGBhNqcbPMK2y/a3mv7Ldt31VgYgHLaHKv/paTfRMQu29MlDdneHhFvF14bgELazM77MCJ2NZc/l7RX0qzSCwNQzpge49ueI2mepFdLLAZAHa1flmt7mqSnJa2LiKNn+Tyz84Ae0ar4tqeoU/pNEfHM2a7D7Dygd7R5Vt+SHpa0NyIeKL8kAKW1eYy/WNLtkpbY3t28/bjwugAU1GZ23iuSXGEtACrhyD0gIYoPJETxgYQoPpAQxQcSovhAQhQfSIjiAwn1xey8mTNnVs0bGhqqmld7ll1ttX+eYIsPpETxgYQoPpAQxQcSovhAQhQfSIjiAwlRfCAhig8kRPGBhNqcZfc826/ZfqOZnXdfjYUBKKfNsfrHJS2JiGPN+fVfsf2XiPh74bUBKKTNWXZD0rHm3SnNGwMzgB7W6jG+7QHbuyWNSNoeEczOA3pYq+JHxMmIuFbSbEmLbF9z5nVsr7G90/bOiV4kgIk1pmf1I+IzSS9JWn6Wz22IiIURsXCC1gagkDbP6l9qe0Zz+XxJyyTtK70wAOW0eVb/ckmP2R5Q547iqYh4tuyyAJTU5ln9f0qaV2EtACrhyD0gIYoPJETxgYQoPpAQxQcSovhAQhQfSIjiAwkxO28cduzYUTWv39X+/R05cqRqXjdiiw8kRPGBhCg+kBDFBxKi+EBCFB9IiOIDCVF8ICGKDyRE8YGEWhe/Garxum1OtAn0uLFs8e+StLfUQgDU03aE1mxJN0vaWHY5AGpou8V/UNLdkr4quBYAlbSZpHOLpJGIGBrleszOA3pEmy3+YkkrbB+Q9KSkJbYfP/NKzM4DeseoxY+IeyNidkTMkbRK0gsRcVvxlQEohr/jAwmN6dRbEfGSOmOyAfQwtvhAQhQfSIjiAwlRfCAhig8kRPGBhCg+kBDFBxLqi9l5tWehLViwoGpebbVn2dX+eW7ZsqVqXjdiiw8kRPGBhCg+kBDFBxKi+EBCFB9IiOIDCVF8ICGKDyRE8YGEWh2y25xa+3NJJyV9ySm0gd42lmP1fxQRnxZbCYBq2NUHEmpb/JD0vO0h22tKLghAeW139RdHxCHb35G03fa+iHj59Cs0dwjcKQA9oNUWPyIONf+OSNomadFZrsPsPKBHtJmWe6Ht6acuS7pR0p7SCwNQTptd/cskbbN96vpPRMRzRVcFoKhRix8R+yV9v8JaAFTCn/OAhCg+kBDFBxKi+EBCFB9IiOIDCVF8ICGKDyTkiJj4G7Un/ka/weDgYM047dy5s2re2rVrq+bdeuutVfNq//4WLuzvl5NEhEe7Dlt8ICGKDyRE8YGEKD6QEMUHEqL4QEIUH0iI4gMJUXwgIYoPJNSq+LZn2N5qe5/tvbavK70wAOW0Hajxe0nPRcRPbU+VdEHBNQEobNTi275I0vWSfiZJEXFC0omyywJQUptd/UFJn0h61Pbrtjc2gzW+xvYa2ztt133pGoAxa1P8cyTNl/RQRMyT9IWk9WdeiRFaQO9oU/xhScMR8Wrz/lZ17ggA9KhRix8RH0n6wPbc5kNLJb1ddFUAimr7rP6dkjY1z+jvl3RHuSUBKK1V8SNityQeuwN9giP3gIQoPpAQxQcSovhAQhQfSIjiAwlRfCAhig8k1Bez82pbs2ZN1bx77rmnat7Q0FDVvJUrV1bN63fMzgNwVhQfSIjiAwlRfCAhig8kRPGBhCg+kBDFBxKi+EBCoxbf9lzbu097O2p7XY3FAShj1HPuRcQ7kq6VJNsDkg
5K2lZ4XQAKGuuu/lJJ70XE+yUWA6COsRZ/laTNJRYCoJ7WxW/Oqb9C0pb/8Xlm5wE9ou1ADUm6SdKuiPj4bJ+MiA2SNkj9/7JcoNeNZVd/tdjNB/pCq+LbvkDSDZKeKbscADW0HaH1b0nfLrwWAJVw5B6QEMUHEqL4QEIUH0iI4gMJUXwgIYoPJETxgYQoPpBQqdl5n0gaz2v2L5H06QQvpxuyyCOvVt6VEXHpaFcqUvzxsr0zIhb2WxZ55HVbHrv6QEIUH0io24q/oU+zyCOvq/K66jE+gDq6bYsPoAKKDyRE8YGEKD6QEMUHEvoPF72a45tCHDcAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.datasets import load_digits\n", "import matplotlib.pyplot as plt \n", "from sklearn.cluster import KMeans\n", "\n", "# load digital data\n", "digits, dig_label = load_digits(return_X_y=True)\n", "\n", "# draw one digital\n", "plt.gray() \n", "plt.matshow(digits[0].reshape([8, 8])) \n", "plt.show() \n", "\n", "# calculate train/test data number\n", "N = len(digits)\n", "N_train = int(N*0.8)\n", "N_test = N - N_train\n", "\n", "# split train/test data\n", "x_train = digits[:N_train, :]\n", "y_train = dig_label[:N_train]\n", "x_test = digits[N_train:, :]\n", "y_test = dig_label[N_train:]\n", "\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAA/CAYAAADAByJpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAEIVJREFUeJztnWtsVNUWx/97ZjptZ/qwSKGlIAW9PlBzsUEQ/KDGB6hRoh/EYHxEDT7AxA9qvImYq0YUfKQmGpUYFUh8ViHRWPCRWg2YCEbxKgjyKBYrpbTl1c60nZl9P9DZrL07Mz1z5nF66PolE9ZhT/f5z3mss/c6a+8tpJRgGIZh3IPHaQEMwzBMerDjZhiGcRnsuBmGYVwGO26GYRiXwY6bYRjGZbDjZhiGcRnsuBmGYVyGJccthJgnhNghhNglhHg816JYB+tgHazjVNWRFaSUKT8AvAB2A5gKwA9gK4Bpw/1dtj+sg3WwDtbhdh3Z+ojBH5UUIcRsAP+VUs4d3P7PoMN/LsXfJK20tLRU266pqVG2z+fTyg4cOKDsgYEBHDlyBEIIDO5f+66UUqSjw+v1attnnnmmsiORiFbW2tqq7Fgshmg0mqxaSzo8npMdnUmTJmnfPf3007V9Udrb25Xd39+Pzs5OVZf53XSPR1lZmbY9ZcoUbV+Uv/76S9nRaBThcDhZtWkfj8mTJ2vfHTNmjLIPHjyolbW1tdH9DDkG6epIRWFhobLptQLo+nt7e7Fv3z4EAgEAwLFjx7KqY/z48cqurKzUyv78809lx2IxDAwMJK0nXR0VFRXa9sSJE5Vt3kuhUEiz29ra1PXV3d0N4MQxi8ViiMViw+qgfqG2tlb7bnFxcVId9Pf39vaitbVV+Z+4jjjpHo9x48Yl3U513wJAV1cX3e+wOhLhG/4rqAHQSrb3A5hlpfI4cWcLADNmzNDKnn/+eWWbF8fy5cuVvXfvXjQ3N6OgoAAAUjoLK5x22mna9ptvvqls80A/8sgjyu7t7dUOvB3oxfbEE09oZXfccYeyzd/40ksvKXvbtm1Yt26dcig9PT0Zabr00ku17VWrVil7//79WtmDDz6o7M7OTs1h2CHu5ADg2Wef1coWLFig7FdffVUre/rpp5Xd19eH48ePZ6QjFdRRvffee1pZSUmJshsbG7Fs2TKce+65AICmpiYAJ
+6B4RpJiTCd0e23367sRYsWaWXXXXedso8fP4729nb1UIk3NtLRQe/bq666SitbsWKFsunDFQB+/fVXZTc1NWH16tWYPXs2AGDt2rUYGBhASUkJDh8+bEkH9Qsvv/yyVnbRRRcpOxgMamUdHR3KXr9+Perr6zFz5kwAwIcffmhp3xR6Lm677TatbMmSJco2H9avvPKKtv3BBx8omz7k0sGK4070BBhy5oUQiwAsSvDdvMI6WIeTOtJwiqPieCTZZ951uO28DIcVx70fAO3PTwTQZn5JSrkSwEogva6fVQKBgKWDn2sdZgvIKR2lpaUj4nj4/X5L38u1DhqucFJHVVWVpd5gPq7TkXB9VFZWar3BWCyW8Fzl47xYad3mWke2sOK4NwP4lxBiCoC/AdwKYGE6O6FdSdqdA4CpU6cq++jRo1rZ/PnzlR2NRtHY2IiysjJ4vV78888/6UgAoD/paXcfAC655BJlP/roo1oZvfDsdHdNrrjiCmVfdtllWtk777yj7HPOOUcru+mmm5QdiUSwZs0ajBkzBj6fD3v37k1bB+2Crly5Uiujx8oMw7z22muajtmzZ6O8vBwejwednZ1p67j66quVfc0112hlu3btUvacOXO0svPOO0/ZUkps3Lgx7X1T6G+moREAePzxk0kI8TBIHPqbZ82ahf7+fgQCARUSE0LYDpXQUACgh9bWrl2rldGYrtfrhcfjQWlpKTweD7q6uuDz+SCEGPIOJxk0nPjAAw9oZfT909atW7WyCy64QNmVlZV47LHH1DEJh8MoLCxEOBy2fDxoGO+GG27Qynbs2KHszz//XCuj72JisRi6urqwdetWFW5NF3q9LVu2TCv76KOPlG32KG699VZt+7PPPlN2zkIlUsqIEGIJgA048Wb2bSnl77b2lgFerxfl5eW2HEM2SdTNcwKfz4exY8fiwIEDWXmYZKKjpKQER44ccUwDMLLOS11dHZqbm0+8/R902vlGCIFgMKgaQx6PxxEdPp8Pc+bMQWNjI6SU6oGSbzweD2pqarBnz5687zsXWGlxQ0r5BYAvcqxlWIqKilBUVARAzygYrQQCAfViz8kL0u/3qxdUhw4dckzHSGHChAmYMGECAODjjz92TIff71ehLLM3m08mTZqksqfef/99x3SUlZWp7Bazl+A2LDnuTKFZA2ZMdPfu3UnLzFZcppkkNGRjdl9opsDq1au1smxnK/zxxx/KNjMD6L5oSAIAtm/frm1n2vu4+OKLlW2GBhYuPBkN27x5s1ZmdtGnT5+u7K+//jptHTS90szYoKGSe++9Vyuz2+VNxtixY5VNM4kAPaT1999/a2VmKiUN49npDdHsiGeeeUYr27dvn7LN82CGkuj18e2336atg77PWbNmjVb2/fffK9sMX1RVVWnb9BzS0ECq9E0KPS99fX1a2XPPncxK/uqrr7QyM+UvU/9B00BpqAjQH9D0fgD0EAug+yG7DR0e8s4wDOMy2HEzDMO4DHbcDMMwLiMvMW4alzLT+M4++2xlmznSNH4FDI15p0v8hREwdJQmHS05a5Y+MNQcGUjjjHZimDSubw7jXrp0qbLN2Ninn36qbWc6WrK6ulrZ5ujIH374QdlmTPfnn3/WtmnKmp0Y96ZNm5RtHo958+YpO/5iOo4Zw8wU+jtuueUWrYzGY+l1BOgj9AA9Dc3O9UFTIunvB/Q0VvP6iI9OjLN+/Xplf/nll2nroL/ZnPbg2muvVbb57sGMXdOYt9W4NoW+WDXj1A899JCy6TQNAPDWW29p2+YUCelCUw9Nn0DfRZjTWJgjsc0RnnbgFjfDMIzLYMfNMAzjMvISKqHdLDN1ioYszO6HmYaWalY+K9B90Vn4AH1Uotk9pV1fAHjyySeVTSfUsYP5m5qbm5VNR3MCwF133aVt05Fiv/+e/pgoOtmV2Z2jaUrmLHNmN5Ome9qB7susi47Ca2lp0cqyHSqhITBzEiLavb355pu1MjNU0tvbm5GOu
XPnJi2jk7SZo27NEI6ZLpguNHRpTkJG7xEzRGGm5ZnHJ11+/PFHZb/xxhtaGd23OcrUPD4NDQ3KNtMKrUBDnE899ZRWRlMxzbRm83zSUIqd+xbgFjfDMIzrsNTiFkK0ADgGIAogIqWckfovTm22b99uebKpXLJz507HhjJT6uvrUVhY6LiOkcKhQ4ccG+pOaWhoQEFBgeM6Nm7cCK/X67iOvr4+xzVki3Ra3FdIKaePdqcdZ+rUqVpGjFPU1tYOmdjfCe68807cf//9TssYMVRUVAyZp9oJ5s6dixtvvNFpGairqxuSreUEBQUFlme0HMnkJcZNV70xTx6NEZuxVHO1i5aWFvXktjrDGSXVUGQa3zSH95oXfnl5ORYvXozS0lLcfffd6v+taqJpbeXl5VoZTaczJ2Q3FxIoLi7GlVdeiUAgYCtWtm3bNmWbx5qeMzPubKahRaNRtLe3azHzdKArnFx44YVaGV15xkxDNPcnhEBxcTGEELZSJen18frrr2tl9Ddff/31WpmZKielRCgUSqt1R7/722+/KZu+8wD0OK75wP7uu++07XA4rFq7dqD3ozlknr57MN8BmfHfvr4+bN682XZrl6bxmYsS0PdA5uIGZ5xxhrYthFAzJNqJcVOfYaYmf/LJJ8o2/Zg54yW9lmjKZjpYddwSwJeD89O+OThnrSNk+oIyW6xYsQJCiKTzC+eLhoYGx7t/Qgh88YXjc5AByHw+imxhd7rObGPmxTuFHUeZC0bK9ZEpVh33pVLKNiHEOABfCSH+kFJqj/d8rBwRb21LKZM68HzoWLp0KSoqKnD06FEsWbIkYTwzHzoWLlyIkpIS9PT0DGkh5lPH/PnzEQwGEQqFhkzQlU8dRUVF8Hg8kFImzezIh47i4mK1pqKTOqqqquDz+RCNRocMsMqnjvj7DyllUsc5mq6PbGCpqSilbBv89yCAtQBmJvjOSinljFzGwOPOMVULMx864mmFZWVlSSfIz4eO+CxjqUZi5UNHfP+pQiX50BHv+Th9fcR1pOqJ5UNHPASVKlQymu7bkXJ9ZINhW9xCiCAAj5Ty2KB9DYCnh/kzDdpNMqdIpU7HnKaTTjUaiUS0J7edGDddrb2xsVErO+ussxLqBfT871AohGAwiEAgoOKYhYWF8Pl8luOqNMZ9zz33aGU0dmaueE5X+O7p6UF3dzf8fr8WU0tnpZVffvlF2WYPhi7Kaj4YaDy8p6cHzc3N8Hq9toYzA/pKK4sXL9bK6urqlE2n9wT06TPD4TA2bdqEwsJC9Pf348UXX0QwGITf77ec751qaDWNlyaKrccJhULw+/22V7wB9MVk6XQAgD7ewFwh6YUXXlB2JBLR3tvYgS7mS4d7A/q1aL57oeMvpJQpV5u3Aj3el19+uVZGV0+aNm2aVkbjzpFIBP39/RmdF3rfmisCbdmyRdlm3rq5MDmFPtzTuX+shErGA1g7eHH6ALwnpbQXUc+AUCg0IuKGhw8fVi9fotEofD6f9nItX3R0dGDdunUATp5wJ2LdHR0dtpZNyzbd3d149913AZw4HnQRgXzS2dnp+GpAwMiJKY8UwuGwrcbeSMXK0mV7APw7D1pSUlpaqmU3ZHtxA6tUV1drk9fQdTHzSW1trTYRkjmiLJ86aG/F7kiwTKmursZ9992ntpPF/HPNxIkTtRaWU0vtZWMio2zg9IvzOCUlJVqP3swEcht5aSrSFojZraLdOzM1zuy+Z7riCe2KmF2dVatWKdtcbspMd6Kro9CXHFa7OrQbfv7552tlCxYsULbZcjPTFOmio7TVb7VlQUMD5nB6uvipWd/DDz+sbe/cudPS/pJBe1JmCiTtkpvTJZjfpfXQbvNPP/2Utibzxqa9CnP1E/M82Um/o913OgUADVcAeqjE1EHTO7MBdbrm6jr0+DQ1NWll2V4Dld735iLfdEbE+vp6rWzDhg3adqbOmt7r5jW/fPlyZVdWVmplZsrfN998o2y7x4qHvDMMw
7gMdtwMwzAugx03wzCMyxDZjkcBgBCiA0APAHtLGOuMtVDPZCllpfmfrGNE69hnsQ7WwTpOBR1WtCTUkRApZU4+ALaMhHpYx8jUwXVwHaOpjmzWI6XkUAnDMIzbYMfNMAzjMnLpuLM1g2Cm9bCO7P59NuvhOriO0VJHNuvJzctJhmEYJndwqIRhGMZl5MRxCyHmCSF2CCF2CSEez6CeFiHE/4QQvwghtgz/F6yDdbAO1nFq6UhIttJTSMqLF8BuAFMB+AFsBTDNZl0tAMayDtbBOljHaNSR7JOLFvdMALuklHuklP0APgDgxBR6rIN1sA7W4XYdCcmF464B0Eq29w/+nx3ia13+NLikEOtgHayDdYwmHQnJxbSuiSbgtZu6Muxal6yDdbAO1nEK60hILlrc+wFMItsTAbTZqUhaWOuSdbAO1sE6TmEdSSvN6gcnWvF7AEzByaD++TbqCQIoJfYmAPNYB+tgHaxjtOhI9sl6qERKGRFCLAGwASfezL4tpbSznlVGa12yDtbBOliH23Ukg0dOMgzDuAweOckwDOMy2HEzDMO4DHbcDMMwLoMdN8MwjMtgx80wDOMy2HEzDMO4DHbcDMMwLoMdN8MwjMv4PxGwa8rerC1wAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# do kmeans\n", "kmeans = KMeans(n_clusters=10, random_state=0).fit(x_train)\n", "\n", "# kmeans.labels_ - output label\n", "# kmeans.cluster_centers_ - cluster centers\n", "\n", "# draw cluster centers\n", "fig, axes = plt.subplots(nrows=1, ncols=10)\n", "for i in range(10):\n", " img = kmeans.cluster_centers_[i].reshape(8, 8)\n", " axes[i].imshow(img)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exerciese - How to caluate the accuracy?\n", "\n", "1. How to match cluster label to groundtruth label\n", "2. How to solve the uncertainty of some digital" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Value the performance of cluster\n", "\n", "Mehtod 1: If the data that has been valued have correct categories data, then use Adjusted Rand Index(ARI), ARI is similar to the method for accuracy calculating which considered the problem that the class cluster cannot correspond to the classification tag.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ari_train = 0.687021\n" ] } ], "source": [ "from sklearn.metrics import adjusted_rand_score\n", "\n", "ari_train = adjusted_rand_score(y_train, kmeans.labels_)\n", "print(\"ari_train = %f\" % ari_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the contingency table:\n", "![ARI_ct](images/ARI_ct.png)\n", "\n", "the adjusted index is:\n", "![ARI_define](images/ARI_define.png)\n", "\n", "* [ARI reference](https://davetang.org/muse/2017/09/21/adjusted-rand-index/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Method 2: if the value that has been evaluated do not have categories, Silhouette Coefficient will be used to evaluate the performance of cluster result. 
**Silhouette Coefficient take into account both the cohesion and the separation of the clusters, the value range is [-1,1], the higher Silhouette Coefficient represent the better clustering effect will be** \n", "\n", "Detailed steps for calculating Silhouette Coefficient\n", "1. For the ith smapel in the clusterded data$x_i$, calculate the average value between $x_i$ and all the other smaple in the same cluster, written as $a_i$, used to quantify the cohesion within a cluster\n", "2. Choose a cluster $b$ outside of $x_i$, calculate the average distance between $x_i$ and all samples in cluster $b$, traverse all other cluster, find the closest average distance and noted as $b_i$, which can be used to quantify the degree of separation between clusters.\n", "3. For sample $x_i$, Silhouette Coefficient is $sc_i = \\frac{b_i−a_i}{max(b_i,a_i)}$ \n", "4. Finally, calculate average value for all sample $\\mathbf{X}$, which will be the Silhouette Coefficient for current cluster result." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "from sklearn.cluster import KMeans\n", "from sklearn.metrics import silhouette_score\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rcParams['figure.figsize']=(10,10)\n", "plt.subplot(3,2,1)\n", "\n", "x1=np.array([1,2,3,1,5,6,5,5,6,7,8,9,7,9]) #Initialize original data\n", "x2=np.array([1,3,2,2,8,6,7,6,7,1,2,1,1,3])\n", "X=np.array(list(zip(x1,x2))).reshape(len(x1),2)\n", "\n", "plt.xlim([0,10])\n", "plt.ylim([0,10])\n", "plt.title('Instances')\n", "plt.scatter(x1,x2)\n", "\n", "colors=['b','g','r','c','m','y','k','b']\n", "markers=['o','s','D','v','^','p','*','+']\n", "\n", "clusters=[2,3,4,5,8]\n", "subplot_counter=1\n", "sc_scores=[]\n", "for t in clusters:\n", " subplot_counter +=1\n", " plt.subplot(3,2,subplot_counter)\n", " kmeans_model=KMeans(n_clusters=t).fit(X) #KMeans modeling\n", "\n", " for i,l in enumerate(kmeans_model.labels_):\n", " plt.plot(x1[i],x2[i],color=colors[l],marker=markers[l],ls='None')\n", "\n", " plt.xlim([0,10])\n", " plt.ylim([0,10])\n", "\n", " sc_score=silhouette_score(X,kmeans_model.labels_,metric='euclidean') #Calculate Silhouette Coefficient\n", " sc_scores.append(sc_score)\n", "\n", " plt.title('k=%s,silhouette coefficient=%0.03f'%(t,sc_score))\n", "\n", "plt.figure()\n", "plt.plot(clusters,sc_scores,'*-') #Draw the relationship between cluster numbers and corresponding Silhouette Coefficient\n", "plt.xlabel('Number of Clusters')\n", "plt.ylabel('Silhouette Coefficient Score')\n", "\n", "plt.show() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to determin the 'k'?\n", "\n", "Using \"Elbow observation\" can cursely determine the relatively reasonable numbers of cluster. K-means modeling are ultimately expecting that the sum of squares between all data points and their class clusters to be stable, so we could find best cluster numbers by observing this value. 
Under ideal condition, this broken line has an inflection point of slope as it falls and flattens out, this represents that from the K value that this inflection point represents, the increase of cluster center will not extremely broken the cluster inner structure.\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEKCAYAAAARnO4WAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAEkZJREFUeJzt3XFsnHd9x/HPp65Zb23B2mqhxm1IxR9Bo1nr7uiYwqoSBCm0Ylb/2ECDaQwp0oagdFtYgjZ1TJUaFKlj0iS0qAUKLVBWUmuiUwNaOjFglNp1INAQTYMyeimKK2TRVt4I4bs/fNfYzt35zne/e+653/slWXHOl/t9r1I/9/j7fJ/f44gQAGD0XVB0AQCAwSDwASATBD4AZILAB4BMEPgAkAkCHwAyQeADQCYIfADIBIEPAJm4sOgCVrvsssti27ZtRZcBAKUxPz//XERMdvLcoQr8bdu2aW5urugyAKA0bP+o0+fS0gGATBD4AJAJAh8AMkHgA0AmCHwAyASBDwCZGKqxTAAok9mFmg4eOalTS8vaMlHR3t3bNTM9VXRZLRH4ALAJsws17T98XMtnzkqSakvL2n/4uCQNbejT0gGATTh45ORLYd+wfOasDh45WVBFGyPwAWATTi0td/X4MCDwAWATtkxUunp8GBD4ALAJe3dvV2V8bM1jlfEx7d29vePXmF2oaeeBo7pq3yPaeeCoZhdq/S5zDU7aAsAmNE7MbnZKp4iTvgQ+AGzSzPRU23BuN7bZ7qQvgQ8AifVzrr7ZEfztDx7T3I9+qjtndhRy0pfABwD1v8XS7Ag+JD3wzf9R9VW/pi0TFdWahHvKk76ctAUA9X+uvtWRetTX6sdJ324R+ACg/s/VtztSP7W0rJnpKd116w5NTVRkSVMTFd11646kV+kmbenYnpB0j6SrtfLB9icR8Z8p1wSAzWjVYrnA1uxCresg3rt7u25/8JiixVrSxid9+y31Ef4/SHo0Il4j6RpJJxKvBwCb0qzFIklnI7T/8PGuZ+Rnpqf0h6/fKq97PHXbpp1kgW/7FZJukHSvJEXEzyNiKdV6ANCLRotlzOsjevO9/Dtndujv/+DagbZt2knZ0rlK0qKkT9q+RtK8pNsi4sWEawLAps1MT+n2B481/dlme/mDbtu0k7Klc6Gk6yR9PCKmJb0oad/6J9neY3vO9tzi4mLCcgDkZjNbF5Rxj5xOpQz8ZyQ9ExGP1//+kFY+ANaIiEMRUY2I6uTkZMJyAOSkMVdfW1pW6Nxc/UahX8S45KAkC/yI+ImkH9tu/Fd6k6SnUq0HAKttdq6+iHHJQUl9pe37JT1g+2WSfiDpPYnXAwBJvc3VD1PfvZ+SBn5EHJNUTbkGADSzma0LynaP2m5xpS2AkdRtL75Zz//2B49pW5MTvoPex75f2DwNwEjqdr/6VpudSWs3UpNUupuXNxD4AEZWN734jXr7q0/4Dnof+36hpQMA6mzO/tTScilvXt5A4AOAWu+ls9qWiUqpL8wi8AFAa+fvJbXc9KzMF2bRwweAutU9/41GNMs4vumIZrs1F6Narcbc3FzRZQBAadiej4iOrneipQMAmSDwASATBD4A
ZILAB4BMEPgAkAkCHwAyQeADQCYIfADIBIEPAJkg8AEgEwQ+AGSCwAeATBD4AJAJAh8AMkHgA0AmCHwAyASBDwCZSHqLQ9tPS3pe0llJv+j0riwAgP4bxD1t3xgRzw1gHQBAG7R0ACATqQM/JH3Z9rztPc2eYHuP7Tnbc4uLi4nLAYB8pQ78N0TEdZLeKul9tm9Y/4SIOBQR1YioTk5OJi4HAPKVNPAjolb/87SkhyVdn3I9AEBryQLf9sW2L218L+ktkr6baj0AQHspp3ReKelh2411PhsRjyZcDwDQRrLAj4gfSLom1esDALrDWCYAZILAB4BMEPgAkAkCHwAyQeADQCYIfADIBIEPAJkg8AEgEwQ+AGSCwAeATBD4AJAJAh8AMkHgA0AmCHwAyASBDwCZIPABIBMEPgBkgsAHgEwQ+ACQCQIfADJB4ANAJgh8AMgEgQ8AmSDwASATBD4AZCJ54Nses71g+0up1wIAtDaII/zbJJ0YwDoAgDaSBr7tKyTdLOmelOsAADZ2YeLX/5ikD0m6tNUTbO+RtEeStm7dmrgcYHBmF2o6eOSkTi0ta8tERXt3b9fM9FTRZSFjyY7wbd8i6XREzLd7XkQciohqRFQnJydTlQMM1OxCTfsPH1dtaVkhqba0rP2Hj2t2oVZ0achYypbOTklvt/20pM9L2mX7/oTrAUPj4JGTWj5zds1jy2fO6uCRkwVVBCRs6UTEfkn7Jcn2jZL+MiLelWo9oJUiWiunlpa7ehwYBObwMdKKaq1smah09TgwCAMJ/Ij494i4ZRBrAasV1VrZu3u7KuNjax6rjI9p7+7tSdcF2kk9pQMUqqjWSqNlxJQOhgmBj5G2ZaKiWpNwb9Za6Xevf2Z6ioDHUKGHj5HWaWuFMUrkgMDHSJldqGnngaO6at8j2nngqCTprlt3aGqiIkuamqjorlt3nHfk3U2vf/0afCigLBwRRdfwkmq1GnNzc0WXgZJqHKWvDu7K+FjTgF/vqn2PqNn/CZb0wwM3t13DkkIrHyb06TFotucjotrJcznCx8joZSKn0zHKZms0PihoA2HYEfgYGb1M5HTa69/otbiaFsOMKR2MjG4mctbrdIyy1Rqr1ZaWtfPAUcYxMXQIfIyMvbu3N+3hd3qxUydjlM3WWM/SSx8KjTZP4/WBIrVt6dh+ue1XN3n8N9OVBGzOzPRURxM5/VpDWgn31RoncFejzYNh0fII3/bva2U/+9O2xyX9cUQ8Uf/xpyRdl748oDvrWzONoO136Ddeb/3FWq3aPd1c2cs++kilXUvnw5J+KyKetX29pM/Y3h8RD+v8AxtgKKwfm0zdUlnfBtp54OimzyNIg68feWnX0hmLiGclKSK+JemNkv7a9gd0/m+twFAoeh/6XjdNK7p+jLZ2R/jP2351RPy3JNWP9G+UNCvptYMoDuhW0fvQ97ppWtH1Y7S1C/w/lXSB7d+IiKckKSKet32TpHcMpDqgS72MZvZLL5umDUP9GF0tWzoR8e2I+C9JX7D9V15RkXS3pD8bWIVAF8q+D33Z68dw6+RK29+WdKWkb0h6QtIprdyvFhg6gxjNTKns9WO4dXLh1RlJy5Iqki6S9MOI+GXSqoAelH0f+rLXj+HVyRH+E1oJ/NdJ+l1J77T9z0mrAgD0XSdH+O+NiMaexc9K+j3b705YEwAggQ2P8FeF/erHPpOmHABAKmyPDACZIPABIBNsjwxsAhucoYySBb7tiyR9VdKv1Nd5KCLuSLUeMChscIayStnS+T9JuyLiGknXSrrJ9usTrgcMBBucoaySHeFHREh6of7X8foXu2yi9NjgDGWV9KSt7THbxySdlvSViHg85XrAILTayIwNzjDskgZ+RJyNiGslXSHpettXr3+O7T2252zPLS4upiwH6As2OENZDWQsMyKWJD0m6aYmPzsUEdWIqE5OTg6iHKAnbHCGsko5pTMp6UxELNW3VX6zpI+mWg8YJDY4QxmlnMO/XNJ9
tse08pvEFyLiSwnXAwC0kXJK5zuSplO9PgCgO2ytAACZIPABIBMEPgBkgsAHgEwQ+ACQCQIfADJB4ANAJgh8AMgEgQ8AmSDwASATBD4AZILAB4BMEPgAkAkCHwAyQeADQCYIfADIBIEPAJkg8AEgEwQ+AGSCwAeATBD4AJAJAh8AMkHgA0AmCHwAyASBDwCZSBb4tq+0/Zjtp2x/z/ZtqdYCAGzswoSv/QtJfxERT9q+VNK87a9ExFMJ1wQAtJDsCD8ino2IJ+vfPy/phKSpVOsBANobSA/f9jZJ05IeH8R6AIDzJQ9825dI+qKkD0bEz5r8fI/tOdtzi4uLqcsBgGwlDXzb41oJ+wci4nCz50TEoYioRkR1cnIyZTkAkLWUUzqWdK+kExFxd6p1AACdSXmEv1PSuyXtsn2s/vW2hOsBANpINpYZEV+T5FSvDwDoDlfaAkAmCHwAyASBDwCZIPABIBMEPgBkgsAHgEwQ+ACQCQIfADJB4ANAJgh8AMgEgQ8AmUh5i8PszC7UdPDISZ1aWtaWiYr27t6umWlu8gVgOBD4fTK7UNP+w8e1fOasJKm2tKz9h49LEqEPYCjQ0umTg0dOvhT2DctnzurgkZMFVQQAa3GEv85m2zKnlpa7ehwABo0j/FUabZna0rJC59oyswu1Df/tlolKV48DwKAR+Kv00pbZu3u7KuNjax6rjI9p7+7tfa0RADaLls4qvbRlGm0fpnQADCsCf5UtExXVmoR7p22ZmekpAh7A0Br5ls7sQk07DxzVVfse0c4DR9v242nLABhlI32E3+1sPG0ZAKNspAO/3UnYViFOWwbAqBrplg6z8QBwzkgHPrPxAHDOSAc+J2EB4JxkgW/7E7ZP2/5uqjU2MjM9pbtu3aGpiYosaWqiortu3UGPHkCWUp60/ZSkf5T06YRrbIiTsACwIlngR8RXbW9L9fqdYH96ADin8LFM23sk7ZGkrVu39u112Z8eANYq/KRtRByKiGpEVCcnJ/v2uuxPDwBrFR74qbSata8tLXe03TEAjJqRDfx2s/ad7nEPAKMkWQ/f9uck3SjpMtvPSLojIu5NtV5D40RtbWlZlhRNnrN+e4VeT+5ychhAGaSc0nlnqtduZf2J2mZh39Bo+fR6cpeTwwDKYqRaOs1O1LbSaPn0enKXk8MAymKkAr/TTdFWb6/Q6wZrbNAGoCxGKvBbnaidqIy33F6h1w3W2KANQFmMVOC32iztb9/+Wn193y798MDN+vq+XWt6671usMYGbQDKovArbftpM3es6vUuV9wlC0BZOKLdLMtgVavVmJub2/S/ZzwSQG5sz0dEtZPnlv4Iv9XcPeORALBWqXv4jRn4Wn0iZv3vKoxHAsA5pQ78TubuGY8EgBWlDvxOwpzxSABYUerA3yjMGY8EgHNKHfjNZuBd/5P71wLAWqWe0mEGHgA6V+rAl7hJOQB0qtQtHQBA5wh8AMgEgQ8AmSDwASATBD4AZILAB4BMDNX2yLYXJf2o6DrWuUzSc0UXkQDvqzxG8T1Jo/m+inhPr4qIyU6eOFSBP4xsz3W613SZ8L7KYxTfkzSa72vY3xMtHQDIBIEPAJkg8Dd2qOgCEuF9lccovidpNN/XUL8nevgAkAmO8AEgEwR+C7Y/Yfu07e8WXUs/2b7S9mO2n7L9Pdu3FV1Tr2xfZPtbtr9df08fKbqmfrI9ZnvB9peKrqUfbD9t+7jtY7bniq6nX2xP2H7I9vdtn7D9O0XXtB4tnRZs3yDpBUmfjoiri66nX2xfLunyiHjS9qWS5iXNRMRTBZe2abYt6eKIeMH2uKSvSbotIr5ZcGl9YfvPJVUlvTwibim6nl7ZflpSNSJGagbf9n2S/iMi7rH9Mkm/GhFLRde1Gkf4LUTEVyX9tOg6+i0ino2IJ+vfPy/phKRS31AgVrxQ/+t4/WskjmRsXyHpZkn3FF0LWrP9Ckk3SLpXkiLi58MW9hKBnzXb2yRNS3q82Ep6V297
HJN0WtJXIqL076nuY5I+JOmXRRfSRyHpy7bnbe8pupg+uUrSoqRP1ttv99i+uOii1iPwM2X7EklflPTBiPhZ0fX0KiLORsS1kq6QdL3t0rfhbN8i6XREzBddS5+9ISKuk/RWSe+rt0/L7kJJ10n6eERMS3pR0r5iSzofgZ+hep/7i5IeiIjDRdfTT/Vfox+TdFPRtfTBTklvr/e8Py9pl+37iy2pdxFRq/95WtLDkq4vtqK+eEbSM6t+s3xIKx8AQ4XAz0z9BOe9kk5ExN1F19MPtidtT9S/r0h6s6TvF1tV7yJif0RcERHbJL1D0tGIeFfBZfXE9sX1YQHVWx5vkVT6SbiI+ImkH9veXn/oTZKGbhCi9DcxT8X25yTdKOky289IuiMi7i22qr7YKendko7Xe96S9OGI+NcCa+rV5ZLusz2mlYOYL0TESIwwjqBXSnp45bhDF0r6bEQ8WmxJffN+SQ/UJ3R+IOk9BddzHsYyASATtHQAIBMEPgBkgsAHgEwQ+ACQCQIfADJB4AMdsP2o7aVR2bESeSLwgc4c1Mr1C0BpEfjAKrZfZ/s79T32L67vr391RPybpOeLrg/oBVfaAqtExBO2/0XSnZIqku6PiNJf+g9IBD7QzN9JekLS/0r6QMG1AH1DSwc4369LukTSpZIuKrgWoG8IfOB8/yTpbyQ9IOmjBdcC9A0tHWAV238k6UxEfLa+++Y3bO+S9BFJr5F0SX331PdGxJEiawW6xW6ZAJAJWjoAkAkCHwAyQeADQCYIfADIBIEPAJkg8AEgEwQ+AGSCwAeATPw/YAuxwZ+qdB8AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import numpy as np\n", "from sklearn.cluster import KMeans\n", "from scipy.spatial.distance import cdist\n", "import matplotlib.pyplot as plt\n", "\n", "cluster1=np.random.uniform(0.5,1.5,(2,10))\n", "cluster2=np.random.uniform(5.5,6.5,(2,10))\n", "cluster3=np.random.uniform(3,4,(2,10))\n", "\n", "X=np.hstack((cluster1,cluster2,cluster3)).T\n", "plt.scatter(X[:,0],X[:,1])\n", "plt.xlabel('x1')\n", "plt.ylabel('x2')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvqOYd8AAAIABJREFUeJzt3XmcFOWdx/HPlysCihejIqBojBp1TdTBeEVBvOMVD+LBEF1dY6KJuq5EzeZeE9fE7KpJNOKtiOIRNUo8kiBijMrhieBGUSIGuTzAA+X47R9PzTCMc/QM01Pd09/361Wv7qqu7vp2D/Sv63mqnlJEYGZmBtAl7wBmZlY6XBTMzKyOi4KZmdVxUTAzszouCmZmVsdFwczM6rgoVBhJIWmrIrzu+5K2LMLr/kjSLe38mn+U9PVmHr9B0n+tweufJOnxtj6/lduq+3uuae5S0p7vpRj/hjozF4UyJGkvSU9Iek/S25L+KmlwB27/UUmn1l8WEWtHxKyOyrAmIuLgiLgR1vwLXNKg7Iu5W/slXO31fyRpWVZ0a6d3i7Gttsje+/z6719S92xZQSdBdWQRtZa5KJQZSX2A+4ErgA2A/sCPgY/zzGVFdXtWdGun9fIO1MA7wMH15g/OllkZclEoP1sDRMTYiFgRER9FxMMR8XztCpL+VdIMSe9IekjS5o29kKTPSPqlpH9ImifpKkk96z1+hKRnJS2W9KqkgyRdBHwZ+HX2q/XX2boNmzF+I+kBSUskPSXps/Ve9wBJL2d7Or+VNLHhnkcTebtLGivpLkk9Gjy2haR3JXXJ5kdLml/v8ZslnZ3df1TSqZI+D1wF7N7IL/D1m8rfwGPZ7bvZa+xeb5u/zP4Gr0k6uN7ydSVdK2mupDcl/Zekri29/wL1lfRIlnti/b+9pD0kTc4+98mS9siWD5X0Qr31HpE0ud78JElHNrPNm4GR9eZHAjfVX6Gp99zWv0FT7yV7bIvsvS+R9AjQt9APz4CI8FRGE9AHWATcSPpFtn6Dx48AXgE+D3QD/hN4ot7jAWyV3f8f4D7SHsc6wB+An2eP7Qq8B+xP+vHQH9g2e+xR4NQG263/ujdkGXfNMowBbsse6wssBo7KHjsLWNbw9eq97o+AW4CewAPZa3dtYt1/ALtk918GZgGfr/fYTg3zAycBjzd4nSbzN7LNQdl771Zv2UnZe/o3oCvwTeCfgLLHfw/8DugNbAQ8DXyjufffzL+Hhp/7EmBv4DPAZbXvLfsbvwPUZO/p+Gx+w+yzXZr9bboD84A3s38TPYGPgA2b2f4O2XPWA9bP7u8ARL3
1mnzPrf0bNPdessf/Bvwq+wz2zj6TJj9DT6tP3lMoMxGxGNiL9J9xNLBA0n2SNs5WOZ30xT4jIpYDPwO+2HBvQZKA04BzIuLtiFiSrXtctsopwHUR8UhErIyINyNiZiui/j4ins4yjAG+mC0/BJgeEXdnj10OvNXCa/UBHgReBU6OiBVNrDcR2EfSJtn8ndn8FtlrPNcO+Qs1OyJGZ1lvBPoBG2d/p0OAsyPig4iYTyrOxzXzWsOzvaDaaUIz6z4QEY9FxMfA90i/wAcCXwH+HhE3R8TyiBgLzAQOi4iPgMmkL9BdSJ/TX4E9gd2y5y1qZptLST8ovpZN92XLAGjje4am/wZNvhdJmwGDge9HxMcR8ViWzQpUlM4xK66ImEH6dYWkbUm/pP+X9Itpc+AySZfWe4pIv/Rn11tWBfQCpqb6ULdebTPGQGD8GsSs/0X/IbB2dn9T4I167yUkzWnhtXYj/YI9PrKfgk2YCBwOzCE16zxK+jW5FJgUESvbIX+rnx8RH2af8dqkX7ndgbn1Pvcu1PtMGjEuIkYUuN36n+37kt4mfeabsvrfn2y+f3Z/IjCE9NlNJP3y3ofUVzWxgO3eBPyc9G/ouw0e25zWv2do/t9QU+9lU+CdiPigwWMDW34LBi4KZS8iZkq6AfhGtugN4KKIGNPCUxeSmgW2j4g3G3n8DaCpdvQ1GVp3LjCgdibbYxnQ9OoAPAw8D/xZ0pCImNfEehOBX7Dqi+1xUnv1Upr+YlvTYYJb+/w3SF+0fbNfwO2t7stPUm0R+mc2Nexb2oy0Bwbp87mU1Mx2MakojM6y/qaA7U4i7Q0F6XOv/2+npffc2s+wufcyl9QX0bteYdisDduoWG4+KjOStpV0rqQB2fxA0h7Ck9kqVwEXSNo+e3xdScc2fJ3sV/No4H8kbZSt21/Sgdkq1wInSxomqUv22LbZY/OAtp6T8ADwL5KOVDqM8QxgkxaeQ0RcAtxKKgyNdhxGxN9JhW4EMDFrapsHHE3TRWEeMKBhx3UrLABWUuDnERFzSUXuUkl9ss/2s5L2aeP2GzpE6ZDlHsBPgScj4g3SXt/Wkk6Q1E3S14DtSEeyATwBbENqw386IqaTvni/xKrO9ObeVwCHAYc33Jsr4D239m/Q5HuJiNnAFODHknpI2ivLZQVyUSg/S0j/UZ+S9AGpGLwInAsQEb8H/hu4TdLi7LGDm3it75I6pZ/M1v0T6YuBiHgaOJnU9vse6Uu19tfZZcAx2ZE1l7cmfEQsBI4FLiF1JG5H+k/c4iG1EfFT4B7gT5I2aGK1icCi7Iuwdl7AtCbW/wswHXhL0sJC30e9TB8CFwF/zdr7dyvgaSOBHsBLpF/kd5J+ZTfla1r9PIX3awt5I24Ffgi8TeofGJHlXAQcSvp3sggYBRya/T3IflVPI/X3fJK91t9IfSPzKUBETM+KSWvfc6v+Bi29F+AE0v+Rt0mfxU2NvY41TtFsE61ZcSkdQjoHODEimutANbMO4D0F63CSDpS0nqTPABeSfsk/2cLTzKwDuChYHnYnHV66kNTee2R2WKSZ5czNR2ZmVsd7CmZmVqfszlPo27dvDBo0KO8YZmZlZerUqQsjoqql9cquKAwaNIgpU6bkHcPMrKxIangWeKPcfGRmZnVcFMzMrI6LgpmZ1XFRMDOzOi4KZmZWp9MXhUsugQkNRtSZMCEtNzOz1RWtKEgaKGmCpJckTZd0ViPrDMmusfpsNv2gvXMMHgzDh68qDBMmpPnBg9t7S2Zm5a+Y5yksB86NiGmS1iFd4euRiHipwXqTIuLQYoUYOhTGjYOjj4att4ZXX03zQ4cWa4tmZuWraHsKETE3IqZl95cAM1h16b8ONXQoHHggPPVUuu+CYGbWuA7pU5A0CNgJeKqRh3eX9JykP9ZeLayR558maYqkKQsWLGj19idMgD/9CXr1gnvv/XQfg5mZJUUvCtl1Yu8Czs4uj1jfNGDziPgCcAXpqlqfEhFXR0R
1RFRXVbU4dMdqavsQxo2D006DlSvh2GNdGMzMGlPUoiCpO6kgjImIuxs+HhGLI+L97P54oHtT199tq8mTV/Uh1NTA8uXpdvLk9tyKmVnnULSOZkkiXfx9RkT8qol1NgHmRURI2pVUpBa1Z45Ro1bd32kn2H57ePpp+Otf23MrZmadQzH3FPYEaoB96x1yeoik0yWdnq1zDPCipOeAy4HjoohX/ZHSXsITT8ArrxRrK2Zm5avsrrxWXV0dazJ09pw5sNlm8IMfwI9+1H65zMxKmaSpEVHd0nqd/ozmhgYMgH33hZtvhjKrh2ZmRVdxRQFSE9KsWakZyczMVqnIonDUUemchZtvzjuJmVlpqciisM468NWvwu23w9KleacxMysdFVkUIDUhvfsuPPBA3knMzEpHxRaFYcOgXz+46aa8k5iZlY6KLQrdusEJJ8D48bBwYd5pzMxKQ8UWBYCRI9OwF7ffnncSM7PSUNFFYccd0+QmJDOzpKKLAqQO56efhpdfzjuJmVn+Kr4onHACdOnicxbMzMBFgU03hf32g1tuSddaMDOrZBVfFCB1OM+eDY8/nncSM7N8uSgARx4JvXu7w9nMzEWBVBCOPhruuAM++ijvNGZm+XFRyIwcCYsXw3335Z3EzCw/LgqZIUOgf38fhWRmlc1FIdO1K4wYAQ8+CPPm5Z3GzCwfLgr11NTAihVw2215JzEzy4eLQj3bbw877+wmJDOrXC4KDdTUwNSp8NJLeScxM+t4LgoNHH986l/w3oKZVSIXhQY23hgOPNDDXphZZXJRaERNDcyZA48+mncSM7OO5aLQiCOOgD593IRkZpXHRaERPXvCMcfAnXfChx/mncbMrOO4KDShpgbefx/uuSfvJGZmHcdFoQl77w2bbeaRU82ssrgoNKFLlzTsxSOPwNy5eacxM+sYLgrNqKlJh6WOHZt3EjOzjuGi0Ixtt4XBg92EZGaVw0WhBTU18Nxz8PzzeScxMys+F4UWHHccdOvmcxbMrDIUrShIGihpgqSXJE2XdFYj60jS5ZJekfS8pJ2Llaetqqrg4INhzJg0rLaZWWdWzD2F5cC5EbEdsBtwhqTtGqxzMPC5bDoNuLKIedps5Mh0BNKf/5x3EjOz4ipaUYiIuRExLbu/BJgB9G+w2hHATZE8CawnqV+xMrXVoYfCuuu6CcnMOr8O6VOQNAjYCXiqwUP9gTfqzc/h04UDSadJmiJpyoIFC4oVs0lrrQXDh8Pdd6eznM3MOquiFwVJawN3AWdHxOK2vEZEXB0R1RFRXVVV1b4BCzRyZBoH6e67c9m8mVmHKGpRkNSdVBDGRERjX6dvAgPrzQ/IlpWcPfeELbZwE5KZdW7FPPpIwLXAjIj4VROr3QeMzI5C2g14LyJKclAJKZ2z8Oc/p2stmJl1RsXcU9gTqAH2lfRsNh0i6XRJp2frjAdmAa8Ao4FvFTHPGhsxAiLg1lvzTmJmVhyKiLwztEp1dXVMmTIlt+3vsQcsXgwvvJD2HszMyoGkqRFR3dJ6PqO5lWpqYPp0ePbZvJOYmbU/F4VWGj4cund3h7OZdU4uCq204YbpZLZbb4Xly/NOY2bWvlwU2qCmBubNSxfgMTPrTLq1tIKkrYHzgM3rrx8R+xYxV0k75BDYYIPUhHTwwXmnMTNrPy0WBeAO4CrSIaMeJxT4zGfga1+D669PRyL16ZN3IjOz9lFI89HyiLgyIp6OiKm1U9GTlbiaGli6FO66K+8kZmbtp5Ci8AdJ35LUT9IGtVPRk5W43XaDrbbypTrNrHMppCh8ndSn8AQwNZvyO3usRNQOe/HoozB7dt5pzMzaR4tFISK2aGTasiPClboRI9LtmDH55jAzay8tFgVJ3SV9R9Kd2XRmNvppxdtyS9hrr3QUUpmNFmJm1qhCmo+uBHYBfptNu1Cil83MQ00NzJwJUyu+693MOoNCisLgiPh6RPwlm04GBhc7WLk49th0iKo7nM2sMyikKKyQ9Nn
aGUlb4vMV6qy/Phx2GIwdC8uW5Z3GzGzNFFIUzgMmSHpU0kTgL8C5xY1VXkaOhIUL4cEH805iZrZmWjyjOSL+LOlzwDbZopcj4uPixiovBx0EffumDufDDss7jZlZ2zVZFCTtGxF/kXRUg4e2kkQT11yuSN27w3HHwejR8O67sN56eScyM2ub5pqP9sluD2tkOrTIucrOyJHw8cdwxx15JzEzaztfjrOdRMDnPw8bbQSPPZZ3GjOz1bXb5TglnSWpj5JrJE2TdED7xOw8pLS3MGkSvPZa3mnMzNqmkKOP/jUiFgMHABsCNcDFRU1Vpk48Md3ecku+OczM2qqQoqDs9hDgpoiYXm+Z1bP55rDPPh72wszKVyFFYaqkh0lF4SFJ6wArixurfI0cCX//Ozz1VN5JzMxar9miIEnAD4DzScNdfAj0AE7ugGxl6ZhjYK210t6CmVm5abYoRDo0aXxETIuId7NliyLi+Q5JV4b69IEjj4TbboNPPsk7jZlZ6xTSfDRNkgfAa4WaGnj7bRg/Pu8kZmatU0hR+BLwpKRXJT0v6QVJ3lNoxgEHpPMV3IRkZuWmxbGPgAOLnqKT6dYNTjgBfvObtMewQcVf0drMykUhl+OcDQwE9s3uf1jI8ypdTU0aSnvcuLyTmJkVrpAzmn8IfBe4IFvUHfDpWS3YaSfYfntffMfMykshv/i/ChwOfAAQEf8E1ilmqM5ASnsLf/sbvPJK3mnMzApTSFH4JDs0NQAk9S5upM7jxBNTcXCHs5mVi0KKwjhJvwPWk/RvwJ+A0cWN1TkMGAD77pvGQvKwF2ZWDgrpaP4lcCdwF7A18IOIuKKl50m6TtJ8SS828fgQSe9JejabftDa8OWgpgZmzYInnsg7iZlZywo9iugFYBLwWHa/EDcAB7WwzqSI+GI2/aTA1y0rRx0FvXq5w9nMykMhRx+dCjwNHAUcQzqR7V9bel5EPAa8vcYJy9w668BXv5oOTV26NO80ZmbNK2RP4Txgp4g4KSK+DuxCOkS1Pewu6TlJf5S0fVMrSTpN0hRJUxYsWNBOm+44I0emazfff3/eSczMmldIUVgELKk3vyRbtqamAZtHxBeAK4B7mloxIq6OiOqIqK6qqmqHTXesYcOgXz8fhWRmpa+QovAK8JSkH2Unsj0J/J+kf5f0723dcEQsjoj3s/vjge6S+rb19UpZ165p2Ivx42HhwrzTmJk1rZCi8CrpV3ztQZX3Aq+RTmBr80lskjbJrteApF2zLO2xB1KSRo6E5cvTkNpmZqWqxQHxIuLHtfcldQHWzq7Z3CxJY4EhQF9Jc4AfkobIICKuInVaf1PScuAj4LjsJLlOaccd03TzzXDmmXmnMTNrXItFQdKtwOnACmAy0EfSZRHxi+aeFxHHt/D4r4FftyJr2Rs5Ev7jP+Dll2GbbfJOY2b2aYU0H22X7RkcCfwR2AKoKWqqTuqEE6BLF3c4m1npKqQodJfUnVQU7ouIZazqX7BW6NcP9tsvDXuxcmXeaczMPq2QovA74HWgN/CYpM2BFvsUrHEjR8Ls2TBpUt5JzMw+rZCxjy6PiP4RcUgks4GhHZCtUzrySOjd201IZlaamuxoljQiIm5p5lyEXxUpU6fWuzcccwzccQdccQX07Jl3IjOzVZrbU6i9bsI6TUzWRjU1sHgx3Hdf3knMzFancjs1oLq6OqZMmZJ3jDWyYgUMGpTOW3jggbzTmFklkDQ1IqpbWq/ZPgVJQyXdJWl6Nt0paUi7paxQXbumq7I99BDMm5d3GjOzVZosCpK+AlwH3A+cAJwIjAeuk3RIx8TrvGpq0h6Dh70ws1LS3J7CecCREXF9RDwXEc9GxHWk8xXaa+jsirX99rDzzr74jpmVluaKwiYR8VzDhRHxPLBx8SJVjpoamDYNpk/PO4mZWdJcUfigjY9ZAS65BAYOTP0LtecsTJiQlpuZ5aW5AfE+K6mxgyYFbFmkPBVj8GAYPhyqq2HMGNh/fzj
uuHTZTjOzvDRXFI5o5rFftneQSjN0aCoARxwBS5bAUUfBPfek5WZmeWmyKETExI4MUomGDoVvfxt+9jNYZx0YMiTvRGZW6QoZEM+KZMIEuPpqOOggePNNuOyyvBOZWaVzUcjJhAmpT2HcOLj7blhvPRg1Ki03M8tLwUVBUq9iBqk0kyengjB0aBoU7/zzYdmyVCDMzPLSYlGQtIekl4CZ2fwXJP226Mk6uVGjVu9U/uY3097CnDn5ZTIzK2RP4X+AA4FFANkJbXsXM1Ql6tMndTrfc49PZjOz/BTUfBQRbzRYtKIIWSred74DvXrBxRfnncTMKlUhReENSXsAIam7pP8AZhQ5V0Xq2xdOPx3GjoVZs/JOY2aVqJCicDpwBtAfeBP4YjZvRXDuuWnoCw93YWZ5KOQazQsj4sSI2DgiNoqIERGxqCPCVaJNN4WTT4brr4d//jPvNGZWaZob5gIASZc3svg9YEpE3Nv+kWzUKBg9Gi69NE1mZh2lkOajtUhNRn/Pph2BAcApkv63iNkq1pZbwvHHw1VXwSLvk5lZByqkKOwIDI2IKyLiCmA/YFvgq8ABxQxXyS64AD780ENfmFnHKqQorA+sXW++N7BBRKwAPi5KKmP77eHII+GKK2Dx4rzTmFmlKKQoXAI8K+l6STcAzwC/kNQb+FMxw1W6Cy+Ed99NzUhmZh1BEdHySlI/YNdsdnJE5HZcTHV1dUyZMiWvzXe4Aw6A55+H115LYySZmbWFpKkRUd3SeoUOiLcUmAu8A2wlycNcdJALL4R58+C66/JOYmaVoJAB8U4FHgMeAn6c3f6ouLGs1j77wB57pJPZli3LO42ZdXaF7CmcBQwGZkfEUGAn4N2iprI6Utpb+Mc/0rWczcyKqZCisDQilgJI+kxEzAS2aelJkq6TNF/Si008LkmXS3pF0vOSdm5d9MpxyCHwhS+kgfJWeChCMyuiQorCHEnrAfcAj0i6F5hdwPNuAA5q5vGDgc9l02nAlQW8ZkWq3Vt4+WVfhMfMiqugo4/qVpb2AdYFHoyITwpYfxBwf0Ts0MhjvwMejYix2fzLwJCImNvca1ba0Ue1VqyA7bZLQ2tPm5YKhZlZodrl6CNJXSXNrJ2PiIkRcV8hBaEA/YH612mYky1rLMdpkqZImrJgwYJ22HT56do1XbLz2WfhwQfzTmNmnVWzRSE7a/llSZt1UJ6mclwdEdURUV1VVZVnlFydeCIMHAgXXQSt2MEzMytYocNcTJf0Z0n31U7tsO03gYH15gdky6wJPXqkEVT/+leYNCnvNGbWGbU4dDbw/SJt+z7gTEm3AV8C3mupP8HglFPgpz9Newt7+xRCM2tnhVxkZyLwOtA9uz8ZmNbS8ySNBf4GbCNpjqRTJJ0u6fRslfHALOAVYDTwrba9hcrSsyeccw48/DBUYH+7mRVZi0cfSfo30iGjG0TEZyV9DrgqIoZ1RMCGKvXoo/oWL4bNNoNhw+Cuu/JOY2bloD3HPjoD2BNYDBARfwc2WrN4tib69IFvfzuds/DSS3mnMbPOpJCi8HH9Q1AldQN87EvOzjornbNw8cV5JzGzzqSQojBR0oVAT0n7A3cAfyhuLGtJ377wjW/ArbfCrFl5pzGzzqKQonA+sAB4AfgGqYP4P4sZygpz7rnppLZf/CLvJGbWWRRSFI4EboqIYyPimIgYHa0ZG8OKpn9/OOmkdK2FuT6Y18zaQSFF4TDg/yTdLOnQrE/BSsSoUbB8OVx6ad5JzKwzKOQ8hZOBrUh9CccDr0q6ptjBrDCf/Swcf3y6jvOiRXmnMbNyV9DlOCNiGfBH4DZgKqlJyUrE+efDBx/AFVfkncTMyl0hl+M8WNINwN+Bo4FrgE2KnMtaYYcd4Igj4PLLYcmSvNOYWTkrZE9hJOkCO9tExEkRMT4ilhc5l7XShRfCO++kZiQzs7YqpE/h+Ii4JyI+BpC0l6TfFD+atcauu8J++6UO548+yjuNmZWrgvoUJO0
k6ReSXgd+Csxs4SmWgwsvhHnz4Prr805iZuWqyaIgaWtJP8yuvHYF8A/SAHpDI8JdmiVoyBDYfXe45BJYtizvNGZWjprbU5gJ7AscGhF7ZYVgRcfEsraQ0t7C7Nlp+Aszs9ZqrigcBcwFJkgaLWkY4MvFl7ivfAV23BF+/nNY4RJuZq3UZFHIOpePA7YFJgBnAxtJulLSAR0V0Fqndm/h5Zfh97/PO42ZlZtCjj76ICJujYjDSNdRfgb4btGTWZsdcwx87nPws5+BR6kys9Yo6OijWhHxTkRcnddV16wwXbums5yfeQYeeijvNGZWTlpVFKx8jBgBAwfCRRflncTMyomLQifVowecdx48/jhMmpR3GjMrFy4Kndgpp0BVVepbMDMrhItCJ9arF5xzDjz4IEydmncaMysHLgqd3Le+Beuum85bMDNriYtCJ7fuunDmmXD33TBjRt5pzKzUuShUgLPPhp494eKL805iZqXORaEC9O0Lp50GY8bA66/nncbMSpmLQoU491zo0iWNoGpm1hQXhQoxYACcdBJcdx3MnZt3GjMrVS4KFWTUqHSdhV/9Ku8kZlaqXBQqyFZbwXHHwZVXwttv553GzEqRi0KFOf98+OADuMLXzjOzRrgoVJh/+Rc4/HC47DJYsiTvNGZWaopaFCQdJOllSa9IOr+Rx0+StEDSs9l0ajHzWHLhhfDOO/C73+WdxMxKTdGKgqSuwG+Ag4HtgOMlbdfIqrdHxBez6Zpi5bFVvvQlGDYMLr0Uli7NO42ZlZJi7insCrwSEbMi4hPgNuCIIm7PWuF734O33oLrr887iZmVkmIWhf7AG/Xm52TLGjpa0vOS7pQ0sLEXknSapCmSpixYsKAYWSvOkCGw227pZLZly/JOY2alIu+O5j8AgyJiR+AR4MbGVsouAVodEdVVVVUdGrCzklLfwuuvw9ixeacxs1JRzKLwJlD/l/+AbFmdiFgUER9ns9cAuxQxjzVw6KGw445pWO2VK/NOY2aloJhFYTLwOUlbSOoBHAfcV38FSf3qzR4OeHDnDiTBBRfAzJlwzz15pzGzUlC0ohARy4EzgYdIX/bjImK6pJ9IOjxb7TuSpkt6DvgOcFKx8ljjjj02nel80UUQkXcaM8ubosy+Caqrq2PKlCl5x+hUrr0WTj01XbbzwAPzTmNmxSBpakRUt7Re3h3NVgJqatIoqhddlHcSM8ubi4LRowecdx5MmpQmM6tcLgoGpOajqqp0JJKZVS4XBQOgVy845xz44x9h2rS805hZXlwUrM63vgV9+nhvwaySuShYnXXXhTPPhLvuSucumFnlcVGw1Zx9Nqy1Flx8cd5JzCwPLgq2mqoq2HlnuPnmNC5SrQkT0uB5Zta5uSjYp3znO2kspLPPTvMTJsDw4TB4cL65zKz4uuUdwErP8OFw441w771QXQ0zZsApp8CCBalAVFWlacMNoZv/BZl1Kh7mwho1ezbsuSe8+WbT60iwwQarisRGG62633B+o43WvIhccknaWxk6dNWyCRNg8mQYNartr2tWCQod5sK/86xRs2bBxx/D978PV14JV10F22yT9hbmz0+39af58+Gll9L9RYuaHlyvtog0V0Bq5/v2Xb2IDB6c9mLGjUuFobZZa9y4jvlMzCqBi4J9Sv0v26FD01R/viUrVqTC0LBoNJyfOTMNq7FoUdPXc1h//dULxm67petAfPnL8MQT8JOfQL9+8O676ZBaqX0/C7NK4+Yj+5SObqZZsQLefrvxvY/G5ufPb/x1evSAjTdVqbAxAAAIBklEQVQubNpgAxcQqyyFNh+5KFhZqd2LGTkSrr8evvc92GQTmDev8Wn+fFi+/NOv061b2gNprGBsssnq8xtuCF2aOU7PfR1WDtynYJ1Ow2atQw9dNX/iiY0/Z+VKeOedTxeLt95aff7FF9PtsmWffo2uXVPTVVN7HQBHHw2jR8MRR6QmMfd1WLnynoKVjWL/Io9IfRON7XE0LCL
z5qWO+MZI0L8/bL756p3nVVWp87zhsrXWWvPsZi1x85FZEUXA4sWrF4lrr01Xr6uuhi22SP0fCxeuul2xovHXWnvtpgtGY8vXWaew/hA3a1l9bj4yKyIpHe207rqw9dbpy3bKlFWH8F5yyepfxitXpr2Q2k7z2mLRcHrrLXjhhXR/6dLGt92jR/N7HvUP6z32WLj9dhg2rHQO4XWxKm0uCmZrqJBDeLt0SUc8bbBBOt+jJRHwwQctF5GFC+G119L9xYsbf6399oPu3VOH+0YbwRlnQM+e6Roa7X3bvXvL783nm5Q2FwWzNTR58uoFYOjQND95cmHndTRGSs1Ka6+dmqIK8fHHqUg0LCB33pk6v3fZBXbYAT78ED76KN0uWZKO0Kq/7KOP0tQWXbum4tBSARk8GA45BHbaCZ57DkaMgOnTU4GrfX7D16k/rbVW+x9S7D2YxH0KZp1Y7a/wb34zNWsVegLiypWpyDQsFh9+2PiyQm/r31+0qOnO+kI0LBaNFY/WLJs+HS64AH7729Tc9vTTcNJJhX9mxdJexcp9CmYVbk3OTO/SJX1R9uxZ3GyjRqVideON6YuvtnA0LCStWVbb7NZweVN9NA0df/zq8/vvnz6HtdZa/baxZa1Zp6Xn1w7x0tHNbS4KZp1UMZq12sOaDqPSVitXpsLQUlG59Va4//7UF7P33uk5H3306dva+4sWNb1OU8O3FKJbt1UFokuXVJz23Reeeaa4n5Wbj8ysQ5Vy231bm9saE5FOhmyqYDQsLi2tM3lyGnTy+99PY361ls9TMDNrhYZ7MA3nSyHbmhSrQouCr7xmZkbzzW15ql+cfvKTdDt8eFpeDN5TMDMrYR199JGLgplZBXDzkZmZtZqLgpmZ1XFRMDOzOi4KZmZWx0XBzMzqlN3RR5IWALPb+PS+wMJ2jNNeSjUXlG4252od52qdzphr84ioammlsisKa0LSlEIOyepopZoLSjebc7WOc7VOJedy85GZmdVxUTAzszqVVhSuzjtAE0o1F5RuNudqHedqnYrNVVF9CmZm1rxK21MwM7NmuCiYmVmdiigKkq6TNF/Si3lnqU/SQEkTJL0kabqks/LOBCBpLUlPS3ouy/XjvDPVJ6mrpGck3Z93llqSXpf0gqRnJZXMML6S1pN0p6SZkmZI2r0EMm2TfU6102JJZ+edC0DSOdm/+RcljZW0Vt6ZACSdlWWaXuzPqiL6FCTtDbwP3BQRO+Sdp5akfkC/iJgmaR1gKnBkRLyUcy4BvSPifUndgceBsyLiyTxz1ZL070A10CciDs07D6SiAFRHREmd8CTpRmBSRFwjqQfQKyLezTtXLUldgTeBL0VEW09Kba8s/Un/1reLiI8kjQPGR8QNOefaAbgN2BX4BHgQOD0iXinG9ipiTyEiHgPezjtHQxExNyKmZfeXADOA/vmmgkjez2a7Z1NJ/HqQNAD4CnBN3llKnaR1gb2BawEi4pNSKgiZYcCreReEeroBPSV1A3oB/8w5D8Dngaci4sOIWA5MBI4q1sYqoiiUA0mDgJ2Ap/JNkmRNNM8C84FHIqIkcgH/C4wCVuYdpIEAHpY0VdJpeYfJbAEsAK7PmtuukdQ771ANHAeMzTsEQES8CfwS+AcwF3gvIh7ONxUALwJflrShpF7AIcDAYm3MRaEESFobuAs4OyIW550HICJWRMQXgQHArtkubK4kHQrMj4ipeWdpxF4RsTNwMHBG1mSZt27AzsCVEbET8AFwfr6RVsmasw4H7sg7C4Ck9YEjSMV0U6C3pBH5poKImAH8N/AwqenoWWBFsbbnopCzrM3+LmBMRNydd56GsuaGCcBBeWcB9gQOz9rvbwP2lXRLvpGS7FcmETEf+D2p/Tdvc4A59fby7iQViVJxMDAtIublHSSzH/BaRCyIiGXA3cAeOWcCICKujYhdImJv4B3g/4q1LReFHGUdutcCMyLiV3nnqSWpStJ62f2ewP7AzHxTQURcEBEDImI
QqdnhLxGR+y85Sb2zAwXImmcOIO3y5yoi3gLekLRNtmgYkOtBDA0cT4k0HWX+AewmqVf2f3MYqZ8vd5I2ym43I/Un3FqsbXUr1guXEkljgSFAX0lzgB9GxLX5pgLSL98a4IWs/R7gwogYn2MmgH7AjdmRIV2AcRFRMod/lqCNgd+n7xG6AbdGxIP5RqrzbWBM1lQzCzg55zxAXfHcH/hG3llqRcRTku4EpgHLgWconeEu7pK0IbAMOKOYBwxUxCGpZmZWGDcfmZlZHRcFMzOr46JgZmZ1XBTMzKyOi4KZmdVxUTBrB5IGldoovGZt4aJgZmZ1XBTM2pmkLbMB6AbnncWstSrijGazjpINKXEbcFJEPJd3HrPWclEwaz9VwL3AUXlfKMmsrdx8ZNZ+3iMNqrZX3kHM2sp7Cmbt5xPgq8BDkt6PiKKNZGlWLC4KZu0oIj7ILgb0SFYY7ss7k1lreJRUMzOr4z4FMzOr46JgZmZ1XBTMzKyOi4KZmdVxUTAzszouCmZmVsdFwczM6vw/q9uciIjZDX0AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "K=range(1,10)\n", "meandistortions=[]\n", "\n", "for k in K:\n", " kmeans=KMeans(n_clusters=k)\n", " kmeans.fit(X)\n", " meandistortions.append(\\\n", " sum(np.min(cdist(X,kmeans.cluster_centers_,'euclidean'),\\\n", " axis=1))/X.shape[0])\n", "\n", "plt.plot(K,meandistortions,'bx-')\n", "plt.xlabel('k')\n", "plt.ylabel('Average Dispersion')\n", "plt.title('Selecting k with the Elbow Method')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen from the figure above, in the process that cluster number decrease from 1 to 2 and 3, the change of K value can make a big difference to the whole cluster structure, which means new cluster number make algorithm have larger convergence space and this K means can not represent real cluster members. When K=3, if we increase K, the decrease speed of average distance are slow down obviously, which means that a further increase in K is no longer conducive to the convergence of the algorithm, at the same time, it also mplies that K=3 is the relative optimal number of class clusters\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "jupytext_formats": "ipynb,py", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }