@@ -220,7 +220,8 @@ header, .context-menu, .megamenu-content, footer { font-family: "Segoe UI", Aria
 </style>
 </head>
 <body class='typora-export' >
-<div id='write' class = 'is-node show-fences-line-number'><h1><a name='header-n284' class='md-header-anchor '></a>吴恩达(Andrew Ng)机器学习公开课中文笔记</h1><p><a href='https://github.com/scruel/ML-AndrewNg-Notes/'>GitHub 项目首页</a></p><p> </p><p><a href='./week1.html'>week1</a></p><ol start='' ><li>引言(Introduction)</li><li>单变量线性回归(Linear Regression with One Variable)</li></ol><p><a href='./week2.html'>week2</a></p><ol start='3' ><li>线性代数回顾(Linear Algebra Review)</li><li>多变量线性回归(Linear Regression with Multiple Variables)</li><li>Octave/Matlab 指南(Octave/Matlab Tutorial)</li></ol><p><a href='./week3.html'>week3</a></p><ol start='6' ><li>逻辑回归(Logistic Regression)</li><li>正则化(Regularization)</li></ol><p><a href='./week4.html'>week4</a></p><ol start='8' ><li>神经网络:表达(Neural Networks: Representation)</li></ol><p><a href='./week5.html'>week5</a></p><ol start='9' ><li>神经网络:学习(Neural Networks: Learning)</li></ol><p><a href='./week6.html'>week6</a></p><ol start='10' ><li>机器学习应用的建议(Advice for Applying Machine Learning)</li><li>机器学习系统设计(Machine Learning System Design)</li></ol><p><a href='./week7.html'>week7</a></p><ol start='12' ><li>支持向量机(Support Vector Machines)</li></ol><p><a href='./week8.html'>week8</a></p><ol start='13' ><li>无监督学习(Unsupervised Learning)</li><li>降维(Dimensionality Reduction)</li></ol><p><a href='./week9.html'>week9</a></p><ol start='15' ><li>异常检测(Anomaly Detection)</li><li>推荐系统(Recommender Systems)</li></ol><p><a href='./week10.html'>week10</a></p><ol start='17' ><li>大规模机器学习(Large Scale Machine Learning)</li></ol><p><a href='./week11.html'>week11</a></p><ol start='18' ><li>实战:图像光学识别(Application Example: Photo OCR)</li></ol><p> </p><p> </p><p><a href='http://creativecommons.org/licenses/by-nc/4.0/' target='_blank'><img src='https://i.creativecommons.org/l/by-nc/4.0/88x31.png' alt='Creative Commons License' /></a></p><p>This work is licensed under a <a href='http://creativecommons.org/licenses/by-nc/4.0/' target='_blank'>Creative Commons Attribution-NonCommercial 4.0 
-International License</a>.</p><p> </p><p><a href='https://zhuanlan.zhihu.com/p/32781741'>知乎文章</a></p><p>By: Scruel</p><p> </p><p><div style="display:none">
+<div id='write' class = 'is-node show-fences-line-number'><h1><a name='header-n136' class='md-header-anchor '></a>吴恩达(Andrew Ng)机器学习公开课中文笔记</h1><p><a href='https://github.com/scruel/ML-AndrewNg-Notes/'>GitHub 项目首页</a></p><p> </p><p><a href='./week1.html'>week1</a></p><ol start='' ><li>引言(Introduction)</li><li>单变量线性回归(Linear Regression with One Variable)</li></ol><p><a href='./week2.html'>week2</a></p><ol start='3' ><li>线性代数回顾(Linear Algebra Review)</li><li>多变量线性回归(Linear Regression with Multiple Variables)</li><li>Octave/Matlab 指南(Octave/Matlab Tutorial)</li></ol><p><a href='./week3.html'>week3</a></p><ol start='6' ><li>逻辑回归(Logistic Regression)</li><li>正则化(Regularization)</li></ol><p><a href='./week4.html'>week4</a></p><ol start='8' ><li>神经网络:表达(Neural Networks: Representation)</li></ol><p><a href='./week5.html'>week5</a></p><ol start='9' ><li>神经网络:学习(Neural Networks: Learning)</li></ol><p><a href='./week6.html'>week6</a></p><ol start='10' ><li>机器学习应用的建议(Advice for Applying Machine Learning)</li><li>机器学习系统设计(Machine Learning System Design)</li></ol><p><a href='./week7.html'>week7</a></p><ol start='12' ><li>支持向量机(Support Vector Machines)</li></ol><p><a href='./week8.html'>week8</a></p><ol start='13' ><li>无监督学习(Unsupervised Learning)</li><li>降维(Dimensionality Reduction)</li></ol><p><a href='./week9.html'>week9</a></p><ol start='15' ><li>异常检测(Anomaly Detection)</li><li>推荐系统(Recommender Systems)</li></ol><p><a href='./week10.html'>week10</a></p><ol start='17' ><li>大规模机器学习(Large Scale Machine Learning)</li></ol><p><a href='./week11.html'>week11</a></p><ol start='18' ><li>实战:图像光学识别(Application Example: Photo OCR)</li></ol><p> </p><p> </p><p><a href='http://creativecommons.org/licenses/by-nc/4.0/' target='_blank'><img src='https://i.creativecommons.org/l/by-nc/4.0/88x31.png' alt='Creative Commons License' /></a></p><p>This work is licensed under a <a href='http://creativecommons.org/licenses/by-nc/4.0/' target='_blank'>Creative Commons Attribution-NonCommercial 4.0 
+International License</a>.</p><p> </p><p><a href='https://zhuanlan.zhihu.com/p/32781741'>知乎文章</a></p><p>By: Scruel</p><p> </p><p><div style="display:none">
+<script>document.title = document.body.firstElementChild.firstElementChild.innerText</script>
 <script src="https://s19.cnzz.com/z_stat.php?id=1272117433&web_id=1272117433" language="JavaScript"></script>
 </div></p></div>
 </body>
@@ -79,9 +79,9 @@ $$
 Besides the approach shown in the figure above of manually choosing a parameter to divide by, the more convenient **mean normalization** method can be used to rescale all feature values uniformly:
-$x_i=\frac{x_i-average(x)}{maximum(x)-minimum(x)}$, so that $x_i \in (-1,1)$
+$x_i:=\frac{x_i-average(x)}{maximum(x)-minimum(x)}$, so that $x_i \in (-1,1)$
 The feature range does not strictly have to satisfy $-1 \leqslant x \leqslant 1$; a range such as $1\leqslant x \leqslant 3$ is also acceptable, whereas ranges like $-100 \leqslant x \leqslant 100$ or $-0.00001 \leqslant x \leqslant 0.00001$ are too large or too small.
 Also note that once feature scaling is adopted, it must be applied to all inputs, including the training set, the test set, and prediction inputs.
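As an aside, the scaling rule above can be sketched in NumPy (the helper name `mean_normalize` and the returned `stats` tuple are illustrative, not part of the notes); keeping the training-set statistics is what lets the same scaling be reapplied to test and prediction inputs:

```python
import numpy as np

def mean_normalize(X, stats=None):
    """Scale each feature column by (x - average) / (maximum - minimum).

    Pass the `stats` returned from the training set when scaling
    test or prediction inputs, so all inputs share one scaling.
    """
    if stats is None:
        stats = (X.mean(axis=0), X.max(axis=0) - X.min(axis=0))
    mean, value_range = stats
    return (X - mean) / value_range, stats

# hypothetical housing features: size (sq ft), number of bedrooms
X_train = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0]])
X_scaled, stats = mean_normalize(X_train)
X_new, _ = mean_normalize(np.array([[1800.0, 2.0]]), stats)  # reuse training stats
```

After scaling, every column has zero mean and each entry lies in $(-1, 1)$, as required above.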
@@ -173,29 +173,28 @@ $J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^
 Let $f(\theta) = {{y}^{(i)}}\log \left( {h_\theta}\left( {{x}^{(i)}} \right) \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-{h_\theta}\left( {{x}^{(i)}} \right) \right)$
-Substituting $h_\theta(x^{(i)}) = g\left(\theta^{T}x^{(i)} \right)=\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} $ gives
+Recalling that $z=\theta^Tx^{(i)}$, substituting $h_\theta(x^{(i)}) = g(z) =\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} $ gives
 $f(\theta)={{y}^{(i)}}\log \left( \frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)$
-$=-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^T}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^T}{{x}^{(i)}}}} \right)$
+$=-{{y}^{(i)}}\log \left( 1+{{e}^{-z}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{z}} \right)$
 By the properties of partial differentiation, terms without $\theta_j$ differentiate to $0$ and drop out, which gives:
-$\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j$
+$\frac{\partial z}{\partial {\theta_{j}}}=\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j$
 Hence:
-$\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^{T}}{{x}^{(i)}}}} \right)]$
-$=-{{y}^{(i)}}\frac{\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
-$=-{{y}^{(i)}}\frac{-x_{j}^{(i)}{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
-$={{y}^{(i)}}\frac{x_j^{(i)}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
-$={\frac{{{y}^{(i)}}x_j^{(i)}-x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}+{{y}^{(i)}}x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}}$
-$={\frac{{{y}^{(i)}}\left( 1\text{+}{{e}^{{\theta^T}{{x}^{(i)}}}} \right)-{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}x_j^{(i)}}$
-$={({{y}^{(i)}}-\frac{{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}})x_j^{(i)}}$
-$={({{y}^{(i)}}-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}})x_j^{(i)}}$
+$\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-z}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{z}} \right)]$
+$=-{{y}^{(i)}}\frac{\frac{\partial }{\partial {\theta_{j}}}\left(-z \right) e^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{\frac{\partial }{\partial {\theta_{j}}}\left(z \right){e^{z}}}{1+e^{z}}$
+$=-{{y}^{(i)}}\frac{-x^{(i)}_je^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{x^{(i)}_j}{1+e^{-z}}$
+$=\left({{y}^{(i)}}\frac{e^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{1}{1+e^{-z}}\right)x^{(i)}_j$
+$=\left(\frac{{{y}^{(i)}}(e^{-z}+1)-1}{1+e^{-z}}\right)x^{(i)}_j$
+$={({{y}^{(i)}}-\frac{1}{1+{{e}^{-z}}})x_j^{(i)}}$
 $={\left({{y}^{(i)}}-{h_\theta}\left( {{x}^{(i)}} \right)\right)x_j^{(i)}}$
-$={\left({h_\theta}\left( {{x}^{(i)}} \right)-{{y}^{(i)}}\right)x_j^{(i)}}$
+$=-{\left({h_\theta}\left( {{x}^{(i)}} \right)-{{y}^{(i)}}\right)x_j^{(i)}}$
 The derivative of the cost function then follows:
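The closing identity $\frac{\partial }{\partial {\theta_{j}}}f(\theta)=\left(y^{(i)}-h_\theta(x^{(i)})\right)x_j^{(i)}$ can be sanity-checked numerically against finite differences; a minimal sketch (the particular values of `theta`, `x`, `y` are arbitrary illustrations):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(theta, x, y):
    # per-example term f(θ) = y·log h + (1 − y)·log(1 − h)
    h = sigmoid(theta @ x)
    return y * np.log(h) + (1 - y) * np.log(1 - h)

theta = np.array([0.5, -1.2, 0.8])
x = np.array([1.0, 2.0, -0.5])   # x_0 = 1 is the bias feature
y = 1.0

# analytic gradient from the derivation: (y − h_θ(x))·x_j
analytic = (y - sigmoid(theta @ x)) * x

# central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (f(theta + eps * e, x, y) - f(theta - eps * e, x, y)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(analytic, numeric, atol=1e-8)
```

The same check, applied to $J(\theta)$ itself, is the standard gradient-checking step used later when implementing backpropagation.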
@@ -100,7 +100,7 @@ $h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{
 ${h_\theta}\left( x \right)=g\left( {\theta_0}+{\theta_1}{x_1}+{\theta_{2}}{x_{2}}+{\theta_{3}}x_3 \right)$
-Apart from the notation, isn't it exactly the same? A neural network really works like a regression model, except that the inputs become the intermediate units $a_1^{(j)}, a_2^{(j)}, \dots, a_n^{(j)}$. Starting from the input $x$, every activation unit in the next layer incorporates all the information (unit values) of the previous layer; as the optimization algorithm iterates, the activation units extract ever richer information about the input $x$, much like adding polynomial terms to the hypothesis function. These units in the intermediate layers act like upgraded versions of the original features and can therefore give better predictions.
+Apart from the notation, isn't it exactly the same? A neural network really works like a regression model, except that the inputs become the intermediate units $a_1^{(j)}, a_2^{(j)}, \dots, a_n^{(j)}$. Starting from the input $x$, every activation unit in the next layer incorporates all the information (unit values) of the previous layer; as the optimization algorithm iterates, the activation units extract ever richer information about the input $x$, much like adding polynomial terms to the hypothesis function. These units in the hidden layers act like upgraded versions of the original features and can therefore give better predictions.
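The analogy can be made concrete in a short forward-propagation sketch, where the output layer is literally a logistic regression over the hidden activations $a^{(2)}$ instead of over $x$ (the layer sizes and random weights here are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative shapes: 3 inputs -> 3 hidden units -> 1 output
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))  # maps [1; x] to the hidden layer
Theta2 = rng.standard_normal((1, 4))  # maps [1; a2] to the output

x = np.array([0.2, -1.0, 0.5])
a1 = np.concatenate(([1.0], x))                      # add bias unit a_0 = 1
a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))   # hidden activations
h = sigmoid(Theta2 @ a2)  # logistic regression on a2 rather than on x
```

The last line has exactly the form $g(\theta_0 + \theta_1 a_1 + \theta_2 a_2 + \theta_3 a_3)$ from the equation above, with the hidden activations standing in for the raw features.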