
optimize

fix and optimize the derivation of the cost function of week3
master
scruel committed 8 years ago
commit a6b0edaf6d
9 changed files with 29 additions and 30 deletions

  1. BIN       image/20180120_105744.png
  2. BIN       image/20180121_110111.png
  3. +2  −1    index.html
  4. +2  −2    week2.html
  5. +2  −2    week2.md
  6. +10 −11   week3.html
  7. +11 −12   week3.md
  8. +1  −1    week4.html
  9. +1  −1    week4.md

BIN
image/20180120_105744.png
Width: 653  |  Height: 317  |  Size: 74 kB

BIN
image/20180121_110111.png
Width: 625  |  Height: 331  |  Size: 76 kB

+2 −1  index.html

@@ -220,7 +220,8 @@ header, .context-menu, .megamenu-content, footer { font-family: "Segoe UI", Aria
</style>
</head>
<body class='typora-export' >
<div id='write' class = 'is-node show-fences-line-number'><h1><a name='header-n284' class='md-header-anchor '></a>吴恩达(Andrew Ng)机器学习公开课中文笔记</h1><p><a href='https://github.com/scruel/ML-AndrewNg-Notes/'>GitHub 项目首页</a></p><p>&nbsp;</p><p><a href='./week1.html'>week1</a></p><ol start='' ><li>引言(Introduction)</li><li>单变量线性回归(Linear Regression with One Variable)</li></ol><p><a href='./week2.html'>week2</a></p><ol start='3' ><li>线性代数回顾(Linear Algebra Review)</li><li>多变量线性回归(Linear Regression with Multiple Variables)</li><li>Octave/Matlab 指南(Octave/Matlab Tutorial)</li></ol><p><a href='./week3.html'>week3</a></p><ol start='6' ><li>逻辑回归(Logistic Regression)</li><li>正则化(Regularization)</li></ol><p><a href='./week4.html'>week4</a></p><ol start='8' ><li>神经网络:表达(Neural Networks: Representation)</li></ol><p><a href='./week5.html'>week5</a></p><ol start='9' ><li>神经网络:学习(Neural Networks: Learning)</li></ol><p><a href='./week6.html'>week6</a></p><ol start='10' ><li>机器学习应用的建议(Advice for Applying Machine Learning)</li><li>机器学习系统设计(Machine Learning System Design)</li></ol><p><a href='./week7.html'>week7</a></p><ol start='12' ><li>支持向量机(Support Vector Machines)</li></ol><p><a href='./week8.html'>week8</a></p><ol start='13' ><li>无监督学习(Unsupervised Learning)</li><li>降维(Dimensionality Reduction)</li></ol><p><a href='./week9.html'>week9</a></p><ol start='15' ><li>异常检测(Anomaly Detection)</li><li>推荐系统(Recommender Systems)</li></ol><p><a href='./week10.html'>week10</a></p><ol start='17' ><li>大规模机器学习(Large Scale Machine Learning)</li></ol><p><a href='./week11.html'>week11</a></p><ol start='18' ><li>实战:图像光学识别(Application Example: Photo OCR)</li></ol><p>&nbsp;</p><p>&nbsp;</p><p><a href='http://creativecommons.org/licenses/by-nc/4.0/' target='_blank'><img src='https://i.creativecommons.org/l/by-nc/4.0/88x31.png' alt='Creative Commons License' /></a></p><p>This work is licensed under a <a href='http://creativecommons.org/licenses/by-nc/4.0/' target='_blank'>Creative Commons Attribution-NonCommercial 4.0 International License</a>.</p><p>&nbsp;</p><p><a href='https://zhuanlan.zhihu.com/p/32781741'>知乎文章</a></p><p>By: Scruel</p><p>&nbsp;</p><p><div style="display:none">
<div id='write' class = 'is-node show-fences-line-number'><h1><a name='header-n136' class='md-header-anchor '></a>吴恩达(Andrew Ng)机器学习公开课中文笔记</h1><p><a href='https://github.com/scruel/ML-AndrewNg-Notes/'>GitHub 项目首页</a></p><p>&nbsp;</p><p><a href='./week1.html'>week1</a></p><ol start='' ><li>引言(Introduction)</li><li>单变量线性回归(Linear Regression with One Variable)</li></ol><p><a href='./week2.html'>week2</a></p><ol start='3' ><li>线性代数回顾(Linear Algebra Review)</li><li>多变量线性回归(Linear Regression with Multiple Variables)</li><li>Octave/Matlab 指南(Octave/Matlab Tutorial)</li></ol><p><a href='./week3.html'>week3</a></p><ol start='6' ><li>逻辑回归(Logistic Regression)</li><li>正则化(Regularization)</li></ol><p><a href='./week4.html'>week4</a></p><ol start='8' ><li>神经网络:表达(Neural Networks: Representation)</li></ol><p><a href='./week5.html'>week5</a></p><ol start='9' ><li>神经网络:学习(Neural Networks: Learning)</li></ol><p><a href='./week6.html'>week6</a></p><ol start='10' ><li>机器学习应用的建议(Advice for Applying Machine Learning)</li><li>机器学习系统设计(Machine Learning System Design)</li></ol><p><a href='./week7.html'>week7</a></p><ol start='12' ><li>支持向量机(Support Vector Machines)</li></ol><p><a href='./week8.html'>week8</a></p><ol start='13' ><li>无监督学习(Unsupervised Learning)</li><li>降维(Dimensionality Reduction)</li></ol><p><a href='./week9.html'>week9</a></p><ol start='15' ><li>异常检测(Anomaly Detection)</li><li>推荐系统(Recommender Systems)</li></ol><p><a href='./week10.html'>week10</a></p><ol start='17' ><li>大规模机器学习(Large Scale Machine Learning)</li></ol><p><a href='./week11.html'>week11</a></p><ol start='18' ><li>实战:图像光学识别(Application Example: Photo OCR)</li></ol><p>&nbsp;</p><p>&nbsp;</p><p><a href='http://creativecommons.org/licenses/by-nc/4.0/' target='_blank'><img src='https://i.creativecommons.org/l/by-nc/4.0/88x31.png' alt='Creative Commons License' /></a></p><p>This work is licensed under a <a href='http://creativecommons.org/licenses/by-nc/4.0/' target='_blank'>Creative Commons Attribution-NonCommercial 4.0 International License</a>.</p><p>&nbsp;</p><p><a href='https://zhuanlan.zhihu.com/p/32781741'>知乎文章</a></p><p>By: Scruel</p><p>&nbsp;</p><p><div style="display:none">
<script>document.title = document.body.firstElementChild.firstElementChild.innerText</script>
<script src="https://s19.cnzz.com/z_stat.php?id=1272117433&web_id=1272117433" language="JavaScript"></script>
</div></p></div>
</body>

+2 −2  week2.html
File diff suppressed because it is too large


+2 −2  week2.md

@@ -79,9 +79,9 @@ $$

Besides manually choosing a parameter to divide by, as in the figure above, **mean normalization** is a more convenient method and can be used to scale all feature values uniformly:

$x_i=\frac{x_i-average(x)}{maximum(x)-minimum(x)}$, so that $x_i \in (-1,1)$
$x_i:=\frac{x_i-average(x)}{maximum(x)-minimum(x)}$, so that $x_i \in (-1,1)$

The feature range does not strictly have to satisfy $-1 \leqslant x \leqslant 1$; a range like $1 \leqslant x \leqslant 3$ is also acceptable, whereas ranges such as $-100 \leqslant x \leqslant 100$ or $-0.00001 \leqslant x \leqslant 0.00001$ are too large or too small.

Also note that once feature scaling is adopted, it must be applied to all inputs alike: the training set, the test set, prediction inputs, and so on.
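
A minimal NumPy sketch of this rule (not part of the commit; the data values are made up for illustration) — the scaling statistics are computed once on the training set and reused for every other input:

```python
import numpy as np

def fit_scaling(X_train):
    # Compute scaling statistics on the training set only.
    avg = X_train.mean(axis=0)
    spread = X_train.max(axis=0) - X_train.min(axis=0)
    return avg, spread

def mean_normalize(X, avg, spread):
    # x := (x - average(x)) / (maximum(x) - minimum(x))
    return (X - avg) / spread

# Made-up training data: house size and number of bedrooms.
X_train = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0], [1416.0, 2.0]])
avg, spread = fit_scaling(X_train)
X_train_scaled = mean_normalize(X_train, avg, spread)

# The same avg/spread must be reused for test and prediction inputs:
x_new = np.array([[1800.0, 3.0]])
print(mean_normalize(x_new, avg, spread))
```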



+10 −11  week3.html
File diff suppressed because it is too large


+11 −12  week3.md

@@ -173,29 +173,28 @@ $J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^

Let $f(\theta) = {{y}^{(i)}}\log \left( {h_\theta}\left( {{x}^{(i)}} \right) \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-{h_\theta}\left( {{x}^{(i)}} \right) \right)$

Substituting $h_\theta(x^{(i)}) = g\left(\theta^{T}x^{(i)} \right)=\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}}$ gives
Recalling that $z=\theta^Tx^{(i)}$, substituting $h_\theta(x^{(i)}) = g(z) =\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}}$ gives

$f(\theta)={{y}^{(i)}}\log \left( \frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)$
$=-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^T}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^T}{{x}^{(i)}}}} \right)$
$=-{{y}^{(i)}}\log \left( 1+{{e}^{-z}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{z}} \right)$
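
This rewriting of $f(\theta)$ in terms of $z$ can be spot-checked numerically; a minimal NumPy sketch, not part of the commit:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    z = rng.normal(scale=3.0)        # z = theta^T x for one example
    y = float(rng.integers(0, 2))    # label in {0, 1}
    h = 1.0 / (1.0 + np.exp(-z))     # h = g(z)
    f_orig = y * np.log(h) + (1.0 - y) * np.log(1.0 - h)
    f_simp = -y * np.log(1.0 + np.exp(-z)) - (1.0 - y) * np.log(1.0 + np.exp(z))
    assert np.isclose(f_orig, f_simp)
```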

By the properties of the partial derivative, every term that does not contain $\theta_j$ differentiates to $0$ and drops out, so:

$\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j$
$\frac{\partial z}{\partial {\theta_{j}}}=\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j$

Therefore:

$\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^{T}}{{x}^{(i)}}}} \right)]$
$\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-z}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{z}} \right)]$

$=-{{y}^{(i)}}\frac{\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
$=-{{y}^{(i)}}\frac{\frac{\partial }{\partial {\theta_{j}}}\left(-z \right) e^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{\frac{\partial }{\partial {\theta_{j}}}\left(z \right){e^{z}}}{1+e^{z}}$

$=-{{y}^{(i)}}\frac{-x_{j}^{(i)}{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
$={{y}^{(i)}}\frac{x_j^{(i)}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
$={\frac{{{y}^{(i)}}x_j^{(i)}-x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}+{{y}^{(i)}}x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}}$
$={\frac{{{y}^{(i)}}\left( 1\text{+}{{e}^{{\theta^T}{{x}^{(i)}}}} \right)-{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}x_j^{(i)}}$
$={({{y}^{(i)}}-\frac{{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}})x_j^{(i)}}$
$={({{y}^{(i)}}-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}})x_j^{(i)}}$
$=-{{y}^{(i)}}\frac{-x^{(i)}_je^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{x^{(i)}_j}{1+e^{-z}}$
$=\left({{y}^{(i)}}\frac{e^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{1}{1+e^{-z}}\right)x^{(i)}_j$
$=\left(\frac{{{y}^{(i)}}(e^{-z}+1)-1}{1+e^{-z}}\right)x^{(i)}_j$
$={({{y}^{(i)}}-\frac{1}{1+{{e}^{-z}}})x_j^{(i)}}$
$={\left({{y}^{(i)}}-{h_\theta}\left( {{x}^{(i)}} \right)\right)x_j^{(i)}}$
$={\left({h_\theta}\left( {{x}^{(i)}} \right)-{{y}^{(i)}}\right)x_j^{(i)}}$
$=-{\left({h_\theta}\left( {{x}^{(i)}} \right)-{{y}^{(i)}}\right)x_j^{(i)}}$

From this, the derivative of the cost function can be obtained:
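
Since $J(\theta) = -\frac{1}{m}\displaystyle\sum_{i=1}^m f(\theta)$, the result above yields $\frac{\partial}{\partial {\theta_j}} J(\theta) = \frac{1}{m}\displaystyle\sum_{i=1}^m \left({h_\theta}(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$. As a sanity check of this derived gradient against finite differences — a minimal NumPy sketch on made-up random data, not part of the commit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def grad(theta, X, y):
    # Derived analytic gradient: dJ/dtheta_j = (1/m) * sum((h - y) * x_j)
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e_j, X, y) - cost(theta - eps * e_j, X, y)) / (2 * eps)
    for e_j in np.eye(3)
])
print(np.max(np.abs(numeric - grad(theta, X, y))))   # on the order of 1e-10
```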



+1 −1  week4.html
File diff suppressed because it is too large


+1 −1  week4.md

@@ -100,7 +100,7 @@ $h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{

${h_\theta}\left( x \right)=g\left( {\theta_0}+{\theta_1}{x_1}+{\theta_{2}}{x_{2}}+{\theta_{3}}x_3 \right)$

Apart from the notation, isn't it exactly the same? A neural network in fact works much like a regression model, except that the inputs become the intermediate units $a_1^{(j)}, a_2^{(j)}, \dots, a_n^{(j)}$. Starting from the input $x$, every activation unit in the next layer contains all the information (the unit values) of the previous layer; as an optimization algorithm iterates, the activation units can extract more information about the input $x$, much as adding polynomial terms to the hypothesis function does. These intermediate-layer units act like upgraded versions of the original features and can therefore give better predictions.
Apart from the notation, isn't it exactly the same? A neural network in fact works much like a regression model, except that the inputs become the intermediate units $a_1^{(j)}, a_2^{(j)}, \dots, a_n^{(j)}$. Starting from the input $x$, every activation unit in the next layer contains all the information (the unit values) of the previous layer; as an optimization algorithm iterates, the activation units can extract more information about the input $x$, much as adding polynomial terms to the hypothesis function does. These hidden-layer units act like upgraded versions of the original features and can therefore give better predictions.
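
To make "logistic regression on the hidden units" concrete, here is a minimal NumPy forward pass for a hypothetical 3-4-1 network (weights and input are made up for illustration); the output unit computes exactly $g(\theta^Ta)$ over the hidden activations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(4, 4))   # layer 1 -> 2: maps [1; x] (bias + 3 inputs) to 4 hidden units
Theta2 = rng.normal(size=(1, 5))   # layer 2 -> 3: maps [1; a] (bias + 4 hidden units) to the output

x = np.array([0.5, -1.2, 3.0])
a = sigmoid(Theta1 @ np.concatenate(([1.0], x)))   # hidden activations a^(2)
h = sigmoid(Theta2 @ np.concatenate(([1.0], a)))   # logistic regression over a^(2)
print(h)
```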




