Browse Source

optimize

master
scruel 7 years ago
parent
commit
ea3b38e99a
4 changed files with 21 additions and 13 deletions
  1. +11
    -11
      week3.html
  2. +3
    -1
      week3.md
  3. +1
    -1
      week4.html
  4. +6
    -0
      week4.md

+ 11
- 11
week3.html
File diff suppressed because it is too large
View File


+ 3
- 1
week3.md View File

@@ -178,7 +178,7 @@ $J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^
$f(\theta)={{y}^{(i)}}\log \left( \frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)$ $f(\theta)={{y}^{(i)}}\log \left( \frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)$
$=-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^T}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^T}{{x}^{(i)}}}} \right)$ $=-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^T}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^T}{{x}^{(i)}}}} \right)$


根据求偏导的性质,没有 $\theta_j$ 的项都消去,则得:
根据求偏导的性质,没有 $\theta_j$ 的项求偏导即为 $0$,都消去,则得:


$\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j$ $\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j$


@@ -186,6 +186,8 @@ $\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_


$\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^{T}}{{x}^{(i)}}}} \right)]$ $\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^{T}}{{x}^{(i)}}}} \right)]$


$=-{{y}^{(i)}}\frac{\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$

$=-{{y}^{(i)}}\frac{-x_{j}^{(i)}{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$ $=-{{y}^{(i)}}\frac{-x_{j}^{(i)}{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
$={{y}^{(i)}}\frac{x_j^{(i)}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$ $={{y}^{(i)}}\frac{x_j^{(i)}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
$={\frac{{{y}^{(i)}}x_j^{(i)}-x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}+{{y}^{(i)}}x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}}$ $={\frac{{{y}^{(i)}}x_j^{(i)}-x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}+{{y}^{(i)}}x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}}$


+ 1
- 1
week4.html
File diff suppressed because it is too large
View File


+ 6
- 0
week4.md View File

@@ -176,6 +176,12 @@ $\Theta^{(1)} =\begin{bmatrix}-30 & 20 & 20 \newline 10 & -20 & -20\end{bmatrix}


可见,特征值能不断升级,并抽取出更多信息,直到计算出结果。而如此不断组合,我们就可以逐渐构造出越来越复杂、强大的神经网络,比如用于手写识别的神经网络。 可见,特征值能不断升级,并抽取出更多信息,直到计算出结果。而如此不断组合,我们就可以逐渐构造出越来越复杂、强大的神经网络,比如用于手写识别的神经网络。


> 任何布尔函数都可由两层神经网络准确表达,但所需的中间单元的数量随输入呈指数级增长;
>
> 任何连续函数都可由两层神经网络以任意精度逼近;
>
> 任何函数都可由三层神经网络以任意程度逼近。

## 8.7 多类别分类(Multiclass Classification) ## 8.7 多类别分类(Multiclass Classification)


之前讨论的都是预测结果为单值情况下的神经网络,要实现多类别分类,其实只要修改一下输出层,让输出层包含多个输出单元即可。 之前讨论的都是预测结果为单值情况下的神经网络,要实现多类别分类,其实只要修改一下输出层,让输出层包含多个输出单元即可。


Loading…
Cancel
Save