> For $j = 0$ and $j = 1$, **the derivation of the cost-function derivative for linear regression:**
>
> $\frac{\partial}{\partial\theta_j} J(\theta_0, \theta_1)=\frac{\partial}{\partial\theta_j} \left(\frac{1}{2m}\sum\limits_{i=1}^{m}{{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}^{2}} \right)=$
>
> $\frac{1}{2m}\sum\limits_{i=1}^{m}{2\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}*\frac{\partial}{\partial\theta_j}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)} =$
>
> $\frac{1}{m}\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}*\frac{\partial}{\partial\theta_j}{\left(\theta_0{x_0^{(i)}} + \theta_1{x_1^{(i)}}-{{y}^{(i)}} \right)}$
>
> So for $j = 0$:
>
> $\frac{\partial}{\partial\theta_0} J(\theta)=\frac{1}{m}\sum\limits_{i=1}^{m}{{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}} *x_0^{(i)}$
>
> And for $j = 1$:
>
> $\frac{\partial}{\partial\theta_1} J(\theta)=\frac{1}{m}\sum\limits_{i=1}^{m}{{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}} *x_1^{(i)}$
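The derived formulas are easy to sanity-check numerically. Below is a minimal sketch (the toy data and parameter values are made-up illustrations, not from the notes) that compares the analytic partial derivatives with a finite-difference approximation of $J$:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])          # feature values x_1^(i) (illustrative)
y  = np.array([2.1, 3.9, 6.2, 8.1])          # targets y^(i) (illustrative)
theta0, theta1 = 0.5, 1.0
m = len(y)

def J(t0, t1):
    # J(theta) = (1/2m) * sum (h_theta(x) - y)^2
    return np.sum((t0 + t1 * x1 - y) ** 2) / (2 * m)

h = theta0 + theta1 * x1                      # h_theta(x^(i)) for all i
grad0 = (1.0 / m) * np.sum(h - y)             # derived dJ/dtheta_0  (x_0^(i) = 1)
grad1 = (1.0 / m) * np.sum((h - y) * x1)      # derived dJ/dtheta_1

eps = 1e-6                                    # finite-difference check
num0 = (J(theta0 + eps, theta1) - J(theta0 - eps, theta1)) / (2 * eps)
num1 = (J(theta0, theta1 + eps) - J(theta0, theta1 - eps)) / (2 * eps)
print(grad0, num0)                            # the pairs should agree closely
print(grad1, num1)
```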
The gradient descent described so far is **batch gradient descent** (Batch Gradient Descent): every single update uses **all** of the training examples $\left(\sum\limits_{i=1}^{m}\right)$.
Because the linear-regression cost function is **bowl-shaped** and has **only one** global optimum, gradient descent is **guaranteed** to converge to the global minimum (provided the learning rate is not too large). $J$ is a **convex quadratic function**, so minimizing it for linear regression is a **convex optimization problem**.
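As a concrete illustration, here is a minimal batch gradient descent sketch for the univariate case using the derivatives obtained above (the toy data, learning rate, and iteration count are assumptions for illustration only). Note that every update sums over all $m$ examples:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])           # illustrative data
y  = np.array([2.1, 3.9, 6.2, 8.1])
m = len(y)

theta0, theta1 = 0.0, 0.0
alpha = 0.05                                   # learning rate; must not be too large
for _ in range(2000):
    h = theta0 + theta1 * x1                   # predictions for ALL m examples
    grad0 = (1.0 / m) * np.sum(h - y)          # the whole training set is used
    grad1 = (1.0 / m) * np.sum((h - y) * x1)   # in every update ("batch")
    # simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)                          # approaches the unique global minimum of the convex J
```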
[^1]: As a rule of thumb, once $n$ exceeds 10000, the number of features is considered large for the normal equation.
[^2]: Gradient descent applies very generally, whereas for the specific case of a linear regression model the normal equation is an excellent alternative.
> **Derivation of the normal equation**:
>
> $\begin{aligned} & J\left( \theta \right)=\frac{1}{2m}\sum\limits_{i=1}^{m}{{{\left( {h_{\theta}}\left( {x^{(i)}} \right)-{y^{(i)}} \right)}^{2}}}\newline \; & =\frac{1}{2m}||X\theta-y||^2 \newline \; & =\frac{1}{2m}(X\theta-y)^T(X\theta-y) &\newline \end{aligned}$
>
> Expanding gives
>
> $J(\theta )= \frac{1}{2m}\left( {{\theta }^{T}}{{X}^{T}}X\theta -{{\theta}^{T}}{{X}^{T}}y-{{y}^{T}}X\theta + {{y}^{T}}y \right)$
>
> Note that ${{\theta}^{T}}{{X}^{T}}y$ and ${{y}^{T}}X\theta$ are both scalars and are in fact equal, so
>
> $J(\theta) = \frac{1}{2m}\left[\theta^TX^TX\theta-2\theta^TX^Ty+y^Ty\right]$
>
> Next, differentiate $J(\theta)$ with respect to $\theta$, using the matrix calculus rules (written here for a vector variable $x$, to avoid confusion with the design matrix $X$):
>
> $\frac{\partial\, x^TAx}{\partial x}=(A+A^\mathrm{T})x$
>
> $\frac{\partial\, x^TA}{\partial x}={A}$
>
> Therefore:
>
> $\frac{\partial J\left( \theta \right)}{\partial \theta }=\frac{1}{2m}\left(2{{X}^{T}}X\theta -2{{X}^{T}}y \right)=\frac{1}{m}\left({{X}^{T}}X\theta -{{X}^{T}}y\right)$
>
> Setting $\frac{\partial J\left( \theta \right)}{\partial \theta }=0$ yields
> $$
> \theta ={{\left( {X^{T}}X \right)}^{-1}}{X^{T}}y
> $$
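A minimal sketch of the normal equation in code (the small dataset is a made-up illustration). In practice, solving the linear system or using the pseudo-inverse is preferred to forming $(X^TX)^{-1}$ explicitly:

```python
import numpy as np

# Design matrix X with the bias column x_0 = 1 prepended (illustrative values).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

# theta = (X^T X)^{-1} X^T y, computed by solving the linear system.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent result via the pseudo-inverse, which still works when
# X^T X is singular (see the noninvertibility discussion in section 4.7).
theta_pinv = np.linalg.pinv(X) @ y
print(theta, theta_pinv)
```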
## 4.7 Normal Equation Noninvertibility
> **Derivation of the cost-function derivative for logistic regression:**
>
> $J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$
>
> Let $f(\theta) = {{y}^{(i)}}\log \left( {h_\theta}\left( {{x}^{(i)}} \right) \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-{h_\theta}\left( {{x}^{(i)}} \right) \right)$
>
> Substituting $h_\theta(x^{(i)}) = g\left(\theta^{T}x^{(i)} \right)=\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} $ gives
>
> $f(\theta)={{y}^{(i)}}\log \left( \frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)$
> $=-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^T}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^T}{{x}^{(i)}}}} \right)$
>
> Since $\theta^Tx^{(i)}=\sum_k \theta_k x_k^{(i)}$ and every term not containing $\theta_j$ has zero partial derivative with respect to $\theta_j$, we have:
>
> $\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j$
>
> Therefore:
>
> $\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^{T}}{{x}^{(i)}}}} \right)]$
>
> $=-{{y}^{(i)}}\frac{-\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
>
> $=-{{y}^{(i)}}\frac{-x_{j}^{(i)}{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
> $={{y}^{(i)}}\frac{x_j^{(i)}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
> $={\frac{{{y}^{(i)}}x_j^{(i)}-x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}+{{y}^{(i)}}x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}}$
> $={\frac{{{y}^{(i)}}\left( 1+{{e}^{{\theta^T}{{x}^{(i)}}}} \right)-{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}x_j^{(i)}}$
> $={({{y}^{(i)}}-\frac{{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}})x_j^{(i)}}$
> $={({{y}^{(i)}}-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}})x_j^{(i)}}$
> $={\left({{y}^{(i)}}-{h_\theta}\left( {{x}^{(i)}} \right)\right)x_j^{(i)}}$
> $=-{\left({h_\theta}\left( {{x}^{(i)}} \right)-{{y}^{(i)}}\right)x_j^{(i)}}$
>
> Hence the derivative of the cost function:
>
> $\frac{\partial }{\partial {\theta_{j}}}J(\theta) = -\frac{1}{m}\sum\limits_{i=1}^{m}{\frac{\partial }{\partial {\theta_{j}}}f(\theta)}=\frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} $
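Because the resulting gradient has the same form as in linear regression, it is straightforward to implement in vectorized form. A minimal sketch (the toy data and the all-zeros $\theta$ are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# X includes the bias column x_0 = 1; y contains 0/1 labels (illustrative values).
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5],
              [1.0, 3.5]])
y = np.array([0, 0, 1, 1])
theta = np.zeros(X.shape[1])
m = len(y)

h = sigmoid(X @ theta)                       # h_theta(x^(i)) for all i

# J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
J = -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# dJ/dtheta_j = (1/m) * sum (h - y) * x_j, the same form as for linear regression
grad = (1.0 / m) * (X.T @ (h - y))
print(J, grad)
```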
## 6.6 Advanced Optimization
The following list gives the neural-network names of some concepts introduced earlier:
- $x_0$: bias unit, with $x_0 = 1$.
- $\Theta$: weights (weight), i.e., the parameters.
- Activation function: $g$, e.g., the logistic (sigmoid) function.
- Input layer: corresponds to the features $x$ of the training set.
- Output layer: corresponds to the targets $y$ of the training set.
Define $a^{(1)}=x=\left[ \begin{matrix}x_0\\ x_1 \\ x_2 \\ x_3 \end{matrix} \right]$, $\Theta^{(1)}=\left[\begin{matrix}\Theta^{(1)}_{10}& \Theta^{(1)}_{11}& \Theta^{(1)}_{12}& \Theta^{(1)}_{13}\\ \Theta^{(1)}_{20}& \Theta^{(1)}_{21}& \Theta^{(1)}_{22}& \Theta^{(1)}_{23}\\ \Theta^{(1)}_{30}& \Theta^{(1)}_{31}& \Theta^{(1)}_{32} & \Theta^{(1)}_{33}\end{matrix}\right]$,
$\begin{align*}a_1^{(2)} = g(z_1^{(2)}) \newline a_2^{(2)} = g(z_2^{(2)}) \newline a_3^{(2)} = g(z_3^{(2)}) \newline \end{align*}$, and $z^{(2)}=\left[ \begin{matrix}z_1^{(2)}\\ z_2^{(2)} \\ z_3^{(2)}\end{matrix} \right]$.
Then $a^{(2)}= g(\Theta^{(1)}a^{(1)})=g(z^{(2)})$.
$z^{(j)} = \Theta^{(j-1)}a^{(j-1)}$, $a^{(j)} = g(z^{(j)})$; with these two equations the values of every layer of the network can be computed.
The final output is $h_\Theta(x) = a^{(j)} = g(\Theta^{(j-1)}a^{(j-1)}) = g(z^{(j)})$, where $j$ indexes the output layer.
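The recursion $z^{(j)} = \Theta^{(j-1)}a^{(j-1)}$, $a^{(j)} = g(z^{(j)})$ maps directly to code. Below is a minimal single-example sketch for a network with one 3-unit hidden layer (the weight matrices are random placeholders, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, 0.4])                 # features x_1..x_3 of one example (illustrative)

Theta1 = rng.standard_normal((3, 4))          # layer 1 (3 units + bias) -> layer 2 (3 units)
Theta2 = rng.standard_normal((1, 4))          # layer 2 (3 units + bias) -> output layer (1 unit)

a1 = np.concatenate(([1.0], x))               # a^(1) = x with bias unit a_0 = 1
z2 = Theta1 @ a1                              # z^(2) = Theta^(1) a^(1)
a2 = np.concatenate(([1.0], sigmoid(z2)))     # a^(2) = g(z^(2)), plus bias unit
z3 = Theta2 @ a2                              # z^(3) = Theta^(2) a^(2)
h  = sigmoid(z3)                              # h_Theta(x) = a^(3) = g(z^(3))
print(h)
```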
Extending this to all training examples:
${{z}^{\left( 2 \right)}}={{\Theta }^{\left( 1 \right)}} {{X}^{T}}$, in which case $z^{(2)}$ is an $s_2 \times m$ matrix.
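A minimal sketch of this vectorized step (shapes and values are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, n = 5, 3                                    # m examples, n features (illustrative)
rng = np.random.default_rng(0)
X = np.hstack([np.ones((m, 1)),                # prepend the bias column x_0 = 1
               rng.standard_normal((m, n))])   # X is m x (n+1)
Theta1 = rng.standard_normal((3, n + 1))       # s_2 = 3 hidden units

Z2 = Theta1 @ X.T                              # z^(2): one column per example
A2 = sigmoid(Z2)                               # a^(2) for all examples at once
print(Z2.shape)                                # (3, 5), i.e. s_2 x m
```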