> For $j = 0$ and $j = 1$, **the derivative of the linear regression cost function is derived as follows:**
>
> $\frac{\partial}{\partial\theta_j} J(\theta_0, \theta_1)=\frac{\partial}{\partial\theta_j} \left(\frac{1}{2m}\sum\limits_{i=1}^{m}{{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}^{2}} \right)=$
>
> $\frac{1}{2m}*2\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}*\frac{\partial}{\partial\theta_j}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)} =$
>
> $\frac{1}{m}\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)}*\frac{\partial}{\partial\theta_j}{\left(\theta_0{x_0^{(i)}} + \theta_1{x_1^{(i)}}-{{y}^{(i)}} \right)}$
>
> So for $j = 0$:
>
> $\frac{\partial}{\partial\theta_0} J(\theta)=\frac{1}{m}\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)} *x_0^{(i)}$
>
> and for $j = 1$:
>
> $\frac{\partial}{\partial\theta_1} J(\theta)=\frac{1}{m}\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}\left( {{x}^{(i)}} \right)-{{y}^{(i)}} \right)} *x_1^{(i)}$

All of the gradient descent updates discussed above are batch gradient descent (Batch Gradient Descent): each update is computed over **all** of the training examples $\left(\sum\limits_{i=1}^{m}\right)$.

Because the cost function of linear regression is **bowl-shaped** with **only one** global optimum, gradient descent is **guaranteed** to converge to the global minimum (provided the learning rate is not too large). The function $J$ is a **convex quadratic function**, and minimizing it is a **convex optimization problem**.

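As a minimal sketch, the batch update can be written in a few lines of NumPy; the variable names (`X`, `y`, `alpha`) and the toy data are illustrative assumptions, not part of the original notes:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n) design matrix whose first column is all ones (x_0 = 1).
    y: (m,) target vector. alpha: learning rate.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # Each update sums over all m examples: grad = (1/m) X^T (X theta - y)
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta

# Toy usage: recover y = 1 + 2x.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 1 + 2 * np.arange(5.0)
print(batch_gradient_descent(X, y, alpha=0.1, num_iters=2000))  # ≈ [1. 2.]
```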
[^1]: Generally speaking, once $n$ exceeds 10000, the number of features is considered large for the normal equation.
[^2]: Gradient descent applies broadly, while for the specific case of the linear regression model the normal equation is a good alternative.
> **Derivation of the normal equation:**
>
> $\begin{aligned} & J\left( \theta \right)=\frac{1}{2m}\sum\limits_{i=1}^{m}{{{\left( {h_{\theta}}\left( {x^{(i)}} \right)-{y^{(i)}} \right)}^{2}}}\newline \; & =\frac{1}{2m}||X\theta-y||^2 \newline \; & =\frac{1}{2m}(X\theta-y)^T(X\theta-y) \newline \end{aligned}$
>
> Expanding gives
>
> $J(\theta )= \frac{1}{2m}\left( {{\theta }^{T}}{{X}^{T}}X\theta -{{\theta}^{T}}{{X}^{T}}y-{{y}^{T}}X\theta + {{y}^{T}}y \right)$
>
> Note that ${{\theta}^{T}}{{X}^{T}}y$ and ${{y}^{T}}X\theta$ are both scalars and, being transposes of each other, are equal, so
>
> $J(\theta) = \frac{1}{2m}\left(\theta^TX^TX\theta-2\theta^TX^Ty+y^Ty\right)$
>
> Next, take the partial derivative of $J(\theta)$ with respect to $\theta$, using the matrix differentiation rules
>
> $\frac{dX^TAX}{dX}=(A+A^\mathrm{T})X$
>
> $\frac{dX^TA}{dX}={A}$
>
> which yield:
>
> $\frac{\partial J\left( \theta \right)}{\partial \theta }=\frac{1}{2m}\left(2{{X}^{T}}X\theta -2{{X}^{T}}y \right)=\frac{1}{m}\left({{X}^{T}}X\theta -{{X}^{T}}y\right)$
>
> Setting $\frac{\partial J\left( \theta \right)}{\partial \theta }=0$ gives
> $$
> \theta ={{\left( {X^{T}}X \right)}^{-1}}{X^{T}}y
> $$

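The closed-form solution is a one-liner in NumPy. As a sketch, `np.linalg.pinv` (the Moore-Penrose pseudoinverse) is used here instead of a plain inverse so the computation still behaves when $X^TX$ is non-invertible, the situation discussed in the next section:

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y in closed form."""
    # pinv still returns a sensible answer when X^T X is singular,
    # e.g. with redundant features or m <= n.
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Same toy data as the gradient descent example: recovers [1, 2] exactly,
# with no learning rate and no iteration.
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = 1 + 2 * np.arange(5.0)
print(normal_equation(X, y))
```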
## 4.7 Normal Equation Noninvertibility
> **Derivation of the derivative of the logistic regression cost function:**
>
> $J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$
>
> Let $f(\theta) = {{y}^{(i)}}\log \left( {h_\theta}\left( {{x}^{(i)}} \right) \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-{h_\theta}\left( {{x}^{(i)}} \right) \right)$
>
> Substituting $h_\theta(x^{(i)}) = g\left(\theta^{T}x^{(i)} \right)=\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}}$ gives
>
> $f(\theta)={{y}^{(i)}}\log \left( \frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}} \right)$
> $=-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^T}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^T}{{x}^{(i)}}}} \right)$
>
> Only the term $\theta_j x_j^{(i)}$ of $\theta^Tx^{(i)}$ depends on $\theta_j$ (every other term differentiates to $0$), so:
>
> $\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j$
>
> Therefore:
>
> $\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{{\theta^{T}}{{x}^{(i)}}}} \right)]$
>
> $=-{{y}^{(i)}}\frac{\frac{\partial }{\partial {\theta_{j}}}\left( -\theta^Tx^{(i)} \right)\cdot{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)\cdot{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
>
> $=-{{y}^{(i)}}\frac{-x_{j}^{(i)}{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}{1+{{e}^{-{\theta^{T}}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
> $={{y}^{(i)}}\frac{x_j^{(i)}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}-\left( 1-{{y}^{(i)}} \right)\frac{x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}$
> $={\frac{{{y}^{(i)}}x_j^{(i)}-x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}+{{y}^{(i)}}x_j^{(i)}{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}}$
> $={\frac{{{y}^{(i)}}\left( 1+{{e}^{{\theta^T}{{x}^{(i)}}}} \right)-{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}}x_j^{(i)}}$
> $={({{y}^{(i)}}-\frac{{{e}^{{\theta^T}{{x}^{(i)}}}}}{1+{{e}^{{\theta^T}{{x}^{(i)}}}}})x_j^{(i)}}$
> $={({{y}^{(i)}}-\frac{1}{1+{{e}^{-{\theta^T}{{x}^{(i)}}}}})x_j^{(i)}}$
> $={\left({{y}^{(i)}}-{h_\theta}\left( {{x}^{(i)}} \right)\right)x_j^{(i)}}$
> $=-{\left({h_\theta}\left( {{x}^{(i)}} \right)-{{y}^{(i)}}\right)x_j^{(i)}}$
>
> The derivative of the cost function follows:
>
> $\frac{\partial }{\partial {\theta_{j}}}J(\theta) = -\frac{1}{m}\sum\limits_{i=1}^{m}{\frac{\partial }{\partial {\theta_{j}}}f(\theta)}=\frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)}$

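The result has exactly the same vectorized form as the linear regression gradient, $\frac{1}{m}X^T\left(g(X\theta)-y\right)$. As a sketch on made-up data, a finite-difference check confirms the derivation numerically:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient(theta, X, y):
    """(1/m) * sum_i (h(x_i) - y_i) * x_i, vectorized."""
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Compare the analytic gradient with central finite differences.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(4), rng.normal(size=(4, 2))])
y = np.array([0.0, 1.0, 1.0, 0.0])
theta = rng.normal(size=3)
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(numeric, logistic_gradient(theta, X, y)))  # True
```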
## 6.6 Advanced Optimization
Some existing concepts go by other names in neural networks:

- $x_0$: the bias unit, with $x_0 = 1$.
- $\Theta$: the weights (weight), i.e. the parameters.
- Activation function: $g$, e.g. the logistic function.
- Input layer: corresponds to the features $x$ in the training set.
- Output layer: corresponds to the targets $y$ in the training set.
Define $a^{(1)}=x=\left[ \begin{matrix}x_0\\ x_1 \\ x_2 \\ x_3 \end{matrix} \right]$, $\Theta^{(1)}=\left[\begin{matrix}\Theta^{(1)}_{10}& \Theta^{(1)}_{11}& \Theta^{(1)}_{12}& \Theta^{(1)}_{13}\\ \Theta^{(1)}_{20}& \Theta^{(1)}_{21}& \Theta^{(1)}_{22}& \Theta^{(1)}_{23}\\ \Theta^{(1)}_{30}& \Theta^{(1)}_{31}& \Theta^{(1)}_{32} & \Theta^{(1)}_{33}\end{matrix}\right]$,
$\begin{align*}a_1^{(2)} = g(z_1^{(2)}) \newline a_2^{(2)} = g(z_2^{(2)}) \newline a_3^{(2)} = g(z_3^{(2)}) \newline \end{align*}$, $z^{(2)}=\left[ \begin{matrix}z_1^{(2)}\\ z_2^{(2)} \\ z_3^{(2)}\end{matrix} \right]$
Then $a^{(2)}= g(\Theta^{(1)}a^{(1)})=g(z^{(2)})$
$z^{(j)} = \Theta^{(j-1)}a^{(j-1)}$ and $a^{(j)} = g(z^{(j)})$; this recurrence computes the activations of every layer in the network.
For the output layer, the result is $h_\Theta(x) = a^{(j)} = g(\Theta^{(j-1)}a^{(j-1)}) = g(z^{(j)})$.
Extending to all training examples:
${{z}^{\left( 2 \right)}}={{\Theta }^{\left( 1 \right)}} {{X}^{T}}$, where $z^{(2)}$ is now an $s_2 \times m$ matrix.

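A minimal NumPy sketch of this forward-propagation recurrence for a single example; the layer sizes and random weights below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """h(x) via z^(j) = Theta^(j-1) a^(j-1), a^(j) = g(z^(j)).

    x: feature vector without the bias entry.
    thetas: weight matrices, Theta^(j) of shape (s_{j+1}, s_j + 1).
    """
    a = x
    for theta in thetas:
        a = np.insert(a, 0, 1.0)  # prepend the bias unit a_0 = 1
        a = sigmoid(theta @ a)    # activate the next layer
    return a

# Illustrative network: 3 inputs -> 3 hidden units -> 1 output,
# matching the 3x4 Theta^(1) written out above.
rng = np.random.default_rng(0)
thetas = [rng.normal(size=(3, 4)), rng.normal(size=(1, 4))]
print(forward_propagate(np.array([1.0, 0.5, -1.0]), thetas))
```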