|
|
|
|
|
|
|
|
|
|
In the logistic regression model, $h_\theta \left( x \right)$ gives the estimated probability that the output is $y=1$, given the input $x$ and the parameters $\theta$. In probabilistic notation:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
& h_\theta(x) = P(y=1 | x ; \theta) = 1 - P(y=0 | x ; \theta) \\ |
|
|
|
& P(y = 0 | x;\theta) + P(y = 1 | x ; \theta) = 1 |
|
|
|
\end{align*} |
|
|
|
$$ |
|
|
|
Taking tumor diagnosis as an example, $h_\theta \left( x \right)=0.7$ means the patient has a $70\%$ probability of having a malignant tumor.
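
A minimal sketch in Python/NumPy of evaluating the hypothesis as a probability (the parameter values and feature vector here are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x), read as P(y = 1 | x; theta)."""
    return sigmoid(np.dot(theta, x))

# Hypothetical parameters and feature vector (x_0 = 1 is the bias term).
theta = np.array([-1.0, 0.1])
x = np.array([1.0, 20.0])          # e.g. x_1 could be a tumor size

p_malignant = h(theta, x)          # P(y = 1 | x; theta)
print(p_malignant, 1.0 - p_malignant)   # the two probabilities sum to 1
```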
|
|
|
|
|
|
|
[1]: https://en.wikipedia.org/wiki/Logistic_function |
|
|
|
|
|
|
|
|
|
|
To produce a classification, we use a threshold of $0.5$, as before:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
& h_\theta(x) \geq 0.5 \rightarrow y = 1 \\ |
|
|
|
& h_\theta(x) < 0.5 \rightarrow y = 0 \\ |
|
|
|
\end{align*} |
|
|
|
$$ |
|
|
|
Recall the graph of the sigmoid function:
|
|
|
|
|
|
|
 |
|
|
|
|
|
|
|
From the graph, $g(z) \geq 0.5$ exactly when $z \geq 0$, i.e. when $\theta^Tx \geq 0$.
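
A small Python sketch (with a hypothetical $\theta$ and $x$) showing that thresholding $h_\theta(x)$ at $0.5$ is equivalent to checking the sign of $\theta^Tx$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    """Classify with the 0.5 threshold: y = 1 iff h_theta(x) >= 0.5."""
    return 1 if sigmoid(np.dot(theta, x)) >= 0.5 else 0

def predict_by_sign(theta, x):
    """Equivalent rule: y = 1 iff theta^T x >= 0 (no sigmoid needed)."""
    return 1 if np.dot(theta, x) >= 0 else 0

theta = np.array([-3.0, 1.0, 1.0])   # hypothetical parameters
x = np.array([1.0, 2.0, 2.0])        # x_0 = 1, x_1 = 2, x_2 = 2
assert predict(theta, x) == predict_by_sign(theta, x)
```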
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Unlike the linear regression model, the output is bounded:
|
|
|
$$
\begin{align*}
z \to +\infty,\ e^{-z} \to 0 \Rightarrow g(z) \to 1 \\
z \to -\infty,\ e^{-z} \to +\infty \Rightarrow g(z) \to 0
\end{align*}
$$
|
|
|
For a more intuitive example, ${h_\theta}\left( x \right)=g\left( {\theta_0}+{\theta_1}{x_1}+{\theta_{2}}{x_{2}}\right)$ is the hypothesis function for the model shown below:
|
|
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
|
|
To fit the data shown below, we build a polynomial hypothesis function:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
{h_\theta}\left( x \right)=g\left( {\theta_0}+{\theta_1}{x_1}+{\theta_{2}}{x_{2}}+{\theta_{3}}x_{1}^{2}+{\theta_{4}}x_{2}^{2} \right) |
|
|
|
$$ |
|
|
|
Taking $\theta = \begin{bmatrix} -1\\0\\0\\1\\1\end{bmatrix}$, the decision boundary corresponds to a unit circle centered at the origin (${x_1}^2+{x_2}^2 = 1$), which yields the classification shown by the magenta curve in the figure.
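
As a quick numerical check (a minimal Python sketch using the $\theta$ above; the data set itself is not reproduced here), points inside the unit circle get probabilities below $0.5$ and points outside get probabilities above it:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# theta chosen as in the text: decision boundary x_1^2 + x_2^2 = 1.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

def h(x1, x2):
    """h_theta(x) = g(theta_0 + theta_1*x1 + theta_2*x2 + theta_3*x1^2 + theta_4*x2^2)."""
    features = np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])
    return sigmoid(theta @ features)

print(h(0.0, 0.0))   # inside the unit circle  -> below 0.5, predict y = 0
print(h(2.0, 0.0))   # outside the unit circle -> above 0.5, predict y = 1
```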
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
For logistic regression, the squared-error loss is replaced with the **log loss**, and the cost function $J(\theta)$ can be derived via maximum likelihood estimation from statistics:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
& J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) \\ |
|
|
|
& \mathrm{Cost}(h_\theta(x),y) = -\log(h_\theta(x)) \; & \text{if y = 1} \\ |
|
|
|
& \mathrm{Cost}(h_\theta(x),y) = -\log(1-h_\theta(x)) \; & \text{if y = 0} |
|
|
|
\end{align*} |
|
|
|
$$ |
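
In vectorized form this cost can be computed directly; the following Python/NumPy sketch uses a tiny made-up data set:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Log-loss cost J(theta) = (1/m) * sum(-y*log(h) - (1-y)*log(1-h))."""
    m = len(y)
    h = sigmoid(X @ theta)                      # vector of h_theta(x^(i))
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m

# Tiny made-up data set: X already includes the bias column x_0 = 1.
X = np.array([[1.0, 0.5],
              [1.0, 2.3],
              [1.0, 2.9]])
y = np.array([0.0, 1.0, 1.0])
theta = np.array([-2.0, 1.0])
print(cost(theta, X, y))
```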
|
|
|
The corresponding plots of the cost are shown below:
|
|
|
|
|
|
|
 |
|
|
|
|
|
|
|
|
|
|
To optimize $\theta$, we again use gradient descent; the algorithm has the same form as in linear regression:
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
& \text{Repeat until convergence:} \; \lbrace \\ |
|
|
|
&{{\theta }_{j}}:={{\theta }_{j}}-\alpha \frac{\partial }{\partial {{\theta }_{j}}}J\left( {\theta} \right) \\ |
|
|
|
\rbrace |
|
|
|
\end{align*} |
|
|
|
$$ |
|
|
|
|
|
|
|
|
|
|
|
Working out the partial derivative gives:
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
& \text{Repeat until convergence:} \; \lbrace \\ |
|
|
|
& \theta_j := \theta_j - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} \; & \text{for j := 0,1...n}\\ |
|
|
|
\rbrace |
|
|
|
\end{align*} |
|
|
|
$$ |
|
|
|
|
|
|
|
|
|
|
|
Note that although the gradient descent algorithm has the same form as in linear regression, the hypothesis function is different, namely $h_\theta(x) = g\left(\theta^{T}x \right)$; nevertheless, the derivative turns out to have the same form.
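
A minimal sketch of this update in Python/NumPy, vectorized over all $j$ (the data set and hyperparameters are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for logistic regression.

    Update: theta_j := theta_j - alpha * (1/m) * sum((h - y) * x_j),
    the same form as for linear regression, but with h = g(X @ theta).
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = sigmoid(X @ theta)          # hypothesis for every example
        grad = X.T @ (h - y) / m        # partial derivatives for all j at once
        theta -= alpha * grad           # simultaneous update
    return theta

# Tiny made-up data set (bias column x_0 = 1 already included).
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 2.5], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print(theta, sigmoid(X @ theta))        # learned parameters and predicted probabilities
```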
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
**Derivation of the derivative of the logistic regression cost function:**
|
|
|
|
|
|
|
$$ |
|
|
|
J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))] |
|
|
|
$$ |
|
|
|
Let $f(\theta) = {{y}^{(i)}}\log \left( {h_\theta}\left( {{x}^{(i)}} \right) \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-{h_\theta}\left( {{x}^{(i)}} \right) \right)$.
|
|
|
|
|
|
|
Recalling that $h_\theta(x) = g(z)$ with $g(z) = \frac{1}{1+e^{-z}}$, we have:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
f(\theta) &= {{y}^{(i)}}\log \left( \frac{1}{1+{{e}^{-z}}} \right)+\left( 1-{{y}^{(i)}} \right)\log \left( 1-\frac{1}{1+{{e}^{-z}}} \right) \\ |
|
|
|
&= -{{y}^{(i)}}\log \left( 1+{{e}^{-z}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{z}} \right) |
|
|
|
\end{align*} |
|
|
|
$$ |
|
|
|
|
|
|
|
|
|
|
Recalling that $z=\theta^Tx^{(i)}$, when taking the partial derivative with respect to $\theta_j$, every term that does not contain $\theta_j$ has derivative $0$ and drops out, giving:
|
|
|
|
|
|
|
$$ |
|
|
|
\frac{\partial z}{\partial {\theta_{j}}}=\frac{\partial }{\partial {\theta_{j}}}\left( \theta^Tx^{(i)} \right)=x^{(i)}_j |
|
|
|
$$ |
|
|
|
Therefore:
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
\frac{\partial }{\partial {\theta_{j}}}f\left( \theta \right)&=\frac{\partial }{\partial {\theta_{j}}}[-{{y}^{(i)}}\log \left( 1+{{e}^{-z}} \right)-\left( 1-{{y}^{(i)}} \right)\log \left( 1+{{e}^{z}} \right)] \\ |
|
|
|
&=-{{y}^{(i)}}\frac{\frac{\partial }{\partial {\theta_{j}}}\left(-z \right) e^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{\frac{\partial }{\partial {\theta_{j}}}\left(z \right){e^{z}}}{1+e^{z}} \\ |
|
|
|
&=-{{y}^{(i)}}\frac{-x^{(i)}_je^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{x^{(i)}_j}{1+e^{-z}} \\ |
|
|
|
&=\left({{y}^{(i)}}\frac{e^{-z}}{1+e^{-z}}-\left( 1-{{y}^{(i)}} \right)\frac{1}{1+e^{-z}}\right)x^{(i)}_j \\
|
|
|
&=\left(\frac{{{y}^{(i)}}(e^{-z}+1)-1}{1+e^{-z}}\right)x^{(i)}_j \\ |
|
|
|
&={({{y}^{(i)}}-\frac{1}{1+{{e}^{-z}}})x_j^{(i)}} \\ |
|
|
|
&={\left({{y}^{(i)}}-{h_\theta}\left( {{x}^{(i)}} \right)\right)x_j^{(i)}} \\ |
|
|
|
&=-{\left({h_\theta}\left( {{x}^{(i)}} \right)-{{y}^{(i)}}\right)x_j^{(i)}} |
|
|
|
\end{align*} |
|
|
|
$$ |
|
|
|
|
|
|
|
This gives the derivative of the cost function:
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\frac{\partial }{\partial {\theta_{j}}}J(\theta) = -\frac{1}{m}\sum\limits_{i=1}^{m}{\frac{\partial }{\partial {\theta_{j}}}f(\theta)}=\frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)} |
|
|
|
$$ |
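
The result can also be sanity-checked numerically; the sketch below (Python, made-up data) compares this analytic gradient with centered finite differences:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m

def analytic_grad(theta, X, y):
    """The derivative derived above: (1/m) * sum((h - y) * x_j)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def numeric_grad(theta, X, y, eps=1e-6):
    """Centered finite differences, used only to sanity-check the derivation."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * eps)
    return grad

X = np.array([[1.0, 0.2], [1.0, 1.5], [1.0, 3.1]])
y = np.array([0.0, 1.0, 1.0])
theta = np.array([0.3, -0.2])
print(np.allclose(analytic_grad(theta, X, y), numeric_grad(theta, X, y)))  # True
```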
|
|
|
|
|
|
|
|
|
|
|
## 6.6 Advanced Optimization
|
|
|
|
|
|
|
|
|
|
|
|
|
|
To retain the information carried by each parameter, we leave the hypothesis function unchanged and instead modify the cost function:
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
min_\theta\ \dfrac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + 1000\cdot\theta_3^2 + 1000\cdot\theta_4^2 |
|
|
|
$$ |
|
|
|
|
|
|
|
|
|
|
|
In the expression above, penalty terms $1000\cdot\theta_3^2 + 1000\cdot\theta_4^2$ for $\theta_3$ and $\theta_4$ are added to the cost function. Minimizing the cost then forces $\theta_3$ and $\theta_4$ to become **very small**, so the terms $\theta_3x^3$ and $\theta_4x^4$ in the hypothesis function become nearly negligible. The hypothesis effectively **becomes simpler**, avoiding overfitting while keeping all of the parameters.
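
A quick numerical illustration (made-up data, with a linear-in-parameters hypothesis over polynomial features): any $\theta$ whose $\theta_3$, $\theta_4$ are not tiny incurs a huge cost, so minimization drives those entries toward zero:

```python
import numpy as np

def penalized_cost(theta, X, y):
    """Squared-error cost plus heavy penalties on theta_3 and theta_4."""
    m = len(y)
    h = X @ theta                                    # hypothesis on polynomial features
    base = np.sum((h - y) ** 2) / (2 * m)
    penalty = 1000.0 * theta[3] ** 2 + 1000.0 * theta[4] ** 2
    return base + penalty

# Made-up polynomial design matrix: columns are 1, x, x^2, x^3, x^4.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([x ** p for p in range(5)])
y = np.array([0.1, 1.2, 1.9, 3.2])

small_high_order = np.array([0.0, 1.0, 0.0, 0.001, 0.001])
large_high_order = np.array([0.0, 1.0, 0.0, 0.5,   0.5])
print(penalized_cost(small_high_order, X, y))   # penalty is negligible
print(penalized_cost(large_high_order, X, y))   # penalty dominates the cost
```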
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The cost function:
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
J\left( \theta \right)=\frac{1}{2m}\left[ \sum\limits_{i=1}^{m}{{\left( {h_\theta}({{x}^{(i)}})-{{y}^{(i)}} \right)}^{2}}+\lambda \sum\limits_{j=1}^{n}{\theta_{j}^{2}} \right]
|
|
|
$$ |
|
|
|
|
|
|
|
|
|
|
|
> $\lambda$: the regularization parameter, $\lambda > 0$
|
|
|
> |
|
|
|
|
|
|
|
|
|
|
Gradient descent for linear regression with regularization:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
& \text{Repeat}\ \lbrace \\ |
|
|
|
& \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \\ |
|
|
|
& \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right], \ \ \ j \in \lbrace 1,2...n\rbrace\\ |
|
|
|
& \rbrace |
|
|
|
\end{align*} |
|
|
|
$$ |
|
|
|
Rearranging terms gives an alternative form of the update rule:
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\theta_j := \theta_j(1 - \alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} |
|
|
|
$$ |
|
|
|
|
|
|
|
|
|
|
|
> $\frac{\lambda}{m}\theta_j$: the regularization term
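
A sketch of one such update step in Python/NumPy (made-up data; note that $\theta_0$ is excluded from the regularization term):

```python
import numpy as np

def regularized_gd_step(theta, X, y, alpha, lam):
    """One gradient descent step for regularized linear regression.

    theta_0 is NOT regularized; for j >= 1 the extra term alpha * (lam/m) * theta_j
    is subtracted, matching theta_j := theta_j(1 - alpha*lam/m) - alpha*(1/m)*sum(...).
    """
    m = len(y)
    error = X @ theta - y                    # h_theta(x^(i)) - y^(i) for every example
    grad = X.T @ error / m                   # unregularized gradient
    new_theta = theta - alpha * grad
    new_theta[1:] -= alpha * (lam / m) * theta[1:]   # regularization term, j >= 1 only
    return new_theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.5, 0.5])
print(regularized_gd_step(theta, X, y, alpha=0.1, lam=1.0))
```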
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The normal equation method with regularization[^2]:
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \\ |
|
|
|
& \text{where}\ \ L = \begin{bmatrix} 0 & & & & \\ |
|
|
|
& 1 & & & \\ |
|
|
|
& & 1 & & \\ |
|
|
|
& & & \ddots & \\ |
|
|
|
& & & & 1 \\ \end{bmatrix} |
|
|
|
\end{align*} |
|
|
|
$$ |
|
|
|
|
|
|
|
|
|
|
|
> $\lambda\cdot L$: the regularization term
|
|
|
> |
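
A direct sketch of this closed-form solution (made-up data; `np.linalg.solve` is used rather than forming an explicit matrix inverse):

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """theta = (X^T X + lambda * L)^{-1} X^T y, with L = diag(0, 1, ..., 1)."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0                       # do not regularize the bias parameter theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Tiny made-up data set with a bias column.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(regularized_normal_equation(X, y, lam=1.0))
```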
|
|
|
|
|
|
|
|
|
|
Adding a regularization term to the logistic regression cost function:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2 |
|
|
|
$$ |
|
|
|
It was shown earlier that the derivatives of the logistic regression and linear regression cost functions have the same form; by including the constant factor $\frac{1}{2}$ in the regularization term here, its derivative takes the same form as well.
|
|
|
|
|
|
|
This gives the gradient descent algorithm for logistic regression with regularization:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
$$ |
|
|
|
\begin{align*} |
|
|
|
& \text{Repeat}\ \lbrace \\ |
|
|
|
& \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \\ |
|
|
|
& \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right], \ \ \ j \in \lbrace 1,2...n\rbrace\\ |
|
|
|
& \rbrace \end{align*} |
|
|
|
$$ |
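
A sketch of the regularized cost and the gradient used by this update (made-up data; $\theta_0$ is excluded from the penalty):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y, lam):
    """Regularized logistic regression cost J(theta) and its gradient.

    The penalty lambda/(2m) * sum_{j>=1} theta_j^2 excludes theta_0, and its
    derivative contributes lambda/m * theta_j for j >= 1.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m \
        + lam / (2 * m) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return J, grad

# Tiny made-up data set with a bias column.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
y = np.array([0.0, 1.0, 1.0])
theta = np.array([0.1, 0.2])
print(cost_and_grad(theta, X, y, lam=1.0))
```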
|
|
|
|
|
|
|
|
|
|
|
[^1]: https://en.wikipedia.org/wiki/List_of_algorithms#Optimization_algorithms |