# 8\. Regression
Chinese proverb
> **A journey of a thousand miles begins with a single step.** – old Chinese proverb
In statistical modeling, regression analysis focuses on investigating the relationship between a dependent variable and one or more independent variables. [Wikipedia Regression analysis](https://en.wikipedia.org/wiki/Regression_analysis)
In data mining, regression is a model that represents the relationship between the value of a label (or target, which is a numerical variable) and one or more features (or predictors, which can be numerical or categorical variables).
## 8.1\. Linear Regression
### 8.1.1\. Introduction
Given a data set ![{\displaystyle \{\,x_{i1},\ldots ,x_{in},y_{i}\}_{i=1}^{m}}](img/4b454255e179a3626e205ce324184acf.jpg), which contains `n` features (variables) and `m` samples (data points), the simple linear regression model for the ![{\displaystyle m}](img/2649ef98f720c129d663f5d82add4129.jpg) data points with independent variables ![{\displaystyle x_{ij}}](img/91d663abfef497e13ec41f9300a5c354.jpg) (indexed by ![j](img/aec897e37f71d43694de4db49ed3be3e.jpg)) is given by:
> ![y_i = \beta_0 + \beta_j x_{ij}, \text{where}, i= 1, \cdots m, j= 1, \cdots n.](img/59ebd939c24bf4d59d82b0daf4874daf.jpg)
In matrix notation, the data set is written as ![\X = [\x_1,\cdots, \x_n]](img/80a25ad6329d3836f4e625a1c93e7898.jpg) with ![\x_j = {\displaystyle \{x_{ij}\}_{i=1}^{m}}](img/c4660874124a448ac14209f4a59e367a.jpg), ![\y = {\displaystyle \{y_{i}\}_{i=1}^{m}}](img/82a22af158d760e46ae93ba1663a6487.jpg) (see Fig. [Feature matrix and label](#fig-fm)) and ![\Bbeta^\top = {\displaystyle \{\beta_{j}\}_{j=1}^{n}}](img/fad9e18cebad821450ed0f34abdb3988.jpg). Then the matrix form of the equation is written as
> (1)![\y = \X \Bbeta.](img/2d776487e1a2ee4683c3c6f51fca7e48.jpg)
![https://runawayhorse001.github.io/LearningApacheSpark/_images/fm.png](img/3b99ee07cd783026d41b65651ee5d293.jpg)
Feature matrix and label
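
To make the matrix notation above concrete, here is a minimal NumPy sketch (not from the book) that assembles a small feature matrix, a coefficient vector, and the corresponding label vector; the leading column of ones that absorbs the intercept into the feature matrix is an assumption about how the matrix is built:

```python
import numpy as np

# 3 samples (m = 3) and 2 features (n = 2); the leading column of ones
# absorbs the intercept beta_0 so that y = X @ beta holds exactly.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])     # feature matrix, shape (m, n + 1)
beta = np.array([0.5, 1.0, -2.0])   # coefficients [beta_0, beta_1, beta_2]

y = X @ beta                        # label vector, shape (m,)
print(y)                            # [-3.5 -5.5 -7.5]
```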
### 8.1.2\. How to solve it?
1. Direct Methods (For more information please refer to my [Prelim Notes for Numerical Analysis](http://web.utk.edu/~wfeng1/doc/PrelimNum.pdf))
   * For square or rectangular matrices
     * Singular Value Decomposition
     * Gram-Schmidt orthogonalization
     * QR Decomposition
   * For square matrices
     * LU Decomposition
     * Cholesky Decomposition
     * Regular Splittings
2. Iterative Methods (a direct vs. iterative solve is sketched after this list)
   * Stationary iterative methods
     * Jacobi Method
     * Gauss-Seidel Method
     * Richardson Method
     * Successive Over-Relaxation (SOR) Method
   * Dynamic iterative methods
     * Chebyshev Iterative Method
     * Minimal Residual Method
     * Minimal Correction Iterative Method
     * Steepest Descent Method
     * Conjugate Gradient Method
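
As a brief illustration of the two families above, the sketch below solves the same least-squares problem once with a direct method (QR decomposition) and once with an iterative method (conjugate gradient on the normal equations). The synthetic data and tolerances are illustrative assumptions, not code from the book:

```python
import numpy as np
from scipy.sparse.linalg import cg

# Synthetic overdetermined system: 100 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=100)

# Direct method: QR decomposition of X, then a triangular solve.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# Iterative method: conjugate gradient on the SPD normal equations X^T X beta = X^T y.
beta_cg, info = cg(X.T @ X, X.T @ y, atol=1e-10)

print(beta_qr)
print(beta_cg)   # both agree to numerical precision (info == 0 on convergence)
```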
### 8.1.3\. Ordinary Least Squares
In mathematics, [(1)](#equation-eq-ax) is an overdetermined system. The method of ordinary least squares can be used to find an approximate solution to overdetermined systems. For the overdetermined system [(1)](#equation-eq-ax), the least squares formula is obtained from the problem
(2)![{\displaystyle \min _{\Bbeta} ||\X \Bbeta-\y||} ,](img/b8bf446d4a625497f28f2347b7ca0c92.jpg)
the solution of which can be written with the normal equations:
(3)![\Bbeta = (\X^T\X)^{-1}\X^T\y](img/d2f9799d371fde446e6dc8292ba07393.jpg)
where ![{\displaystyle {\mathrm {T} }}](img/d09c46ec94d638e4ddcecfbba1c11ea8.jpg) indicates a matrix transpose, provided ![{\displaystyle (\X^{\mathrm {T} }\X)^{-1}}](img/d003fed20e7f2d040ccc24412cb854d1.jpg) exists (that is, provided ![\X](img/501025688da0cf9e2b3937cd7da9580d.jpg) has full column rank).
Note
Actually, [(3)](#equation-eq-solax) can be derived in the following way: multiply ![\X^T](img/d142da9aae51c6d3c3c736fc82252862.jpg) on both sides of [(1)](#equation-eq-ax) and then multiply ![(\X^T\X)^{-1}](img/16dd8d60ea9b042c3ce0652c9f0571e8.jpg) on both sides of the former result. You may also apply the `Extreme Value Theorem` to [(2)](#equation-eq-minax) and find the solution [(3)](#equation-eq-solax).
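
As a quick numerical check of [(3)](#equation-eq-solax), the following sketch (with made-up data, not from the book) computes the coefficients both from the normal equations directly and with NumPy's least-squares routine:

```python
import numpy as np

# Made-up overdetermined system: m = 5 samples, intercept column plus one feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Normal equations: beta = (X^T X)^{-1} X^T y  (requires X to have full column rank).
beta_normal = np.linalg.inv(X.T @ X) @ (X.T @ y)

# The same solution via the numerically preferred least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)   # approximately [0.15, 1.95]
print(beta_lstsq)    # matches the normal-equation solution
```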
### 8.1.4\. Gradient Descent
Let’s use the following hypothesis:
![h_\Bbeta = \beta_0 + \beta_j \x_{j}, \text{where}, j= 1, \cdots n.](img/a5fda7453d5707d5e8985434c789ba48.jpg)
Then, solving [(2)](#equation-eq-minax) is equivalent to minimizing the following `cost function`:
### 8.1.5\. Cost Function
(4)![J(\Bbeta) = \frac{1}{2m}\sum_{i=1}^m \left( h_\Bbeta(x^{(i)})-y^{(i)}) \right)^2](img/77c47cf9cfec8ec740c5a18dc4386670.jpg)
Note
The reason why we prefer to solve [(4)](#equation-eq-lreg-cost) rather than [(2)](#equation-eq-minax) is that [(4)](#equation-eq-lreg-cost) is convex and has some nice properties, such as being uniquely solvable and energy stable for a small enough learning rate. The reader who has great interest in the non-convex cost function (energy) case is referred to [[Feng2016PSD]](reference.html#feng2016psd) for more details.
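
A minimal NumPy sketch of the cost function [(4)](#equation-eq-lreg-cost), assuming the hypothesis is the linear model with the intercept column already included in the feature matrix (names and data are illustrative, not from the book):

```python
import numpy as np

def cost(beta, X, y):
    """Least-squares cost J(beta) = 1/(2m) * sum_i (h_beta(x^(i)) - y^(i))^2."""
    m = len(y)
    residual = X @ beta - y          # h_beta(x^(i)) - y^(i) for every sample
    return residual @ residual / (2.0 * m)

# Tiny example: the cost is zero only when beta reproduces y exactly.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])        # generated by beta = [1, 2]
print(cost(np.array([1.0, 2.0]), X, y))   # 0.0
print(cost(np.array([0.0, 0.0]), X, y))   # positive
```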
![https://runawayhorse001.github.io/LearningApacheSpark/_images/gradient1d.png](img/875e532ac3b299876d209507d595df14.jpg)
Gradient Descent in 1D
![https://runawayhorse001.github.io/LearningApacheSpark/_images/gradient2d.png](img/d4b34834b440d5d60f25912180e7e130.jpg)
Gradient Descent in 2D
### 8.1.6\. Batch Gradient Descent
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. It searches in the direction of steepest descent, which is defined by the `negative of the gradient` (see Fig. [Gradient Descent in 1D](#fig-gd1d) and Fig. [Gradient Descent in 2D](#fig-gd2d) for 1D and 2D, respectively), with learning rate (search step) ![\alpha](img/aef64ee73dc1b1a03a152855f685113e.jpg).
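
The sketch below applies this idea to the cost function [(4)](#equation-eq-lreg-cost): at every iteration the whole batch of samples is used to form the gradient, and the coefficients move one step of size alpha in the negative gradient direction. The learning rate, iteration count, and data are illustrative assumptions:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.05, n_iters=2000):
    """Minimize J(beta) with full-batch gradient descent."""
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(n_iters):
        gradient = X.T @ (X @ beta - y) / m   # gradient of J(beta) over the whole batch
        beta -= alpha * gradient              # step in the negative gradient direction
    return beta

# Recover beta = [1, 2] from the tiny data set used above.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])
print(batch_gradient_descent(X, y))   # close to [1. 2.]
```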
### 8.1.7\. Stochastic Gradient Descent
### 8.1.8\. Mini-batch Gradient Descent
### 8.1.9\. Demo
* The Jupyter notebook can be downloaded from [Linear Regression](_static/LinearRegression.ipynb), which was implemented without using Pipeline.
* The Jupyter notebook can be downloaded from [Linear Regression with Pipeline](_static/LinearRegressionWpipeline.ipynb), which was implemented using Pipeline.
* I will only present the code in pipeline style in the following.
* For more details about the parameters, please visit [Linear Regression API](http://takwatanabe.me/pyspark/generated/generated/ml.regression.LinearRegression.html).
Set up spark context and SparkSession
......
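
The code is elided in this excerpt; a minimal sketch of the setup and the pipeline-style fit, following the usual PySpark pattern, might look like the following (the application name, file path, and column names are placeholder assumptions, not the book's actual values):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.ml.evaluation import RegressionEvaluator

# Set up the SparkSession (the entry point that also carries the SparkContext).
spark = SparkSession.builder \
    .appName("Python Spark regression example") \
    .getOrCreate()

# Load the data (placeholder path and column names).
df = spark.read.csv("path/to/data.csv", header=True, inferSchema=True)

# Assemble the feature columns into a single vector column and fit the model in a Pipeline.
assembler = VectorAssembler(inputCols=["x1", "x2", "x3"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Evaluate on the held-out split.
predictions = model.transform(test)
evaluator = RegressionEvaluator(labelCol="label", predictionCol="prediction", metricName="rmse")
print("RMSE:", evaluator.evaluate(predictions))
```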