In statistical modeling, regression analysis focuses on investigating the relationship between a dependent variable and one or more independent variables. [Wikipedia Regression analysis](https://en.wikipedia.org/wiki/Regression_analysis)
In data mining, regression is a model that represents the relationship between the value of a label (or target, a numerical variable) and one or more features (or predictors, which can be numerical or categorical variables).
## 8.1\. Linear Regression
### 8.1.1\. Introduction
Given a data set $\{x_{i1},\ldots,x_{in},y_{i}\}_{i=1}^{m}$ which contains $n$ features (variables) and $m$ samples (data points), the simple linear regression model for modeling the $m$ data points with the $n$ independent variables $x_{ij}$ is given by

$$
y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in}, \quad i = 1, \ldots, m.
$$
In matrix notation, the data set is written as $\mathbf{X} = [\mathbf{x}_1,\cdots,\mathbf{x}_n]$ with $\mathbf{x}_j = \{x_{ij}\}_{i=1}^{m}$, $\mathbf{y} = \{y_{i}\}_{i=1}^{m}$ (see Fig. [Feature matrix and label](#fig-fm)) and $\boldsymbol{\beta}^{\top} = \{\beta_{j}\}_{j=1}^{n}$. Then the matrix format equation is written as

$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta}. \tag{1}
$$
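As a concrete illustration of these shapes (a minimal NumPy sketch with made-up numbers, not taken from the original notebooks), the feature matrix stacks the $m$ samples as rows and the $n$ feature vectors $\mathbf{x}_j$ as columns:

```python
import numpy as np

# Toy data set: m = 5 samples, n = 2 features.
# Rows of X are samples; column j is the feature vector x_j.
m, n = 5, 2
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(m, n))        # feature matrix, shape (m, n)
beta = np.array([2.0, -1.0])       # coefficients beta_1, ..., beta_n
y = X @ beta                       # label vector y = X beta, shape (m,)

print(X.shape, y.shape)            # (5, 2) (5,)
```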
1. Direct Methods (for more information please refer to my [Prelim Notes for Numerical Analysis](http://web.utk.edu/~wfeng1/doc/PrelimNum.pdf))

   * For square or rectangular matrices
     * Singular Value Decomposition
     * Gram-Schmidt orthogonalization
     * QR Decomposition
   * For square matrices
     * LU Decomposition
     * Cholesky Decomposition
     * Regular Splittings
2. Iterative Methods (a short sketch contrasting the two families follows this list)

   * Stationary iterative methods
     * Jacobi Method
     * Gauss-Seidel Method
     * Richardson Method
     * Successive Over-Relaxation (SOR) Method
   * Dynamic iterative methods
     * Chebyshev Iterative Method
     * Minimal Residual Method
     * Minimal Correction Iterative Method
     * Steepest Descent Method
     * Conjugate Gradient Method
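To make the two families above concrete, here is a minimal NumPy/SciPy sketch (the matrix and iteration count are made-up assumptions for illustration) that solves the same small square system with a direct method (LU decomposition) and a stationary iterative method (Jacobi):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# A small, diagonally dominant system A x = b (so the Jacobi method converges).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Direct method: LU decomposition.
x_direct = lu_solve(lu_factor(A), b)

# Stationary iterative method: Jacobi, x_{k+1} = D^{-1} (b - (A - D) x_k).
D = np.diag(A)                     # diagonal entries of A
R = A - np.diag(D)                 # off-diagonal part
x = np.zeros_like(b)
for _ in range(50):
    x = (b - R @ x) / D

print(x_direct, x)                 # both approach [1/11, 7/11]
```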
### 8.1.3\. Ordinary Least Squares
In mathematics, [(1)](#equation-eq-ax) is an overdetermined system. The method of ordinary least squares can be used to find an approximate solution to overdetermined systems. For the overdetermined system [(1)](#equation-eq-ax), the least squares formula is obtained from the problem

$$
\min_{\boldsymbol{\beta}} \|\mathbf{X}\boldsymbol{\beta} - \mathbf{y}\|^2, \tag{2}
$$

the solution of which can be written with the normal equations:

$$
\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathrm{T}}\mathbf{y}, \tag{3}
$$
where ${\mathrm{T}}$ indicates a matrix transpose, provided $(\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1}$ exists (that is, provided $\mathbf{X}$ has full column rank).
Actually, [(3)](#equation-eq-solax) can be derived in the following way: multiply both sides of [(1)](#equation-eq-ax) by $\mathbf{X}^{\mathrm{T}}$, and then multiply both sides of the result by $(\mathbf{X}^{\mathrm{T}}\mathbf{X})^{-1}$. You may also apply the `Extreme Value Theorem` to [(2)](#equation-eq-minax) and find the solution [(3)](#equation-eq-solax).
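As a quick sanity check (a small NumPy sketch on synthetic data, not part of the original text), the closed-form solution [(3)](#equation-eq-solax) agrees with a library least-squares solver; solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
m, n = 100, 3
X = rng.normal(size=(m, n))                   # full column rank (almost surely)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=m)

# beta_hat = (X^T X)^{-1} X^T y, computed by solving (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq))   # True
```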
The reason why we prefer to solve [(4)](#equation-eq-lreg-cost) rather than [(2)](#equation-eq-minax) is that [(4)](#equation-eq-lreg-cost) is convex and has some nice properties, such as being uniquely solvable and energy stable for a small enough learning rate. The reader interested in the non-convex cost function (energy) case is referred to [[Feng2016PSD]](reference.html#feng2016psd) for more details.
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. It searches in the direction of steepest descent, which is defined by the `negative of the gradient` (see Fig. [Gradient Descent in 1D](#fig-gd1d) and Fig. [Gradient Descent in 2D](#fig-gd2d) for 1D and 2D, respectively), with learning rate (search step) $\alpha$.
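Below is a minimal sketch of this update for the least-squares cost (synthetic data and a hand-picked learning rate, chosen only for illustration); for a small enough $\alpha$, the cost decreases at every iteration, which is the energy stability mentioned above:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
m, n = 100, 3
X = rng.normal(size=(m, n))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=m)

alpha = 0.1                            # learning rate (search step)
beta = np.zeros(n)
for _ in range(500):
    grad = X.T @ (X @ beta - y) / m    # gradient of ||X beta - y||^2 / (2m)
    beta -= alpha * grad               # step along the negative gradient

cost = np.sum((X @ beta - y) ** 2) / (2 * m)
print(beta, cost)                      # beta is close to [1, 2, 3]
```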
* The Jupyter notebook can be downloaded from [Linear Regression](_static/LinearRegression.ipynb), which was implemented without using Pipeline.
* The Jupyter notebook can be downloaded from [Linear Regression with Pipeline](_static/LinearRegressionWpipeline.ipynb), which was implemented using Pipeline.
* I will only present the code in the pipeline style in the following.
* For more details about the parameters, please visit the [Linear Regression API](http://takwatanabe.me/pyspark/generated/generated/ml.regression.LinearRegression.html).
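As a preview of the pipeline style (a minimal, self-contained sketch with toy data; the column names and values here are my own assumptions, not the notebooks' exact code), the pipeline assembles the feature columns into a vector and then fits the regressor:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("LinearRegressionDemo").getOrCreate()

# Toy data: two numeric features and a numeric label.
df = spark.createDataFrame(
    [(1.0, 2.0, 5.0), (2.0, 1.0, 4.0), (3.0, 3.0, 9.0), (4.0, 2.0, 8.0)],
    ["x1", "x2", "label"],
)

# Stage 1: assemble the feature columns into a single vector column.
assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
# Stage 2: fit the linear regression on the assembled features.
lr = LinearRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("features", "label", "prediction").show()
```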