# 八、如何实现线性回归

在上一课中，我们使用 scikit learn 的实现计算了糖尿病数据集的最小二乘线性回归。 今天，我们将看看如何编写自己的实现。

## 起步

```py
from sklearn import datasets, linear_model, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
import math, scipy, numpy as np
from scipy import linalg

np.set_printoptions(precision=6)

data = datasets.load_diabetes()

feature_names=['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

trn,test,y_trn,y_test = train_test_split(data.data, data.target, test_size=0.2)

trn.shape, test.shape

# ((353, 10), (89, 10))

def regr_metrics(act, pred):
    return (math.sqrt(metrics.mean_squared_error(act, pred)), 
    metrics.mean_absolute_error(act, pred))
```

### sklearn 如何实现它

sklearn 是如何做到这一点的？ 通过检查[源代码](https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/linear_model/base.py#L417)，你可以看到在密集的情况下，它调用[`scipy.linalg.lstqr`](https://github.com/scipy/scipy/blob/v0.19.0/scipy/linalg/basic.py#L892-L1058)，它调用 LAPACK 方法：

> 选项是`'gelsd'`，`'gelsy'`，`'gelss'`。默认值`'gelsd'`是个好的选择，但是，`'gelsy'`在许多问题上更快一些。`'gelss'`由于历史原因而使用。它通常更慢但是使用更少内存。

+   [gelsd](https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/_gelsd.htm)：使用 SVD 和分治方法
+   [gelsy](https://software.intel.com/en-us/node/521113)：使用 QR 分解
+   [gelss](https://software.intel.com/en-us/node/521114)：使用 SVD

### Scipy 稀疏最小二乘

我们不会详细介绍稀疏版本的最小二乘法。如果你有兴趣，请参考以下信息：

Scipy 稀疏最小二乘使用称为 [Golub 和 Kahan 双对角化](https://web.stanford.edu/class/cme324/paige-saunders2.pdf)的迭代方法。

Scipy 稀疏最小二乘源代码：预处理是减少迭代次数的另一种方法。如果有可能有效地求解相关系统`M*x = b`，其中`M`以某种有用的方式近似`A`（例如，`M-A`具有低秩或其元素相对于`A`的元素较小），则 LSQR 可以在系统`A*M(inverse)*z = b`更快地收敛。之后可以通过求解`M * x = z`来恢复`x`。

如果`A`是对称的，则不应使用 LSQR！替代方案是对称共轭梯度法（cg）和/或 SYMMLQ。 SYMMLQ 是对称 cg 的一种实现，适用于任何对称的`A`，并且比 LSQR 更快收敛。如果`A`是正定的，则​​存在对称 cg 的其他实现，每次迭代需要的工作量比 SYMMLQ 少一些（但需要相同的迭代次数）。

### `linalg.lstqr`

sklearn 实现为我们添加一个常数项（因为对于我们正在学习的直线，`y`截距可能不是 0）。 我们现在需要手工完成：

```py
trn_int = np.c_[trn, np.ones(trn.shape[0])]
test_int = np.c_[test, np.ones(test.shape[0])]
```

由于`linalg.lstsq`允许我们指定，我们想要使用哪个 LAPACK 例程，让我们尝试它们并进行一些时间比较：

```py
%timeit coef, _,_,_ = linalg.lstsq(trn_int, y_trn, lapack_driver="gelsd")

# 290 µs ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit coef, _,_,_ = linalg.lstsq(trn_int, y_trn, lapack_driver="gelsy")

# 140 µs ± 91.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit coef, _,_,_ = linalg.lstsq(trn_int, y_trn, lapack_driver="gelss")

# 199 µs ± 228 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

## 朴素解法

回想一下，我们想找到 ![\hat x](img/tex-c243886a288804343eee2af0ad8dcebc.gif)，来最小化：

![\big\vert\big\vert Ax - b \big\vert\big\vert_2](img/tex-4481186643f99234ebdfe2fef5baa9b8.gif)

另一种思考方式是，我们对向量`b`最接近`A`的子空间（称为`A`的范围）的地方感兴趣。 这是`b`在`A`上的投影。由于 ![b - A \hat x](img/tex-b21c98c8f055b92996a80407da83b1ca.gif) 必须垂直于`A`的子空间，我们可以看到：

![A^T (b - A\hat{x}) = 0](img/tex-1bc9193e34bc6ecc3df206e9fee737a1.gif)

使用了 ![A^T](img/tex-521a9daa8c1cd6293c6e22e8e8386c42.gif) 因为要相乘`A`和 ![b - A\hat{x}](img/tex-cb277091062ae1b5af8d8d6b6844c804.gif) 的每列。

这让我们得到正规方程：

![x = (A^TA)^{-1}A^T b](img/tex-39d9d43173d632ad7babc040b062f89c.gif)

```py
def ls_naive(A, b):
     return np.linalg.inv(A.T @ A) @ A.T @ b

%timeit coeffs_naive = ls_naive(trn_int, y_trn)

# 45.8 µs ± 4.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

coeffs_naive = ls_naive(trn_int, y_trn)
regr_metrics(y_test, test_int @ coeffs_naive)

# (57.94102134545707, 48.053565198516438)
```

## 正规方程（Cholesky）

正规方程：

![A^TA x = A^T b](img/tex-0668689340173b7ece8ed05b1fef41f2.gif)

如果`A`具有满秩，则伪逆 ![(A^TA)^{-1}A^T](img/tex-9a9e4519d91963584b808251dd695f2a.gif) 是正方形，埃尔米特正定矩阵。 解决这种系统的标准方法是 Cholesky 分解，它找到上三角形`R`，满足 ![A^TA = R^TR](img/tex-4c3150fb54600e58070931bf27225aa8.gif)。

以下步骤基于 Trefethen 的算法 11.1：

```py
A = trn_int

b = y_trn

AtA = A.T @ A
Atb = A.T @ b
```

警告：对于Cholesky，Numpy 和 Scipy 默认为不同的上/下三角。

```py
R = scipy.linalg.cholesky(AtA)

np.set_printoptions(suppress=True, precision=4)
R

'''
array([[  0.9124,   0.1438,   0.1511,   0.3002,   0.2228,   0.188 ,
         -0.051 ,   0.1746,   0.22  ,   0.2768,  -0.2583],
       [  0.    ,   0.8832,   0.0507,   0.1826,  -0.0251,   0.0928,
         -0.3842,   0.2999,   0.0911,   0.15  ,   0.4393],
       [  0.    ,   0.    ,   0.8672,   0.2845,   0.2096,   0.2153,
         -0.2695,   0.3181,   0.3387,   0.2894,  -0.005 ],
       [  0.    ,   0.    ,   0.    ,   0.7678,   0.0762,  -0.0077,
          0.0383,   0.0014,   0.165 ,   0.166 ,   0.0234],
       [  0.    ,   0.    ,   0.    ,   0.    ,   0.8288,   0.7381,
          0.1145,   0.4067,   0.3494,   0.158 ,  -0.2826],
       [  0.    ,   0.    ,   0.    ,   0.    ,   0.    ,   0.3735,
         -0.3891,   0.2492,  -0.3245,  -0.0323,  -0.1137],
       [  0.    ,   0.    ,   0.    ,   0.    ,   0.    ,   0.    ,
          0.6406,  -0.511 ,  -0.5234,  -0.172 ,  -0.9392],
       [  0.    ,   0.    ,   0.    ,   0.    ,   0.    ,   0.    ,
          0.    ,   0.2887,  -0.0267,  -0.0062,   0.0643],
       [  0.    ,   0.    ,   0.    ,   0.    ,   0.    ,   0.    ,
          0.    ,   0.    ,   0.2823,   0.0636,   0.9355],
       [  0.    ,   0.    ,   0.    ,   0.    ,   0.    ,   0.    ,
          0.    ,   0.    ,   0.    ,   0.7238,   0.0202],
       [  0.    ,   0.    ,   0.    ,   0.    ,   0.    ,   0.    ,
          0.    ,   0.    ,   0.    ,   0.    ,  18.7319]])
'''
```

检查我们的分解：

```py
np.linalg.norm(AtA - R.T @ R)

# 4.5140158187158533e-16
```

![A^T A x = A^T b \\ R^T R x = A^T b \\R^T w = A^T b \\R x = w](img/tex-82513c6fe20ef419fc103891436353ab.gif)

```py
w = scipy.linalg.solve_triangular(R, Atb, lower=False, trans='T')
```

检查我们的结果是否符合预期总是好的：（以防我们输入错误的参数，函数没有返回我们想要的东西，或者有时文档甚至过时）。

```py
np.linalg.norm(R.T @ w - Atb)

# 1.1368683772161603e-13

coeffs_chol = scipy.linalg.solve_triangular(R, w, lower=False)

np.linalg.norm(R @ coeffs_chol - w)

# 6.861429794408013e-14

def ls_chol(A, b):
    R = scipy.linalg.cholesky(A.T @ A)
    w = scipy.linalg.solve_triangular(R, A.T @ b, trans='T')
    return scipy.linalg.solve_triangular(R, w)

%timeit coeffs_chol = ls_chol(trn_int, y_trn)

# 111 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

coeffs_chol = ls_chol(trn_int, y_trn)
regr_metrics(y_test, test_int @ coeffs_chol)

# (57.9410213454571, 48.053565198516438)
```

## QR 分解

![A x = b \\ A = Q R \\ Q R x = b \\ R x = Q^T b](img/tex-d45f0a6200f855ffbed7dbbf8dc6094b.gif)

```py
def ls_qr(A,b):
    Q, R = scipy.linalg.qr(A, mode='economic')
    return scipy.linalg.solve_triangular(R, Q.T @ b)

%timeit coeffs_qr = ls_qr(trn_int, y_trn)

# 205 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

coeffs_qr = ls_qr(trn_int, y_trn)
regr_metrics(y_test, test_int @ coeffs_qr)

# (57.94102134545711, 48.053565198516452)
```

### SVD

![A x = b \\ A = U \Sigma V \\ \Sigma V x = U^T b \\ \Sigma w = U^T b \\ x = V^T w](img/tex-4b62075d74bf4d4e462ede239a6dea12.gif)

SVD 给出伪逆。

```py
def ls_svd(A,b):
    m, n = A.shape
    U, sigma, Vh = scipy.linalg.svd(A, full_matrices=False)
    w = (U.T @ b)/ sigma
    return Vh.T @ w

%timeit coeffs_svd = ls_svd(trn_int, y_trn)

# 1.11 ms ± 320 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit coeffs_svd = ls_svd(trn_int, y_trn)

# 266 µs ± 8.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

coeffs_svd = ls_svd(trn_int, y_trn)
regr_metrics(y_test, test_int @ coeffs_svd)

# (57.941021345457244, 48.053565198516687)
```

### 最小二乘回归的随机 Sketching 技巧

[线性 Sketching](http://researcher.watson.ibm.com/researcher/files/us-dpwoodru/journal.pdf)（Woodruff）

+   抽取`r×n`随机矩阵`S`，`r << n`
+   计算`S A`和`S b`
+   找到回归`SA x = Sb`的精确解`x`

### 时间比较

```py
import timeit
import pandas as pd

def scipylstq(A, b):
    return scipy.linalg.lstsq(A,b)[0]

row_names = ['Normal Eqns- Naive',
             'Normal Eqns- Cholesky', 
             'QR Factorization', 
             'SVD', 
             'Scipy lstsq']

name2func = {'Normal Eqns- Naive': 'ls_naive', 
             'Normal Eqns- Cholesky': 'ls_chol', 
             'QR Factorization': 'ls_qr',
             'SVD': 'ls_svd',
             'Scipy lstsq': 'scipylstq'}

m_array = np.array([100, 1000, 10000])
n_array = np.array([20, 100, 1000])

index = pd.MultiIndex.from_product([m_array, n_array], names=['# rows', '# cols'])

pd.options.display.float_format = '{:,.6f}'.format
df = pd.DataFrame(index=row_names, columns=index)
df_error = pd.DataFrame(index=row_names, columns=index)

# %%prun
for m in m_array:
    for n in n_array:
        if m >= n:        
            x = np.random.uniform(-10,10,n)
            A = np.random.uniform(-40,40,[m,n])   # removed np.asfortranarray
            b = np.matmul(A, x) + np.random.normal(0,2,m)
            for name in row_names:
                fcn = name2func[name]
                t = timeit.timeit(fcn + '(A,b)', number=5, globals=globals())
                df.set_value(name, (m,n), t)
                coeffs = locals()[fcn](A, b)
                reg_met = regr_metrics(b, A @ coeffs)
                df_error.set_value(name, (m,n), reg_met[0])

df
```

| # rows | 100 | | | 1000 | | | 10000 | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| # cols | 20 | 100 | 1000 | 20 | 100 | 1000 | 20 | 100 | 1000 |
| Normal Eqns- Naive | 0.001276 | 0.003634 | NaN | 0.000960 | 0.005172 | 0.293126 | 0.002226 | 0.021248 | 1.164655 |
| Normal Eqns- Cholesky | 0.001660 | 0.003958 | NaN | 0.001665 | 0.004007 | 0.093696 | 0.001928 | 0.010456 | 0.399464 |
| QR Factorization | 0.002174 | 0.006486 | NaN | 0.004235 | 0.017773 | 0.213232 | 0.019229 | 0.116122 | 2.208129 |
| SVD | 0.003880 | 0.021737 | NaN | 0.004672 | 0.026950 | 1.280490 | 0.018138 | 0.130652 | 3.433003 |
| Scipy lstsq | 0.004338 | 0.020198 | NaN | 0.004320 | 0.021199 | 1.083804 | 0.012200 | 0.088467 | 2.134780 |

```py
df_error
```


| # rows | 100 | | | 1000 | | | 10000 | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| # cols | 20 | 100 | 1000 | 20 | 100 | 1000 | 20 | 100 | 1000 |
| Normal Eqns- Naive | 1.702742 | 0.000000 | NaN | 1.970767 | 1.904873 | 0.000000 | 1.978383 | 1.980449 | 1.884440 |
| Normal Eqns- Cholesky | 1.702742 | 0.000000 | NaN | 1.970767 | 1.904873 | 0.000000 | 1.978383 | 1.980449 | 1.884440 |
| QR Factorization | 1.702742 | 0.000000 | NaN | 1.970767 | 1.904873 | 0.000000 | 1.978383 | 1.980449 | 1.884440 |
| SVD | 1.702742 | 0.000000 | NaN | 1.970767 | 1.904873 | 0.000000 | 1.978383 | 1.980449 | 1.884440 |
| Scipy lstsq | 1.702742 | 0.000000 | NaN | 1.970767 | 1.904873 | 0.000000 | 1.978383 | 1.980449 | 1.884440 |

```py
store = pd.HDFStore('least_squares_results.h5')
store['df'] = df

'''
C:\Users\rache\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2881: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->floating,key->block0_values] [items->[(100, 20), (100, 100), (100, 1000), (1000, 20), (1000, 100), (1000, 1000), (5000, 20), (5000, 100), (5000, 1000)]]

  exec(code_obj, self.user_global_ns, self.user_ns)
'''
```

### 注解

我用魔术指令`%prun`来测量我的代码。

替代方案：最小绝对偏差（L1 回归）

+   异常值的敏感度低于最小二乘法。
+   没有闭式解，但可以通过线性规划解决。

### 条件作用和稳定性

### 条件数

条件数是一个指标，衡量输入的小变化导致输出变化的程度。

问题：为什么我们在数值线性代数中，关心输入的小变化的相关行为？

相对条件数由下式定义：

![\kappa = \sup_{\delta x} \frac{\|\delta f\|}{\| f(x) \|}\bigg/ \frac{\| \delta x \|}{\| x \|}](img/tex-af9ef2d9d428087a16ff027ddd2a254f.gif)

其中 ![\delta x](img/tex-58b1e9f4b9a86690cb106e65a265a34f.gif) 是无穷小。

根据 Trefethen（第 91 页），如果`κ`很小（例如 `1, 10, 10^2`），问题是良态的，如果`κ`很大（例如`10^6, 10^16`），那么问题是病态的。

条件作用：数学问题的扰动行为（例如最小二乘）

稳定性：用于在计算机上解决该问题的算法的扰动行为（例如，最小二乘算法，householder，回代，高斯消除）

### 条件作用的例子

计算非对称矩阵的特征值的问题通常是病态的。

```py
A = [[1, 1000], [0, 1]]
B = [[1, 1000], [0.001, 1]]

wA, vrA = scipy.linalg.eig(A)
wB, vrB = scipy.linalg.eig(B)

wA, wB

'''
(array([ 1.+0.j,  1.+0.j]),
 array([  2.00000000e+00+0.j,  -2.22044605e-16+0.j]))
'''
```

### 矩阵的条件数

乘积 ![\| A\| \|A^{-1} \|](img/tex-5ca635990dbca0e677a639ea78c11b33.gif) 经常出现，它有自己的名字：`A`的条件数。注意，通常我们谈论问题的条件作用，而不是矩阵。

`A`的条件数涉及：

+   给定`Ax = b`中的`A`和`x`，计算`b`
+   给定`Ax = b`中的`A`和`b`，计算`x`

### 未交待清楚的事情

### 完整和简化分解

### SVD

来自 Trefethen 的图：

![](img/full_svd.JPG)

![](img/reduced_svd.JPG)

### 对于所有矩阵，QR 分解都存在

与 SVD 一样，有 QR 分解的完整版和简化版。

![](img/full_qr.JPG)

![](img/reduced_qr.JPG)

### 矩阵的逆是不稳定的

```py
from scipy.linalg import hilbert

n = 5
hilbert(n)

'''
array([[ 1.    ,  0.5   ,  0.3333,  0.25  ,  0.2   ],
       [ 0.5   ,  0.3333,  0.25  ,  0.2   ,  0.1667],
       [ 0.3333,  0.25  ,  0.2   ,  0.1667,  0.1429],
       [ 0.25  ,  0.2   ,  0.1667,  0.1429,  0.125 ],
       [ 0.2   ,  0.1667,  0.1429,  0.125 ,  0.1111]])
'''

n = 14
A = hilbert(n)
x = np.random.uniform(-10,10,n)
b = A @ x

A_inv = np.linalg.inv(A)

np.linalg.norm(np.eye(n) - A @ A_inv)

# 5.0516495470543212

np.linalg.cond(A)

# 2.2271635826494112e+17

A @ A_inv

'''
array([[ 1.    ,  0.    , -0.0001,  0.0005, -0.0006,  0.0105, -0.0243,
         0.1862, -0.6351,  2.2005, -0.8729,  0.8925, -0.0032, -0.0106],
       [ 0.    ,  1.    , -0.    ,  0.    ,  0.0035,  0.0097, -0.0408,
         0.0773, -0.0524,  1.6926, -0.7776, -0.111 , -0.0403, -0.0184],
       [ 0.    ,  0.    ,  1.    ,  0.0002,  0.0017,  0.0127, -0.0273,
         0.    ,  0.    ,  1.4688, -0.5312,  0.2812,  0.0117,  0.0264],
       [ 0.    ,  0.    , -0.    ,  1.0005,  0.0013,  0.0098, -0.0225,
         0.1555, -0.0168,  1.1571, -0.9656, -0.0391,  0.018 , -0.0259],
       [-0.    ,  0.    , -0.    ,  0.0007,  1.0001,  0.0154,  0.011 ,
        -0.2319,  0.5651, -0.2017,  0.2933, -0.6565,  0.2835, -0.0482],
       [ 0.    , -0.    ,  0.    , -0.0004,  0.0059,  0.9945, -0.0078,
        -0.0018, -0.0066,  1.1839, -0.9919,  0.2144, -0.1866,  0.0187],
       [-0.    ,  0.    , -0.    ,  0.0009, -0.002 ,  0.0266,  0.974 ,
        -0.146 ,  0.1883, -0.2966,  0.4267, -0.8857,  0.2265, -0.0453],
       [ 0.    ,  0.    , -0.    ,  0.0002,  0.0009,  0.0197, -0.0435,
         1.1372, -0.0692,  0.7691, -1.233 ,  0.1159, -0.1766, -0.0033],
       [ 0.    ,  0.    , -0.    ,  0.0002,  0.    , -0.0018, -0.0136,
         0.1332,  0.945 ,  0.3652, -0.2478, -0.1682,  0.0756, -0.0212],
       [ 0.    , -0.    , -0.    ,  0.0003,  0.0038, -0.0007,  0.0318,
        -0.0738,  0.2245,  1.2023, -0.2623, -0.2783,  0.0486, -0.0093],
       [-0.    ,  0.    , -0.    ,  0.0004, -0.0006,  0.013 , -0.0415,
         0.0292, -0.0371,  0.169 ,  1.0715, -0.09  ,  0.1668, -0.0197],
       [ 0.    , -0.    ,  0.    ,  0.    ,  0.0016,  0.0062, -0.0504,
         0.1476, -0.2341,  0.8454, -0.7907,  1.4812, -0.15  ,  0.0186],
       [ 0.    , -0.    ,  0.    , -0.0001,  0.0022,  0.0034, -0.0296,
         0.0944, -0.1833,  0.6901, -0.6526,  0.2556,  0.8563,  0.0128],
       [ 0.    ,  0.    ,  0.    , -0.0001,  0.0018, -0.0041, -0.0057,
        -0.0374, -0.165 ,  0.3968, -0.2264, -0.1538, -0.0076,  1.005 ]])
'''

row_names = ['Normal Eqns- Naive',
             'QR Factorization', 
             'SVD', 
             'Scipy lstsq']

name2func = {'Normal Eqns- Naive': 'ls_naive', 
             'QR Factorization': 'ls_qr',
             'SVD': 'ls_svd',
             'Scipy lstsq': 'scipylstq'}

pd.options.display.float_format = '{:,.9f}'.format
df = pd.DataFrame(index=row_names, columns=['Time', 'Error'])

for name in row_names:
    fcn = name2func[name]
    t = timeit.timeit(fcn + '(A,b)', number=5, globals=globals())
    coeffs = locals()[fcn](A, b)
    df.set_value(name, 'Time', t)
    df.set_value(name, 'Error', regr_metrics(b, A @ coeffs)[0])
```

### SVD 在这里最好

不要重新运行。

```py
df
```


| | Time | Error |
| --- | --- | --- |
| Normal Eqns- Naive | 0.001334339 | 3.598901966 |
| QR Factorization | 0.002166139 | 0.000000000 |
| SVD | 0.001556937 | 0.000000000 |
| Scipy lstsq | 0.001871590 | 0.000000000 |

即使`A`是稀疏的，![A^{-1}](img/tex-1ff4e7c4ea49e4f89fcea2a90968d87f.gif) 通常是密集的。对于大型矩阵，![A^{-1}](img/tex-1ff4e7c4ea49e4f89fcea2a90968d87f.gif) 放不进内存。

## 运行时间

+   矩阵求逆：![2n^3](img/tex-212090d41dd7136dd288934d2aa0addd.gif)
+   矩阵乘法：![n^3](img/tex-25ebb7d03839869698867bbbf0a9932a.gif)
+   Cholesky：![1/3 n^3](img/tex-e2c7c210aaacf3c6067aab0b96f6caa6.gif)
+   QR，Gram Schmidt：![2mn^2](img/tex-8104a2c0754d95be6fc08c624a869f51.gif)，![m \ge n](img/tex-9856f83106c82c34de1010bf61297500.gif)（Trefethen 第 8 章）
+   QR，Householder：2![mn^2-2/3 n^3](img/tex-b6515bc4c1610897c8eb15e9a7c41f8a.gif)（Trefethen 第 10 章）
+   求解三角形系统：![n^2](img/tex-6595d679e306a127a3fe53268bcaddb2.gif)

为什么 Cholesky 较快：

![](img/cholesky_factorization_speed.png)

> 来源：[斯坦福凸优化：数值线性代数背景幻灯片](http://stanford.edu/class/ee364a/lectures/num-lin-alg.pdf)

### QR 最优的一个案例

```py
m=100
n=15
t=np.linspace(0, 1, m)

# 范德蒙矩阵
A=np.stack([t**i for i in range(n)], 1)
b=np.exp(np.sin(4*t))

# 这将使解决方案标准化为 1
b /= 2006.787453080206

from matplotlib import pyplot as plt
%matplotlib inline

plt.plot(t, b)

# [<matplotlib.lines.Line2D at 0x7fdfc1fa7eb8>]
```

![](img/8-1.png)

检查我们得到了 1：

```py
1 - ls_qr(A, b)[14]

# 1.4137685733217609e-07
```

不好的条件数：

```py
kappa = np.linalg.cond(A); kappa

# 5.827807196683593e+17

row_names = ['Normal Eqns- Naive',
             'QR Factorization', 
             'SVD', 
             'Scipy lstsq']

name2func = {'Normal Eqns- Naive': 'ls_naive', 
             'QR Factorization': 'ls_qr',
             'SVD': 'ls_svd',
             'Scipy lstsq': 'scipylstq'}

pd.options.display.float_format = '{:,.9f}'.format
df = pd.DataFrame(index=row_names, columns=['Time', 'Error'])

for name in row_names:
    fcn = name2func[name]
    t = timeit.timeit(fcn + '(A,b)', number=5, globals=globals())
    coeffs = locals()[fcn](A, b)
    df.set_value(name, 'Time', t)
    df.set_value(name, 'Error', np.abs(1 - coeffs[-1]))

df
```


| | Time | Error |
| --- | --- | --- |
| Normal Eqns- Naive | 0.001565099 | 1.357066025 |
| QR Factorization | 0.002632104 | 0.000000116 |
| SVD | 0.003503785 | 0.000000116 |
| Scipy lstsq | 0.002763502 | 0.000000116 |

通过正规方程求解最小二乘的解决方案通常是不稳定的，尽管对于小条件数的问题是稳定的。

### 低秩

```py
m = 100
n = 10
x = np.random.uniform(-10,10,n)
A2 = np.random.uniform(-40,40, [m, int(n/2)])   # removed np.asfortranarray
A = np.hstack([A2, A2])

A.shape, A2.shape

# ((100, 10), (100, 5))

b = A @ x + np.random.normal(0,1,m)

row_names = ['Normal Eqns- Naive',
             'QR Factorization', 
             'SVD', 
             'Scipy lstsq']

name2func = {'Normal Eqns- Naive': 'ls_naive', 
             'QR Factorization': 'ls_qr',
             'SVD': 'ls_svd',
             'Scipy lstsq': 'scipylstq'}

pd.options.display.float_format = '{:,.9f}'.format
df = pd.DataFrame(index=row_names, columns=['Time', 'Error'])

for name in row_names:
    fcn = name2func[name]
    t = timeit.timeit(fcn + '(A,b)', number=5, globals=globals())
    coeffs = locals()[fcn](A, b)
    df.set_value(name, 'Time', t)
    df.set_value(name, 'Error', regr_metrics(b, A @ coeffs)[0])

df
```

|  | Time | Error |
| --- | --- | --- |
| Normal Eqns- Naive | 0.001227640 | 300.658979382 |
| QR Factorization | 0.002315920 | 0.876019803 |
| SVD | 0.001745647 | 1.584746056 |
| Scipy lstsq | 0.002067989 | 0.804750398 |

## 比较

比较我们的结果和上面：

```py
df
```

| # rows | 100 | | | 1000 | | | 10000 | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| # cols | 20 | 100 | 1000 | 20 | 100 | 1000 | 20 | 100 | 1000 |
| Normal Eqns- Naive | 0.001276 | 0.003634 | NaN | 0.000960 | 0.005172 | 0.293126 | 0.002226 | 0.021248 | 1.164655 |
| Normal Eqns- Cholesky | 0.001660 | 0.003958 | NaN | 0.001665 | 0.004007 | 0.093696 | 0.001928 | 0.010456 | 0.399464 |
| QR Factorization | 0.002174 | 0.006486 | NaN | 0.004235 | 0.017773 | 0.213232 | 0.019229 | 0.116122 | 2.208129 |
| SVD | 0.003880 | 0.021737 | NaN | 0.004672 | 0.026950 | 1.280490 | 0.018138 | 0.130652 | 3.433003 |
| Scipy lstsq | 0.004338 | 0.020198 | NaN | 0.004320 | 0.021199 | 1.083804 | 0.012200 | 0.088467 | 2.134780 |

来自 Trefethen（第 84 页）：

正规方程式/ Cholesky 在生效时速度最快。 Cholesky 只能用于对称正定矩阵。 此外，对于具有高条件数或具有低秩的矩阵，正规方程/ Cholesky 是不稳定的。

数值分析师推荐通过 QR 进行线性回归，作为多年的标准方法。 它自然，优雅，适合“日常使用”。