# 6\. Statistics and Linear Algebra Preliminaries
Chinese proverb
**知彼知己,百战不殆;不知彼而知己,一胜一负;不知彼,不知己,每战必殆。** – 《孙子兵法》
**If you know both yourself and your enemy, you will never be endangered in a hundred battles. If you only know yourself, but not your opponent, you may win or may lose. If you know neither yourself nor your enemy, you will always endanger yourself** – idiom, from Sunzi’s Art of War
## 6.1\. Notations
* m: the number of samples.
* n: the number of features.
* ![y_i](img/8f58cf98a539286a53e41582f194fbed.jpg): the i-th label.
* ![\hat{y}_i](img/585d98b9749f0661bc9077e01f28eb15.jpg): the i-th predicted label.
* ![\y](img/afa87c5126806e604709f243ab72848b.jpg): the label vector.
* ![\hat{\y}](img/bab25b7785bf747bc1caa1442874df74.jpg): the predicted label vector.
* ![{\displaystyle {\bar {\y}}} = {\frac {1}{m}}\sum _{i=1}^{m}y_{i}](img/791424a3e5f6e2f4372471d96e5b4676.jpg): the mean of ![\y](img/afa87c5126806e604709f243ab72848b.jpg).
## 6.2\. Linear Algebra Preliminaries
Since I have already documented the linear algebra preliminaries in my Prelim Exam notes for Numerical Analysis, the interested reader is referred to [[Feng2014]](reference.html#feng2014) for more details (see Figure [Linear Algebra Preliminaries](#fig-linear-algebra)).
![https://runawayhorse001.github.io/LearningApacheSpark/_images/linear_algebra.png](img/c089ca6ef2f36b0394d7bcf41db78030.jpg)
Linear Algebra Preliminaries
## 6.3\. Measurement Formula
### 6.3.1\. Mean Absolute Error
In statistics, **MAE** ([Mean absolute error](https://en.wikipedia.org/wiki/Mean_absolute_error)) is a measure of the difference between two continuous variables. Applied to the labels and the predicted labels, the mean absolute error is given by:
![{\displaystyle \mathrm {MAE} ={\frac{1}{m} {\sum _{i=1}^{m}\left|\hat{y}_i-y_i\right|}}.}](img/61bccf1d55cc6636fce9585573c9981a.jpg)
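A minimal plain-Python sketch of the MAE formula, using only the standard library. The function name `mean_absolute_error` and the sample labels are illustrative assumptions, not part of the original text. In a Spark pipeline the same metric should also be available through `pyspark.ml.evaluation.RegressionEvaluator` with `metricName="mae"`.

```python
def mean_absolute_error(y, y_hat):
    """MAE = (1/m) * sum_i |y_hat_i - y_i| for equal-length label lists."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and have the same length")
    m = len(y)
    return sum(abs(p - t) for t, p in zip(y, y_hat)) / m


# Hypothetical true and predicted labels
y = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(y, y_hat))  # 0.5
```

The absolute errors are 0.5, 0.5, 0 and 1, so their mean is 2.0 / 4 = 0.5.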
### 6.3.2\. Mean Squared Error
In statistics, the **MSE** ([Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error)) of an estimator (a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, of the differences between the estimator and what is estimated. For predicted labels it is:
![\text{MSE}=\frac{1}{m}\sum_{i=1}^m\left( \hat{y}_i-y_i\right)^2](img/3152173a8fd696819c7a2c2b8c6ef005.jpg)
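A plain-Python sketch of the MSE formula on the same hypothetical labels used for MAE; the name `mean_squared_error` is an illustrative choice, not an API from the original text.

```python
def mean_squared_error(y, y_hat):
    """MSE = (1/m) * sum_i (y_hat_i - y_i)**2 for equal-length label lists."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and have the same length")
    return sum((p - t) ** 2 for t, p in zip(y, y_hat)) / len(y)


# Hypothetical true and predicted labels
y = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y, y_hat))  # 0.375
```

The squared errors are 0.25, 0.25, 0 and 1, so the single error of size 1 contributes two thirds of the total: squaring makes MSE weight large errors more heavily than MAE does.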
### 6.3.3\. Root Mean Squared Error
![\text{RMSE} = \sqrt{\text{MSE}}=\sqrt{\frac{1}{m}\sum_{i=1}^m\left( \hat{y}_i-y_i\right)^2}](img/c8a2ccec457f128649ad30a2ba066a48.jpg)
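Taking the square root of the MSE brings the error back to the units of the labels. A minimal sketch, with the hypothetical function name `root_mean_squared_error` and the same illustrative data as above:

```python
import math


def root_mean_squared_error(y, y_hat):
    """RMSE = sqrt(MSE); expressed in the same units as the labels."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and have the same length")
    mse = sum((p - t) ** 2 for t, p in zip(y, y_hat)) / len(y)
    return math.sqrt(mse)


# Hypothetical true and predicted labels
y = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5, 0.0, 2.0, 8.0]
print(round(root_mean_squared_error(y, y_hat), 4))  # 0.6124
```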
### 6.3.4\. Total Sum of Squares
In statistical data analysis, the **TSS** ([Total Sum of Squares](https://en.wikipedia.org/wiki/Total_sum_of_squares)) is a quantity that appears as part of a standard way of presenting the results of such analyses. It is defined as the sum, over all observations, of the squared differences between each observation and the overall mean:
![\text{TSS} = \sum_{i=1}^m\left( y_i-\bar{\y}\right)^2](img/16fd7a4c078cf22fee09b636dc10d55c.jpg)
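A small sketch of the TSS formula on hypothetical labels; `total_sum_of_squares` is an illustrative name. It also checks the identity TSS = m times the population variance of the labels, using the standard-library `statistics.pvariance`.

```python
import statistics


def total_sum_of_squares(y):
    """TSS = sum_i (y_i - y_bar)**2, where y_bar is the mean of y."""
    y_bar = sum(y) / len(y)
    return sum((t - y_bar) ** 2 for t in y)


y = [2.0, 4.0, 5.0, 4.0]  # hypothetical labels
print(total_sum_of_squares(y))           # 4.75
# TSS equals m times the population variance of y
print(len(y) * statistics.pvariance(y))  # 4.75
```

Note that TSS depends only on the labels, not on any model.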
### 6.3.5\. Explained Sum of Squares
In statistics, the **ESS** ([Explained sum of squares](https://en.wikipedia.org/wiki/Explained_sum_of_squares)) is also known as the model sum of squares or the sum of squares due to regression.
The ESS is the sum of the squared differences between the predicted values and the mean of the response variable, given by:
![\text{ESS}= \sum_{i=1}^m\left( \hat{y}_i-\bar{\y}\right)^2](img/8dc8e70e19ec4318b12b16f1c5bdb879.jpg)
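A sketch of the ESS formula. The data are hypothetical: labels observed at x = 1, 2, 3, 4 and the predictions of the least-squares line ŷ = 2.0 + 0.7x fitted to them. Note that ȳ is the mean of the observed labels, not of the predictions.

```python
def explained_sum_of_squares(y, y_hat):
    """ESS = sum_i (y_hat_i - y_bar)**2, with y_bar the mean of the TRUE labels y."""
    y_bar = sum(y) / len(y)
    return sum((p - y_bar) ** 2 for p in y_hat)


y = [2.0, 4.0, 5.0, 4.0]      # hypothetical labels observed at x = 1, 2, 3, 4
y_hat = [2.7, 3.4, 4.1, 4.8]  # least-squares line y_hat = 2.0 + 0.7 * x
print(round(explained_sum_of_squares(y, y_hat), 4))  # 2.45
```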
### 6.3.6\. Residual Sum of Squares
In statistics, the **RSS** ([Residual sum of squares](https://en.wikipedia.org/wiki/Residual_sum_of_squares)), also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE), is the sum of the squared residuals and is given by:
![\text{RSS}= \sum_{i=1}^m\left( \hat{y}_i-y_i\right)^2](img/95594348fc6d49d2819be3d412a27e55.jpg)
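A sketch of the RSS formula on the same hypothetical least-squares example. Comparing with the MSE formula, RSS is simply m times the MSE.

```python
def residual_sum_of_squares(y, y_hat):
    """RSS = sum_i (y_hat_i - y_i)**2, i.e. m * MSE."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    return sum((p - t) ** 2 for t, p in zip(y, y_hat))


y = [2.0, 4.0, 5.0, 4.0]      # hypothetical labels observed at x = 1, 2, 3, 4
y_hat = [2.7, 3.4, 4.1, 4.8]  # least-squares line y_hat = 2.0 + 0.7 * x
print(round(residual_sum_of_squares(y, y_hat), 4))  # 2.3
```

The residuals are -0.7, 0.6, 0.9 and -0.8, whose squares sum to 2.3.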
### 6.3.7\. Coefficient of Determination ![R^2](img/1ac835166928f502b55a31636602602a.jpg)
![R^{2} := \frac{ESS}{TSS} = 1-{\text{RSS} \over \text{TSS}}.\,](img/fef76f108c095f250d8e9efb4cfcb710.jpg)
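The two expressions ESS/TSS and 1 - RSS/TSS agree only when TSS = ESS + RSS (the condition discussed in the note below). The sketch computes both on hypothetical data, once for a least-squares fit with an intercept and once for an arbitrary constant predictor; `r_squared` is an illustrative name.

```python
def r_squared(y, y_hat):
    """Return (ESS / TSS, 1 - RSS / TSS) for true labels y and predictions y_hat."""
    m = len(y)
    y_bar = sum(y) / m
    tss = sum((t - y_bar) ** 2 for t in y)
    ess = sum((p - y_bar) ** 2 for p in y_hat)
    rss = sum((p - t) ** 2 for t, p in zip(y, y_hat))
    return ess / tss, 1 - rss / tss


y = [2.0, 4.0, 5.0, 4.0]
# Least-squares line y_hat = 2.0 + 0.7 * x for x = 1, 2, 3, 4: both forms agree
ols_fit = [2.7, 3.4, 4.1, 4.8]
print([round(v, 4) for v in r_squared(y, ols_fit)])   # [0.5158, 0.5158]

# An arbitrary predictor (not a least-squares fit): the two forms disagree
constant = [3.0, 3.0, 3.0, 3.0]
print([round(v, 4) for v in r_squared(y, constant)])  # [0.4737, -0.4737]
```

In practice, libraries usually report the 1 - RSS/TSS form, which can become negative for a predictor that does worse than always predicting the mean.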
Note
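The total sum of squares does not always equal the explained sum of squares plus the residual sum of squares. For a least-squares fit, the decomposition holds exactly when ![\y^{T}{\bar {\y}}={\hat {\y}}^{T}{\bar {\y}}](img/b288f19072faa2f8f373d5a8910c080b.jpg) (for example, whenever the model includes an intercept), i.e.: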
![\text{TSS} = \text{ESS} + \text{RSS} \text{ if and only if } {\displaystyle \y^{T}{\bar {\y}}={\hat {\y}}^{T}{\bar {\y}}}.](img/4a1a112aa8490f7c8410b710845e8c7a.jpg)
More details can be found in [Partitioning in the general ordinary least squares model](https://en.wikipedia.org/wiki/Explained_sum_of_squares).
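The condition can be checked numerically. The sketch below fits a straight line by ordinary least squares to hypothetical data, once with an intercept and once through the origin, and tests both the condition (which, since ȳ ≠ 0 here, reduces to the predictions summing to the same total as the labels) and the decomposition TSS = ESS + RSS. The helper names `fit_line` and `sums_of_squares` are illustrative assumptions.

```python
import math


def fit_line(x, y, intercept=True):
    """Ordinary least squares for y ~ a + b*x (or y ~ b*x if intercept=False)."""
    m = len(x)
    if intercept:
        x_bar, y_bar = sum(x) / m, sum(y) / m
        b = (sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
             / sum((u - x_bar) ** 2 for u in x))
        a = y_bar - b * x_bar
    else:
        a = 0.0
        b = sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)
    return [a + b * u for u in x]


def sums_of_squares(y, y_hat):
    """Return (TSS, ESS, RSS) using the definitions of this section."""
    y_bar = sum(y) / len(y)
    tss = sum((t - y_bar) ** 2 for t in y)
    ess = sum((p - y_bar) ** 2 for p in y_hat)
    rss = sum((p - t) ** 2 for t, p in zip(y, y_hat))
    return tss, ess, rss


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
y = [2.0, 4.0, 5.0, 4.0]
for intercept in (True, False):
    y_hat = fit_line(x, y, intercept)
    tss, ess, rss = sums_of_squares(y, y_hat)
    same_sum = math.isclose(sum(y), sum(y_hat))  # y^T y_bar == y_hat^T y_bar
    additive = math.isclose(tss, ess + rss)      # TSS == ESS + RSS
    print(intercept, same_sum, additive)
# True True True
# False False False
```

Without the intercept the fitted values sum to about 13.67 instead of 15, and ESS + RSS comes to 14.75 while TSS is 4.75.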
## 6.4\. Confusion Matrix
![https://runawayhorse001.github.io/LearningApacheSpark/_images/confusion_matrix.png](img/c789e9bbaa3506dc90047b5cd487a42a.jpg)
Confusion Matrix
### 6.4.1\. Recall
![\text{Recall}=\frac{\text{TP}}{\text{TP+FN}}](img/3f26c9365c0603f014f3bba403ed27fb.jpg)
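A sketch that counts the four confusion-matrix cells from binary labels and then computes recall, i.e. the fraction of actual positives that the model found. The function names, the choice of `1` as the positive class, and the ten sample labels are all illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary problem; `positive` marks the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return tp / (tp + fn)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical labels
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)   # 3 2 1 4
print(recall(tp, fn))   # 0.75
```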
### 6.4.2\. Precision
![\text{Precision}=\frac{\text{TP}}{\text{TP+FP}}](img/1a8a8647a66b744ccd5c9137adb66255.jpg)
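Precision is the fraction of predicted positives that are truly positive. A minimal sketch, using hypothetical counts TP = 3 and FP = 2 (a 10-sample example with TP = 3, FP = 2, FN = 1, TN = 4):

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return tp / (tp + fp)


# Hypothetical counts: 3 true positives, 2 false positives
print(precision(tp=3, fp=2))  # 0.6
```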
### 6.4.3\. Accuracy
![\text{Accuracy }=\frac{\text{TP+TN}}{\text{Total}}](img/5a13655c0030372e1b06cd77ff1e53e0.jpg)
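Here Total is TP + TN + FP + FN. The sketch below uses the same hypothetical counts, plus a second, deliberately imbalanced case (95 negatives, 5 positives, a model that always predicts negative) to show why accuracy alone can be misleading.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / Total, with Total = TP + TN + FP + FN."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty confusion matrix")
    return (tp + tn) / total


print(accuracy(tp=3, tn=4, fp=2, fn=1))   # 0.7
# Hypothetical imbalanced data: always predicting "negative" misses every
# positive (recall 0) yet still scores a high accuracy.
print(accuracy(tp=0, tn=95, fp=0, fn=5))  # 0.95
```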
### 6.4.4\. ![F_1](img/baa636adac3ad30302c0a36fc2f58751.jpg)-score
![\text{F}_1=\frac{2*\text{Recall}*\text{Precision}}{\text{Recall}+ \text{Precision}}](img/1cef776388e6c2cba3cf00cab2199e3d.jpg)
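F1 is the harmonic mean of recall and precision. The sketch uses the hypothetical values recall = 3/4 and precision = 3/5 from the counts TP = 3, FP = 2, FN = 1, and checks the equivalent count form 2TP / (2TP + FP + FN). Returning 0.0 when both inputs are zero is an assumed convention, not something the formula defines.

```python
def f1_score(recall, precision):
    """F1 = 2 * recall * precision / (recall + precision)."""
    if recall + precision == 0:
        return 0.0  # assumed convention; the formula itself is 0/0 here
    return 2 * recall * precision / (recall + precision)


r, p = 3 / 4, 3 / 5
print(round(f1_score(r, p), 4))            # 0.6667
# Equivalent count form: 2TP / (2TP + FP + FN) = 6 / 9
print(round(2 * 3 / (2 * 3 + 2 + 1), 4))   # 0.6667
```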
## 6.5\. Statistical Tests
### 6.5.1\. Correlational Test
* Pearson correlation: Tests for the strength of the linear association between two continuous variables.
* Spearman correlation: Tests for the strength of the monotonic association between two ordinal variables (does not rely on the assumption of normally distributed data).
* Chi-square: Tests for the strength of the association between two categorical variables.
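A plain-Python sketch of the Pearson and Spearman coefficients, with Spearman computed as Pearson's r on (average) ranks. It returns only the coefficients, not p-values; for the actual tests one would more likely use a library such as SciPy (`scipy.stats.pearsonr`, `scipy.stats.spearmanr`, `scipy.stats.chi2_contingency`) or, in Spark, `pyspark.ml.stat.Correlation` and `pyspark.ml.stat.ChiSquareTest`. The function names and data below are illustrative.

```python
import math


def pearson(x, y):
    """Pearson's r: co-deviation divided by the product of the deviations' norms."""
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotonic but not linear in x
print(round(pearson(x, y), 4))   # 0.9811
print(round(spearman(x, y), 4))  # 1.0
```

The example shows the difference in what the two coefficients measure: the relationship is perfectly monotonic (Spearman = 1) but not perfectly linear (Pearson < 1).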
### 6.5.2\. Comparison of Means Tests
* Paired t-test: Tests for a difference in means between two related (paired) variables.
* Independent t-test: Tests for a difference in means between two independent variables.
* ANOVA: Tests the difference between group means after any other variance in the outcome variable is accounted for.
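A sketch of the paired t-test statistic t = mean(d) / (s_d / √n), where d are the paired differences, s_d their sample standard deviation, and the test has n - 1 degrees of freedom. It stops at the statistic. Turning it into a p-value needs the t distribution, for which a library such as SciPy (`scipy.stats.ttest_rel`, and `ttest_ind` / `f_oneway` for the independent t-test and ANOVA) would typically be used. The function name and measurements are hypothetical.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, df) with t = mean(d) / (stdev(d) / sqrt(n)), d = after - before."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    # statistics.stdev is the sample (n - 1) standard deviation
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


before = [10, 12, 9, 11, 13]  # hypothetical measurements on the same 5 subjects
after = [12, 13, 11, 11, 15]
t, df = paired_t_statistic(before, after)
print(round(t, 4), df)  # 3.5 4
```

The differences are 2, 1, 2, 0, 2 with mean 1.4 and standard error 0.4, giving t = 3.5.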
### 6.5.3\. Non-parametric Test
* Wilcoxon rank-sum test: Tests for a difference between two independent variables; takes into account the magnitude and direction of the difference.
* Wilcoxon signed-rank test: Tests for a difference between two related variables; takes into account the magnitude and direction of the difference.
* Sign test: Tests whether two related variables differ; ignores the magnitude of the change and only takes its direction into account.
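Because the sign test uses only directions, it reduces to a binomial test: under the null hypothesis each non-zero difference is positive with probability 1/2. The sketch computes an exact two-sided p-value with `math.comb`, dropping ties (zero differences), which is one common convention. The function name and data are hypothetical. For the two Wilcoxon tests, a library routine such as SciPy's `scipy.stats.ranksums` or `scipy.stats.wilcoxon` would be the usual choice.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test on paired data; zero differences are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)  # number of positive changes
    # Under H0, k ~ Binomial(n, 1/2); double the smaller tail and cap at 1.
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return k, n, min(p, 1.0)


before = [10, 12, 9, 11, 13, 14, 8, 10]  # hypothetical paired measurements
after = [12, 13, 11, 11, 15, 16, 9, 12]
print(sign_test(before, after))  # (7, 7, 0.015625)
```

One pair is unchanged and is dropped; all 7 remaining changes are increases, so p = 2 × (1/2)^7 = 0.015625, regardless of how large the increases were.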