6

37af5e44 · wizardforcel · 7c6b5239 · 37af5e44
隐藏空白更改
内联并排

Showing with 50 addition and 52 deletion

docs/6.md docs/6.md +50 -52

未找到文件。
--- a/docs/6.md
+++ b/docs/6.md
-# 6\. Statistics and Linear Algebra Preliminaries
+# 6\. 统计与线性代数预备

-Chinese proverb
+**知彼知己，百战不殆；不知彼而知己，一胜一负；不知彼，不知己，每战必殆。** – 《孙子兵法》

-**If you only know yourself, but not your opponent, you may win or may lose. If you know neither yourself nor your enemy, you will always endanger yourself** – idiom, from Sunzi’s Art of War
+## 6.1\. 表示法

-## 6.1\. Notations
+*   m：样本数
+*   n：特征数
+*   ![y_i](img/8f58cf98a539286a53e41582f194fbed.jpg)：第`i`个标签
+*   ![\hat{y}_i](img/585d98b9749f0661bc9077e01f28eb15.jpg)：第`i`个预测标签
+*   ![{\displaystyle {\bar {\y}}} = {\frac {1}{m}}\sum _{i=1}^{m}y_{i}](img/791424a3e5f6e2f4372471d96e5b4676.jpg)：![\y](img/afa87c5126806e604709f243ab72848b.jpg) 的均值
+*   ![\y](img/afa87c5126806e604709f243ab72848b.jpg)：标签向量
+*   ![\hat{\y}](img/bab25b7785bf747bc1caa1442874df74.jpg)：预测标签向量

-*   m : the number of the samples
-*   n : the number of the features
-*   ![y_i](img/8f58cf98a539286a53e41582f194fbed.jpg) : i-th label
-*   ![\hat{y}_i](img/585d98b9749f0661bc9077e01f28eb15.jpg) : i-th predicted label
-*   ![{\displaystyle {\bar {\y}}} = {\frac {1}{m}}\sum _{i=1}^{m}y_{i}](img/791424a3e5f6e2f4372471d96e5b4676.jpg) : the mean of ![\y](img/afa87c5126806e604709f243ab72848b.jpg).
-*   ![\y](img/afa87c5126806e604709f243ab72848b.jpg) : the label vector.
-*   ![\hat{\y}](img/bab25b7785bf747bc1caa1442874df74.jpg) : the predicted label vector.
+## 6.2\. 线性代数预备

-## 6.2\. Linear Algebra Preliminaries
-
-Since I have documented the Linear Algebra Preliminaries in my Prelim Exam note for Numerical Analysis, the interested reader is referred to [[Feng2014]](reference.html#feng2014) for more details (Figure. [Linear Algebra Preliminaries](#fig-linear-algebra)).
+由于我在我的数值分析考试笔记中记录了线性代数预备，有兴趣的读者可以参考 [[Feng2014]](reference.html#feng2014)了解更多细节。

 ![https://runawayhorse001.github.io/LearningApacheSpark/_images/linear_algebra.png](img/c089ca6ef2f36b0394d7bcf41db78030.jpg)

-Linear Algebra Preliminaries
+线性代数预备

-## 6.3\. Measurement Formula
+## 6.3\. 测量公式

-### 6.3.1\. Mean absolute error
+### 6.3.1\. 平均绝对误差

-In statistics, **MAE** ([Mean absolute error](https://en.wikipedia.org/wiki/Mean_absolute_error)) is a measure of difference between two continuous variables. The Mean Absolute Error is given by:
+在统计学中，**MAE**（[平均绝对误差](https://en.wikipedia.org/wiki/Mean_absolute_error)）衡量两个连续变量间的差异。 平均绝对误差由下式给出：

 ![{\displaystyle \mathrm {MAE} ={\frac{1}{m} {\sum _{i=1}^{m}\left|\hat{y}_i-y_i\right|}}.}](img/61bccf1d55cc6636fce9585573c9981a.jpg)

-### 6.3.2\. Mean squared error
+### 6.3.2\. 均方误差

-In statistics, the **MSE** ([Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error)) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors or deviations—that is, the difference between the estimator and what is estimated.
+在统计中，估计器（估计未观测量的过程）的 **MSE**（[均方误差](https://en.wikipedia.org/wiki/Mean_squared_error)）测量了误差或偏差的平方的平均值 - 即估计器与被估计值之间的差异。

 ![\text{MSE}=\frac{1}{m}\sum_{i=1}^m\left( \hat{y}_i-y_i\right)^2](img/3152173a8fd696819c7a2c2b8c6ef005.jpg)

-### 6.3.3\. Root Mean squared error
+### 6.3.3\. 均方根误差

 ![\text{RMSE} = \sqrt{\text{MSE}}=\sqrt{\frac{1}{m}\sum_{i=1}^m\left( \hat{y}_i-y_i\right)^2}](img/c8a2ccec457f128649ad30a2ba066a48.jpg)

-### 6.3.4\. Total sum of squares
+### 6.3.4\. 总体平方和

-In statistical data analysis the **TSS** ([Total Sum of Squares](https://en.wikipedia.org/wiki/Total_sum_of_squares)) is a quantity that appears as part of a standard way of presenting results of such analyses. It is defined as being the sum, over all observations, of the squared differences of each observation from the overall mean.
+在统计数据分析中，**TSS**（[总体平方和](https://en.wikipedia.org/wiki/Total_sum_of_squares)）是一个数量，作为呈现此类分析结果的标准方式的一部分。 它被定义为在所有观察中，每个观测值与总体平均值的平方差的总和。

 ![\text{TSS} =  \sum_{i=1}^m\left( y_i-\bar{\y}\right)^2](img/16fd7a4c078cf22fee09b636dc10d55c.jpg)

-### 6.3.5\. Explained Sum of Squares
+### 6.3.5\. 解释平方和

-In statistics, the **ESS** ([Explained sum of squares](https://en.wikipedia.org/wiki/Explained_sum_of_squares)), alternatively known as the model sum of squares or sum of squares due to regression.
+在统计学中，**ESS**（[解释平方和](https://en.wikipedia.org/wiki/Explained_sum_of_squares)），或者称为模型平方和或回归平方和。

-The ESS is the sum of the squares of the differences of the predicted values and the mean value of the response variable which is given by:
+ESS 是预测值和响应变量的均值的差的平方和，由下式给出：

 ![\text{ESS}= \sum_{i=1}^m\left( \hat{y}_i-\bar{\y}\right)^2](img/8dc8e70e19ec4318b12b16f1c5bdb879.jpg)

-### 6.3.6\. Residual Sum of Squares
+### 6.3.6\. 残差平方和

-In statistics, **RSS** ([Residual sum of squares](https://en.wikipedia.org/wiki/Residual_sum_of_squares)), also known as the sum of squared residuals (SSR) or the sum of squared errors of prediction (SSE), is the sum of the squares of residuals which is given by:
+在统计中，**RSS/SSR**（[残差平方和](https://en.wikipedia.org/wiki/Residual_sum_of_squares)），也称为预测误差平方和 预测（SSE），由下式给出：

 ![\text{RSS}= \sum_{i=1}^m\left( \hat{y}_i-y_i\right)^2](img/95594348fc6d49d2819be3d412a27e55.jpg)

-### 6.3.7\. Coefficient of determination ![R^2](img/1ac835166928f502b55a31636602602a.jpg)
+### 6.3.7\. 判定系数 ![R^2](img/1ac835166928f502b55a31636602602a.jpg)

 ![R^{2} := \frac{ESS}{TSS} = 1-{\text{RSS} \over \text{TSS}}.\,](img/fef76f108c095f250d8e9efb4cfcb710.jpg)

-Note
-
-In general (![\y^{T}{\bar {\y}}={\hat {\y}}^{T}{\bar {\y}}](img/b288f19072faa2f8f373d5a8910c080b.jpg)), total sum of squares = explained sum of squares + residual sum of squares, i.e.:
+> 注意
+> 
+> 一般来说，(![\y^{T}{\bar {\y}}={\hat {\y}}^{T}{\bar {\y}}](img/b288f19072faa2f8f373d5a8910c080b.jpg))，总体平方和，等于解释平方和加上残差平方和，也就是：

 ![\text{TSS} = \text{ESS} + \text{RSS} \text{ if and only if } {\displaystyle \y^{T}{\bar {\y}}={\hat {\y}}^{T}{\bar {\y}}}.](img/4a1a112aa8490f7c8410b710845e8c7a.jpg)

-More details can be found at [Partitioning in the general ordinary least squares model](https://en.wikipedia.org/wiki/Explained_sum_of_squares).
+更多细节可以在[普通最小二乘模型中的分区](https://en.wikipedia.org/wiki/Explained_sum_of_squares)中找到。

-## 6.4\. Confusion Matrix
+## 6.4\. 混淆矩阵

 ![https://runawayhorse001.github.io/LearningApacheSpark/_images/confusion_matrix.png](img/c789e9bbaa3506dc90047b5cd487a42a.jpg)

-Confusion Matrix
+混淆矩阵

-### 6.4.1\. Recall
+### 6.4.1\. 召回率

 ![\text{Recall}=\frac{\text{TP}}{\text{TP+FN}}](img/3f26c9365c0603f014f3bba403ed27fb.jpg)

-### 6.4.2\. Precision
+### 6.4.2\. 精确率

 ![\text{Precision}=\frac{\text{TP}}{\text{TP+FP}}](img/1a8a8647a66b744ccd5c9137adb66255.jpg)

-### 6.4.3\. Accuracy
+### 6.4.3\. 准确率

 ![\text{Accuracy }=\frac{\text{TP+TN}}{\text{Total}}](img/5a13655c0030372e1b06cd77ff1e53e0.jpg)

-### 6.4.4\. ![F_1](img/baa636adac3ad30302c0a36fc2f58751.jpg)-score
+### 6.4.4\. F1 得分

 ![\text{F}_1=\frac{2*\text{Recall}*\text{Precision}}{\text{Recall}+ \text{Precision}}](img/1cef776388e6c2cba3cf00cab2199e3d.jpg)

-## 6.5\. Statistical Tests
+## 6.5\. 统计检验

-### 6.5.1\. Correlational Test
+### 6.5.1\. 互相关检验

-*   Pearson correlation: Tests for the strength of the association between two continuous variables.
-*   Spearman correlation: Tests for the strength of the association between two ordinal variables (does not rely on the assumption of normal distributed data).
-*   Chi-square: Tests for the strength of the association between two categorical variables.
+*   Pearson 互相关: 检验两个连续变量之间的相关度。
+*   Spearman 互相关: 检验两个序数变量之间的相关度（不依赖于正态分布数据的假设）。
+*   卡方: 检验两个类别变量之间的相关度。

-### 6.5.2\. Comparison of Means test
+### 6.5.2\. 均值检验的比较

-*   Paired T-test: Tests for difference between two related variables.
-*   Independent T-test: Tests for difference between two independent variables.
-*   ANOVA: Tests the difference between group means after any other variance in the outcome variable is accounted for.
+*   配对 T 检验: 检验两个相关变量之间的差异
+*   独立 T 检验: 检验两个独立变量之间的差异
+*   ANOVA: 在考虑结果变量中的任何其他变化之后，检验组均值之间的差异。

-### 6.5.3\. Non-parametric Test
+### 6.5.3\. 非配对检验

-*   Wilcoxon rank-sum test: Tests for difference between two independent variables - takes into account magnitude and direction of difference.
-*   Wilcoxon sign-rank test: Tests for difference between two related variables - takes into account magnitude and direction of difference.
-*   Sign test: Tests if two related variables are different – ignores magnitude of change, only takes into account direction.
\ No newline at end of file
+*   Wilcoxon 秩和检验: 检验两个独立变量之间的差异 - 考虑差异的大小和方向。
+*   Wilcoxon 符号秩检验: 检验两个相关变量之间的差异 - 考虑差异的大小和方向。
+*   符号检验: 检验两个相关变量是否不同 - 忽略变化大小，仅考虑方向。
\ No newline at end of file