diff --git a/10.md b/10.md index 7a6ff5454eba41d4dca88572ea3c5268f1b2493b..90c7fc9bdf58df36bb76fa8bd52a37f1339d7bef 100644 --- a/10.md +++ b/10.md @@ -656,12 +656,12 @@ P 值是在原假设下,检验统计量等于在数据中观察到的值,或 检验统计量。0.75 与花为紫色的植物的观察比例的距离: -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Btest%20statistic%7D%20%7E%3D%7E%20%7C%5Cmbox%7Bobserved%20proportion%20purple%7D%20-%200.75%7C) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Btest%20statistic%7D%20%7E%3D%7E%20%7C%5Cmbox%7Bobserved%20proportion%20purple%7D%20-%200.75%7C) 样本量较大(929),所以如果孟德尔的模型好,那么观察到的紫色花的比例应该接近 0.75。 如果孟德尔的模型是错误的,则观察到的紫色比例不应该接近0.75,从而使统计值量更大。 因此,在这种情况下,“备选假设的方向”意味着“更大”。 -检验统计量的观测值(四舍五入到小数点后五位)是 ![](https://www.zhihu.com/equation?tex=%7C0.75888%20-%200.75%7C%20%7E%3D%7E%200.00888)。根据定义,P 值是从孟德尔的模型中抽取的样本,产生 0.00888 或更大的统计量的几率。 +检验统计量的观测值(四舍五入到小数点后五位)是 ![](http://latex.codecogs.com/gif.latex?%7C0.75888%20-%200.75%7C%20%7E%3D%7E%200.00888)。根据定义,P 值是从孟德尔的模型中抽取的样本,产生 0.00888 或更大的统计量的几率。 虽然我们还没有学会如何精确地计算这个几率,但我们可以通过模拟来逼近它,这就是我们在前一节中所做的。 以下是该部分的所有相关代码。 @@ -803,7 +803,7 @@ results.where('Random Sample Mean', are.between(12.99, 13.01)).num_rows 备选假设:硬币不均匀。 -假设你的数据基于 400 个硬币的投掷。 你会预计平等的硬币能够在 400 个次投掷中拥有 200 个正面,所以合理的检验统计量就是使用 ![](https://www.zhihu.com/equation?tex=%5Cmbox%7Btest%20statistic%7D%20%7E%3D%7E%20%7C%5Cmbox%7Bnumber%20of%20heads%7D%20-%20200%7C)。 +假设你的数据基于 400 个硬币的投掷。 你会预计平等的硬币能够在 400 个次投掷中拥有 200 个正面,所以合理的检验统计量就是使用 ![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Btest%20statistic%7D%20%7E%3D%7E%20%7C%5Cmbox%7Bnumber%20of%20heads%7D%20-%20200%7C)。 我们可以在均匀的原假设下模拟统计量。 @@ -866,7 +866,7 @@ results.hist(bins = np.arange(0, 45, 5)) ### 技术注解:识别拒绝域 -在上面的硬币投掷的例子中,我们基于 400 次投掷,使用 P 值的 3.5 倍的截断值来测试硬币的平等性。检验统计量是 ![](https://www.zhihu.com/equation?tex=%7C%5Cmbox%7Bnumber%20of%20heads%7D%20-%20200%7C)。我们在平等的原假设下模拟了这个统计量。 +在上面的硬币投掷的例子中,我们基于 400 次投掷,使用 P 值的 3.5 倍的截断值来测试硬币的平等性。检验统计量是 ![](http://latex.codecogs.com/gif.latex?%7C%5Cmbox%7Bnumber%20of%20heads%7D%20-%20200%7C)。我们在平等的原假设下模拟了这个统计量。 由于所有统计数据的前 
3.5%,检验的结论是硬币是不平等的,在下面展示为红色。 diff --git a/11.md b/11.md index 8b5f2609944dd252bcd8a1c141a5ee7d53c8fe0a..4a9d2bdc976f92cec153e8377fb7bc0c1abbdafd 100644 --- a/11.md +++ b/11.md @@ -561,7 +561,7 @@ baby 这种关系的一个简单的衡量标准是出生体重与怀孕天数的比值。`ratios`表包含`baby`的前两列,以及一列`ratios`。 这一列的第一个条目按以下方式计算: -![](https://www.zhihu.com/equation?tex=%5Cfrac%7B120%7E%5Cmbox%7Bounces%7D%7D%7B284%7E%5Cmbox%7Bdays%7D%7D%20%7E%5Capprox%20%7E%200.4225%7E%20%5Cmbox%7Bounces%20per%20day%7D) +![](http://latex.codecogs.com/gif.latex?%5Cfrac%7B120%7E%5Cmbox%7Bounces%7D%7D%7B284%7E%5Cmbox%7Bdays%7D%7D%20%7E%5Capprox%20%7E%200.4225%7E%20%5Cmbox%7Bounces%20per%20day%7D) ```py ratios = baby.select('Birth Weight', 'Gestational Days').with_column( diff --git a/12.md b/12.md index a7e6c78ba1b89a31f5452c3023c1575c740c0137..5a978579f16794de6fda9040b2ef97a69db53676 100644 --- a/12.md +++ b/12.md @@ -80,7 +80,7 @@ np.mean(make_array(True, True, True, False)) 为了了解它,请注意,平均值可以用不同的方式计算。 -![](https://www.zhihu.com/equation?tex=%5Cbegin%7Balign*%7D%20%5Cmbox%7Bmean%7D%20%7E%20%26%3D%7E%204.25%20%5C%5C%20%5C%5C%20%26%3D%7E%20%5Cfrac%7B2%20+%203%20+%203%20+%209%7D%7B4%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%202%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%203%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%203%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%209%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%202%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%203%20%5Ccdot%20%5Cfrac%7B2%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%209%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%202%20%5Ccdot%200.25%20%7E%7E%20+%20%7E%7E%203%20%5Ccdot%200.5%20%7E%7E%20+%20%7E%7E%209%20%5Ccdot%200.25%20%5Cend%7Balign*%7D) 
+![](http://latex.codecogs.com/gif.latex?%5Cbegin%7Balign*%7D%20%5Cmbox%7Bmean%7D%20%7E%20%26%3D%7E%204.25%20%5C%5C%20%5C%5C%20%26%3D%7E%20%5Cfrac%7B2%20+%203%20+%203%20+%209%7D%7B4%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%202%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%203%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%203%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%209%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%202%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%203%20%5Ccdot%20%5Cfrac%7B2%7D%7B4%7D%20%7E%7E%20+%20%7E%7E%209%20%5Ccdot%20%5Cfrac%7B1%7D%7B4%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%202%20%5Ccdot%200.25%20%7E%7E%20+%20%7E%7E%203%20%5Ccdot%200.5%20%7E%7E%20+%20%7E%7E%209%20%5Ccdot%200.25%20%5Cend%7Balign*%7D) 最后一个表达式就是一个普遍事实的例子:当我们计算平均值时,集合中的每个不同的值都由它在集合中出现的时间比例加权。 @@ -484,7 +484,7 @@ nba13.sort('Age in 2013').show(3) 俄罗斯数学家切比雪夫(Pafnuty Chebychev,1821-1894)证明了这个结论,使我们的粗略陈述更加精确。 -对于所有列表和所有数字`z`,“均值上下`z`个标准差”范围内的条目比例至少为 ![](https://www.zhihu.com/equation?tex=1%20-%20%5Cfrac%7B1%7D%7Bz%5E2%7D)。 +对于所有列表和所有数字`z`,“均值上下`z`个标准差”范围内的条目比例至少为 ![](http://latex.codecogs.com/gif.latex?1%20-%20%5Cfrac%7B1%7D%7Bz%5E2%7D)。 值得注意的是,结果给出了一个界限,而不是一个确切的数值或近似值。 @@ -508,7 +508,7 @@ nba13.sort('Age in 2013').show(3) 要将一个值转换为标准单位,首先要求出距离平均值有多远,然后将该偏差与标准差比较。 -![](https://www.zhihu.com/equation?tex=z%20%7E%3D%7E%20%5Cfrac%7B%5Cmbox%7Bvalue%20%7D-%5Cmbox%7B%20average%7D%7D%7B%5Cmbox%7BSD%7D%7D) +![](http://latex.codecogs.com/gif.latex?z%20%7E%3D%7E%20%5Cfrac%7B%5Cmbox%7Bvalue%20%7D-%5Cmbox%7B%20average%7D%7D%7B%5Cmbox%7BSD%7D%7D) 我们将会看到,标准单位经常用于数据分析。 所以定义一个函数,将数值的数组转换为标准单位是很有用的。 @@ -632,7 +632,7 @@ plots.xticks(positions); 标准正态曲线的方程令人印象深刻。 但是现在,最好把它看作是变量直方图的平滑轮廓,变量以标准单位测量并具有钟形分布。 -![](https://www.zhihu.com/equation?tex=%5Cphi%28z%29%20%3D%20%7B%5Cfrac%7B1%7D%7B%5Csqrt%7B2%20%5Cpi%7D%7D%7D%20e%5E%7B-%5Cfrac%7B1%7D%7B2%7Dz%5E2%7D%2C%20%7E%7E%20-%5Cinfty%20%3C%20z%20%3C%20%5Cinfty) 
+![](http://latex.codecogs.com/gif.latex?%5Cphi%28z%29%20%3D%20%7B%5Cfrac%7B1%7D%7B%5Csqrt%7B2%20%5Cpi%7D%7D%7D%20e%5E%7B-%5Cfrac%7B1%7D%7B2%7Dz%5E2%7D%2C%20%7E%7E%20-%5Cinfty%20%3C%20z%20%3C%20%5Cinfty) ![](img/12-10.png) @@ -1148,7 +1148,7 @@ sd_comparison.plot('Sample Size n') 固定样本大小。如果样本是从总体中带放回随机抽取的: -![](https://www.zhihu.com/equation?tex=%7B%5Cmbox%7BSD%20of%20all%20possible%20sample%20means%7D%7D%20%7E%3D%7E%20%5Cfrac%7B%5Cmbox%7BPopulation%20SD%7D%7D%7B%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%7D) +![](http://latex.codecogs.com/gif.latex?%7B%5Cmbox%7BSD%20of%20all%20possible%20sample%20means%7D%7D%20%7E%3D%7E%20%5Cfrac%7B%5Cmbox%7BPopulation%20SD%7D%7D%7B%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%7D) 这是所有可能样本均值的标准差。 它大致衡量了样本均值与总体均值的差距。 @@ -1201,11 +1201,11 @@ sd_comparison.plot('Sample Size n') 我们愿意容忍`1% = 0.01`的宽度。因此,使用上一节中开发的公式: -![](https://www.zhihu.com/equation?tex=4%20%5Ctimes%20%5Cfrac%7B%5Cmbox%7BSD%20of%20the%200-1%20population%7D%7D%7B%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%7D%20%7E%20%5Cle%20%7E%200.01) +![](http://latex.codecogs.com/gif.latex?4%20%5Ctimes%20%5Cfrac%7B%5Cmbox%7BSD%20of%20the%200-1%20population%7D%7D%7B%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%7D%20%7E%20%5Cle%20%7E%200.01) 所以: -![](https://www.zhihu.com/equation?tex=%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%20%7E%20%5Cge%20%7E%204%20%5Ctimes%20%5Cfrac%7B%5Cmbox%7BSD%20of%20the%200-1%20population%7D%7D%7B0.01%7D) +![](http://latex.codecogs.com/gif.latex?%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%20%7E%20%5Cge%20%7E%204%20%5Ctimes%20%5Cfrac%7B%5Cmbox%7BSD%20of%20the%200-1%20population%7D%7D%7B0.01%7D) ### 01 集合的标准差 @@ -1265,8 +1265,8 @@ zero_one_sds.scatter("Population Proportion of 1's") ### 样本量 -我们知道了 ![](https://www.zhihu.com/equation?tex=%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%20%7E%20%5Cge%20%7E%204%20%5Ctimes%20%5Cfrac%7B%5Cmbox%7BSD%20of%20the%200-1%20population%7D%7D%7B0.01%7D),并且 01 总体的标准差最大为 0.5,无论总体中 1 的比例。 所以这样是安全的: +我们知道了 
![](http://latex.codecogs.com/gif.latex?%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%20%7E%20%5Cge%20%7E%204%20%5Ctimes%20%5Cfrac%7B%5Cmbox%7BSD%20of%20the%200-1%20population%7D%7D%7B0.01%7D),并且 01 总体的标准差最大为 0.5,无论总体中 1 的比例。 所以这样是安全的: -![](https://www.zhihu.com/equation?tex=%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%20%7E%20%5Cge%20%7E%204%20%5Ctimes%20%5Cfrac%7B0.5%7D%7B0.01%7D%20%7E%3D%7E%20200) +![](http://latex.codecogs.com/gif.latex?%5Csqrt%7B%5Cmbox%7Bsample%20size%7D%7D%20%7E%20%5Cge%20%7E%204%20%5Ctimes%20%5Cfrac%7B0.5%7D%7B0.01%7D%20%7E%3D%7E%20200) 所以样本量应该至少是`200 ^ 2 = 40,000`。 这是一个巨大的样本! 但是,如果你想以较高的置信度确保高精度,不管总体是什么样子,那就是你所需要的。 diff --git a/13.md b/13.md index 54f8a90658acbc1575f89f90ffac098d43d64a77..a2774e5b5939bb7a9bf4693fc9f1ff685c6137f1 100644 --- a/13.md +++ b/13.md @@ -604,17 +604,17 @@ regression_line(0.6) 在回归中,我们使用一个变量(我们称`x`)的值来预测另一个变量的值(我们称之为`y`)。 当变量`x`和`y`以标准单位测量时,基于`x`预测`y`的回归线斜率为`r`并通过原点。 因此,回归线的方程可写为: -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bestimate%20of%20%7Dy%20%7E%3D%7E%20r%20%5Ccdot%20x%20%7E%7E%7E%20%5Cmbox%7Bwhen%20both%20variables%20are%20measured%20in%20standard%20units%7D) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bestimate%20of%20%7Dy%20%7E%3D%7E%20r%20%5Ccdot%20x%20%7E%7E%7E%20%5Cmbox%7Bwhen%20both%20variables%20are%20measured%20in%20standard%20units%7D) 在数据的原始单位下,就变成了: -![](https://www.zhihu.com/equation?tex=%5Cfrac%7B%5Cmbox%7Bestimate%20of%7D%7Ey%20%7E-%7E%5Cmbox%7Baverage%20of%7D%7Ey%7D%7B%5Cmbox%7BSD%20of%7D%7Ey%7D%20%7E%3D%7E%20r%20%5Ctimes%20%5Cfrac%7B%5Cmbox%7Bthe%20given%7D%7Ex%20%7E-%7E%5Cmbox%7Baverage%20of%7D%7Ex%7D%7B%5Cmbox%7BSD%20of%7D%7Ex%7D) +![](http://latex.codecogs.com/gif.latex?%5Cfrac%7B%5Cmbox%7Bestimate%20of%7D%7Ey%20%7E-%7E%5Cmbox%7Baverage%20of%7D%7Ey%7D%7B%5Cmbox%7BSD%20of%7D%7Ey%7D%20%7E%3D%7E%20r%20%5Ctimes%20%5Cfrac%7B%5Cmbox%7Bthe%20given%7D%7Ex%20%7E-%7E%5Cmbox%7Baverage%20of%7D%7Ex%7D%7B%5Cmbox%7BSD%20of%7D%7Ex%7D) 原始单位的回归线的斜率和截距可以从上图中导出。 
-![](https://www.zhihu.com/equation?tex=%5Cmathbf%7B%5Cmbox%7Bslope%20of%20the%20regression%20line%7D%7D%20%7E%3D%7E%20r%20%5Ccdot%20%5Cfrac%7B%5Cmbox%7BSD%20of%20%7Dy%7D%7B%5Cmbox%7BSD%20of%20%7Dx%7D) +![](http://latex.codecogs.com/gif.latex?%5Cmathbf%7B%5Cmbox%7Bslope%20of%20the%20regression%20line%7D%7D%20%7E%3D%7E%20r%20%5Ccdot%20%5Cfrac%7B%5Cmbox%7BSD%20of%20%7Dy%7D%7B%5Cmbox%7BSD%20of%20%7Dx%7D) -![](https://www.zhihu.com/equation?tex=%5Cmathbf%7B%5Cmbox%7Bintercept%20of%20the%20regression%20line%7D%7D%20%7E%3D%7E%20%5Cmbox%7Baverage%20of%20%7Dy%20%7E-%7E%20%5Cmbox%7Bslope%7D%20%5Ccdot%20%5Cmbox%7Baverage%20of%20%7Dx) +![](http://latex.codecogs.com/gif.latex?%5Cmathbf%7B%5Cmbox%7Bintercept%20of%20the%20regression%20line%7D%7D%20%7E%3D%7E%20%5Cmbox%7Baverage%20of%20%7Dy%20%7E-%7E%20%5Cmbox%7Bslope%7D%20%5Ccdot%20%5Cmbox%7Baverage%20of%20%7Dx) 下面的三个函数计算相关性,斜率和截距。 它们都有三个参数:表的名称,包含`x`的列的标签以及包含`y`的列的标签。 @@ -651,7 +651,7 @@ galton_slope, galton_intercept 回归直线的方程是: -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bestimate%20of%20child%27s%20height%7D%20%7E%3D%7E%200.64%20%5Ccdot%20%5Cmbox%7Bmidparent%20height%7D%20%7E+%7E%2022.64) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bestimate%20of%20child%27s%20height%7D%20%7E%3D%7E%200.64%20%5Ccdot%20%5Cmbox%7Bmidparent%20height%7D%20%7E+%7E%2022.64) 这也成为回归方程。回归方程的主要用途是根据`x`预测`y`。 @@ -769,13 +769,13 @@ slope(baby, 'Maternal Height', 'Maternal Pregnancy Weight') 为了计算回归线的方程,我们需要斜率和截距。 -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bslope%7D%20%7E%3D%7E%20%5Cfrac%7Br%20%5Ccdot%20%5Cmbox%7BSD%20of%20%7Dy%7D%7B%5Cmbox%7BSD%20of%20%7Dx%7D%20%7E%3D%7E%20%5Cfrac%7B0.5%20%5Ccdot%202%20%5Cmbox%7B%20inches%7D%7D%7B5%20%5Cmbox%7B%20pounds%7D%7D%20%7E%3D%7E%200.2%20%7E%5Cmbox%7Binches%20per%20pound%7D) 
+![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bslope%7D%20%7E%3D%7E%20%5Cfrac%7Br%20%5Ccdot%20%5Cmbox%7BSD%20of%20%7Dy%7D%7B%5Cmbox%7BSD%20of%20%7Dx%7D%20%7E%3D%7E%20%5Cfrac%7B0.5%20%5Ccdot%202%20%5Cmbox%7B%20inches%7D%7D%7B5%20%5Cmbox%7B%20pounds%7D%7D%20%7E%3D%7E%200.2%20%7E%5Cmbox%7Binches%20per%20pound%7D) -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bintercept%7D%20%7E%3D%7E%20%5Cmbox%7Baverage%20of%20%7Dy%20-%20%5Cmbox%7Bslope%7D%5Ccdot%20%5Cmbox%7Baverage%20of%20%7D%20x%20%7E%3D%7E%2014%20%5Cmbox%7B%20inches%7D%20%7E-%7E%200.2%20%5Cmbox%7B%20inches%20per%20pound%7D%20%5Ccdot%2050%20%5Cmbox%7B%20pounds%7D%20%7E%3D%7E%204%20%5Cmbox%7B%20inches%7D) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bintercept%7D%20%7E%3D%7E%20%5Cmbox%7Baverage%20of%20%7Dy%20-%20%5Cmbox%7Bslope%7D%5Ccdot%20%5Cmbox%7Baverage%20of%20%7D%20x%20%7E%3D%7E%2014%20%5Cmbox%7B%20inches%7D%20%7E-%7E%200.2%20%5Cmbox%7B%20inches%20per%20pound%7D%20%5Ccdot%2050%20%5Cmbox%7B%20pounds%7D%20%7E%3D%7E%204%20%5Cmbox%7B%20inches%7D) 回归线的方程允许我们,根据给定重量(磅)计算估计高度(英寸): -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bestimated%20height%7D%20%7E%3D%7E%200.2%20%5Ccdot%20%5Cmbox%7Bgiven%20weight%7D%20%7E+%7E%204) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bestimated%20height%7D%20%7E%3D%7E%200.2%20%5Ccdot%20%5Cmbox%7Bgiven%20weight%7D%20%7E+%7E%204) 线的斜率衡量随着重量的单位增长的估计高度的增长。 斜率是正值,重要的是要注意,这并不表示我们认为,如果体重增加巴塞特猎狗就会变得更高。 斜率反映了两组狗的平均身高的差异,这两组狗的体重相差 1 磅。 具体来说,考虑一组重量为`w`磅,以及另一组重量为`w + 1`磅的狗。 我们估计,第二组的均值高出 0.2 英寸。 对于样本中的所有`w`值都是如此。 @@ -956,7 +956,7 @@ Root mean squared error: 2701.69078531 首先注意,使均方根误差最小的直线,也是使平方误差最小的直线。 平方根对最小值没有任何影响。 所以我们会为自己节省一个计算步骤,并将平均方差 MSE 减到最小。 -我们试图根据《小女人》的句子数(`x`)来预测字符数量(`y`)。 如果我们使用 ![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bprediction%7D%20%7E%3D%7E%20ax%20+%20b) 直线,它将有一个 MSE,它取决于斜率`a`和截距`b`。 函数`lw_mse`以斜率和截距为参数,并返回相应的 MSE。 +我们试图根据《小女人》的句子数(`x`)来预测字符数量(`y`)。 如果我们使用 ![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bprediction%7D%20%7E%3D%7E%20ax%20+%20b) 直线,它将有一个 
MSE,它取决于斜率`a`和截距`b`。 函数`lw_mse`以斜率和截距为参数,并返回相应的 MSE。 ```py def lw_mse(any_slope, any_intercept): @@ -1090,11 +1090,11 @@ array([ 0.09834382, 5.95962911]) 无论散点图的形状如何,都有一条独特的线,可以使估计的均方误差最小。 它被称为回归线,其斜率和截距由下式给出: -![](https://www.zhihu.com/equation?tex=%5Cmathbf%7B%5Cmbox%7Bslope%20of%20the%20regression%20line%7D%7D%20%7E%3D%7E%20r%20%5Ccdot%20%5Cfrac%7B%5Cmbox%7BSD%20of%20%7Dy%7D%7B%5Cmbox%7BSD%20of%20%7Dx%7D) +![](http://latex.codecogs.com/gif.latex?%5Cmathbf%7B%5Cmbox%7Bslope%20of%20the%20regression%20line%7D%7D%20%7E%3D%7E%20r%20%5Ccdot%20%5Cfrac%7B%5Cmbox%7BSD%20of%20%7Dy%7D%7B%5Cmbox%7BSD%20of%20%7Dx%7D) > 译者注:也就是`cov(x, y)/var(x)`。 -![](https://www.zhihu.com/equation?tex=%5Cmathbf%7B%5Cmbox%7Bintercept%20of%20the%20regression%20line%7D%7D%20%7E%3D%7E%20%5Cmbox%7Baverage%20of%20%7Dy%20%7E-%7E%20%5Cmbox%7Bslope%7D%20%5Ccdot%20%5Cmbox%7Baverage%20of%20%7Dx) +![](http://latex.codecogs.com/gif.latex?%5Cmathbf%7B%5Cmbox%7Bintercept%20of%20the%20regression%20line%7D%7D%20%7E%3D%7E%20%5Cmbox%7Baverage%20of%20%7Dy%20%7E-%7E%20%5Cmbox%7Bslope%7D%20%5Ccdot%20%5Cmbox%7Baverage%20of%20%7Dx) ```py fitted = fit(shotput, 'Weight Lifted', 'Shot Put Distance') @@ -1160,13 +1160,13 @@ shotput.with_column('Best Quadratic Curve', shotput_fit).scatter(0) 假设数据科学家已经决定使用线性回归,基于预测变量估计响应变量的值。 为了了解这种估计方法的效果如何,数据科学家必须知道估计值距离实际值多远。 这些差异被称为残差。 -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bresidual%7D%20%7E%3D%7E%20%5Cmbox%7Bobserved%20value%7D%20%7E-%7E%20%5Cmbox%7Bregression%20estimate%7D) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bresidual%7D%20%7E%3D%7E%20%5Cmbox%7Bobserved%20value%7D%20%7E-%7E%20%5Cmbox%7Bregression%20estimate%7D) 残差就是剩下的东西 - 估计之后的剩余。 残差是回归线和点的垂直距离。 散点图中的每个点都有残差。 残差是`y`的观测值与`y`的拟合值之间的差值,所以对于点`(x, y)`: -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bresidual%7D%20%7E%7E%20%3D%20%7E%7E%20y%20%7E-%7E%20%5Cmbox%7Bfitted%20value%20of%20%7Dy%20%7E%7E%20%3D%20%7E%7E%20y%20%7E-%7E%20%5Cmbox%7Bheight%20of%20regression%20line%20at%20%7Dx) 
+![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bresidual%7D%20%7E%7E%20%3D%20%7E%7E%20y%20%7E-%7E%20%5Cmbox%7Bfitted%20value%20of%20%7Dy%20%7E%7E%20%3D%20%7E%7E%20y%20%7E-%7E%20%5Cmbox%7Bheight%20of%20regression%20line%20at%20%7Dx) `residual`函数计算残差。 该计算假设我们已经定义的所有相关函数:`standard_units`,`correlation`,`slope`,`intercept`和`fit`。 @@ -1379,9 +1379,9 @@ round(np.mean(dugong.column('Residual')), 10) ### 残差的标准差 -无论散点图的形状如何,残差的标准差是响应变量的标准差的一个比例。 比例是 ![](https://www.zhihu.com/equation?tex=%5Csqrt%7B1-r%5E2%7D)。 +无论散点图的形状如何,残差的标准差是响应变量的标准差的一个比例。 比例是 ![](http://latex.codecogs.com/gif.latex?%5Csqrt%7B1-r%5E2%7D)。 -![](https://www.zhihu.com/equation?tex=%5Cmbox%7BSD%20of%20residuals%7D%20%7E%3D%7E%20%5Csqrt%7B1%20-%20r%5E2%7D%20%5Ccdot%20%5Cmbox%7BSD%20of%20%7Dy) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7BSD%20of%20residuals%7D%20%7E%3D%7E%20%5Csqrt%7B1%20-%20r%5E2%7D%20%5Ccdot%20%5Cmbox%7BSD%20of%20%7Dy) 我们将很快看到,它如何衡量回归估计的准确性。 但首先,让我们通过例子来确认。 @@ -1426,11 +1426,11 @@ np.std(hybrid.column('residual')), np.sqrt(1 - r**2)*np.std(hybrid.column('mpg') 我们可以重写上面的结果,不管散点图的形状如何: -![](https://www.zhihu.com/equation?tex=%5Cfrac%7B%5Cmbox%7BSD%20of%20residuals%7D%7D%7B%5Cmbox%7BSD%20of%20%7Dy%7D%20%7E%3D%7E%20%5Csqrt%7B1-r%5E2%7D) +![](http://latex.codecogs.com/gif.latex?%5Cfrac%7B%5Cmbox%7BSD%20of%20residuals%7D%7D%7B%5Cmbox%7BSD%20of%20%7Dy%7D%20%7E%3D%7E%20%5Csqrt%7B1-r%5E2%7D) 互补的结果是,无论散点图的形状如何,拟合值的标准差是观察值`y`的标准差的一个比例。比例是`|r|`。 -![](https://www.zhihu.com/equation?tex=%5Cfrac%7B%5Cmbox%7BSD%20of%20fitted%20values%7D%7D%7B%5Cmbox%7BSD%20of%20%7Dy%7D%20%7E%3D%7E%20%7Cr%7C) +![](http://latex.codecogs.com/gif.latex?%5Cfrac%7B%5Cmbox%7BSD%20of%20fitted%20values%7D%7D%7B%5Cmbox%7BSD%20of%20%7Dy%7D%20%7E%3D%7E%20%7Cr%7C) 要查看比例在哪里出现,请注意拟合值全部位于回归线上,而`y`的观测值是散点图中所有点的高度,并且更加可变。 @@ -1471,8 +1471,8 @@ np.std(hybrid.column('fitted mpg'))/np.std(hybrid.column('mpg')) 解释这个结果的更标准的方法是,回想一下: 
-![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bvariance%7D%20%7E%3D%7E%20%5Cmbox%7Bmean%20squared%20deviation%20from%20average%7D%20%7E%3D%7E%20%5Cmbox%7BSD%7D%5E2) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bvariance%7D%20%7E%3D%7E%20%5Cmbox%7Bmean%20squared%20deviation%20from%20average%7D%20%7E%3D%7E%20%5Cmbox%7BSD%7D%5E2) 因此,对结果的两边取平方: -![](https://www.zhihu.com/equation?tex=%5Cfrac%7B%5Cmbox%7Bvariance%20of%20fitted%20values%7D%7D%7B%5Cmbox%7Bvariance%20of%20%7Dy%7D%20%7E%3D%7E%20r%5E2) +![](http://latex.codecogs.com/gif.latex?%5Cfrac%7B%5Cmbox%7Bvariance%20of%20fitted%20values%7D%7D%7B%5Cmbox%7Bvariance%20of%20%7Dy%7D%20%7E%3D%7E%20r%5E2) diff --git a/15.md b/15.md index 4f07258a86228df1e2a23b78ebb2a64cad32a7f9..0e372ed3d1547ffda922494f55a13aa34a2b35eb 100644 --- a/15.md +++ b/15.md @@ -384,7 +384,7 @@ array([ 0.59610766, -0.19065363]) 我们如何实现呢? 在二维空间中,这非常简单。 如果我们在坐标`(x0, y0)`处有一个点,而在`(x1, y1)`处有另一个点,则它们之间的距离是: -![](https://www.zhihu.com/equation?tex=D%20%3D%20%5Csqrt%7B%28x_0-x_1%29%5E2%20+%20%28y_0-y_1%29%5E2%7D) +![](http://latex.codecogs.com/gif.latex?D%20%3D%20%5Csqrt%7B%28x_0-x_1%29%5E2%20+%20%28y_0-y_1%29%5E2%7D) (这是从哪里来的?它来自勾股定理:我们有一个直角三角形,边长为`x0 - x1`和`y0 - y1`,我们想要求出斜边的长度。) @@ -635,11 +635,11 @@ ax.scatter(banknotes.column('WaveletSkew'), 我们知道如何在二维空间中计算距离。 如果我们在坐标`(x0, y0)`处有一个点,而在`(x1, y1)`处有另一个点,则它们之间的距离是: -![](https://www.zhihu.com/equation?tex=D%20%3D%20%5Csqrt%7B%28x_0-x_1%29%5E2%20+%20%28y_0-y_1%29%5E2%7D) +![](http://latex.codecogs.com/gif.latex?D%20%3D%20%5Csqrt%7B%28x_0-x_1%29%5E2%20+%20%28y_0-y_1%29%5E2%7D) 在三维空间中,点是`(x0, y0, z0)`和`(x1, y1, z1)`,它们之间的距离公式为: -![](https://www.zhihu.com/equation?tex=D%20%3D%20%5Csqrt%7B%28x_0-x_1%29%5E2%20+%20%28y_0-y_1%29%5E2%20+%20%28z_0-z_1%29%5E2%7D) +![](http://latex.codecogs.com/gif.latex?D%20%3D%20%5Csqrt%7B%28x_0-x_1%29%5E2%20+%20%28y_0-y_1%29%5E2%20+%20%28z_0-z_1%29%5E2%7D) 在 N 维空间中,东西有点难以可视化,但我想你可以看到公式是如何推广的:我们总结每个独立坐标差的平方,然后取平方根。 diff --git a/17.md b/17.md index 
5151a6c96a96b7f8c1082581625cb779a03af2f1..4c1de680ab13b156542082ee972b0c3e31041a96 100644 --- a/17.md +++ b/17.md @@ -129,13 +129,13 @@ students.pivot('Major', 'Year') 后验概率。这些是考虑专业声明状态的信息后,二年级的概率。我们计算了其中的一个: -假设学生已经声明,学生是三年级的后验概率表示为 ![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7BThird%20Year%7D%20%7E%5Cbig%7B%7C%7D%7E%20%5Cmbox%7BDeclared%7D%29),计算如下。 +假设学生已经声明,学生是三年级的后验概率表示为 ![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7BThird%20Year%7D%20%7E%5Cbig%7B%7C%7D%7E%20%5Cmbox%7BDeclared%7D%29),计算如下。 -![](https://www.zhihu.com/equation?tex=%5Cbegin%7Balign*%7D%20P%28%5Cmbox%7BThird%20Year%7D%20%7E%5Cbig%7B%7C%7D%7E%20%5Cmbox%7BDeclared%7D%29%20%7E%20%26%3D%7E%20%5Cfrac%7B%200.4%20%5Ctimes%200.8%7D%7B0.6%20%5Ctimes%200.5%20%7E+%7E%200.4%20%5Ctimes%200.8%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%20%5Cfrac%7B%5Cmbox%7B%28prior%20probability%20of%20Third%20Year%29%7D%20%5Ctimes%20%5Cmbox%7B%28likelihood%20of%20Declared%20given%20Third%20Year%29%7D%7D%20%7B%5Cmbox%7Btotal%20probability%20of%20Declared%7D%7D%20%5Cend%7Balign*%7D) +![](http://latex.codecogs.com/gif.latex?%5Cbegin%7Balign*%7D%20P%28%5Cmbox%7BThird%20Year%7D%20%7E%5Cbig%7B%7C%7D%7E%20%5Cmbox%7BDeclared%7D%29%20%7E%20%26%3D%7E%20%5Cfrac%7B%200.4%20%5Ctimes%200.8%7D%7B0.6%20%5Ctimes%200.5%20%7E+%7E%200.4%20%5Ctimes%200.8%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%20%5Cfrac%7B%5Cmbox%7B%28prior%20probability%20of%20Third%20Year%29%7D%20%5Ctimes%20%5Cmbox%7B%28likelihood%20of%20Declared%20given%20Third%20Year%29%7D%7D%20%7B%5Cmbox%7Btotal%20probability%20of%20Declared%7D%7D%20%5Cend%7Balign*%7D) 另一个后验概率是: 
-![](https://www.zhihu.com/equation?tex=%5Cbegin%7Balign*%7D%20P%28%5Cmbox%7BSecond%20Year%7D%20%7E%5Cbig%7B%7C%7D%7E%20%5Cmbox%7BDeclared%7D%29%20%7E%20%26%3D%7E%20%5Cfrac%7B%200.6%20%5Ctimes%200.5%7D%7B0.6%20%5Ctimes%200.5%20%7E+%7E%200.4%20%5Ctimes%200.8%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%20%5Cfrac%7B%5Cmbox%7B%28prior%20probability%20of%20Second%20Year%29%7D%20%5Ctimes%20%5Cmbox%7B%28likelihood%20of%20Declared%20given%20Second%20Year%29%7D%7D%20%7B%5Cmbox%7Btotal%20probability%20of%20Declared%7D%7D%20%5Cend%7Balign*%7D) +![](http://latex.codecogs.com/gif.latex?%5Cbegin%7Balign*%7D%20P%28%5Cmbox%7BSecond%20Year%7D%20%7E%5Cbig%7B%7C%7D%7E%20%5Cmbox%7BDeclared%7D%29%20%7E%20%26%3D%7E%20%5Cfrac%7B%200.6%20%5Ctimes%200.5%7D%7B0.6%20%5Ctimes%200.5%20%7E+%7E%200.4%20%5Ctimes%200.8%7D%20%5C%5C%20%5C%5C%20%26%3D%7E%20%5Cfrac%7B%5Cmbox%7B%28prior%20probability%20of%20Second%20Year%29%7D%20%5Ctimes%20%5Cmbox%7B%28likelihood%20of%20Declared%20given%20Second%20Year%29%7D%7D%20%7B%5Cmbox%7Btotal%20probability%20of%20Declared%7D%7D%20%5Cend%7Balign*%7D) ```py (0.6 * 0.5)/(0.6 * 0.5 + 0.4 * 0.8) @@ -148,7 +148,7 @@ students.pivot('Major', 'Year') 正因为如此,贝叶斯方法有时被归纳为比例陈述: -![](https://www.zhihu.com/equation?tex=%5Cmbox%7Bposterior%7D%20%7E%20%5Cpropto%20%7E%20%5Cmbox%7Bprior%7D%20%5Ctimes%20%5Cmbox%7Blikelihood%7D) +![](http://latex.codecogs.com/gif.latex?%5Cmbox%7Bposterior%7D%20%7E%20%5Cpropto%20%7E%20%5Cmbox%7Bprior%7D%20%5Ctimes%20%5Cmbox%7Blikelihood%7D) 公式非常便于高效地描述计算。 但是在我们的学生示例这样的情况中,不用公式来思考更简单。 我们仅仅使用树形图。 diff --git a/6.md b/6.md index aec38291851806fd5613e2f45312a1a84ba9d5f1..2745751c6ecd69ca5c17d43e29ab3e3792db1870 100644 --- a/6.md +++ b/6.md @@ -555,9 +555,9 @@ heights 如果我们只查看表格的第一行,计算就会变得清晰。 -请记住,数据集中有 200 部电影。这个`[300,400)`的桶包含 81 部电影。这是所有电影的 40.5%:![](https://www.zhihu.com/equation?tex=%5Cmbox%7BPercent%7D%20%3D%20%5Cfrac%7B81%7D%7B200%7D%20%5Ccdot%20100%20%3D%2040.5)。 +请记住,数据集中有 200 部电影。这个`[300,400)`的桶包含 81 部电影。这是所有电影的 
40.5%:![](http://latex.codecogs.com/gif.latex?%5Cmbox%7BPercent%7D%20%3D%20%5Cfrac%7B81%7D%7B200%7D%20%5Ccdot%20100%20%3D%2040.5)。 -`[300, 400)`桶的宽度是`400-300 = 100`。所以 ![](https://www.zhihu.com/equation?tex=%5Cmbox%7BHeight%7D%20%3D%20%5Cfrac%7B40.5%7D%7B100%7D%20%3D%200.405)。 +`[300, 400)`桶的宽度是`400-300 = 100`。所以 ![](http://latex.codecogs.com/gif.latex?%5Cmbox%7BHeight%7D%20%3D%20%5Cfrac%7B40.5%7D%7B100%7D%20%3D%200.405)。 用于计算高度的代码使用了总共​​有 200 个电影,以及每个箱的宽度是 100 的事实。 diff --git a/8.md b/8.md index a9254c577d4bb123a1e1066b106120e682d9a286..7a6f09ca43cce6c067be59edd4f1e18b4dbeeb7f 100644 --- a/8.md +++ b/8.md @@ -702,27 +702,27 @@ combined.barh(0) 数学是准确发现概率的主要工具,尽管计算机也可用于此目的。模拟可以提供出色的近似,具有很高的概率。在本节中,我们将以非正式方式制定一些简单的规则来管理概率的计算。在随后的章节中,我们将回到模拟来近似复杂事件的概率。 -我们将使用标准符号 ![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bevent%7D%29) 来表示“事件”发生的概率,我们将交替使用“几率”和“概率”两个字。 +我们将使用标准符号 ![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bevent%7D%29) 来表示“事件”发生的概率,我们将交替使用“几率”和“概率”两个字。 ## 事件不会发生的时候 如果事件发生的概率是 40%,不发生的几率就是 60%。这个自然的计算可以这样秒速: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Ban%20event%20doesn%27t%20happen%7D%29%20%7E%3D%7E%201%20-%20P%28%5Cmbox%7Bthe%20event%20happens%7D%29) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Ban%20event%20doesn%27t%20happen%7D%29%20%7E%3D%7E%201%20-%20P%28%5Cmbox%7Bthe%20event%20happens%7D%29) ## 所有结果等可能的时候 如果你投掷一个普通的骰子,一个自然的假设是,所有六个面都是等可能的。 那么一个面出现的概率可以很容易地计算出来。 例如,骰子显示偶数的几率是: -![](https://www.zhihu.com/equation?tex=%5Cfrac%7B%5Cmbox%7Bnumber%20of%20even%20faces%7D%7D%7B%5Cmbox%7Bnumber%20of%20all%20faces%7D%7D%20%7E%3D%7E%20%5Cfrac%7B%5C%23%5C%7B2%2C%204%2C%206%5C%7D%7D%7B%5C%23%5C%7B1%2C%202%2C%203%2C%204%2C%205%2C%206%5C%7D%7D%20%7E%3D%7E%20%5Cfrac%7B3%7D%7B6%7D) 
+![](http://latex.codecogs.com/gif.latex?%5Cfrac%7B%5Cmbox%7Bnumber%20of%20even%20faces%7D%7D%7B%5Cmbox%7Bnumber%20of%20all%20faces%7D%7D%20%7E%3D%7E%20%5Cfrac%7B%5C%23%5C%7B2%2C%204%2C%206%5C%7D%7D%7B%5C%23%5C%7B1%2C%202%2C%203%2C%204%2C%205%2C%206%5C%7D%7D%20%7E%3D%7E%20%5Cfrac%7B3%7D%7B6%7D) 与之相似: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bdie%20shows%20a%20multiple%20of%203%7D%29%20%7E%3D%7E%20%5Cfrac%7B%5C%23%5C%7B3%2C%206%5C%7D%7D%7B%5C%23%5C%7B1%2C%202%2C%203%2C%204%2C%205%2C%206%5C%7D%7D%20%7E%3D%7E%20%5Cfrac%7B2%7D%7B6%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bdie%20shows%20a%20multiple%20of%203%7D%29%20%7E%3D%7E%20%5Cfrac%7B%5C%23%5C%7B3%2C%206%5C%7D%7D%7B%5C%23%5C%7B1%2C%202%2C%203%2C%204%2C%205%2C%206%5C%7D%7D%20%7E%3D%7E%20%5Cfrac%7B2%7D%7B6%7D) 通常: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Ban%20event%20happens%7D%29%20%7E%3D%7E%20%5Cfrac%7B%5C%23%5C%7B%5Cmbox%7Boutcomes%20that%20make%20the%20event%20happen%7D%5C%7D%7D%20%7B%5C%23%5C%7B%5Cmbox%7Ball%20outcomes%7D%5C%7D%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Ban%20event%20happens%7D%29%20%7E%3D%7E%20%5Cfrac%7B%5C%23%5C%7B%5Cmbox%7Boutcomes%20that%20make%20the%20event%20happen%7D%5C%7D%7D%20%7B%5C%23%5C%7B%5Cmbox%7Ball%20outcomes%7D%5C%7D%7D) 前提是所有的结果都是等可能的。 @@ -734,21 +734,21 @@ combined.barh(0) 有六种可能的颜色对:RB,BR,RG,GR,BG,GB(我们已经缩写了每种颜色的名字,就是它的第一个字母)。 所有这些都是抽样方案是等可能的,只有其中一个(GR)使事件发生。所以: -![](https://www.zhihu.com/equation?tex=%24%24%20P%28%5Cmbox%7Bgreen%20first%2C%20then%20red%7D%29%20%7E%3D%7E%20%5Cfrac%7B%5C%23%5C%7B%5Cmbox%7BGR%7D%5C%7D%7D%7B%5C%23%5C%7B%5Cmbox%7BRB%2C%20BR%2C%20RG%2C%20GR%2C%20BG%2C%20GB%7D%5C%7D%7D%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B6%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bgreen%20first%2C%20then%20red%7D%29%20%7E%3D%7E%20%5Cfrac%7B%5C%23%5C%7B%5Cmbox%7BGR%7D%5C%7D%7D%7B%5C%23%5C%7B%5Cmbox%7BRB%2C%20BR%2C%20RG%2C%20GR%2C%20BG%2C%20GB%7D%5C%7D%7D%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B6%7D) 
但是还有另外一种方法来得到答案,可以用两个阶段来思考这个事件。 必须首先抽取绿色纸条。几率是 1/3,也就是说在所有实验的大约 1/3 的重复中,先抽取了绿色纸条,但事件还没完成。在这 1/3 的重复中,必须再次抽取红色纸条。这个发生在大约 1/2 的重复中,所以: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bgreen%20first%2C%20then%20red%7D%29%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B2%7D%20%7E%5Cmbox%7Bof%7D%7E%20%5Cfrac%7B1%7D%7B3%7D%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B6%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bgreen%20first%2C%20then%20red%7D%29%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B2%7D%20%7E%5Cmbox%7Bof%7D%7E%20%5Cfrac%7B1%7D%7B3%7D%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B6%7D) 这个计算通常按照事件顺序,像这样: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bgreen%20first%2C%20then%20red%7D%29%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B3%7D%20%7E%5Ctimes%7E%20%5Cfrac%7B1%7D%7B2%7D%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B6%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bgreen%20first%2C%20then%20red%7D%29%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B3%7D%20%7E%5Ctimes%7E%20%5Cfrac%7B1%7D%7B2%7D%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B6%7D) 因数 1/2 叫做“假设第一次出现了绿色纸条,第二次出现红色纸条的条件几率”。 通常,我们拥有乘法规则: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Btwo%20events%20both%20happen%7D%29%20%7E%3D%7E%20P%28%5Cmbox%7Bone%20event%20happens%7D%29%20%5Ctimes%20P%28%5Cmbox%7Bthe%20other%20event%20happens%2C%20given%20that%20the%20first%20one%20happened%7D%29) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Btwo%20events%20both%20happen%7D%29%20%7E%3D%7E%20P%28%5Cmbox%7Bone%20event%20happens%7D%29%20%5Ctimes%20P%28%5Cmbox%7Bthe%20other%20event%20happens%2C%20given%20that%20the%20first%20one%20happened%7D%29) 两个事件同时发生的概率,等于第一个事件发生的概率,乘上第一个事件发生的情况下第二个事件发生的概率。 @@ -762,11 +762,11 @@ combined.barh(0) 根据上面的计算,GR 和 RG 每个的几率都是 1/6。所以你可以通过把它们相加来计算一绿一红的概率。 -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bone%20green%20and%20one%20red%7D%29%20%7E%3D%7E%20P%28%5Cmbox%7BGR%7D%29%20+%20P%28%5Cmbox%7BRG%7D%29%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B6%7D%20+%20%5Cfrac%7B1%7D%7B6%7D%20%7E%3D%7E%20%5Cfrac%7B2%7D%7B6%7D) 
+![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bone%20green%20and%20one%20red%7D%29%20%7E%3D%7E%20P%28%5Cmbox%7BGR%7D%29%20+%20P%28%5Cmbox%7BRG%7D%29%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B6%7D%20+%20%5Cfrac%7B1%7D%7B6%7D%20%7E%3D%7E%20%5Cfrac%7B2%7D%7B6%7D) 通常,我们拥有加法规则: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Ban%20event%20happens%7D%29%20%7E%3D%7E%20P%28%5Cmbox%7Bfirst%20way%20it%20can%20happen%7D%29%20+%20P%28%5Cmbox%7Bsecond%20way%20it%20can%20happen%7D%29%20%7E%7E%7E%20%5Cmbox%7B%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Ban%20event%20happens%7D%29%20%7E%3D%7E%20P%28%5Cmbox%7Bfirst%20way%20it%20can%20happen%7D%29%20+%20P%28%5Cmbox%7Bsecond%20way%20it%20can%20happen%7D%29%20%7E%7E%7E%20%5Cmbox%7B%7D) 事件发生的概率,等于以第一种方式发生的概率,加上以第二种方式发生的概率。 @@ -788,21 +788,21 @@ combined.barh(0) 得出这个答案的另一种方法是,弄清楚如果你不能得到至少一个正面,会发生什么事情:这两次投掷都必须是反面。所以: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bat%20least%20one%20head%20in%20two%20tosses%7D%29%20%7E%3D%7E%201%20-%20P%28%5Cmbox%7Bboth%20tails%7D%29%20%7E%3D%7E%201%20-%20%5Cfrac%7B1%7D%7B4%7D%20%7E%3D%7E%20%5Cfrac%7B3%7D%7B4%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bat%20least%20one%20head%20in%20two%20tosses%7D%29%20%7E%3D%7E%201%20-%20P%28%5Cmbox%7Bboth%20tails%7D%29%20%7E%3D%7E%201%20-%20%5Cfrac%7B1%7D%7B4%7D%20%7E%3D%7E%20%5Cfrac%7B3%7D%7B4%7D) 要注意根据乘法规则: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bboth%20tails%7D%29%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B4%7D%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B2%7D%20%5Ccdot%20%5Cfrac%7B1%7D%7B2%7D%20%7E%3D%7E%20%5Cleft%28%5Cfrac%7B1%7D%7B2%7D%5Cright%29%5E2) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bboth%20tails%7D%29%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B4%7D%20%7E%3D%7E%20%5Cfrac%7B1%7D%7B2%7D%20%5Ccdot%20%5Cfrac%7B1%7D%7B2%7D%20%7E%3D%7E%20%5Cleft%28%5Cfrac%7B1%7D%7B2%7D%5Cright%29%5E2) 这两个观察使我们能够在任何给定数量的投掷中找到至少一个正面的几率。 例如: 
-![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bat%20least%20one%20head%20in%2017%20tosses%7D%29%20%7E%3D%7E%201%20-%20P%28%5Cmbox%7Ball%2017%20are%20tails%7D%29%20%7E%3D%7E%201%20-%20%5Cleft%28%5Cfrac%7B1%7D%7B2%7D%5Cright%29%5E%7B17%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bat%20least%20one%20head%20in%2017%20tosses%7D%29%20%7E%3D%7E%201%20-%20P%28%5Cmbox%7Ball%2017%20are%20tails%7D%29%20%7E%3D%7E%201%20-%20%5Cleft%28%5Cfrac%7B1%7D%7B2%7D%5Cright%29%5E%7B17%7D) 而现在我们有能力找到在骰子的投掷中,六点至少出现一次的几率: -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Ba%20single%20roll%20is%20not%206%7D%29%20%7E%3D%7E%20P%281%29%20+%20P%282%29%20+%20P%283%29%20+%20P%284%29%20+%20P%285%29%20%7E%3D%7E%20%5Cfrac%7B5%7D%7B6%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Ba%20single%20roll%20is%20not%206%7D%29%20%7E%3D%7E%20P%281%29%20+%20P%282%29%20+%20P%283%29%20+%20P%284%29%20+%20P%285%29%20%7E%3D%7E%20%5Cfrac%7B5%7D%7B6%7D) -![](https://www.zhihu.com/equation?tex=P%28%5Cmbox%7Bat%20least%20one%206%20in%20two%20rolls%7D%29%20%7E%3D%7E%201%20-%20P%28%5Cmbox%7Bboth%20rolls%20are%20not%206%7D%29%20%7E%3D%7E%201%20-%20%5Cleft%28%5Cfrac%7B5%7D%7B6%7D%5Cright%29%5E2%20%24%24%20and%20%24%24%20P%28%5Cmbox%7Bat%20least%20one%206%20in%2017%20rolls%7D%29%20%7E%3D%7E%201%20-%20%5Cleft%28%5Cfrac%7B5%7D%7B6%7D%5Cright%29%5E%7B17%7D) +![](http://latex.codecogs.com/gif.latex?P%28%5Cmbox%7Bat%20least%20one%206%20in%20two%20rolls%7D%29%20%7E%3D%7E%201%20-%20P%28%5Cmbox%7Bboth%20rolls%20are%20not%206%7D%29%20%7E%3D%7E%201%20-%20%5Cleft%28%5Cfrac%7B5%7D%7B6%7D%5Cright%29%5E2%20%7E%7E%7E%20%5Cmbox%7Band%7D%20%7E%7E%7E%20P%28%5Cmbox%7Bat%20least%20one%206%20in%2017%20rolls%7D%29%20%7E%3D%7E%201%20-%20%5Cleft%28%5Cfrac%7B5%7D%7B6%7D%5Cright%29%5E%7B17%7D) 下表展示了,这些概率随着投掷数量从 1 增加到 50 而增加。 diff --git a/9.md b/9.md index 2626aa6e566096667a5e5a4ceb814b89eba97649..2a1c0844a020a62fa2b189e9be1b2664581de315 100644 --- a/9.md +++ b/9.md @@ -709,7 +709,7 @@ Table().with_column('Max Serial Number', 
maxes).hist(bins = every_ten) 这个估计的基本思想是观察到的序列号的平均值可能在1到`N`之间。 因此,如果`A`是平均值,那么: -![](https://www.zhihu.com/equation?tex=A%20%7E%20%5Capprox%20%7E%20%5Cfrac%7BN%7D%7B2%7D%20%7E%7E%7E%20%5Cmbox%7Band%20so%7D%20%7E%7E%7E%20N%20%5Capprox%202A) +![](http://latex.codecogs.com/gif.latex?A%20%7E%20%5Capprox%20%7E%20%5Cfrac%7BN%7D%7B2%7D%20%7E%7E%7E%20%5Cmbox%7Band%20so%7D%20%7E%7E%7E%20N%20%5Capprox%202A) 因此,可以使用一个新的统计量化来估计飞机总数:取观测到的平均序列号并加倍。
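
The hunks in this diff all swap one equation-image URL prefix (`https://www.zhihu.com/equation?tex=`) for another (`http://latex.codecogs.com/gif.latex?`) and keep the percent-encoded LaTeX payload. The sketch below shows one way such a substitution could be scripted. It is not taken from the repository: the names `OLD_PREFIX`, `NEW_PREFIX`, and `swap_equation_prefix` are hypothetical, and it assumes every equation is a Markdown image with empty alt text whose URL contains no unencoded `)` or whitespace, as is the case in the hunks shown here.

```py
import re

# Prefixes copied from the hunks above; the variable names are illustrative.
OLD_PREFIX = "https://www.zhihu.com/equation?tex="
NEW_PREFIX = "http://latex.codecogs.com/gif.latex?"

# A Markdown image with empty alt text whose URL starts with OLD_PREFIX.
# The payload is assumed to be percent-encoded, so it holds no ')' or spaces.
_EQUATION_IMAGE = re.compile(r"!\[\]\(" + re.escape(OLD_PREFIX) + r"([^)\s]*)\)")


def swap_equation_prefix(markdown_text: str) -> str:
    """Rewrite OLD_PREFIX equation images to NEW_PREFIX, keeping the payload."""
    return _EQUATION_IMAGE.sub(
        lambda match: "![](" + NEW_PREFIX + match.group(1) + ")",
        markdown_text,
    )


if __name__ == "__main__":
    sample = "![](https://www.zhihu.com/equation?tex=1%20-%20%5Cfrac%7B1%7D%7Bz%5E2%7D)"
    # prints ![](http://latex.codecogs.com/gif.latex?1%20-%20%5Cfrac%7B1%7D%7Bz%5E2%7D)
    print(swap_equation_prefix(sample))
```

Matching the whole image token rather than doing a bare string replace leaves ordinary images such as `![](img/12-10.png)` untouched, and it would also leave alone any occurrence of the old prefix outside an image link.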