Leaderboard Analysis with t-test for slope

From: https://www.kaggle.com/alexanderliao/leaderboard-analysis-with-t-test-for-slope

Author: A.L.

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import statsmodels.api as sm
import os
import scipy.stats as ss
import matplotlib.pyplot as plt
print(os.listdir("../input"))
['imet-leaderboard', 'imet-2019-fgvc6']
In [2]:
lb = pd.read_csv('../input/imet-leaderboard/imet-2019-fgvc6-publicleaderboard.csv',parse_dates=['SubmissionDate'])
In [3]:
lb.SubmissionDate.max()
Out[3]:
Timestamp('2019-06-04 13:37:01')

The leaderboard sheet was downloaded at 2019-06-04 13:37:01.

In [4]:
team655 = (lb[lb.Score>0.655].TeamId.unique())
print(len(team655))
19

19 teams had scored above 0.655 when the leaderboard sheet was downloaded.

In [5]:
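lb = lb[lb.Score>0.6].copy()  # keep only submissions scoring above 0.6; copy avoids SettingWithCopyWarning
# improvement over the same team's previous row (NaN for a team's first row above 0.6)
lb['score_diff'] = lb['Score'] - lb.groupby('TeamId')['Score'].shift(1)
lb_team655 = lb[lb.TeamId.isin(team655)]
# largest jumps first, so that head(1) per team returns each team's biggest jump
lb_team655 = lb_team655.sort_values(by='score_diff',ascending=False)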
In [6]:
lb_team655.groupby('TeamId').head(1)
Out[6]:
TeamId TeamName SubmissionDate Score score_diff
2915 2971624 [ods.ai], I really need a job 2019-06-04 01:48:43 0.690 0.075
2873 3109299 ╰( ͡° ͜ʖ ͡° )つ──☆*:・゚ 2019-06-03 14:03:35 0.684 0.047
1146 2968798 [ods.ai] Konstantin Gavrilchik 2019-04-25 19:44:48 0.666 0.042
2817 3072114 [ods.ai] n01z3 2019-06-02 23:19:15 0.702 0.041
2748 3095808 [ods.ai] Ilya Kibardin 2019-06-01 22:43:10 0.684 0.041
1749 2967966 Guanshuo Xu 2019-05-11 00:24:48 0.640 0.030
1484 2990018 hy-cn.bj.xa 2019-05-04 15:53:40 0.645 0.024
424 2965920 earhian 2019-04-06 02:22:53 0.635 0.016
505 2966430 Appian 2019-04-08 00:00:56 0.634 0.016
1380 2965103 祈ることしかできない 2019-05-02 02:18:01 0.621 0.015
1227 2967142 みんなをStarlightしちゃいます 2019-04-28 09:11:03 0.624 0.014
825 2967418 hahahahaha 2019-04-17 04:37:48 0.634 0.013
940 2993647 pudae 2019-04-20 00:01:35 0.630 0.013
970 3034218 I'm ashamed of these guys 2019-04-20 17:53:44 0.642 0.012
1518 3110368 [ods.ai] Roman Vlasov 2019-05-05 09:09:10 0.624 0.012
1346 2996947 R4 2019-05-01 08:17:33 0.625 0.011
1660 3005711 Drivers on Shennan Road 2019-05-08 07:53:16 0.625 0.010
595 2963612 just for fun 2019-04-10 15:14:36 0.631 0.010
2813 3126895 ValeriyBabushkin[ods.ai] 2019-06-02 19:13:30 0.693 NaN

Of these 19 teams, 6 made a score jump greater than 0.04 after they had already scored above 0.6. They are all from ods.ai.

Of those 6 teams, 5 made their jump on or after June 1 (the deadline was June 4). They are all from X5.
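
How the score_diff column works can be seen on a toy example. This is only a sketch: the team ids, dates and scores below are made up, and the explicit sort by SubmissionDate is an assumption added here so that shift(1) really refers to the previous submission in time.

```python
import pandas as pd

# Hypothetical toy leaderboard; ids, dates and scores are invented for illustration.
toy = pd.DataFrame({
    "TeamId": [1, 2, 1, 2, 1],
    "SubmissionDate": pd.to_datetime([
        "2019-05-01", "2019-05-02", "2019-05-10", "2019-06-02", "2019-05-20",
    ]),
    "Score": [0.61, 0.62, 0.63, 0.70, 0.64],
})

# Put each team's rows in chronological order before shifting.
toy = toy.sort_values(["TeamId", "SubmissionDate"])

# Difference to the same team's previous score; the first row per team is NaN.
toy["score_diff"] = toy["Score"] - toy.groupby("TeamId")["Score"].shift(1)

# Largest single jump per team (NaN rows are ignored by max).
largest_jump = toy.groupby("TeamId")["score_diff"].max()
print(largest_jump)
```

A team with only one row above the threshold has no previous score to compare with, which is why its score_diff stays NaN.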

In [7]:
lb_top_50 = pd.read_csv('../input/imet-leaderboard/lb_score_top50.csv',index_col=[0])
In [8]:
a = lb_top_50.sort_values(by='diff',ascending=False)
a
Out[8]:
TeamId TeamName_old SubmissionDate Score_pulblic_when download TeamName_new Score_private diff
3 2971624 [ods.ai], I really need a job 2019-06-04 01:48:43 0.690 [ods.ai], I really need a job 0.593 0.097
2 3126895 ValeriyBabushkin[ods.ai] 2019-06-02 19:13:30 0.693 ValeriyBabushkin[ods.ai] 0.608 0.085
0 3095808 [ods.ai] Ilya Kibardin 2019-06-03 14:47:42 0.724 [ods.ai] Ilya Kibardin 0.664 0.06
1 3072114 [ods.ai] n01z3 2019-06-02 23:34:01 0.707 [ods.ai] n01z3 0.662 0.045
4 3109299 ╰( ͡° ͜ʖ ͡° )つ──☆*:・゚ 2019-06-03 14:03:35 0.684 ╰( ͡° ͜ʖ ͡° )つ──☆*:・゚ 0.667 0.017
17 2996947 R4 2019-06-01 08:32:22 0.656 R4 0.641 0.015
34 2970073 hikaru tebasaki team 2019-06-03 22:32:11 0.648 hikaru tebasaki team 0.636 0.012
16 3005711 Drivers on Shennan Road 2019-06-03 13:48:15 0.656 Drivers on Shennan Road 0.649 0.007
22 2975982 DAISUKE YAMAMOTO 2019-06-01 18:43:53 0.654 DAISUKE YAMAMOTO 0.647 0.007
36 2978009 syoya 2019-06-04 08:05:04 0.648 syoya 0.641 0.007
19 2988594 练习时长两年半 2019-06-01 08:38:27 0.655 练习时长两年半 0.648 0.007
7 2967142 みんなをStarlightしちゃいます 2019-06-04 10:39:28 0.667 みんなをStarlightしちゃいます 0.66 0.007
20 2972872 soywu 2019-06-02 13:59:59 0.655 soywu 0.648 0.007
33 3053418 Maxim Vakhrushev 2019-06-04 07:39:33 0.648 Maxim Vakhrushev 0.642 0.006
32 3104231 MMA 2019-06-02 11:32:31 0.649 MMA 0.643 0.006
26 2963583 rskmoi 2019-05-21 13:39:04 0.652 rskmoi 0.646 0.006
40 3058431 scut 2019-06-03 06:30:01 0.647 scut 0.641 0.006
46 3009112 RUSH 2019-06-04 05:07:51 0.645 [angtk.ai]RUSH 0.639 0.006
48 2965331 Tetsuro Kato 2019-05-24 07:29:55 0.644 Tetsuro Kato 0.638 0.006
25 3047430 yzwu 2019-06-03 08:43:51 0.653 yzwu 0.647 0.006
37 2990544 Samer Fatayri 2019-06-04 09:46:45 0.647 Samer Fatayri 0.642 0.005
39 2968083 [kaggler-ja] mhiro2 & Y.Nakama 2019-05-27 16:26:09 0.647 [kaggler-ja] mhiro2 & Y.Nakama 0.642 0.005
5 2968798 [ods.ai] Konstantin Gavrilchik 2019-06-01 17:41:59 0.677 [ods.ai] Konstantin Gavrilchik 0.672 0.005
8 2963612 just for fun 2019-05-23 23:42:14 0.664 Kaggler-JP&CN 0.659 0.005
24 2978031 [ods.ai] waniz 2019-06-03 09:59:27 0.653 [ods.ai] waniz 0.648 0.005
41 3210822 dalaos don't kick my ass 2019-05-31 09:17:04 0.646 [angtk.ai]Hsaki 0.641 0.005
30 2982148 sheep 2019-05-31 03:23:56 0.650 sheep 0.645 0.005
10 2966430 Appian 2019-05-30 09:33:23 0.662 Appian 0.658 0.004
18 2967966 Guanshuo Xu 2019-06-03 21:54:15 0.656 Guanshuo Xu 0.652 0.004
27 2965250 team_yogurt 2019-06-03 16:50:35 0.652 team_yogurt 0.648 0.004
45 3166681 DataKing 2019-06-03 23:49:45 0.645 Xiao-UCSD 0.641 0.004
14 2965920 earhian 2019-05-30 01:35:17 0.657 earhian 0.653 0.004
42 3081338 Tkoki 2019-06-02 11:35:37 0.646 Tkoki 0.642 0.004
44 2963875 Oleg Yaroshevskyy 2019-05-29 19:00:56 0.645 Oleg Yaroshevskyy 0.642 0.003
31 3187253 It has nothing to do with [ods.ai] 2019-05-30 11:20:26 0.649 [ods.ai] Yury Dzerin 0.646 0.003
12 2967418 hahahahaha 2019-06-04 13:37:01 0.661 X5, Best Russian Company 0.658 0.003
11 2965103 祈ることしかできない 2019-06-03 18:22:13 0.661 頼む!!!!! 0.658 0.003
47 3029906 T_T 2019-06-03 16:02:57 0.644 T_T 0.642 0.002
35 2965329 KeepLearning 2019-06-03 08:18:25 0.648 KeepLearning 0.646 0.002
43 3177564 Alchemists' Creed 2019-06-03 00:09:21 0.646 Alchemists' Creed: Obey the Rules 0.655 -0.009
9 2993647 pudae 2019-05-27 05:38:16 0.662 pudae 0.663 -0.001
38 2979878 ensemble is great 2019-05-29 10:40:09 0.647 NaN no_score #VALUE!
21 3047565 Need more sleep zZ 2019-06-04 08:10:18 0.654 NaN no_score #VALUE!
29 3045183 [dsmlkz] Nurma U 2019-05-20 04:54:51 0.650 NaN no_score #VALUE!
28 2975007 I'm sorry for [ods.ai] 2019-05-31 13:08:41 0.651 NaN no_score #VALUE!
13 2990018 hy-cn.bj.xa 2019-05-30 05:20:34 0.660 NaN no_score #VALUE!
6 3034218 I'm ashamed of these guys 2019-06-03 18:46:17 0.669 Evgeny Kononenko no_score #VALUE!
23 2968195 [kaggler-ja] Tawara 2019-06-01 23:52:55 0.653 NaN no_score #VALUE!
15 3110368 [ods.ai] Roman Vlasov 2019-05-21 13:15:20 0.656 NaN no_score #VALUE!
49 2965594 nishidadaishiro 2019-06-04 04:28:02 0.644 NaN no_score #VALUE!
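
A note on the sort order above: the diff column contains #VALUE! entries, so pandas most likely loads it as text and sorts it character by character (that would explain why -0.009 is listed before -0.001). A minimal sketch of a numeric sort, on made-up values and not part of the original notebook, could look like this:

```python
import pandas as pd

# Hypothetical values mimicking the `diff` column, including the spreadsheet error marker.
toy = pd.DataFrame({"diff": ["0.06", "0.097", "-0.001", "#VALUE!", "-0.009", "0.045"]})

# Sorting the raw strings compares them character by character.
as_text = toy.sort_values(by="diff", ascending=False)["diff"].tolist()

# Coerce to floats first; entries that cannot be parsed become NaN and are placed last.
toy["diff_num"] = pd.to_numeric(toy["diff"], errors="coerce")
as_number = toy.sort_values(
    by="diff_num", ascending=False, na_position="last"
)["diff_num"].tolist()

print(as_text)
print(as_number)
```

For the positive values both orders agree here, so the top rows of the table are not affected.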

This is what X5 called a "normal shakeup":

In [9]:
# first 8 rows of the table sorted by `diff` (the largest public-to-private drops)
X = a['Score_pulblic_when download'][:8].values
Y = np.array([float(i) for i in a.Score_private[:8].values])
# OLS of private on public score; no constant is added, so the line goes through the origin
regression_results = sm.OLS(Y, X, missing="drop").fit()
P_value = regression_results.pvalues[0]
R_squared = regression_results.rsquared
K_slope = regression_results.params[0]
conf_int = regression_results.conf_int()  # 95% confidence interval for the slope
low_conf_int = conf_int[0][0]
high_conf_int = conf_int[0][1]
fig, ax = plt.subplots()
ax.grid(True)
ax.scatter(X, Y, alpha=1, color='orchid')
x_pred = np.linspace(min(X), max(X), 40)
y_pred = regression_results.predict(x_pred)
ax.plot(x_pred, y_pred, '-', color='darkorchid', linewidth=2)
print(low_conf_int, high_conf_int)
0.8951749592124293 0.9788190217026417

And this is what happened to the rest of us:

In [10]:
# rows 8 to 29 of the same sorted table (teams with a small public-to-private drop)
X = a['Score_pulblic_when download'][8:30].values
Y = np.array([float(i) for i in a.Score_private[8:30].values])
# same regression as above: private on public score, forced through the origin
regression_results = sm.OLS(Y, X, missing="drop").fit()
P_value = regression_results.pvalues[0]
R_squared = regression_results.rsquared
K_slope = regression_results.params[0]
conf_int = regression_results.conf_int()  # 95% confidence interval for the slope
low_conf_int = conf_int[0][0]
high_conf_int = conf_int[0][1]
fig, ax = plt.subplots()
ax.grid(True)
ax.scatter(X, Y, alpha=1, color='orchid')
x_pred = np.linspace(min(X), max(X), 40)
y_pred = regression_results.predict(x_pred)
ax.plot(x_pred, y_pred, '-', color='darkorchid', linewidth=2)
print(low_conf_int, high_conf_int)
0.9906877054960348 0.992061922548745
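
The two cells above print only confidence intervals for the slope. The p-value that statsmodels reports by default tests a slope of 0, which is not the interesting question here. A possible explicit t-test against a slope of 1 (private score equal to public score) is sketched below. It is an illustration only, not the computation used in this notebook: the helper slope_t_test and the scores are hypothetical, and the choice of 1 as the null slope is an assumption.

```python
import numpy as np
from scipy import stats


def slope_t_test(x, y, null_slope=1.0, alpha=0.05):
    """t-test for the slope of a least-squares line through the origin."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dof = len(x) - 1                        # one estimated parameter (the slope)
    sxx = np.sum(x * x)
    slope = np.sum(x * y) / sxx             # least-squares slope without intercept
    resid = y - slope * x
    se = np.sqrt(np.sum(resid ** 2) / dof / sxx)    # standard error of the slope
    t_stat = (slope - null_slope) / se
    p_value = 2 * stats.t.sf(abs(t_stat), dof)      # two-sided p-value
    half_width = stats.t.ppf(1 - alpha / 2, dof) * se
    return slope, t_stat, p_value, (slope - half_width, slope + half_width)


# Hypothetical public/private scores, invented for illustration only.
public = [0.650, 0.655, 0.660, 0.665, 0.670]
private = [0.644, 0.650, 0.653, 0.660, 0.664]
slope, t_stat, p_value, ci = slope_t_test(public, private)
print(slope, t_stat, p_value, ci)
```

Because the regression has no intercept, the slope is essentially the typical private/public ratio of the group, so two groups can be compared by checking whether their intervals overlap.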

The 5 teams with the largest drop on the private leaderboard are all from X5.

Why did those five X5 teams behave the same way?

They said they overfit the public leaderboard. How can a team overfit the public leaderboard with a single submission that jumps this much?