未验证 提交 e5f37524 编写于 作者: R Riazone 提交者: GitHub

Use Cases (#99)

* Regression Experiment Gold Prediction

* Delete Regression

* Create test

* Notebooks for Regression Experiement

Adding 2 notebooks:
1. Training Experiment
2. Downloading and Predicting off new updated data

* Create test

* Delete test

* Create test

* Adding Classification Files

Adding files for training experiment and one for using trained model to predict on new data. These models will be used to predict sharp fall in Gold Prices

* Create test

* Data files for experiment

Though these are not necessary for the experiments as the notebooks download data over net through API, these can be used as intermediary sources if someone does not want to omit the data download and preparation part and focus on experiment instead.

* Delete test

* Intro

* Delete test

* Read me

* Update Read me
上级 8e1bc05a
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#Importing Libraries\n",
"import pandas as pd\n",
"from datetime import datetime\n",
"import matplotlib.pyplot as plt\n",
"from yahoofinancials import YahooFinancials"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"ticker_details = pd.read_excel(\"Ticker List.xlsx\")\n",
"ticker = ticker_details['Ticker'].to_list()\n",
"names = ticker_details['Description'].to_list()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#Preparing Date Range\n",
"end_date= datetime.strftime(datetime.today(),'%Y-%m-%d')\n",
"start_date = \"2019-01-01\"\n",
"date_range = pd.bdate_range(start=start_date,end=end_date)\n",
"values = pd.DataFrame({ 'Date': date_range})\n",
"values['Date']= pd.to_datetime(values['Date'])\n",
"\n",
"#Extracting Data from Yahoo Finance and Adding them to Values table using date as key\n",
"for i in ticker:\n",
" raw_data = YahooFinancials(i)\n",
" raw_data = raw_data.get_historical_price_data(start_date, end_date, \"daily\")\n",
" df = pd.DataFrame(raw_data[i]['prices'])[['formatted_date','adjclose']]\n",
" df.columns = ['Date1',i]\n",
" df['Date1']= pd.to_datetime(df['Date1'])\n",
" values = values.merge(df,how='left',left_on='Date',right_on='Date1')\n",
" values = values.drop(labels='Date1',axis=1)\n",
"\n",
"#Renaming columns to represent instrument names rather than their ticker codes for ease of readability\n",
"names.insert(0,'Date')\n",
"values.columns = names\n",
"\n",
"#Front filling the NaN values in the data set\n",
"values = values.fillna(method=\"ffill\",axis=0)\n",
"values = values.fillna(method=\"bfill\",axis=0)\n",
"\n",
"# Co-ercing numeric type to all columns except Date\n",
"cols=values.columns.drop('Date')\n",
"values[cols] = values[cols].apply(pd.to_numeric,errors='coerce').round(decimals=1)\n",
"imp = ['Gold','Silver', 'Crude Oil', 'S&P500','MSCI EM ETF']\n",
"\n",
"# Calculating Short term -Historical Returns\n",
"change_days = [1,3,5,14,21]\n",
"\n",
"data = pd.DataFrame(data=values['Date'])\n",
"for i in change_days:\n",
" x= values[cols].pct_change(periods=i).add_suffix(\"-T-\"+str(i))\n",
" data=pd.concat(objs=(data,x),axis=1)\n",
" x=[]\n",
"\n",
"# Calculating Long term Historical Returns\n",
"change_days = [60,90,180,250]\n",
"\n",
"for i in change_days:\n",
" x= values[imp].pct_change(periods=i).add_suffix(\"-T-\"+str(i))\n",
" data=pd.concat(objs=(data,x),axis=1)\n",
" x=[]\n",
"\n",
"#Calculating Moving averages for Gold\n",
"moving_avg = pd.DataFrame(values['Date'],columns=['Date'])\n",
"moving_avg['Date']=pd.to_datetime(moving_avg['Date'],format='%Y-%b-%d')\n",
"moving_avg['Gold/15SMA'] = (values['Gold']/(values['Gold'].rolling(window=15).mean()))-1\n",
"moving_avg['Gold/30SMA'] = (values['Gold']/(values['Gold'].rolling(window=30).mean()))-1\n",
"moving_avg['Gold/60SMA'] = (values['Gold']/(values['Gold'].rolling(window=60).mean()))-1\n",
"moving_avg['Gold/90SMA'] = (values['Gold']/(values['Gold'].rolling(window=90).mean()))-1\n",
"moving_avg['Gold/180SMA'] = (values['Gold']/(values['Gold'].rolling(window=180).mean()))-1\n",
"moving_avg['Gold/90EMA'] = (values['Gold']/(values['Gold'].ewm(span=90,adjust=True,ignore_na=True).mean()))-1\n",
"moving_avg['Gold/180EMA'] = (values['Gold']/(values['Gold'].ewm(span=180,adjust=True,ignore_na=True).mean()))-1\n",
"moving_avg = moving_avg.dropna(axis=0)\n",
"#Merging Moving Average values to the feature space\n",
"\n",
"data['Date']=pd.to_datetime(data['Date'],format='%Y-%b-%d')\n",
"data = pd.merge(left=data,right=moving_avg,how='left',on='Date')\n",
"data = data[data['Gold-T-250'].notna()]\n",
"prediction_data = data.copy()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from pycaret.classification import *\n",
"#Loading the stored model\n",
"classifier_22 = load_model(\"22D Classifier\");\n",
"#Making Predictions\n",
"prediction = predict_model(classifier_22,data=prediction_data)\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Label</th>\n",
" <th>Score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>87</th>\n",
" <td>2020-04-16</td>\n",
" <td>0</td>\n",
" <td>0.0869</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>2020-04-17</td>\n",
" <td>0</td>\n",
" <td>0.1266</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
" <td>2020-04-20</td>\n",
" <td>0</td>\n",
" <td>0.1037</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
" <td>2020-04-21</td>\n",
" <td>0</td>\n",
" <td>0.1489</td>\n",
" </tr>\n",
" <tr>\n",
" <th>91</th>\n",
" <td>2020-04-22</td>\n",
" <td>0</td>\n",
" <td>0.0860</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>2020-04-23</td>\n",
" <td>0</td>\n",
" <td>0.0823</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>2020-04-24</td>\n",
" <td>0</td>\n",
" <td>0.1129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>2020-04-27</td>\n",
" <td>0</td>\n",
" <td>0.0976</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>2020-04-28</td>\n",
" <td>0</td>\n",
" <td>0.1023</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>2020-04-29</td>\n",
" <td>0</td>\n",
" <td>0.0919</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date Label Score\n",
"87 2020-04-16 0 0.0869\n",
"88 2020-04-17 0 0.1266\n",
"89 2020-04-20 0 0.1037\n",
"90 2020-04-21 0 0.1489\n",
"91 2020-04-22 0 0.0860\n",
"92 2020-04-23 0 0.0823\n",
"93 2020-04-24 0 0.1129\n",
"94 2020-04-27 0 0.0976\n",
"95 2020-04-28 0 0.1023\n",
"96 2020-04-29 0 0.0919"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prediction[['Date','Label','Score']].tail(10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
In this experiment, we will try to predict if there would be a ‘sharp fall’ or ‘crash’ in Gold Prices over next 22-Day period.
We would be using a classification technique for this experiment. We will also learn how to use the trained model to make a prediction on new data every day.
The steps for this exercise would be:
1. Import and shape the data — This would be similar to what was explained in the Regression Experiment Notebook. You can also use the final Training dataset from the Data folder.
2. Define ‘sharp fall’ in Gold Prices. Sharp is not an absolute measure. We will try to define ‘sharp fall’ objectively
3. Create Label — Based on the definition of ‘sharp fall’, we would create labels on historical data
4. Train models to predict the ‘sharp fall’ and use trained model to make prediction on new data
Detailed article explaining the experiment can be found here ( https://towardsdatascience.com/predicting-crashes-in-gold-prices-using-machine-learning-5769f548496 )
此差异已折叠。
因为 它太大了无法显示 source diff 。你可以改为 查看blob
{
"cells": [
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"#Importing Libraries\n",
"import pandas as pd\n",
"from datetime import datetime\n",
"import matplotlib.pyplot as plt\n",
"from yahoofinancials import YahooFinancials"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"ticker_details = pd.read_excel(\"Ticker List.xlsx\")\n",
"ticker = ticker_details['Ticker'].to_list()\n",
"names = ticker_details['Description'].to_list()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"#Preparing Date Range\n",
"end_date= datetime.strftime(datetime.today(),'%Y-%m-%d')\n",
"start_date = \"2019-01-01\"\n",
"date_range = pd.bdate_range(start=start_date,end=end_date)\n",
"values = pd.DataFrame({ 'Date': date_range})\n",
"values['Date']= pd.to_datetime(values['Date'])\n",
"\n",
"#Extracting Data from Yahoo Finance and Adding them to Values table using date as key\n",
"for i in ticker:\n",
" raw_data = YahooFinancials(i)\n",
" raw_data = raw_data.get_historical_price_data(start_date, end_date, \"daily\")\n",
" df = pd.DataFrame(raw_data[i]['prices'])[['formatted_date','adjclose']]\n",
" df.columns = ['Date1',i]\n",
" df['Date1']= pd.to_datetime(df['Date1'])\n",
" values = values.merge(df,how='left',left_on='Date',right_on='Date1')\n",
" values = values.drop(labels='Date1',axis=1)\n",
"\n",
"#Renaming columns to represent instrument names rather than their ticker codes for ease of readability\n",
"names.insert(0,'Date')\n",
"values.columns = names\n",
"\n",
"#Front filling the NaN values in the data set\n",
"values = values.fillna(method=\"ffill\",axis=0)\n",
"values = values.fillna(method=\"bfill\",axis=0)\n",
"\n",
"# Co-ercing numeric type to all columns except Date\n",
"cols=values.columns.drop('Date')\n",
"values[cols] = values[cols].apply(pd.to_numeric,errors='coerce').round(decimals=1)\n",
"imp = ['Gold','Silver', 'Crude Oil', 'S&P500','MSCI EM ETF']\n",
"\n",
"# Calculating Short term -Historical Returns\n",
"change_days = [1,3,5,14,21]\n",
"\n",
"data = pd.DataFrame(data=values['Date'])\n",
"for i in change_days:\n",
" x= values[cols].pct_change(periods=i).add_suffix(\"-T-\"+str(i))\n",
" data=pd.concat(objs=(data,x),axis=1)\n",
" x=[]\n",
"\n",
"# Calculating Long term Historical Returns\n",
"change_days = [60,90,180,250]\n",
"\n",
"for i in change_days:\n",
" x= values[imp].pct_change(periods=i).add_suffix(\"-T-\"+str(i))\n",
" data=pd.concat(objs=(data,x),axis=1)\n",
" x=[]\n",
"\n",
"#Calculating Moving averages for Gold\n",
"moving_avg = pd.DataFrame(values['Date'],columns=['Date'])\n",
"moving_avg['Date']=pd.to_datetime(moving_avg['Date'],format='%Y-%b-%d')\n",
"moving_avg['Gold/15SMA'] = (values['Gold']/(values['Gold'].rolling(window=15).mean()))-1\n",
"moving_avg['Gold/30SMA'] = (values['Gold']/(values['Gold'].rolling(window=30).mean()))-1\n",
"moving_avg['Gold/60SMA'] = (values['Gold']/(values['Gold'].rolling(window=60).mean()))-1\n",
"moving_avg['Gold/90SMA'] = (values['Gold']/(values['Gold'].rolling(window=90).mean()))-1\n",
"moving_avg['Gold/180SMA'] = (values['Gold']/(values['Gold'].rolling(window=180).mean()))-1\n",
"moving_avg['Gold/90EMA'] = (values['Gold']/(values['Gold'].ewm(span=90,adjust=True,ignore_na=True).mean()))-1\n",
"moving_avg['Gold/180EMA'] = (values['Gold']/(values['Gold'].ewm(span=180,adjust=True,ignore_na=True).mean()))-1\n",
"moving_avg = moving_avg.dropna(axis=0)\n",
"#Merging Moving Average values to the feature space\n",
"\n",
"data['Date']=pd.to_datetime(data['Date'],format='%Y-%b-%d')\n",
"data = pd.merge(left=data,right=moving_avg,how='left',on='Date')\n",
"data = data[data['Gold-T-250'].notna()]\n",
"prediction_data = data.copy()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Gold</th>\n",
" <th>Return_22</th>\n",
" <th>Gold-T+22</th>\n",
" <th>Date-T+22</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>84</th>\n",
" <td>2020-04-13</td>\n",
" <td>1769.4</td>\n",
" <td>0.0406</td>\n",
" <td>1841.2</td>\n",
" <td>2020-05-05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>2020-04-14</td>\n",
" <td>1751.3</td>\n",
" <td>0.0188</td>\n",
" <td>1784.2</td>\n",
" <td>2020-05-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86</th>\n",
" <td>2020-04-15</td>\n",
" <td>1743.3</td>\n",
" <td>-0.0068</td>\n",
" <td>1731.4</td>\n",
" <td>2020-05-07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>87</th>\n",
" <td>2020-04-16</td>\n",
" <td>1730.1</td>\n",
" <td>-0.0096</td>\n",
" <td>1713.5</td>\n",
" <td>2020-05-08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>2020-04-17</td>\n",
" <td>1694.5</td>\n",
" <td>-0.0233</td>\n",
" <td>1655.0</td>\n",
" <td>2020-05-09</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date Gold Return_22 Gold-T+22 Date-T+22\n",
"84 2020-04-13 1769.4 0.0406 1841.2 2020-05-05\n",
"85 2020-04-14 1751.3 0.0188 1784.2 2020-05-06\n",
"86 2020-04-15 1743.3 -0.0068 1731.4 2020-05-07\n",
"87 2020-04-16 1730.1 -0.0096 1713.5 2020-05-08\n",
"88 2020-04-17 1694.5 -0.0233 1655.0 2020-05-09"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pycaret.regression import *\n",
"#Loading the stored model\n",
"regressor_22 = load_model(\"22Day Regressor\");\n",
"#Making Predictions\n",
"predicted_return_22 = predict_model(regressor_22,data=prediction_data)\n",
"predicted_return_22=predicted_return_22[['Date','Label']]\n",
"predicted_return_22.columns = ['Date','Return_22']\n",
"\n",
"\n",
"#Adding return Predictions to Gold Values\n",
"predicted_values = values[['Date','Gold']]\n",
"predicted_values = predicted_values.tail(len(predicted_return_22))\n",
"predicted_values = pd.merge(left=predicted_values,right=predicted_return_22,on=['Date'],how='inner')\n",
"predicted_values['Gold-T+22']=(predicted_values['Gold']*(1+predicted_values['Return_22'])).round(decimals =1)\n",
"#Adding T+22 Date\n",
"from datetime import datetime, timedelta\n",
"predicted_values['Date-T+22'] = predicted_values['Date']+timedelta(days = 22)\n",
"predicted_values.tail()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
In this experiemnet we will try to use a multi-variate regression approach to forecast the return from Gold.
In the notebook, you will find codes that help you download financial information from yahoofinancials API for free and we will go through
how to shape and prepare the data to make it fit for prediction.
We will use a list of tickers which can be found in the Tickers.xlsx file in the Data folder, to download the historical prices.
Detailed article on the experiment can be found here (https://towardsdatascience.com/machine-learning-to-predict-gold-price-returns-4bdb0506b132 )
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册