{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generalized Linear Models (Formula)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook illustrates how you can use R-style formulas to fit Generalized Linear Models.\n", "\n", "To begin, we load the ``Star98`` dataset and we construct a formula and pre-process the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from __future__ import print_function\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf\n", "star98 = sm.datasets.star98.load_pandas().data\n", "formula = 'SUCCESS ~ LOWINC + PERASIAN + PERBLACK + PERHISP + PCTCHRT + \\\n", " PCTYRRND + PERMINTE*AVYRSEXP*AVSALK + PERSPENK*PTRATIO*PCTAF'\n", "dta = star98[['NABOVE', 'NBELOW', 'LOWINC', 'PERASIAN', 'PERBLACK', 'PERHISP',\n", " 'PCTCHRT', 'PCTYRRND', 'PERMINTE', 'AVYRSEXP', 'AVSALK',\n", " 'PERSPENK', 'PTRATIO', 'PCTAF']].copy()\n", "endog = dta['NABOVE'] / (dta['NABOVE'] + dta.pop('NBELOW'))\n", "del dta['NABOVE']\n", "dta['SUCCESS'] = endog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we fit the GLM model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mod1 = smf.glm(formula=formula, data=dta, family=sm.families.Binomial()).fit()\n", "mod1.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we define a function to operate customized data transformation using the formula framework:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def double_it(x):\n", " return 2 * x\n", "formula = 'SUCCESS ~ double_it(LOWINC) + PERASIAN + PERBLACK + PERHISP + PCTCHRT + \\\n", " PCTYRRND + PERMINTE*AVYRSEXP*AVSALK + PERSPENK*PTRATIO*PCTAF'\n", "mod2 = smf.glm(formula=formula, data=dta, family=sm.families.Binomial()).fit()\n", "mod2.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the coefficient for ``double_it(LOWINC)`` in the second model is half the size of the ``LOWINC`` coefficient from the first model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(mod1.params[1])\n", "print(mod2.params[1] * 2)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }