Generalized Linear Models
Generalized linear models currently support estimation using the one-parameter exponential families.
See Module Reference for commands and arguments.
Examples
# Load modules and data
In [1]: import statsmodels.api as sm

In [2]: data = sm.datasets.scotland.load()

In [3]: data.exog = sm.add_constant(data.exog)

# Instantiate a gamma family model with the default link function.
In [4]: gamma_model = sm.GLM(data.endog, data.exog, family=sm.families.Gamma())

In [5]: gamma_results = gamma_model.fit()

In [6]: print(gamma_results.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                        y   No. Observations:                 32
Model:                              GLM   Df Residuals:                     24
Model Family:                     Gamma   Df Model:                          7
Link Function:            inverse_power   Scale:                     0.0035843
Method:                            IRLS   Log-Likelihood:              -83.017
Date:                  Mon, 14 May 2018   Deviance:                   0.087389
Time:                          21:48:07   Pearson chi2:                 0.0860
No. Iterations:                       6   Covariance Type:           nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0178      0.011     -1.548      0.122      -0.040       0.005
x1          4.962e-05   1.62e-05      3.060      0.002    1.78e-05    8.14e-05
x2             0.0020      0.001      3.824      0.000       0.001       0.003
x3         -7.181e-05   2.71e-05     -2.648      0.008      -0.000   -1.87e-05
x4             0.0001   4.06e-05      2.757      0.006    3.23e-05       0.000
x5         -1.468e-07   1.24e-07     -1.187      0.235   -3.89e-07    9.56e-08
x6            -0.0005      0.000     -2.159      0.031      -0.001   -4.78e-05
x7         -2.427e-06   7.46e-07     -3.253      0.001   -3.89e-06   -9.65e-07
==============================================================================
Technical Documentation
The statistical model for each observation \(i\) is assumed to be
\(Y_i \sim F_{EDM}(\cdot|\theta,\phi,w_i)\) and \(\mu_i = E[Y_i|x_i] = g^{-1}(x_i^\prime\beta)\),
where \(g\) is the link function and \(F_{EDM}(\cdot|\theta,\phi,w)\) is a distribution of the family of exponential dispersion models (EDM) with natural parameter \(\theta\), scale parameter \(\phi\) and weight \(w\). Its density is given by
\(f_{EDM}(y|\theta,\phi,w) = c(y,\phi,w) \exp\left(\frac{y\theta-b(\theta)}{\phi}w\right)\,.\)
It follows that \(\mu = b'(\theta)\) and \(Var[Y|x]=\frac{\phi}{w}b''(\theta)\). The inverse of the first equation gives the natural parameter as a function of the expected value, \(\theta(\mu)\), such that
\(Var[Y_i|x_i] = \frac{\phi}{w_i} v(\mu_i)\)
with \(v(\mu) = b''(\theta(\mu))\). Therefore it is said that a GLM is determined by link function \(g\) and variance function \(v(\mu)\) alone (and \(x\) of course).
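The statement that \(g\) and \(v(\mu)\) determine the model can be made concrete with iteratively reweighted least squares, the IRLS method named in the summary output above. The following is a minimal pure-Python sketch and not the statsmodels implementation: the function name irls_fit, the toy data, the fixed iteration count, and the restriction to an intercept plus one covariate with \(w_i=1\) are all assumptions made for illustration.

```python
import math


def irls_fit(x, y, link, link_inv, link_deriv, variance, n_iter=50):
    """Fit mu_i = g^{-1}(b0 + b1 * x_i) by IRLS, assuming unit weights w_i = 1.

    Only the link g (with its inverse and derivative) and the variance
    function v are needed; no other property of the distribution is used.
    """
    mu = [yi + 0.1 for yi in y]      # crude starting values close to the data
    eta = [link(m) for m in mu]      # linear predictor eta = g(mu)
    b0 = b1 = 0.0
    for _ in range(n_iter):
        gp = [link_deriv(m) for m in mu]
        # Working response z and working weights w.
        z = [e + (yi - m) * g for e, yi, m, g in zip(eta, y, mu, gp)]
        w = [1.0 / (variance(m) * g * g) for m, g in zip(mu, gp)]
        # Weighted least squares of z on (1, x), solved in closed form.
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sz = sum(wi * zi for wi, zi in zip(w, z))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * sxx - sx * sx
        b0 = (sxx * sz - sx * sxz) / det
        b1 = (sw * sxz - sx * sz) / det
        eta = [b0 + b1 * xi for xi in x]
        mu = [link_inv(e) for e in eta]
    return b0, b1


# Hypothetical toy data, chosen only to exercise the sketch.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 4.0, 9.0, 14.0]

# Poisson-type model: log link, v(mu) = mu.
poisson_fit = irls_fit(x, y, math.log, math.exp, lambda m: 1.0 / m, lambda m: m)

# Gaussian-type model: identity link, v(mu) = 1 (reduces to ordinary least squares).
gaussian_fit = irls_fit(x, y, lambda m: m, lambda e: e, lambda m: 1.0, lambda m: 1.0)

print("Poisson/log:", poisson_fit)
print("Gaussian/identity:", gaussian_fit)
```

Swapping the four callables is all it takes to move between the two models, which is the sense in which the pair \((g, v)\) fixes the fit. The scale \(\phi\) never appears because it cancels from the weighted least squares step.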
Note that while \(\phi\) is the same for every observation \(y_i\) and therefore does not influence the estimation of \(\beta\), the weights \(w_i\) might be different for every \(y_i\) such that the estimation of \(\beta\) depends on them.
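One way to see this is through the score equations for \(\beta\) implied by the density above. What follows is a standard derivation sketch under the notation already introduced (using \(\partial\theta/\partial\mu = 1/v(\mu)\) and \(\partial\mu/\partial\eta = 1/g'(\mu)\) with \(\eta_i = x_i^\prime\beta\)), and is not taken from the statsmodels source:

```latex
\frac{\partial \ell}{\partial \beta}
  = \sum_i \frac{w_i}{\phi}\,
    \frac{y_i - \mu_i}{v(\mu_i)\, g'(\mu_i)}\, x_i
  = 0
```

The common factor \(1/\phi\) can be cancelled without changing the solution, whereas the \(w_i\) vary with \(i\) and stay inside the sum.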
Distribution | Domain | \(\mu=E[Y|x]\) | \(v(\mu)\) | \(\theta(\mu)\) | \(b(\theta)\) | \(\phi\) |
---|---|---|---|---|---|---|
Binomial \(B(n,p)\) | \(0,1,\ldots,n\) | \(np\) | \(\mu-\frac{\mu^2}{n}\) | \(\log\frac{p}{1-p}\) | \(n\log(1+e^\theta)\) | 1 |
Poisson \(P(\mu)\) | \(0,1,\ldots,\infty\) | \(\mu\) | \(\mu\) | \(\log(\mu)\) | \(e^\theta\) | 1 |
Neg. Binom. \(NB(\mu,\alpha)\) | \(0,1,\ldots,\infty\) | \(\mu\) | \(\mu+\alpha\mu^2\) | \(\log(\frac{\alpha\mu}{1+\alpha\mu})\) | \(-\frac{1}{\alpha}\log(1-e^\theta)\) | 1 |
Gaussian/Normal \(N(\mu,\sigma^2)\) | \((-\infty,\infty)\) | \(\mu\) | \(1\) | \(\mu\) | \(\frac{1}{2}\theta^2\) | \(\sigma^2\) |
Gamma \(G(\mu,\nu)\) | \((0,\infty)\) | \(\mu\) | \(\mu^2\) | \(-\frac{1}{\mu}\) | \(-\log(-\theta)\) | \(\frac{1}{\nu}\) |
Inv. Gauss. \(IG(\mu,\sigma^2)\) | \((0,\infty)\) | \(\mu\) | \(\mu^3\) | \(-\frac{1}{2\mu^2}\) | \(-\sqrt{-2\theta}\) | \(\sigma^2\) |
Tweedie \(p\geq 1\) | depends on \(p\) | \(\mu\) | \(\mu^p\) | \(\frac{\mu^{1-p}}{1-p}\) | \(\frac{\alpha-1}{\alpha}\left(\frac{\theta}{\alpha-1}\right)^{\alpha}\) | \(\phi\) |
The Tweedie distribution has special cases for \(p=0,1,2\) not listed in the table and uses \(\alpha=\frac{p-2}{p-1}\).
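A few rows of the table above can be checked numerically against the identities \(\mu = b'(\theta)\) and \(v(\mu) = b''(\theta(\mu))\). The sketch below uses central finite differences in plain Python; the helper names d1 and d2, the evaluation point \(\mu=2\), and the Tweedie power \(p=1.5\) are illustrative assumptions, and only a subset of the rows is covered.

```python
import math


def d1(f, t, h=1e-5):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def d2(f, t, h=1e-4):
    """Central finite-difference approximation of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


p = 1.5                          # illustrative Tweedie variance power
alpha = (p - 2.0) / (p - 1.0)    # abbreviation used in the Tweedie row

# name: (theta(mu), b(theta), v(mu)), transcribed from the table rows.
rows = {
    "Poisson": (lambda mu: math.log(mu),
                lambda th: math.exp(th),
                lambda mu: mu),
    "Gaussian": (lambda mu: mu,
                 lambda th: 0.5 * th ** 2,
                 lambda mu: 1.0),
    "Gamma": (lambda mu: -1.0 / mu,
              lambda th: -math.log(-th),
              lambda mu: mu ** 2),
    "Inverse Gaussian": (lambda mu: -1.0 / (2.0 * mu ** 2),
                         lambda th: -math.sqrt(-2.0 * th),
                         lambda mu: mu ** 3),
    "Tweedie (p=1.5)": (lambda mu: mu ** (1.0 - p) / (1.0 - p),
                        lambda th: (alpha - 1.0) / alpha * (th / (alpha - 1.0)) ** alpha,
                        lambda mu: mu ** p),
}

mu = 2.0
checks = {}
for name, (theta_of_mu, b, v) in rows.items():
    theta = theta_of_mu(mu)
    checks[name] = (
        math.isclose(d1(b, theta), mu, rel_tol=1e-4),      # mu = b'(theta)
        math.isclose(d2(b, theta), v(mu), rel_tol=1e-4),   # v(mu) = b''(theta)
    )
    print(name, checks[name])
```

Each entry of checks holds two booleans, one per identity, so a transcription slip in any of the three columns shows up as a False.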
Correspondence of mathematical variables to code:
- \(Y\) and \(y\) are coded as endog, the variable one wants to model
- \(x\) is coded as exog, the covariates, also known as explanatory variables
- \(\beta\) is coded as params, the parameters one wants to estimate
- \(\mu\) is coded as mu, the expectation (conditional on \(x\)) of \(Y\)
- \(g\) is coded as the link argument to the class Family
- \(\phi\) is coded as scale, the dispersion parameter of the EDM
- \(w\) is not yet supported (i.e. \(w=1\)); in the future it might be var_weights
- \(p\) is coded as var_power for the power of the variance function \(v(\mu)\) of the Tweedie distribution, see table
- \(\alpha\) is either
  - Negative Binomial: the ancillary parameter alpha, see table
  - Tweedie: an abbreviation for \(\frac{p-2}{p-1}\) of the power \(p\) of the variance function, see table
Module Reference
Model Class
GLM(endog, exog[, family, offset, exposure, …]) | Generalized Linear Models class |
Results Class
GLMResults(model, params, …[, cov_type, …]) | Class to contain GLM results. |
PredictionResults(predicted_mean, var_pred_mean) | |
Families
The distribution families currently implemented are
Family(link, variance) | The parent class for one-parameter exponential families. |
Binomial([link]) | Binomial exponential family distribution. |
Gamma([link]) | Gamma exponential family distribution. |
Gaussian([link]) | Gaussian exponential family distribution. |
InverseGaussian([link]) | InverseGaussian exponential family. |
NegativeBinomial([link, alpha]) | Negative Binomial exponential family. |
Poisson([link]) | Poisson exponential family. |
Tweedie([link, var_power]) | Tweedie family. |
Link Functions
The link functions currently implemented are the following. Not all link functions are available for each distribution family. The list of available link functions can be obtained by
>>> sm.families.family.<familyname>.links
Link | A generic link function for one-parameter exponential family. |
CDFLink([dbn]) | Uses the CDF of a scipy.stats distribution |
CLogLog | The complementary log-log transform |
Log | The log transform |
Logit | The logit transform |
NegativeBinomial([alpha]) | The negative binomial link function |
Power([power]) | The power transform |
cauchy() | The Cauchy (standard Cauchy CDF) transform |
cloglog | The CLogLog transform link function. |
identity() | The identity transform |
inverse_power() | The inverse transform |
inverse_squared() | The inverse squared transform |
log | The log transform |
logit | The logit transform |
nbinom([alpha]) | The negative binomial link function. |
probit([dbn]) | The probit (standard normal CDF) transform |
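As a reminder of what several of the named transforms compute, the sketch below writes each link \(g\) and its inverse \(g^{-1}\) as plain Python functions and checks that \(g^{-1}(g(\mu)) = \mu\). These are stand-ins written from the usual textbook definitions and are not the statsmodels classes themselves; the dictionary name LINKS and the test value \(\mu = 0.3\) are assumptions made for illustration, and the Power and negative binomial links are left out.

```python
import math
from statistics import NormalDist

_norm = NormalDist()  # standard normal, used for the probit link

# name: (g, g_inverse); plain-Python stand-ins, not the statsmodels classes.
LINKS = {
    "identity": (lambda mu: mu,
                 lambda eta: eta),
    "log": (math.log,
            math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_norm.inv_cdf,
               _norm.cdf),
    "cauchy": (lambda mu: math.tan(math.pi * (mu - 0.5)),
               lambda eta: 0.5 + math.atan(eta) / math.pi),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
    "inverse_power": (lambda mu: 1.0 / mu,
                      lambda eta: 1.0 / eta),
    "inverse_squared": (lambda mu: 1.0 / mu ** 2,
                        lambda eta: 1.0 / math.sqrt(eta)),
}

mu = 0.3  # lies in (0, 1), so it is a valid mean for every link listed here
round_trip = {name: g_inv(g(mu)) for name, (g, g_inv) in LINKS.items()}

for name, value in round_trip.items():
    print(f"{name:16s} g^-1(g(mu)) = {value:.12f}")
```

The links that map \((0,1)\) onto the real line (logit, probit, cauchy, cloglog) suit means that are probabilities, while log and the inverse powers only require a positive mean, which is one reason not every link is offered for every family.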
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/glm.html