family
Family Objects for Models
Description
Family objects provide a convenient way to specify the details of the models used by functions such as glm. See the documentation for glm for the details on how such model fitting takes place.
Usage
family(object, ...)

binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
Arguments
link | a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector, or an object of class "link-glm". |
variance | for all families other than |
object | the function |
... | further arguments passed to methods. |
Details
family is a generic function with methods for classes "glm" and "lm" (the latter returning gaussian()).
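As a small illustration of the generic, the sketch below calls family() on a fitted "lm" object and a fitted "glm" object. The built-in cars data set and the object names are illustrative choices only, not part of the original page.

```r
## Illustrative fits on the built-in 'cars' data (object names are arbitrary)
fit_lm  <- lm(dist ~ speed, data = cars)
fit_glm <- glm(dist ~ speed, data = cars, family = poisson())

family(fit_lm)    # the "lm" method returns gaussian()
family(fit_glm)   # the "glm" method returns the family used in the fit

fam_lm  <- family(fit_lm)$family
fam_glm <- family(fit_glm)$family
```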
For the binomial and quasibinomial families the response can be specified in one of three ways:
- As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
- As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).
- As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
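The three response forms above can be sketched on a small made-up data set. The counts n and s, the covariate x, and all object names are hypothetical. The three fits should give the same coefficient estimates up to the convergence tolerance, since they describe the same data.

```r
## Hypothetical grouped data: 4 groups with 'n' trials and 's' successes each
n <- c(10, 12, 8, 15)
s <- c(2, 5, 5, 12)
x <- 1:4

## Two-column matrix: successes in column 1, failures in column 2
fit_mat  <- glm(cbind(s, n - s) ~ x, family = binomial())

## Proportion of successes, with the number of trials supplied as weights
fit_prop <- glm(s / n ~ x, family = binomial(), weights = n)

## Factor response, one row per trial; the first level ("no") is 'failure'
long <- data.frame(
  x = rep(rep(x, 2), times = c(s, n - s)),
  y = factor(rep(c("yes", "no"), times = c(sum(s), sum(n - s))),
             levels = c("no", "yes"))
)
fit_fac <- glm(y ~ x, family = binomial(), data = long)

## Coefficients side by side
cbind(matrix = coef(fit_mat), proportion = coef(fit_prop), factor = coef(fit_fac))
```

Note that the residual deviance and degrees of freedom of the factor fit differ from the grouped fits, because the unit of observation is a single trial rather than a group.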
The quasibinomial and quasipoisson families differ from the binomial and poisson families only in that the dispersion parameter is not fixed at one, so they can model over-dispersion. For the binomial case see McCullagh and Nelder (1989, pp. 124–8). Although they show that there is (under some restrictions) a model with variance proportional to mean as in the quasi-binomial model, note that glm does not compute maximum-likelihood estimates in that model. The behaviour of S is closer to the quasi- variants.
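A minimal sketch of this difference, reusing the d.AD counts from the Examples section: the point estimates of the poisson and quasipoisson fits agree, while the dispersion (and hence the standard errors) differ. The object names fit_p, fit_q, disp_p, disp_q and se_ratio are illustrative.

```r
## Counts from example(glm)
d.AD <- data.frame(treatment = gl(3, 3), outcome = gl(3, 1, 9),
                   counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12))

fit_p <- glm(counts ~ outcome + treatment, data = d.AD, family = poisson())
fit_q <- glm(counts ~ outcome + treatment, data = d.AD, family = quasipoisson())

all.equal(coef(fit_p), coef(fit_q))    # same point estimates

disp_p <- summary(fit_p)$dispersion    # fixed at 1 for poisson
disp_q <- summary(fit_q)$dispersion    # estimated from the Pearson residuals

## Standard errors of the quasi fit are scaled by sqrt(dispersion)
se_ratio <- coef(summary(fit_q))[, "Std. Error"] /
            coef(summary(fit_p))[, "Std. Error"]
```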
Value
An object of class "family" (which has a concise print method). This is a list with elements
family | character: the family name. |
link | character: the link name. |
linkfun | function: the link. |
linkinv | function: the inverse of the link function. |
variance | function: the variance as a function of the mean. |
dev.resids | function giving the deviance for each observation as a function of |
aic | function giving the AIC value if appropriate (but |
mu.eta | function: derivative of the inverse-link function with respect to the linear predictor. If the inverse-link function is mu = ginv(eta) where eta is the value of the linear predictor, then this function returns d(ginv(eta))/d(eta) = d(mu)/d(eta). |
initialize | expression. This needs to set up whatever data objects are needed for the family as well as |
validmu | logical function. Returns |
valideta | logical function. Returns |
simulate | (optional) function |
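To make a few of these components concrete, the sketch below inspects poisson() and checks numerically that mu.eta matches the derivative of linkinv. The test values mu, the step size h, and the object names are illustrative assumptions.

```r
fam <- poisson()             # default log link
mu  <- c(0.5, 1, 2, 10)      # illustrative means
eta <- fam$linkfun(mu)       # linear predictor, here log(mu)

fam$family                   # "poisson"
fam$link                     # "log"
fam$linkinv(eta)             # recovers mu
fam$variance(mu)             # equals mu for the Poisson family

## Central-difference approximation to d(mu)/d(eta)
h <- 1e-6
num_deriv <- (fam$linkinv(eta + h) - fam$linkinv(eta - h)) / (2 * h)
max_err <- max(abs(num_deriv - fam$mu.eta(eta)))
max_err                      # should be very small
```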
Note
The link and variance arguments have rather awkward semantics for back-compatibility. The recommended way is to supply them as quoted character strings, but they can also be supplied unquoted (as names or expressions). Additionally, they can be supplied as a length-one character vector giving the name of one of the options, or as a list (for link, of class "link-glm"). The restrictions apply only to links given as names: when given as a character string all the links known to make.link are accepted.
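The ways of supplying a link described above can be sketched as follows. The variable names f1 to f5, lnk and links are illustrative, and the last line relies on the statement above that any link known to make.link is accepted when given as a character string.

```r
f1 <- binomial(link = "probit")             # quoted string (recommended)
f2 <- binomial(link = probit)               # unquoted name
lnk <- "probit"
f3 <- binomial(link = lnk)                  # length-one character vector
f4 <- binomial(link = make.link("probit"))  # object of class "link-glm"

links <- sapply(list(f1, f2, f3, f4), function(f) f$link)
links                                       # all four use the probit link

## A link outside poisson's usual set, given as a character string
f5 <- poisson(link = "inverse")
f5$link
```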
This is potentially ambiguous: supplying link = logit could mean the unquoted name of a link or the value of object logit. It is interpreted if possible as the name of an allowed link, then as an object. (You can force the interpretation to always be the value of an object via logit[1].)
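A minimal sketch of this ambiguity, assuming a user object that happens to be called logit and holds the name of a different link:

```r
logit <- "probit"                     # an object that shares its name with a link

a <- binomial(link = logit)$link      # taken as the name of the allowed link
b <- binomial(link = logit[1])$link   # forced to use the value of the object

c(name = a, value = b)
```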
Author(s)
The design was inspired by S functions of the same names described in Hastie & Pregibon (1992) (except quasibinomial and quasipoisson).
References
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.
Dobson, A. J. (1983) An Introduction to Statistical Modelling. London: Chapman and Hall.
Cox, D. R. and Snell, E. J. (1981) Applied Statistics; Principles and Examples. London: Chapman and Hall.
Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
See Also
For binomial coefficients, choose; the binomial and negative binomial distributions, Binomial, and NegBinomial.
Examples
require(utils) # for str

nf <- gaussian()  # Normal family
nf
str(nf)

gf <- Gamma()
gf
str(gf)
gf$linkinv
gf$variance(-3:4) #- == (.)^2

## Binomial with default 'logit' link:  Check some properties visually:
bi <- binomial()
et <- seq(-10,10, by=1/8)
plot(et, bi$mu.eta(et), type="l")
## show that mu.eta() is derivative of linkinv() :
lines((et[-1]+et[-length(et)])/2, col=adjustcolor("red", 1/4),
      diff(bi$linkinv(et))/diff(et), type="l", lwd=4)
## which here is the logistic density:
lines(et, dlogis(et), lwd=3, col=adjustcolor("blue", 1/4))
stopifnot(exprs = {
  all.equal(bi$ mu.eta(et), dlogis(et))
  all.equal(bi$linkinv(et), plogis(et) -> m)
  all.equal(bi$linkfun(m ), qlogis(m))    #  logit(.) == qlogis(.) !
})

## Data from example(glm) :
d.AD <- data.frame(treatment = gl(3,3),
                   outcome   = gl(3,1,9),
                   counts    = c(18,17,15, 20,10,20, 25,13,12))
glm.D93 <- glm(counts ~ outcome + treatment, d.AD, family = poisson())
## Quasipoisson: compare with above / example(glm) :
glm.qD93 <- glm(counts ~ outcome + treatment, d.AD, family = quasipoisson())
glm.qD93
anova  (glm.qD93, test = "F")
summary(glm.qD93)
## for Poisson results (same as from 'glm.D93' !) use
anova  (glm.qD93, dispersion = 1, test = "Chisq")
summary(glm.qD93, dispersion = 1)

## Example of user-specified link, a logit model for p^days
## See Shaffer, T.  2004. Auk 121(2): 526-540.
logexp <- function(days = 1)
{
    linkfun <- function(mu) qlogis(mu^(1/days))
    linkinv <- function(eta) plogis(eta)^days
    mu.eta  <- function(eta) days * plogis(eta)^(days-1) *
                  binomial()$mu.eta(eta)
    valideta <- function(eta) TRUE
    link <- paste0("logexp(", days, ")")
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta, name = link),
              class = "link-glm")
}
(bil3 <- binomial(logexp(3)))
## in practice this would be used with a vector of 'days', in
## which case use an offset of 0 in the corresponding formula
## to get the null deviance right.

## Binomial with identity link: often not a good idea, as both
## computationally and conceptually difficult:
binomial(link = "identity")  ## is exactly the same as
binomial(link = make.link("identity"))

## tests of quasi
x <- rnorm(100)
y <- rpois(100, exp(1+x))
glm(y ~ x, family = quasi(variance = "mu", link = "log"))
# which is the same as
glm(y ~ x, family = poisson)
glm(y ~ x, family = quasi(variance = "mu^2", link = "log"))
## Not run: glm(y ~ x, family = quasi(variance = "mu^3", link = "log")) # fails
y <- rbinom(100, 1, plogis(x))
# need to set a starting value for the next fit
glm(y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"), start = c(0,1))
Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.