Hypergeometric
The Hypergeometric Distribution
Description
Density, distribution function, quantile function and random generation for the hypergeometric distribution.
Usage
dhyper(x, m, n, k, log = FALSE)
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)
rhyper(nn, m, n, k)
Arguments
x, q | vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls. |
m | the number of white balls in the urn. |
n | the number of black balls in the urn. |
k | the number of balls drawn from the urn, hence must be in 0,1,…, m+n. |
p | probability, it must be between 0 and 1. |
nn | number of observations. If length(nn) > 1, the length is taken to be the number required. |
log, log.p | logical; if TRUE, probabilities p are given as log(p). |
lower.tail | logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x]. |
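As an illustration of the lower.tail, log.p and log arguments, the following sketch checks that the two tails sum to one and that the log variants agree with taking logarithms afterwards. The parameter values are arbitrary and chosen only for illustration.

```r
m <- 10; n <- 7; k <- 8          # arbitrary illustrative urn: 10 white, 7 black, 8 drawn
q <- 1:k                         # values with positive probability for this urn

lower <- phyper(q, m, n, k)                      # P[X <= q]
upper <- phyper(q, m, n, k, lower.tail = FALSE)  # P[X >  q]
tails_ok <- all.equal(lower + upper, rep(1, length(q)))

# log.p = TRUE returns the distribution function on the log scale
logp_ok <- all.equal(phyper(q, m, n, k, log.p = TRUE), log(lower))

# log = TRUE returns the density on the log scale
logd_ok <- all.equal(dhyper(q, m, n, k, log = TRUE), log(dhyper(q, m, n, k)))
```

Working on the log scale is mainly useful when probabilities are small enough that they would underflow on the natural scale.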
Details
The hypergeometric distribution is used for sampling without replacement. The density of this distribution with parameters m, n and k (named Np, N-Np, and n, respectively, in the reference below, where N := m+n is also used in other references) is given by
p(x) = choose(m, x) choose(n, k-x) / choose(m+n, k)
for x = 0, …, k.
Note that p(x) is non-zero only for max(0, k-n) <= x <= min(k, m).
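The density formula and its support can be checked directly against dhyper. This is a small sketch with arbitrary parameter values; the variable names p_formula and support are introduced here for illustration only.

```r
m <- 10; n <- 7; k <- 8          # arbitrary illustrative values
x <- 0:k

# density written out from the formula above
p_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
formula_ok <- all.equal(dhyper(x, m, n, k), p_formula)

# values of x with non-zero probability
support  <- max(0, k - n):min(k, m)
positive <- x[dhyper(x, m, n, k) > 0]

# the probabilities sum to one
total <- sum(dhyper(x, m, n, k))
```

Here k exceeds n, so x = 0 is impossible: drawing 8 balls from an urn with only 7 black balls forces at least one white ball.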
With p := m/(m+n) (hence Np = N × p in the reference's notation), the first two moments are mean
E[X] = μ = k p
and variance
Var(X) = k p (1 - p) * (m+n-k)/(m+n-1),
which shows the closeness to the Binomial(k,p) (where the hypergeometric has smaller variance unless k = 1).
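The moment formulas can be verified numerically by summing over the support. The following is a sketch with arbitrary parameter values; it also compares the hypergeometric variance with that of Binomial(k, p).

```r
m <- 10; n <- 7; k <- 8          # arbitrary illustrative values
x <- 0:k
d <- dhyper(x, m, n, k)
p <- m / (m + n)

# closed-form moments
mean_formula <- k * p
var_formula  <- k * p * (1 - p) * (m + n - k) / (m + n - 1)

# moments computed directly from the density
mean_direct <- sum(x * d)
var_direct  <- sum((x - mean_direct)^2 * d)

mean_ok <- all.equal(mean_direct, mean_formula)
var_ok  <- all.equal(var_direct, var_formula)

# variance of the corresponding binomial, for comparison
var_binom <- k * p * (1 - p)
```

The factor (m+n-k)/(m+n-1) is the finite population correction; it equals 1 only when k = 1, which is why the hypergeometric variance is otherwise smaller.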
The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.
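The quantile definition can be reproduced by searching the distribution function by hand. This is a sketch with arbitrary parameter values and probabilities that do not coincide with jumps of F; q_manual is a name introduced here for illustration.

```r
m <- 10; n <- 7; k <- 8          # arbitrary illustrative values
p <- c(0.05, 0.25, 0.5, 0.75, 0.95)

x  <- 0:k
Fx <- phyper(x, m, n, k)         # distribution function on the whole range

# smallest x with F(x) >= p, found by direct search
q_manual <- sapply(p, function(pr) x[which(Fx >= pr)[1]])

quantile_ok <- all(qhyper(p, m, n, k) == q_manual)
```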
In rhyper(), if one of m, n, k exceeds .Machine$integer.max, currently the equivalent of qhyper(runif(nn), m, n, k) is used, which is comparatively slow; a binomial approximation may be considerably more efficient.
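One way such a binomial approximation might look is sketched below. It is not what rhyper does internally; it simply draws from Binomial(k, p) with p = m/(m+n), which is only reasonable when k is small relative to m+n. The parameter values are arbitrary assumptions chosen so that m and n exceed .Machine$integer.max.

```r
m <- 3e9; n <- 5e9; k <- 100     # m and n exceed .Machine$integer.max (assumed values)
p <- m / (m + n)
x <- 0:k

# largest absolute difference between the exact density and its binomial approximation
max_diff <- max(abs(dhyper(x, m, n, k) - dbinom(x, size = k, prob = p)))

# approximate random deviates: binomial draws instead of rhyper()
set.seed(1)
draws <- rbinom(5, size = k, prob = p)
```

The approximation degrades as k grows toward m+n, because the binomial ignores the finite population correction in the variance.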
Value
dhyper gives the density, phyper gives the distribution function, qhyper gives the quantile function, and rhyper generates random deviates.
Invalid arguments will result in return value NaN, with a warning.
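For example, a negative number of white balls, or a draw larger than the urn, yields NaN. The sketch below suppresses or captures the warning so the result can be inspected; the argument values are arbitrary.

```r
# negative number of white balls
res_neg <- suppressWarnings(dhyper(1, m = -1, n = 7, k = 8))

# more balls drawn than the urn contains (k > m + n)
res_big <- suppressWarnings(phyper(1, m = 3, n = 2, k = 10))

# capture the warning message instead of suppressing it
msg <- tryCatch(dhyper(1, m = -1, n = 7, k = 8),
                warning = function(w) conditionMessage(w))
```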
The length of the result is determined by nn for rhyper, and is the maximum of the lengths of the numerical arguments for the other functions.
The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.
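The length and recycling rules can be seen in a short sketch (arbitrary values):

```r
# result length follows the longest numerical argument
len_x <- length(dhyper(0:3, m = 10, n = 7, k = 8))

# a scalar x is recycled against a vector m
d_two <- dhyper(2, m = c(10, 12), n = 7, k = 8)
same_as_scalar_calls <- all.equal(
  d_two,
  c(dhyper(2, 10, 7, 8), dhyper(2, 12, 7, 8))
)

# for rhyper the first argument sets the number of deviates
len_r <- length(rhyper(5, m = 10, n = 7, k = 8))
```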
Source
dhyper computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).
phyper is based on calculating dhyper and phyper(...)/dhyper(...) (as a summation), based on ideas of Ian Smith and Morten Welinder.
qhyper is based on inversion (of an earlier phyper() algorithm).
rhyper is based on a corrected version of
Kachitvichyanukul, V. and Schmeiser, B. (1985). Computer generation of hypergeometric random variates. Journal of Statistical Computation and Simulation, 22, 127–145.
References
Johnson, N. L., Kotz, S., and Kemp, A. W. (1992) Univariate Discrete Distributions, Second Edition. New York: Wiley.
See Also
Distributions for other standard distributions.
Examples
m <- 10; n <- 7; k <- 8
x <- 0:(k+1)
rbind(phyper(x, m, n, k), dhyper(x, m, n, k))
all(phyper(x, m, n, k) == cumsum(dhyper(x, m, n, k)))  # FALSE
## but error is very small:
signif(phyper(x, m, n, k) - cumsum(dhyper(x, m, n, k)), digits = 3)
Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.