hist
Histograms
Description
The generic function hist computes a histogram of the given data values. If plot = TRUE, the resulting object of class "histogram" is plotted by plot.histogram, before it is returned.
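As a minimal sketch of the behaviour described above, the following computes a histogram without drawing it and then plots the stored object afterwards. The data vector x is made up for illustration only.

```r
# Small illustrative data set (values chosen arbitrarily for this sketch).
x <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 5.0)

# plot = FALSE: only compute the histogram; nothing is drawn.
h <- hist(x, plot = FALSE)
class(h)  # the object is of class "histogram"

# The stored object can be drawn later; plot() dispatches to plot.histogram.
plot(h, main = "Drawn from a stored histogram object")
```

Keeping the object around in this way is useful when the counts or breaks are needed for further computation rather than for display.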
Usage
hist(x, ...)

## Default S3 method:
hist(x, breaks = "Sturges",
     freq = NULL, probability = !freq,
     include.lowest = TRUE, right = TRUE,
     density = NULL, angle = 45, col = "lightgray", border = NULL,
     main = paste("Histogram of" , xname),
     xlim = range(breaks), ylim = NULL,
     xlab = xname, ylab,
     axes = TRUE, plot = TRUE, labels = FALSE,
     nclass = NULL, warn.unused = TRUE, ...)
Arguments
x | a vector of values for which the histogram is desired. |
breaks | one of:
In the last three cases the number is a suggestion only; as the breakpoints will be set to |
freq | logical; if |
probability | an alias for |
include.lowest | logical; if |
right | logical; if |
density | the density of shading lines, in lines per inch. The default value of |
angle | the slope of shading lines, given as an angle in degrees (counter-clockwise). |
col | a colour to be used to fill the bars. The default of |
border | the color of the border around the bars. The default is to use the standard foreground color. |
main, xlab, ylab | main title and axis labels: these arguments to |
xlim, ylim | the range of x and y values with sensible defaults. Note that |
axes | logical. If |
plot | logical. If |
labels | logical or character string. Additionally draw labels on top of bars, if not |
nclass | numeric (integer). For S(-PLUS) compatibility only, |
warn.unused | logical. If |
... | further arguments and graphical parameters passed to |
Details
The definition of histogram differs by source (with country-specific biases). R's default with equi-spaced breaks (also the default) is to plot the counts in the cells defined by breaks. Thus the height of a rectangle is proportional to the number of points falling into the cell, as is the area provided the breaks are equally-spaced.
The default with non-equi-spaced breaks is to give a plot of area one, in which the area of the rectangles is the fraction of the data points falling in the cells.
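The difference between equi-spaced and non-equi-spaced breaks can be checked numerically. The sketch below uses a made-up data vector and hand-picked breaks (both are assumptions for illustration) and verifies that the density values integrate to one over unequal cells.

```r
# Illustrative data; values and breaks are arbitrary choices for this sketch.
x <- c(1, 2, 2, 3, 3, 3, 4, 6, 9, 15)

h_eq  <- hist(x, breaks = c(0, 5, 10, 15),   plot = FALSE)  # equal widths
h_neq <- hist(x, breaks = c(0, 2, 4, 8, 16), plot = FALSE)  # unequal widths

h_eq$equidist    # TRUE: counts are plotted by default
h_neq$equidist   # FALSE: densities (total area one) are plotted by default

# Area of each rectangle = density * cell width = fraction of points in the cell.
area <- h_neq$density * diff(h_neq$breaks)
area
sum(area)
```

With unequal widths, plotting raw counts would exaggerate the wide cells, which is why the density scale is the default in that case.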
If right = TRUE (default), the histogram cells are intervals of the form (a, b], i.e., they include their right-hand endpoint, but not their left one, with the exception of the first cell when include.lowest is TRUE.
For right = FALSE, the intervals are of the form [a, b), and include.lowest means ‘include highest’.
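To make the effect of right concrete, the sketch below bins a small made-up vector whose values fall exactly on the breakpoints. The data and breaks are assumptions chosen so that every value sits on a cell boundary.

```r
# Every value lies exactly on a breakpoint, so the closure side matters.
x <- c(0, 1, 1, 2, 2, 2, 3)
b <- 0:3

# right = TRUE (default): cells are [0,1], (1,2], (2,3]
# (the first cell is closed on the left because include.lowest = TRUE).
counts_right <- hist(x, breaks = b, plot = FALSE)$counts

# right = FALSE: cells are [0,1), [1,2), [2,3]
# (the last cell is closed on the right because include.lowest = TRUE).
counts_left <- hist(x, breaks = b, right = FALSE, plot = FALSE)$counts

counts_right
counts_left
```

Both calls count all seven points, but boundary values move to the neighbouring cell when the closed side is switched.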
A numerical tolerance of 1e-7 times the median bin size (for more than four bins, otherwise the median is substituted) is applied when counting entries on the edges of bins. This is not included in the reported breaks nor in the calculation of density.
The default for breaks is "Sturges": see nclass.Sturges. Other names for which algorithms are supplied are "Scott" and "FD" / "Freedman-Diaconis" (with corresponding functions nclass.scott and nclass.FD). Case is ignored and partial matching is used. Alternatively, a function can be supplied which will compute the intended number of breaks or the actual breakpoints as a function of x.
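The ways of specifying breaks described above can be sketched as follows. The simulated data and the helper half_unit_breaks are hypothetical and exist only for this illustration.

```r
set.seed(1)
x <- rnorm(200)  # simulated data, an assumption for this sketch

# Suggested number of classes under each built-in rule.
n_classes <- c(Sturges = nclass.Sturges(x),
               Scott   = nclass.scott(x),
               FD      = nclass.FD(x))
n_classes

# Rule names are matched ignoring case and with partial matching.
h_scott <- hist(x, breaks = "scott",    plot = FALSE)
h_fd    <- hist(x, breaks = "Freedman", plot = FALSE)

# A function of x may return the actual breakpoints.
# half_unit_breaks is a hypothetical helper, not part of R.
half_unit_breaks <- function(v) seq(floor(min(v)), ceiling(max(v)), by = 0.5)
h_fun <- hist(x, breaks = half_unit_breaks, plot = FALSE)
h_fun$breaks
```

Note that a rule name or a single number is only a suggestion, whereas a vector of breakpoints returned by a function is used as given.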
Value
an object of class "histogram" which is a list with components:
breaks | the n+1 cell boundaries (= |
counts | n integers; for each cell, the number of |
density | values f^(x[i]), as estimated density values. If |
mids | the n cell midpoints. |
xname | a character string with the actual |
equidist | logical, indicating if the distances between |
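The relationships between the components listed above can be checked directly. This is a small sketch on made-up data with hand-picked breaks; the values are assumptions for illustration.

```r
x <- c(2, 3, 5, 7, 11, 13, 17, 19)   # illustrative data
h <- hist(x, breaks = c(0, 5, 10, 20), plot = FALSE)

n <- length(h$counts)                # number of cells
lower <- h$breaks[-(n + 1)]          # left boundary of each cell
upper <- h$breaks[-1]                # right boundary of each cell

length(h$breaks) == n + 1            # n + 1 boundaries for n cells
h$counts                             # points per cell
(lower + upper) / 2                  # agrees with h$mids
h$counts / (sum(h$counts) * (upper - lower))  # agrees with h$density
h$xname                              # deparsed name of the data argument
h$equidist                           # FALSE here: cell widths differ
```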
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer.
See Also
nclass.Sturges, stem, density, truehist in package MASS.
Typical plots with vertical bars are not histograms. Consider barplot or plot(*, type = "h") for such bar plots.
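For contrast with hist, the sketch below draws the two kinds of bar plot mentioned above. The categorical vector grp and the Poisson example are made up for illustration.

```r
# Counts of a categorical variable: a bar plot, not a histogram.
grp <- c("a", "b", "a", "c", "a", "b")   # illustrative categories
tab <- table(grp)
barplot(tab, main = "Category counts")

# Vertical lines at discrete values, e.g. a probability mass function.
k <- 0:8
plot(k, dpois(k, lambda = 2), type = "h",
     xlab = "k", ylab = "P(X = k)", main = "Poisson(2) probabilities")
```

In both plots the bar height carries the information and the bar width has no meaning, unlike the cells of a histogram.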
Examples
op <- par(mfrow = c(2, 2))
hist(islands)
utils::str(hist(islands, col = "gray", labels = TRUE))

hist(sqrt(islands), breaks = 12, col = "lightblue", border = "pink")
##-- For non-equidistant breaks, counts should NOT be graphed unscaled:
r <- hist(sqrt(islands), breaks = c(4*0:5, 10*3:5, 70, 100, 140),
          col = "blue1")
text(r$mids, r$density, r$counts, adj = c(.5, -.5), col = "blue3")
sapply(r[2:3], sum)
sum(r$density * diff(r$breaks)) # == 1
lines(r, lty = 3, border = "purple") # -> lines.histogram(*)
par(op)

require(utils) # for str
str(hist(islands, breaks = 12, plot = FALSE)) #-> 10 (~= 12) breaks
str(hist(islands, breaks = c(12,20,36,80,200,1000,17000), plot = FALSE))

hist(islands, breaks = c(12,20,36,80,200,1000,17000), freq = TRUE,
     main = "WRONG histogram") # and warning

## Extreme outliers; the "FD" rule would take very large number of 'breaks':
XXL <- c(1:9, c(-1,1)*1e300)
hh <- hist(XXL, "FD") # did not work in R <= 3.4.1; now gives warning
## pretty() determines how many counts are used (platform dependently!):
length(hh$breaks) ## typically 1 million -- though 1e6 was "a suggestion only"

require(stats)
set.seed(14)
x <- rchisq(100, df = 4)

## Comparing data with a model distribution should be done with qqplot()!
qqplot(x, qchisq(ppoints(x), df = 4)); abline(0, 1, col = 2, lty = 2)

## if you really insist on using hist() ... :
hist(x, freq = FALSE, ylim = c(0, 0.2))
curve(dchisq(x, df = 4), col = 2, lty = 2, lwd = 2, add = TRUE)
Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.