nclass {grDevices}    R Documentation
Compute the number of classes for a histogram.
nclass.Sturges(x)
nclass.scott(x)
nclass.FD(x)

x    a data vector.
nclass.Sturges uses Sturges' formula, implicitly basing bin sizes on the range of the data.
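As a minimal sketch of Sturges' formula, the class count depends only on the sample size n, as ceiling(log2(n) + 1). The helper name sturges_sketch is hypothetical and is not part of grDevices.

```r
# Hypothetical helper mirroring Sturges' formula; not part of grDevices.
sturges_sketch <- function(x) ceiling(log2(length(x)) + 1)

set.seed(1)
x <- stats::rnorm(1111)
sturges_sketch(x)                                  # 12, since log2(1111) is about 10.12
sturges_sketch(x) == grDevices::nclass.Sturges(x)  # expected to agree
```

Because only length(x) enters the formula, the data values influence the bins only through the range over which the classes are later spread.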
nclass.scott uses Scott's choice for a normal distribution, based on the estimated standard deviation of the data, unless that is zero, in which case it returns 1.
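The rule can be sketched as follows, assuming a bin width of h = 3.5 * s * n^(-1/3), where s is the sample standard deviation. The constant 3.5 and the helper name scott_sketch are assumptions of this illustration, not taken from the text above.

```r
# Hypothetical sketch of Scott's rule; not the grDevices source.
scott_sketch <- function(x) {
  n <- length(x)
  h <- 3.5 * stats::sd(x) * n^(-1/3)   # assumed normal-reference bin width
  # Divide the data range by the bin width; fall back to 1 class when h is zero.
  if (h > 0) max(1, ceiling(diff(range(x)) / h)) else 1
}

set.seed(1)
x <- stats::rnorm(1111)
scott_sketch(x) == grDevices::nclass.scott(x)  # expected to agree
scott_sketch(rep(1, 11))                       # 1: constant data has zero spread
```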
nclass.FD uses the Freedman-Diaconis choice based on the inter-quartile range (IQR(signif(x, 5))), unless that is zero, in which case it uses increasingly extreme symmetric quantiles up to c(1,511)/512. If that difference is still zero, it reverts to Scott's choice.
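The fallback chain can be illustrated with a rough sketch, assuming a bin width of h = 2 * IQR * n^(-1/3). The helper name fd_sketch, the halving sequence of quantile levels (1/8, 1/16, ..., 1/512), and the rescaling of the widened spread by 1 - 2 * al are all assumptions of this illustration; the text above specifies only the final level c(1,511)/512.

```r
# Hypothetical sketch of the Freedman-Diaconis rule; not the grDevices source.
fd_sketch <- function(x) {
  xr <- signif(x, 5)            # rounding as described in the text
  w <- stats::IQR(xr)           # spread measure: inter-quartile range
  al <- 1/4
  # If the spread is zero, widen to more extreme symmetric quantiles.
  # The halving schedule and the (1 - 2 * al) rescaling are assumptions.
  while (w == 0 && al / 2 >= 1/512) {
    al <- al / 2
    w <- diff(stats::quantile(xr, c(al, 1 - al), names = FALSE)) / (1 - 2 * al)
  }
  # Still zero: revert to Scott's choice, as the text describes.
  if (w == 0) return(grDevices::nclass.scott(x))
  h <- 2 * w * length(x)^(-1/3) # assumed Freedman-Diaconis bin width
  max(1, ceiling(diff(range(x)) / h))
}

set.seed(1)
x <- stats::rnorm(1111)
fd_sketch(x) == grDevices::nclass.FD(x)  # expected to agree on well-spread data
fd_sketch(rep(1, 11))                    # 1: every fallback sees zero spread
```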
The suggested number of classes.
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S-PLUS. Springer, page 112.
Freedman, D. and Diaconis, P. (1981). On the histogram as a density estimator: L_2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 57, 453–476. doi: 10.1007/BF01025868.
Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66, 605–610. doi: 10.2307/2335182.
Scott, D. W. (1992). Multivariate Density Estimation. Theory, Practice, and Visualization. Wiley.
Sturges, H. A. (1926). The choice of a class interval. Journal of the American Statistical Association, 21, 65–66. doi: 10.1080/01621459.1926.10502161.
hist and truehist (package MASS); dpih (package KernSmooth) for a plugin bandwidth proposed by Wand (1995).
set.seed(1)
x <- stats::rnorm(1111)
nclass.Sturges(x)

## Compare them:
NC <- function(x) c(Sturges = nclass.Sturges(x),
                    Scott = nclass.scott(x),
                    FD = nclass.FD(x))
NC(x)
onePt <- rep(1, 11)
NC(onePt) # no longer gives NaN