Statistics
The Statistics module contains basic statistics functionality.
Statistics.std
Function
std(v; corrected::Bool=true, mean=nothing, dims)
Compute the sample standard deviation of a vector or array v, optionally along the given dimensions. The algorithm returns an estimator of the generative distribution's standard deviation under the assumption that each entry of v is an IID sample drawn from that generative distribution. This computation is equivalent to calculating sqrt(sum((v .- mean(v)).^2) / (length(v) - 1)). A pre-computed mean may be provided through the mean keyword. If corrected is true, the sum is scaled with n-1; if corrected is false, it is scaled with n, where n = length(v).
If the array contains NaN or missing values, the result is also NaN or missing (missing takes precedence if the array contains both). Use the skipmissing function to omit missing entries and compute the standard deviation of non-missing values.
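A minimal sketch of the behaviour described above, using made-up sample data (all variable names are illustrative only). It compares std with the explicit formula, shows the effect of corrected=false, and shows that missing propagates unless skipmissing is used.

```julia
using Statistics

v = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample data
n = length(v)

# Corrected (default): divide the sum of squared deviations by n - 1.
s_corrected = std(v)
s_manual = sqrt(sum((v .- mean(v)).^2) / (n - 1))

# Uncorrected: divide by n instead.
s_uncorrected = std(v; corrected=false)

# missing propagates unless it is skipped explicitly.
w = [1.0, missing, 3.0]
s_missing = std(w)                 # missing
s_skipped = std(skipmissing(w))    # standard deviation of [1.0, 3.0]
```

For this data the squared deviations sum to 32, so the uncorrected result is sqrt(32 / 8) = 2.0 and the corrected one is sqrt(32 / 7).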
Statistics.stdm
Function
stdm(v, m; corrected::Bool=true)
Compute the sample standard deviation of a vector v with known mean m. If corrected is true, the sum is scaled with n-1; if corrected is false, it is scaled with n, where n = length(v).
If the array contains NaN or missing values, the result is also NaN or missing (missing takes precedence if the array contains both). Use the skipmissing function to omit missing entries and compute the standard deviation of non-missing values.
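As an illustration, assuming the mean has already been computed elsewhere, stdm can reuse it instead of recomputing it. The data and variable names below are made up for the example.

```julia
using Statistics

v = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
m = mean(v)                 # 2.5, computed once and reused

s = stdm(v, m)                          # scaled with n - 1
s_pop = stdm(v, m; corrected=false)     # scaled with n
```

Here the squared deviations sum to 5, so s is sqrt(5 / 3) and s_pop is sqrt(5 / 4); s agrees with std(v).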
Statistics.var
Function
var(v; dims, corrected::Bool=true, mean=nothing)
Compute the sample variance of a vector or array v, optionally along the given dimensions. The algorithm returns an estimator of the generative distribution's variance under the assumption that each entry of v is an IID sample drawn from that generative distribution. This computation is equivalent to calculating sum(abs2, v .- mean(v)) / (length(v) - 1). If corrected is true, the sum is scaled with n-1; if corrected is false, it is scaled with n, where n = length(v). A pre-computed mean over the region may be provided through the mean keyword.
If the array contains NaN or missing values, the result is also NaN or missing (missing takes precedence if the array contains both). Use the skipmissing function to omit missing entries and compute the variance of non-missing values.
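The following sketch, on a small made-up matrix, shows how the dims keyword changes the region over which the variance is taken and that supplying a pre-computed mean leaves the result unchanged. Variable names are illustrative only.

```julia
using Statistics

A = [1.0 2.0; 3.0 6.0]   # hypothetical 2×2 data

total = var(A)                 # variance over all four entries
by_col = var(A, dims=1)        # 1×2 matrix: one variance per column
by_row = var(A, dims=2)        # 2×1 matrix: one variance per row

# Supplying a pre-computed mean gives the same result.
same = var(A; mean=mean(A))
```

With this data, total is 14 / 3, the column variances are 2.0 and 8.0, and the row variances are 0.5 and 4.5.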
Statistics.varm
Function
varm(v, m; dims, corrected::Bool=true)
Compute the sample variance of a collection v with known mean(s) m, optionally over the given dimensions. m may contain means for each dimension of v. If corrected is true, the sum is scaled with n-1; if corrected is false, it is scaled with n, where n = length(v).
If the array contains NaN or missing values, the result is also NaN or missing (missing takes precedence if the array contains both). Use the skipmissing function to omit missing entries and compute the variance of non-missing values.
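A short sketch of both calling styles, assuming the means are already available: an array of per-column means together with dims, and a scalar mean for a whole vector. The data is made up for illustration.

```julia
using Statistics

A = [1.0 2.0; 3.0 6.0]          # hypothetical data
col_means = mean(A, dims=1)     # 1×2 matrix of column means: [2.0 4.0]

# Reuse the column means to get one variance per column.
col_vars = varm(A, col_means; dims=1)

# With a scalar mean and no dims, the variance is taken over all entries.
v = [1.0, 2.0, 3.0, 4.0]
overall = varm(v, 2.5)
```

The per-column result matches var(A, dims=1), and overall matches var(v).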
Statistics.cor
Function
cor(x::AbstractVector)
Return the number one.
cor(X::AbstractMatrix; dims::Int=1)
Compute the Pearson correlation matrix of the matrix X along the dimension dims.
cor(x::AbstractVector, y::AbstractVector)
Compute the Pearson correlation between the vectors x and y.
cor(X::AbstractVecOrMat, Y::AbstractVecOrMat; dims=1)
Compute the Pearson correlation between the vectors or matrices X and Y along the dimension dims.
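To make the methods above concrete, here is a small sketch on made-up vectors: one pair that is perfectly linearly related, one that is perfectly anti-correlated, and a matrix whose columns are treated as variables. Names and data are illustrative only.

```julia
using Statistics

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]      # perfectly linear in x
z = [4.0, 3.0, 2.0, 1.0]      # perfectly anti-correlated with x

r_xy = cor(x, y)     # close to 1
r_xz = cor(x, z)     # close to -1

# Columns are treated as variables when dims=1 (the default).
X = hcat(x, y, z)
R = cor(X)           # 3×3 symmetric matrix with ones on the diagonal
```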
Statistics.cov
Function
cov(x::AbstractVector; corrected::Bool=true)
Compute the variance of the vector x. If corrected is true (the default), the sum is scaled with n-1; if corrected is false, it is scaled with n, where n = length(x).
cov(X::AbstractMatrix; dims::Int=1, corrected::Bool=true)
Compute the covariance matrix of the matrix X along the dimension dims. If corrected is true (the default), the sum is scaled with n-1; if corrected is false, it is scaled with n, where n = size(X, dims).
cov(x::AbstractVector, y::AbstractVector; corrected::Bool=true)
Compute the covariance between the vectors x and y. If corrected is true (the default), computes $\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x) (y_i-\bar y)^*$, where $*$ denotes the complex conjugate and n = length(x) = length(y). If corrected is false, computes $\frac{1}{n}\sum_{i=1}^n (x_i-\bar x) (y_i-\bar y)^*$.
cov(X::AbstractVecOrMat, Y::AbstractVecOrMat; dims::Int=1, corrected::Bool=true)
Compute the covariance between the vectors or matrices X and Y along the dimension dims. If corrected is true (the default), the sum is scaled with n-1; if corrected is false, it is scaled with n, where n = size(X, dims) = size(Y, dims).
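The formulas above can be checked by hand on a small example. The sketch below uses made-up real-valued data (so the complex conjugate has no effect) and illustrative variable names.

```julia
using Statistics

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 6.0]     # hypothetical data
n = length(x)

c = cov(x, y)                          # scaled with n - 1
c_manual = sum((x .- mean(x)) .* conj.(y .- mean(y))) / (n - 1)

c_pop = cov(x, y; corrected=false)     # scaled with n

# For a matrix, columns are variables when dims=1 (the default).
C = cov(hcat(x, y))                    # 2×2 covariance matrix
```

The products of deviations sum to 7 here, so c is 7 / 3 and c_pop is 7 / 4. The off-diagonal entries of C equal c and the diagonal holds var(x) and var(y).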
Statistics.mean!
Function
mean!(r, v)
Compute the mean of v over the singleton dimensions of r, and write the results to r.
Examples
julia> v = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> mean!([1., 1.], v)
2-element Array{Float64,1}:
 1.5
 3.5

julia> mean!([1. 1.], v)
1×2 Array{Float64,2}:
 2.0  3.0
Statistics.mean
Function
mean(itr)
Compute the mean of all elements in a collection.
If itr contains NaN or missing values, the result is also NaN or missing (missing takes precedence if itr contains both). Use the skipmissing function to omit missing entries and compute the mean of non-missing values.
Examples
julia> mean(1:20)
10.5

julia> mean([1, missing, 3])
missing

julia> mean(skipmissing([1, missing, 3]))
2.0
mean(f::Function, itr)
Apply the function f to each element of collection itr and take the mean.
julia> mean(√, [1, 2, 3])
1.3820881233139908

julia> mean([√1, √2, √3])
1.3820881233139908
mean(A::AbstractArray; dims)
Compute the mean of an array over the given dimensions.
mean for empty arrays requires at least Julia 1.1.
Examples
julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> mean(A, dims=1)
1×2 Array{Float64,2}:
 2.0  3.0

julia> mean(A, dims=2)
2×1 Array{Float64,2}:
 1.5
 3.5
Statistics.median!
Function
median!(v)
Like median, but may overwrite the input vector.
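Because the input may be reordered, a caller who still needs the original order can copy the vector first. A small sketch with made-up data and illustrative names:

```julia
using Statistics

v = [5.0, 1.0, 4.0, 2.0, 3.0]   # hypothetical data
backup = copy(v)                 # keep the original order if it matters

m = median!(v)                   # 3.0; v may now be reordered
```

The elements of v are preserved; only their order is unspecified after the call.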
Statistics.median
Function
median(itr)
Compute the median of all elements in a collection. For an even number of elements no exact median element exists, so the result is equivalent to calculating the mean of the two middle elements.
If itr contains NaN or missing values, the result is also NaN or missing (missing takes precedence if itr contains both). Use the skipmissing function to omit missing entries and compute the median of non-missing values.
Examples
julia> median([1, 2, 3])
2.0

julia> median([1, 2, 3, 4])
2.5

julia> median([1, 2, missing, 4])
missing

julia> median(skipmissing([1, 2, missing, 4]))
2.0
median(A::AbstractArray; dims)
Compute the median of an array along the given dimensions.
Examples
julia> median([1 2; 3 4], dims=1)
1×2 Array{Float64,2}:
 2.0  3.0
Statistics.middle
Function
middle(x)
Compute the middle of a scalar value, which is equivalent to x itself, but of the type of middle(x, x) for consistency.
middle(x, y)
Compute the middle of two reals x and y, which is equivalent in both value and type to computing their mean ((x + y) / 2).
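For instance, with integer arguments (values chosen arbitrarily for illustration), both methods return a floating-point result:

```julia
using Statistics

a = middle(3)        # 3.0: same value as 3, but a Float64 like middle(3, 3)
b = middle(2, 5)     # 3.5, the same as (2 + 5) / 2
```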
middle(range)
Compute the middle of a range, which consists of computing the mean of its extrema. Since a range is sorted, the mean is computed from the first and last elements.
julia> middle(1:10)
5.5
middle(a)
Compute the middle of an array a, which consists of finding its extrema and then computing their mean.
julia> a = [1,2,3.6,10.9]
4-element Array{Float64,1}:
  1.0
  2.0
  3.6
 10.9

julia> middle(a)
5.95
Statistics.quantile!
Function
quantile!([q::AbstractArray, ] v::AbstractVector, p; sorted=false)
Compute the quantile(s) of a vector v at a specified probability or vector or tuple of probabilities p on the interval [0,1]. If p is a vector, an optional output array q may also be specified. (If not provided, a new output array is created.) The keyword argument sorted indicates whether v can be assumed to be sorted; if false (the default), then the elements of v will be partially sorted in-place.
Quantiles are computed via linear interpolation between the points ((k-1)/(n-1), v[k]), for k = 1:n, where n = length(v). This corresponds to Definition 7 of Hyndman and Fan (1996), and is the same as the R default.
An ArgumentError is thrown if v contains NaN or missing values.
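A brief sketch of the sorted keyword and of the error behaviour described above, using made-up data and illustrative names. The try/catch is only there to capture the exception for inspection.

```julia
using Statistics

v = [1.0, 2.0, 3.0, 4.0, 5.0]          # already sorted, hypothetical data
q = quantile!(v, 0.25; sorted=true)    # skips the partial sort

# NaN entries are rejected with an ArgumentError.
err = try
    quantile!([1.0, NaN, 3.0], 0.5)
    nothing
catch e
    e
end
```

With five sorted points, p = 0.25 falls exactly on the second element, so q is 2.0.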
- Hyndman, R.J. and Fan, Y. (1996) "Sample Quantiles in Statistical Packages", The American Statistician, Vol. 50, No. 4, pp. 361-365
Examples
julia> x = [3, 2, 1];

julia> quantile!(x, 0.5)
2.0

julia> x
3-element Array{Int64,1}:
 1
 2
 3

julia> y = zeros(3);

julia> quantile!(y, x, [0.1, 0.5, 0.9]) === y
true

julia> y
3-element Array{Float64,1}:
 1.2
 2.0
 2.8
Statistics.quantile
Function
quantile(itr, p; sorted=false)
Compute the quantile(s) of a collection itr at a specified probability or vector or tuple of probabilities p on the interval [0,1]. The keyword argument sorted indicates whether itr can be assumed to be sorted.
Quantiles are computed via linear interpolation between the points ((k-1)/(n-1), v[k]), for k = 1:n, where v holds the sorted elements of itr and n = length(itr). This corresponds to Definition 7 of Hyndman and Fan (1996), and is the same as the R default.
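The interpolation rule can be written out directly. The helper interp_quantile below is hypothetical (it is not part of Statistics) and is only a sketch of the rule for a single probability on an already sorted vector; it is compared against quantile on made-up data.

```julia
using Statistics

# Hypothetical helper: quantile by linear interpolation between the points
# ((k-1)/(n-1), v[k]) of an already sorted vector v (Hyndman-Fan Definition 7).
function interp_quantile(v::AbstractVector, p::Real)
    n = length(v)
    h = (n - 1) * p + 1          # fractional 1-based position in v
    lo = floor(Int, h)
    hi = min(lo + 1, n)          # clamp so that p = 1 stays in bounds
    return v[lo] + (h - lo) * (v[hi] - v[lo])
end

v = sort([7.0, 1.0, 3.0, 10.0])   # hypothetical data: [1.0, 3.0, 7.0, 10.0]
q_manual = interp_quantile(v, 0.25)
q_builtin = quantile(v, 0.25)
```

For p = 0.25 the position is h = 1.75, so the result lies three quarters of the way from v[1] = 1.0 to v[2] = 3.0, which is 2.5.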
An ArgumentError is thrown if itr contains NaN or missing values. Use the skipmissing function to omit missing entries and compute the quantiles of non-missing values.
- Hyndman, R.J. and Fan, Y. (1996) "Sample Quantiles in Statistical Packages", The American Statistician, Vol. 50, No. 4, pp. 361-365
Examples
julia> quantile(0:20, 0.5)
10.0

julia> quantile(0:20, [0.1, 0.5, 0.9])
3-element Array{Float64,1}:
  2.0
 10.0
 18.0

julia> quantile(skipmissing([1, 10, missing]), 0.5)
5.5
© 2009–2019 Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and other contributors
Licensed under the MIT License.
https://docs.julialang.org/en/v1.1.1/stdlib/Statistics/