Ref:
We’ll show egs from the package distr
that is capable of creating combinations of known distributions.
library(distr)
Let’s start by creating simple distributions and plots:
N1 <- Norm()
plot(N1)
B1 <- Binom(prob=0.25, size=2)
plot(B1)
## NULL
Bt1 <- Beta(3,9)
plot(Bt1, to.draw.arg="d") # draw just the pdf
U1 <- Unif(Min=-1,Max=1)
plot(U1, to.draw.arg="p") # draw just the cdf
E1 <- Exp(rate=2)+2
plot(E1, to.draw.arg=c("d","q")) # draw the pdf and the quantile
# produce a discrete distribution with support (1,5,7,21) with corresponding probabilities (.1,.1,.6,.2)
DD <- DiscreteDistribution(supp = c(1,5,7,21), prob = c(0.1,0.1,0.6,0.2))
plot(DD, panel.first=grid(), lty=1:4, lwd=c(1,2,4), col.vert="gold", col.hor = "blue",
col.points=c("red","black"), cex.points=1, pch.u=5, pch.a=19, vertical=T)
## $lty
## 1:4
##
## $lwd
## c(1, 2, 4)
plot(DD, do.points=FALSE, vertical=FALSE)
## NULL
plot(Nbinom(size=4, prob=0.3), cex.points=1.2, pch.u=20, pch.a=10)
## NULL
plot(Chisq(), log="xy", ngrid=100)
These objects can be acessed with the typical R’s d
, p
, q
and r
:
d(N1)(0) # density, pdf_AC(0)
## [1] 0.3989423
d(N1)(1)
## [1] 0.2419707
p(N1)(0.6745) # p(AC <= 0.6745)
## [1] 0.7500033
distr::q(N1)(0.75) # 75% quantile
## [1] 0.6744898
r(N1)(20) # generate 20 random numbers from the distribution
## [1] -0.153241771 1.529221179 -0.001043751 0.242755462 -1.461567423
## [6] 1.433645342 0.429453555 -1.023760221 -1.813338230 0.067470653
## [11] 0.367217323 0.635847502 2.135205878 -0.875030150 0.150371841
## [16] 0.917199284 0.268455735 0.299268740 -0.810842487 0.814442814
par(mfcol=c(1,1))
hist(r(N1)(5e3), breaks=50, prob=T)
It’s also possible to define distribution with a given pdf function:
D1 <- AbscontDistribution(d = function (x) exp(-abs(x/2)^3 ), withStand = TRUE)
plot(D1)
D2 <- AbscontDistribution(q = function (x) x^2, withStand = TRUE)
plot(D2)
There are available quite general arithmetical operations to distribution objects, generating new image distributions automatically. Arithmetics on distribution objects are understood as operations on corresponding random variables (r.v.’s) and not on distribution functions or densities. E.g.
\[\mathcal{N}(0,1) + 3 * \mathcal{N}(0,1) + 2\]
returns a distribution object representing the distribution of the r.v. \(X+3*Y+2\) where \(X\) and \(Y\) are r.v.’s i.i.d. \(\mathcal{N}(0,1)\).
N2 <- Norm() + 3*Norm() + 2
plot(N2)
Binary operators like “+”, “-” would loose their elegant calling e1 + e2 if they had to be called with an extra argument controlling their accuracy. Therefore, this accuracy is controlled by global options. These options are inspected and set by distroptions(), getdistrOption(), see ?distroptions.
Special attention has to be paid to arithmetic expressions of distributions involving multiple instances of the same symbol: All arising instances of distribution objects in arithmetic expressions are assumed stochastically independent. As a consequence, whenever in an expression, the same symbol for an object occurs more than once, every instance means a new independent distribution. So for a distribution object \(X\), the expressions \(X+X\) and \(2*X\) are not equivalent.
N2 <- Norm(mean=2, sd=1)
plot(N2+N2)
plot(2*N2)
The first means the convolution of distribution \(X\) with distribution \(X\), i.e. the distribution of the r.v. \(X1 + X2\), where \(X1\) and \(X2\) are identically distributed according to X. In contrast to this, the second expression means the distribution of the r.v. \(2X1 = X1 + X1\), where again \(X1\) is distributed according to \(X\). Hence always use \(2*X\), when you want to realize the second case. Similar caution is due for \(X^2\) and \(X*X\) and so on.
plot(N2*N2)
plot(N2^2)
A classic approximation of a normal with 12 uniforms:
N <- Norm(0,1)
U <- Unif(0,1)
U4 <- U+U+U+U
U12 <- U4+U4+U4
NormApprox <- U12-6
xs <- seq(-4,4,len=101)
plot(xs, d(N)(xs), type="l", lwd=6)
lines(xs, d(NormApprox)(xs), type="l", lwd=2, col="red")
legend("topleft", legend=c("Normal(0,1)", "Approximation"), fill=c("black","red"))
Some more egs:
D3 <- 1 / (Unif() + 0.3)
plot(D3)
D4 <- Norm() ^ (Binom(5,.2)+1)
plot(D4, xlim=c(-3,3))
D5 <- (Binom(5,.2)+1) ^ Norm()
plot(D5)
## NULL
At several instances (in particular for non-monotone functions from group Math like sin(), cos()) new distributions are generated by means of RtoDPQ
, RtoDPQ.d
, RtoDPQ.LC
. In these functions, slots d
, p
, q
are filled by simulating a large number of random variables, hence they are stochastic estimates. So don’t be surprised if they will change from call to call.
The next code sums a normal distribution with a log-normal by applying the convolution of their individual distributions:
\[f_{X+Y}(x) = \int_{-\infty}^{\infty} f_X(x) \times f_Y(z-x)~dz\]
f.X <- function(x) dnorm(x,1,0.5) # normal (mu=1.5, sigma=0.5)
f.Y <- function(y) dlnorm(y,1.5, 0.75) # log-normal (mu=1.5, sigma=0.75)
# convolution integral
f.Z <- function(z) integrate(function(x,z) f.Y(z-x)*f.X(x),-Inf,Inf,z)$value
f.Z <- Vectorize(f.Z) # need to vectorize the resulting fn.
set.seed(1) # for reproducible example
X <- rnorm(1000,1,0.5)
Y <- rlnorm(1000,1.5,0.75)
Z <- X + Y
# compare the methods
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")
It’s also possible to mix distributions:
M1 <- UnivarMixingDistribution(Norm(3,.25), Norm(1,.5))
plot(M1)
M2 <- UnivarMixingDistribution(Norm(5,.4), M1)
plot(M2)
M3 <- UnivarMixingDistribution(Binom(3,.3), Dirac(2), Norm(), mixCoeff=c(1/4,1/5,11/20))
plot(M3)
## NULL
We can truncate the distribution over a range:
T1 <- Truncate(Norm(), lower=-1, upper=2)
plot(T1)
T2 <- Truncate(Exp()+2, lower=2, upper=3)
plot(T2)
It’s possible to define the distribution that is minimum of maximum of two other distributions:
M1 <- Minimum(Norm(), Unif())
plot(M1)
M2 <- Maximum(Norm(), Unif(Min=-2,Max=2))
plot(M2)
plot(Minimum(M1,M2))
M3 <- Minimum(Norm(2,2), Pois(3))
plot(M3)
## NULL
The package is able to simulate events far into the tail:
F <- Truncate(Norm(), 20, 22)
r(F)(10)
## [1] 20.00818 20.23106 20.00112 20.01749 20.05563 20.05762 20.02661
## [8] 20.01330 20.10415 20.00146
hist(r(F)(1e4), breaks=40, prob=T)
p(Norm())(20, lower.tail=FALSE) # p(N>=20)
## [1] 2.753624e-89
library(distrEx)
id <- function(x) x
sq <- function(x) x^2
D <- Norm(0,1)
E(D, id) # Expectation
## [1] 0
E(D, sq) - E(D,id)^2 # Variance
## [1] 0.9999942
E(D, sq) # E(X^2), X ~ N(0,1)
## [1] 0.9999942
E(D, function(x) x^3) # E(X^3)
## [1] 0
E(D, function(x) x^4) # E(X^4)
## [1] 2.999831
E(D, function(x) abs(x)) # E_|x|[X], X ~ N(0,1)
## [1] 0.7978835