Creating New Distributions

Distributions with generic pdfs
Arithmetical Expressions
- Summing distributions by convolution
Mix Distributions
Truncation
Minimum and Maximum
Simulation of far out terms
Computing Expectations

Ref:

We’ll show egs from the package distr that is capable of creating combinations of known distributions.

library(distr)

Let’s start by creating simple distributions and plots:

N1 <- Norm()
plot(N1)

B1 <- Binom(prob=0.25, size=2)
plot(B1)

## NULL

Bt1 <- Beta(3,9)
plot(Bt1, to.draw.arg="d") # draw just the pdf

U1 <- Unif(Min=-1,Max=1)
plot(U1, to.draw.arg="p") # draw just the cdf

E1 <- Exp(rate=2)+2
plot(E1, to.draw.arg=c("d","q")) # draw the pdf and the quantile

# produce a discrete distribution with support (1,5,7,21) with corresponding probabilities (.1,.1,.6,.2)
DD <- DiscreteDistribution(supp = c(1,5,7,21), prob = c(0.1,0.1,0.6,0.2))
plot(DD, panel.first=grid(), lty=1:4, lwd=c(1,2,4), col.vert="gold", col.hor = "blue", 
     col.points=c("red","black"), cex.points=1, pch.u=5, pch.a=19, vertical=T)

## $lty
## 1:4
## 
## $lwd
## c(1, 2, 4)

plot(DD, do.points=FALSE, vertical=FALSE)

## NULL

plot(Nbinom(size=4, prob=0.3), cex.points=1.2, pch.u=20, pch.a=10)

## NULL

plot(Chisq(), log="xy", ngrid=100)

These objects can be acessed with the typical R’s d, p, q and r:

d(N1)(0)           # density, pdf_AC(0)
## [1] 0.3989423
d(N1)(1)
## [1] 0.2419707
p(N1)(0.6745)      # p(AC <= 0.6745)
## [1] 0.7500033
distr::q(N1)(0.75) # 75% quantile
## [1] 0.6744898
r(N1)(20)          # generate 20 random numbers from the distribution
##  [1] -0.153241771  1.529221179 -0.001043751  0.242755462 -1.461567423
##  [6]  1.433645342  0.429453555 -1.023760221 -1.813338230  0.067470653
## [11]  0.367217323  0.635847502  2.135205878 -0.875030150  0.150371841
## [16]  0.917199284  0.268455735  0.299268740 -0.810842487  0.814442814
par(mfcol=c(1,1))
hist(r(N1)(5e3), breaks=50, prob=T)

Distributions with generic pdfs

It’s also possible to define distribution with a given pdf function:

D1 <- AbscontDistribution(d = function (x) exp(-abs(x/2)^3 ), withStand = TRUE)
plot(D1)

D2 <- AbscontDistribution(q = function (x) x^2, withStand = TRUE)
plot(D2)

Arithmetical Expressions

There are available quite general arithmetical operations to distribution objects, generating new image distributions automatically. Arithmetics on distribution objects are understood as operations on corresponding random variables (r.v.’s) and not on distribution functions or densities. E.g.

\[\mathcal{N}(0,1) + 3 * \mathcal{N}(0,1) + 2\]

returns a distribution object representing the distribution of the r.v. \(X+3*Y+2\) where \(X\) and \(Y\) are r.v.’s i.i.d. \(\mathcal{N}(0,1)\).

N2 <- Norm() + 3*Norm() + 2
plot(N2)

Binary operators like “+”, “-” would loose their elegant calling e1 + e2 if they had to be called with an extra argument controlling their accuracy. Therefore, this accuracy is controlled by global options. These options are inspected and set by distroptions(), getdistrOption(), see ?distroptions.

Special attention has to be paid to arithmetic expressions of distributions involving multiple instances of the same symbol: All arising instances of distribution objects in arithmetic expressions are assumed stochastically independent. As a consequence, whenever in an expression, the same symbol for an object occurs more than once, every instance means a new independent distribution. So for a distribution object \(X\), the expressions \(X+X\) and \(2*X\) are not equivalent.

N2 <- Norm(mean=2, sd=1)
plot(N2+N2)

plot(2*N2)

The first means the convolution of distribution \(X\) with distribution \(X\), i.e. the distribution of the r.v. \(X1 + X2\), where \(X1\) and \(X2\) are identically distributed according to X. In contrast to this, the second expression means the distribution of the r.v. \(2X1 = X1 + X1\), where again \(X1\) is distributed according to \(X\). Hence always use \(2*X\), when you want to realize the second case. Similar caution is due for \(X^2\) and \(X*X\) and so on.

plot(N2*N2)

plot(N2^2)

A classic approximation of a normal with 12 uniforms:

N   <- Norm(0,1)
U   <- Unif(0,1)
U4  <- U+U+U+U
U12 <- U4+U4+U4
NormApprox <- U12-6

xs <- seq(-4,4,len=101)
plot(xs, d(N)(xs), type="l", lwd=6)
lines(xs, d(NormApprox)(xs), type="l", lwd=2, col="red")
legend("topleft", legend=c("Normal(0,1)", "Approximation"), fill=c("black","red"))

Some more egs:

D3 <- 1 / (Unif() + 0.3)
plot(D3)

D4 <- Norm() ^ (Binom(5,.2)+1)
plot(D4, xlim=c(-3,3))

D5 <- (Binom(5,.2)+1) ^ Norm()
plot(D5)

## NULL

At several instances (in particular for non-monotone functions from group Math like sin(), cos()) new distributions are generated by means of RtoDPQ, RtoDPQ.d, RtoDPQ.LC. In these functions, slots d, p, q are filled by simulating a large number of random variables, hence they are stochastic estimates. So don’t be surprised if they will change from call to call.

Summing distributions by convolution

The next code sums a normal distribution with a log-normal by applying the convolution of their individual distributions:

\[f_{X+Y}(x) = \int_{-\infty}^{\infty} f_X(x) \times f_Y(z-x)~dz\]

f.X <- function(x) dnorm(x,1,0.5)        # normal (mu=1.5, sigma=0.5)
f.Y <- function(y) dlnorm(y,1.5, 0.75)   # log-normal (mu=1.5, sigma=0.75)
# convolution integral
f.Z <- function(z) integrate(function(x,z) f.Y(z-x)*f.X(x),-Inf,Inf,z)$value
f.Z <- Vectorize(f.Z)                    # need to vectorize the resulting fn.

set.seed(1)                              # for reproducible example
X <- rnorm(1000,1,0.5)
Y <- rlnorm(1000,1.5,0.75)
Z <- X + Y
# compare the methods
hist(Z,freq=F,breaks=50, xlim=c(0,30))
z <- seq(0,50,0.01)
lines(z,f.Z(z),lty=2,col="red")

Mix Distributions

It’s also possible to mix distributions:

M1  <- UnivarMixingDistribution(Norm(3,.25), Norm(1,.5))
plot(M1)

M2 <- UnivarMixingDistribution(Norm(5,.4), M1)
plot(M2)

M3  <- UnivarMixingDistribution(Binom(3,.3), Dirac(2), Norm(),  mixCoeff=c(1/4,1/5,11/20))
plot(M3)

## NULL

Truncation

We can truncate the distribution over a range:

T1 <- Truncate(Norm(), lower=-1, upper=2)
plot(T1)

T2 <- Truncate(Exp()+2, lower=2, upper=3)
plot(T2)

Minimum and Maximum

It’s possible to define the distribution that is minimum of maximum of two other distributions:

M1 <- Minimum(Norm(), Unif())
plot(M1)

M2 <- Maximum(Norm(), Unif(Min=-2,Max=2))
plot(M2)

plot(Minimum(M1,M2))

M3 <- Minimum(Norm(2,2), Pois(3))
plot(M3)

## NULL

Simulation of far out terms

The package is able to simulate events far into the tail:

F <- Truncate(Norm(), 20, 22)
r(F)(10)

##  [1] 20.00818 20.23106 20.00112 20.01749 20.05563 20.05762 20.02661
##  [8] 20.01330 20.10415 20.00146

hist(r(F)(1e4), breaks=40, prob=T)

p(Norm())(20, lower.tail=FALSE) # p(N>=20)

## [1] 2.753624e-89

Computing Expectations

library(distrEx)

id <- function(x) x
sq <- function(x) x^2

D <- Norm(0,1)

E(D, id)                # Expectation
## [1] 0
E(D, sq) - E(D,id)^2    # Variance
## [1] 0.9999942
E(D, sq)                # E(X^2), X ~ N(0,1)
## [1] 0.9999942
E(D, function(x) x^3)   # E(X^3)
## [1] 0
E(D, function(x) x^4)   # E(X^4)
## [1] 2.999831

E(D, function(x) abs(x)) # E_|x|[X], X ~ N(0,1)
## [1] 0.7978835