Say we have \(n\) iid uniform rvs
\[X_i \sim U(0,1), i=1 \ldots n\]
The cdf of their minimum \(Y=\min(X_1,\ldots, X_n)\) is:
\[ \begin{array}{lclr} p(Y \leq x) & = & 1 - p(Y \geq x) & \\ & = & 1- \prod_{i=1}^n p(X_i \geq x) & \color{blue}{ Y = \min(X_i) } \\ & = & 1- p(X \geq x)^n & \color{blue}{ X_i~\text{iid}~ X \sim U(0,1) } \\ & = & 1- (1-p(X \leq x))^n & \\ & = & 1- (1-x)^n & \color{blue}{ p(X \leq x) = x } \\ \end{array} \]
Thus the pdf for \(Y\) is
\[f_Y(x) = \frac{d}{dx} p(Y \leq x) = n(1-x)^{n-1}\]
We can run a simulation to confirm this result:
n <- 10
pdf.min <- function(x) { # pdf function for the minimum
n*(1-x)^(n-1)
}
sample.min <- function() { # minimum of sample with n U(0,1) rvs
min(runif(n))
}
sim.min <- replicate(1e5, sample.min()) # simulation
hist(sim.min, breaks=50, prob=T, main="pdf of Y")
curve(pdf.min, 0, 1, col="red", lwd=2, add=T)
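As an extra numerical check (a sketch, not part of the derivation above), the cdf \(1-(1-x)^n\) can be compared with the empirical cdf of freshly simulated minima, and the simulated mean with \(E[Y]=1/(n+1)\), which follows from integrating \(x \, n(1-x)^{n-1}\) over \([0,1]\). The seed and the names `cdf.min`, `sims`, `xs` and `max.gap` are choices made here for illustration.

```r
set.seed(1)                            # arbitrary seed, only for reproducibility
n <- 10
cdf.min <- function(x) 1 - (1-x)^n     # cdf derived above
sims <- replicate(1e5, min(runif(n)))  # fresh simulation of the minimum
xs <- seq(0, 1, by=0.01)               # grid on which the two cdfs are compared
max.gap <- max(abs(ecdf(sims)(xs) - cdf.min(xs)))  # largest discrepancy on the grid
max.gap
c(simulated=mean(sims), exact=1/(n+1)) # compare the means
```

A small `max.gap` and two close means indicate that the simulation and the derived cdf agree.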
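The maximum \(Z = \max(X_1,\ldots, X_n)\) has a similar derivation, since \(Z \leq x\) exactly when every \(X_i \leq x\):
\[ \begin{array}{lclr} p(Z \leq x) & = & \prod_{i=1}^n p(X_i \leq x) & \color{blue}{ Z = \max(X_i) } \\ & = & p(X \leq x)^n & \color{blue}{ X_i~\text{iid}~ X \sim U(0,1) } \\ & = & x^n & \color{blue}{ p(X \leq x) = x } \\ \end{array} \]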
so, the pdf of \(Z\) is
\[f_Z(x) = nx^{n-1}\]
Again, a simulation confirms the result:
n <- 10
pdf.max <- function(x) { # pdf function for the maximum
n*x^(n-1)
}
sample.max <- function() { # maximum of sample with n U(0,1) rvs
max(runif(n))
}
sim.max <- replicate(1e5, sample.max()) # simulation
hist(sim.max, breaks=50, prob=T, main="pdf of Z")
curve(pdf.max, 0, 1, col="red", lwd=2, add=T)
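Both densities are special cases of the beta distribution: \(n(1-x)^{n-1}\) is the \(\text{Beta}(1,n)\) pdf and \(nx^{n-1}\) is the \(\text{Beta}(n,1)\) pdf. A short check against R's `dbeta`, on a grid `xs` chosen here for illustration (the names `min.ok` and `max.ok` are also introduced here):

```r
n <- 10
xs <- seq(0.01, 0.99, by=0.01)                     # interior grid points
pdf.min <- function(x) n*(1-x)^(n-1)               # pdf of the minimum
pdf.max <- function(x) n*x^(n-1)                   # pdf of the maximum
min.ok <- all.equal(pdf.min(xs), dbeta(xs, 1, n))  # Y ~ Beta(1, n)
max.ok <- all.equal(pdf.max(xs), dbeta(xs, n, 1))  # Z ~ Beta(n, 1)
c(min.ok, max.ok)
```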
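The distribution of the range \(R=Z-Y\) of these \(n\) values can be approximated by simulation. The minimum and maximum must come from the same sample, so we simulate the range directly instead of subtracting the two independent simulations above:
sample.range <- function() { # range of one sample with n U(0,1) rvs
diff(range(runif(n)))
}
sim.range <- replicate(1e5, sample.range()) # simulation
hist(sim.range, breaks=50, prob=T, main="approximate pdf of R=Z-Y")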
which resembles a beta distribution. But is it? Notice that the pdf of \(R\) cannot be obtained from the pdfs of \(Z\) and \(Y\) alone, because the minimum and maximum of the same sample are not independent. To compute \(R\)’s cdf, we condition on the minimum taking the value \(x\) and ask for the range to be at most \(d\).
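The dependence between \(Y\) and \(Z\) can be seen numerically. The sketch below records the minimum and maximum of the same sample each time; the seed and the names `sims` and `rho` are illustrative choices. For uniform samples the correlation between minimum and maximum is \(1/n\), so it is small but not zero.

```r
set.seed(1)                               # arbitrary seed, only for reproducibility
n <- 10
sims <- replicate(1e5, range(runif(n)))   # 2 x 1e5 matrix: row 1 = min, row 2 = max
rho <- cor(sims[1,], sims[2,])            # sample correlation of Y and Z
c(simulated=rho, theory=1/n)
```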
There are two mutually exclusive events:
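\(x<1-d\), so that the range is contained in \([x,x+d]\). Two things must happen: the minimum is \(Y=x\), and all the remaining \(n-1\) points fall within \([x,x+d]\). Given that the minimum is \(x\), each remaining point is uniform on \([x,1]\), so it falls in that interval with probability \(d/(1-x)\). Let’s call \(W\) the event that all \(n-1\) of them do, so \(p(W) = (d/(1-x))^{n-1}\).
\(x \geq 1-d\), so that the remaining points lie in \([x,1]\), which is no longer than \(d\). That is, whenever the minimum \(Y \geq 1-d\), all \(n\) points are automatically within a range \(d\); this happens with probability \(p(Y \geq 1-d) = d^n\).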
\[ \begin{array}{lclr} p(R \leq d) & = & \int_0^{1-d} f_Y(x) p(W) dx + p(Y \geq 1-d) & \\ & = & \int_0^{1-d} n(1-x)^{n-1} \left( \frac{d}{1-x} \right) ^{n-1} dx + d^n & \\ & = & \int_0^{1-d} n d^{n-1} dx + d^n & \\ & = & n d^{n-1} (1-d) + d^n & \\ \end{array} \]
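The closed form can be checked against a direct numerical evaluation of the first line of the derivation. The sketch below uses R's `integrate`; the helper names `cdf.range` and `cdf.range.num` and the test points `ds` are introduced here for illustration.

```r
n <- 10
cdf.range <- function(d) n*d^(n-1)*(1-d) + d^n   # closed form derived above
cdf.range.num <- function(d) {                   # integral term plus p(Y >= 1-d)
  integrand <- function(x) n*(1-x)^(n-1) * (d/(1-x))^(n-1)
  integrate(integrand, 0, 1-d)$value + d^n
}
ds <- c(0.1, 0.5, 0.9)                           # a few values of the range d
rbind(closed=cdf.range(ds), numeric=sapply(ds, cdf.range.num))
```

The two rows should agree up to numerical error, since the integrand simplifies to the constant \(n d^{n-1}\).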
To find the pdf:
\[f_R(x) = \frac{d}{dx} \left[ n x^{n-1} (1-x) + x^n \right] = n (n-1) x^{n-2} (1-x)\]
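Spelling out the differentiation with the product rule, the two \(n x^{n-1}\) terms cancel:
\[ \begin{array}{lcl} \frac{d}{dx} \left[ n x^{n-1}(1-x) + x^n \right] & = & n(n-1)x^{n-2}(1-x) - n x^{n-1} + n x^{n-1} \\ & = & n(n-1)x^{n-2}(1-x) \\ \end{array} \]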
We see that \(R \sim \text{Beta}(n-1,2)\), since the \(\text{Beta}(a,b)\) pdf is \(x^{a-1}(1-x)^{b-1}/B(a,b)\) and \(1/B(n-1,2) = n(n-1)\):
pdf.range <- function(x) {
(1-x)*x^(n-2)*(n-1)*n
}
pdf.beta <- function(x) dbeta(x,n-1,2)
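# the minimum and maximum must come from the same sample, so simulate the range directly
hist(replicate(1e5, diff(range(runif(n)))), breaks=50, prob=T, main="pdf of R=Z-Y")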
curve(pdf.range, 0, 1, col="blue", lwd=6, add=T)
curve(pdf.beta, 0, 1, col="red", lwd=2, add=T)
The probability that a sample range is at least a value \(c\) is \(p(R \geq c)\):
\[p(R \geq c) = \int_c^1 n(n-1)x^{n-2}(1-x) dx = 1 - c^{n-1} (n-c(n-1))\]
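The closed form follows directly from the cdf of \(R\) obtained earlier, \(p(R \leq d) = n d^{n-1}(1-d) + d^n\):
\[ \begin{array}{lcl} p(R \geq c) & = & 1 - p(R \leq c) \\ & = & 1 - \left( n c^{n-1}(1-c) + c^n \right) \\ & = & 1 - c^{n-1} \left( n(1-c) + c \right) \\ & = & 1 - c^{n-1} \left( n - c(n-1) \right) \\ \end{array} \]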
We can now ask what the minimum \(n\) should be so that, with probability greater than \(0.5\), the sample range covers at least \(90\%\) of the total range, ie, \(c=0.9\).
f <- function(n,c) { # p(R >= c) for a sample of size n
1 - c^(n-1)*(n-c*(n-1))
}
ns <- 1:60
plot(ns,f(ns,.9), type="l", col="blue")
n <- ns[which(f(ns,.9)>0.5)[1]] # smallest n with p(R >= 0.9) > 0.5
abline(v=n, lty=2, col="red")
We need at least \(n=17\) samples.
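Since \(R \sim \text{Beta}(n-1,2)\), the same answer can be cross-checked with R's `pbeta` instead of the closed form. A minimal sketch; the names `p.range`, `closed`, `agree` and `n.min` are introduced here for illustration.

```r
p.range <- function(n, c) pbeta(c, n-1, 2, lower.tail=FALSE)  # p(R >= c) from the beta cdf
closed  <- function(n, c) 1 - c^(n-1)*(n - c*(n-1))           # closed form from above
ns <- 2:60                                                    # Beta(n-1, 2) needs n >= 2
agree <- all.equal(p.range(ns, 0.9), closed(ns, 0.9))         # both routes should match
n.min <- ns[which(p.range(ns, 0.9) > 0.5)[1]]                 # smallest n exceeding 0.5
c(p16=p.range(16, 0.9), p17=p.range(17, 0.9), n.min=n.min)
```

The probabilities for \(n=16\) and \(n=17\) fall on either side of \(0.5\), which is why the threshold is crossed at \(n=17\).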