Skip to contents

Monte Carlo standard errors and confidence intervals

In plain Monte Carlo estimation of an expectation value, we can usually construct approximate confidence intervals by estimating the standard deviation of the estimator and construct an interval based on a Normal distribution for the estimator. However, this breaks down in some cases. For example, when using a randomisation test to estimate a p-value, the Normal approximation only works if the p-value pp or the number of samples NN are large enough, so that we actually observe non-zero counts. Fortunately, if we observe zero counts, we can construct a confidence interval using an exact method instead of relying on the Normal approximation.

Monte Carlo randomisation test variability

Let pp be the unknown p-value, and XβˆΌπ–‘π—‚π—‡(N,p)X\sim\mathsf{Bin}(N,p) be the random variable for how many times we observe a randomised test statistic as extreme as or more extreme than the observed test statistic. We observe X=xX=x and estimate the p-value with pΜ‚=x/N\widehat{p}=x/N. Then 𝖀(X)=Np,𝖡𝖺𝗋(X)=Np(1βˆ’p),𝖀(pΜ‚)=p,𝖡𝖺𝗋(pΜ‚)=p(1βˆ’p)N. \begin{aligned} \mathsf{E}(X) &= N p, \\ \mathsf{Var}(X) &= N p (1 - p), \\ \mathsf{E}(\widehat{p}) &= p, \\ \mathsf{Var}(\widehat{p}) &= \frac{p(1-p)}{N} . \end{aligned} We see that the variance decreases towards 00 when pβ†’0p\rightarrow 0. We can control the expectation of the absolute error. Due to Jensen’s inequality (you may or may not have heard of that!) 𝖀(|pΜ‚βˆ’p|)≀𝖀[|pΜ‚βˆ’p|2]=𝖡𝖺𝗋(pΜ‚)≀12N\mathsf{E}(|\widehat{p}-p|)\leq\sqrt{\mathsf{E}[|\widehat{p}-p|^2]}=\sqrt{\mathsf{Var}(\widehat{p})}\leq\frac{1}{2\sqrt{N}}, where the last step uses that the variance is maximised for p=1/2p=1/2. Thus, to guarantee 𝖀(|pΜ‚βˆ’p|)≀ϡ\mathsf{E}(|\widehat{p}-p|)\leq \epsilon for some Ο΅>0\epsilon>0, we can choose Nβ‰₯14Ο΅2N \geq \frac{1}{4 \epsilon^2}.

The relative error has variance 𝖡𝖺𝗋[pΜ‚βˆ’pp]=1βˆ’pNp\mathsf{Var}\left[\frac{\widehat{p}-p}{p}\right]=\frac{1-p}{Np} which goes to ∞\infty when pβ†’0p\rightarrow 0, so we cannot control the relative error uniformly for all pp by increasing NN. We therefore need to be careful when assessing results for small values of pp.

Normal approximation

When x>0x>0, a basic approximate 95% confidence interval for pp is given by CIp=pΜ‚Β±z0.975pΜ‚(1βˆ’pΜ‚)N=xNΒ±z0.975x(Nβˆ’x)N3. CI_p = \widehat{p} \pm z_{0.975} \sqrt{\frac{\widehat{p} (1-\widehat{p})}{N}} = \frac{x}{N} \pm z_{0.975} \sqrt{\frac{x (N-x)}{N^3}} . With the approximation z0.975β‰ˆ2z_{0.975}\approx 2, we can limit the interval width with 414N≀ϡ4\sqrt{\frac{1}{4N}}\leq \epsilon by taking Nβ‰₯4Ο΅2N\geq \frac{4}{\epsilon^2}. For Ο΅=0.02\epsilon=0.02, we get Nβ‰₯10,000N\geq 10,000.

Exact interval for x=0x=0

When the observed count is x=0x=0, we can go back to the definition of a confidence interval and how we can construct confidence intervals by β€œinverting a test”; the interval is taken to be the set for which the corresponding null hypothesis is not rejected.

Imagine that we reject the hypothesis H0:p=p0H_0:p=p_0 for some p0p_0 if 𝖯X(X≀x∣p0)<0.025\mathsf{P}_X(X \leq x \mid p_0) < 0.025 or 𝖯X(Xβ‰₯x∣p0)<0.025\mathsf{P}_X(X \geq x \mid p_0) < 0.025 (to nominally give equal tail error probability). When x=0x=0, the second probability is equal to 11, so that condition is never used. The test is therefore only rejected when 𝖯X(X=0∣p0)<0.025\mathsf{P}_X(X = 0 \mid p_0) < 0.025. We solve for p0p_0: 𝖯X(X=0∣p0)=(1βˆ’p0)N<0.0251βˆ’p0<0.0251/Np0>1βˆ’0.0251/N. \begin{aligned} \mathsf{P}_X(X = 0 \mid p_0) = (1-p_0)^N &< 0.025 \\ 1-p_0 < 0.025^{1/N} \\ p_0 > 1 - 0.025^{1/N} . \end{aligned} The set of p0p_0 values for which the test is not rejected is p0≀1βˆ’0.0251/Np_0 \leq 1-0.025^{1/N}, so when x=0x=0 we can define the confidence interval for pp as CIp=(0,1βˆ’0.0251/N). CI_p = (0, 1-0.025^{1/N}) . To limit the width of such confidence intervals to at most some Ο΅\epsilon, we need 1βˆ’0.0251/N≀ϡ1-0.025^{1/N}\leq \epsilon, and Nβ‰₯log(0.025)log(1βˆ’Ο΅)N\geq \frac{\log(0.025)}{\log(1-\epsilon)}. This grows much more slowly than 4Ο΅2\frac{4}{\epsilon^2} when Ο΅β†’0\epsilon\rightarrow 0, so we can safely use the NN that’s required to bound the Normal approximation interval width and still fulfill the interval width criterion for the x=0x=0 interval construction.

Bayesian credible interval construction

An alternative approach is to construct a Bayesian credible interval for pp. Let pβˆΌπ–΄π—‡π—‚π–Ώ(0,1)p\sim\mathsf{Unif}(0,1) a priori. The posterior density for pp given xx is then proportional to 𝖯(X=x|p)=(Nx)px(1βˆ’p)Nβˆ’x, \mathsf{P}(X=x|p) = {N \choose x} p^x(1-p)^{N-x}, which shows that the posterior distribution for pp is 𝖑𝖾𝗍𝖺(1+x,1+Nβˆ’x)\mathsf{Beta}(1+x, 1+N-x), and a credible interval is provided by the 0.0250.025 and 0.9750.975 quantiles. In R, qbeta(c(0.025, 0.975), shape1 = 1 + x, shape2 = 1 + N - x). This construction works for all Nβ‰₯1N\geq 1 and all 0≀x≀N0\leq x \leq N.

Prediction standard deviations

The predict() function can provide prediction standard errors for the linear predictor, with se.fit, but those are only half the story when predicting new data. The standard errors only include the uncertainty information about the linear predictor curve. For full prediction uncertainty, we need to take the observation variation into account, which lm() estimated via the variance of the residuals. Since the residuals for new observations is assumed to be conditionally independent of the predictor curve, the prediction variance can be estimated as the sum of the square of the prediction standard error and the residual variance, if the degrees of freedom is large. For the help text for lm() we see that when se.fit=TRUE, the output list contains the elements

  • fit: vector or matrix (depending on the interval argument)
  • se.fit: standard error of predicted means
  • residual.scale: residual standard deviations
  • df: degrees of freedom for residual
# When creating a tibble, the construct can use variables defined
# first in the later variables; here we use x when constructing y:
df <- tibble(x = rnorm(10),
             y = 2 + x + rnorm(10, sd = 0.1))
# Estimate a model:
fit <- lm(y ~ x, data = df)
# Compute prediction mean and standard deviations and add to df_pred:
df_pred <- data.frame(x = seq(-2, 2, length.out = 100))
pred <- predict(fit, newdata = df_pred, se.fit = TRUE)
df_pred <- df_pred %>%
  mutate(mean = pred$fit,
         se.fit = pred$se.fit,
         sd = sqrt(pred$se.fit^2 + pred$residual.scale^2))
ggplot(df_pred) +
  geom_line(aes(x, se.fit, colour = "se.fit")) +
  geom_line(aes(x, sd, colour = "sd")) +
  ylab("Std. deviations")

If we also ask for prediction intervals, we need to modify the code a bit. From comparing the interval width results from predict() with those from an interval assuming t-distributions, we see that they are identical up to floating point accuracy.

# Compute prediction mean and standard deviations and add to df_pred:
df_pred <- data.frame(x = seq(-2, 2, length.out = 100))
pred <- predict(fit, newdata = df_pred, se.fit = TRUE, interval = "prediction")
df_pred <- df_pred %>%
  mutate(mean = pred$fit[, "fit"],
         lwr = pred$fit[, "lwr"],
         upr = pred$fit[, "upr"],
         se.fit = pred$se.fit,
         sd = sqrt(pred$se.fit^2 + pred$residual.scale^2))
ggplot(df_pred) +
  geom_line(aes(x, upr - lwr - (qt(0.975, pred$df) - qt(0.025, pred$df)) * sd)) +
  ylab("Interval width difference") +
  xlab("x")

Handling non-Gaussian precipitation

While the assessment methods requested in the project description are valid for non-Gaussian predictions, the non-Gaussianity of the precipitation data is still very noticeable on the monthly average scale. An effect of this is that the constant variance assumption of a basic lm() model isn’t a good fit to the data. To improve this, you can take the square root of the monthly averages before applying the modelling in part 2. You may also do all the model and prediction assessment on this square-root version of the data.