Tutorial 04: Uncertainty and integration
Finn Lindgren
Source: vignettes/Tutorial04.Rmd
Introduction
In this lab session you will explore
- using RMarkdown to organise text and code
- maximum likelihood estimator sampling distributions and approximate confidence interval construction
- Laplace approximation and importance sampling for approximate Bayesian credible interval construction
- Clone your lab04-* repository from https://github.com/StatComp21/ either on your own computer (new Project from version control) or to https://rstudio.cloud
- If on rstudio.cloud, set up GITHUB_PAT credentials, like before.
- Upgrade/install the StatCompLab package, see https://finnlindgren.github.io/StatCompLab/
- The repository has two files, RMDemo.Rmd and my_code.R. Make a copy of RMDemo.Rmd, and call it Lab4.Rmd.
- During this lab, modify the Lab4.Rmd document and add new code and text commentary for the lab to the document. (You can remove the demonstration parts of the file when you don’t need them anymore, and/or keep a separate copy of it.) When pressing the “knit” button, the RMarkdown file will be run in its own R environment, so you need to include any needed library() calls in a code chunk in the file, normally an initial “setup” chunk.
Solution:
The accompanying Tutorial04Solutions tutorial/vignette documents contain the solutions explicitly, to make it easier to review the material after the workshops. The separate T4sol.Rmd document at https://github.com/finnlindgren/StatCompLab/blob/main/vignettes/articles/T4sol.Rmd is the source document for the standalone solution shown in T4sol on the StatCompLab website.
Three alternatives for Poisson parameter confidence intervals
Consider the Poisson model for observations \(\boldsymbol{y}=\{y_1,\dots,y_n\}\): \[ \begin{aligned} y_i & \sim \mathsf{Poisson}(\lambda), \quad\text{independent for $i=1,\dots,n$.} \end{aligned} \] that has joint probability mass function \[ p(\boldsymbol{y}|\lambda) = \exp(-n\lambda) \prod_{i=1}^n \frac{\lambda^{y_i}}{y_i!} \] In the week 4 lecture, two parameterisations were considered. We now add a third option:
- \(\theta = \lambda\), and \(\widehat{\theta}_\text{ML}=\frac{1}{n}\sum_{i=1}^n y_i = \overline{y}\)
- \(\theta = \sqrt{\lambda}\), and \(\widehat{\theta}_\text{ML}=\sqrt{\overline{y}}\)
- \(\theta = \log(\lambda)\), and \(\widehat{\theta}_\text{ML}=\log\left(\overline{y}\right)\)
From the week 4 lecture, we know that the inverse expected Fisher information is \(\lambda/n\) for case 1 and \(1/(4n)\) for case 2. For case 3, show that the inverse expected Fisher information is \(1/(n\lambda)\).
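As a numerical sanity check (not a substitute for the derivation), the sketch below approximates the expected negated second derivative of the log-likelihood by finite differences for all three parameterisations. It relies on the fact that, up to a constant not depending on \(\theta\), the Poisson log-likelihood \(-n\lambda + \log(\lambda)\sum_i y_i\) is linear in \(\sum_i y_i\), so its expected second derivative equals the second derivative evaluated at \(\sum_i y_i = n\lambda\). The helper names loglik and second_deriv, and the values n = 10 and lambda = 3, are illustrative choices, not part of the lab material.

```r
# Poisson log-likelihood (up to a constant) as a function of theta,
# where sum_y is the sum of the observations and param selects the case
loglik <- function(theta, sum_y, n, param) {
  lambda <- switch(param,
    "1" = theta,       # theta = lambda
    "2" = theta^2,     # theta = sqrt(lambda)
    "3" = exp(theta)   # theta = log(lambda)
  )
  -n * lambda + sum_y * log(lambda)
}

# Central finite-difference approximation of a second derivative
second_deriv <- function(f, x, h = 1e-4) {
  (f(x + h) - 2 * f(x) + f(x - h)) / h^2
}

n <- 10
lambda <- 3
theta <- c("1" = lambda, "2" = sqrt(lambda), "3" = log(lambda))

# Expected Fisher information = -E[second derivative]; since the
# log-likelihood is linear in sum_y, plug in E(sum_y) = n * lambda
inv_fisher <- sapply(c("1", "2", "3"), function(p) {
  -1 / second_deriv(function(t) loglik(t, n * lambda, n, p), theta[[p]])
})
theory <- c("1" = lambda / n, "2" = 1 / (4 * n), "3" = 1 / (n * lambda))
print(rbind(numerical = inv_fisher, theory = theory))
```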
Interval construction
Use the approximation method for large \(n\) from the lecture to construct approximate confidence intervals for \(\lambda\) using each of the three parameterisations. Define three functions, CI1, CI2, and CI3, each taking parameters
- y: a vector of observed values
- alpha: the nominal error probability of the confidence intervals
To avoid having to specify alpha in a common case, you can use alpha = 0.05 in the function argument definition to set a default value.
The function pmax may be useful (see its help text).
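One possible sketch of the three functions follows. It assumes the standard large-\(n\) construction \(\widehat{\theta}_\text{ML} \pm z_{1-\alpha/2}\sqrt{\text{inverse Fisher information}}\), with \(\lambda\) replaced by \(\overline{y}\) in the information, followed by back-transformation to the \(\lambda\) scale. Using pmax to truncate lower limits at zero is our own choice here, and not the only reasonable way to handle the boundary.

```r
# Approximate CI for lambda, with theta = lambda
CI1 <- function(y, alpha = 0.05) {
  n <- length(y)
  theta_hat <- mean(y)
  z <- qnorm(1 - alpha / 2)
  se <- sqrt(theta_hat / n)
  # lambda cannot be negative, so truncate the lower limit at 0
  pmax(0, theta_hat + c(-1, 1) * z * se)
}

# Approximate CI for lambda, with theta = sqrt(lambda)
CI2 <- function(y, alpha = 0.05) {
  n <- length(y)
  theta_hat <- sqrt(mean(y))
  z <- qnorm(1 - alpha / 2)
  se <- sqrt(1 / (4 * n))
  # Truncate at 0 before squaring; squaring a negative lower limit
  # would otherwise give a spurious positive value
  pmax(0, theta_hat + c(-1, 1) * z * se)^2
}

# Approximate CI for lambda, with theta = log(lambda)
CI3 <- function(y, alpha = 0.05) {
  n <- length(y)
  theta_hat <- log(mean(y))  # -Inf when mean(y) == 0
  z <- qnorm(1 - alpha / 2)
  se <- sqrt(1 / (n * mean(y)))
  exp(theta_hat + c(-1, 1) * z * se)
}
```

With y <- c(0, 2, 2, 2, 4) (the sample printed below), these sketches should reproduce the Lower/Upper values in the table that follows.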
You can use the following code to test your functions, storing each interval as a row of a matrix with rbind (“bind” as “rows”; see also cbind for combining columns):
## [1] 0 2 2 2 4
CI <- rbind(
"Method 1" = CI1(y),
"Method 2" = CI2(y),
"Method 3" = CI3(y)
)
colnames(CI) <- c("Lower", "Upper")
We can print the result as a table in our RMarkdown document by using a separate code chunk that calls the knitr::kable function:
knitr::kable(CI)
| | Lower | Upper |
|---|---|---|
| Method 1 | 0.7604099 | 3.239590 |
| Method 2 | 0.9524829 | 3.431663 |
| Method 3 | 1.0761094 | 3.717094 |
Will all three methods always produce a valid interval? Consider the possible values of \(\overline{y}\).
Experiment with different values of n and lambda in the simulation of y.
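To get a feel for the boundary case \(\overline{y}=0\), note that it occurs exactly when every observation is zero, which for independent \(\mathsf{Poisson}(\lambda)\) data has probability \(\exp(-n\lambda)\). The sketch below compares this with a simulation; the values n = 5 and lambda = 0.2, and the seed, are arbitrary illustrative choices.

```r
set.seed(1)
n <- 5
lambda <- 0.2
N <- 10000

# Simulate N data sets and record the sample mean of each
ybar_sim <- replicate(N, mean(rpois(n, lambda)))

# Proportion of simulated data sets with ybar = 0, versus exp(-n * lambda)
c(simulated = mean(ybar_sim == 0), exact = exp(-n * lambda))
```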
For each approximate confidence interval construction method, we might ask whether it fulfils the definition of an actual confidence interval construction method, that is, whether \(\mathsf{P}_{\boldsymbol{y}|\theta}(\theta\in \text{CI}(\boldsymbol{y})|\theta)\geq 1-\alpha\) for all \(\theta\) (or at least for a relevant subset of the parameter space). In coursework project 1, you will investigate the accuracy of some approximate confidence interval construction methods.
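The coverage probability in this definition can be estimated by simulation: generate many data sets for a fixed \(\lambda\), construct the interval for each, and record the proportion that contain \(\lambda\). The sketch below does this for the \(\theta=\lambda\) interval only, without truncation at zero; the function name coverage_method1 and the chosen values of n, lambda, and N are illustrative assumptions.

```r
# Monte Carlo estimate of P(lambda in CI(y) | lambda) for the
# large-n interval on the lambda scale (parameterisation 1)
coverage_method1 <- function(n, lambda, alpha = 0.05, N = 10000) {
  z <- qnorm(1 - alpha / 2)
  # Each column of Y is one simulated data set of size n
  Y <- matrix(rpois(n * N, lambda), nrow = n, ncol = N)
  ybar <- colMeans(Y)
  lower <- ybar - z * sqrt(ybar / n)
  upper <- ybar + z * sqrt(ybar / n)
  mean(lower <= lambda & lambda <= upper)
}

set.seed(1)
cov_large <- coverage_method1(n = 50, lambda = 5)   # large n * lambda
cov_small <- coverage_method1(n = 5, lambda = 0.2)  # small n * lambda
c(large = cov_large, small = cov_small)
```

For small \(n\lambda\), data sets with \(\overline{y}=0\) give the degenerate interval [0, 0], which cannot contain \(\lambda>0\), so in this sketch the coverage is at most \(1-\exp(-n\lambda)\).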
Bayesian credible intervals
Assume a true value of \(\lambda=10\), and simulate a sample of \(\boldsymbol{y}\) of size \(n=5\).
Now consider a Bayesian version of the Poisson model, with prior model \[ \lambda \sim \mathsf{Exp}(a) \] that has probability density function \(p(\lambda) = a \exp(-a \lambda)\).
One can show that the exact posterior distribution for \(\lambda\) given \(\boldsymbol{y}\) is a \(\mathsf{Gamma}(1 + \sum_{i=1}^n y_i, a + n)\) distribution (using the shape and rate parameterisation), and credible intervals can be constructed from quantiles of this distribution.
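For reference, the exact credible interval can be computed directly from Gamma quantiles with qgamma. A minimal sketch, assuming \(a=1/5\) (the value used for the importance sampling below) and an equal-tailed 95% interval; the seed is arbitrary.

```r
set.seed(1234)
n <- 5
a <- 1 / 5
y <- rpois(n, lambda = 10)

# Exact posterior: Gamma(shape = 1 + sum(y), rate = a + n)
shape <- 1 + sum(y)
rate <- a + n

# Equal-tailed 95% credible interval from the posterior quantiles
CI_exact <- qgamma(c(0.025, 0.975), shape = shape, rate = rate)
CI_exact
```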
In cases where the theoretical construction is impractical, an alternative is to instead construct samples from the posterior distribution, and extract empirical quantiles from this sample. Here, we will use importance sampling to achieve this.
Let \(\theta=\log(\lambda)\), so that \(\lambda=\exp(\theta)\). Show that the prior probability density for \(\theta\) is \(p(\theta)=a \exp\left( \theta-ae^\theta \right)\).
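A quick numerical check of the result, assuming \(a=1/5\): the density should integrate to 1, and \(\mathsf{P}(\theta\leq t_0)\) should equal \(\mathsf{P}(\lambda\leq e^{t_0})\) under the \(\mathsf{Exp}(a)\) prior. The evaluation point t0 = 2 is an arbitrary choice.

```r
a <- 1 / 5

# Prior density for theta = log(lambda), obtained by change of variables:
# p(theta) = p_lambda(exp(theta)) * |d exp(theta) / d theta|
prior_theta <- function(theta) a * exp(theta - a * exp(theta))

# The density should integrate to 1 over the real line
total <- integrate(prior_theta, lower = -Inf, upper = Inf)$value

# P(theta <= t0) should equal P(lambda <= exp(t0)) = pexp(exp(t0), rate = a)
t0 <- 2
lhs <- integrate(prior_theta, lower = -Inf, upper = t0)$value
rhs <- pexp(exp(t0), rate = a)
c(total = total, lhs = lhs, rhs = rhs)
```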
Gaussian approximation
The posterior density function for \(\theta\) is \[ p(\theta|\boldsymbol{y}) = \frac{p(\theta) p(\boldsymbol{y}|\theta)}{p(\boldsymbol{y})} \] with log-density \[ \log p(\theta|\boldsymbol{y}) = \text{const} + \theta (1 + n\overline{y}) - (a+n)\exp(\theta) , \] and by taking derivatives we find the mode at \(\widetilde{\theta}=\log\left(\frac{1+n\overline{y}}{a+n}\right)\), and negated Hessian \(1+n\overline{y}\) at the mode.
With this information we can construct a Gaussian approximation to the posterior distribution, \(\widetilde{p}(\theta|\boldsymbol{y})\), given by the \(\mathsf{Normal}(\widetilde{\theta},\frac{1}{1+n\overline{y}})\) density, where the second parameter is the variance.
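The mode and curvature can be checked numerically, using optimize on the unnormalised log-posterior and a central finite difference for the second derivative. The sketch assumes \(\lambda=10\), \(n=5\), and \(a=1/5\) as in the text; the seed is arbitrary.

```r
set.seed(1234)
n <- 5
a <- 1 / 5
y <- rpois(n, lambda = 10)
ybar <- mean(y)

# Unnormalised log-posterior for theta = log(lambda)
log_post <- function(theta) theta * (1 + n * ybar) - (a + n) * exp(theta)

# Mode: numerical maximisation versus the closed-form expression
opt <- optimize(log_post, interval = c(-10, 10), maximum = TRUE)
theta_mode <- log((1 + n * ybar) / (a + n))

# Negated Hessian at the mode by central finite differences
h <- 1e-4
neg_hess <- -(log_post(theta_mode + h) - 2 * log_post(theta_mode) +
  log_post(theta_mode - h)) / h^2

rbind(
  mode = c(numerical = opt$maximum, formula = theta_mode),
  neg_hessian = c(numerical = neg_hess, formula = 1 + n * ybar)
)
```

The Gaussian approximation then has mean theta_mode and standard deviation 1 / sqrt(1 + n * ybar).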
Importance sampling
Simulate a sample \(\boldsymbol{x}=\{x_1,\dots,x_m\}\) from this Gaussian approximation of the posterior distribution, for some large \(m > 10000\), with hyperparameter \(a=1/5\).
We need to calculate unnormalised importance weights \(w_k\), \(k=1,\dots,m\), \[ w_k = \left.\frac{p(\theta)p(\boldsymbol{y}|\theta)}{\widetilde{p}(\theta|\boldsymbol{y})}\right|_{\theta=x_k} . \] Due to the lack of normalisation, these “raw” weights can be too large or too small to be represented accurately in floating point arithmetic (overflow or underflow). To get around that issue, first compute the logarithm of the weights, \(\log(w_k)\), and then new, equivalent unnormalised weights \(\widetilde{w}_k=\exp[\log(w_k) - \max_j \log(w_j)]\). Since every weight is divided by the same constant \(\max_j w_j\), the normalised weights are unchanged.
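A minimal base-R sketch of these steps follows, assuming \(\lambda=10\), \(n=5\), \(a=1/5\), and \(m=20000\); the seed is arbitrary. As a check, it compares the self-normalised importance sampling estimate of \(\mathsf{E}(\lambda|\boldsymbol{y})\) with the exact Gamma posterior mean \((1+n\overline{y})/(a+n)\), and reports the effective sample size \((\sum_k w_k)^2/\sum_k w_k^2\), a common diagnostic that is not part of the original exercise.

```r
set.seed(1234)
n <- 5
a <- 1 / 5
m <- 20000
y <- rpois(n, lambda = 10)
ybar <- mean(y)

# Gaussian approximation to p(theta | y): mode and inverse negated Hessian
mu <- log((1 + n * ybar) / (a + n))
sigma <- 1 / sqrt(1 + n * ybar)
x <- rnorm(m, mean = mu, sd = sigma)

# log p(theta) for theta = log(lambda) under the Exp(a) prior on lambda
log_prior <- log(a) + x - a * exp(x)
# log p(y | theta): Poisson log-likelihood, vectorised over the samples x
log_lik <- n * ybar * x - n * exp(x) - sum(lfactorial(y))
# log of the Gaussian approximation evaluated at the samples
log_approx <- dnorm(x, mean = mu, sd = sigma, log = TRUE)

# Log-weights, then shift by the maximum so the largest weight is exactly 1
log_w <- log_prior + log_lik - log_approx
w <- exp(log_w - max(log_w))

# Self-normalised estimate of E(lambda | y), and the exact Gamma posterior mean
est <- sum(w * exp(x)) / sum(w)
exact <- (1 + n * ybar) / (a + n)
# Effective sample size diagnostic
ess <- sum(w)^2 / sum(w^2)
c(estimate = est, exact = exact, ess = ess)
```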
Look at the help text for the function wquantile (in the StatCompLab package, from version 0.4.0) that computes quantiles from a weighted sample, and construct a 95% credible interval for \(\theta\) using the \(\boldsymbol{x}\) sample and associated weights. Then transform it into a credible interval for \(\lambda\).
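To see what a weighted quantile does, one simple definition (an assumption here; wquantile may use a different rule, for example with interpolation) is the smallest sample value at which the normalised cumulative weight reaches the target probability. The hypothetical helper weighted_quantile below implements that rule, and the check illustrates why transforming the interval endpoints is valid: because exp is strictly increasing, the weighted quantiles of \(\exp(x_k)\) are exactly exp of the weighted quantiles of \(x_k\).

```r
# Hypothetical helper: inverse of the weighted empirical CDF.
# For each p, returns the smallest x value whose normalised
# cumulative weight is at least p.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x_sorted <- x[ord]
  cum_w <- cumsum(w[ord]) / sum(w)
  cum_w[length(cum_w)] <- 1  # guard against rounding just below 1
  idx <- vapply(probs, function(p) which(cum_w >= p)[1], integer(1))
  x_sorted[idx]
}

# Toy check with integer weights: like the unweighted sample 1, 2, 3, 3
weighted_quantile(c(1, 2, 3), c(1, 1, 2), probs = 0.5)  # 2

set.seed(1)
x <- rnorm(1000)
w <- runif(1000)
q_theta <- weighted_quantile(x, w, probs = c(0.025, 0.975))
q_lambda <- weighted_quantile(exp(x), w, probs = c(0.025, 0.975))
# Quantiles commute with the increasing transformation exp()
all.equal(exp(q_theta), q_lambda)
```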
Cumulative distribution function comparison
With ggplot, use geom_function to plot the theoretical posterior cumulative distribution function for \(\lambda\) (the CDF of the Gamma distribution given above; see pgamma()) and compare it to the approximation given by the importance sampling. Use the stat_ewcdf() function from the StatCompLab package to plot the CDF of the weighted sample \(\lambda_k=\exp(x_k)\), with (unnormalised) weights \(w_k\). Also include the unweighted sample, using stat_ecdf(). How close do the approximations come to the true posterior distribution?
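The idea behind this comparison can be sketched without ggplot on a toy problem where the answer is known. Here (an illustrative assumption, unrelated to the Poisson model) we sample from a Uniform(0, 1) proposal, weight by the Beta(2, 2) density so that the weighted sample targets Beta(2, 2), and measure the largest gap between each ECDF and pbeta on a grid. The helper wecdf is hypothetical and only mimics the weighted ECDF that the text asks you to draw with stat_ewcdf().

```r
# Hypothetical helper: weighted empirical CDF, returned as a function of q
wecdf <- function(x, w) {
  ord <- order(x)
  x_sorted <- x[ord]
  cum_w <- c(0, cumsum(w[ord]) / sum(w))
  # findInterval(q, x_sorted) counts the sorted sample values <= q
  function(q) cum_w[findInterval(q, x_sorted) + 1]
}

set.seed(1)
m <- 50000
u <- runif(m)          # proposal: Uniform(0, 1), density 1
w <- dbeta(u, 2, 2)    # importance weights: target density / proposal density

grid <- seq(0, 1, by = 0.01)
F_weighted <- wecdf(u, w)
F_unweighted <- wecdf(u, rep(1, m))

gaps <- c(
  weighted = max(abs(F_weighted(grid) - pbeta(grid, 2, 2))),
  unweighted = max(abs(F_unweighted(grid) - pbeta(grid, 2, 2)))
)
gaps
```

Replacing u, w, and pbeta by \(\exp(x_k)\), the importance weights, and pgamma gives a numerical counterpart to the requested plot.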