
Introduction

  • In this lab session you will explore Monte Carlo integration and importance sampling.
  • Open your github repository clone project from Lab 2 (either on https://rstudio.cloud or on your own computer) and upgrade the StatCompLab package (see https://finnlindgren.github.io/StatCompLab/ for more details)
  • Save your work for this lab in one or several new files in the project, and commit and push the changes to github (see Lab 2 for more information)
  • In your code script, start with library(StatCompLab) to get access to the arch_loglike function needed for one of the tasks.

Solution:

The accompanying Tutorial03Solutions tutorial document contains the solutions explicitly, to make it easier to review the material after the workshops. You can also run the document in the Tutorials pane in RStudio.


Overdispersed Poisson distribution

We’ll use an overdispersed Poisson distribution as the first example. In an ordinary Poisson distribution $Y\sim\mathsf{Poisson}(\lambda)$ with expectation $\lambda$, the variance is also $\lambda$. The coefficient of variation is defined as the ratio between the standard deviation and the expectation, which here gives $1/\sqrt{\lambda}$. In real applications, one often observes data where the coefficient of variation is larger than the Poisson model would give. One way of dealing with that is to add a latent random effect; a hidden layer of random values that β€œnudges” the expectation for each observation, which increases the observed coefficient of variation. The model can be written in two steps:
$$\begin{aligned} X & \sim \mathsf{Normal}(0,\sigma^2), \\ (Y|X=x) & \sim \mathsf{Poisson}[\exp(\mu + x)], \end{aligned}$$
where the conditional expectation for $Y$ given $X=x$ and $\mu$ is $\lambda(\mu,x)=\exp(\mu+x)$, and $\mu$ is a (fixed) parameter. The marginal expectation (but still conditional on $\mu$ and $\sigma$) can be obtained via the tower property (law of total expectation),
$$\mathsf{E}(Y|\mu,\sigma) = \mathsf{E}[\mathsf{E}(Y|X)] = \mathsf{E}[\lambda(\mu,X)] = \mathsf{E}[\exp(\mu+X)] = \exp(\mu + \sigma^2/2),$$
where the last step comes from the expectation of a log-Normal distribution. For multiple observations, we can write the model as
$$\begin{aligned} x_i & \sim \mathsf{Normal}(0,\sigma^2), \text{ independent for each } i=1,\dots,n,\\ (y_i|x_i) & \sim \mathsf{Poisson}[\exp(\mu + x_i)], \text{ conditionally independent for each } i=1,\dots,n. \end{aligned}$$

Consider the following simulation code:

Y <- rpois(30, lambda = 3 * exp(-2 + rnorm(30, sd = 2)))

What is the expectation of $Y$?
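As a numerical sanity check (a minimal sketch; reading off mu and sigma from the rpois() call is an interpretation of the code above, matched to the model form):

# The call above uses lambda = 3 * exp(-2 + x) = exp(log(3) - 2 + x) with x ~ Normal(0, 2^2),
# i.e. mu = log(3) - 2 and sigma = 2 in the model notation
mu <- log(3) - 2
sigma <- 2
exp(mu + sigma^2 / 2)  # theoretical E(Y) = exp(mu + sigma^2/2) = 3
mean(rpois(1e6, lambda = exp(mu + rnorm(1e6, sd = sigma))))  # should be close to 3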

  1. Theory exercise: Using the tower property (law of total variance), theoretically derive an expression for $\mathsf{Var}(Y)$. You can get the needed log-Normal variance expression from https://en.wikipedia.org/wiki/Log-normal_distribution
  2. Monte Carlo integration: Use Monte Carlo integration to approximate the probability mass function
$$p_Y(m|\mu,\sigma)=\mathsf{P}(Y=m|\mu,\sigma) = \int_{-\infty}^\infty \mathsf{P}(Y=m|\mu,\sigma,X=x)\, p_X(x|\sigma) \,\mathrm{d}x$$
for $m=0,1,2,3,\dots,15$, with $\mu=\log(2)-1/2$ and $\sigma=1$. Check with the formulas above what the resulting theoretical expectation of $Y$ is for this parameter combination.

For this to be efficient, you should vectorise the calculations by evaluating the conditional probability function for $Y$ over all the $m$ values at once, for each simulated value $x^{[k]}\sim\mathsf{Normal}(0,\sigma^2)$, $k=1,2,\dots,K$. Use $K=10000$ samples.

Plot the resulting probability mass function $p_Y(m|\mu,\sigma)$ together with the ordinary Poisson probability mass function with the same expectation, $\lambda = 2$. (Use the theory above to convince yourself that these two models for $Y$ have the same expectation.)
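A minimal sketch of this estimation approach (the variable names P_Y and P_pois are illustrative choices, not prescribed by the lab):

library(ggplot2)
mu <- log(2) - 0.5
sigma <- 1
K <- 10000
m <- 0:15
x <- rnorm(K, sd = sigma)
# For each sample x[k], evaluate the conditional pmf over all m at once, then average over k:
P_Y <- rowMeans(vapply(x, function(xk) dpois(m, lambda = exp(mu + xk)),
                       numeric(length(m))))
df <- data.frame(m = m, P_Y = P_Y, P_pois = dpois(m, lambda = 2))
ggplot(df) +
  geom_point(aes(m, P_Y, colour = "Overdispersed")) +
  geom_point(aes(m, P_pois, colour = "Poisson"))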

  3. Reusable function: Generalise the code to a function doverpois (β€œd” for density of the overdispersed Poisson, as in dpois) that takes m, mu, sigma, and K as input and returns a data.frame suitable for use with ggplot.

Use the function to plot results for $\mu=\log(8)-1/8$, $\sigma = 1/2$, and $m=0,1,\dots,30$ by adding P_Y and P_Poisson geoms to

ggplot(doverpois(m = 0:30, mu = log(8) - 0.125, sigma = 0.5, K = 10000))
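One possible sketch of doverpois consistent with the plotting code above (the column names P_Y and P_Poisson are implied by the geom hint; the implementation details are otherwise an assumption):

library(ggplot2)
doverpois <- function(m, mu, sigma, K) {
  x <- rnorm(K, sd = sigma)
  data.frame(
    m = m,
    P_Y = rowMeans(vapply(x, function(xk) dpois(m, lambda = exp(mu + xk)),
                          numeric(length(m)))),
    # Ordinary Poisson with the same expectation, exp(mu + sigma^2/2)
    P_Poisson = dpois(m, lambda = exp(mu + sigma^2 / 2))
  )
}

ggplot(doverpois(m = 0:30, mu = log(8) - 0.125, sigma = 0.5, K = 10000)) +
  geom_point(aes(m, P_Y, colour = "Overdispersed")) +
  geom_point(aes(m, P_Poisson, colour = "Poisson"))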

Archaeology in the Baltic Sea

[Figure: The Waldemar cross. Source: Wikipedia]
Original inscription, in Latin:
β€œAnno Domini MCCCLXI feria III post Jacobi ante portas Visby in manibus Danorum ceciderunt Gutenses, hic sepulti, orate pro eis!”

English translation:
β€œIn the year of our Lord 1361, on the third day after St. Jacob, the Goths fell outside the gates of Visby at the hands of the Danish. They are buried here. Pray for them!”

Strategically located in the middle of the Baltic Sea, the island of Gotland had shifting periods of being partly self-governed and of partial control by the Hanseatic trading alliance, Sweden, Denmark, and the Denmark-Norway-Sweden union, until settling as part of Sweden in 1645. Gotland has an abundance of archaeological treasures, with coins dating back to Viking era trade routes via Russia to the Arab Caliphates.

In 1361 the Danish king Valdemar Atterdag conquered Gotland and captured the rich Hanseatic town of Visby. The conquest was followed by a plunder of the town. Most of the defenders (primarily local farmers who could not take shelter inside the city walls) were killed in the attack and are buried in a field, Korsbetningen (literal translation: the grazing field marked by a cross, as shown in the picture), outside the walls of Visby.

In the 1920s the gravesite was subject to several archaeological excavations. A total of 493 femurs (thigh bones; 256 left and 237 right) were found. We want to figure out how many people were likely buried at the gravesite. It must reasonably have been at least 256, but how many more?

Statistical model

To build a simple model for this problem, we assume that the number of left ($y_1=256$) and right ($y_2=237$) femurs are two independent observations from a $\mathsf{Bin}(N,\phi)$ distribution. Here $N$ is the total number of people buried and $\phi$ is the probability of finding a femur, left or right; both $N$ and $\phi$ are unknown parameters.

The probability function for a single observation $y\sim\mathsf{Bin}(N,\phi)$ is
$$p(y|N,\phi) = {N \choose y} \phi^y (1-\phi)^{N-y}.$$
The function arch_loglike() in the StatCompLab package evaluates the combined log-likelihood $\log[p(\boldsymbol{y}|N,\phi)]$ for a collection $\boldsymbol{y}$ of $y$-observations. If a data.frame with columns N and phi is provided, the log-likelihood for each row-pair $(N,\phi)$ is returned.

The combined log-likelihood $l(y_1,y_2|N,\phi)=\log p(y_1,y_2|N,\phi)$ for the data set $\{y_1,y_2\}$ is then given by
$$\begin{aligned} l(y_1,y_2|N,\phi) &= -\log\Gamma(y_1+1) - \log\Gamma(y_2+1) \\ &\quad - \log\Gamma(N-y_1+1) - \log\Gamma(N-y_2+1) + 2\log\Gamma(N+1) \\ &\quad + (y_1+y_2) \log(\phi) + (2 N - y_1 - y_2)\log(1-\phi). \end{aligned}$$
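As a sanity check, this expression can be implemented directly with lgamma() and compared against summed dbinom(..., log = TRUE) values; a sketch (loglike_femur is a hypothetical helper name, not part of the package):

loglike_femur <- function(y, N, phi) {
  # Direct transcription of the log-likelihood expression above
  -lgamma(y[1] + 1) - lgamma(y[2] + 1) -
    lgamma(N - y[1] + 1) - lgamma(N - y[2] + 1) + 2 * lgamma(N + 1) +
    (y[1] + y[2]) * log(phi) + (2 * N - y[1] - y[2]) * log(1 - phi)
}

y <- c(256, 237)
loglike_femur(y, N = 1000, phi = 0.25)
sum(dbinom(y, size = 1000, prob = 0.25, log = TRUE))  # should agree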

The task is to use Monte Carlo integration to estimate the posterior expectations of $N$ and $\phi$, when the prior distributions are $N\sim\mathsf{Geom}(\xi)$, $0<\xi<1$, and $\phi\sim\mathsf{Beta}(a, b)$, $a,b>0$.

The mathematical definitions are as follows. Let $N$ have a $\mathsf{Geom}(\xi)$, $0<\xi<1$, prior distribution, and let $\phi$ have a $\mathsf{Beta}(a,b)$, $a,b>0$, prior distribution:
$$\begin{aligned} p_N(n) = \mathsf{P}(N=n) &= \xi\,(1-\xi)^n,\quad n=0,1,2,3,\dots, \\ p_\phi(\phi) &= \frac{\phi^{a-1}(1-\phi)^{b-1}}{B(a,b)}, \quad \phi\in[0,1]. \end{aligned}$$
The probability mass function $p_N(n)$ can be evaluated with dgeom() in R, and the density $p_\phi(\phi)$ with dbeta().
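R's dgeom() counts the number of failures before the first success, which matches the pmf above; a quick check:

xi <- 1 / 1001
n <- 0:3
all.equal(dgeom(n, prob = xi), xi * (1 - xi)^n)  # TRUE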

Bayesian estimation

Before the excavation took place, the archaeologist believed that around 1000 individuals were buried, and that they would find around half of the femurs. To encode that belief in the Bayesian analysis, set $\xi=1/(1+1000)$, which corresponds to a prior expectation of $1000$ for $N$, and $a=b=2$, which makes $\phi$ more likely to be close to $1/2$ than to $0$ or $1$.

The posterior density/probability function for $(N,\phi|\boldsymbol{y})$ is
$$p_{N,\phi|\boldsymbol{y}}(n,\phi|\boldsymbol{y}) = \frac{p_{N,\phi,\boldsymbol{y}}(n,\phi,\boldsymbol{y})}{p_{\boldsymbol{y}}(\boldsymbol{y})} = \frac{p_N(n)\,p_\phi(\phi)\,p(\boldsymbol{y}|n,\phi)}{p_{\boldsymbol{y}}(\boldsymbol{y})}.$$

We can estimate $p_{\boldsymbol{y}}(\boldsymbol{y})$ with Monte Carlo integration at the same time as using importance sampling to estimate the posterior expectations. The theoretical expressions are
$$\begin{aligned} p_{\boldsymbol{y}}(\boldsymbol{y}) &= \sum_{n=\max(y_1,y_2)}^\infty \int_0^1 p_{N,\phi,\boldsymbol{y}}(n,\phi,\boldsymbol{y}) \,\mathrm{d}\phi = \sum_{n=\max(y_1,y_2)}^\infty \int_0^1 p(\boldsymbol{y}|n,\phi)\, p_N(n)\, p_\phi(\phi) \,\mathrm{d}\phi, \\ \mathsf{E}(N|\boldsymbol{y}) &= \sum_{n=\max(y_1,y_2)}^\infty \int_0^1 n\, p_{N,\phi|\boldsymbol{y}}(n,\phi|\boldsymbol{y}) \,\mathrm{d}\phi = \sum_{n=\max(y_1,y_2)}^\infty \int_0^1 n\, \frac{p(\boldsymbol{y}|n,\phi)}{p_{\boldsymbol{y}}(\boldsymbol{y})}\, p_N(n)\, p_\phi(\phi) \,\mathrm{d}\phi, \\ \mathsf{E}(\phi|\boldsymbol{y}) &= \sum_{n=\max(y_1,y_2)}^\infty \int_0^1 \phi\, p_{N,\phi|\boldsymbol{y}}(n,\phi|\boldsymbol{y}) \,\mathrm{d}\phi = \sum_{n=\max(y_1,y_2)}^\infty \int_0^1 \phi\, \frac{p(\boldsymbol{y}|n,\phi)}{p_{\boldsymbol{y}}(\boldsymbol{y})}\, p_N(n)\, p_\phi(\phi) \,\mathrm{d}\phi. \end{aligned}$$

Monte Carlo integration

What might a good choice of sampling distribution be? In this first approach to the problem, let’s use the prior distributions as sampling distributions for the Monte Carlo integration. This means that we need to sample $n^{[k]}\sim\mathsf{Geom}(\xi)$ and $\phi^{[k]}\sim\mathsf{Beta}(a,b)$, and compute the Monte Carlo estimates
$$\begin{aligned} \widehat{p}_{\boldsymbol{y}}(\boldsymbol{y}) &= \frac{1}{K} \sum_{k=1}^K p(\boldsymbol{y}|n^{[k]},\phi^{[k]}), \\ \widehat{\mathsf{E}}(N|\boldsymbol{y}) &= \frac{1}{K\, \widehat{p}_{\boldsymbol{y}}(\boldsymbol{y})} \sum_{k=1}^K n^{[k]}\, p(\boldsymbol{y}|n^{[k]},\phi^{[k]}), \\ \widehat{\mathsf{E}}(\phi|\boldsymbol{y}) &= \frac{1}{K\, \widehat{p}_{\boldsymbol{y}}(\boldsymbol{y})} \sum_{k=1}^K \phi^{[k]}\, p(\boldsymbol{y}|n^{[k]},\phi^{[k]}). \end{aligned}$$
Write a function estimate that takes inputs y, xi, a, b, and K and implements the above Monte Carlo integration method.

Test the function by running estimate(y=c(237,256), xi=1/1001, a=0.5, b=0.5, K=10000)
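A minimal sketch of one possible estimate() implementation (assuming arch_loglike() accepts the observations followed by a data.frame with columns N and phi, as described above; check ?arch_loglike for the exact interface):

library(StatCompLab)
estimate <- function(y, xi, a, b, K) {
  # Sample from the priors, used here as the sampling distributions
  N <- rgeom(K, prob = xi)
  phi <- rbeta(K, shape1 = a, shape2 = b)
  # Likelihood values p(y | n^[k], phi^[k]) act as the importance weights
  w <- exp(arch_loglike(y, data.frame(N = N, phi = phi)))
  py <- mean(w)  # estimate of p_y(y)
  data.frame(py = py,
             E_N = mean(N * w) / py,      # estimate of E(N | y)
             E_phi = mean(phi * w) / py)  # estimate of E(phi | y)
}

estimate(y = c(237, 256), xi = 1/1001, a = 0.5, b = 0.5, K = 10000)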

Note: as shown in Lecture 3, the conditional posterior distribution for $\phi$ for fixed $N=n$ is $(\phi|n,\boldsymbol{y})\sim \mathsf{Beta}(a+y_1+y_2,\, b+2n-y_1-y_2)$. This can be used to construct a potentially more efficient method, where each $\phi^{[k]}$ is sampled conditionally on $n^{[k]}$, and the importance weights are adjusted appropriately.
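A hedged sketch of that alternative (again assuming the arch_loglike() interface above; here the importance weight is the likelihood times the ratio of the prior density to the conditional sampling density for phi):

estimate_cond <- function(y, xi, a, b, K) {
  N <- rgeom(K, prob = xi)
  w <- numeric(K)   # weights stay zero where N < max(y), since the likelihood is zero there
  ok <- N >= max(y)
  a_post <- a + sum(y)
  b_post <- b + 2 * N[ok] - sum(y)
  # Sample phi conditionally on each valid n^[k]
  phi <- rbeta(sum(ok), shape1 = a_post, shape2 = b_post)
  w[ok] <- exp(arch_loglike(y, data.frame(N = N[ok], phi = phi))) *
    dbeta(phi, a, b) / dbeta(phi, a_post, b_post)
  py <- mean(w)
  data.frame(py = py,
             E_N = sum(N[ok] * w[ok]) / (K * py),
             E_phi = sum(phi * w[ok]) / (K * py))
}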