Introduction
- In this lab session you will explore Monte Carlo integration and importance sampling.
- Open your github repository clone project from Lab 2 (either on https://rstudio.cloud or on your own computer) and upgrade the `StatCompLab` package (see https://finnlindgren.github.io/StatCompLab/ for more details).
- Save your work for this lab in one or several new files in the project, and commit and push the changes to github (see Lab 2 for more information).
- In your code script, start with `library(StatCompLab)` to get access to the `arch_loglike` function needed for one of the tasks.
Solution: The accompanying `Tutorial03Solutions` tutorial document contains the solutions explicitly, to make it easier to review the material after the workshops. You can also run the document in the Tutorials pane in RStudio.
Overdispersed Poisson distribution
We'll use an overdispersed Poisson distribution as the first example. In an ordinary Poisson distribution with expectation $\lambda$, the variance is also $\lambda$. The coefficient of variation is defined as the ratio between the standard deviation and the expectation, which gives $\sqrt{\lambda}/\lambda = 1/\sqrt{\lambda}$. In real applications, one often observes data where the coefficient of variation is larger than the Poisson model would give. One way of dealing with that is to add a latent random effect; a hidden layer of random values that "nudges" the expectation for each observation, which increases the observed coefficient of variation. The model can be written in two steps:
$$
x \sim \mathsf{Normal}(0, \sigma^2), \qquad (y \mid x) \sim \mathsf{Poisson}\left(e^{\mu + x}\right),
$$
where the conditional expectation for $y$ given $\mu$, $\sigma$, and $x$ is $\mathsf{E}(y \mid \mu, \sigma, x) = e^{\mu + x}$, and $\mu$ is a (fixed) parameter. The marginal expectation (but still conditionally on $\mu$ and $\sigma$) can be obtained via the tower property (law of total expectation),
$$
\mathsf{E}(y \mid \mu, \sigma) = \mathsf{E}\left[\mathsf{E}(y \mid \mu, \sigma, x) \mid \mu, \sigma\right] = \mathsf{E}\left(e^{\mu + x} \mid \mu, \sigma\right) = e^{\mu + \sigma^2/2},
$$
where the last step comes from the expectation of a log-Normal distribution. For multiple observations, we can write the model as
$$
x_i \sim \mathsf{Normal}(0, \sigma^2), \qquad (y_i \mid x_i) \sim \mathsf{Poisson}\left(e^{\mu + x_i}\right),
$$
independently for each observation $i$.
Consider the following simulation code:
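A minimal sketch of what such simulation code can look like, following the two-step model above (the specific values of `mu` and `sigma` are illustrative assumptions, not taken from the original lab sheet):

```r
# Two-step simulation from the model: latent x first, then y given x
mu <- 2      # illustrative value (assumed)
sigma <- 1   # illustrative value (assumed)
x <- rnorm(1, mean = 0, sd = sigma)        # latent random effect
y <- rpois(1, lambda = exp(mu + x))        # conditional Poisson sample
```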
What is the expectation of $y$?
- Theory exercise: Using the tower property (law of total variance), theoretically derive an expression for $\mathsf{Var}(y)$. You can get the needed log-Normal variance expression from https://en.wikipedia.org/wiki/Log-normal_distribution
- Monte Carlo integration: Use Monte Carlo integration to approximate the probability mass function $P(y = m)$ over a range of $m$ values, for given values of $\mu$ and $\sigma$. Check with the formulas above what the resulting theoretical expectation is for $y$ for this parameter combination.
  For this to be efficient, you should vectorise the calculations by evaluating the conditional probability function for $y$ over all the $m$ values at once, for each simulated value $x^{[k]}$, $k = 1, \dots, K$. Use $K = 10000$ samples (see the sketch after this list).
  Plot the resulting probability mass function together with the ordinary Poisson probability mass function with the same expectation value, $\lambda = e^{\mu + \sigma^2/2}$. (Use the theory above to convince yourself that these two models for $y$ have the same expectation.)
- Reusable function: Generalise the code to a function `doverpois` (for "d"ensity for over-dispersed Poisson) that takes `m`, `mu`, `sigma`, and `K` as input and returns a `data.frame` suitable for use with `ggplot`. Use the function to plot the results for your chosen values of `mu` and `sigma`, with `K = 10000`, by adding geoms for the `P_Y` and `P_Poisson` probability values to a `ggplot` object (a sketch follows this list).
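A minimal sketch of such a function and a plotting call, assuming output columns `m`, `P_Y`, and `P_Poisson`; the parameter values in the `ggplot` call are illustrative, not taken from the original lab sheet:

```r
library(ggplot2)

# Monte Carlo estimate of the overdispersed Poisson pmf.
# For each sampled lambda^[k] = exp(mu + x^[k]), dpois() evaluates the
# conditional pmf at all m values at once, as suggested above.
doverpois <- function(m, mu, sigma, K) {
  lambda <- exp(mu + rnorm(K, mean = 0, sd = sigma))
  P_sum <- numeric(length(m))
  for (k in seq_len(K)) {
    P_sum <- P_sum + dpois(m, lambda[k])   # all m values at once for x^[k]
  }
  data.frame(m = m,
             P_Y = P_sum / K,                              # Monte Carlo estimate
             P_Poisson = dpois(m, exp(mu + sigma^2 / 2)))  # same-mean Poisson
}

# Illustrative parameter values (assumed):
ggplot(doverpois(m = 0:30, mu = log(8), sigma = 0.5, K = 10000)) +
  geom_point(aes(m, P_Y)) +
  geom_point(aes(m, P_Poisson), colour = "red")
```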
Archaeology in the Baltic sea

"Anno Domini MCCCLXI feria III post Jacobi ante portas Visby in manibus Danorum ceciderunt Gutenses, hic sepulti, orate pro eis!"

English translation: "In the year of our Lord 1361, on the third day after St. Jacob, the Goths fell outside the gates of Visby at the hands of the Danes. They are buried here. Pray for them!"
Strategically located in the middle of the Baltic sea, the island of Gotland had shifting periods of being partly self-governed, and in partial control by the Hanseatic trading alliance, Sweden, Denmark, and the Denmark-Norway-Sweden union, until settling as part of Sweden in 1645. Gotland has an abundance of archaeological treasures, with coins dating back to Viking era trade routes via Russia to the Arab Caliphates.

In 1361 the Danish king Valdemar Atterdag conquered Gotland and captured the rich Hanseatic town of Visby. The conquest was followed by a plundering of Visby. Most of the defenders (primarily local farmers who could not take shelter inside the city walls) were killed in the attack and are buried in a field, Korsbetningen (literal translation: the grazing field that is marked by a cross), outside of the walls of Visby.
In the 1920s the gravesite was subject to several archaeological excavations. A total of 493 femurs (thigh bones) (256 left, and 237 right) were found. We want to figure out how many persons were likely buried at the gravesite. It must reasonably have been at least 256, but how many more?
Statistical model
To build a simple model for this problem, we assume that the number of left ($y_1$) and right ($y_2$) femurs are two independent observations from a $\mathsf{Binomial}(N, \phi)$ distribution. Here $N$ is the total number of people buried and $\phi$ is the probability of finding a femur, left or right, and both $N$ and $\phi$ are unknown parameters.
The probability function for a single observation $y \sim \mathsf{Binomial}(N, \phi)$ is
$$
p(y \mid N, \phi) = \binom{N}{y} \phi^y (1 - \phi)^{N - y} .
$$
The function `arch_loglike()` in the `StatCompLab` package evaluates the combined log-likelihood $\log p(\boldsymbol{y} \mid N, \phi)$ for a collection $\boldsymbol{y}$ of $y$-observations. If a `data.frame` with columns `N` and `phi` is provided, the log-likelihood for each row-pair $(N, \phi)$ is returned. The combined log-likelihood for the data set is then given by
$$
\log p(\boldsymbol{y} \mid N, \phi) = \log p(y_1 \mid N, \phi) + \log p(y_2 \mid N, \phi) .
$$
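For reference, the same quantity can be computed directly with `dbinom()`; a short sketch (the function name and the example values are for illustration only, and this is not the package's implementation):

```r
# log p(y1 | N, phi) + log p(y2 | N, phi), via dbinom() on the log scale
comb_loglike <- function(y, N, phi) {
  sum(dbinom(y, size = N, prob = phi, log = TRUE))
}
comb_loglike(y = c(237, 256), N = 500, phi = 0.4)  # example values (assumed)
```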
The task is to use Monte Carlo integration to estimate the posterior expectations of $N$ and $\phi$, when the prior distributions are $N \sim \mathsf{Geometric}(\xi)$, $0 < \xi < 1$, and $\phi \sim \mathsf{Beta}(a, b)$, $a, b > 0$.
The mathematical definitions are: let $N$ have a $\mathsf{Geometric}(\xi)$, $0 < \xi < 1$, prior distribution, and let $\phi$ have a $\mathsf{Beta}(a, b)$, $a, b > 0$, prior distribution:
$$
p_N(n) = \xi (1 - \xi)^n, \quad n = 0, 1, 2, \dots, \qquad
p_\phi(\phi) = \frac{\phi^{a-1} (1 - \phi)^{b-1}}{B(a, b)}, \quad \phi \in (0, 1),
$$
where $B(a, b)$ is the Beta function. The probability mass function $p_N$ can be evaluated with `dgeom()` in R, and the density $p_\phi$ can be evaluated with `dbeta()`.
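A quick check of these parameterisations in R (using the $\xi$ value motivated in the next section; the Beta shape values are illustrative):

```r
xi <- 1 / 1001
# dgeom uses the "number of failures" parameterisation: p_N(n) = xi * (1 - xi)^n
dgeom(0:2, prob = xi)   # p_N(0), p_N(1), p_N(2)
(1 - xi) / xi           # prior mean E(N) = 1000

# Beta(a, b) prior density for phi, evaluated at phi = 0.5
dbeta(0.5, shape1 = 2, shape2 = 2)
```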
Bayesian estimation
Before the excavation took place, the archaeologist believed that around 1000 individuals were buried, and that they would find around half of the femurs. To encode that belief in the Bayesian analysis, set $\xi = 1/(1 + 1000) = 1/1001$, which corresponds to a prior expected total count of $\mathsf{E}(N) = (1 - \xi)/\xi = 1000$, and take $a$ and $b$ equal and larger than $1$ (for example $a = b = 2$), which makes $\phi$ more likely to be close to $1/2$ than to $0$ or $1$.
The posterior density/probability function for $(N, \phi)$ given $\boldsymbol{y}$ is
$$
p(N, \phi \mid \boldsymbol{y}) = \frac{p_N(N) \, p_\phi(\phi) \, p(\boldsymbol{y} \mid N, \phi)}{p(\boldsymbol{y})} .
$$
We can estimate $p(\boldsymbol{y})$ with Monte Carlo integration, at the same time as using importance sampling to estimate the posterior expectations. The theoretical expressions are
$$
p(\boldsymbol{y}) = \mathsf{E}\left[p(\boldsymbol{y} \mid N, \phi)\right], \qquad
\mathsf{E}(N \mid \boldsymbol{y}) = \frac{\mathsf{E}\left[N \, p(\boldsymbol{y} \mid N, \phi)\right]}{p(\boldsymbol{y})}, \qquad
\mathsf{E}(\phi \mid \boldsymbol{y}) = \frac{\mathsf{E}\left[\phi \, p(\boldsymbol{y} \mid N, \phi)\right]}{p(\boldsymbol{y})},
$$
where the outer expectations are taken under the prior distributions for $N$ and $\phi$.
Monte Carlo integration
What might a good choice of sampling distribution be? In this first approach to the problem, let's use the prior distributions as sampling distributions for the Monte Carlo integration. This means that we need to sample $N^{[k]} \sim \mathsf{Geometric}(\xi)$ and $\phi^{[k]} \sim \mathsf{Beta}(a, b)$, $k = 1, \dots, K$, and compute the Monte Carlo estimates
$$
\widehat{p}(\boldsymbol{y}) = \frac{1}{K} \sum_{k=1}^{K} p(\boldsymbol{y} \mid N^{[k]}, \phi^{[k]}), \qquad
\widehat{\mathsf{E}}(N \mid \boldsymbol{y}) = \frac{1}{K \, \widehat{p}(\boldsymbol{y})} \sum_{k=1}^{K} N^{[k]} \, p(\boldsymbol{y} \mid N^{[k]}, \phi^{[k]}),
$$
$$
\widehat{\mathsf{E}}(\phi \mid \boldsymbol{y}) = \frac{1}{K \, \widehat{p}(\boldsymbol{y})} \sum_{k=1}^{K} \phi^{[k]} \, p(\boldsymbol{y} \mid N^{[k]}, \phi^{[k]}) .
$$
Write a function `estimate` that takes inputs `y`, `xi`, `a`, `b`, and `K` and implements the above Monte Carlo integration method.
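One possible sketch of such a function, computing the likelihood directly with `dbinom()` rather than relying on a particular `arch_loglike()` calling convention; the returned column names are choices made for this sketch:

```r
# Monte Carlo integration with the priors as sampling distributions
estimate <- function(y, xi, a, b, K) {
  N <- rgeom(K, prob = xi)                   # N^[k] ~ Geometric(xi)
  phi <- rbeta(K, shape1 = a, shape2 = b)    # phi^[k] ~ Beta(a, b)
  # p(y | N^[k], phi^[k]); the two observations are independent.
  # (For large counts, working on the log scale is numerically safer.)
  p_y <- vapply(seq_len(K),
                function(k) prod(dbinom(y, size = N[k], prob = phi[k])),
                numeric(1))
  p_hat <- mean(p_y)                         # estimate of p(y)
  data.frame(py = p_hat,
             E_N = mean(N * p_y) / p_hat,       # posterior mean of N
             E_phi = mean(phi * p_y) / p_hat)   # posterior mean of phi
}
```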
Test the function by running

```r
estimate(y = c(237, 256), xi = 1/1001, a = 0.5, b = 0.5, K = 10000)
```
Note: as shown in lecture 3, the conditional posterior distribution for $\phi$ for fixed $N$ is $(\phi \mid \boldsymbol{y}, N) \sim \mathsf{Beta}(a + y_1 + y_2, \, b + 2N - y_1 - y_2)$. This can be used to construct a potentially more efficient method, where each $\phi^{[k]}$ is sampled conditionally on $N^{[k]}$, and the importance weights are adjusted appropriately.
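A sketch of that variant, under the same assumptions as the `estimate()` sketch above (the function name is for illustration only): sample $N^{[k]}$ from its prior, sample $\phi^{[k]}$ from the conditional posterior, and weight by the likelihood times the ratio of the prior density to the sampling density for $\phi$:

```r
# Importance sampling with phi drawn from its conditional posterior
estimate_conditional <- function(y, xi, a, b, K) {
  N <- rgeom(K, prob = xi)
  a_post <- a + sum(y)                 # scalar
  b_post <- b + 2 * N - sum(y)         # vector, one value per sample
  ok <- b_post > 0                     # for smaller N the likelihood is zero
  phi <- numeric(K)
  phi[ok] <- rbeta(sum(ok), shape1 = a_post, shape2 = b_post[ok])
  w <- numeric(K)                      # importance weights; 0 where !ok
  w[ok] <- vapply(which(ok), function(k) {
    prod(dbinom(y, size = N[k], prob = phi[k])) *
      dbeta(phi[k], a, b) / dbeta(phi[k], a_post, b_post[k])
  }, numeric(1))
  p_hat <- mean(w)
  data.frame(py = p_hat,
             E_N = mean(N * w) / p_hat,
             E_phi = mean(phi * w) / p_hat)
}
```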