Tutorial 06: Prediction assessment with proper scores
Finn Lindgren
Introduction
In this lab session you will explore
- Probabilistic prediction and proper scores

- Open your GitHub repository clone project from Tutorial 2 or 4 (either on https://rstudio.cloud or on your own computer), and upgrade the StatCompLab package.
- During this lab, you can either work in a .R file or a new .Rmd file.
Solution:
The accompanying Tutorial06Solutions tutorial/vignette documents contain the solutions explicitly, to make it easier to review the material after the workshops.
3D printer
The aim is to build and assess statistical models of material use in a 3D printer. The printer uses rolls of filament that are heated and squeezed through a moving nozzle, gradually building objects. The objects are first designed in a CAD program (Computer Aided Design), which also estimates how much material will be required to print the object.
The data can be loaded with data("filament1", package = "StatCompLab"), and contains information about one 3D-printed object per row. The columns are
- Index: an observation index
- Date: printing dates
- Material: the printing material, identified by its colour
- CAD_Weight: the object weight (in grams) that the CAD software calculated
- Actual_Weight: the actual weight of the object (in grams) after printing
If the CAD system and printer were both perfect, the CAD_Weight and Actual_Weight values would be equal for each object. In reality, there is both random variation, for example due to varying humidity and temperature, and systematic deviations due to the CAD system not having perfect information about the properties of the printing materials.
When looking at the data (see below) it’s clear that the variability of the data is larger for larger values of CAD_Weight. The printer operator wants to know which of two models, named A and B, is better at capturing the distributions of the random deviations.
Both models use a linear model for the connection between CAD_Weight and Actual_Weight. We denote the CAD weight for observation $i$ by $x_i$, and the corresponding actual weight by $y_i$.
The two models are defined by
- Model A: $y_i \sim \mathsf{Normal}(\beta_1 + \beta_2 x_i,\ \beta_3 + \beta_4 x_i^2)$
- Model B: $y_i \sim \mathsf{Normal}(\beta_1 + \beta_2 x_i,\ \exp(\beta_3 + \beta_4 x_i))$

where the second argument of $\mathsf{Normal}(\cdot,\cdot)$ denotes the variance.
The printer operator reasons that random fluctuations in the material properties (such as the density) and room temperature should lead to a relative error instead of an additive error, which leads them to model B as an approximation of that. The basic physics assumption is that the error in the CAD software calculation of the weight is proportional to the weight itself. Model A, on the other hand, is slightly more mathematically convenient, but has no such motivation in physics.
Start by loading the data and plotting it.
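For example, a minimal sketch of this step (assuming the ggplot2 package is available; the colour mapping to Material is just one option):

library(StatCompLab)
library(ggplot2)
data("filament1", package = "StatCompLab")
# Plot the actual weight against the CAD weight, coloured by material
ggplot(filament1) +
  geom_point(aes(CAD_Weight, Actual_Weight, colour = Material))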
Estimate and predict
Next week, we will assess model predictions using cross validation, where data is split into separate parts for parameter estimation and prediction assessment, in order to avoid or reduce bias in the prediction assessments. For simplicity this week, we will start by using the entire data set for both parameter estimation and prediction assessment.
Estimate
First, use filament1_estimate()
from the
StatCompLab
package to estimate the two models A and B
using the filament1
data. See the help text for
information.
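A possible sketch, assuming filament1_estimate() takes the data as its first argument and a model identifier "A" or "B" (check ?filament1_estimate for the exact interface):

# Estimate the two models (argument names assumed; see the help text)
fit_A <- filament1_estimate(filament1, model = "A")
fit_B <- filament1_estimate(filament1, model = "B")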
Prediction
Next, use filament1_predict() to compute probabilistic predictions of Actual_Weight using the two estimated models.
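A sketch under the assumption that filament1_predict() accepts an estimated model and the data to predict for, returning a data frame of prediction summaries (see the help text):

# Probabilistic predictions for each observation (interface assumed)
pred_A <- filament1_predict(fit_A, newdata = filament1)
pred_B <- filament1_predict(fit_B, newdata = filament1)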
Inspect the predictions by drawing figures, e.g. with geom_ribbon(aes(CAD_Weight, ymin = lwr, ymax = upr), alpha = 0.25) (the alpha here is for transparency), together with the observed data. It can be useful to join the predictions and data into a new data.frame object to get access to both the prediction information and data when plotting.
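For example, assuming the prediction data frames contain lwr and upr columns (as the geom_ribbon() hint suggests) and that their rows match the rows of filament1:

# Join predictions and data, keeping track of which model made them
df_pred <- rbind(
  cbind(filament1, pred_A, Model = "A"),
  cbind(filament1, pred_B, Model = "B")
)
# Prediction intervals as shaded ribbons, with the observed data on top
ggplot(df_pred) +
  geom_ribbon(aes(CAD_Weight, ymin = lwr, ymax = upr), alpha = 0.25) +
  geom_point(aes(CAD_Weight, Actual_Weight)) +
  facet_wrap(~Model)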
Prediction Scores
Now, use the score calculator function proper_score() from the StatCompLab package to compute the squared error, Dawid-Sebastiani, and Interval scores (with target coverage probability 90%). It’s useful to join the prediction information and data set with cbind, so that e.g. mutate() can have access to all the needed information.
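A sketch, assuming proper_score() takes a score type ("se", "ds", or "interval"), the observations, and the relevant prediction summaries (here assumed to be columns named mean, sd, lwr, and upr), with alpha = 0.1 giving 90% target coverage; see ?proper_score for the exact arguments:

library(dplyr)
# Compute one score of each type for every prediction
scores <- df_pred %>%
  mutate(
    se = proper_score("se", Actual_Weight, mean = mean),
    ds = proper_score("ds", Actual_Weight, mean = mean, sd = sd),
    interval = proper_score("interval", Actual_Weight,
                            lwr = lwr, upr = upr, alpha = 0.1)
  )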
As a basic summary of the results, compute the average score for each model and type of score, and present the result with knitr::kable().
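Continuing the sketch above, one way to produce such a table:

# Average each score type within each model and display as a table
scores %>%
  group_by(Model) %>%
  summarise(
    se = mean(se),
    ds = mean(ds),
    interval = mean(interval)
  ) %>%
  knitr::kable()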
Do the scores indicate that one of the models is better or worse than the other? Do the three score types agree with each other?
Data splitting
Now, use the sample() function to generate a random selection of the rows of the data set, and split it into two parts, with approximately 50% to be used for parameter estimation, and 50% to be used for prediction assessment.
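One possible sketch of the split (the object names here are just for illustration):

# Randomly select about half of the row indices for estimation
idx_est <- sample(nrow(filament1), size = round(nrow(filament1) / 2))
filament1_est <- filament1[idx_est, , drop = FALSE]
filament1_pred <- filament1[-idx_est, , drop = FALSE]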
Redo the previous analysis using this division of the data.
Do the score results agree with the previous results?