Tutorial 06: Prediction assessment with proper scores

Introduction

In this lab session you will explore

Probabilistic prediction and proper scores

Open your github repository clone project from Tutorial 2 or 4 (either on https://rstudio.cloud on your own computer, and upgrade the StatCompLab package.
During this lab, you can either work in a .R file or a new .Rmd file.

Solution:

The accompanying Tutorial06Solutions tutorial/vignette documents contain the solutions explicitly, to make it easier to review the material after the workshops.

3D printer

The aim is to build and assess statistical models of material use in a 3D printer.¹ The printer uses rolls of filament that gets heated and squeezed through a moving nozzle, gradually building objects. The objects are first designed in a CAD program (Computer Aided Design), that also estimates how much material will be required to print the object.

The data can be loaded with data("filament1", package = "StatCompLab"), and contains information about one 3D-printed object per row. The columns are

Index: an observation index
Date: printing dates
Material: the printing material, identified by its colour
CAD_Weight: the object weight (in gram) that the CAD software calculated
Actual_Weight: the actual weight of the object (in gram) after printing

If the CAD system and printer were both perfect, the CAD_Weight and Actual_Weight values would be equal for each object. In reality, there is both random variation, for example due to varying humidity and temperature, and systematic deviations due to the CAD system not having perfect information about the properties of the printing materials.

When looking at the data (see below) it’s clear that the variability of the data is larger for larger values of CAD_Weight. The printer operator wants to know which of two models, named A and B, are better at capturing the distributions of the random deviations.

Both models use a linear model for coneection between CAD_Weight and Actual_Weight. We denote the CAD weight for observations $i$ by x_i, and the corresponding actual weight by $y_i$ . The two models are defined by

Model A: $y_i \sim \mathsf{Normal}[\theta_1 + \theta_2 x_i, \exp(\theta_3 + \theta_4 x_i)]$
Model B: $y_i \sim \mathsf{Normal}[\theta_1 + \theta_2 x_i, \exp(\theta_3) + \exp(\theta_4) x_i^2)]$

The printer operator reasons that due to random fluctuations in the material properties (such as the density) and room temperature should lead to a relative error instead of an additive error, and this leads them to model B as an approximation of that. The basic physics assumption is that the error in the CAD software calculation of the weight is proportional to the weight itself. Model A on the other hand is slightly more mathematically convenient, but has no such motivation in physics.

Start by loading the data and plot it.

Estimate and predict

Next week, we will assess model predictions using cross validation, where data is split into separate parts for parameter estimation and prediction assessment, in order to avoid or reduce bias in the prediction assessments. For simplicity this week, we will start by using the entire data set for both parameter estimation and prediction assessment.

Estimate

First, use filament1_estimate() from the StatCompLab package to estimate the two models A and B using the filament1 data. See the help text for information.

Prediction

Next, use filament1_predict() to compute probabilistic predictions of Actual_Weight using the two estimated models.

Inspect the predictions by drawing figures, e.g. with geom_ribbon(aes(CAD_Weight, ymin = lwr, ymax = upr), alpha = 0.25) (the alpha here is for transparency), together with the observed data. It can be useful to join the predictions and data into a new data.frame object to get access to both the prediction information and data when plotting.

Prediction Scores

Now, use the score calculator function proper_score() from the StatCompLab package to compute the squared error, Dawid-Sebastiani, and Interval scores (with target coverage probability 90%). It’s useful to joint the prediction information and data set with cbind, so that e.g. mutate() can have access to all the needed information.

As a basic summary of the results, compute the average score $\overline{S}(\{F_i,y_i\})$ for each model and type of score, and present the result with knitr::kable().

Do the scores indicate that one of the models is better or worse than the other? Do the three score types agree with each other?

Data splitting

Now, use the sample() function to generate a random selection of the rows of the data set, and split it into two parts, with ca 50% to be used for parameter estimation, and 50% to be used for prediction assessment.

Redo the previous analysis for the problem using this division of the data.

Do the score results agree with the previous results?