The Bootstrap


view/download model file: BootStrapIllustrator.nlogo

WHAT IS IT?

Bootstrapping in statistics is a simple idea that can nonetheless be difficult to understand. This applet provides a visual illustration of the bootstrap process. It models a lake containing fish of many sizes, possibly including a few “whoppers” scattered about. The fish lengths form a one-dimensional (univariate) dataset, yet estimating the average length of a fish in the lake is non-trivial. The “whoppers”, if they exist, are outliers, and the distribution of fish lengths is unknown. This is exactly the situation in which a bootstrap estimate of the mean fish length, together with its variance and confidence intervals, is useful.

HOW IT WORKS

As fish pass under the boat, they are caught, measured, and released. After a five-tick delay that lets the fish population redistribute, the process repeats, possibly catching the same fish again (sampling with replacement). Once the sample reaches the required size, the chosen statistic (mean or median) is computed on it and the result is added to the list of BootstrapEstimates. BootstrapEstimates therefore forms an empirical distribution of the statistic, from which a confidence interval for the statistic (for example, the mean length) can also be estimated.
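The catch-and-estimate loop can be sketched in Python as follows. This is a minimal sketch, not the model's NetLogo code: the function names `bootstrap_estimates` and `percentile_interval` are hypothetical, each catch is assumed to be a uniform random draw (the spatial movement and five-tick delay are ignored), and the 95% interval is assumed to be a simple percentile interval read off the sorted estimates.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical helpers that mirror the process described above;
# they are not taken from the NetLogo model itself.


def bootstrap_estimates(
    lengths: Sequence[float],
    sample_size: int,
    num_estimates: int,
    statistic: Callable[[Sequence[float]], float] = statistics.fmean,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw num_estimates samples with replacement and apply statistic to each."""
    if rng is None:
        rng = random.Random()
    estimates = []
    for _ in range(num_estimates):
        # Sampling with replacement: the same fish can be caught more than once.
        sample = rng.choices(lengths, k=sample_size)
        estimates.append(statistic(sample))
    return estimates


def percentile_interval(
    estimates: Sequence[float], level: float = 0.95
) -> Tuple[float, float]:
    """Percentile confidence interval read off the empirical distribution."""
    alpha = (1.0 - level) / 2.0
    ordered = sorted(estimates)
    last = len(ordered) - 1
    return ordered[int(alpha * last)], ordered[int((1.0 - alpha) * last)]


# Usage with made-up lengths; 80.0 plays the role of a "whopper".
lake = [12.0, 15.5, 9.8, 22.1, 14.3, 80.0]
mean_estimates = bootstrap_estimates(lake, 6, 1000, rng=random.Random(1))
median_estimates = bootstrap_estimates(lake, 6, 1000, statistics.median, random.Random(1))
print(percentile_interval(mean_estimates))
print(percentile_interval(median_estimates))
```

Passing the statistic as a function keeps the mean/median choice (the model's ‘Target_Statistic’) separate from the resampling loop. With a single outlier like 80.0, the median interval is typically far less affected than the mean interval.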

The length of a fish is a metaphor for any univariate data. To make this explicit, switching on ‘Values’ shows each fish's numerical value in place of the fish. Univariate data can also be loaded as space-separated values, and the current values can be written to the ‘Data’ box via ‘Show Data’ (negative numbers produce red fish).
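The space-separated data format can be illustrated with a small round-trip sketch. The function names `parse_data`, `show_data`, and `fish_color` are hypothetical, and the "default" color for non-negative values is an assumption; the text above only states that negative numbers produce red fish.

```python
from typing import List, Sequence


def parse_data(text: str) -> List[float]:
    """Parse whitespace-separated numbers, e.g. the contents of the 'Data' box."""
    # str.split() with no argument also tolerates repeated spaces and newlines.
    return [float(token) for token in text.split()]


def show_data(values: Sequence[float]) -> str:
    """Write values back out as space-separated text."""
    return " ".join(str(v) for v in values)


def fish_color(value: float) -> str:
    # Assumption: only the "negative -> red" rule comes from the model description.
    return "red" if value < 0 else "default"


data = parse_data("12.5 -3 7.25")
colors = [fish_color(v) for v in data]
print(data, colors, show_data(data))
```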

HOW TO USE IT

‘Setup’ generates an exponentially distributed random sample of size ‘NumberOfFish’ with mean ‘TargetMean’. ‘SampleSize’ sets the size of the samples used to estimate the statistic. ‘Load Data’ loads the space-separated values in the ‘Data’ box and switches ‘Values’ on. ‘Target_Statistic’ chooses the desired statistic. ‘Go’ starts the simulation: caught values are added to the ‘Current Sample’ box until it holds ‘SampleSize’ values, at which point an estimate of the statistic is computed and appended to the list in the ‘Bootstrap Estimates’ box. The model then reports the estimate, a histogram of the ‘Bootstrap Estimates’, and a 95% confidence interval derived from this empirical distribution.
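What ‘Setup’ does can be approximated with Python's standard library. This is a sketch under assumptions: the name `setup_lake` is hypothetical, and Python's `random.expovariate` stands in for whatever generator the NetLogo model actually uses. Note that `expovariate` takes the rate λ = 1 / mean, not the mean itself.

```python
import random
import statistics
from typing import List, Optional


def setup_lake(number_of_fish: int, target_mean: float,
               seed: Optional[int] = None) -> List[float]:
    """Hypothetical analogue of 'Setup': exponential lengths with the given mean."""
    rng = random.Random(seed)
    # expovariate expects the rate (lambda), which is 1 / mean.
    return [rng.expovariate(1.0 / target_mean) for _ in range(number_of_fish)]


lake = setup_lake(number_of_fish=500, target_mean=20.0, seed=42)
print(round(statistics.fmean(lake), 2), round(max(lake), 2))
```

The exponential distribution's long right tail naturally produces a few very large values, which play the role of the “whoppers”; the sample mean will be close to, but not exactly, ‘TargetMean’.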

THINGS TO NOTICE

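Sampling with replacement corresponds to a previously captured fish swimming beneath the boat again and being caught once more. Increasing the sample size narrows the confidence interval, because the spread of the bootstrap estimates shrinks roughly in proportion to one over the square root of ‘SampleSize’.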
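The effect of sample size can be checked numerically. The sketch below measures the width of a 95% percentile interval for the bootstrap mean at several sample sizes. The helper name `ci_width`, the synthetic exponential lake with mean 20, and the percentile method are all assumptions for illustration. Since the standard error of the mean scales roughly as 1/√n, quadrupling the sample size should roughly halve the width.

```python
import random
import statistics
from typing import Dict, Sequence


def ci_width(lengths: Sequence[float], sample_size: int,
             num_estimates: int = 2000, seed: int = 0) -> float:
    """Width of a 95% percentile interval for the bootstrap mean (hypothetical helper)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(lengths, k=sample_size))  # resample with replacement
        for _ in range(num_estimates)
    )
    last = num_estimates - 1
    return means[int(0.975 * last)] - means[int(0.025 * last)]


data_rng = random.Random(7)
lake = [data_rng.expovariate(1.0 / 20.0) for _ in range(300)]
widths: Dict[int, float] = {n: ci_width(lake, n) for n in (10, 40, 160)}
for n, w in widths.items():
    print(n, round(w, 2))
```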