RE: Statistical errors in residual dose rates and activities and number of primaries/batches from Chris Theis on 2012-08-15 (fluka discuss archive)

From: Chris Theis <Christian.Theis_at_cern.ch>
Date: Wed, 15 Aug 2012 17:36:43 +0000

Hi Alberto,

> I do not know about variance-of-variance analysis (another feature of MCNP that
> FLUKA does not have). But I think that the "magic" 5 (a piece of traditional
> wisdom passed on to us by our Monte Carlo ancestors) is mainly due to the need
> to have enough batches to make the Central Limit Theorem hold. The distribution
> tends to be Gaussian for a number of batches tending to infinity: experience shows
> that 5 is closer to infinity than 4 :-)

It is surely correct that 5 is closer to infinity than 4 ;-) but the way
that the results of one batch are calculated already implicitly relies on
the Central Limit Theorem holding. Thus, the suggestion that the number of
batches should be > 5 is only remotely connected to the CLT. Doing Monte
Carlo we are always looking at a sample sub-set and not the full population.
Therefore, strictly speaking, we obtain estimators instead of "true" values
for mean, variance, etc.
As a consequence the "standard deviation of the sample" (in some textbooks
this is also referred to as the "population standard deviation" as they are
equivalent for discrete random variables) is not necessarily equal to the
"sample standard deviation".

If one calculates the standard deviation taking only the sample as a data
source it will usually underestimate the corresponding value of the standard
deviation of the full population. The rigorous mathematical proof is
non-trivial but it can be intuitively understood with the following popular
example. Let's look at the height of people, but we only take a sub-group for
whom the standard deviation (of the sample) is calculated. In reality we
always have a few very tall and also a few very short people in our
population. But the chance to find them in our sub-group is relatively small.
Therefore, the standard deviation of our sample is probably going to be too
small in comparison to the one we would have obtained if we had had access
to the data of the full population.

As a consequence, the standard deviation of the sample has to be considered
a biased estimator with respect to the "true" standard deviation of the full
population. In order to remove this bias Bessel's correction is applied,
which accounts for the degree of freedom that is lost by looking only at a
sample instead of the full population. In very simple terms this yields the
1/(N-1) factor that is used instead of 1/N in the formula to
calculate the sample variance, which is an unbiased estimator for the variance
of the whole population. It's mathematically not completely correct but for
reasons of simplicity we can also regard it as the (more or less) unbiased
estimator for the standard deviation of the population.
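
As a rough numerical illustration of this bias, here is a minimal Python
sketch. It assumes Gaussian samples with a true variance of 1; the function
names, the number of trials and the seed are arbitrary choices made for the
example and have nothing to do with FLUKA itself. Averaged over many small
samples, the 1/N estimator should come out near (N-1)/N times the true
variance, while the 1/(N-1) estimator should come out near the true variance.

```python
import random


def variance(sample, ddof):
    """Variance of a sample; ddof=0 divides by N, ddof=1 by N-1 (Bessel)."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / (n - ddof)


def mean_variance_estimates(n, trials=20000, sigma=1.0, seed=12345):
    """Average the 1/N and 1/(N-1) variance estimates over many samples of size n."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    biased_sum = 0.0
    unbiased_sum = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        biased_sum += variance(sample, ddof=0)
        unbiased_sum += variance(sample, ddof=1)
    return biased_sum / trials, unbiased_sum / trials


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        biased, unbiased = mean_variance_estimates(n)
        print(f"N={n:3d}  1/N: {biased:.3f}  1/(N-1): {unbiased:.3f}  "
              f"(N-1)/N: {(n - 1) / n:.3f}")
```

Note that only the variance estimator is exactly unbiased; taking its square
root still leaves a small residual bias in the standard deviation.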

For larger values of N the impact of the Bessel correction starts to vanish
and the "population standard deviation" and the "sample standard deviation"
start to become equivalent. Analysis of biased and unbiased estimators shows
that the impact of the magnitude of N starts to taper off with N > 5.
I'm afraid that a rigorous mathematical treatment is beyond the scope of a
mailing list and requires going into lots of details of estimation theory.
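
To get a feeling for how quickly the correction shrinks, one can tabulate the
ratio between the two standard deviations computed from the same N values,
which is sqrt((N-1)/N). The short Python sketch below does only that; the
function name is made up for the example, and the table is not a substitute
for the estimator analysis mentioned above.

```python
import math


def std_ratio(n):
    """Ratio of the 1/N standard deviation to the 1/(N-1) one for n values."""
    if n < 2:
        raise ValueError("need at least 2 values")
    return math.sqrt((n - 1) / n)


if __name__ == "__main__":
    for n in range(2, 11):
        ratio = std_ratio(n)
        # The second column is the relative size of the Bessel correction.
        print(f"N={n:2d}  ratio={ratio:.4f}  correction={100 * (1 - ratio):.1f}%")
```

For N = 2 the two values differ by roughly 29%, for N = 5 by roughly 11% and
for N = 10 by roughly 5%.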

But in the end using at least 5 batches should ensure that the sample standard
deviation becomes a good estimator of the "true" population standard deviation.
However, this suggestion still silently assumes that the number of primaries
per batch represents a somewhat adequate sample vs. population ratio,
as mentioned by Mary. Yet, determining what adequate means is far from
trivial and the best and easiest in this case is probably to follow the
advice that you give in the FLUKA course to use one's eye as a judge.
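
For completeness, a generic sketch of how a mean and its statistical error
could be formed from per-batch results, using the 1/(N-1) sample variance
discussed above. It assumes equally weighted batches, the batch values are
invented for the example, and it is not meant to reproduce what the FLUKA
post-processing tools do internally.

```python
import math


def batch_mean_and_error(batch_results):
    """Mean over batches and the standard error of that mean.

    Assumes every batch carries the same weight (same number of primaries).
    """
    n = len(batch_results)
    if n < 2:
        raise ValueError("at least 2 batches are needed for an error estimate")
    mean = sum(batch_results) / n
    # Sample variance with Bessel's correction (divide by n - 1).
    sample_var = sum((x - mean) ** 2 for x in batch_results) / (n - 1)
    # The error of the mean shrinks with the square root of the batch count.
    std_err = math.sqrt(sample_var / n)
    return mean, std_err


if __name__ == "__main__":
    # Hypothetical scoring results from 5 batches, arbitrary units.
    batches = [1.02, 0.98, 1.05, 0.97, 1.00]
    mean, err = batch_mean_and_error(batches)
    print(f"mean = {mean:.4f} +/- {err:.4f} ({100 * err / mean:.2f}%)")
```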

Cheers
Chris
Received on Thu Aug 16 2012 - 20:18:47 CEST
