From: Chris Theis <Christian.Theis_at_cern.ch>

Date: Wed, 15 Aug 2012 17:36:43 +0000

Hi Alberto,

> I do not know about variance-of-variance analysis (another feature of MCNP that
> FLUKA does not have). But I think that the "magic" 5 (a piece of traditional
> wisdom passed on to us by our Monte Carlo ancestors) is mainly due to the need
> to have enough batches to make the Central Limit Theorem hold. The distribution
> tends to be Gaussian for a number of batches tending to infinity: experience shows
> that 5 is closer to infinity than 4 :-)

It is surely correct that 5 is closer to infinity than 4 ;-) but the way
the results of one batch are calculated already implicitly relies on the
Central Limit Theorem holding. Thus, the suggestion to use more than 5 batches
is only remotely connected to the CLT. When doing Monte Carlo we are always
looking at a sample subset and not the full population. Therefore, strictly
speaking, we obtain estimators instead of "true" values for the mean,
variance, etc.

As a consequence, the "standard deviation of the sample" (in some textbooks
also referred to as the "population standard deviation", as the two are
equivalent for discrete random variables) is not necessarily equal to the
"sample standard deviation".

If one calculates the standard deviation taking only the sample as a data
source, it will usually underestimate the corresponding standard deviation of
the full population. The rigorous mathematical proof is non-trivial, but it
can be understood intuitively with the following popular example. Let's look
at the height of people, but we only take a sub-group for which the standard
deviation (of the sample) is calculated. In reality we always have a few very
tall and a few very short people in our population, but the chance to find
them in our sub-group is relatively small. Therefore, the standard deviation
of our sample is probably going to be too small in comparison to the one we
would have obtained if we had had access to the data of the full population.

As a consequence, the standard deviation of the sample has to be considered
a biased estimator with respect to the "true" standard deviation of the full
population. In order to remove this bias, Bessel's correction is applied,
which accounts for the reduced number of degrees of freedom caused by looking
only at a sample instead of the full population. In very simple terms this
yields the 1/(N-1) factor that is used instead of 1/N in the formula for the
sample variance, which is an unbiased estimator for the variance of the whole
population. It is mathematically not completely correct, but for reasons of
simplicity we can also regard its square root as a (more or less) unbiased
estimator for the standard deviation of the population.
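The underestimation and its removal by Bessel's correction can be seen in a
small numerical experiment; this is a minimal sketch in plain Python (nothing
FLUKA-specific), drawing many small samples from a unit-variance Gaussian and
averaging the two variance estimators:

```python
import random

random.seed(42)

def variance(xs, ddof):
    """Variance of xs with divisor (len(xs) - ddof):
    ddof=0 -> plain 1/N estimator, ddof=1 -> Bessel-corrected 1/(N-1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

# Draw many small samples (N = 5) from a population with known variance 1.0
# and average both variance estimators over the trials.
N, trials = 5, 100_000
biased = unbiased = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    biased += variance(sample, ddof=0)
    unbiased += variance(sample, ddof=1)

biased /= trials
unbiased /= trials
# The 1/N estimator converges to (N-1)/N * sigma^2 = 0.8 here, while the
# Bessel-corrected one converges to sigma^2 = 1.0 (up to statistical noise).
print(f"1/N estimator:     {biased:.3f}")
print(f"1/(N-1) estimator: {unbiased:.3f}")
```

The averaged 1/N estimate comes out systematically low, while the 1/(N-1)
estimate reproduces the true population variance.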

For larger values of N the impact of the Bessel correction starts to vanish,
and the "population standard deviation" and the "sample standard deviation"
become equivalent. Analysis of biased and unbiased estimators shows that the
impact of N starts to taper off for N > 5.
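This tapering can be made concrete by looking at the correction factor
N/(N-1) by which the 1/(N-1) variance estimate exceeds the plain 1/N one:

```python
# Ratio between the Bessel-corrected 1/(N-1) and the plain 1/N variance
# estimate: it shrinks from a factor 2 at N = 2 towards 1 for large N,
# with the steep part of the drop essentially over by N ~ 5.
for n in (2, 3, 5, 10, 100):
    print(f"N = {n:3d}: N/(N-1) = {n / (n - 1):.3f}")
```

At N = 2 the correction doubles the estimate; at N = 5 it is already down to
a 25% effect, and at N = 100 it is barely a 1% effect.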

I'm afraid that a rigorous mathematical treatment is beyond the scope of a

mailing list and requires going into lots of details of estimation theory.

But in the end using at least 5 batches should ensure that the sample standard

deviation becomes a good estimator of the "true" population standard deviation.
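The batch procedure itself can be sketched as follows; the per-primary score
here is a hypothetical toy model (an exponential distribution), standing in
for whatever a real FLUKA run would actually score per batch:

```python
import math
import random

random.seed(1)

# Hypothetical scored quantity per primary; a real run would provide the
# per-batch averages instead of this toy exponential model (true mean = 1.0).
def run_batch(n_primaries):
    return sum(random.expovariate(1.0) for _ in range(n_primaries)) / n_primaries

batch_means = [run_batch(10_000) for _ in range(5)]   # 5 batches, as advised

n = len(batch_means)
mean = sum(batch_means) / n
# Sample variance of the batch means with Bessel's 1/(N-1) correction ...
var = sum((b - mean) ** 2 for b in batch_means) / (n - 1)
# ... and from it the standard error of the overall mean.
sem = math.sqrt(var / n)
print(f"result = {mean:.4f} +/- {sem:.4f}")
```

With only 5 batch means, dropping the 1/(N-1) correction would visibly shrink
the quoted error bar, which is exactly the bias discussed above.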

However, this suggestion still silently assumes that the number of primaries
per batch represents a somewhat adequate sample-to-population ratio, as
mentioned by Mary. Yet, determining what "adequate" means is far from
trivial, and the best and easiest approach in this case is probably to follow
the advice that you give in the FLUKA course: use one's eye as a judge.

Cheers

Chris

Received on Thu Aug 16 2012 - 20:18:47 CEST

This archive was generated by hypermail 2.2.0 : Thu Aug 16 2012 - 20:19:00 CEST