# Re: Statistical errors in residual dose rates and activities and number of primaries/batches

From: Alberto Fasso' <fasso_at_SLAC.Stanford.EDU>
Date: Tue, 14 Aug 2012 11:18:04 -0700 (PDT)

Hi Mina,

Mary has given a very good explanation of the difference between sampling
from a population and sampling from a number of population subsets (the
batches). In principle it would be possible to calculate a variance from a single
batch, using the spread of the contributions of the individual histories,
but it was decided not to do so in FLUKA for two reasons:
1) the distribution of contributions from individual histories can be highly
skewed (a few large contributions, many small ones, and even more zero
contributions); it is therefore not Gaussian and can be difficult to
handle mathematically (although MCNP does it, for instance). Using batches, the
Central Limit Theorem tells us that the distribution tends to be Gaussian for
a large number of batches, and the formula sent by Francesco can be
safely used.
2) to implement single-batch statistics, it would be necessary to store not only
the sum of individual scores, but also the sum of their squares, requiring
large storage resources.
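In practice, the batch recipe amounts to computing the mean of the batch scores and the standard error of that mean. A minimal Python sketch (using, for concreteness, the ten run results Mina quotes further down; this is only an illustration of the formula, not FLUKA's actual implementation):

```python
import math

# Per-batch scores: the ten dose rates (pSv/s) from Mina's 2 M/run test
batch_scores = [766.4, 986.4, 1131.9, 1785.8, 4077.8,
                224.2, 1143.6, 35.4, 1511.3, 939.9]

n = len(batch_scores)
mean = sum(batch_scores) / n
# Unbiased variance of the batch scores around their average
var = sum((x - mean) ** 2 for x in batch_scores) / (n - 1)
# Standard error of the mean, quoted as a relative error in percent
sem = math.sqrt(var / n)
rel_err = 100.0 * sem / mean

print(f"{mean:.1f} +/- {rel_err:.1f}%")
```

Reassuringly, this reproduces the combined result quoted below (1260.3 +/- 28%): the large run-to-run spread is exactly what drives the error up.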

Mary, who knows MCNP well, said:

> > The difference depends on how well the batch represents the population. I
> > do not know of 5 as the magic number; I think 5 is not adequate for
> > variance-of-variance analysis; maybe Alberto can show us the derivation.

I do not know about variance-of-variance analysis (another feature of MCNP that
FLUKA does not have). But I think that the "magic" 5 (a piece of traditional
wisdom passed on to us by our Monte Carlo ancestors) is mainly due to the need
to have enough batches to make the Central Limit Theorem hold. The distribution
tends to be Gaussian for a number of batches tending to infinity: experience shows
that 5 is closer to infinity than 4 :-)
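The Central Limit Theorem argument can also be seen numerically: even when the individual-history contributions are very skewed (many zeros, a few large values), the batch means become much more symmetric as each batch aggregates more histories. A rough sketch with an invented contribution model (90% of histories score zero, the rest an exponential amount):

```python
import math
import random

random.seed(42)

def batch_means(histories_per_batch, n_batches=2000):
    """Means of simulated batches of a highly skewed per-history score."""
    means = []
    for _ in range(n_batches):
        total = sum(random.expovariate(1.0) if random.random() < 0.1 else 0.0
                    for _ in range(histories_per_batch))
        means.append(total / histories_per_batch)
    return means

def skewness(xs):
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

# The per-history distribution is strongly skewed, but the distribution
# of batch means approaches a symmetric (Gaussian-like) shape as the
# batches grow: the skewness shrinks roughly as 1/sqrt(histories).
skew_small = skewness(batch_means(10))
skew_large = skewness(batch_means(1000))
print(skew_small, skew_large)
```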

Coming now to the test you have done (comparing averages and errors between
10 runs of 2 M particles each and 10 runs of 5 M particles each).
You have written:

> So obviously an improvement [from 2M to 5M per run], which made me conclude
> that number of primaries per run does matter. Is this because the total number of
> primaries is increased in the 2nd case (5 M prim/run *10 runs = 50 M
> primaries)?

As I told you in my previous email, it is the TOTAL number of primaries that
matters, as you can see. FOR THE SAME TOTAL NUMBER, the number of primaries per
run does NOT matter (with the two limitations I described in my previous email).
Here the total number is not the same: it is larger by a factor 5/2, so
you could expect an error about 1.6 times smaller (1.6 ≈ sqrt(5/2)). You find a
decrease considerably larger (about 3.5 = 28/7.9): this is because 28% is NOT AN
ACCEPTABLE ERROR in Monte Carlo. Since we are giving some publicity here to our
competitor MCNP, I remind you of the little table which was in the MCNP manual
some years ago (now modified in newer versions of the manual to be a bit less
harsh :-)

  Error        Quality of the estimate
  50 to 100%   Garbage
  20 to 50%    Factor of a few (notice that your 28% series gives an average
               differing by a factor of about 1.7 from that of the 7.9% series)
  10 to 20%    Questionable
  < 10%        Generally reliable
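The 1/sqrt(N) scaling behind this comparison takes one line of arithmetic to check. A quick sketch with the numbers from this thread (20 M vs 50 M total primaries, 28% vs 7.9% observed error):

```python
import math

n1, n2 = 20e6, 50e6      # total primaries: 10 runs x 2 M vs 10 runs x 5 M
err1, err2 = 28.0, 7.9   # observed relative errors, in percent

# 1/sqrt(N) scaling predicts the error shrinks by sqrt(n2/n1) ~ 1.58;
# the observed improvement is larger because the 28% starting point is
# itself an unreliable ("factor of a few") estimate.
expected_ratio = math.sqrt(n2 / n1)
observed_ratio = err1 / err2

print(round(expected_ratio, 2), round(observed_ratio, 2))
```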

> And that if I had run 5 M primaries/run but only 4 runs, I would have seen a
> similar error as in case 1?

As Francesco has pointed out, 4 runs are just borderline. Let's say rather that
in principle you would get the same error if you had run 5 runs of 4 M each.
But remember that 28% is too large, so you had better compare 10 runs of 5 M
(with a 7.9% error) with 5 runs of 10 M. I am sure that not only the two
averages but also the errors would be much closer.

Now I would like to comment on something Mary said, probably prompted by a
statement of mine that I expressed too unclearly.
I said:
> Detector scoring should be >> 0 in each batch
Mary said:
> >*The same detector score may be legitimately quoted in GeV or meV,
> > differing by 12 orders of magnitude. Would 1e-12 GeV fail as score>>0?
> > Would 1 meV qualify as score>>0? Unlike c>>v, x>>0 is difficult to gauge,
> > I get division by zero, I'm trying to make the point that stats depend on
> > the number of samples, not the magnitude of score. I can have an energy
> > deposition of 1e-20 (arbitrary unit this time), if I observe similar
> > events 1e20 times, I am certainly far more confident in my score than if I
> > have an energy deposition of 20 (same unit) but I only observe one single
> > event.

When I said "Detector scoring should be >> 0 in each batch" I actually meant
"in no batch should there be zero or only a few histories contributing to the
score". That has nothing to do with the units used: "zero or few histories" is
an integer number which does not change with the units. Mary is right that "stats
depend on the number of samples, not the magnitude of score"; I was not
referring to the magnitude of the score, but to the number of primaries leading
to a score.
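This point can be illustrated directly: rescaling every batch score by any unit-conversion factor leaves the relative error untouched, because the mean and the standard error scale by the same factor. A small sketch (the scores are invented for the example):

```python
import math

def rel_error(scores):
    """Relative standard error of the mean of the batch scores, in percent."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return 100.0 * math.sqrt(var / n) / mean

batches_gev = [1.2e-3, 0.8e-3, 1.5e-3, 0.9e-3, 1.1e-3]   # scores in GeV
batches_mev = [x * 1e12 for x in batches_gev]             # same scores in meV

# Identical relative error in either unit: only the number of histories
# contributing to each batch matters for the statistics, not the magnitude.
print(rel_error(batches_gev), rel_error(batches_mev))
```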

Alberto

On Mon, 13 Aug 2012, Mina Nozar wrote:

> Dear Alberto,
>
> Hello and thank you for your response.
>
> I am still trying to understand so please bear with me.
>
> As I understand, the uncertainty we see in any given quantity in FLUKA is just the error in the mean of the quantity,
> where the mean is the average of the quantity as calculated from the result of the batches. So, the higher the
> fluctuations in a given quantity from run to run, the higher the variance and the error in the mean.
>
> I first came across this issue when doing Stefan's two-step process. From a study I did, I found out that the error in
> the dose rate, one week after EOB was on the order of 28% for 2 M primaries per run, 10 runs (20 M primaries total). I
> looked at the results of the individual runs and saw large fluctuations from run to run. This is what I saw for 10 runs
> (Dose rate in pSv/s):
>
> (766.4, 986.4, 1131.9, 1785.8, 4077.8, 224.2, 1143.6, 35.4, 1511.3, 939.9)
> So the result from combining all ten runs was 1260.3 +/- 28%.
>
> When I repeated the run for 5 M primaries/run, I saw smaller fluctuations:
> (907.8, 986.4, 749.2, 937.5, 587.2, 599.1, 726.8, 421.9, 985.3, 754.3)
> Average: 751.5 +/- 7.9%
>
> So obviously an improvement, which made me conclude that number of primaries
> per run does matter. Is this because the total number of primaries is
> increased in the 2nd case (5 M prim/run *10 runs = 50 M primaries)? And that
> if I had run 5 M primaries/run but only 4 runs, I would have seen a similar
> error as in case 1?