From: Alberto Fasso' <fasso_at_SLAC.Stanford.EDU>

Date: Tue, 14 Aug 2012 11:18:04 -0700 (PDT)

Hi Mina,

Mary has given a very good explanation of the difference between sampling

from a population and sampling from a number of population subsets (the

batches). It is possible in principle to calculate a variance with a single

batch, using the variation of the contributions from individual histories,

but it was decided not to do so in FLUKA for two reasons:

1) the distribution of contributions from individual histories could be highly

skewed (a few large contributions and many small contributions and even more

zero contributions), therefore it is not Gaussian and could be difficult to

handle mathematically (but MCNP does it, for instance). Using batches, the

Central Limit Theorem tells us that the distribution tends to be Gaussian for

a large number of batches, and the formula sent by Francesco can be

safely used.

2) to implement single-batch statistics, it would be necessary to store not only

the sum of individual scores, but also the sum of their squares, requiring

large storage resources.
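As an illustration of point 2), here is a minimal sketch (with invented per-history scores for one detector bin) of the extra bookkeeping single-batch statistics would require: besides the sum of scores, which must be kept per bin anyway, a second accumulator of squared scores is needed, roughly doubling the scoring storage.

```python
# Hypothetical per-history contributions to one detector bin: mostly zeros,
# a few large values (the skewed distribution described above).
scores = [0.0, 0.0, 0.0, 0.1, 0.0, 4.7, 0.0, 0.3, 0.0, 0.0]

n = len(scores)
s1 = sum(scores)                  # sum of scores: stored per bin anyway
s2 = sum(x * x for x in scores)   # sum of squares: the extra array needed

mean = s1 / n
# Variance of the mean, obtainable from the two running sums alone
var_mean = (s2 - s1 * s1 / n) / (n * (n - 1))
```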

Mary, who knows MCNP well, has said:

> > The difference depends on how well the batch represents the population. I
> > do not know of 5 as the magic number; I think 5 is not adequate for
> > variance-of-variance analysis; maybe Alberto can show us the derivation.

I do not know about variance-of-variance analysis (another feature of MCNP that

FLUKA does not have). But I think that the "magic" 5 (a piece of traditional

wisdom passed on to us by our Monte Carlo ancestors) is mainly due to the need

to have enough batches to make the Central Limit Theorem hold. The distribution

tends to be Gaussian for a number of batches tending to infinity: experience shows

that 5 is closer to infinity than 4 :-)
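The batch argument can be illustrated numerically. In this sketch (the mostly-zero, exponentially-tailed score distribution and the batch sizes are invented for illustration), per-history scores are strongly skewed: 90% of histories score zero, the rest follow an exponential tail. Yet the means of large batches all cluster tightly around the true mean of 0.1, which is what justifies applying a Gaussian error formula to the batch means.

```python
import random

random.seed(42)

def history_score():
    # 90% of histories contribute nothing; the rest contribute an
    # exponentially distributed amount (highly skewed, true mean = 0.1).
    return random.expovariate(1.0) if random.random() < 0.1 else 0.0

batches = 10
histories_per_batch = 100_000
batch_means = []
for _ in range(batches):
    total = sum(history_score() for _ in range(histories_per_batch))
    batch_means.append(total / histories_per_batch)

grand_mean = sum(batch_means) / batches   # close to the true mean 0.1
```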

Coming now to the test you have done, comparing averages and errors between

10 runs with 2 M particles and 10 runs with 5 M particles, you have written:

> So obviously an improvement [from 2M to 5M per run], which made me conclude
> that number of primaries per run does matter. Is this because the total number of
> primaries is increased in the 2nd case (5 M prim/run *10 runs = 50 M
> primaries)?

As I told you in my previous email, it is the TOTAL number of primaries which

matters, as you can see. FOR THE SAME TOTAL NUMBER, the number of primaries per

run does NOT matter (with the two limitations I described in my previous email).

Here the total number is not the same: it is larger by a factor 5/2, so

you can expect an error about 1.6 times smaller (1.58 = sqrt(5/2)). You find a

decrease considerably larger (a factor of about 3.5): this is due to the fact that 28% is NOT AN ACCEPTABLE

ERROR in Monte Carlo. Since we are giving some publicity here to our competitor

MCNP, I remind you about the little table which was in the MCNP manual some

years ago (now modified in new versions of the manual to be a bit less harsh :-)

Error         Quality of the estimate

50 to 100%    Garbage
20 to 50%     Factor of a few (notice that your 28% series gives an average
              differing by a factor 1.6 from that of the 7.9% series)
10 to 20%     Questionable
< 10%         Generally reliable
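As a cross-check, the 28% quoted above can be reproduced from the ten run results listed in Mina's email below, assuming the batch error is the usual standard error of the mean of the batch results:

```python
import math

# Ten run results (dose rate in pSv/s) quoted in Mina's email below.
runs = [766.4, 986.4, 1131.9, 1785.8, 4077.8,
        224.2, 1143.6, 35.4, 1511.3, 939.9]

n = len(runs)
mean = sum(runs) / n                                  # ~1260.3
var = sum((x - mean) ** 2 for x in runs) / (n - 1)    # sample variance of the batches
std_err = math.sqrt(var / n)                          # standard error of the mean
rel_err = std_err / mean                              # ~0.28, i.e. the quoted 28%
```

Note how a single outlier batch (the 4077.8 run) dominates the variance, which is exactly why a 28% error estimate is itself poorly determined.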

> And that if I had run 5 M primaries/run but only 4 runs, I would have seen a
> similar error as in case 1?

As Francesco has pointed out, 4 runs are just borderline. Let's say rather that

in principle you would get the same error if you had run 5 runs of 4M each.

But remember that 28% is too large, so you better compare 10 runs of 5M

(with a 7.9% error) with 5 runs of 10M. I am sure that not only the two averages

but also the errors will be much closer.
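The 1/sqrt(N) scaling above can be written out with the numbers from the two test cases (a minimal sketch):

```python
import math

# Going from 20 M total primaries (10 runs x 2 M) to 50 M (10 runs x 5 M),
# the relative error is expected to shrink by sqrt(50/20) ~ 1.58.
err_20M = 0.28                                     # observed relative error, 20 M total
expected_err_50M = err_20M / math.sqrt(50 / 20)    # ~0.18
observed_err_50M = 0.079                           # the observed drop is larger, because
                                                   # a 28% error estimate is itself
                                                   # poorly determined
```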

Now I would like to comment on something Mary said, probably due to a

statement of mine expressed too unclearly.

I said:

> Detector scoring should be >> 0 in each batch

Mary said:

> > The same detector score may be legitimately quoted in GeV or meV,
> > differing by 12 orders of magnitude. Would 1e-12 GeV fail as score>>0?
> > Would 1 meV qualify as score>>0? Unlike c>>v, x>>0 is difficult to gauge,
> > I get division by zero, I'm trying to make the point that stats depend on
> > the number of samples, not the magnitude of score. I can have an energy
> > deposition of 1e-20 (arbitrary unit this time), if I observe similar
> > events 1e20 times, I am certainly far more confident in my score than if I
> > have an energy deposition of 20 (same unit) but I only observe one single
> > event.

When I said "Detector scoring should be >> 0 in each batch" I actually meant

"In no batch should there be zero or only a few histories contributing to the score".

That has nothing to do with the units used. "Zero or few histories" is an

integer count, which does not change with units. Mary is right that "stats

depend on the number of samples, not the magnitude of score", and I was not

referring to the magnitude of score, but to the number of primaries leading to a

score.

Alberto

On Mon, 13 Aug 2012, Mina Nozar wrote:

> Dear Alberto,
>
> Hello and thank you for your response.
>
> I am still trying to understand so please bear with me.
>
> As I understand, the uncertainty we see in any given quantity in FLUKA is just the error in the mean of the quantity,
> where the mean is the average of the quantity as calculated from the result of the batches. So, the higher the
> fluctuations in a given quantity from run to run, the higher the variance and the error in the mean.
>
> I first came across this issue when doing Stefan's two-step process. From a study I did, I found out that the error in
> the dose rate one week after EOB was on the order of 28% for 2 M primaries per run, 10 runs (20 M primaries total). I
> looked at the results of the individual runs and saw large fluctuations from run to run. This is what I saw for 10 runs
> (dose rate in pSv/s):
>
> (766.4, 986.4, 1131.9, 1785.8, 4077.8, 224.2, 1143.6, 35.4, 1511.3, 939.9)
>
> So the result from combining all ten runs was 1260.3 +/- 28%.
>
> When I repeated the run for 5 M primaries/run, I saw smaller fluctuations:
>
> (907.8, 986.4, 749.2, 937.5, 587.2, 599.1, 726.8, 421.9, 985.3, 754.3)
>
> Average: 751.5 +/- 7.9%
>
> So obviously an improvement, which made me conclude that number of primaries
> per run does matter. Is this because the total number of primaries is
> increased in the 2nd case (5 M prim/run *10 runs = 50 M primaries)? And that
> if I had run 5 M primaries/run but only 4 runs, I would have seen a similar
> error as in case 1?
>
> Thank you in advance,
> Mina

Received on Tue Aug 14 2012 - 22:31:18 CEST
