Re: Statistical errors in residual dose rates and activities and number of primaries/batches from Mary Chin on 2012-08-14 (fluka discuss archive)

From: Mary Chin <mary.chin_at_cern.ch>
Date: Tue, 14 Aug 2012 13:23:55 +0200

Dear Mina,

If we compare three cases:
#1 START WHAT(1)= 1, rfluka -M1000, score = a_1 +/- b_1
#2 START WHAT(1)=1000, rfluka -M1, score = a_2 +/- b_2
#3 START WHAT(1)= 100, rfluka -M10, score = a_3 +/- b_3
#4 START WHAT(1)= 20, rfluka -M50, score = a_4 +/- b_4

Typically we do something like #3 or #4, for the following reasons:
1. it gives us a way to estimate the fluctuation (to be defined later,
below);
2. we secure data already gracefully written to the disk in the event of
crashes (whether due to power cut or programmatical problems or whatever);
3. maybe we wish do variance-of-variance analysis, which is useful for
diagnosing undersampling (eg poor penetration in shielding problems). In
this case we keep the sample size the same, as opposed to previous
suggestion:
4. we do not do event-by-event scoring, where individual values prior to
averaging cannot be recovered after the run.

We don't do #1 for production runs, fundamentally because this is not the
way Monte Carlo codes are designed to work. We also do not want to end up
spending CPU resources on initialisation and finishing, rather than on the
real work of tracking particles.

If we do #2, then we have no way of estimating b_2.

Assuming identical random number sequence, we get
a_1 = a_2 = a_3 = a_4
i.e. the score will be identical for all three cases, but

b_1 != b_3 != b_4 for the following reasons:
b_1 is the standard deviation, we draw our samples from the population;
b_3 is the standard error, we draw our samples from batches (grouped
samples of the population);
b_4 is also a standard error.

b_3 versus b_4
==============
The difference depends on how well the batch represents the population. I
do not know of 5 as the magic number; I think 5 is not adequate for
variance-of-variance analysis; maybe Alberto can show us the derivation.
The trend depends on the sampling and the population; it is
problem-specific according to both the batch size and the number of
batches.
> The batches need to be at least 4 or 5 (better 5)

a versus b
==========
'b' depends not on the magnitude of 'a' (except for the specific case of
counting). It depends on the number of samples we have.

> Detector scoring should be >> 0 in each batch
*The same detector score may be legitimately quoted in GeV or meV,
differing by 12 orders of magnitude. Would 1e-12 GeV fail as score>>0?
Would 1 meV qualify as score>>0? Unlike c>>v, x>>0 is difficult to gauge,
I get division by zero, I'm trying to make the point that stats depend on
the number of samples, not the magnitude of score. I can have an energy
deposition of 1e-20 (arbitrary unit this time), if I observe similar
events 1e20 times, I am certainly more far confident in my score than if I
have an energy deposition of 20 (same unit) but I only observe one single
event.

:) mary

On Mon, 13 Aug 2012, Mina Nozar wrote:

> Dear Alberto,
>
> Hello and thank you for your response.
>
> I am still trying to understand so please bear with me.
>
> As I understand, the uncertainty we see in any given quantity in FLUKA is just the error in the mean of the quantity,
> where the mean is the average of the quantity as calculated from the result of the batches. So, the higher the
> fluctuations in a given quantity from run to run, the higher the variance and the error in the mean.
>
> I first came across this issue when doing Stefan's two-step process. From a study I did, I found out that the error in
> the dose rate, one week after EOB was on the order of 28% for 2 M primaries per run, 10 runs (20 M primaries total). I
> looked at the results of the individual runs and saw large fluctuations from run to run. This is what I saw for 10 runs
> (Dose rate in PSv/s):
>
> (766.4, 986.4, 1131.9, 1785.8, 4077.8, 224.2, 1143.6, 35.4, 1511.3, 939.9)
> So the result from combining all ten runs was 1260.3 +/- 28%.
>
> When I repeated the run for 5 M primaries/run, I saw smaller fluctuations:
> (907.8, 986.4, 749.2, 937.5, 587.2, 599.1, 726.8, 421.9, 985.3, 754.3)
> Average: 751.5 +/- 7.9%
>
> So obviously an improvement, which made me conclude that number of primaries per run does matter. Is this because the
> total number of primaries is increased in the 2nd case (5 M prim/run *10 runs = 50 M primaries)? And that if I had run
> 5 M primaries/run but only 4 runs, I would have seen a similar error as in case 1?
>
>
> Thank you in advance,
> Mina
>
> On 12-08-11 09:33 AM, Alberto Fasso' wrote:
>> Dear Mina,
>>
>> you results are exactly what should be expected.
>> The error of the average is inversely proportional to the square root of
>> the TOTAL number of primaries, independent of how you divide this total
>> number over the batches. The only limitations about the number of of
>> batches and the number of primaries per batch are two:
>> 1) The batches need to be at least 4 or 5 (better 5)
>> 2) Detector scoring should be >> 0 in each batch
>> With these two limitations, the batches don't even need to be all of the same
>> size.
>> For a same total number of primaries, increasing the number of primaries per
>> batch is not particularly useful, unless it is necessary to satisfy the
>> condition 2).
>>
>> Kind regards,
>>
>> Alberto
>>
>>
>> On Fri, 10 Aug 2012, Mina Nozar wrote:
>>
>>> Hello everyone,
>>>
>>> I was recently wondering about how to reduce the statistical fluctuations in residual dose rate calculations and thought
>>> that if I increase number or primaries per run for a total number of primaries, I would see a reduction in statistical
>>> errors but this is not what I see. As I understand it, the values from different batches give the variance of the mean
>>> for a given quantity.
>>>
>>> In the study I had two sets of runs, each totalling 100 M primaries.
>>> First set: 1 M primaries * 100 batches
>>> Second set: 10 M primaries * 10 batches
>>>
>>> Residual dose rate results:
>>>
>>> Time Tot. response (pSv/s)
>>> First set Second set
>>> --------------------------------------------------------------------------
>>> 1h_SOB 7.3209280E+07 +/- 0.2788494 % 7.3209280E+07 +/- 0.1696288 %
>>> 1d_SOB 3.9039216E+08 +/- 0.1965424 % 3.9039216E+08 +/- 0.1532526 %
>>> 3d_SOB 4.5325405E+08 +/- 0.1940250 % 4.5325405E+08 +/- 0.1533875 %
>>> 10d_SOB 4.6606554E+08 +/- 0.1911391 % 4.6606554E+08 +/- 0.1504155 %
>>> 0s_EOB 4.7146954E+08 +/- 0.1893668 % 4.7146954E+08 +/- 0.1492323 %
>>> 1h_EOB 4.0237386E+08 +/- 0.2029038 % 4.0237386E+08 +/- 0.1707704 %
>>> 1d_EOB 7.9427664E+07 +/- 0.3225139 % 7.9427664E+07 +/- 0.3200963 %
>>> 10d_EOB 6036197. +/- 0.3228723 % 6036197. +/- 0.3079922 %
>>> 40d_EOB 190254. +/- 0.2063927 % 2190254. +/- 0.1794188 %
>>> 1y_EOB 358294.0 +/- 0.3270685 % 358294.0 +/- 0.3348824 %
>>> 2y_EOB 214956.4 +/- 0.3732046 % 214956.4 +/- 0.3926361 %
>>> 3y_EOB 147807.1 +/- 0.3735141 % 147807.1 +/- 0.3969630 %
>>> 5y_EOB 71939.66 +/- 0.3682632 % 71939.66 +/- 0.4093317 %
>>>
>>> Why do I not see a reduction in the variance of the mean in the second set? When I look at the dominant activities in
>>> different regions (1y, 3y, and 5y after EOB), I don't see much difference in the errors between each set. So doesn't
>>> seem to be advantageous to run a larger set of primaries per batch.
>>>
>>>
>>> How are the errors in activities determined? Looking at production rates and time evolution of activities, seems as if
>>> the errors are determined from production rates and remain the same for most but not all isotopes (see Lu,172 below).
>>> This might have to do with the grow-in contribution from decay of other isotopes.
>>>
>>> Also, sometimes I see larger errors for smaller activities (on the order of 10^2 Bq) that for larger activities (on the
>>> order of 10^4 Bq). So I don't think the fluctuations in activities depend on the production rates alone.
>>>
>>> Thanks and best wishes,
>>> Mina
>>>
>>> Here, I am listing production rates and activities in a Tantalum target at different times after End of Beam a subset of
>>> isotopes:
>>>
>>> Production rates (nuclei/prim):
>>> Mn,56,25, 9.7495E-22 +/- 66.68 %
>>> Co,60,27, 4.4082E+04 +/- 99.00 %
>>> Se,75,34, 8.1350E+04 +/- 99.00 %
>>> Y, 88,39, 2.6066E+04 +/- 55.28 %
>>> Lu,173,71, 3.6126E+08 +/- 0.24 %
>>> Lu,172,71, 1.8536E+09 +/- 2.110 %
>>> Hf,175,72 1.6301E+10 +/- 0.09%
>>> Hf,172,72, 4.8303E+08 +/- 0.3%
>>>
>>>
>>> Activities (Bq):
>>>
>>> 0 sec after EOB:
>>> Mn,56,25, 9.99e+06 +/- 66.67 %
>>> Co,60,27, 4.47e+04 +/- 99.00 %
>>> Se,75,34, 6.73e+05 +/- 99.00 %
>>> Y, 88,39, 3.00e+06 +/- 55.28 %
>>> Lu,173,71, 3.90e+10 +/- 0.24 %
>>> Lu,172,71, 2.53e+10 +/- 1.46 %
>>> Hf,175,72, 7.21e+11 +/- 0.09 %
>>> Hf,172,72, 1.23e+10 +/- 0.30 %
>>>
>>> 10 days after EOB:
>>> Mn,56,25, 9.75e-22 +/- 66.68 %
>>> Co,60,27, 4.46e+04 +/- 99.00 %
>>> Se,75,34, 6.35e+05 +/- 99.00 %
>>> Y,88,39, 2.81e+06 +/- 55.28 %
>>> Lu,173,71, 4.11e+10 +/- 0.24 %
>>> Lu,172,71, 1.22e+10 +/- 1.07 %
>>> Hf,175,72, 6.73e+11 +/- 0.09 %
>>> Hf,172,72, 1.22e+10 +/- 0.30 %
>>>
>>> 40 days after EOB:
>>> Co,60,27, 4.41e+04 +/- 99.00 %
>>> Se,75,34, 5.34e+05 +/- 99.00 %
>>> Y,88,39, 2.31e+06 +/- 55.28 %
>>> Lu,173,71, 3.94e+10 +/- 0.24 %
>>> Lu,172,71, 5.15e+09 +/- 0.29 %
>>> Hf,175,72, 5.00e+11 +/- 0.09 %
>>> Hf,172,72, 1.19e+10 +/- 0.30 %
>>>
>>> 1 year after EOB:
>>> Co,60,27, 3.92e+04 +/- 99.00 %
>>> Se,75,34, 8.14e+04 +/- 99.00 %
>>> Y,88,39, 2.79e+05 +/- 55.28 %
>>> Lu,173,71, 2.51e+10 +/- 0.24 %
>>> Lu,172,71, 3.47e+09 +/- 0.30 %
>>> Hf,175,72, 2.00e+10 +/- 0.09 %
>>> Hf,172,72, 8.52e+09 +/- 0.30 %
>>>
>>> 3 years after EOB:
>>> Co,60,27, 3.01e+04 +/- 99.00 %
>>> Se,75,34, 1.19e+03 +/- 99.00 %
>>> Y,88,39, 2.43e+03 +/- 55.28 %
>>> Lu,173,71, 9.14e+09 +/- 0.24 %
>>> Lu,172,71, 1.65e+09 +/- 0.30 %
>>> Hf,175,72, 1.45e+07 +/- 0.09 %
>>> Hf,172,72, 4.06e+09 +/- 0.30 %
>>>
>>> 5 years after EOB:
>>> Co,60,27, 2.32e+04 +/- 99.00 %
>>> Se,75,34, 1.74e+01 +/- 99.00 %
>>> Y,88,39, 2.11e+01 +/- 55.28 %
>>> Lu,173,71, 3.32e+09 +/- 0.24 %
>>> Lu,172,71, 7.87e+08 +/- 0.30 %
>>> Hf,175,72, 1.05e+04 +/- 0.09 %
>>> Hf,172,72, 1.93e+09 +/- 0.30 %
>>>
>>>
>>>
>>
>
>
>
Received on Tue Aug 14 2012 - 22:31:38 CEST

This archive was generated by hypermail 2.2.0 : Tue Aug 14 2012 - 22:31:39 CEST