From: Chris Theis <Christian.Theis_at_cern.ch>

Date: Fri, 12 Mar 2010 20:29:36 +0100

Hi Roger,

I take the liberty of sending a copy of the mail/answer to the

discussion list in case your question is also interesting for other

users.

> My new simulations take more time per primary, so I have to send jobs
> with fewer primaries to the cluster in order not to upset the other
> users. This leads to some disk space problems, as every cycle uses
> roughly the same amount of disk space, independent of the number of
> primaries. So I am asking myself whether to run, for instance, 1 run
> with 1 cycle of 500'000 primaries or 1 run with 5 cycles of 100'000
> primaries each. I think the second version would be the better one, but
> I don't understand the difference.
>
> Could you tell me a little bit more about cycles and runs, or give me a
> link to resources? Which version would you prefer?

The terminology and the differences are indeed a bit tricky. Let's
imagine the following scenarios:

- You launch one simulation of 1 run with 1 cycle of 500.000 particles.
FLUKA will automatically take the first default random seed (unless you
specify a seed manually!) and will start the transport. After 100.000
particles the random seed will have a certain value, which we will call
N100K; after 200.000 particles it will be N200K, etc. So at the end of
the single cycle it will have the value N500K, and you will have one
final result for your scoring, which could be, for example, the energy
deposition in one region.

- Now we launch one simulation of 1 run which has 5 cycles of 100.000
particles per cycle. Unless specified otherwise, FLUKA will take the
default seed, which is identical to the scenario above. After the first
cycle of 100.000 particles is completed, FLUKA writes out the current
seed, which will (and should) be identical to N100K (there might be
divergences, but this is a technical problem which we ignore and which
should normally not happen). For the start of the second cycle FLUKA
will read in the last seed (N100K) and continue from it. It does this
for each cycle, and at the end the last random seed will be equivalent
to N500K. This means that the sequence of random numbers in these two
scenarios has been identical, and thus the final result averaged over
the 5 cycles will be identical as well. The only difference to the
first scenario is that you can now do some statistics, because for the
final result you need to average over the 5 cycles, and this means you
also obtain the standard deviation, which is of course not possible if
you only have 1 result.
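The equivalence of the two scenarios can be sketched with an ordinary
pseudo-random generator. This is a hypothetical illustration, not FLUKA
itself: `DEFAULT_SEED` is a placeholder, and Python's `getstate`/`setstate`
stands in for FLUKA writing the seed file after a cycle and reading it
back at the start of the next one.

```python
import random

DEFAULT_SEED = 1234  # placeholder for FLUKA's default seed

# Scenario 1: one single cycle of 500'000 random numbers.
rng = random.Random(DEFAULT_SEED)
one_cycle = [rng.random() for _ in range(500_000)]

# Scenario 2: five cycles of 100'000, each resuming from the
# generator state saved at the end of the previous cycle.
saved_seed = random.Random(DEFAULT_SEED).getstate()
five_cycles = []
for _ in range(5):
    cycle_rng = random.Random()
    cycle_rng.setstate(saved_seed)     # cycle reads in the last seed
    five_cycles.extend(cycle_rng.random() for _ in range(100_000))
    saved_seed = cycle_rng.getstate()  # "seed file" written after the cycle

# The concatenated sequences are identical, so any physics result
# averaged over them is identical too.
print(one_cycle == five_cycles)
```

Since the random number stream is the same, only the bookkeeping (one
result versus five partial results) differs between the two scenarios.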

As FLUKA does not provide any statistical information on the evolution
of the results during the simulation via second- or third-order
statistical moments, you can only get an idea of the significance of
the result if you run several cycles and calculate at least the
deviation from the mean. The final average result will of course be the
same in both cases, but the second case is preferable as you obtain
more valuable information.
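The statistics you gain from several cycles amount to averaging the
per-cycle results and computing the standard error of that mean. A
minimal sketch, with made-up per-cycle scores (e.g. energy deposition
in a region) purely for illustration:

```python
import math

# Hypothetical per-cycle scores; the numbers are invented.
cycle_results = [4.02, 3.97, 4.10, 3.95, 4.01]

n = len(cycle_results)
mean = sum(cycle_results) / n

# Sample variance of the cycle results (n - 1 in the denominator),
# then the standard error of the mean over the cycles.
variance = sum((x - mean) ** 2 for x in cycle_results) / (n - 1)
std_error = math.sqrt(variance / n)

print(f"mean = {mean:.3f} +/- {std_error:.3f}")
```

With a single 500'000-particle cycle you would get the same `mean` but
no error estimate at all, which is exactly the point of splitting the
run into cycles.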

- Let's imagine a third scenario: we launch 5 runs of 1 cycle of
100.000 particles on 5 different machines. In the end we will also have
500.000 particles in total and should get the same result as in the
first two cases. Actually, this is a common misconception, because by
default FLUKA will use the default seeds for each run, which means that
we will have 5 runs with identical random number sequences. This means
that the results of all 5 runs will be identical and thus you have the
perfect simulation with a standard error of 0% :-)

Therefore, if you launch several runs in parallel on different
machines, you have to be careful to manually specify a different random
seed in the input file for each machine.
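The pitfall and its fix can be sketched in the same hypothetical style
as before (again not FLUKA itself; `DEFAULT_SEED`, the `run` helper and
its averaged "score" are all invented for the illustration):

```python
import random

DEFAULT_SEED = 1234  # placeholder for FLUKA's default seed

def run(seed, n=100_000):
    """One 'parallel run': average of n pseudo-random draws as a stand-in score."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

# The misconception: five machines, all on the default seed.
same_seed = [run(DEFAULT_SEED) for _ in range(5)]
print(len(set(same_seed)))         # 1 -- all five runs are clones

# The fix: a distinct seed in each machine's input file.
different_seeds = [run(DEFAULT_SEED + machine) for machine in range(5)]
print(len(set(different_seeds)))   # 5 -- statistically independent runs
```

With identical seeds the five "results" agree to the last digit, which
is why the apparent standard error comes out as exactly zero.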

In general with MC simulations one should also be careful about how
seeds are chosen, to avoid nasty problems like masked
cross-correlation, non-decaying auto-correlation, etc. But in the case
of FLUKA's random number generator the inventors, Marsaglia & Tsang,
already paid attention to this problem, and thus you should not worry
about it.

Cheers

Chris

________________________________

From: Roger Hälg [mailto:rhaelg_at_phys.ethz.ch]

Sent: Fri 12/03/2010 19:14

To: Chris Theis

Subject: Re: AW: Combining data from runs on different machines/architectures

My new simulations take more time per primary, so I have to send jobs
with fewer primaries to the cluster in order not to upset the other
users. This leads to some disk space problems, as every cycle uses
roughly the same amount of disk space, independent of the number of
primaries. So I am asking myself whether to run, for instance, 1 run
with 1 cycle of 500'000 primaries or 1 run with 5 cycles of 100'000
primaries each. I think the second version would be the better one, but
I don't understand the difference.

Could you tell me a little bit more about cycles and runs, or give me a
link to resources? Which version would you prefer?

Received on Sun Mar 14 2010 - 21:57:24 CET

This archive was generated by hypermail 2.2.0 : Sun Mar 14 2010 - 21:57:29 CET