RE: AW: Combining data from runs on different machines/architectures

From: Roger Hälg <rhaelg_at_phys.ethz.ch>
Date: Mon, 22 Mar 2010 16:23:17 +0100

Hi Chris

Thank you for your explanation. I always start my parallel runs with
different initial random seeds.
My question now is whether I can get the statistical advantage of
several cycles in one run also by using several parallel runs with
different initial random seeds but only one cycle each. So let's say
100 parallel runs with different initial random seeds, 1 cycle and
10.000 primaries each. Or is it still necessary to have runs with,
let's say, 5 cycles and 2.000 primaries each?
With only 1 cycle I could save quite an amount of disk space,
independent of the number of primaries per run.
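
Just to illustrate what I mean, here is a toy Python sketch (purely
hypothetical numbers, nothing from an actual FLUKA scoring) of the
statistics in the two setups, where each batch has its own
independent seed:

    # Toy illustration: the standard error of the mean is estimated
    # the same way from independent batches, whether the batches are
    # cycles or separate runs with different initial random seeds.
    import random
    import statistics

    random.seed(42)

    def batch_mean(n_primaries):
        # stand-in for the scoring result of one cycle or one run
        return statistics.fmean(random.gauss(1.0, 0.3)
                                for _ in range(n_primaries))

    # 100 runs x 1 cycle x 10.000 primaries = 1.000.000 primaries
    one_cycle = [batch_mean(10_000) for _ in range(100)]
    # 100 runs x 5 cycles x 2.000 primaries = 1.000.000 primaries
    five_cycles = [batch_mean(2_000) for _ in range(100 * 5)]

    for label, batches in (("100x1x10k", one_cycle),
                           ("100x5x2k", five_cycles)):
        mean = statistics.fmean(batches)
        sem = statistics.stdev(batches) / len(batches) ** 0.5
        print(f"{label}: mean = {mean:.4f} +/- {sem:.4f}")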

Greetz

Roger

On Fri, 2010-03-12 at 20:29 +0100, Chris Theis wrote:
> Hi Roger,
> I take the liberty of sending a copy of the mail/answer to the
> discussion list, in case your question is also of interest to other
> users.
>
> > My new simulations take more time per primary, so I have to send jobs
> > with fewer primaries to the cluster in order not to upset the other
> > users. This leads to some disk space problems, as every cycle uses
> > roughly the same amount of disk space, independent of the number of
> > primaries. So I am asking myself whether to run, for instance, 1 run
> > with 1 cycle of 500'000 primaries or 1 run with 5 cycles of 100'000
> > primaries each. I think the second version would be the better one,
> > but I don't understand the difference.
>
> > Could you tell me a little bit more about cycles and runs, or give me
> > a link to some resources? Which version would you prefer?
>
> The terminology and the differences are indeed a bit tricky. Let's
> imagine the following scenarios:
>
> - You launch one simulation of 1 run with 1 cycle of 500.000 particles.
> FLUKA will automatically take the first default random seed (unless
> you specify a seed manually!) and will start the transport. After
> 100.000 particles the random seed will have a certain value, which we
> call N100K; after 200.000 it will be N200K, etc. So at the end of the
> one cycle it will have the value N500K, and you will have one final
> result for your scoring, which could be, for example, the energy
> deposition in one region.
>
> - Now we launch one simulation of 1 run which has 5 cycles of 100.000
> particles per cycle. Unless specified otherwise, FLUKA will take the
> default seed, which is identical to the scenario above. After the
> first cycle of 100.000 particles is completed, FLUKA writes out the
> current seed, which will (and should) be identical to N100K (there
> might be divergences, but this is a technical problem which should
> normally not happen, so we ignore it here). For the start of the
> second cycle FLUKA will read in the last seed (N100K) and start from
> it. It will do this for each cycle, and at the end the last random
> seed will be equivalent to N500K. This means that the sequence of
> random numbers in these two scenarios has been identical and thus the
> final result, averaged over the 5 cycles, will also be identical. The
> only difference to the first scenario is that now you can do some
> statistics, because for the final result you need to average over the
> 5 cycles, and this means you also obtain the standard deviation, which
> of course is not possible if you only have 1 result. As FLUKA does not
> provide any statistical information on the evolution of the results
> during the simulation via second- or third-order statistical moments,
> you can only get an idea of the significance of the result if you run
> several cycles and calculate at least the deviation from the mean. The
> final average result will of course be the same in both cases, but the
> second case is preferable, as you obtain more valuable information.
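>
> To make this concrete, here is a little Python sketch (the cycle
> values are made up, and this is just the kind of average you would
> compute by hand over the cycle results, not what FLUKA does
> internally):
>
>     # Average and standard deviation of the mean over 5 cycles --
>     # which you cannot get from a single 500.000-particle cycle.
>     import statistics
>
>     # e.g. energy deposition in one region, one value per cycle
>     cycle_results = [1.02, 0.97, 1.01, 0.99, 1.03]  # hypothetical
>
>     mean = statistics.fmean(cycle_results)
>     sdom = statistics.stdev(cycle_results) / len(cycle_results) ** 0.5
>     print(f"mean = {mean:.3f} +/- {sdom:.3f}")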
>
> - Let's imagine a third scenario: we launch 5 runs of 1 cycle of
> 100.000 particles on 5 different machines. In the end we will also
> have 500.000 particles in total and should get the same result as in
> the first two cases. Actually, this is a common misconception, because
> by default FLUKA will use the default seed for each run, which means
> that we will have 5 runs with identical random number sequences. This
> means that the results of all 5 runs will be identical, and thus you
> have the perfect simulation with a standard error of 0% :-)
>
> Therefore, if you launch several runs in parallel on different
> machines, you have to be careful to manually specify a different
> random seed in each input file.
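>
> As a sketch of how one might automate this (the template file name
> and the placeholder line are of course made up, and I assume here
> that the seed goes into WHAT(2) of the RANDOMIZE card):
>
>     # Hypothetical helper: write N copies of a template input file,
>     # each with its own seed on the RANDOMIZE card.
>     N_RUNS = 5
>     template = open("template.inp").read()
>
>     for i in range(N_RUNS):
>         seed = 1234.0 + i  # any distinct value per run
>         # fixed-format card: keyword in 10 columns, WHATs in 10 each
>         card = f"{'RANDOMIZE':<10}{1.0:>10}{seed:>10}\n"
>         with open(f"run{i:02d}.inp", "w") as out:
>             out.write(template.replace("RANDOMIZE_PLACEHOLDER\n", card))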
>
> In general, with MC simulations one should also be careful about how
> seeds are chosen, to avoid nasty problems like masked
> cross-correlations, non-decaying auto-correlations, etc. But in the
> case of FLUKA's random number generator the authors, Marsaglia &
> Tsang, already paid attention to this problem, and thus you should not
> worry about it.
>
> Cheers
> Chris
>
Received on Mon Mar 22 2010 - 17:04:12 CET
