RE: AW: Combining data from runs on different machines/architectures

From: Chris Theis <Christian.Theis_at_cern.ch>
Date: Mon, 22 Mar 2010 16:22:25 +0100

Hi Roger,

instead of running, for example, 5 cycles of 2,000 particles on 3 machines,
you can also run 1 cycle of 10,000 particles on 3 machines. The average
result will be the same as if you had run 30,000 particles in one go, and as
long as you have several result files you can always do your statistical
analysis to get the uncertainty.
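
For illustration, here is a minimal sketch (plain Python, not FLUKA-specific)
of how such per-run results can be combined into a mean and an uncertainty.
The numerical values are placeholders; in practice they come from your own
result files:

import math

# one scalar score per independent run/cycle, e.g. energy deposition per
# primary -- placeholder values standing in for what you read from the files
run_scores = [1.02e-3, 0.98e-3, 1.05e-3]

n = len(run_scores)
mean = sum(run_scores) / n
# unbiased sample variance over the independent results
variance = sum((x - mean) ** 2 for x in run_scores) / (n - 1)
std_error = math.sqrt(variance / n)   # standard error of the mean

print(f"mean = {mean:.4e} +/- {std_error:.4e} (1 sigma, {n} results)")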

Cheers
Chris

> -----Original Message-----
> From: Roger Hälg [mailto:rhaelg_at_phys.ethz.ch]
> Sent: 22 March 2010 16:14
> To: Chris Theis
> Cc: fluka-discuss_at_fluka.org
> Subject: RE: AW: Combining data from runs on different
> machines/architectures
>
> Hi Chris
>
> Thank you for your explanation. I always start my parallel runs with
> different initial random seeds.
> My question is now whether I can get the statistical advantage of several
> cycles in one run also by using several runs in parallel with different
> initial random seeds but only one cycle each. So let's say 100 runs in
> parallel with different initial random seeds, 1 cycle and 10,000 primaries
> each. Or is it still necessary to have the runs with, let's say, 5 cycles
> and 2,000 primaries each?
> With only 1 cycle I could save quite an amount of disk space, independent
> of the number of primaries per run.
>
> Greetz
>
> Roger
>
> On Fri, 2010-03-12 at 20:29 +0100, Chris Theis wrote:
> > Hi Roger,
> > I take the liberty of sending a copy of the mail/answer to the
> > discussion list in case your question is also of interest to other
> > users.
> >
> > > My new simulations take more time per primary, so I have to send jobs
> > > with fewer primaries to the cluster in order not to upset the other
> > > users. This leads to some disk space problems, as every cycle uses the
> > > same amount of disk space, more or less independent of the number of
> > > primaries. So I am asking myself whether to run, for instance, 1 run
> > > with 1 cycle of 500,000 primaries or 1 run with 5 cycles of 100,000
> > > primaries each. I think the second version would be the better one,
> > > but I don't understand the difference.
> > >
> > > Could you tell me a little bit more about cycles and runs or give me a
> > > link to resources? Which version would you prefer?
> >
> > the terminology and the differences are indeed a bit tricky. Let's
> > imagine the following scenarios:
> >
> > - You launch one simulation of 1 run with 1 cycle of 500,000 particles.
> > FLUKA will automatically take the default random seed (unless you
> > specify a seed manually!) and will start the transport. After 100,000
> > particles the random seed will have a certain value, which we call
> > N100K; after 200,000 it will be N200K, etc. So at the end of the one
> > cycle it will have the value N500K and you will have one final result
> > for your scoring, which could be, for example, the energy deposition in
> > one region.
> >
> > - Now we launch one simulation of 1 run which has 5 cycles of 100,000
> > particles per cycle. Unless specified otherwise, FLUKA will take the
> > default seed, which is identical to the scenario above. After the first
> > cycle of 100,000 is completed, FLUKA writes out the current seed, which
> > will and should be identical to N100K (there might be divergences, but
> > this is a technical problem which we ignore and should normally not
> > happen). For the start of the second cycle FLUKA will read in the last
> > seed (N100K) and start from it. It will do this for each cycle, and at
> > the end the last random seed will be equivalent to N500K. This means
> > that the sequence of random numbers in these two scenarios has been
> > identical and thus also the final result averaged over the 5 cycles
> > will be identical. The only difference to the first scenario is that
> > now you can do some statistics, because for the final result you need
> > to average over the 5 cycles, and this means you also obtain the
> > standard deviation, which of course is not possible if you only have 1
> > result. As FLUKA does not provide any statistical information on the
> > evolution of the results during the simulation via second or third
> > order statistical moments, you can only obtain an idea about the
> > significance of the result if you run several cycles and calculate at
> > least the deviation from the mean. The final average result will of
> > course not change in either case, but the second case is preferable as
> > you obtain more valuable information.
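
The seed hand-over between cycles described above can be illustrated with a
small sketch; it uses Python's standard random generator purely as a
stand-in for FLUKA's, so only the principle carries over, not the actual
implementation:

import random

# one run of a single cycle with 500 000 pseudo-random draws
rng = random.Random(42)
single_cycle = [rng.random() for _ in range(500_000)]

# five cycles of 100 000 draws each; every cycle starts a fresh generator
# from the state written out at the end of the previous cycle (the analogue
# of FLUKA reading in the last seed file)
state = random.Random(42).getstate()
chained = []
for _cycle in range(5):
    rng = random.Random()
    rng.setstate(state)        # "read in the last seed"
    chained.extend(rng.random() for _ in range(100_000))
    state = rng.getstate()     # "write out the current seed"

assert chained == single_cycle   # identical random sequence in both cases

The assertion holds because chaining the saved state reproduces exactly the
same stream of random numbers as one uninterrupted cycle.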
> >
> > - Let's imagine a third scenario: we launch 5 runs of 1 cycle of
> > 100,000 particles on 5 different machines. In the end we will also have
> > 500,000 particles in total and should get the same result as in the
> > first two cases. Actually, this is a common misconception, because by
> > default FLUKA will use the default seeds for each run, which means that
> > we will have 5 runs with identical sequences of random numbers. This
> > means that the results of all 5 runs will be identical and thus you
> > have the perfect simulation, which will have a standard error of 0% :-)
> >
> > Therefore, if you launch several runs in parallel on different machines,
> > you have to be careful to manually specify a different random seed in
> > each input file for each machine.
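
A small sketch of how such a set of input files could be prepared: the
template name, the output naming and the layout of the seed card are
assumptions, so please check the exact syntax of the RANDOMIZ card against
the FLUKA manual before relying on anything like this:

from pathlib import Path

# assumed to contain exactly one line starting with "RANDOMIZ"
template = Path("template.inp").read_text().splitlines()

for i in range(1, 101):            # e.g. 100 parallel single-cycle runs
    seed = 1000 + i                # any distinct integer per run
    lines = []
    for line in template:
        if line.startswith("RANDOMIZ"):
            # WHAT(1) = 1.0 (logical unit), WHAT(2) = seed -- assumed
            # fixed-format layout, to be verified against the manual
            line = f"RANDOMIZ  {1.0:>10}{float(seed):>10}"
        lines.append(line)
    Path(f"run{i:03d}.inp").write_text("\n".join(lines) + "\n")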
> >
> > In general, with MC simulations one should also be careful how to
> > choose seeds in order to avoid nasty problems like masked
> > cross-correlation, non-decaying auto-correlation, etc. But in the case
> > of FLUKA's random number generator, the inventors Marsaglia & Tsang
> > already paid attention to this problem and thus you should not worry
> > about it.
> >
> > Cheers
> > Chris
> >
> > ________________________________
> >
> > From: Roger Hälg [mailto:rhaelg_at_phys.ethz.ch]
> > Sent: Fri 12/03/2010 19:14
> > To: Chris Theis
> > Subject: Re: AW: Combining data from runs on different
> > machines/architectures
> >
> >
> > My new simulations take more time per primary, so I have to send jobs
> > with fewer primaries to the cluster in order not to upset the other
> > users. This leads to some disk space problems, as every cycle uses the
> > same amount of disk space, more or less independent of the number of
> > primaries. So I am asking myself whether to run, for instance, 1 run
> > with 1 cycle of 500,000 primaries or 1 run with 5 cycles of 100,000
> > primaries each. I think the second version would be the better one, but
> > I don't understand the difference.
> >
> > Could you tell me a little bit more about cycles and runs or give me a
> > link to resources? Which version would you prefer?
> >
> >
> >
> >
>
Received on Mon Mar 22 2010 - 17:07:20 CET

This archive was generated by hypermail 2.2.0 : Mon Mar 22 2010 - 18:04:21 CET