Re: Combining data from runs on different machines/architectures

From: Chris Theis <Christian.Theis_at_cern.ch>
Date: Wed, 10 Jun 2009 20:49:46 +0200

Hi Roger,

> ....
> args=-N 0 -M 1
> Queue
>
> args=-N 1 -M 1
> Queue
[SNIP]

I'm afraid that this won't work because FLUKA requires the last random
seed of cycle 1 in order to start cycle 2 of one batch that was started
with one specific seed. You can think of one cycle as a simple subset in
the sequence of particles that are transported. In order to retain the
sequence of random numbers that originate from one specific starting
seed each cycle has to be finished before the next one can start. Thus,
you cannot send several cycles to CONDOR in parallel. However, there is
a way to achieve what you want.
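
To illustrate why the cycles of one batch have to run one after the other, here is a minimal Python sketch. It uses Python's own `random` module as a stand-in, not FLUKA's generator, and `run_cycle` is a hypothetical helper invented for this example. It only shows that a cycle continues the sequence correctly when it starts from the state saved at the end of the previous cycle.

```python
import random

def run_cycle(state, n_histories):
    """Sample n_histories numbers starting from a saved generator state.

    Returns the samples and the generator state at the end of the cycle.
    Stand-in for one cycle; Python's generator is NOT the one FLUKA uses.
    """
    rng = random.Random()
    rng.setstate(state)
    samples = [rng.random() for _ in range(n_histories)]
    return samples, rng.getstate()

# One batch corresponds to one specific starting seed.
start = random.Random(12345).getstate()

# Reference: the whole batch transported in one go.
reference, _ = run_cycle(start, 6)

# The same batch split into two cycles: cycle 2 starts from the
# state that cycle 1 left behind, so the sequence is continued.
cycle1, state_after_1 = run_cycle(start, 3)
cycle2, _ = run_cycle(state_after_1, 3)
chained = cycle1 + cycle2

# A cycle 2 launched in parallel cannot know the end state of cycle 1.
# Starting it from the initial state just repeats cycle 1.
parallel2, _ = run_cycle(start, 3)

print(chained == reference)   # chained cycles reproduce the full sequence
print(parallel2 == cycle1)    # the "parallel" cycle duplicates cycle 1
```

In this sketch the second cycle only yields new numbers once the first one has finished and handed over its state, which is the dependency described above.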

We've had a very gifted student who implemented a framework on top of
CONDOR which automates the submission & parallelization of FLUKA jobs
via a simple but powerful web interface. It allows you to submit FLUKA
jobs to CONDOR and provides status and progress monitoring via webpages.
You can find a presentation of the project at:

http://info-fluka-discussion.web.cern.ch/info-fluka-discussion/talks/Pereira_FlukaCluster_230409.pdf
The basic idea is that you define the number of cycles/CPU & the number
of CPU cores (N) that you would like to use. The framework will then
automatically create N input files with different RANDOMIZE cards and a
subdirectory structure + the respective CONDOR submission files. Then
all the jobs are sent to the cluster and the rest is handled by a
dynamic fair-share policy which we've developed to maximize the
efficiency of the resource usage. In case you're interested in more
details you can take a look at www.cern.ch/coflu

However, you'll probably need a CERN account to access the page. In case
you cannot get access then let me know and I'll see to it that you get
access.
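
The basic idea described above (N input files with different seeds, one subdirectory each, plus a submission file) could be sketched roughly as follows. This is not the code of the framework mentioned above, only an illustration under several assumptions: the seed card is written as `RANDOMIZ` in fixed 10-column fields with the seed in the second numeric field, the wrapper script `run_fluka.sh`, the file names `input.inp` and `fluka.sub`, and the helper names are all hypothetical, and the `-N`/`-M` arguments simply follow the pattern quoted earlier in this thread. Please check the card layout against the FLUKA manual before using anything like this.

```python
from pathlib import Path

def randomize_card(seed):
    """Return a seed card in an ASSUMED fixed format (10-column fields)."""
    return f"{'RANDOMIZ':<10}{1.0:>10.1f}{float(seed):>10.1f}"

def make_jobs(template, n_cores, cycles_per_cpu, workdir, base_seed=1000):
    """Create one subdirectory and input file per core, plus a submit file."""
    lines = template.splitlines()
    if not any(line.startswith("RANDOMIZ") for line in lines):
        raise ValueError("template has no RANDOMIZ card to replace")

    workdir = Path(workdir)
    submit = ["executable = run_fluka.sh"]  # hypothetical wrapper script
    jobdirs = []
    for i in range(n_cores):
        jobdir = workdir / f"job_{i:03d}"
        jobdir.mkdir(parents=True, exist_ok=True)
        jobdirs.append(jobdir)

        # Each job gets its own seed, so each job is an independent batch.
        card = randomize_card(base_seed + i)
        text = "\n".join(
            card if line.startswith("RANDOMIZ") else line for line in lines
        )
        (jobdir / "input.inp").write_text(text + "\n")

        # One queue statement per job; the cycles of a job stay sequential.
        submit += [
            f"initialdir = {jobdir.name}",
            f"arguments = -N 0 -M {cycles_per_cpu} input",
            "queue",
        ]

    (workdir / "fluka.sub").write_text("\n".join(submit) + "\n")
    return jobdirs
```

Calling `make_jobs(template_text, 8, 5, "batch01")` would then create `batch01/job_000` to `batch01/job_007`, each holding an input file with its own seed, and a single `batch01/fluka.sub`. The parallelism comes from the independent seeds, while the cycles within each job still run one after the other.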

> "It is MANDATORY to use only seeds output information as written by the
> program in earlier runs ON THE SAME COMPUTER PLATFORM. Otherwise the
> randomness of the number sequence would not be guaranteed."
>
> What do you think about that?

Strictly speaking you are not using different platforms. Even though you
have 64-bit chips in your cluster they are switched to compatibility or
legacy mode to run FLUKA, so they all appear to be (more or less)
identical x86 CPUs. Things would become tricky if you had a heterogeneous
cluster that included different CPU types, for example RISC
processors.

However, my remark regarding slight differences in Intel & AMD CPUs
still stands. We've already seen divergences of the random number
sequence when starting the identical job on an Intel and on an AMD CPU.
It seems that floating point rounding is under some circumstances
handled differently even though strict double precision mode is used by
FLUKA. However, this should be no problem at all under normal
circumstances. Only if you really have to fully reproduce a random
number sequence do you have to make sure that you re-run the job on
the same CPU type.

> For the statistical evaluation is there a difference if you calculate
> many runs with different initial seeds with just one cycle or calculate
> less runs but each with several cycles?

Yes and no - a detailed answer to this question is actually not so
trivial and would venture off into the quite complex field of random
number generation for which this discussion list might not be the right
place. In practice you don't need to worry about this due to the quality
of FLUKA's random number generator.

Hope that helps
Chris
Received on Thu Jun 11 2009 - 09:13:30 CEST
