RE: RANDOMIZ WHAT(2) field and the effect on rfluka "chained" samples

From: Popescu, Razvan <popescu_at_bnl.gov>
Date: Thu, 17 Jan 2013 22:58:01 +0000

Mario,

Thanks a lot for your answer!

My problem, however, is different -- but in the meantime I've learned more about it, and it now looks like a bug.

Here is the story in more detail:
- When using the RANDOMIZ card I do select a unique seed for each job (to be fancy, I do a formatted read of a chunk from /dev/urandom :-) ; see the sketch after these first bullets) -- each job submitted to the cluster starts with a uniquely generated seed, a different one for each parallel run. That's good and not the issue we faced.
- The trouble occurs in the sequence (chain of "cycles") within an individual job.
- Let's take an example where I start 10 jobs, each running 5 cycles (e.g. qsub -t 1-10 rfluka -N0 -M5 Test.inp)
- Now let's focus on what happened inside one of the 10 jobs, tasked with running 5 cycles...
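For reference, the seed generation goes along these lines (a minimal sketch only; the file names and the cap on the seed value are illustrative, not my exact script):

#!/bin/bash
# Read 4 bytes from /dev/urandom and turn them into a decimal integer.
RAW=$(od -An -N4 -tu4 /dev/urandom | tr -d ' ')
# Keep the seed a modest positive integer (illustrative cap; check the allowed
# range of RANDOMIZ WHAT(2) in the manual for your FLUKA version).
SEED=$(( RAW % 90000000 + 1 ))
# Write a per-job input file carrying the fresh seed on its RANDOMIZ card.
sed "s/^RANDOMIZ.*/RANDOMIZ 1.0 ${SEED}./" Test.inp > Test_${SEED}.inp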

- The pathological observation was that all random seed files "ran...001, ran...002, ... ran...005", except the last one (ran...006), were identical, and the computation results generated by the 5 cycles were identical as well!
- It all looked as if the given random seed was being reset at the beginning of EACH CYCLE, when flukahp parses the input file...
- So that's why I wrote my initial message...
- Seeing how rfluka sets up the random files on units 1 and 2, and how it fetches the default "random.dat" before cycle #1, I assumed that flukahp knows nothing of the run history: it sees the RANDOMIZ card with the external seed and overwrites the file on unit 1 (as it seems to do with the default copy of random.dat before cycle #1). I assumed it does the same in every cycle past #1 and consequently generates 5 identical copies of cycle #1 (which is exactly what I found!) -- hence the conclusion that once you supply a random seed, you cannot chain multiple executions.

- I even changed rfluka to remove the RANDOMIZ card after cycle #1 (and that did what I wanted)!

It turns out that I was wrong and FLUKA is much smarter than I assumed (THANKS, developers!).
As you know, FLUKA overwrites the file on unit 1 at the first pass over the RANDOMIZ card, but in subsequent cycles it is smart enough to recognize the change and doesn't override it again and again! And yet a colleague using my scripts to submit parallel jobs could not reproduce my problem -- WHY?...

BECAUSE apparently I had left TWO RANDOMIZ cards in my input file. (My prototype submit scripts were quick and dirty: they just inserted a random RANDOMIZ card right before the START card, disregarding any existing RANDOMIZ card. I therefore ended up with 2 RANDOMIZ cards and the repetition, while my colleague had a clean input file with just the random RANDOMIZ card inserted by the scripts, and got a clean, non-redundant sequence of 5 statistically independent runs per job!)
I repeated the tests, both running "rfluka -N0 -M5 Test.inp": one with Test.inp containing TWO RANDOMIZ cards, the other with a single RANDOMIZ card...
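(For reference, a safer insertion than my prototype would be something along these lines -- a sketch with illustrative file names and seed value: delete any pre-existing RANDOMIZ card first, then insert exactly one before START, so the input can never end up with two seeds.)

SEED=12491937
# Drop any RANDOMIZ card already present, then insert a single fresh one
# immediately before the START card.
sed '/^RANDOMIZ/d' Test.inp | sed "/^START/i\\
RANDOMIZ 1.0 ${SEED}." > Test_run.inp
# Sanity check: this should print 1.
grep -c '^RANDOMIZ' Test_run.inp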

So the bug seems to be triggered by the presence of TWO RANDOMIZ cards in an input file (FLUKA version 2011.2.12, 32-bit). Here are excerpts of the input and output files for the two tests (one card vs. two cards):

TWO RANDOMIZ CARDS:
$ grep RAND *.inp
RANDOMIZ 1.0 914893.
RANDOMIZ 1.0 12491937.

$ grep RM64 *.out
Test001.out: FLRM64 INITIALIZED 0 0 14893 9 914893
Test001.out: FLRM64 INITIALIZED 0 0 91937 124 12491937
Test002.out: FLRM64 INITIALIZED 0 0 14893 9 914893
Test002.out: FLRM64 INITIALIZED 0 0 91937 124 12491937
Test003.out: FLRM64 INITIALIZED 0 0 14893 9 914893
Test003.out: FLRM64 INITIALIZED 0 0 91937 124 12491937
Test004.out: FLRM64 INITIALIZED 0 0 14893 9 914893
Test004.out: FLRM64 INITIALIZED 0 0 91937 124 12491937
Test005.out: FLRM64 INITIALIZED 0 0 14893 9 914893
Test005.out: FLRM64 INITIALIZED 0 0 91937 124 12491937

Whereas the correct configuration initializes the random number generator differently in each cycle...

ONE RANDOMIZ card:
$ grep RAND *.inp
RANDOMIZ 1.0 2916035.

$ grep RM64 *.out
Test001.out: FLRM64 INITIALIZED 0 0 16035 29 2916035
Test002.out: RM64 INITIALIZED: 16035 29 793273129 0
Test003.out: RM64 INITIALIZED: 16035 29 673011689 1
Test004.out: RM64 INITIALIZED: 16035 29 625750494 2
Test005.out: RM64 INITIALIZED: 16035 29 737036744 3

I'd love to understand what the various parameters of the FLRM64 and RM64 entries mean.

Anyway, the observation may already be irrelevant, as the issue was uncovered in version 2011.2.12 and may well be fixed in the current ...2.17.

I can't test it because I'm still waiting for a fix, or feedback, on a much more damaging issue that I have with all 64-bit versions I've tested (2.16 and 2.17) -- core dump, no run -- reported to the discussion list a while ago but without any replies!

That's where we could really use some help! Can anybody shed some light on that matter?!

Thanks again,
Razvan

-----Original Message-----
From: Santana, Mario [mailto:msantana_at_slac.stanford.edu]
Sent: Thursday, January 17, 2013 12:42 PM
To: Popescu, Razvan; fluka-discuss
Subject: Re: RANDOMIZ WHAT(2) field and the effect on rfluka "chained" samples

Hi Razvan,

Yes, if you want to run parallel jobs you should not have the same RANDOMIZ WHAT(2) value in each of them; otherwise each of the jobs (and each sequence, if you run more than one job in series) will be identical.

I copy below a bash script that you could use to send parallel jobs. This is just an example; there are several ways to do it, and you may choose to make the seed numbers differ more from one another (although the randomization should be fine even if consecutive seeds differ by one unit, as in the example).

You will have to edit the submission line (bsub -q ...) to match the syntax and instructions compatible with your cluster or computer system.

Mario

#!/bin/bash
# auto.sh: build one input file per seed and submit one single-cycle FLUKA job for each.
echo "auto.sh input_file_name(no .inp suffix) first_random last_random"
#
file=`echo $1 | sed 's/\.inp//'`
i=$2
while [ $i -le $3 ]
do
  # Substitute the original seed "1" in the RANDOMIZ card with the current index $i
  # (the i >= 10 branches also consume one leading blank before the 1).
  if [ $i -le 9 ]
  then
   sed "s/\(RANDOMIZ[ ]*1.0[ ]*\)1/\1$i/" $file.inp > $file.$i.inp
  elif [ $i -le 99 ]
  then
   sed "s/\(RANDOMIZ[ ]*1.0[ ]*\) 1/\1$i/" $file.inp > $file.$i.inp
  elif [ $i -le 999 ]
  then
   sed "s/\(RANDOMIZ[ ]*1.0[ ]*\) 1/\1$i/" $file.inp > $file.$i.inp
  elif [ $i -le 9999 ]
  then
   sed "s/\(RANDOMIZ[ ]*1.0[ ]*\) 1/\1$i/" $file.inp > $file.$i.inp
  elif [ $i -le 99999 ]
  then
   sed "s/\(RANDOMIZ[ ]*1.0[ ]*\) 1/\1$i/" $file.inp > $file.$i.inp
  fi
  # Submit a single-cycle run for this seed; adapt the bsub line to your batch system.
  bsub -q xxl rfluka -e ./flukahp -N 0 -M 1 $file.$i
  i=`expr $i + 1`
done
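For example, assuming the script is saved as auto.sh and Test.inp carries a RANDOMIZ card whose seed field is 1 (which is what the sed pattern looks for), the following call would generate Test.1.inp ... Test.10.inp and submit ten single-cycle jobs with seeds 1 through 10:

./auto.sh Test 1 10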

On Jan 14, 2013, at 2:09 PM, Popescu, Razvan wrote:

Hi,

Is there a way to run a multiple-sample calculation, driven for example by something like "rfluka -M5 <inputfile>", but with an externally given random seed (via the RANDOMIZ WHAT(2) field)...?
I'm trying to schedule parallel runs on a computing cluster, dispatching each with a different RANDOMIZ card, but I would like each job to calculate 5-10 samples instead of only one. It appears that the presence of the RANDOMIZ card overrides the random seed with the same value at the beginning of each cycle, nullifying the neat propagation of the random seed from one cycle to the next done by rfluka...

Am I mistaken?
Is there a way to override the default random seed just at the beginning of cycle 1...?

Thanks,
Razvan