RE: fluka job crashing

From: Sudeshna Banerjee <Sudeshna.Banerjee_at_cern.ch>
Date: Thu, 4 Apr 2013 10:12:21 +0000

This could be my problem too. However, it has happened every time I have submitted the batch job, and always after several hours of running. If I try with a simplified geometry, the job completes properly and quickly.

That is why I wanted to know whether it is possible to run only a few events (a few collisions in my case), so that the job with the full CMS geometry has a chance to finish soon.

Sudeshna Banerjee
________________________________________
From: Alfredo Ferrari [alfredo.ferrari_at_cern.ch]
Sent: 04 April 2013 11:54
To: Sudeshna Banerjee
Cc: fluka-discuss_at_fluka.org
Subject: Re: fluka job crashing

Bus errors are usually problems connected with the operating system or
hardware of the computer you are running on, and normally they have
nothing to do with FLUKA. On our cluster we get similar errors (very
rarely) if the node the batch job is running on temporarily loses its NFS
connection to the master node, but I do not know whether this applies to
your case as well.

                     Alfredo


+----------------------------------------------------------------------+
| Alfredo Ferrari || Tel.: +41.22.76.76119 |
| CERN-EN/STI || Fax.: +41.22.76.69474 |
| 1211 Geneva 23 || e-mail: Alfredo.Ferrari_at_cern.ch |
| Switzerland || |
+----------------------------------------------------------------------+

On Thu, 4 Apr 2013, Sudeshna Banerjee wrote:

> Hello,
>
> I am trying to run fluka with a geometry file for the CMS detector. But
> my batch jobs are failing after several hours. A core file is created but
> the *.err, *.out and *.log files do not show any error messages. The only
> error I see is in the batch job submission log file. It says -
>
> ======================= Running FLUKA for cycle # 1 =======================
> /afs/cern.ch/user/b/bhat/scratch1/sudeshna/fluka/flutil/rfluka: line 358: 5268 Bus error (core dumped) "${EXE}" < "$INPN" 2> "$LOGF" > "$LOGF"
>
> ____________________________________________________________________________
> How do I find out what went wrong?
> Also, is there a way to control the number of collisions that are
> generated? I am guessing that if I can run the job for 1 or 2 collisions,
> then I will not have to wait too long to find out whether the job is
> going to fail.
>
> Thanks
> Sudeshna Banerjee
>
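
On the question of running only a few collisions: in FLUKA the number of
primary histories is set by WHAT(1) of the START card, so a short test run
amounts to lowering that value in the input file. Below is a minimal sketch
in Python, assuming a fixed-format input deck and assuming that one
collision corresponds to one primary history in this setup. The helper
names start_card and limit_primaries are hypothetical and are not part of
FLUKA.

```python
def start_card(n_primaries: int) -> str:
    """Return a fixed-format START card asking for n_primaries histories.

    Assumption: fixed-format input, i.e. a 10-character keyword field
    followed by 10-character WHAT fields, with numbers written with a
    decimal point.
    """
    if n_primaries < 1:
        raise ValueError("need at least one primary")
    return f"{'START':<10}{float(n_primaries):>10.1f}"


def limit_primaries(input_text: str, n_primaries: int) -> str:
    """Return a copy of the input deck with every START card replaced.

    Only WHAT(1) is kept on the new card; any other fields that were set
    on the original START card are dropped.
    """
    out = []
    for line in input_text.splitlines():
        if line.startswith("START"):
            out.append(start_card(n_primaries))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical three-card fragment, used only to show the rewrite.
    deck = "RANDOMIZ         1.0\nSTART         1000.0\nSTOP\n"
    print(limit_primaries(deck, 2), end="")
```

Because the replacement card carries only WHAT(1), this is meant for a
quick crash test with the full geometry rather than for a production run.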
Received on Thu Apr 04 2013 - 20:47:58 CEST
