Hoi,

just a few thoughts. Not prioritized.

1. We can run over all data and all MC within 4 hours under normal
   network/server conditions with 20 parallel jobs. We don't need more
   than 100 jobs for that. Not with anaRecoil, but with anaQA:
   identical output, but no superfluous histograms. There have been
   exchanges about those histograms, and they need not be repeated
   here.

2. A very serious problem (in addition to the raw number of jobs) is
   that Daniele's chain does not make use of a 'new' input format
   implemented weeks ago in both anaQA and anaRecoil: when chaining,
   you can specify either just a filename or a filename and the number
   of events. If you do the former, ROOT will open each file, extract
   the number of events, close it, and go on to the next file, for all
   files in the chain. Multiply that by 100 if all of your jobs start
   up at the same time in the long queue. (Or at least 125.) You will
   completely saturate and kill any server. If, on the other hand, you
   do specify the number of events, ROOT opens the rootfiles only for
   processing, and then we have some CPU cycles to spend and the
   access quickly becomes asynchronous. It might be beneficial to
   switch to the new chaining scheme. At least consider it.
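
To put a number on the startup load, here is a minimal sketch of the arithmetic. It is not the anaQA/anaRecoil code: the list format (one "filename" or "filename nevents" entry per line), the file names, the event counts and both function names are made up for illustration, and it assumes every job builds the same chain and all jobs start together.

```python
# Hypothetical chain list: one entry per line, either
#   "<file>"            (event count unknown, file must be opened to get it)
#   "<file> <nevents>"  (event count given, no open needed at chaining time)

def parse_chain_list(text):
    """Return a list of (filename, nevents or None) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        filename = fields[0]
        nevents = int(fields[1]) if len(fields) > 1 else None
        entries.append((filename, nevents))
    return entries


def startup_opens(entries, n_jobs):
    """File opens needed only to learn event counts, summed over all jobs."""
    missing = sum(1 for _, nevents in entries if nevents is None)
    return missing * n_jobs


# Made-up example lists, three files each.
old_style = "a.root\nb.root\nc.root\n"
new_style = "a.root 1200\nb.root 950\nc.root 1100\n"

print(startup_opens(parse_chain_list(old_style), n_jobs=100))  # 300
print(startup_opens(parse_chain_list(new_style), n_jobs=100))  # 0
```

With a realistic number of files per chain the first number gets large very quickly, and all of those opens hit the server before a single event is processed.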

3. The difference between the servers can be extracted from

   http://monitor/host.php

   and entering sulky% into the search field.

    filesystem            size       used      avail  use%  mounted on
    sulky25:/AWG18   619788288  410758632  195965336   68%  /a/sulky25/AWG18
    sulky26:/AWG23   619788288     168376  580893672    1%  /a/sulky26/AWG23
    sulky13:/AWG7   1000652800  781913464  205071112   80%  /a/sulky13/AWG7
    sulky09:/AWG8    500340736  492076992    8199392   99%  /a/sulky09/AWG8

   AWG8 is on one of the older servers available at SLAC, a
   Netra-t-1400/1405. It was beaten to death repeatedly. AWG7 was much
   better, and AWG12 as well.

   AWG18 is on a Sun Fire-280R. These servers should be able to serve
   several TB (according to SCS). In May we replaced an aging Netra
   serving the Group C disks with such a model, and the performance
   went up significantly. You can still kill it, and you can even do
   that with block I/O and HBK (Henning can tell the story).

4. I have not yet verified that the messages

 Error in <TFile::TFile>: file /nfs/farm/babar/AWG18/ISL/sx-080702/data/2000/output/outputdir/AlleEvents_2000_on-1099.root does not exist

   are actually bad when encountered while processing chains (in the
   new scheme). The situation might be different when trying to chain.
   Note that I do not say it does not matter.

Cheers,
--U.