> -----Original Message-----
> From: Yang, Wei 
> Sent: Saturday, April 25, 2009 4:25 AM
> To: Young, Charles C.; atlas-sccs-planning-l
> Subject: Re: [Usatlas-prodsys-l] first HammerCloud jobs for US cloud
> 
> Hi Charlie,
> 
> I don't know the answer. HammerCloud is a new thing. I will 
> ask on Wed's US phone meeting. I do have some guesses:
> 
> "mean prepare inputs time" should at least include copying 
> files from storage to batch nodes. This is likely the longest 
> part. It may also include querying local file catalogs for 
> file locations in storage. This should be pretty quick, in 
> seconds. I guess it doesn't include transferring data from 
> BNL to SLAC, as that will be a measurement of DDM 
> performance, and should be much longer than 48s. It also 
> shouldn't include time to access remote condDB. 
> 
> It is a little surprising that it took hours for other sites to 
> get data to their batch nodes. But if there are lots of input 
> files, that could be. For SLAC, we only copy non-ROOT files 

The logs indicate each job has about 150 files. Any idea how large each one is so we can do a reality check? 

> (like DBRelease.tar.gz) to batch nodes.
> ROOT files are read directly from the storage. Again, shorter 
> MPIT doesn't mean our storage is better.
> 
> There is no easy way to tell CPU utilization for a job 
> because there is no link between LSF IDs and Panda IDs.
> 
> We report to US ATLAS a list of CPU types and numbers of 
> cores of each type.
> US ATLAS calculates a weighted average based on their SI2K 
> (will use HEP-SPEC in the future), and uses it to count the 
> CPU usage for each job.
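
To make the weighted-average step above concrete, here is a minimal sketch. It assumes the weights are the core counts per CPU type, which is only one reading of the description; the function name and the sample numbers are hypothetical and are not SLAC's actual figures.

```python
def weighted_average_rating(cpu_types):
    """Core-weighted average of a per-core benchmark rating (e.g. SI2K).

    cpu_types: list of (number_of_cores, rating_per_core) pairs.
    Assumption: each CPU type is weighted by its core count.
    """
    total_cores = sum(cores for cores, _ in cpu_types)
    if total_cores <= 0:
        raise ValueError("need at least one core")
    return sum(cores * rating for cores, rating in cpu_types) / total_cores


# Hypothetical site: 100 cores rated 1000 and 300 cores rated 2000.
site = [(100, 1000.0), (300, 2000.0)]
print(weighted_average_rating(site))
```

With these made-up inputs the average is pulled toward the more numerous CPU type, which is the point of weighting by core count.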

Thanks for the feedback. Are we sure about this point? The forwarded message has a pointer to http://gangarobot.cern.ch/st/test_253/, where we see the same 4 plots as on page 3 of the PowerPoint file. At this URL, they are explained as "CPU/Walltime is the CPU Percent Utilization", i.e. without cpufactor. It makes sense to monitor this ratio, but I don't understand the usage of a ratio

 utilization = cpuconsumption/cpufactor/(stoptime-starttime).

that includes cpufactor. No idea what it would mean. Maybe we can ask whether the definition on page 2 is a typo. 
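
To show how the two definitions differ numerically, here is a minimal sketch. The meaning of cpufactor is not known from the slides, so it is treated here as a plain dimensionless scale factor (an assumption); the function names and sample values are hypothetical.

```python
def cpu_walltime_ratio(cpuconsumption, starttime, stoptime):
    """CPU/Walltime as described on the gangarobot page (no cpufactor)."""
    walltime = stoptime - starttime
    if walltime <= 0:
        raise ValueError("stoptime must be later than starttime")
    return cpuconsumption / walltime


def utilization_with_cpufactor(cpuconsumption, cpufactor, starttime, stoptime):
    """The page 2 definition: cpuconsumption/cpufactor/(stoptime-starttime).

    Assumption: cpufactor is a dimensionless, non-zero scale factor.
    """
    if cpufactor == 0:
        raise ValueError("cpufactor must be non-zero")
    return cpu_walltime_ratio(cpuconsumption, starttime, stoptime) / cpufactor


# Hypothetical job: 1800 s of CPU over 3600 s of wall time, cpufactor 1.5.
print(cpu_walltime_ratio(1800.0, 0.0, 3600.0))
print(utilization_with_cpufactor(1800.0, 1.5, 0.0, 3600.0))
```

Unless cpufactor equals 1, the second number is no longer the fraction of wall time spent on CPU, so the two quantities cannot both be called "CPU Percent Utilization".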

> 
> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
> 
> 
> 
> 
> 
> > From: "Young, Charles C." <[log in to unmask]>
> > Date: Thu, 23 Apr 2009 23:48:50 -0700
> > To: Wei Yang <[log in to unmask]>, atlas-sccs-planning-l 
> > <[log in to unmask]>
> > Cc: "Young, Charles C." <[log in to unmask]>
> > Subject: RE: [Usatlas-prodsys-l] first HammerCloud jobs for US cloud
> > 
> > Hi Wei,
> > 
> > Thanks! Some questions. What is involved in the "mean 
> prepare inputs time"
> > step? Is it copying input file from (local?) storage to 
> worker node? 
> > Is there some preparation of the input data beyond moving it around?
> > 
> > Can we find CPU percent normalized to the execution step only? I.e. 
> > exclude file copy overhead.
> > 
> > Don't understand definition on page 2. utilization = 
> > cpuconsumption/cpufactor/(stoptime-starttime). What is cpufactor?
> 
>