ATLAS-SCCS-PLANNING-L Archives

ATLAS-SCCS-PLANNING-L@LISTSERV.SLAC.STANFORD.EDU


ATLAS-SCCS-PLANNING-L  July 2007

Subject: Minutes of GT&S Meeting 25th July 2007
From: "Stephen J. Gowdy"
Date: Wed, 25 Jul 2007 17:52:03 +0200 (CEST)

GT&S Coordination 	  	Wednesday 25 July 2007
from 16:30 to 18:00
at CERN ( 32-S-C22 )
chaired by: Stephen Gowdy
Description:

Phone CERN 76000 code 0124636# (leader is 0114663#)
       or https://audioconf.cern.ch/call/0124636 .

Present: John, Alexei, Stephen, Gilbert, Dietrich, Sanjay, Kaushik,
 	 Simone

Wednesday 25 July 2007

16:30  	Minutes of last meeting and action list (05')

 	No corrections.

16:35  	Hot topics (20')
     * Pilot jobs on EGEE (15')				Simone Campana

       This started about a month ago. Every slot at CERN for
       production jobs was taken by Sanjay's pilots, which were using
       no CPU time. The problem was made worse because the queue limits
       were three weeks of wall-clock time and one week of CPU time (the
       problem of long queues is independent of the pilot jobs). No
       other jobs could get through because the queue of pilots was
       long. Sanjay investigated and found a problem, so this should not
       happen again. Isn't a bug a bug? Couldn't a normal job also have
       a bug that held a CPU for the whole time allowed?

       In this case there was a gLite upgrade which caused the pilots to
       lose track of the number of jobs running. In general the pilot
       should kill itself in this case. New daemon code was written
       after this; previously the pilot kept trying to contact the
       server for a job.

       Having shorter queues might be a good idea to avoid accumulating
       pilots: if a job only takes 2 hours, the next job would go to the
       same pilot and another pilot would sit idle. Both CERN and NIKHEF
       now have queues of 1 day CPU and 1.5 days wall clock. For
       production at CERN all jobs are mapped to the atlasprd
       account. It might be better to have multiple pool accounts; this
       should happen soon, either today or tomorrow, or otherwise in a
       couple of weeks (the person involved goes on holiday tomorrow). We
       should ask other sites to also have pool accounts for production,
       as they do for user jobs. One issue we need to worry about is
       whether the batch system can cope with a large number of shorter
       jobs.

       One difference with PANDA is that it only runs one job per
       pilot, whereas CRONUS runs as many as possible. If the queue
       limit is about the same length as a job, CRONUS would also run
       only one. PANDA chose this approach because site admins preferred
       it; otherwise it could interfere with the local priority system.
       It does put a heavy load on PANDA, as some jobs can be as short
       as thirty minutes.

       Switching identity might be a bigger issue. GLEXEC should allow
       the identity to be switched only once. But if a pilot runs
       analysis for many users, you might want (if the site desires it)
       to change the user more than once.

       PANDA will use GLEXEC but does not yet; no sites have requested
       it so far. As PANDA runs only one job per pilot, it does not need
       to switch more than once. Also, the pilot kills itself if it does
       not find a job.

       Should we recommend that ATLAS pilot jobs run only one job?
       Sanjay would like to look at the statistics from CERN and NIKHEF
       to see what experience has been gained.

       It would hurt the production system to have a draconian limit of
       one day. We should try to avoid this. However, some users also
       have jobs that occupy a CPU due to errors.

       Will talk about this at next week's meeting and try to reach a
       conclusion.

     * CHEP Reviewers (05')

       Laura has put forward Sanjay's name. Do we need another person?
       There are 50 reviewers for all of ATLAS, and only around 6 papers
       in the Grid area. Stephen will also act as a reviewer.

       We may need to hold practice talks early in the last week of
       August, as some people will leave for the GDB at the end of that
       week.
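       The pilot behaviours discussed under "Pilot jobs on EGEE" (one
       job per pilot for PANDA, as many jobs as fit for CRONUS, and a
       pilot that exits when it finds no job) can be illustrated with a
       small sketch. It is not PANDA or CRONUS code: Job, run_pilot,
       fetch_job and the hours-based bookkeeping are hypothetical, and
       running a job is simulated by adding its expected duration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    """Hypothetical job record as handed out by a pilot server."""
    job_id: str
    expected_hours: float  # estimated wall-clock duration of the job


def run_pilot(
    fetch_job: Callable[[], Optional[Job]],
    wallclock_limit_hours: float,
    max_jobs: Optional[int] = None,
) -> List[str]:
    """Simulate one pilot occupying one batch slot (hypothetical sketch).

    max_jobs=1 models the one-job-per-pilot policy described for PANDA;
    max_jobs=None models running as many jobs as the queue limit allows,
    as described for CRONUS. Returns the IDs of the jobs that were run.
    """
    used_hours = 0.0
    completed: List[str] = []
    while max_jobs is None or len(completed) < max_jobs:
        job = fetch_job()
        if job is None:
            # No work available: exit and release the slot instead of
            # polling the server for the rest of the queue limit.
            break
        if used_hours + job.expected_hours > wallclock_limit_hours:
            # The job would not finish before the queue limit. A real
            # pilot would need to hand it back; this sketch simply stops.
            break
        used_hours += job.expected_hours  # stands in for executing the job
        completed.append(job.job_id)
    return completed


if __name__ == "__main__":
    def make_queue(n_jobs: int, hours: float) -> Callable[[], Optional[Job]]:
        jobs = [Job(f"job{i}", hours) for i in range(n_jobs)]
        return lambda: jobs.pop(0) if jobs else None

    # 1.5 days (36 h) of wall clock, as now configured at CERN and NIKHEF.
    print(len(run_pilot(make_queue(100, 2.0), 36.0)))              # 18
    print(len(run_pilot(make_queue(100, 2.0), 36.0, max_jobs=1)))  # 1
    print(run_pilot(lambda: None, 36.0))                           # []
```

       With hypothetical 2-hour jobs, the multi-job variant keeps one
       slot busy for 18 jobs, while max_jobs=1 releases it after one; if
       the queue limit is about one job long, both run a single job.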

16:55  	ProdSys issues (10')				Luc GOOSSENS
17:05  	Distributed Analysis issues (10')		Dietrich Liko

       The current issue is the use of SLC4 32-bit machines; FZK is
       where this is a problem just now. They are using the SLC3 kit on
       these machines: compilation seems to succeed, but jobs fail at
       run time. For release 12 we hope to solve this by shipping the
       compiler with the release. For release 13 the software will be
       re-installed using the SLC4 kits. Running pre-built binaries (as
       in production) works fine.

       Should we move away from compilation on the WN? Currently both
       GANGA and pathena support compiling this way. The reason for
       compiling on the WN is to allow adaptation to the environment
       there. We should perhaps worry about undefined environments and
       what they mean for reproducibility.

17:15  	DDM issues (10')
     * Deployment (05')					Massimo Lamanna
     * Development (05')					Miguel Branco
17:25  	Tier-0 (10')					Luc GOOSSENS
17:35  	Job Transformations (05')			Manuel  Gallas
17:40  	Software Integration issues (05')		Alexei Klimentov

       We are planning a series of functional tests: one for DQ2 0.3,
       one for LFC and a third one for PANDA. This is on the border
       between operations, GT&S and SWING. For DQ2 we want to check that
       all sites are ready for a predefined number of files shipped from
       Tier-1s and Tier-2s. The metric would be the time from
       subscription until the first file is delivered. The recent table
       of conditions doesn't look good. For LFC we need the test
       instance from Jean-Phillipe to test the new bulk queries and
       deletes. For PANDA we will run the server at CERN. Discussion of
       a preliminary timescale starts tomorrow; it currently looks like
       the first week of August.

       M3 data is organised in datasets. The convention for M4 data is
       being discussed and will be part of a document on the dataset
       naming convention to be released this month.
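       The proposed DQ2 functional-test metric (time from subscription
       until the first file is delivered) is straightforward to compute
       once subscription and delivery timestamps are available. The
       sketch below assumes a hypothetical input layout (one
       subscription time per site and a list of file arrival times per
       site); it does not use any DQ2 API, and the site names are
       placeholders.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def time_to_first_file(
    subscribed_at: datetime, delivered_at: List[datetime]
) -> Optional[timedelta]:
    """Latency from subscription until the first file arrives.

    Returns None if no file has been delivered yet.
    """
    if not delivered_at:
        return None
    return min(delivered_at) - subscribed_at


def per_site_metric(
    subscriptions: Dict[str, datetime],
    deliveries: Dict[str, List[datetime]],
) -> Dict[str, Optional[timedelta]]:
    """Apply the metric to every subscribed site (hypothetical layout)."""
    return {
        site: time_to_first_file(t0, deliveries.get(site, []))
        for site, t0 in subscriptions.items()
    }


if __name__ == "__main__":
    t0 = datetime(2007, 8, 1, 12, 0)
    subs = {"SITE_A": t0, "SITE_B": t0}
    dels = {"SITE_A": [t0 + timedelta(minutes=45), t0 + timedelta(minutes=20)]}
    print(per_site_metric(subs, dels))
```

       Sites with no delivered file report None, which makes sites that
       are not yet ready stand out in the results.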

17:45  	Grid middleware news (10') 	EGEE/OSG/NG
     * EGEE/LCG (05')					Laura Perini
     * NG (05')						Alexander Read
     * OSG (05')						Michael Ernst
17:55  	A.O.B. (05')

Action Items:

070725 Sanjay	Examine statistics from CERN & NIKHEF for pilot jobs

070620 Stephen	Put together LCG Metric note for further discussion.
        070725 Not done yet.

070606 Kaushik  Ask Ian & Pavel if we can switch the AOD merge to 20:1
 		and if we can do it for everything
        070725 Some discussion during software week. Now set to 10:1,
 	      perhaps due to space on the node; will check. Certainly done.

070523 Stephen	Email Kors about VOBOX Tier-1 Service Level Agreement.
        070606 Not done yet.
        070620 Not done yet.
        070725 Not done yet.

070523 Dietrich Summarise available DA documentation to decide
 		what is needed
        070606 Not done yet by Dietrich. Has been nicely summarised by
 	      Constantine in the analysis model meeting last
 	      week. This gives a good overview. Action is done. Some
 	      discussion about the support model beyond the
 	      documentation. HyperNews seems to be working well for
 	      GANGA; hope that more people can answer questions as the
 	      number of requests scales up.
        070725 Nothing more done yet.


