
ATLAS-SCCS-PLANNING-L Archives


ATLAS-SCCS-PLANNING-L@LISTSERV.SLAC.STANFORD.EDU


ATLAS-SCCS-PLANNING-L  August 2006

Subject:

RE: Minutes of ATLAS/SCCS Planning Meeting (9th August 2006)

From:

"Su, Dong" <[log in to unmask]>

Date:

Thu, 10 Aug 2006 03:57:20 -0700

Content-Type:

text/plain



Some remarks below on the various items.  

Su Dong  

> -----Original Message-----
> From: [log in to unmask] [mailto:owner-atlas-
> [log in to unmask]] On Behalf Of Stephen J. Gowdy
> Sent: Wednesday, August 09, 2006 10:32 AM
> To: atlas-sccs-planning-l
> Subject: Minutes of ATLAS/SCCS Planning Meeting (9th August 2006)
> 
> ATLAS SCCS Planning 09Aug2006
> -----------------------------
> 
>   SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#
> 
> Present: Stephen, Su Dong, Randy, Shirley, Chuck, Ariel, Booker,
>  	 Richard, Wei
> 
> Agenda:
> 
> 1. DQ2 Status/Web Proxy
> 
>     We now have a user account to run under (atldq2). Sent request to
>     have access to atl-dq2, will be added to the netgroup that gives it
>     access to both atl-prod machines.
> 
>     Last week either before or after the meeting limited access to the
>     web server to onsite only and have seen no problems.
> 
>     Chuck opened a ticket about the broad access to that machine for
>     the mysql. BobC forwarded it to CERN and then the ATLAS folk. The
>     response was to try to get the information out of the Google
>     cache. The mechanism to determine which machines have MySQL
>     installed will not work with this machine as it is installed by
>     pacman, not via RPM.
> 
>     The MySQL user dq2user should only have access to the data tables
>     it needs. It should not need to add users or change permissions. We
>     should ask ATLAS folk if it really needs access to configure new
>     databases etc. Update, create records, create or drop entire
>     tables. If someone gets a hold of the password they could create
>     tables which contain malware or other things you don't want. Expect
>     that all that is needed is update, add and alter records in the
>     table.
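
As a rough illustration of the least-privilege setup described in the minutes above, the sketch below builds per-table GRANT statements instead of a blanket grant. The database name `dq2`, the table names and the host are placeholders and not the real DQ2 schema. The privilege set (SELECT, INSERT, UPDATE) is an assumption based on "update, add and alter records" and would need to be confirmed with the ATLAS folk.

```python
# Sketch only: database, table and host names below are hypothetical.
# The privilege set is an assumption, not a confirmed DQ2 requirement.
ROW_LEVEL_PRIVILEGES = ("SELECT", "INSERT", "UPDATE")


def build_grants(user, host, database, tables, privileges=ROW_LEVEL_PRIVILEGES):
    """Return one MySQL GRANT statement per table, with no schema-level rights."""
    privilege_list = ", ".join(privileges)
    return [
        f"GRANT {privilege_list} ON `{database}`.`{table}` TO '{user}'@'{host}';"
        for table in tables
    ]


if __name__ == "__main__":
    for statement in build_grants(
        "dq2user", "localhost", "dq2", ["datasets", "subscriptions"]
    ):
        print(statement)
```

Because nothing is granted at the database or global level, an account set up this way cannot create or drop tables, add users or change permissions.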
> 
>     250GB disk space was full, Wei and Randy talked. They thought there
>     was another server on order... so they doubled the size. There is a

This link pointed out by Nurcan Ozturk is quite interesting:
http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?dash=ddm
Given the typical remaining space available elsewhere, it may not be surprising that a total of 250GB results in a constant battle at the limit.

>     script that cleans up space that Stephen currently runs
>     manually. Will run it shortly to see how much space it frees up
>     (121GB). US ATLAS says you need about 50GB per CPU. Recently signed
>     an MoU to start the process to get the money for the Tier-2
>     site. The current plan says to spend twice as much on storage as
>     on CPU. Will not get the ATLAS money till the 15th September. There
>     is a storage order going out shortly for BaBar (one of twice a
>     year), as BaBar will not need this immediately it could be borrowed
>     by ATLAS till BaBar needs it and ATLAS' order has been
>     placed. Would be useful to know how much space needs to be
>     available for users. Is useful to see the site opened up to other
>     US ATLAS users. Need to be doing both user and production
>     jobs. There will be an advisory board to help settle any issues
>     between these two needs. Hope to be able to deliver some resources
>     prior to the October Physics Jamboree. Need to determine what is
>     exactly needed soon to be useful for then.
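
To put the numbers quoted above side by side, here is a small back-of-the-envelope sketch. It takes the US ATLAS figure of about 50GB per CPU at face value; the 100-CPU farm size is purely hypothetical and only there to show the scaling.

```python
GB_PER_CPU = 50  # US ATLAS rule of thumb quoted in the minutes


def cpus_supported(disk_gb, gb_per_cpu=GB_PER_CPU):
    """Number of CPUs a given amount of disk can serve under the rule of thumb."""
    return disk_gb // gb_per_cpu


def disk_needed_gb(cpus, gb_per_cpu=GB_PER_CPU):
    """Disk space in GB needed for a given number of CPUs."""
    return cpus * gb_per_cpu


if __name__ == "__main__":
    print(cpus_supported(250))   # the 250GB area that filled up -> 5
    print(disk_needed_gb(100))   # hypothetical 100-CPU farm -> 5000
```

By that rule of thumb the 250GB area covers only about 5 CPUs, which is consistent with it filling up.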

Jump-starting the T2 in advance of the October Jamboree is a great goal, in keeping with the proposed ambition of a premier T2. There are many open issues about the expected usage, in particular the resources devoted to local interactive/batch work from users logging in to SLAC versus Grid jobs. I would think we want to give the former more priority, but has the T2 contractually promised to serve the Grid? These are excellent topics for the T2 advisory board, to settle what we collectively want as users, and the opinions should be carried to the Harvard T2 meeting to clarify the policy.
 
> 
>     Need to setup a web page to describe how outside ATLAS users can
>     register at SLAC. There is an ATLAS group setup for accounts. Need
>     a way to verify that new user requests are valid. There is a
>     database of ATLAS users at CERN but doesn't tell us which users
>     should have SLAC Tier-2 access. Perhaps have a list of US groups
>     which should have access and determine which institution a user
>     comes from. Should have a single web point of entry for ATLAS
>     users and who to complain to. Whose responsibility is that?
>     Richard's. SCCS will then need to hand off certain problems to
>     others. Ariel can help provide feedback and actual help.

The T2 users may not be like traditional SLAC users, who are resident at SLAC and know how to find things through the general SLAC information channels (even I cannot find anything any more in the new SLAC web structure). So I strongly believe that we need a single-entry web page for all ATLAS T2 users that summarizes the essential usage information in one place. This is the face of our Tier-2.

The user registration system should be no different from BaBar's: each user can be identified through the STA system with their supervisor as their group PI, and the group PIs can be traced to the associated experiment (although in many cases this could be both BaBar and ATLAS). This requires the group PIs to register first, which is something that can be brought up at the T2 advisory board. Most of the US Western groups are already here (maybe not Arizona; and LBNL may need a separate registration). How users from non-Western groups, or global ATLAS users, register is a much trickier issue and is again a good topic for the Harvard Tier-2 meeting to clarify. If you go through the BNL 'getting started' page: http://www.acf.bnl.gov/UserInfo/GettingStarted/ , they do have a system to register users who only want to use computing remotely (I am one of them) and rarely or never come on site. We certainly need to be able to offer a similar service.
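
The user-to-PI-to-experiment chain described above could be checked along the following lines. This is only a sketch: the two lookup tables stand in for whatever the STA system actually records, and all user and PI names are made up.

```python
def experiments_for_user(user, user_to_pi, pi_to_experiments):
    """Follow user -> group PI -> experiments; empty set if either link is missing."""
    pi = user_to_pi.get(user)
    if pi is None:
        return set()
    return set(pi_to_experiments.get(pi, ()))


def may_use_tier2(user, user_to_pi, pi_to_experiments):
    """A request is treated as valid only if the user's PI is registered for ATLAS."""
    return "ATLAS" in experiments_for_user(user, user_to_pi, pi_to_experiments)


if __name__ == "__main__":
    # Hypothetical records; a PI may belong to both BaBar and ATLAS.
    user_to_pi = {"student_a": "pi_x", "student_b": "pi_y"}
    pi_to_experiments = {"pi_x": ["BaBar", "ATLAS"], "pi_y": ["BaBar"]}
    for name in ("student_a", "student_b", "unknown_user"):
        print(name, may_use_tier2(name, user_to_pi, pi_to_experiments))
```

A user whose PI has not registered simply fails the check, which is why the group PIs need to register first.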

The US Tier-2 web pages are all accessible through the US ATLAS Tier-2 support page:  http://www.usatlas.bnl.gov/twiki/bin/view/Admins/WebHome#Tier2_Site_Web_Pages
and the pages for the 3 centers look rather different. We clearly need to produce something to go in here too, and hopefully something better. It is useful for computing management, but more importantly it should provide the info for users.

Regarding information for users at SLAC, if you look at the SLAC ATLAS page:
    http://www.slac.stanford.edu/exp/atlas/
the computing section under the general info section at the top already has some computing user info links and pages that Stephen and I put together, including some SLAC-specific environment setup. The CERN computing account setup info page I compiled:
   http://www.slac.stanford.edu/exp/atlas/general/computing-account.html
has certainly saved many explanations to newcomers, whom I have seen fall into the traps described there when they had not read it first. The effort was certainly worthwhile, and we are already benefiting from not having to answer the same questions repeatedly. However, this is for us as users going to CERN. We haven't done much to assemble a concise summary for new people coming into the SLAC system, as up to now most of us have been long-time SLAC staff. I am already feeling the need from new employees, users and students, and there are similar questions on SLAC accounts, e-mail, etc. We clearly need a concise summary on SLAC as well for the T2 page. It would be great if SCCS took the lead on that, and the SLAC ATLAS group users should of course actively contribute as well. Ariel can help tell us, through the eyes of a new employee, how the information can be put together to be most appealing to new users.
  The SLAC computing account request page does have clear instructions on all aspects of registering new accounts. However, new ATLAS T2 users mostly need only a UNIX account; what we want is a concise, explicit set of steps that lets them get the process done in 15 minutes, similar to the BNL instructions, integrating user registration and computing account registration in one place (I hope we can do better on user registration than BNL, which is a bit more complicated than necessary).
  The public machine page at SLAC:
http://www.slac.stanford.edu/comp/unix/public-machines.html
is great and really helps users know the basics of what they are running on. I wish there were a similar summary of the batch queues. I find the batch documentation pretty poor everywhere: CERN, BNL. The standard LSF documentation only tells you the commands, but gives no information on what queues exist at a given site and what they mean: how many minutes does "long" mean? Does "CPU seconds" mean raw machine time or normalized CPU time, and how is that defined? It would be really nice to have this put together on one page. I think there is a ticket from me on this in the system, but it is still being worked on.
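
As a sketch of what such a one-page queue summary would need to state, the snippet below records for each queue its CPU limit and whether that limit is raw or normalized. The queue name, the limit and the host speed factor are invented for illustration. The conversion assumes that normalized time is raw time multiplied by a per-host speed factor; the actual site definition is exactly what the page would have to spell out.

```python
from dataclasses import dataclass


@dataclass
class QueueInfo:
    name: str
    cpu_limit_minutes: float  # limit as published for the queue
    normalized: bool          # True if the limit is in normalized CPU time


def raw_minutes_allowed(queue, host_speed_factor):
    """Raw CPU minutes a job gets on a host with the given speed factor.

    Assumes normalized time = raw time * host_speed_factor, so a faster
    host (factor > 1) uses up a normalized limit sooner.
    """
    if not queue.normalized:
        return queue.cpu_limit_minutes
    return queue.cpu_limit_minutes / host_speed_factor


def summary_line(queue):
    """One line of the kind a queue summary page could show."""
    kind = "normalized" if queue.normalized else "raw"
    return f"{queue.name}: {queue.cpu_limit_minutes:g} CPU minutes ({kind})"


if __name__ == "__main__":
    long_queue = QueueInfo("long", 120, True)  # hypothetical queue and limit
    print(summary_line(long_queue))
    print(raw_minutes_allowed(long_queue, host_speed_factor=2.0))
```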

> 
>     Wei will be "project" lead for getting required infrastructure created
>     as needed for the October Workshop.
> 
>     Recent lesson learned, real ATLAS people don't use the grid. It is
>     used for ATLAS simulation production. Folk doing work don't need
>     that level of resources or complication.

Indeed, before large datasets come in, the most significant human interface to computing at the moment is probably still the traditional one: users log in locally to develop code and run many small test jobs (we cannot generate large datasets anyway). The interactive response of a local T2 is important.

I hope we will have a scheme to handle priorities for some local ATLAS batch jobs. In particular, Zach Marshall (zmarshal) is working on parameterized shower simulation, which is aimed at large savings in CPU time. However, to get there, he still needs to run many large simulation jobs for validation and tuning. A dedicated queue he can use, or some way of allowing more jobs from him, is an ATLAS priority within the resource share ATLAS currently has.

> 
> 2. Trigger Farm Status
> 
>     Steffen is happy that all the physical preparations are being made
>     on the correct timescale. Richard thought this wasn't going to be
>     possible but now John and Boris have pulled off some miracles to
>     keep everyone happy.
> 
>     Some discussion about bringing in an HP Pro Curve switch. This is
>     what ATLAS will actually be using. Gary and Charley would be
>     interested in knowing more information about this.
> 

The ProCurve switch will only be used in the isolated subnet of the ATLAS online farm behind a Cisco 6500.  

> 3. ATLAS Oracle Server
> 
>     Last week were looking for rack space. Since then some things have
>     become higher priority, hopefully get back to it soon.
> 
> 4. Slots for ATLAS Production jobs and other batch related stuff
> 
>     For the priority LSF group should be no problem, Neal back next
>     week.
> 
>     Wei tried sending some of the failing jobs (the pre-12 series
>     releases with old Job Transforms) to a couple of slow machines that
>     do have Internet access. They didn't work well though as memory was
>     low.
> 
>     For database updates there is a web page that describes what
>     version needs to be used.
> 
>     Will be a discussion with the US ATLAS which can perhaps help
>     understand some of the issues with batch access to the Internet.
> 

The model for CondDB access in long-term computing operations should also be a topic for the Tier-2 workshop. Local DB replicas are definitely more efficient and performant, but the scheme for the CondDB updates (needed very frequently) is probably a significant undertaking.

>     Need to understand if the SLAC Grid resources should be used by
>     anyone or just ATLAS. Also need to understand if we should offer
>     Tier-2 service to folk from the East Coast or even abroad.
> 
> 5. AOB
> 
>     DQ2 Subscriptions:
> 
>     How does DQ2 subscribe to data? There is a command that can be run to
>     do this. Production jobs can actually pull in data also if they
>     need it.
> 
>     Memory Use:
> 
>     Also still have the question about how much memory is needed. Very
>     important to know if we need 2GB per job or just one. The person
>     Stephen would talk to to get a definitive answer is on holiday just
>     now and back next week. Can this wait till next week? There is an
>     order going out soon (as mentioned above) so can lend ATLAS
>     10TB. All other Tier-2s are getting 1GB per core of memory. Will
>     ask at the Workshop next week also.
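
To see why the 1GB versus 2GB question above matters, the sketch below counts how many jobs can run at once on a node when both cores and memory are limiting. The 4-core node is a hypothetical example; the 1GB per core figure is the one quoted for the other Tier-2s.

```python
def usable_job_slots(cores, node_memory_gb, memory_per_job_gb):
    """Concurrent jobs a node can run: limited by cores and by memory."""
    by_memory = int(node_memory_gb // memory_per_job_gb)
    return min(cores, by_memory)


if __name__ == "__main__":
    cores = 4                      # hypothetical node
    node_memory_gb = cores * 1     # 1GB per core, as quoted for other Tier-2s
    print(usable_job_slots(cores, node_memory_gb, 1))  # 1GB jobs -> 4
    print(usable_job_slots(cores, node_memory_gb, 2))  # 2GB jobs -> 2
```

With 1GB per core, jobs that need 2GB would leave half of the cores idle.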
> 
>     Replica Database:
> 
>     The current one should work with the latest jobs, it was installed
>     recently.
> 
>     Meeting Room Time:
> 
>     Will try to extend to 1.5 hours.
> 
>     ATLAS Environment:
> 
>     Will talk about this tomorrow at the SLAC ATLAS TDAQ meeting.
> 
> Action Items:
> -------------
> 
> 060809 Richard	Come back with person responsible for web site
> 
> 060809 Stephen	Ask what dq2user needs to do in MySQL
> 
> 060802 Stephen	Email Su Dong about Tier2 Workshop
>         060809 Done.
> 
> 060802 Stephen	Find out about failing ATLAS jobs at SLAC
>         060809 Wei asked if anything useful and there were 9 completely
>                successful jobs (but those were tests). We should find
>  	      out if the intermediate files are kept and are useful.
> 
> 060726 Randy	Talk to Richard about DQ2 Workshop
>         060802 Wei is well along on planning to go, but BNL has some
>                trouble giving access. Normal process takes 90 days but
>                hope to be able to do something faster by the 10th
>                September.
>         060809 Has site access approval. Done.
> 
> 060726 Stephen	Find out about maximum memory and local storage per job
>         060802 No news yet.
>         060809 No news yet.
> 
> 060412 Systems  Provide Oracle service for ATLAS Trigger testing (RT 46089)
>         060419 No ticket yet, so nothing done.
>         060426 Now have ticket 46089.
>         060503 No news.
>         060524 Steffen has provided configuration information. Now in
>                Chuck's hands.
>         060628 Randy will ask Chuck about status.
>         060726 First on list for V240 but not sure when it will
>                happen. Will put a T3a on it.
>         060802 John checking for rack space.
>         060809 Still needs allocated rack space.
> 
> 060224 Chuck	Will check on web server request for DQ2 machine
>         060301 Waiting for web server request information from Stephen.
>         060308 Haven't checked yet; haven't received Stephen's request yet.
>         060315 Still not sent Chuck information.
>         060405 No update.
>         060412 No update.
>         060419 No update.
>         060426 Resubmitted request as it was lost before.
>         060503 Not heard anything for about two weeks.
>         060524 Stephen needs to update ticket, moving from yakut to DQ2
>                machine.
>         060628 Request updated now running on atl-dq2, waiting for scan.
>         060726 Meeting on Monday about this.
>         060802 Requesting new user to run services, blocked offsite web
>  	      access, Need to explore MySQL security.
>         060809 New user available, more ideas for MySQL security.
> 
> 060224 Richard	Discuss ATLAS trigger machines with others in SCCS
>         060301 Only limited response from John W was resigned
>  	      acceptance... need to work on an actual deployment plan as
>  	      there are real issues to be solved.
>         060308 John aware and in plans as much as anything is. New
>  	      engineer will take over.
>         060315 No update.
>         060405 No update.
>         060412 No update.
>         060419 No update.
>         060426 No update.
>         060503 No update.
>         060524 RT 45823. Engineer looking at power availability. On track
>                for August.
>         060628 Understand schedule, Randy will make sure John is aware.
>         060726 Need to nail down when power will be available. Steffen
>                thinks he can make it happen with existing equipment.
>         060802 Looks like this will fit in SCCS. Can reuse rack,
>                switches and fibres.
>         060809 Everything looking good for this now.



