ATLAS-SCCS-PLANNING-L Archives

ATLAS-SCCS-PLANNING-L@LISTSERV.SLAC.STANFORD.EDU
ATLAS-SCCS-PLANNING-L January 2008

Subject: FDR-1. Was [RE: Minutes of ATLAS/SCCS Planning Meeting 23rd Jan 2008]
From: "Young, Charles C." <[log in to unmask]>
Date: Mon, 28 Jan 2008 10:41:38 -0800
Content-Type: text/plain
Parts/Attachments: text/plain (196 lines)

Hi,

Based on reports at today's US ATLAS FDR meeting (http://indico.cern.ch/conferenceDisplay.py?confId=27938), one complete copy occupies

	~10 TB for AOD
	~50 TB for ESD

Tier 2s are expected to have all of the AOD. They are not required to have ESD, but can subscribe to as much as they want.

After the above datasets have been produced and transferred, we will stop *producing* more data -- we cannot produce it fast enough to mimic the ATLAS detector anyway. However, we will continue to *transfer* the same data over and over for (I think) 3 more weeks. The goal is to demonstrate system stability. Another goal is for Tier 2s to see whether they can support analysis on the first dataset while more data come in.

Someone asked if we need more space for a second copy of AOD. This is a likely scenario in real life when we have updated constants, code fixes, and so on. There was no clear statement of how much extra AOD space is needed; I suggest we budget for a full second copy, i.e. ~10 TB.
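For concreteness, the disk budget works out as below. This is only a sketch: the ESD subscription fraction is my own hypothetical assumption (Tier 2s subscribe to as much ESD as they want), not a number from the meeting.

```python
# Rough FDR-1 Tier-2 disk budget, using the estimates quoted above.
AOD_TB = 10          # one complete AOD copy (~10 TB)
ESD_TB = 50          # one complete ESD copy (~50 TB)
ESD_FRACTION = 0.5   # suppose we subscribe to half the ESD (assumption)

second_aod_copy_TB = AOD_TB  # full second AOD copy, as suggested above
total_TB = AOD_TB + second_aod_copy_TB + ESD_FRACTION * ESD_TB
print(f"Disk budget: {total_TB:.0f} TB")  # 10 + 10 + 25 = 45 TB
```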

More issues will no doubt crop up as we get going with FDR. Do we need to meet weekly for a while? Cheers. 

							Charlie
--
Charles C. Young
M.S. 43, Stanford Linear Accelerator Center       
P.O. Box 20450                                         
Stanford, CA 94309                                      
[log in to unmask]                                
voice  (650) 926 2669                         
fax    (650) 926 2923                       
CERN GSM +41 76 487 2069 

> -----Original Message-----
> From: [log in to unmask] 
> [mailto:[log in to unmask]] On 
> Behalf Of Stephen J. Gowdy
> Sent: Wednesday, January 23, 2008 6:58 PM
> To: atlas-sccs-planning-l
> Subject: Minutes of ATLAS/SCCS Planning Meeting 23rd Jan 2008
> 
> ATLAS SCCS Planning 23Jan2008
> -----------------------------
> 
>   9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#
> 
> Present: Stephen, Wei, Randy, Chuck, Peter, Richard, Bob
> 
> Agenda:
> 
> 1. DQ2 Status/Web Proxy
> 
>     There is a problem with DQ2. The number of jobs reported as
>     running is much higher than the number actually running at
>     SLAC. This is the third time this problem has been seen; they
>     are still working on it. PANDA thinks there are 800 jobs when
>     there are really only 500. Over the weekend it thought there
>     were more than 1k when there were really only 700 or 800.
> 
> 2. Tier-2 Hardware
> 
>     The Tier-2 CPUs are there but not installed yet. All the Dell
>     machines are being installed, but at the same time there is a
>     reduction in the number of machines previously installed, as
>     they are being moved to the new Black Box. Will be increasing
>     the fair shares a little, but not to the final number for a
>     month or so. The Dell machines are too long to fit in the
>     Black Box, so we need to deinstall machines from the
>     water-cooled racks and move them to the Black Box at IR12; the
>     new systems will go into the water-cooled racks. While
>     installing the new machines we will cable up the older
>     machines in the Black Box. Not sure what staff levels will be,
>     so it is hard to give a concrete schedule.
> 
>     Making sure that the storage will be able to be plugged
>     in. Looking at what is needed from GLAST and BaBar to put an
>     order together; have a fairly reasonable offer in hand. The
>     Thumpers will use 1TB disks. Each server will use 4 Gb ports,
>     so the storage will use almost as many network ports as the
>     CPUs. If this continues, the Tier-2 will not lean on the rest
>     of the lab for hardware, but it will for support. As we are
>     buying the same hardware for everyone, we can make
>     priority-based decisions on who needs it at any particular
>     time. It is not evident that there is nowhere to put them. An
>     issue with 10gigE Intel cards in the Thumpers has been
>     reported, but it shouldn't affect us. Might see some
>     Thumper-like device that is really a JBOD with 10gigE on the
>     motherboard.
> 
> 3. AOB
> 
>     - Gatekeeper
> 
>     Could probably use a faster machine. We also need more gridFTP
>     machines; if we use three 20Zs that would probably be
>     enough. Not sure if multicore machines would help. The best
>     way to get the fastest machines is to use machines bought for
>     batch, but nobody's money is earmarked for this. It was asked
>     whether we could use the BaBar machines for ATLAS instead of
>     buying new ones, but it was too late; perhaps we could sell
>     some of the Dell 1950s if they are not needed. For gridFTP we
>     are not completely sure where the CPU time is spent. It might
>     be interesting to find out what other Tier-2s are using. Could
>     take some of the Opteron machines bought for ATLAS to do
>     this. Memory is one of the issues just now; we could add more
>     memory.
> 
>     Will need some machines dedicated to these services. Need four
>     machines; taking the last four will take the pool down to 135.
> 
>     - External Security Review
> 
>     In late February an external person will be doing an external
>     scan. In the week of March 15th they will start trying
>     penetration from inside. Will start having meetings this
>     afternoon to tighten up security at the lab. Some examples
>     will be screen savers with passwords, running crack on local
>     passwords, etc. This will be a pretty tough review, so
>     everyone needs to help. Some extra staff time will need to be
>     put on security.
> 
>     For ATLAS we need to make sure things like the mySQL server
>     are up to date.
> 
>     - Network
> 
>     Working on upgrading the external network to 10gigE.
> 
>     - Job slots on xxl
> 
>     At the moment the limit is 62. xlong is about a day and many
>     jobs are running longer than that, so they need to be moved to
>     xxl. Would like to increase the number of slots in xxl to
>     something like 256. Wei has a graph showing that 400 jobs have
>     been killed due to this. Could also increase the length of the
>     xlong queue, but try to keep that around a day; ATLAS jobs are
>     meant to take around a day. Also need to make sure jobs don't
>     run forever. US ATLAS should define an amount of CPU time that
>     is needed at the Tier-2 queues. Will discuss changing length
>     limits by email. Will also have a meeting at SLAC to see if
>     xxl could accept more jobs.
> 
>     Is there one type of machine that kills jobs? BaBar is asking to
>     decrease the CPUF on the Dells as they are not performing as well
>     as expected.
> 
>     Working on a proposal to submit jobs to a single queue
>     ("general"). You would specify the CPU time needed. Short jobs
>     would be run first, but as longer jobs aged they would gain
>     priority to start. This would be useful when jobs start
>     requesting multiple cores: short jobs could be placed on free
>     cores while a whole machine is being freed up.
> 
>     - Stephen Leaving
> 
>     Leaving ATLAS and SLAC and moving to CERN and CMS. It has been
>     talked about for a long time but is finally happening
>     quickly. Peter will be taking over this meeting. Stephen and
>     Peter need to talk to come up with a transition plan; expect
>     Stephen will still run at least the next meeting.
> 
>     - SL4 status
> 
>     The new machines are being installed with RHEL4. Have had the
>     green light from GLAST on moving to RHEL4 and 64-bit. BaBar
>     have been running on it but not compiling on it. ATLAS also
>     doesn't compile in 64-bit mode.
> 
>     - FDR
> 
>     All US ATLAS Tier-2 sites are participating. Need to upgrade
>     the network; could perhaps move the needed switches first.
>     Have done two already, but those were the easy ones to
>     schedule. The next ones will not be so easy as they affect
>     more people. It would probably be too much work to do them all
>     at the same time; small steps are generally better. Have
>     learnt that running name servers on local machines has caused
>     problems, so could perhaps stop doing that. Do now have enough
>     UPS to support the first module for the 10gigE uplink to
>     ESnet.
> 
> Action Items:
> -------------
> 
> 080123	Stephen  Find out if there is or can be a US ATLAS batch limit(s)
> 
> 
> --
> /------------------------------------+-------------------------\
> |Stephen J. Gowdy, SLAC               | CERN     Office: 32-2-A22|
> |http://www.slac.stanford.edu/~gowdy/ | CH-1211 Geneva 23        |
> |                                     | Switzerland              |
> |EMail: [log in to unmask]       | Tel: +41 22 767 5840     |
> \------------------------------------+-------------------------/
> 
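The single-queue aging idea in the minutes above can be sketched roughly as follows. This is a toy illustration only: the priority formula, the aging weight, and all names are my own assumptions, not the actual proposal.

```python
# Toy aging scheduler: jobs declare requested CPU time; shorter jobs
# start first, but waiting jobs gain priority with age, so an old long
# job eventually beats a freshly submitted short one.
AGING_WEIGHT = 0.1  # priority credit per second of waiting (arbitrary)

def effective_priority(requested_cpu_s, waited_s):
    # Lower value runs sooner: requested time penalized, waiting credited.
    return requested_cpu_s - AGING_WEIGHT * waited_s

def next_job(queue, now):
    # queue: list of (submit_time_s, requested_cpu_s, name) tuples
    best = min(queue, key=lambda j: effective_priority(j[1], now - j[0]))
    queue.remove(best)
    return best

# A recently submitted short job jumps ahead of a long job...
jobs = [(0, 86400, "long"), (3600, 600, "short")]
print(next_job(jobs, now=7200)[2])       # -> short

# ...but a long job that has waited long enough finally wins.
jobs = [(0, 86400, "long"), (900000, 600, "short")]
print(next_job(jobs, now=900000)[2])     # -> long
```

The aging term is what prevents starvation: without it, a steady stream of short submissions would hold the long job off forever.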


