[Please remember to send me updates for action items]
ATLAS SCCS Planning 22Nov2006
-----------------------------
9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#
Present: Richard, Charlie, Wei, John, Chuck, Bill, Randy, Stephen, AndyH
Agenda:
1. DQ2 Status/Web Proxy
There is a lot of garbage in the DQ2 area. Wei has a good idea of
how to clean it up but wants confirmation from Wensheng.
2. ATLAS Software Area
On two occasions, updating the software installation area failed
in an unrecoverable fashion. Will request more space for releases
so that one copy can be updated while the other is in production.
Will also put pressure on pacman development to prevent such
occurrences.
3. Tier-2 Purchases
Getting estimates together now. Will not get the 6140 disk arrays
this time; will get 3511s instead, which we already have experience
with. These are slightly more expensive because of the way the
disks are attached to the servers, but should work out to more or
less the same number of TB.
Will buy (by cost) 2/3 CPU and 1/3 storage. Hope to get in the
region of 64TB storage.
Optimistically hope to have the order out by the end of the year.
Power remains a major issue, so having the machines before the
beginning of the new year wouldn't make sense. There may be a
committee to allocate power within the computing centre based on
scientific priorities. Had negotiated to get some space outside
Building 50, but once that was complete the owner of the space
decided, because of space shortages, to postpone any decisions.
Expect to shut down the video conferencing facilities in SCCS.
With a new transformer and other items from Boris, things are
looking up, though these have a long lead time. A Tier-2 agreement
would help ensure this gets priority.
4. AOB
- ATLAS Tier-2 Meeting at ANL
https://www.usatlas.bnl.gov/twiki/bin/view/Admins/Tier2Workshop8Dec06
Need to decide if Andy should be there.
- xrootd (Andy present for this)
Currently ATLAS copies data from an SE to the local disk on the
worker nodes, then copies the output back to the SE. Believe the
network capacity wouldn't cope with this type of model: if 1000
jobs all try to transfer files of 100s of MB, things will get
clogged. Depending on the "ingest" rate, doing a network read is
probably the best bet. Another benefit is that if things go wrong
you don't leave a bunch of stray files around. Andy would recommend
adopting the BaBar model, where files are transferred to local
storage first and the jobs are then started, reading from an
xrootd pool.
Would like to have all the AOD files stored locally at SLAC,
putting them in HPSS and having xrootd serve them to the batch
system. A copy consumes data a lot faster than any physics
application, so there is concern about a network meltdown in an
uncontrolled environment: servers are attached at 1Gb/s and there
is only a 10Gb/s backbone.
This is of course a Tier-1 type issue rather than a Tier-2 one.
Do need something better than an NFS server: with NFS the rate is
limited and you are restricted to a single server.
Would also be an opportunity to provide Tier-2 sites with a simple
recipe for running a storage system, compared with dCache (xrootd
doesn't need databases, backups, etc.). It also provides a very
easy way of getting data out of mass storage, if a site has one,
without anyone noticing when servers or disks die.
ATLAS has requirements on how to copy data back from the batch
nodes: can either use xrdcp or write directly into xrootd. DQ2
uses srmcp or gsiftp to copy files back and forth. Could run a
gsiftp server, but it doesn't know how to deal with multiple file
systems. Can use the LBL SRM as an interface for a single xrootd
server. Could also try to get an SRM that plugs into xrootd; have
been trying with the Fermilab one, but with no success yet.
Fabrizio is coming to SLAC next week (or the week after) for two
weeks, so he can perhaps work on it then.
Currently ATLAS uses gsiftp and SRM, with FTS scheduling the
transfers over these types of services. The current version of SRM
has some shortcomings, and testing of SRM 2.2 is starting. If there
is more than one server, SRM is needed.
If transfers go into one xrootd cluster for import/export and a
separate one serves the batch system, need to worry about the time
to migrate data to and from the batch system. Are jobs scheduled
very close to the data transfers? Should aim to get an SRM
interface for xrootd. One issue has been getting the SRM Java code
working; hopefully this can be resolved when Fabrizio is at SLAC.
Can use Grid certificates to authenticate to xrootd.
No one has been talking to Andy about ATLAS usage.
Would like to get other people interested in using xrootd.
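To put rough numbers on the network concern raised in the xrootd
discussion (1000 jobs each copying files of 100s of MB, servers on
1GB links, a 10GB backbone, read here as 1Gb/s and 10Gb/s), a
back-of-envelope sketch follows. The 500 MB per-job file size is a
hypothetical value chosen for illustration, and the calculation
assumes links run at full line rate with no protocol overhead or
contention, so the times are optimistic lower bounds.

```python
# Back-of-envelope stage-in time for the copy-to-worker-node model.
# Assumptions (not from the minutes): 500 MB per job, decimal units,
# links at full line rate with no protocol overhead or contention.
N_JOBS = 1000          # "1000 jobs all try to transfer" (from the minutes)
FILE_MB = 500          # hypothetical per-job input size in MB
BACKBONE_GBPS = 10     # backbone capacity, assumed 10 Gb/s
SERVER_GBPS = 1        # single server link, assumed 1 Gb/s


def transfer_seconds(total_megabytes: float, link_gbps: float) -> float:
    """Lower bound on the time to move total_megabytes over one link."""
    total_megabits = total_megabytes * 8
    return total_megabits / (link_gbps * 1000)


total_mb = N_JOBS * FILE_MB
print(f"Total stage-in volume: {total_mb / 1000:.0f} GB")
print(f"Limited by 10 Gb/s backbone: {transfer_seconds(total_mb, BACKBONE_GBPS):.0f} s")
print(f"Served by one 1 Gb/s server: {transfer_seconds(total_mb, SERVER_GBPS):.0f} s")
```

Under these assumptions one wave of jobs moves 500 GB, which takes
at least 400 s over the whole backbone and 4000 s through a single
1Gb/s server, before counting the output copied back to the SE.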
Action Items:
-------------
061129 Stephen Email Andy about "ingest" rates
061129 Wei Email other Tier-2s about xrootd
061129 Stephen Request more release area AFS space
061122 Wei Attempt to implement security recommendations
061115 Wei Add monitoring of DQ2 disk space to Ranger.
061122 Not done yet. Also need to monitor the GUMS server.
061108 Richard Discuss with SLAC Security the long-term approach to the ATLAS VO
061115 No information.
061101 Richard/Bill Convene advisory group regarding CPU/disk split.
061108 Have emailed Gordon. To be done.
061115 Gordon and Bill will meet today to discuss it.
061025 Stephen Check web server approval status
061101 Have opened up that hornet's nest.
061108 No news for the last week. Need to keep the discussion
going.
061115 Teresa trying to get web team together for a meeting
next week.
061122 Everyone at this meeting except Stephen attended. They want
to know whether we are up to date with security patches. BobC
would like us to restrict MySQL access by IP range instead
of by domain name. Also unsure whether the web server
interface needs read/write or only read access. Security
could be tightened by restricting the web server's
privileges, and also by putting it on a different machine.
Believe the Site Services Python scripts update the
TiersOfAtlasCache.py file. There was general unhappiness
about how security was handled, and folk are going to take
this message to the Tier-2 meeting next month. The verdict
was: allowed under protest. Recognised that this is an
important commitment, but need to work with the authors to
improve the system.
061018 Wei Test gridftp with xrootd federation
061025 Probably not very soon, but should be on the agenda.
061101 No change.
061108 Will remove from Agenda and leave as action item.
061115 Nothing done yet.
061122 Nothing has happened yet. Had a discussion with Wilko about
setting up an xrootd machine for testing ATLAS data
transfers. With SRM, all xrootd machines need to be
exposed to the Internet, so this is not a great solution.
061004 Randy Find out about xrootd for ATLAS plans
061018 no news
061025 No information yet. Andy probably knows something.
061101 No info.
061108 Need to get Andy involved in this discussion. Might be
useful for Andy to go to the December meeting in
Arlington.
061115 Will attempt to get Andy on the phone next week.
061122 Didn't actually happen. Next time?