

[Please remember to send me updates for action items]

ATLAS SCCS Planning 22Nov2006

  9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#

Present: Richard, Charlie, Wei, John, Chuck, Bill, Randy, Stephen, AndyH


1. DQ2 Status/Web Proxy

    Have lots of garbage in the DQ2 area. Wei has a good idea how to
    clean it up but wants confirmation from Wensheng.

2. ATLAS Software Area

    On two occasions an update of the software installation area
    failed in an unrecoverable fashion. Will request more space for
    releases so that one copy can be updated while the other is in
    production. Will also put pressure on pacman development to prevent
    such occurrences.

3. Tier-2 Purchases

    Getting estimates together just now. Will not get the 6140 Disk
    arrays this time. Will get 3511s instead, which we already have
    experience with. This is slightly more expensive due to the way the
    disks get attached to servers but should work out to more-or-less
    the same number of TB.

    Will buy (by cost) 2/3 CPU and 1/3 storage. Hope to get in the
    region of 64TB storage.

    Optimistically hope to have the order out by the end of the year.

    Power remains a major issue. Having the machines before the
    beginning of the new year wouldn't make sense due to that. Perhaps
    there will be a committee to allocate power within the computing
    centre based on Scientific Priorities. Had negotiated to get some
    space outside building 50; once the negotiations were complete the
    owner of the space decided that, due to space shortages, any
    decision needed to be postponed. Expect to shut down video
    conferencing facilities in SCCS. There is a new transformer and
    things from Boris, so things are looking up. They do have a long
    lead time. Tier-2 agreement would help ensure it gets priority.

4. AOB

    - ATLAS Tier-2 Meeting at ANL
      Need to decide if Andy should be there.

    - xrootd (Andy present for this)

      Currently ATLAS copies data from an SE to the local disk on the
      Worker Nodes. It then also copies output back to the SE. Believe that
      the network capacity wouldn't cope with this type of model. If
      1000 jobs all try to transfer 100s of MB files things will get
      clogged. Depending on the "ingest" rate doing a network read is
      probably the best bet. Another benefit is if things go wrong you
      don't leave a bunch of turd files around. Andy would recommend
      adopting a BaBar model where you transfer files to local storage
      and then start the jobs that read from an xrootd pool.

      Would like to have all the AOD files local at SLAC. Would put
      them in HPSS and have xrootd serve them to the batch system.

      A copy eats data a lot faster than any physics
      application. Worried about network meltdown in an uncontrolled
      environment. Servers are attached with 1Gb links and there is
      only a 10Gb backbone...
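
      As a rough illustration of the concern above, the sketch below
      estimates the best-case time to stage in files for many jobs. The
      job count, the 300 MB file size, and the reading of the link
      speeds as 1 Gbit/s and 10 Gbit/s are assumptions for illustration
      only, not measured figures.

```python
# Back-of-the-envelope estimate of stage-in load on the network.
# All figures used here are illustrative assumptions, not measurements.


def transfer_seconds(n_jobs: int, file_mb: float, link_gbit_per_s: float) -> float:
    """Best-case time to move n_jobs files of file_mb megabytes over one link."""
    total_bits = n_jobs * file_mb * 1e6 * 8       # MB -> bits (decimal units)
    return total_bits / (link_gbit_per_s * 1e9)   # ideal rate, no overhead


def servers_to_saturate(backbone_gbit_per_s: float, server_gbit_per_s: float) -> float:
    """Number of servers running at line rate needed to fill the backbone."""
    return backbone_gbit_per_s / server_gbit_per_s


if __name__ == "__main__":
    # Assumed: 1000 jobs, 300 MB each, sharing a 10 Gbit/s backbone.
    print(transfer_seconds(1000, 300, 10.0), "s over the backbone")
    # Assumed: all files come from a single server on a 1 Gbit/s link.
    print(transfer_seconds(1000, 300, 1.0), "s from one server")
    print(servers_to_saturate(10.0, 1.0), "servers at line rate fill the backbone")
```

      Real throughput would be lower once protocol overhead and
      contention with other traffic are taken into account.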

      This is of course a Tier-1 type issue rather than Tier-2. Do need
      something better than an NFS server. With NFS there is a limit
      on the rate and you are limited to only one server.

      Would also be an opportunity to provide a simple recipe for Tier-2
      sites for running a storage system compared with dCache (you
      don't need databases, backups etc for xrootd). It also provides
      a very easy way of getting data out of mass storage, if you have
      one, without anyone noticing if servers or disks die.

      Have ATLAS requirements on how to copy data back from the batch
      nodes. Can either use xrdcp or write directly into xrootd. DQ2
      uses srmcp or gsiftp to copy things back and forth. Could run a
      gsiftp server but it doesn't know how to deal with multiple file
      systems. Can use LBL srm for a single xrootd server as an
      interface. Can attempt to get an SRM that can plug into xrootd;
      have been trying to use the Fermilab one but with no success
      yet. Fabrizio is coming to SLAC next week (or the one after) for
      two weeks so he can perhaps work on it then.
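
      As a sketch of the xrdcp option mentioned above, the following
      builds (and can optionally run) an xrdcp command that copies a
      job's output file to an xrootd server. The host name, paths and
      helper names are hypothetical; only the basic
      "xrdcp <source> <destination>" form with a root:// destination is
      assumed.

```python
import subprocess
from typing import List


def xrootd_url(host: str, path: str) -> str:
    """Build a root:// URL; the double slash marks an absolute server path."""
    return "root://{}//{}".format(host, path.lstrip("/"))


def build_xrdcp_command(local_file: str, host: str, remote_path: str) -> List[str]:
    """Return the argument list for copying local_file to the xrootd server."""
    return ["xrdcp", local_file, xrootd_url(host, remote_path)]


def copy_back(local_file: str, host: str, remote_path: str) -> int:
    """Run xrdcp and return its exit status (0 means success)."""
    cmd = build_xrdcp_command(local_file, host, remote_path)
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical host and path, shown without actually running the copy.
    print(" ".join(build_xrdcp_command(
        "out.root", "xrootd.example.org", "/atlas/user/out.root")))
```

      Writing directly into xrootd from the job would avoid the local
      copy altogether; the wrapper above only covers the xrdcp route.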

      Currently ATLAS is using gsiftp and SRM. FTS is used to schedule
      the transfers with these types of services. The current
      version of SRM has some shortcomings and tests are starting to be
      done with SRM 2.2. If there is more than one server then need
      to use SRM.

      If there are transfers into one xrootd cluster for import/export
      and a separate one for serving the batch system, need to worry
      about the time to migrate to and from the batch system. Are jobs
      scheduled very close to the data transfers? Should aim to get an
      SRM interface for xrootd. An issue has been getting the SRM Java
      stuff working. Can hopefully resolve this when Fabrizio is at
      SLAC.

      Can use Grid certificates to authenticate to xrootd.

      No one has been talking to Andy about ATLAS usage.

      Would like to get other people interested in using xrootd.

Action Items:

061129 Stephen	Email Andy about "ingest" rates

061129 Wei	Email other Tier-2s about xrootd

061129 Stephen	Request more release area AFS space

061122 Wei	Attempt to implement security recommendations

061115 Wei	Add monitoring disk space for DQ2 to Ranger.
        061122 Not done yet. Also need to monitor the GUMS server.

061108 Richard	Discuss with SLAC Security longterm approach to ATLAS VO
        061115 No information.

061101 Richard/Bill Convene advisory group regarding CPU/disk split.
        061108 Have emailed Gordon. To be done.
        061115 Gordon and Bill will meet today to discuss it.

061025 Stephen	Check web server approval status
        061101 Have opened up that hornet's nest.
        061108 No news for the last week. Need to keep the discussion 
        061115 Teresa trying to get web team together for a meeting
               next week.
        061122 Everyone except Stephen went to the meeting. They want
 	      to know if we are up to date with security patches. BobC
 	      would like us to use an IP range for protection in MySQL
 	      instead of the domain name. Also not sure if the web
 	      server interface needs read/write or just read
 	      access. Could set the privilege of the web server to
 	      make security tighter, also by putting it on a different
 	      machine. Believe the Site Services python scripts update
              the file. There was general unhappiness about
              how security was handled. Folk are going
 	      to take this message to the Tier-2 meeting next
 	      month. The verdict was allowed under protest. Recognised
 	      this is an important commitment but need to work with 
 	      to improve system.

061018 Wei	Test gridftp with xrootd federation
        061025 Probably not very soon, but should be on the agenda.
        061101 No change.
        061108 Will remove from Agenda and leave as action item.
        061115 Nothing done yet.
        061122 Nothing happened yet. Discussion with Wilko about
 	      setting up an xrootd machine for testing with ATLAS data
 	      transfers. With SRM need to have all xrootd machines
 	      exposed to the Internet, so not a great solution.

061004 Randy	Find out about xrootd for ATLAS plans
        061018 no news
        061025 No information yet. Andy probably knows something.
        061101 No info.
        061108 Need to get Andy involved in this discussion. Might be
 	      useful for Andy to go to the December meeting in
        061115 Will attempt to get Andy on the phone next week.
        061122 Didn't actually happen. Next time?