Hi,

We finished our initial testing of 8-core machines at DZERO for the
Level-3 trigger/DAQ (these are configured as a large farm running a
software-only trigger). Unfortunately, before we could run careful
memory usage tests we had to return the machine. The node had 8 cores
and 8 gigs of memory (x64, obviously).

We did have to make one change to the Linux kernel (easy):
/proc/sys/kernel/msgmni -- up that to 64 message queues. But I doubt
normal usage will have to deal with that -- we use these message queues
to coordinate the movement of data through the system. We don't have
exactly 8 independent executables. Rather we have an event builder in
each node which sends fully built events to one of 8 trigger
executables. These message queues do the coordination.

As far as the simple memory pressure tests we did do, with 7 of the
images running there didn't seem to be much of a problem. Certainly it
was not the bottleneck.

Sorry this isn't more helpful. But at least for the DØ Trigger we've not
hit any limits.

Cheers, Gordon.

P.S. I remember the VAX where we talked about never _ever_ using up all
the address space that 32-bit machine had. Ha!
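For anyone who hasn't used them, here is a minimal sketch of the kind of
System V message-queue coordination described above. It is not the actual
DZERO Level-3 code; the queue key, the message layout, and the round-robin
addressing of the eight trigger executables by message type are
illustrative assumptions only.

  /*
   * Illustrative sketch only -- not the D0 Level-3 code.  One System V
   * message queue is shared by an "event builder" and eight "trigger"
   * executables; the message type (1..8) selects which trigger picks up
   * an event.  Every queue in the system counts against
   * /proc/sys/kernel/msgmni (raise it with, e.g.,
   * "echo 64 > /proc/sys/kernel/msgmni").
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/ipc.h>
  #include <sys/msg.h>

  #define N_TRIGGERS 8
  #define QUEUE_KEY  0x4C33                 /* arbitrary key, an assumption */

  struct event_msg {
      long mtype;                                   /* 1..N_TRIGGERS */
      struct { long event_id; long nbytes; } body;  /* event descriptor */
  };

  int main(int argc, char **argv)
  {
      /* Both roles attach to the same queue via the fixed key. */
      int qid = msgget(QUEUE_KEY, IPC_CREAT | 0600);
      if (qid < 0) { perror("msgget"); return 1; }

      if (argc > 1 && argv[1][0] == 'b') {
          /* Event-builder role: round-robin fully built events to triggers. */
          struct event_msg m;
          for (long id = 0; id < 16; id++) {
              m.mtype = 1 + id % N_TRIGGERS;
              m.body.event_id = id;
              m.body.nbytes = 0;
              if (msgsnd(qid, &m, sizeof m.body, 0) < 0) perror("msgsnd");
          }
      } else {
          /* Trigger role: block until an event addressed to slot 1 arrives. */
          struct event_msg m;
          if (msgrcv(qid, &m, sizeof m.body, 1, 0) < 0)
              perror("msgrcv");
          else
              printf("trigger 1 got event %ld\n", m.body.event_id);
      }
      return 0;
  }

Run it once with an argument ("b") as the builder and once with no
argument as a trigger to see a single event flow through.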
> -----Original Message-----
> From: [log in to unmask] [mailto:owner-
> [log in to unmask]] On Behalf Of Stephen J. Gowdy
> Sent: Wednesday, August 15, 2007 7:07 PM
> To: ATLAS SCCS Planning
> Subject: Minutes of ATLAS/SCCS Planning Meeting 15th August 2007
>
> ATLAS SCCS Planning 15Aug2007
> -----------------------------
>
> 9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#
>
> Present: Wei, Stephen, Len, Richard, Chuck, Booker
>
> Agenda:
>
> 1. DQ2 Status/Web Proxy
>
>    BNL is reallocating machines just now so have shut down. Also
>    problems at CERN with central DQ2 services. Things at SLAC are
>    running fine.
>
> 2. Tier-2 Hardware
>
>    Every site should have the new hardware up and running by 1st
>    April. Several sites had different kinds of estimates; SLAC for
>    instance assumed the 2008 commitments were needed at the end of
>    2007. Think that JimS took 20% off the contributions suggested
>    from the sites.
>
>    The longer we wait the better value we can get, but it will also
>    mean we risk losing the rack space where they would go. Right now
>    PPA is deciding who uses what space. We know just now where it
>    would go, but may not in six months. Having had a bit of trouble
>    this year with having the money but no equipment, we should think
>    about buying earlier rather than later.
>
>    There are the water-cooled racks and some space on the second
>    floor (need to worry a little about weight). After that it isn't
>    clear where we'll have space. It might all be occupied by PII,
>    BSD, etc. It is first come, first served, with the option that
>    anyone can go to the management to get that changed. When it dawns
>    on everyone that we run out of space at the end of the year, more
>    folk will (need to) go that route.
>
>    The BaBar purchase needs to be up and running by the 1st of the
>    year. This needs a new Black Box. The purchase is being determined
>    based on the total power that can be put in it. Not likely to be
>    the fastest and hottest CPUs you can get. Might get more CPU/$$ by
>    going down a notch or two to stay within the power limit of the
>    Black Box. In the water-cooled racks you need to get the most
>    expensive CPUs to get the most out of them. It is something like
>    $500k for eight racks (of around 40U). This will take around $4M
>    in computing power. Normally to an outside contract you add 60%
>    without CEF actually doing anything, with about $350k going to the
>    contractor. The Black Box comes out at something like $3k per rack
>    unit, but you are limited by power.
>
>    The infrastructure is getting more expensive than the machines. A
>    white paper from Stanford shows this, and also the electrical bill
>    getting more expensive than the hardware in the future. In the
>    past people dominated 2:1, but this trend is reversing.
>
>    Thinking of building some terraces into the hillside around
>    IR8. Try to do it as low-cost but as well planned as possible.
>    Looking out three years ahead. This will provide facilities to
>    SLAC and Stanford. Need to somehow survive till then, perhaps with
>    more water-cooled racks, but the chilled water in Building 50 is
>    also at capacity. Black Boxes also can't be added in the same
>    place as they need separate power and cooling provided. The
>    industry might learn, as people can buy less hardware because they
>    are paying more for infrastructure, that they need to make cooler
>    stuff.
>
>    So our purchase should probably be on the same timescale as the
>    BaBar one. They are in a hurry though. We should keep open the
>    option of using the same purchasing and evaluation effort. Perhaps
>    the ATLAS stuff gets the same hardware with faster CPUs in it. So
>    will probably tie these together.
>
> 3. AOB
>
>    - SLAC ATLAS Group Allocation
>
>      It came to Richard's attention that currently the SLAC ATLAS
>      group has a special fairshare on the Tier-2 (the group in
>      question isn't actually for the SLAC ATLAS group, but for any
>      ATLAS users wanting to use the Tier-2 in batch mode). Cannot run
>      a Tier-2 and give your local users special access. There should
>      be lab-funded machines for general use for PPA employees. Will
>      put a large fraction of Black Box #1 into this. Then all groups
>      will get what they expect.
>
>      Need to try to keep separate "local" users and general ATLAS.
>      The "local" is the informal consortium of universities that
>      supports the Tier-2. Need to be able to support analysis at the
>      Tier-2 by giving them enough cycles etc. For local analysis
>      activities we can only let in folk in the Western Tier-2
>      Consortium. Many people are trying to do their own thing at
>      their institution so the load may not be as high as expected.
>
>      As a local person you could use the Tier-2 as a general ATLAS
>      person or the Tier-3 as a local user. The same issue is coming
>      up at CERN for local analysis.
>
>      Technically it isn't difficult to set up another LSF group. Not
>      quite sure how to separate disk space usage though. Lab
>      management has not been asked about an AllUsers disk pool. If
>      there will be heavy use we need some disk space funded, as we
>      cannot use the Tier-2 disk space for local usage only. It is
>      thought that the 20% reduction by JimS was for US usage. Trying
>      to serve a diffuse community with a storage area isn't easy;
>      best to have a well-identified set of users.
>
>      Something that works for a production activity with one master
>      won't necessarily work in a more chaotic usage. BaBar is looking
>      at using the local disk as a temporary storage area for
>      skimming, which could be there for many hours. Could end up
>      having more storage on the nodes due to the more-or-less minimum
>      size disks coming with machines these days.
>
>    - xrootd
>
>      Some issues with redirector and ATLAS software, not sure where
>      the problem is.
>
>      Wei thinks there may also be some problems with 64-bit, but
>      there are some other issues Andy should address.
>
> Action Items:
> -------------
>
> 070815 Wei     Think about how we maintain lists of local people etc
>
> 070801 Stephen See when new Tier-2 hardware is needed
>        070815  Received answers independently.
>
> 070725 Stephen Try to test eval01
>        070801  Didn't have access when attempted, Booker fixed that.
>                Problem with ATLAS software (hopefully trivial).
>        070815  Installed new software to get around the problem. Not
>                tested yet. Have ordered 128 cousins for it, have
>                bought the machine.
>
> 070711 Stephen Find out about benchmarks for CPUs for next purchase
>        070718  Not done yet.
>        070801  Extracted data from our Production Database, need to
>                analyse it still.
>        070815  Not done yet.