Hi,
We finished our initial testing of 8-core machines at DZERO for the Level-3 trigger/DAQ (these are configured as a large farm running a software-only trigger). Unfortunately, before we could run careful memory usage tests we had to return the machine. The node had 8 cores and 8 GB of memory (x64, obviously).
We did have to make one change to the Linux kernel (easy): /proc/sys/kernel/msgmni -- up that to 64 message queues. But I doubt normal usage will have to deal with that -- we use these message queues to coordinate the movement of data through the system. We don't have exactly 8 independent executables; rather, we have an event builder in each node which sends fully built events to one of 8 trigger executables. These message queues do the coordination.
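In case it helps, here's a minimal sketch of the kind of System V message queue hand-off I mean. It is not our actual code -- the key, message layout, and strings are invented just for illustration. Raising msgmni itself is just "echo 64 > /proc/sys/kernel/msgmni" as root (it counts queues system-wide, and with 8 trigger executables plus the event builder and other coordination queues per node the default can be too low).

    /* Sketch only: one System V message queue per trigger process,
     * used to hand a built-event descriptor from the event builder
     * to a trigger executable. Key and message layout are invented. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct evt_msg {
        long mtype;        /* must be > 0; 1 = "event ready" here */
        char mtext[256];   /* in reality a descriptor of the event buffer */
    };

    int main(void)
    {
        key_t key = ftok("/tmp", 42);             /* illustrative key only */
        int qid = msgget(key, IPC_CREAT | 0666);  /* one such queue per trigger */
        if (qid < 0) { perror("msgget"); return 1; }

        struct evt_msg m = { .mtype = 1 };
        strcpy(m.mtext, "event 12345 built, buffer at offset 0x1000");

        /* Event builder side: hand the event descriptor to the trigger. */
        if (msgsnd(qid, &m, sizeof(m.mtext), 0) < 0) { perror("msgsnd"); return 1; }

        /* Trigger side: block until an event descriptor arrives. */
        if (msgrcv(qid, &m, sizeof(m.mtext), 1, 0) < 0) { perror("msgrcv"); return 1; }
        printf("trigger got: %s\n", m.mtext);
        return 0;
    }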
As for the simple memory pressure tests we did do, with 7 of the images running there didn't seem to be much of a problem. Memory was certainly not the bottleneck.
Sorry this isn't more helpful. But at least for the DØ Trigger we've not hit any limits.
Cheers,
Gordon.
P.S. I remember the VAX, where we talked about never _ever_ using up all the address space that 32-bit machine had. Ha!
> -----Original Message-----
> From: [log in to unmask] [mailto:owner-
> [log in to unmask]] On Behalf Of Stephen J. Gowdy
> Sent: Wednesday, August 15, 2007 7:07 PM
> To: ATLAS SCCS Planning
> Subject: Minutes of ATLAS/SCCS Planning Meeting 15th August 2007
>
> ATLAS SCCS Planning 15Aug2007
> -----------------------------
>
> 9am, SCCS Conf Rm A, to call in +1 510 665 5437, press 1, 3935#
>
> Present: Wei, Stephen, Len, Richard, Chuck, Booker
>
> Agenda:
>
> 1. DQ2 Status/Web Proxy
>
> BNL is reallocating machines just now so has shut down. There are
> also problems at CERN with the central DQ2 services. Things at
> SLAC are running fine.
>
> 2. Tier-2 Hardware
>
> Every site should have the new hardware up and running by 1st
> April. Several sites had different kinds of estimates; SLAC, for
> instance, assumed the 2008 commitments were needed at the end of
> 2007. Think that JimS took 20% off the contributions suggested by
> the sites.
>
> The longer we wait the better value we can get, but it will also
> mean we risk losing the rack space where the machines would
> go. Right now PPA is deciding who uses what space. We know just
> now where they would go, but not in six months' time. Having had
> a bit of trouble this year with having the money but no
> equipment, we should think about buying earlier rather than later.
>
> There are the water-cooled racks and some space on the second
> floor (need to worry a little about weight). After that it isn't
> clear where we'll have space. It might all be occupied by PII,
> BSD, etc. It is first come, first served, with the option that
> anyone can go to the management to get that changed. As it dawns
> on everyone that we run out of space at the end of the year, more
> folk will (need to) go that route.
>
> The BaBar purchase needs to be up and running by the 1st of the
> year. This needs a new Black Box. The purchase is being determined
> by the total power that can be put in it, so it is not likely to
> be the fastest and hottest CPUs you can get; we might get more
> CPU/$$ by going down a notch or two to stay within the power
> limit of the Black Box. In the water-cooled racks you need to get
> the most expensive CPUs to get the most out of them. It is
> something like $500k for eight racks (of around 40U), and these
> will take around $4M in computing power. Normally to an outside
> contract you add 60% without CEF actually doing anything, with
> about $350k going to the contractor. The Black Box comes out at
> something like $3k per rack unit, but you are limited by power.
>
> The infrastructure is getting more expensive than the machines. A
> white paper from Stanford shows this, and also the electrical
> bill getting more expensive than the hardware in the future. In
> the past people dominated 2:1, but this trend is reversing.
>
> Thinking of building some terraces into the hillside around
> IR8. Try to do it as low-cost but as well planned as possible,
> looking three years ahead. This will provide facilities to SLAC
> and Stanford. Need to somehow survive till then, perhaps with
> more water-cooled racks, but the chilled water in Building 50 is
> also at capacity. Black Boxes also can't be added in the same
> place, as they need separate power and cooling provided. The
> industry might learn to make cooler-running stuff, as people can
> buy less hardware when they are paying more for infrastructure.
>
> So our purchase should probably be on the same timescale as the
> BaBar one. They are in a hurry though. We should keep open the
> option of using the same purchasing and evaluation effort. Perhaps
> the ATLAS purchase gets the same hardware with faster CPUs in
> it. So we will probably tie these together.
>
> 3. AOB
>
> - SLAC ATLAS Group Allocation
>
> It came to Richard's attention that currently the SLAC ATLAS
> group has a special fairshare on the Tier-2 (the group in
> question isn't actually for the SLAC ATLAS group, but for any
> ATLAS users wanting to use the Tier-2 in batch mode). You cannot
> run a Tier-2 and give your local users special access. There
> should be lab-funded machines for general use by PPA
> employees. Will put a large fraction of Black Box #1 into
> this. Then all groups will get what they expect.
>
> Need to try to keep "local" users and general ATLAS users
> separate. The "local" users are the informal consortium of
> universities that supports the Tier-2. Need to be able to support
> analysis at the Tier-2 by giving them enough cycles etc. For
> local analysis activities we can only let in folk from the
> Western Tier-2 Consortium. Many people are trying to do their own
> thing at their institution so the load may not be as high as
> expected.
>
> As a local person you could use the Tier-2 as a general ATLAS
> person, or the Tier-3 as a local user. The same issue is coming
> up at CERN for local analysis there.
>
> Technically it isn't difficult to set up another LSF group and
> fairshare (rough sketch below). Not quite sure how to separate
> disk space usage though. Lab management has not been asked about
> an AllUsers disk pool. If there will be heavy use we need some
> disk space funded, as we cannot use the Tier-2 disk space for
> local usage only. It is thought that the 20% reduction by JimS
> was for US usage. Trying to serve a diffuse community with a
> storage area isn't easy; best to have a well-identified set of
> users.
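>
> As a rough sketch only (group names, members, and share values
> here are invented for illustration, not an agreed configuration),
> a separate group and queue-level fairshare would look something
> like this in lsb.users and lsb.queues:
>
>   Begin UserGroup
>   GROUP_NAME       GROUP_MEMBER
>   wt2_local        (alice bob)    # Western Tier-2 consortium users
>   atlas_general    (all)          # any ATLAS user of the Tier-2
>   End UserGroup
>
>   Begin Queue
>   QUEUE_NAME = atlas-t2
>   FAIRSHARE  = USER_SHARES[[wt2_local, 30] [atlas_general, 70]]
>   End Queue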
>
> Something that works for a production activity with one master
> won't necessarily work for more chaotic usage. BaBar is looking
> at using the local disk as a temporary storage area for skimming,
> where data could sit for many hours. We could end up having more
> storage on the nodes, given the sizeable disks that come as
> more-or-less the minimum with machines these days.
>
> - xrootd
>
> Some issues with the redirector and ATLAS software; not sure
> where the problem is.
>
> Wei thinks there may also be some problems with 64-bit, but there
> are some other issues Andy should address.
>
> Action Items:
> -------------
>
> 070815 Wei Think about how we maintain lists of local people etc
>
> 070801 Stephen See when new Tier-2 hardware is needed
> 070815 Received answers independently.
>
> 070725 Stephen Try to test eval01
> 070801 Didn't have access when attempted, Booker fixed that.
> Problem with ATLAS software (hopefully trivial).
> 070815 Installed new software to get around the problem. Not
> tested yet. Have ordered 128 cousins for it, have bought
> the machine.
>
> 070711 Stephen Find out about benchmarks for CPUs for next
> purchase
> 070718 Not done yet.
> 070801 Extracted data from our Production Database, need to
> analyse it still.
> 070815 Not done yet.