I replaced the ROOT version with the one recommended by the ATLAS T3 PROOF working group. It includes the PQ2 tools.

export ROOTSYS=/afs/slac.stanford.edu/g/atlas/packages/root/root5.26.00-proof-slc5_amd64-gcc43
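
To pick up this build on the client, the environment can be set the same way as in my recipe below, just pointing ROOTSYS at the new build (a sketch only, assuming bash on a rhel5-64 machine):

. /afs/slac/g/atlas/packages/gcc432/setup.sh
export ROOTSYS=/afs/slac.stanford.edu/g/atlas/packages/root/root5.26.00-proof-slc5_amd64-gcc43
export PATH=${PATH}:$ROOTSYS/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ROOTSYS/lib
$ROOTSYS/bin/root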

regards,
Wei Yang  |  [log in to unmask]  |  650-926-3338(O)


On Jun 9, 2010, at 5:18 PM, Yang, Wei wrote:

> Hi Bart,
> 
> I made another attempt. Here is what I used to start ROOT on the client side (assuming bash) on a rhel5-64 machine:
> 
> . /afs/slac/g/atlas/packages/gcc432/setup.sh
> export ROOTSYS=/afs/slac.stanford.edu/g/atlas/packages/root/root5.26.00b-slc5_amd64-gcc43
> export PATH=${PATH}:$ROOTSYS/bin
> export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ROOTSYS/lib
> $ROOTSYS/bin/root
> 
> It seems I was able to load a .par file.  Can you give it a try? Also, remember that on atlint01, if you copy a file to /xrootd/proof/bcbutler, you should use TDSet::Add("root://boer0123//atlas/proof/bcbutler/..."). However, I found that reading from T2 storage seems to be faster than reading from the disks in the proof cluster (without localizer).
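> 
> In a ROOT session this looks roughly like the following (only a sketch; the package, tree, file, and selector names are placeholders for your own):
> 
> TProof *p = TProof::Open("boer0123");        // connect to the PROOF master
> p->UploadPackage("MyAnalysis.par");          // ship the .par file to the cluster
> p->EnablePackage("MyAnalysis");              // build and load it on every worker
> TDSet *ds = new TDSet("TTree", "physics");   // "physics" is a placeholder tree name
> ds->Add("root://boer0123//atlas/proof/bcbutler/myfile.root");  // or a T2 storage URL
> ds->Process("MySelector.C+");                // compile and run your selector on the workers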
> 
> regards,
> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
> 
> 
> On Apr 29, 2010, at 3:25 PM, Bart Butler wrote:
> 
>> First things first: I think I killed your cluster. The xrootd mount is no longer readable from atlint01 and I can't submit PROOF jobs to it anymore. This happened after I manually killed my client ROOT session following a massively screwed-up job.
>> 
>> Secondly, I am having a hell of a time compiling my shared library correctly. Which version of ROOT is the cluster running? If I'm not running the exact same ROOT version and gcc version as every worker node, I can't make binaries (which is what Booker seems to have done with his test package; I do it too when I run PROOF-Lite). And if I can't make binaries, I have to submit source packages. This should be fine, but it's never worked well for me. My first theory was that the compilation errors I was getting from some workers happened because the packages are kept in a common place on xrootd in my user space, so all 32 workers (I was never able to connect to 4 of the 36) tried to compile the package at the same time in the same place. Running on a single worker worked fine (but of course was slow). I don't think this compilation issue was the whole story though, because if the single-worker run worked, the next time all workers should have been able to load the compiled version without problems (assuming they are all running the same version of ROOT), yet they crashed and burned just as badly that time. That's when the cluster itself crashed.
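>> 
>> (For what it's worth, the quick check I can do on my side is just something like the following; I would need the matching numbers from the workers to compare against:
>> 
>> root-config --version   # ROOT version of my client build
>> gcc --version           # compiler used for my binaries
>> 
>> but that only tells me my half of it.)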
>> 
>> Another thing was that making TDSets from the Tier 2 xrootd storage worked fine, but when I tried using the same files I had copied to the cluster xrootd storage, it couldn't find them for some reason.
>> 
>> My log files should be in /xrootd/proof/bcbutler if you guys get the cluster working again.
>> 
>> -Bart
>> 
>> 
>> Yang, Wei wrote:
>>> Hi Bart, David,
>>> 
>>> any news on this?
>>> 
>>> regards,
>>> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
>>> 
>>> 
>>> On Apr 21, 2010, at 12:03 PM, Bart Butler wrote:
>>> 
>>>> I'll try to run a few jobs tonight and see what happens.
>>>> 
>>>> -Bart
>>>> 
>>>> Yang, Wei wrote:
>>>> 
>>>> 
>>>>> [add Andy Hass ...]
>>>>> 
>>>>> Hi David, Booker,
>>>>> 
>>>>> I mounted the xrootd space of the proof cluster at /xrootd/proof on atlint01.  It looks like we have ~1.8TB total on the cluster. So something ~ 1TB should work.
>>>>> 
>>>>> The cluster should be able to access T2 storage if you provide the URLs of the ROOT files to process. But the whole idea of using PROOF is to avoid network traffic as much as possible. As we are still validating the functionality, it would be good to try both. Or you could put half of the data on the proof cluster and leave the other half on T2 storage (no NFS, please).
>>>>> 
>>>>> The proof master node is boer0123. If you copy files to the cluster, the xroot URL is root://boer0123//atlas/proof (I suggest you create a fizisist sub-dir).
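>>>>> 
>>>>> For example, from atlint01 something like this should work through the mount (just a sketch; myfile.root is a placeholder for your own file):
>>>>> 
>>>>> mkdir /xrootd/proof/fizisist
>>>>> cp myfile.root /xrootd/proof/fizisist/
>>>>> 
>>>>> and the file would then be root://boer0123//atlas/proof/fizisist/myfile.root in your TDSet.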
>>>>> 
>>>>> Booker, it looks like PROOF also leaves some files on the cluster. How would you suggest we manage the space: by user, by group, or something else?
>>>>> 
>>>>> regards,
>>>>> Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
>>>>> 
>>>>> 
>>>>> On Apr 21, 2010, at 8:40 AM, David W. Miller wrote:
>>>>> 
>>>>>> Hi Booker and Wei,
>>>>>> 
>>>>>> I have a few questions: from what machine do we launch the jobs? Any machine at SLAC, as long as we specify the URI correctly? Also, if the data are on atlasuserdisk or usr in /xrootd/atlas/, is that sufficient?
>>>>>> 
>>>>>> Thanks,
>>>>>> David
>>>>>> 
>>>>>> On Apr 21, 2010, at 17:36, Ariel Schwartzman wrote:
>>>>>> 
>>>>>>> From: Booker Bense <[log in to unmask]>
>>>>>>> Date: April 21, 2010 16:09:51 GMT+02:00
>>>>>>> To: "Schwartzman, Ariel G." <[log in to unmask]>
>>>>>>> Cc: "Yang, Wei" <[log in to unmask]>
>>>>>>> Subject: Re: Proof cluster ready for testing
>>>>>>> 
>>>>>>> On Wed, 21 Apr 2010, Ariel Schwartzman wrote:
>>>>>>> 
>>>>>>>> Hi Booker,
>>>>>>>> 
>>>>>>>> I cannot access this machine remotely:
>>>>>>>> 
>>>>>>>>> ssh -Y boer0123.slac.stanford.edu
>>>>>>>>> 
>>>>>>>> ssh: connect to host boer0123.slac.stanford.edu port 22: Operation timed out
>>>>>>>> 
>>>>>>> It's on the SLAC internal network; you'll need to log in to a SLAC
>>>>>>> machine and run ROOT programs from there. You shouldn't need
>>>>>>> login access to the master node.
>>>>>>> 
>>>>>>> _ Booker C. Bense
>>>>>>> 
>>>>>> ==========================================
>>>>>> David W. Miller
>>>>>> ------------------------------------------
>>>>>> SLAC
>>>>>> Stanford University
>>>>>> Department of Physics
>>>>>> 
>>>>>> SLAC Info: Building 84, B-156. Tel: +1.650.926.3730
>>>>>> CERN Info: Building 01, 1-041. Tel: +41.76.487.2484
>>>>>> 
>>>>>> EMAIL:    [log in to unmask]
>>>>>> HOMEPAGE: http://cern.ch/David.W.Miller
>>>>>> ========================================== 