On Fri, 18 May 2018, Andrew Hanushevsky wrote:
> From my reading of Lustre blogs, the consensus is that it degrades performance
> of the metadata server, and unless you have an overpowering reason to enable
> extended attributes, you should disable them. I suspect that was the reason.
Yes, that's what I also suspect. We (LCLS) also have it turned off (but I
can turn it on).
Cheers,
Wilko
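As a side note, whether a given mount honors user extended attributes can be
checked directly from the shell. This is a minimal sketch, not the actual
NERSC setup: the mount point argument is a placeholder (defaulting to /tmp
here), and on the DTNs it would be the Lustre scratch mount instead.

```shell
#!/bin/sh
# Probe whether a filesystem honors user extended attributes.
# The mount point is a placeholder argument; defaults to /tmp for
# illustration. On a real check, pass the Lustre mount point instead.
MNT="${1:-/tmp}"
probe="$MNT/.xattr_probe.$$"
touch "$probe"
# setfattr fails with "Operation not supported" (ENOTSUP) when the
# filesystem is mounted without the user_xattr option.
if setfattr -n user.test -v 1 "$probe" 2>/dev/null; then
    echo "user_xattr supported on $MNT"
else
    echo "user_xattr NOT supported on $MNT"
fi
rm -f "$probe"
```

On a Lustre client mounted without user_xattr, the setfattr call fails with
"Operation not supported", which is the same error XCache reports further
down in this thread when it tries to set its own attributes on cached files.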
> Andy
>
> On Fri, 18 May 2018, Wilko Kroeger wrote:
>
>>
>> Hello Wei
>>
>> I think the issue is that Lustre is mounted without xattr support (the
>> user_xattr mount option). Maybe one could ask NERSC if they would be willing
>> to enable it, at least on the dtn's. There might be some reason why they don't.
>>
>> Cheers,
>> Wilko
>>
>>
>> On Fri, 18 May 2018, Yang, Wei wrote:
>>
>>> Hi Andy,
>>>
>>> In order to use NERSC Lustre, I followed your suggestion and added noxattr
>>> to all.export (at least that is what XrdOucConfig.cc says), but I got:
>>>
>>> 180518 10:44:16 6234 proxy_Export: warning, invalid path option noxattr
>>> =====> all.export /atlas/rucio stage r/o noxattr
>>> 180518 10:44:16 6234 proxy_Export: warning, invalid path option noxattr
>>> =====> all.export /root:/ stage r/o noxattr
>>> 180518 10:44:16 6234 proxy_Export: warning, invalid path option noxattr
>>> =====> all.export /xroot:/ stage r/o noxattr
>>> ...
>>> 180518 10:56:37 6240 XrootdXeq: yangw.20594:27@cori09-224 pub IPv4 login
>>> 180518 10:56:37 6240 XrdFileCache_Manager: info Cache::Attach()
>>> root://27@localfile:1094//atlas/rucio/transient/44/f0/panda.HITS.14132300._000997.pool.root.1.14132300-3933394871-13456989456-468-8.zip
>>> 180518 10:56:37 6240 ofs_FAttr: Unable to set attr XrdFrm.Pfn from
>>> /global/cscratch1/sd/yangw/xcache/dtn04.nersc.gov/xrd/datafiles/data/00/F010FF5A5A18000000008037cd1500000000404%;
>>> operation not supported
>>> 180518 10:56:37 6240 XrdFileCache_File: error File::Open() Create failed
>>> for data file
>>> /atlas/rucio/transient/44/f0/panda.HITS.14132300._000997.pool.root.1.14132300-3933394871-13456989456-468-8.zip,
>>> err=Operation not supported
>>> /atlas/rucio/transient/44/f0/panda.HITS.14132300._000997.pool.root.1.14132300-3933394871-13456989456-468-8.zip
>>> 180518 10:56:39 6240 XrdFileCache_IO: info IOEntireFile::Detach()
>>> 0x1e19300
>>> 180518 10:56:39 6240 XrootdXeq: yangw.20594:27@cori09-224 disc 0:00:02
>>>
>>> And again, the file is not cached. So I switched back to using GPFS.
>>>
>>> --
>>> Wei Yang | [log in to unmask] | 650-926-3338 (O)
>>>
>>>
>>> On 5/17/18, 10:20 PM, "Yang, Wei" <[log in to unmask]> wrote:
>>>
>>> The log is at
>>> /global/project/projectdirs/atlas/xcache/cache/dtn04.nersc.gov/xrd/var/log/xrootd.log.
>>> Nothing interesting there.
>>>
>>> I hit the same hanging issue on that file. RUCIO returns a long list of
>>> data sources in the metalink. I know that the 1st data source was Univ. of
>>> Victoria, which does not work (even from SLAC). I manually changed the 1st
>>> data source to Univ. of Chicago, but it still hung. So I attached gdb and
>>> tried a few other files; all worked. But when I quit gdb, I saw this:
>>>
>>> (gdb) c
>>> Continuing.
>>> [New Thread 0x7f12cd764780 (LWP 12188)]
>>>
>>> Program received signal SIGUSR1, User defined signal 1.
>>> [Switching to Thread 0x7f12cd74a780 (LWP 8916)]
>>> 0x00007f12cc95879b in do_futex_wait.constprop.1 () from
>>> /lib64/libpthread.so.0
>>> (gdb) c
>>> Continuing.
>>> [Thread 0x7f12cd764780 (LWP 12188) exited]
>>> [New Thread 0x7f12cd75c780 (LWP 12271)]
>>>
>>> Program received signal SIGUSR1, User defined signal 1.
>>> 0x00007f12cc95879b in do_futex_wait.constprop.1 () from
>>> /lib64/libpthread.so.0
>>> (gdb) quit
>>> A debugging session is active.
>>>
>>> Inferior 1 [process 8916] will be detached.
>>>
>>> Quit anyway? (y or n) y
>>> Detaching from program:
>>> /global/project/projectdirs/atlas/xcache/test/git/xrdbld/src/xrootd,
>>> process 8916
>>>
>>> I don't know where this SIGUSR1 is coming from, and I quit anyway.
>>> But then that hanging file started working! At this point I don't know
>>> what is going on. The same xcache instance is still running. I will keep
>>> an eye on it.
>>>
>>> --
>>> Wei Yang | [log in to unmask] | 650-926-3338(O)
>>>
>>> -----Original Message-----
>>> From: Vakho Tsulaia <[log in to unmask]>
>>> Date: Thursday, May 17, 2018 at 9:51 PM
>>> To: Andrew Hanushevsky <[log in to unmask]>
>>> Cc: Wei Yang <[log in to unmask]>, Zachary Marshall
>>> <[log in to unmask]>, Paolo Calafiura <[log in to unmask]>
>>> Subject: Re: Interest in an LBNL project
>>>
>>> Hi Andy,
>>>
>>> > Could you send the xrootd log from dtn04 (I really should get a
>>> > NERSC account).
>>>
>>> I don't know how to get this log. Perhaps Wei can help?
>>>
>>> -- vakho
>>>
>>>
>>> On 05/17/2018 05:06 PM, Andrew Hanushevsky wrote:
>>> > Hi Vakho,
>>> >
>>> > Something happened at the server on dtn04 and it thinks it doesn't
>>> > have access to the file but will at some point in the future. So, it
>>> > is waiting for the future to arrive and stalling the client until
>>> > then. Could you send the xrootd log from dtn04 (I really should get
>>> > a NERSC account).
>>> >
>>> > Andy
>>> >
>>> > On Thu, 17 May 2018, Vakho Tsulaia wrote:
>>> >
>>> >> Hi Wei,
>>> >>
>>> >>> Take the first one as an example:
>>> >>>
>>> >>> yangw@cori02 $ ~yangw/bin/xrdcp -f
>>> >>>
>>> root://dtn04.nersc.gov//atlas/rucio/mc16_13TeV:EVNT.13836203._000001.pool.root.1
>>> >>> /dev/null
>>> >>>
>>> [213.7MB/213.7MB][100%][==================================================][7.914MB/s]
>>> >>>
>>> >>> yangw@cori02 $ ~yangw/bin/xrdcp -f
>>> >>>
>>> root://dtn04.nersc.gov//atlas/rucio/mc16_13TeV:EVNT.13836203._000001.pool.root.1
>>> >>> /dev/null
>>> >>>
>>> [213.7MB/213.7MB][100%][==================================================][213.7MB/s]
>>> >>>
>>> >> Yesterday I played around with it from a Shifter container. I
>>> >> successfully downloaded several EVNT files using commands like
>>> >> the following one (for example):
>>> >>
>>> >> xrdcp -f
>>> >>
>>> root://dtn04.nersc.gov//atlas/rucio/mc16_13TeV:EVNT.13836203._000001.pool.root.1
>>> >> EVNT.13836203._000001.pool.root.1
>>> >>
>>> >> But then at some point this command stopped working for me; it
>>> >> was hanging forever with no response. So I reran it with '-d 3'
>>> >> and it started to generate a log which looked like an infinite
>>> >> loop. At some point I killed it and saved the log (attached).
>>> >>
>>> >> Could you please have a look at it and tell me what's going on
>>> >> there?
>>> >>
>>> >> Thanks,
>>> >> -- vakho
>>> >>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XCACHE-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XCACHE-L&A=1
>>>
>>
>