XCACHE-L Archives

XCACHE-L@LISTSERV.SLAC.STANFORD.EDU



XCACHE-L May 2018

Subject: Re: Interest in an LBNL project
From: "Yang, Wei" <[log in to unmask]>
Reply-To: list for xcache development and deployment <[log in to unmask]>
Date: Fri, 18 May 2018 20:31:04 +0000
Content-Type: text/plain

I think those HPC shared filesystems already get too much metadata I/O. They are unlikely to want more.

--
Wei Yang  |  [log in to unmask]  |  650-926-3338 (O)
 

On 5/18/18, 12:36 PM, "Andrew Hanushevsky" <[log in to unmask]> wrote:

    You can turn it on if you configured the metadata server to support it. 
    That can be problematic as you need to reserve space for the attributes. 
    Unfortunately, when you do that the metadata server has to read more from 
    disk than it would otherwise, even if you disabled it. So, some people 
    forgo reserving the space altogether which makes it impossible to enable 
    attributes after the fact. That's what I understand from the configuration 
    options.
    
    Andy
    
    On Fri, 18 May 2018, Wilko Kroeger wrote:
    
    >
    > On Fri, 18 May 2018, Andrew Hanushevsky wrote:
    >
    >> From my reading of Lustre blogs, the consensus is that it degrades performance 
    >> of the metadata server and unless you have an overpowering reason to enable 
    >> extended attributes, you should disable them. I suspect that was the 
    >> reason.
    >
    > Yes, that is what I also suspect. We (LCLS) also have it turned off (but I can 
    > turn it on).
    >
    > Cheers,
    >  Wilko
    >
    >> Andy
    >> 
    >> On Fri, 18 May 2018, Wilko Kroeger wrote:
    >> 
    >>> 
    >>> Hello Wei
    >>> 
    >>> I think the issue is that Lustre is mounted without xattr support 
    >>> (user_xattr). Maybe one could ask NERSC if they would be willing to enable it, 
    >>> at least on the DTNs. There might be some reason why they don't do it.
    >>> 
    >>> Cheers,
    >>>  Wilko
    >>> 
    >>> 
    >>> On Fri, 18 May 2018, Yang, Wei wrote:
    >>> 
    >>>> Hi Andy,
    >>>> 
    >>>> In order to use NERSC Lustre, I followed your suggestion and added noxattr 
    >>>> to all.export (at least that is what XrdOucConfig.cc says), but I got:
    >>>> 
    >>>> 180518 10:44:16 6234 proxy_Export: warning, invalid path option noxattr
    >>>> =====> all.export /atlas/rucio stage r/o noxattr
    >>>> 180518 10:44:16 6234 proxy_Export: warning, invalid path option noxattr
    >>>> =====> all.export /root:/ stage r/o noxattr
    >>>> 180518 10:44:16 6234 proxy_Export: warning, invalid path option noxattr
    >>>> =====> all.export /xroot:/ stage r/o noxattr
    >>>> ...
    >>>> 180518 10:56:37 6240 XrootdXeq: yangw.20594:27@cori09-224 pub IPv4 login
    >>>> 180518 10:56:37 6240 XrdFileCache_Manager: info Cache::Attach() 
    >>>> root://27@localfile:1094//atlas/rucio/transient/44/f0/panda.HITS.14132300._000997.pool.root.1.14132300-3933394871-13456989456-468-8.zip
    >>>> 180518 10:56:37 6240 ofs_FAttr: Unable to set attr XrdFrm.Pfn from 
    >>>> /global/cscratch1/sd/yangw/xcache/dtn04.nersc.gov/xrd/datafiles/data/00/F010FF5A5A18000000008037cd1500000000404%; 
    >>>> operation not supported
    >>>> 180518 10:56:37 6240 XrdFileCache_File: error File::Open() Create failed 
    >>>> for data file 
    >>>> /atlas/rucio/transient/44/f0/panda.HITS.14132300._000997.pool.root.1.14132300-3933394871-13456989456-468-8.zip, 
    >>>> err=Operation not supported 
    >>>> /atlas/rucio/transient/44/f0/panda.HITS.14132300._000997.pool.root.1.14132300-3933394871-13456989456-468-8.zip
    >>>> 180518 10:56:39 6240 XrdFileCache_IO: info IOEntireFile::Detach() 
    >>>> 0x1e19300
    >>>> 180518 10:56:39 6240 XrootdXeq: yangw.20594:27@cori09-224 disc 0:00:02
    >>>> 
    >>>> And again, the file is not cached. So I switched back to using GPFS.
    >>>> 
    >>>> --
    >>>> Wei Yang  |  [log in to unmask]  |  650-926-3338 (O)
    >>>> 
    >>>> 
    >>>> On 5/17/18, 10:20 PM, "Yang, Wei" <[log in to unmask]> wrote:
    >>>>
    >>>>    The log is at 
    >>>> /global/project/projectdirs/atlas/xcache/cache/dtn04.nersc.gov/xrd/var/log/xrootd.log. 
    >>>> Nothing interesting there.
    >>>>
    >>>>    I met the same hanging issue on that file. RUCIO returns a long list 
    >>>> of data sources in metalink. I know that the 1st data source was Univ. of 
    >>>> Victoria, which does not work (even from SLAC). I manually changed the 1st 
    >>>> data source to Univ. of Chicago but it still hung. So I attached gdb and 
    >>>> tried a few other files, and they all worked. But when I quit gdb, I saw this:
    >>>>
    >>>>    (gdb) c
    >>>>    Continuing.
    >>>>    [New Thread 0x7f12cd764780 (LWP 12188)]
    >>>>
    >>>>    Program received signal SIGUSR1, User defined signal 1.
    >>>>    [Switching to Thread 0x7f12cd74a780 (LWP 8916)]
    >>>>    0x00007f12cc95879b in do_futex_wait.constprop.1 () from 
    >>>> /lib64/libpthread.so.0
    >>>>    (gdb) c
    >>>>    Continuing.
    >>>>    [Thread 0x7f12cd764780 (LWP 12188) exited]
    >>>>    [New Thread 0x7f12cd75c780 (LWP 12271)]
    >>>>
    >>>>    Program received signal SIGUSR1, User defined signal 1.
    >>>>    0x00007f12cc95879b in do_futex_wait.constprop.1 () from 
    >>>> /lib64/libpthread.so.0
    >>>>    (gdb) quit
    >>>>    A debugging session is active.
    >>>>
    >>>>    	Inferior 1 [process 8916] will be detached.
    >>>>
    >>>>    Quit anyway? (y or n) y
    >>>>    Detaching from program: 
    >>>> /global/project/projectdirs/atlas/xcache/test/git/xrdbld/src/xrootd, 
    >>>> process 8916
    >>>>
    >>>>    I don't know where this SIGUSR1 is coming from, and I quit anyway. 
    >>>> But then that hanging file started working! At this point I don't know 
    >>>> what is going on. The same xcache instance is still running. I will keep 
    >>>> an eye on it.
    >>>>
    >>>>    --
    >>>>    Wei Yang  |  [log in to unmask]  |  650-926-3338(O)
    >>>>
    >>>>    -----Original Message-----
    >>>>    From: Vakho Tsulaia <[log in to unmask]>
    >>>>    Date: Thursday, May 17, 2018 at 9:51 PM
    >>>>    To: Andrew Hanushevsky <[log in to unmask]>
    >>>>    Cc: Wei Yang <[log in to unmask]>, Zachary Marshall 
    >>>> <[log in to unmask]>, Paolo Calafiura <[log in to unmask]>
    >>>>    Subject: Re: Interest in an LBNL project
    >>>>
    >>>>        Hi Andy,
    >>>>
    >>>>         > Could you send the xrootd log from dtn04 (I really should get a
    >>>>         > NERSC account).
    >>>>
    >>>>        I don't know how to get this log. Perhaps Wei can help?
    >>>>
    >>>>        -- vakho
    >>>> 
    >>>>
    >>>>        On 05/17/2018 05:06 PM, Andrew Hanushevsky wrote:
    >>>>        > Hi Vakho,
    >>>>        >
    >>>>        > Something happened at the server on dtn04 and it thinks it doesn't
    >>>>        > have access to the file but will some time in the future. So, it is
    >>>>        > waiting for the future to arrive and stalling the client until then.
    >>>>        > Could you send the xrootd log from dtn04 (I really should get a NERSC
    >>>>        > account).
    >>>>        >
    >>>>        > Andy
    >>>>        >
    >>>>        > On Thu, 17 May 2018, Vakho Tsulaia wrote:
    >>>>        >
    >>>>        >> Hi Wei,
    >>>>        >>
    >>>>        >>> Take the first one as an example:
    >>>>        >>>
    >>>>        >>> yangw@cori02 $ ~yangw/bin/xrdcp -f
    >>>>        >>> 
    >>>> root://dtn04.nersc.gov//atlas/rucio/mc16_13TeV:EVNT.13836203._000001.pool.root.1
    >>>>        >>> /dev/null
    >>>>        >>> 
    >>>> [213.7MB/213.7MB][100%][==================================================][7.914MB/s]
    >>>>        >>>
    >>>>        >>> yangw@cori02 $ ~yangw/bin/xrdcp -f
    >>>>        >>> 
    >>>> root://dtn04.nersc.gov//atlas/rucio/mc16_13TeV:EVNT.13836203._000001.pool.root.1
    >>>>        >>> /dev/null
    >>>>        >>> 
    >>>> [213.7MB/213.7MB][100%][==================================================][213.7MB/s]
    >>>>        >>>
    >>>>        >> Yesterday I played around with it from a Shifter container. I
    >>>>        >> successfully downloaded several EVNT files using commands
    >>>>        >> like the following one (for example):
    >>>>        >>
    >>>>        >> xrdcp -f
    >>>>        >> 
    >>>> root://dtn04.nersc.gov//atlas/rucio/mc16_13TeV:EVNT.13836203._000001.pool.root.1
    >>>>        >> EVNT.13836203._000001.pool.root.1
    >>>>        >>
    >>>>        >> But then at some point this command stopped working for me, it was
    >>>>        >> hanging forever with no response. So I reran it with
    >>>>        >> '-d 3' and the thing started to generate a log which looked like an
    >>>>        >> infinite loop. At some point I killed it and saved the log (attached).
    >>>>        >>
    >>>>        >> Could you please have a look at it and tell me what's going on there?
    >>>>        >>
    >>>>        >> Thanks,
    >>>>        >> -- vakho
    >>>>        >>
    >>>> 
    >>>> 
    >>>> 
    >>>> 
    >>>> 
    >>>> 
    >>>> ########################################################################
    >>>> Use REPLY-ALL to reply to list
    >>>> 
    >>>> To unsubscribe from the XCACHE-L list, click the following link:
    >>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XCACHE-L&A=1
    >>>> 
    >>> 
    >> 
    >
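
For anyone who wants to check a candidate cache directory before pointing xcache at it, the "operation not supported" failure quoted above can probably be reproduced outside of xrootd with a small probe. This is only a sketch under assumptions: the function name and the attribute name user.xattr_probe are made up for illustration and are not what xrootd itself uses, and it relies on Python's os.setxattr, which is available on Linux only.

```python
import os
import tempfile


def can_set_user_xattr(directory):
    """Return True if a user.* extended attribute can be written and read
    back on a scratch file created inside `directory`, otherwise False."""
    if not hasattr(os, "setxattr"):
        # os.setxattr/os.getxattr exist only on Linux builds of Python.
        return False
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        # Hypothetical attribute name, chosen only for this probe.
        os.setxattr(path, "user.xattr_probe", b"1")
        return os.getxattr(path, "user.xattr_probe") == b"1"
    except OSError:
        # Typically ENOTSUP ("Operation not supported") when the filesystem
        # is mounted without user_xattr; any other failure is also treated
        # as "not usable" here.
        return False
    finally:
        os.unlink(path)


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(target, "user xattr usable:", can_set_user_xattr(target))
```

Running it once against the Lustre scratch path and once against the GPFS path should show whether the two filesystems differ in the way the log suggests. It only tells you whether attributes can be set from that node, not why they were disabled.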
    

