It's a plain server with attached disks configured in a RAID-6. Is there a simple way to take the logical name and get the physical name on disk?

Thanks,
Lincoln

From: [log in to unmask] <[log in to unmask]> on behalf of Andrew Hanushevsky <[log in to unmask]>
Sent: Tuesday, September 7, 2021 2:49 PM
To: Lincoln Bryant <[log in to unmask]>
Cc: Yang, Wei <[log in to unmask]>; [log in to unmask] <[log in to unmask]>
Subject: Re: Can't remove missing file
 
Hi Lincoln,

What is /uct3-s1.mwt2.org? Is it an Xcache server, a plain server, a tape
backed server? Clearly, that server is having issues and it would be
interesting to find out why it can't provide the file even though it
thinks it can. I say that because delaying the client means the server is
attempting to get the file.
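
A minimal sketch of how the server log could be scanned for opens that ended
in a client delay. The line format is inferred only from the xrootd.log
excerpt quoted below, so treat the regular expressions as assumptions; the
function name find_delayed_opens is hypothetical.

```python
import re

# Sample lines copied from the xrootd.log excerpt quoted in this thread.
LOG = """\
210907 13:31:22 1365782 XrootdXeq: root.1372035:26@uct3-s1 pub IPv4 login
210907 13:31:22 1365782 root.1372035:26@uct3-s1 ofs_open: 0-600 fn=/atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root
210907 13:31:22 1365782 root.1372035:26@uct3-s1 XrootdProtocol: delaying client up to 1800 sec
"""

# Assumed layout: "<yymmdd hh:mm:ss> <tid> <client> <message>".
OPEN_RE = re.compile(
    r"^(\d{6} \d\d:\d\d:\d\d) \d+ (\S+) ofs_open: \S+ fn=(\S+)")
DELAY_RE = re.compile(
    r"^(\d{6} \d\d:\d\d:\d\d) \d+ (\S+) XrootdProtocol: "
    r"delaying client up to (\d+) sec")


def find_delayed_opens(text):
    """Return (timestamp, client, file, seconds) for each delayed client.

    The file is the most recent ofs_open seen for the same client id,
    or None if no open was logged before the delay.
    """
    last_open = {}   # client id -> last file name it tried to open
    delayed = []
    for line in text.splitlines():
        m = OPEN_RE.match(line)
        if m:
            last_open[m.group(2)] = m.group(3)
            continue
        m = DELAY_RE.match(line)
        if m:
            stamp, client, secs = m.groups()
            delayed.append((stamp, client, last_open.get(client), int(secs)))
    return delayed


for entry in find_delayed_opens(LOG):
    print(entry)
```

Run against the full log, this would list every file for which the server
told a client to wait, which may help narrow down what it is trying to fetch.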

Andy


On Tue, 7 Sep 2021, Lincoln Bryant wrote:

> I tried to xrdcp this file. Seems the redirector forwards the client to uct3-s1, where it believes the file is stored, and then the client sits for 1800s and times out.
>
> From the xrdcp debug:
>
> [2021-09-07 14:01:09.141934 -0500][Dump   ][XRootDTransport   ] [uct3-s1.mwt2.org:1094.0] Stream inactive since 1787 seconds, TTL: 300, allocated SIDs: 1, open files: 0, bound file objects: 0
> [2021-09-07 14:01:09.141984 -0500][Dump   ][XRootDTransport   ] [uct3-s1.mwt2.org:1094.0] Stream inactive since 1787 seconds, stream timeout: 60, allocated SIDs: 1, wait barrier: 2021-09-07 14:01:22 -0500
> [2021-09-07 14:01:22.637006 -0500][Dump   ][TaskMgr           ] Running task: "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094"
> [2021-09-07 14:01:22.637065 -0500][Dump   ][XRootD            ] [uct3-s1.mwt2.org:1094] Stream event reported for msg kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
> [2021-09-07 14:01:22.637100 -0500][Debug  ][XRootD            ] [uct3-s1.mwt2.org:1094] Handling error while processing kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [ERROR] Operation expired.
> [2021-09-07 14:01:22.637125 -0500][Error  ][XRootD            ] [uct3-s1.mwt2.org:1094] Unable to get the response to request kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
> [2021-09-07 14:01:22.637157 -0500][Debug  ][ExDbgMsg          ] [uct3-s1.mwt2.org:1094] Passing to the thread-pool MsgHandler: 0x1205420 (message: kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ).
> [2021-09-07 14:01:22.637182 -0500][Dump   ][TaskMgr           ] Will rerun task "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094" at [2021-09-07 14:01:37 -0500]
> [2021-09-07 14:01:22.637283 -0500][Debug  ][ExDbgMsg          ] [uct3-s1.mwt2.org:1094] Calling MsgHandler: 0x1205420 (message: kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ) with status: [ERROR] Operation expired.
> [2021-09-07 14:01:22.637373 -0500][Debug  ][File              ] [0x12005a0@root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?xrdcl.requuid=2686890d-010a-42b5-bcf6-c5d77f0213e0] Open has returned with status [ERROR] Operation expired
> [2021-09-07 14:01:22.637400 -0500][Debug  ][File              ] [0x12005a0@root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?xrdcl.requuid=2686890d-010a-42b5-bcf6-c5d77f0213e0] Error while opening at uct3-s1.mwt2.org:1094: [ERROR] Operation expired
> [2021-09-07 14:01:22.637441 -0500][Debug  ][XRootD            ] Redirect trace-back:
> [2021-09-07 14:01:22.637441 -0500][Debug  ][XRootD            ]         0. Redirected from: root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root to: root://uct3-s1.mwt2.org:1094/
> [2021-09-07 14:01:22.637468 -0500][Debug  ][ExDbgMsg          ] [uct3-s1.mwt2.org:1094] Destroying MsgHandler: 0x1205420.
> [0B/0B][100%][==================================================][0B/s]
> Run: [ERROR] Operation expired:  (source)
> [2021-09-07 14:01:22.638020 -0500][Debug  ][JobMgr            ] Stopping the job manager...
> [2021-09-07 14:01:22.638040 -0500][Dump   ][JobMgr            ] Stopping worker #0...
> [2021-09-07 14:01:22.638245 -0500][Dump   ][JobMgr            ] Worker #0 stopped
> [2021-09-07 14:01:22.638273 -0500][Dump   ][JobMgr            ] Stopping worker #1...
> [2021-09-07 14:01:22.638480 -0500][Dump   ][JobMgr            ] Worker #1 stopped
> [2021-09-07 14:01:22.638503 -0500][Dump   ][JobMgr            ] Stopping worker #2...
> [2021-09-07 14:01:22.638678 -0500][Dump   ][JobMgr            ] Worker #2 stopped
> [2021-09-07 14:01:22.638701 -0500][Debug  ][JobMgr            ] Job manager stopped
> [2021-09-07 14:01:22.638718 -0500][Debug  ][TaskMgr           ] Stopping the task manager...
> [2021-09-07 14:01:22.638945 -0500][Debug  ][TaskMgr           ] Task manager stopped
> [2021-09-07 14:01:22.638969 -0500][Debug  ][Poller            ] Stopping the poller...
> [2021-09-07 14:01:22.639154 -0500][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094"
> [2021-09-07 14:01:22.639178 -0500][Debug  ][AsyncSock         ] [uct3-s1.mwt2.org:1094.0] Closing the socket
> [2021-09-07 14:01:22.639190 -0500][Debug  ][Poller            ] <[::ffff:192.170.227.124]:38632><--><[::ffff:192.170.227.124]:1094> Removing socket from the poller
> [2021-09-07 14:01:22.639225 -0500][Debug  ][PostMaster        ] [uct3-s1.mwt2.org:1094] Destroying stream
> [2021-09-07 14:01:22.639237 -0500][Debug  ][AsyncSock         ] [uct3-s1.mwt2.org:1094.0] Closing the socket
>
> I believe these are the relevant logs from the xrootd.log on uct3-s1:
> 210907 13:31:22 1365782 XrootdXeq: root.1372035:26@uct3-s1 pub IPv4 login
> 210907 13:31:22 1365782 root.1372035:26@uct3-s1 ofs_open: 0-600 fn=/atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root
> 210907 13:31:22 1365782 root.1372035:26@uct3-s1 XrootdProtocol: delaying client up to 1800 sec
> 210907 13:31:22 1365782 root.1372035:26@uct3-s1 ofs_close: use=0 fn=dummy
> 210907 13:31:22 1365782 XrdPoll: Poller 0 enabled root.1372035:26@uct3-s1
> 210907 13:31:23 1366108 XrdSched: running underused thread monitor inq=0
> 210907 13:31:23 1366108 XrdSched: 3 threads; 1 idle
> 210907 13:31:23 1366108 XrdSched: scheduling underused thread monitor in 780 seconds
> 210907 13:44:23 1365782 XrdSched: running underused thread monitor inq=0
> 210907 13:44:23 1365782 XrdSched: 3 threads; 1 idle
> 210907 13:44:23 1365782 XrdSched: scheduling underused thread monitor in 780 seconds
> 210907 13:45:23 1365776 XrdBuffManager: Reshaper has 2K; target 2450580K
> 210907 13:57:23 1366108 XrdSched: running underused thread monitor inq=0
> 210907 13:57:23 1366108 XrdSched: 3 threads; 1 idle
> 210907 13:57:23 1366108 XrdSched: scheduling underused thread monitor in 780 seconds
>
> Thoughts?
>
> Thanks,
> Lincoln
>
> ________________________________
> From: Andrew Hanushevsky <[log in to unmask]>
> Sent: Tuesday, September 7, 2021 1:15 PM
> To: Lincoln Bryant <[log in to unmask]>
> Cc: Yang, Wei <[log in to unmask]>; [log in to unmask] <[log in to unmask]>
> Subject: Re: Can't remove missing file
>
> Hi Lincoln,
>
> Now that I have read the original email more closely, I think you can
> ignore the problem. Notice that the redirector says the file is "pending".
> That means it made a commitment to place the file on the server. Future
> access to the file will vector the client to that server and the cache
> should clean up its act. Since the file doesn't exist (yet) you get the
> error that it cannot be removed (because it can't). I'd suggest trying to
> read it and see what happens.
>
> Andy
>
>
> On Tue, 7 Sep 2021, Lincoln Bryant wrote:
>
>> Hi Andy,
>>
>> This was first noticed with an xrootd-fuse client:
>> # ls /atlas/uct3/data/users/jano/qual/matched/truth
>> ls: cannot access /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root: No such file or directory
>> ntuple_0.root
>> # rpm -q xrootd-fuse
>> xrootd-fuse-5.3.1-1.el7.x86_64
>>
>> Then I also tried with the 'xrdfs' tool:
>> # xrdfs root://uct3-xrd.mwt2.org:1094 ls /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root
>> [ERROR] Server responded with an error: [3011] No servers have read access to the file
>> # rpm -q xrootd-client
>> xrootd-client-5.3.1-1.el7.x86_64
>>
>> I have tried the same clients on different hosts, which report the same problem.
>>
>> I have tried restarting the cmsd/xrootd on all 3 servers and the redirector. File is still in the same state.
>>
>> Advice on how to proceed?
>>
>> Thanks much,
>> Lincoln
>> ________________________________
>> From: Andrew Hanushevsky <[log in to unmask]>
>> Sent: Tuesday, September 7, 2021 1:03 PM
>> To: Yang, Wei <[log in to unmask]>
>> Cc: Lincoln Bryant <[log in to unmask]>; [log in to unmask] <[log in to unmask]>
>> Subject: Re: Can't remove missing file
>>
>> For an xroot client, when this occurs, the client automatically asks the
>> redirector to correct the cache. So, this should not be an issue. On the
>> other hand, http has no such protocol so the error persists until the cache
>> times out. So, what client was getting this error?
>>
>> On Tue, 7 Sep 2021, Yang, Wei wrote:
>>
>>> Hi Lincoln,
>>>
>>> Xrootd's redirector will cache the info of this file for 8 hours (I think it can be shortened). So the ghost will disappear after that time. Alternatively you can restart the redirector.
>>>
>>> I don't know if there is a way to tell the redirector to refresh the cache (or cache of this particular entry).
>>>
>>> regards,
>>> --
>>> Wei Yang  |  [log in to unmask]<mailto:[log in to unmask]>  |  650-926-3338(O)
>>>
>>>
>>>
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>>
>>> To unsubscribe from the XROOTD-L list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the XROOTD-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>
>

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1