Print

Print


I tried to xrdcp this file. Seems the redirector forwards the client to uct3-s1, where it believes the file is stored, and then the client sits for 1800s and times out.

From the xrdcp debug:

[2021-09-07 14:01:09.141934 -0500][Dump   ][XRootDTransport   ] [uct3-s1.mwt2.org:1094.0] Stream inactive since 1787 seconds, TTL: 300, allocated SIDs: 1, open files: 0, bound file objects: 0
[2021-09-07 14:01:09.141984 -0500][Dump   ][XRootDTransport   ] [uct3-s1.mwt2.org:1094.0] Stream inactive since 1787 seconds, stream timeout: 60, allocated SIDs: 1, wait barrier: 2021-09-07 14:01:22 -0500
[2021-09-07 14:01:22.637006 -0500][Dump   ][TaskMgr           ] Running task: "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094"
[2021-09-07 14:01:22.637065 -0500][Dump   ][XRootD            ] [uct3-s1.mwt2.org:1094] Stream event reported for msg kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
[2021-09-07 14:01:22.637100 -0500][Debug  ][XRootD            ] [uct3-s1.mwt2.org:1094] Handling error while processing kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [ERROR] Operation expired.
[2021-09-07 14:01:22.637125 -0500][Error  ][XRootD            ] [uct3-s1.mwt2.org:1094] Unable to get the response to request kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat )
[2021-09-07 14:01:22.637157 -0500][Debug  ][ExDbgMsg          ] [uct3-s1.mwt2.org:1094] Passing to the thread-pool MsgHandler: 0x1205420 (message: kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ).
[2021-09-07 14:01:22.637182 -0500][Dump   ][TaskMgr           ] Will rerun task "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094" at [2021-09-07 14:01:37 -0500]
[2021-09-07 14:01:22.637283 -0500][Debug  ][ExDbgMsg          ] [uct3-s1.mwt2.org:1094] Calling MsgHandler: 0x1205420 (message: kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ) with status: [ERROR] Operation expired.
[2021-09-07 14:01:22.637373 -0500][Debug  ][File              ] [0x12005a0@root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?xrdcl.requuid=2686890d-010a-42b5-bcf6-c5d77f0213e0] Open has returned with status [ERROR] Operation expired
[2021-09-07 14:01:22.637400 -0500][Debug  ][File              ] [0x12005a0@root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?xrdcl.requuid=2686890d-010a-42b5-bcf6-c5d77f0213e0] Error while opening at uct3-s1.mwt2.org:1094: [ERROR] Operation expired
[2021-09-07 14:01:22.637441 -0500][Debug  ][XRootD            ] Redirect trace-back:
[2021-09-07 14:01:22.637441 -0500][Debug  ][XRootD            ]         0. Redirected from: root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root to: root://uct3-s1.mwt2.org:1094/
[2021-09-07 14:01:22.637468 -0500][Debug  ][ExDbgMsg          ] [uct3-s1.mwt2.org:1094] Destroying MsgHandler: 0x1205420.
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Operation expired:  (source)
[2021-09-07 14:01:22.638020 -0500][Debug  ][JobMgr            ] Stopping the job manager...
[2021-09-07 14:01:22.638040 -0500][Dump   ][JobMgr            ] Stopping worker #0...
[2021-09-07 14:01:22.638245 -0500][Dump   ][JobMgr            ] Worker #0 stopped
[2021-09-07 14:01:22.638273 -0500][Dump   ][JobMgr            ] Stopping worker #1...
[2021-09-07 14:01:22.638480 -0500][Dump   ][JobMgr            ] Worker #1 stopped
[2021-09-07 14:01:22.638503 -0500][Dump   ][JobMgr            ] Stopping worker #2...
[2021-09-07 14:01:22.638678 -0500][Dump   ][JobMgr            ] Worker #2 stopped
[2021-09-07 14:01:22.638701 -0500][Debug  ][JobMgr            ] Job manager stopped
[2021-09-07 14:01:22.638718 -0500][Debug  ][TaskMgr           ] Stopping the task manager...
[2021-09-07 14:01:22.638945 -0500][Debug  ][TaskMgr           ] Task manager stopped
[2021-09-07 14:01:22.638969 -0500][Debug  ][Poller            ] Stopping the poller...
[2021-09-07 14:01:22.639154 -0500][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094"
[2021-09-07 14:01:22.639178 -0500][Debug  ][AsyncSock         ] [uct3-s1.mwt2.org:1094.0] Closing the socket
[2021-09-07 14:01:22.639190 -0500][Debug  ][Poller            ] <[::ffff:192.170.227.124]:38632><--><[::ffff:192.170.227.124]:1094> Removing socket from the poller
[2021-09-07 14:01:22.639225 -0500][Debug  ][PostMaster        ] [uct3-s1.mwt2.org:1094] Destroying stream
[2021-09-07 14:01:22.639237 -0500][Debug  ][AsyncSock         ] [uct3-s1.mwt2.org:1094.0] Closing the socket

I believe these are the relevant logs from the xrootd.log on uct3-s1:
210907 13:31:22 1365782 XrootdXeq: root.1372035:26@uct3-s1 pub IPv4 login
210907 13:31:22 1365782 root.1372035:26@uct3-s1 ofs_open: 0-600 fn=/atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root
210907 13:31:22 1365782 root.1372035:26@uct3-s1 XrootdProtocol: delaying client up to 1800 sec
210907 13:31:22 1365782 root.1372035:26@uct3-s1 ofs_close: use=0 fn=dummy
210907 13:31:22 1365782 XrdPoll: Poller 0 enabled root.1372035:26@uct3-s1
210907 13:31:23 1366108 XrdSched: running underused thread monitor inq=0
210907 13:31:23 1366108 XrdSched: 3 threads; 1 idle
210907 13:31:23 1366108 XrdSched: scheduling underused thread monitor in 780 seconds
210907 13:44:23 1365782 XrdSched: running underused thread monitor inq=0
210907 13:44:23 1365782 XrdSched: 3 threads; 1 idle
210907 13:44:23 1365782 XrdSched: scheduling underused thread monitor in 780 seconds
210907 13:45:23 1365776 XrdBuffManager: Reshaper has 2K; target 2450580K
210907 13:57:23 1366108 XrdSched: running underused thread monitor inq=0
210907 13:57:23 1366108 XrdSched: 3 threads; 1 idle
210907 13:57:23 1366108 XrdSched: scheduling underused thread monitor in 780 seconds

Thoughts?

Thanks,
Lincoln


From: Andrew Hanushevsky <[log in to unmask]>
Sent: Tuesday, September 7, 2021 1:15 PM
To: Lincoln Bryant <[log in to unmask]>
Cc: Yang, Wei <[log in to unmask]>; [log in to unmask] <[log in to unmask]>
Subject: Re: Can't remove missing file
 
Hi Lincoln,

Now that I have read the original email more closely, I think you can
ignore the problem. Notice that the redirector says the file is "pending".
That means it made a commitment to place the file on the server. Future
access to the file will vector the client to that server and the cache
should clean up it's act. Since the file doesn't exist (yet) you get the
error that it cannot be removed (because it can't). I'd suggest trying to
read it and see what happens.

Andy


On Tue, 7 Sep 2021, Lincoln Bryant wrote:

> Hi Andy,
>
> This was first noticed with an xrootd-fuse client:
> # ls /atlas/uct3/data/users/jano/qual/matched/truth
> ls: cannot access /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root: No such file or directory
> ntuple_0.root
> # rpm -q xrootd-fuse
> xrootd-fuse-5.3.1-1.el7.x86_64
>
> Then I also tried with the 'xrdfs' tool:
> # xrdfs root://uct3-xrd.mwt2.org:1094 ls /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root
> [ERROR] Server responded with an error: [3011] No servers have read access to the file
> # rpm -q xrootd-client
> xrootd-client-5.3.1-1.el7.x86_64
>
> I have tried the same clients on different hosts, which report the same problem.
>
> I have tried restarting the cmsd/xrootd on all 3 servers and the redirector. File is still in the same state.
>
> Advice on how to proceed?
>
> Thanks much,
> Lincoln
> ________________________________
> From: Andrew Hanushevsky <[log in to unmask]>
> Sent: Tuesday, September 7, 2021 1:03 PM
> To: Yang, Wei <[log in to unmask]>
> Cc: Lincoln Bryant <[log in to unmask]>; [log in to unmask] <[log in to unmask]>
> Subject: Re: Can't remove missing file
>
> For an xroot client, when this occurs, the client automatically asks the
> redirector to correct the cache. So, this should not be an issue. On the
> other hand, http has no such protocol so the rror persists until the cache
> times out. So, what client was getting this error?
>
> On Tue, 7 Sep 2021, Yang, Wei wrote:
>
>> Hi Lincoln,
>>
>> Xrootd?s redirector will cache the info of this file for 8 hours (I think it can be shortened). So the ghost will disappear after that time. Alternatively you can restart the redirector.
>>
>> I don?t know if there is a way to tell the redirector to refresh the cache (or cache of this particular entry).
>>
>> regards,
>> --
>> Wei Yang  |  [log in to unmask]<mailto:[log in to unmask]>  |  650-926-3338(O)
>>
>>
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the XROOTD-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>


Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1