It's a plain server with attached disks configured in a RAID-6. Is there a simple way to take the logical name and get the physical name on disk? Thanks, Lincoln ________________________________ From: [log in to unmask] <[log in to unmask]> on behalf of Andrew Hanushevsky <[log in to unmask]> Sent: Tuesday, September 7, 2021 2:49 PM To: Lincoln Bryant <[log in to unmask]> Cc: Yang, Wei <[log in to unmask]>; [log in to unmask] <[log in to unmask]> Subject: Re: Can't remove missing file Hi Lincoln, What is /uct3-s1.mwt2.org? Is it an Xcache server, a plain server, a tape backed server? Clearly, that server is having issues and it would be intgersting to find out why it can't provide the file even though it thinks it can. I say that because by delaying the client it means it is attempting to get the file. Andy On Tue, 7 Sep 2021, Lincoln Bryant wrote: > I tried to xrdcp this file. Seems the redirector forwards the client to uct3-s1, where it believes the file is stored, and then the client sits for 1800s and times out. > >> From the xrdcp debug: > > [2021-09-07 14:01:09.141934 -0500][Dump ][XRootDTransport ] [uct3-s1.mwt2.org:1094.0] Stream inactive since 1787 seconds, TTL: 300, allocated SIDs: 1, open files: 0, bound file objects: 0 > [2021-09-07 14:01:09.141984 -0500][Dump ][XRootDTransport ] [uct3-s1.mwt2.org:1094.0] Stream inactive since 1787 seconds, stream timeout: 60, allocated SIDs: 1, wait barrier: 2021-09-07 14:01:22 -0500 > [2021-09-07 14:01:22.637006 -0500][Dump ][TaskMgr ] Running task: "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094" > [2021-09-07 14:01:22.637065 -0500][Dump ][XRootD ] [uct3-s1.mwt2.org:1094] Stream event reported for msg kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) > [2021-09-07 14:01:22.637100 -0500][Debug ][XRootD ] [uct3-s1.mwt2.org:1094] Handling error while processing kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ): [ERROR] Operation expired. > [2021-09-07 14:01:22.637125 -0500][Error ][XRootD ] [uct3-s1.mwt2.org:1094] Unable to get the response to request kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) > [2021-09-07 14:01:22.637157 -0500][Debug ][ExDbgMsg ] [uct3-s1.mwt2.org:1094] Passing to the thread-pool MsgHandler: 0x1205420 (message: kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ). > [2021-09-07 14:01:22.637182 -0500][Dump ][TaskMgr ] Will rerun task "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094" at [2021-09-07 14:01:37 -0500] > [2021-09-07 14:01:22.637283 -0500][Debug ][ExDbgMsg ] [uct3-s1.mwt2.org:1094] Calling MsgHandler: 0x1205420 (message: kXR_open (file: /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ) with status: [ERROR] Operation expired. > [2021-09-07 14:01:22.637373 -0500][Debug ][File ] [0x12005a0@root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?xrdcl.requuid=2686890d-010a-42b5-bcf6-c5d77f0213e0] Open has returned with status [ERROR] Operation expired > [2021-09-07 14:01:22.637400 -0500][Debug ][File ] [0x12005a0@root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root?xrdcl.requuid=2686890d-010a-42b5-bcf6-c5d77f0213e0] Error while opening at uct3-s1.mwt2.org:1094: [ERROR] Operation expired > [2021-09-07 14:01:22.637441 -0500][Debug ][XRootD ] Redirect trace-back: > [2021-09-07 14:01:22.637441 -0500][Debug ][XRootD ] 0. Redirected from: root://uct3-xrd.mwt2.org:1094//atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root to: root://uct3-s1.mwt2.org:1094/ > [2021-09-07 14:01:22.637468 -0500][Debug ][ExDbgMsg ] [uct3-s1.mwt2.org:1094] Destroying MsgHandler: 0x1205420. > [0B/0B][100%][==================================================][0B/s] > Run: [ERROR] Operation expired: (source) > [2021-09-07 14:01:22.638020 -0500][Debug ][JobMgr ] Stopping the job manager... > [2021-09-07 14:01:22.638040 -0500][Dump ][JobMgr ] Stopping worker #0... > [2021-09-07 14:01:22.638245 -0500][Dump ][JobMgr ] Worker #0 stopped > [2021-09-07 14:01:22.638273 -0500][Dump ][JobMgr ] Stopping worker #1... > [2021-09-07 14:01:22.638480 -0500][Dump ][JobMgr ] Worker #1 stopped > [2021-09-07 14:01:22.638503 -0500][Dump ][JobMgr ] Stopping worker #2... > [2021-09-07 14:01:22.638678 -0500][Dump ][JobMgr ] Worker #2 stopped > [2021-09-07 14:01:22.638701 -0500][Debug ][JobMgr ] Job manager stopped > [2021-09-07 14:01:22.638718 -0500][Debug ][TaskMgr ] Stopping the task manager... > [2021-09-07 14:01:22.638945 -0500][Debug ][TaskMgr ] Task manager stopped > [2021-09-07 14:01:22.638969 -0500][Debug ][Poller ] Stopping the poller... > [2021-09-07 14:01:22.639154 -0500][Debug ][TaskMgr ] Requesting unregistration of: "TickGeneratorTask for: root://uct3-s1.mwt2.org:1094" > [2021-09-07 14:01:22.639178 -0500][Debug ][AsyncSock ] [uct3-s1.mwt2.org:1094.0] Closing the socket > [2021-09-07 14:01:22.639190 -0500][Debug ][Poller ] <[::ffff:192.170.227.124]:38632><--><[::ffff:192.170.227.124]:1094> Removing socket from the poller > [2021-09-07 14:01:22.639225 -0500][Debug ][PostMaster ] [uct3-s1.mwt2.org:1094] Destroying stream > [2021-09-07 14:01:22.639237 -0500][Debug ][AsyncSock ] [uct3-s1.mwt2.org:1094.0] Closing the socket > > I believe these are the relevant logs from the xrootd.log on uct3-s1: > 210907 13:31:22 1365782 XrootdXeq: root.1372035:26@uct3-s1 pub IPv4 login > 210907 13:31:22 1365782 root.1372035:26@uct3-s1 ofs_open: 0-600 fn=/atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root > 210907 13:31:22 1365782 root.1372035:26@uct3-s1 XrootdProtocol: delaying client up to 1800 sec > 210907 13:31:22 1365782 root.1372035:26@uct3-s1 ofs_close: use=0 fn=dummy > 210907 13:31:22 1365782 XrdPoll: Poller 0 enabled root.1372035:26@uct3-s1 > 210907 13:31:23 1366108 XrdSched: running underused thread monitor inq=0 > 210907 13:31:23 1366108 XrdSched: 3 threads; 1 idle > 210907 13:31:23 1366108 XrdSched: scheduling underused thread monitor in 780 seconds > 210907 13:44:23 1365782 XrdSched: running underused thread monitor inq=0 > 210907 13:44:23 1365782 XrdSched: 3 threads; 1 idle > 210907 13:44:23 1365782 XrdSched: scheduling underused thread monitor in 780 seconds > 210907 13:45:23 1365776 XrdBuffManager: Reshaper has 2K; target 2450580K > 210907 13:57:23 1366108 XrdSched: running underused thread monitor inq=0 > 210907 13:57:23 1366108 XrdSched: 3 threads; 1 idle > 210907 13:57:23 1366108 XrdSched: scheduling underused thread monitor in 780 seconds > > Thoughts? > > Thanks, > Lincoln > > ________________________________ > From: Andrew Hanushevsky <[log in to unmask]> > Sent: Tuesday, September 7, 2021 1:15 PM > To: Lincoln Bryant <[log in to unmask]> > Cc: Yang, Wei <[log in to unmask]>; [log in to unmask] <[log in to unmask]> > Subject: Re: Can't remove missing file > > Hi Lincoln, > > Now that I have read the original email more closely, I think you can > ignore the problem. Notice that the redirector says the file is "pending". > That means it made a commitment to place the file on the server. Future > access to the file will vector the client to that server and the cache > should clean up it's act. Since the file doesn't exist (yet) you get the > error that it cannot be removed (because it can't). I'd suggest trying to > read it and see what happens. > > Andy > > > On Tue, 7 Sep 2021, Lincoln Bryant wrote: > >> Hi Andy, >> >> This was first noticed with an xrootd-fuse client: >> # ls /atlas/uct3/data/users/jano/qual/matched/truth >> ls: cannot access /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root: No such file or directory >> ntuple_0.root >> # rpm -q xrootd-fuse >> xrootd-fuse-5.3.1-1.el7.x86_64 >> >> Then I also tried with the 'xrdfs' tool: >> # xrdfs root://uct3-xrd.mwt2.org:1094 ls /atlas/uct3/data/users/jano/qual/matched/truth/ntuple_0.root >> [ERROR] Server responded with an error: [3011] No servers have read access to the file >> # rpm -q xrootd-client >> xrootd-client-5.3.1-1.el7.x86_64 >> >> I have tried the same clients on different hosts, which report the same problem. >> >> I have tried restarting the cmsd/xrootd on all 3 servers and the redirector. File is still in the same state. >> >> Advice on how to proceed? >> >> Thanks much, >> Lincoln >> ________________________________ >> From: Andrew Hanushevsky <[log in to unmask]> >> Sent: Tuesday, September 7, 2021 1:03 PM >> To: Yang, Wei <[log in to unmask]> >> Cc: Lincoln Bryant <[log in to unmask]>; [log in to unmask] <[log in to unmask]> >> Subject: Re: Can't remove missing file >> >> For an xroot client, when this occurs, the client automatically asks the >> redirector to correct the cache. So, this should not be an issue. On the >> other hand, http has no such protocol so the rror persists until the cache >> times out. So, what client was getting this error? >> >> On Tue, 7 Sep 2021, Yang, Wei wrote: >> >>> Hi Lincoln, >>> >>> Xrootd?s redirector will cache the info of this file for 8 hours (I think it can be shortened). So the ghost will disappear after that time. Alternatively you can restart the redirector. >>> >>> I don?t know if there is a way to tell the redirector to refresh the cache (or cache of this particular entry). >>> >>> regards, >>> -- >>> Wei Yang | [log in to unmask]<mailto:[log in to unmask]> | 650-926-3338(O) >>> >>> >>> >>> ######################################################################## >>> Use REPLY-ALL to reply to list >>> >>> To unsubscribe from the XROOTD-L list, click the following link: >>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1 >>> >> >> ######################################################################## >> Use REPLY-ALL to reply to list >> >> To unsubscribe from the XROOTD-L list, click the following link: >> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1 >> > ######################################################################## Use REPLY-ALL to reply to list To unsubscribe from the XROOTD-L list, click the following link: https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1 ######################################################################## Use REPLY-ALL to reply to list To unsubscribe from the XROOTD-L list, click the following link: https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1