
Hi,

I noticed this in the log of an xrootd manager (xrootd.t2.ucsd.edu):

110625 11:26:59 10971 Receive xrootd 23 bytes on 10550270
110625 11:26:59 10971 Decode xrootd redirects uscmsPoo.11911:[log in to unmask] to uaf-6.t2.ucsd.edu:1094 /store/data/Run2011A/DoubleElectron/AOD/May10ReReco-v1/0000/8EFCAE37-687B-E011-BAEA-001A92971B8E.root
110625 11:26:59 10971 uscmsPoo.11911:[log in to unmask] XrootdProtocol: redirecting to uaf-6.t2.ucsd.edu:1094
110625 11:27:15 10971 Receive xrootd 0 bytes on 0
110625 11:27:15 10971 setStatus xrootd.t2.ucsd.edu sent suspend event
110625 11:27:15 10971 cms_setStatus: Manager xrootd.t2.ucsd.edu suspended
110625 11:29:12 10971 nagios.8294:[log in to unmask] XrootdProtocol: more auth requested; sz=2070
110625 11:29:12 10971 XrootdXeq: nagios.8294:[log in to unmask] login as 92a2e9e2.0
110625 11:29:12 10971 nagios.8294:[log in to unmask] XrootdProtocol: stalling client for 10 sec
110625 11:29:22 10971 nagios.8294:[log in to unmask] XrootdProtocol: stalling client for 10 sec
110625 11:29:32 10971 nagios.8294:[log in to unmask] XrootdProtocol: stalling client for 10 sec
110625 11:29:42 10971 nagios.8294:[log in to unmask] XrootdProtocol: stalling client for 10 sec

After that it keeps stalling the clients. What happened here? The cmsd log fragment from around this time is below.

At about this time one of the servers, uaf-3, was shut down because, as far as we can tell so far, its system disk was corrupted.

Restarting xrootd/cmsd fixed the problem.

Cheers,
Matevz


110625 11:27:11 19518 Dispatch manager.0:[log in to unmask] for state dlen=204
110625 11:27:11 19518 tore/data/Run2011A/DoubleElectron/AOD/May10ReReco-v1/0000/DE6E36B9-A67B-E011-BC19-001A92810AEE.root do_State: /store/data/Run2011A/DoubleMu/AOD/PromptReco-v4/000/165/548/44682DE0-6387-E011-B50E-001617E30F48.root,/store/data/Run2011A/DoubleMu/AOD/PromptReco-v4/000/165/472/22826903-2C86-E011-810E-003048F1C420.root
110625 11:27:15 19518 Update Counts Parm1=-1 Parm2=0
110625 11:27:15 19518 Remove_Node server.2524:22@uaf-3:1094 node 4.5
110625 11:27:15 19518 State: Status changed to suspended
110625 11:27:15 19518 Send status to redirector.10971:26@xrootd
110625 11:27:15 19518 Protocol: server.2524:22@uaf-3 logged out; request read failed
110625 11:27:15 19518 Inform xrootd-itb.unl.edu status
110625 11:27:15 19518 Inform xrootd.unl.edu status
110625 11:27:17 19518 Dispatch manager.0:[log in to unmask] for state dlen=96
110625 11:27:17 19518 tore/data/Run2011A/DoubleElectron/AOD/May10ReReco-v1/0000/DE6E36B9-A67B-E011-BC19-001A92810AEE.root do_State: /store/data/Run2011A/DoubleMu/AOD/May10ReReco-v1/0000/5A1FB529-D67B-E011-8920-0026189438B5.root
110625 11:27:17 19518 Broadcast server.2524:22@uaf-3:1094 is unreachable
110625 11:27:17 19518 Dispatch server.2925:21@uaf-4:1094 for have dlen=96
110625 11:27:17 19518 server.2925:21@uaf-4:1094 do_Have: /store/data/Run2011A/DoubleMu/AOD/May10ReReco-v1/0000/5A1FB529-D67B-E011-8920-0026189438B5.root
110625 11:27:17 19518 Inform xrootd-itb.unl.edu have
110625 11:27:17 19518 Dispatch server.13721:19@uaf-6:1094 for have dlen=96
110625 11:27:17 19518 Inform xrootd.unl.edu have
110625 11:27:17 19518 server.13721:19@uaf-6:1094 do_Have: /store/data/Run2011A/DoubleMu/AOD/May10ReReco-v1/0000/5A1FB529-D67B-E011-8920-0026189438B5.root
110625 11:27:17 19518 Dispatch manager.0:[log in to unmask] for state dlen=108
110625 11:27:17 19518 tore/data/Run2011A/DoubleElectron/AOD/May10ReReco-v1/0000/DE6E36B9-A67B-E011-BC19-001A92810AEE.root do_State: /store/data/Run2011A/DoubleElectron/AOD/PromptReco-v4/000/166/346/D857F287-D58E-E011-87B0-001D09F2906A.root
110625 11:27:17 19518 Broadcast server.2524:22@uaf-3:1094 is unreachable
110625 11:27:17 19518 Dispatch manager.0:[log in to unmask] for state dlen=102
110625 11:27:17 19518 tore/data/Run2011A/DoubleElectron/AOD/May10ReReco-v1/0000/DE6E36B9-A67B-E011-BC19-001A92810AEE.root do_State: /store/data/Run2011A/DoubleElectron/AOD/May10ReReco-v1/0000/8A376B9B-7A7B-E011-8EA2-001A92810AE4.root
110625 11:27:17 19518 Broadcast server.2524:22@uaf-3:1094 is unreachable
110625 11:27:17 19518 Dispatch server.13721:19@uaf-6:1094 for have dlen=108
110625 11:27:17 19518 server.13721:19@uaf-6:1094 do_Have: /store/data/Run2011A/DoubleElectron/AOD/PromptReco-v4/000/166/346/D857F287-D58E-E011-87B0-001D09F2906A.root
110625 11:27:17 19518 Inform xrootd-itb.unl.edu have
110625 11:27:17 19518 Inform xrootd.unl.edu have
110625 11:27:17 19518 Dispatch server.2925:21@uaf-4:1094 for have dlen=108
110625 11:27:17 19518 Dispatch server.2925:21@uaf-4:1094 for have dlen=102
110625 11:27:17 19518 server.2925:21@uaf-4:1094 do_Have: /store/data/Run2011A/DoubleElectron/AOD/PromptReco-v4/000/166/346/D857F287-D58E-E011-87B0-001D09F2906A.root
110625 11:27:17 19518 server.2925:21@uaf-4:1094 do_Have: /store/data/Run2011A/DoubleElectron/AOD/May10ReReco-v1/0000/8A376B9B-7A7B-E011-8EA2-001A92810AE4.root
110625 11:27:17 19518 Inform xrootd-itb.unl.edu have
110625 11:27:17 19518 Inform xrootd.unl.edu have
110625 11:27:17 19518 Dispatch server.13721:19@uaf-6:1094 for have dlen=102
110625 11:27:17 19518 server.13721:19@uaf-6:1094 do_Have: /store/data/Run2011A/DoubleElectron/AOD/May10ReReco-v1/0000/8A376B9B-7A7B-E011-8EA2-001A92810AE4.root
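To check how the client stalls line up with the manager being suspended (uaf-3 is removed and the status goes to "suspended" at 11:27:15 in both logs), one could scan a log for the first suspend message and collect the stall messages that follow it. Below is a minimal sketch in Python. It assumes the `YYMMDD HH:MM:SS pid message` line format seen in the excerpts above. The helper names `parse` and `suspend_timeline` are hypothetical, and the substring matches are keyed to the wording of these particular logs; other xrootd versions may phrase things differently.

```python
import re
from datetime import datetime

# Assumed line format (from the excerpts above): "YYMMDD HH:MM:SS <pid> <message>"
LINE_RE = re.compile(r"^(\d{6}) (\d{2}:\d{2}:\d{2}) (\d+) (.*)$")


def parse(lines):
    """Yield (timestamp, pid, message) for every line matching the assumed format."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip wrapped, truncated or otherwise garbled lines
        ts = datetime.strptime(m.group(1) + " " + m.group(2), "%y%m%d %H:%M:%S")
        yield ts, int(m.group(3)), m.group(4)


def suspend_timeline(lines):
    """Return (time of first 'suspended' message, times of later client stalls).

    Hypothetical helper: it only looks for the substrings 'suspended' and
    'stalling client' as they appear in these xrootd/cmsd logs.
    """
    suspended_at = None
    stalls = []
    for ts, _pid, msg in parse(lines):
        if suspended_at is None and "suspended" in msg:
            suspended_at = ts
        elif suspended_at is not None and "stalling client" in msg:
            stalls.append(ts)
    return suspended_at, stalls


if __name__ == "__main__":
    excerpt = [
        "110625 11:27:15 10971 cms_setStatus: Manager xrootd.t2.ucsd.edu suspended",
        "110625 11:29:12 10971 nagios.8294 XrootdProtocol: stalling client for 10 sec",
        "110625 11:29:22 10971 nagios.8294 XrootdProtocol: stalling client for 10 sec",
    ]
    start, stalls = suspend_timeline(excerpt)
    print(start, len(stalls))  # 2011-06-25 11:27:15 2
```

Run against the full xrootd log, a long tail of stall timestamps after the suspend, with no later status change back, would match the behaviour described above, where only a restart cleared it.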