URL: <http://savannah.cern.ch/support/?136850>

Summary: Weird error message of xrootd when facing client disconnection
Project: XROOTD
Submitted by: espinal
Submitted on: 2013-04-04 07:05
Category: None
Priority: 5 - Normal
Severity: 3 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Operating System: GNU/Linux

_______________________________________________________

Details:

We observed a misleading error message when a client disconnects while transferring data. A bit of context:

* Client is an external box (not at CERN) running xroot v.3.0.4-1 on the i386 architecture.
* Observed "connection reset by peer" while trying to send data.
* The error message appearing in the xroot logs is: "Unable to signal open to the corresponding LSF stagerjob ; Input/output error", which is misleading because LSF is no longer present on the stagers. We probably want something more meaningful.
* Logs:

- xrdlog.server log:

130403 16:35:17 30990 XrdInet: Accepted connection from [log in to unmask]
130403 16:35:17 9246 XrootdXeq: ardm.18293:[log in to unmask] login as ardm
130403 16:35:17 9246 ardm.18293:[log in to unmask] castor2ofs_open: castorlfn=/castor/cern.ch/ArDM/test/event_extraction.pdf
130403 16:35:17 9246 ardm.18293:[log in to unmask] castor2ofs_open: 4102-40664 [log in to unmask]
130403 16:35:18 30996 XrdPoll: Sever event occured for ardm.18293:[log in to unmask]
130403 16:35:18 9246 XrdLink: Unable to receive from ardm.18293:[log in to unmask]; connection reset by peer
130403 16:35:18 9246 XrootdXeq: ardm.18293:[log in to unmask] disc 0:00:01 (link read error)
130403 16:35:18 9246 ardm.18293:[log in to unmask] castor2ofs_close: use=1 [log in to unmask]
130403 16:35:18 9246 ardm.18293:[log in to unmask] castor2ofs_close: use=0 fn=dummy
130403 16:35:29 30990 XrdInet: Accepted connection from [log in to unmask]
130403 16:35:29 31486 XrootdXeq: ardm.18293:[log in to unmask] login as ardm
130403 16:35:29 31486 ardm.18293:[log in to unmask] XrootdProtocol: endsess 30977:8.12475
130403 16:35:29 31486 ardm.18293:[log in to unmask] XrootdProtocol: endsess 30977:8.12475 rc=-3 (No such process)
130403 16:35:29 31486 ardm.18293:[log in to unmask] XrootdResponse: sending err 3011: session not found
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_open: castorlfn=/castor/cern.ch/ArDM/test/event_extraction.pdf
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_open: error: open => couldn't run the Open towards StagerJob: reqid=12864861298 stagerjobport=41109 stagerjobuuid=d7a54302-70a8-2b5f-e043-6aa18a892e44
130403 16:35:29 31486 castor2ofs_open: ardm.18293:[log in to unmask] Unable to signal open to the corresponding LSF stagerjob ; Input/output error
130403 16:35:29 31486 ardm.18293:[log in to unmask] XrootdResponse: sending err 3007: Unable to signal open to the corresponding LSF stagerjob ; Input/output error
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_close: use=0 fn=dummy
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_close: StagerJob close failed: got rc=0 msg=undef
130403 16:35:29 31486 castor2ofs_close: ardm.18293:[log in to unmask] Unable to undef ; Success
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_close: use=0 fn=dummy
130403 16:35:29 31486 XrootdXeq: ardm.18293:[log in to unmask] disc 0:00:00

- The error message we got in xrootd debugging is:

======== DUMPING SERVER RESPONSE HEADER ========
ServerHeader.streamid = 0x01 0x00
ServerHeader.status = kXR_error (4003)
ServerHeader.dlen = 82
========== END DUMPING SERVER HEADER ===========
130403 16:35:29 001 Xrd: ReadPartialAnswer: Server [lxfsra26a03.cern.ch:1095] answered [kXR_error] (4003)
130403 16:35:29 001 Xrd: CheckErrorStatus: Server [lxfsra26a03.cern.ch:1095] declared: Unable to signal open to the corresponding LSF stagerjob ; Input/output error(error code: 3007)
130403 16:35:29 001 Xrd: OpenFileWhenRedirected: File open failed.
130403 16:35:29 001 Xrd: SendGenCommand: Got (and maybe recovered) an error from lxfsra26a03.cern.ch:1095
130403 16:35:29 001 Xrd: SendGenCommand: Recovery failure detected. Aborting request.lxfsra26a03.cern.ch:1095
Error writing to output server.
Last server error 3007 ('Unable to signal open to the corresponding LSF stagerjob ; Input/output error')

- castor stager log:

[root@lxfsra26a03 castor]# grep d7a54302-70a8-2b5f-e043-6aa18a892e44 stagerjob.log*
stagerjob.log.1:2013-04-03T16:35:07.488229+02:00 lxfsra26a03 stagerjob[27189]: LVL=Info TID=27189 MSG="Job Started" REQID=7a49e241-9a7b-4f7b-826f-eecc4629ab15 NSHOSTNAME=castorns NSFILEID=1289998589 Arguments="/usr/bin/stagerjob -r 7a49e241-9a7b-4f7b-826f-eecc4629ab15 -s d7a54302-70a8-2b5f-e043-6aa18a892e44 -F 1289998589 -H castorns -p xroot -i 12864861298.0 -T 40 -m w -C 129:128.142.171.178:34764 -u 3563 -g 1107 -X 0 -S default -t 1364999707 -R file:///tmp/tmpJRNoi6" Type="StagePutRequest" Protocol="xroot" SvcClass="default" TotalWaitTime=0.488204 SUBREQID=d7a54302-70a8-2b5f-e043-6aa18a892e44
stagerjob.log.1:2013-04-03T16:35:18.616308+02:00 lxfsra26a03 stagerjob[27189]: LVL=Warn TID=27189 MSG="Failed to get checksum information from extended attributes" REQID=7a49e241-9a7b-4f7b-826f-eecc4629ab15 NSHOSTNAME=castorns NSFILEID=1289998589 Path="[log in to unmask]" Error="No data available" SUBREQID=d7a54302-70a8-2b5f-e043-6aa18a892e44
stagerjob.log.1:2013-04-03T16:35:18.640257+02:00 lxfsra26a03 stagerjob[27189]: LVL=Info TID=27189 MSG="Job finished successfully" REQID=7a49e241-9a7b-4f7b-826f-eecc4629ab15 NSHOSTNAME=castorns NSFILEID=1289998589 ElapsedTime=11.155123 SUBREQID=d7a54302-70a8-2b5f-e043-6aa18a892e44

_______________________________________________________

Reply to this item at:

  <http://savannah.cern.ch/support/?136850>

_______________________________________________
Message sent via/by LCG Savannah
http://savannah.cern.ch/
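
For anyone wanting to line up the error responses in xrdlog.server against the stagerjob timestamps, here is a minimal Python sketch. It assumes the log line layout seen in the excerpts above (date as YYMMDD, time, thread id, then the message); the function name `error_responses` and the regular expression are illustrative only and are not part of xrootd or CASTOR.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log excerpts in this report:
#   YYMMDD HH:MM:SS <tid> <client> XrootdResponse: sending err <code>: <message>
ERR_RE = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tid>\d+) "
    r".*?XrootdResponse: sending err (?P<code>\d+): (?P<msg>.*)$"
)


def error_responses(lines):
    """Return (timestamp, error code, message) for each 'sending err' line."""
    found = []
    for line in lines:
        match = ERR_RE.match(line.strip())
        if match:
            when = datetime.strptime(
                match["date"] + " " + match["time"], "%y%m%d %H:%M:%S"
            )
            found.append((when, int(match["code"]), match["msg"]))
    return found


# Three lines copied from the xrdlog.server excerpt in this report.
sample = [
    "130403 16:35:18 9246 XrdLink: Unable to receive from "
    "ardm.18293:[log in to unmask]; connection reset by peer",
    "130403 16:35:29 31486 ardm.18293:[log in to unmask] "
    "XrootdResponse: sending err 3011: session not found",
    "130403 16:35:29 31486 ardm.18293:[log in to unmask] "
    "XrootdResponse: sending err 3007: Unable to signal open to the "
    "corresponding LSF stagerjob ; Input/output error",
]

for when, code, msg in error_responses(sample):
    print(when.isoformat(), code, msg)
```

In the logs above, the stagerjob reports "Job finished successfully" at 16:35:18, while the 3007 response is sent at 16:35:29, so extracting the timestamps this way makes the ordering of the two events easy to check.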