URL:
  <http://savannah.cern.ch/support/?136850>

                 Summary: Weird error message of xrootd when facing client
disconnection
                 Project: XROOTD
            Submitted by: espinal
            Submitted on: 2013-04-04 07:05
                Category: None
                Priority: 5 - Normal
                Severity: 3 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
        Operating System: GNU/Linux

    _______________________________________________________

Details:

xrootd reports a misleading error message when a client disconnects while
transferring data.

A bit of context: 

* The client is an external box (not at CERN) running xroot v3.0.4-1 on the
i386 architecture.

* Observed "connection reset by peer" while trying to send data.

* The error message appearing in the xroot logs is "Unable to signal open to
the corresponding LSF stagerjob ; Input/output error", which is misleading
because LSF is no longer present on the stagers; we probably want something
more meaningful.

* Logs:

- xrdlog.server log:

130403 16:35:17 30990 XrdInet: Accepted connection from [log in to unmask]
130403 16:35:17 9246 XrootdXeq: ardm.18293:[log in to unmask] login as ardm
130403 16:35:17 9246 ardm.18293:[log in to unmask] castor2ofs_open:
castorlfn=/castor/cern.ch/ArDM/test/event_extraction.pdf
130403 16:35:17 9246 ardm.18293:[log in to unmask] castor2ofs_open: 4102-40664
[log in to unmask]
130403 16:35:18 30996 XrdPoll: Sever event occured for
ardm.18293:[log in to unmask]
130403 16:35:18 9246 XrdLink: Unable to receive from
ardm.18293:[log in to unmask]; connection reset by peer
130403 16:35:18 9246 XrootdXeq: ardm.18293:[log in to unmask] disc 0:00:01
(link read error)
130403 16:35:18 9246 ardm.18293:[log in to unmask] castor2ofs_close: use=1
[log in to unmask]
130403 16:35:18 9246 ardm.18293:[log in to unmask] castor2ofs_close: use=0
fn=dummy
130403 16:35:29 30990 XrdInet: Accepted connection from [log in to unmask]
130403 16:35:29 31486 XrootdXeq: ardm.18293:[log in to unmask] login as ardm
130403 16:35:29 31486 ardm.18293:[log in to unmask] XrootdProtocol: endsess
30977:8.12475
130403 16:35:29 31486 ardm.18293:[log in to unmask] XrootdProtocol: endsess
30977:8.12475 rc=-3 (No such process)
130403 16:35:29 31486 ardm.18293:[log in to unmask] XrootdResponse: sending
err 3011: session not found
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_open:
castorlfn=/castor/cern.ch/ArDM/test/event_extraction.pdf
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_open: error:
open => couldn't run the Open towards StagerJob: reqid=12864861298
stagerjobport=41109 stagerjobuuid=d7a54302-70a8-2b5f-e043-6aa18a892e44
130403 16:35:29 31486 castor2ofs_open: ardm.18293:[log in to unmask] Unable
to signal open to the corresponding LSF stagerjob ; Input/output error
130403 16:35:29 31486 ardm.18293:[log in to unmask] XrootdResponse: sending
err 3007: Unable to signal open to the corresponding LSF stagerjob ;
Input/output error
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_close: use=0
fn=dummy
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_close:
StagerJob close failed: got rc=0 msg=undef
130403 16:35:29 31486 castor2ofs_close: ardm.18293:[log in to unmask] Unable
to undef ; Success
130403 16:35:29 31486 ardm.18293:[log in to unmask] castor2ofs_close: use=0
fn=dummy
130403 16:35:29 31486 XrootdXeq: ardm.18293:[log in to unmask] disc 0:00:00


- The error message we got in the xrootd debug output is:


======== DUMPING SERVER RESPONSE HEADER ========
      ServerHeader.streamid = 0x01 0x00
        ServerHeader.status = kXR_error (4003)
          ServerHeader.dlen = 82
========== END DUMPING SERVER HEADER ===========

130403 16:35:29 001 Xrd: ReadPartialAnswer: Server [lxfsra26a03.cern.ch:1095]
answered [kXR_error] (4003)
130403 16:35:29 001 Xrd: CheckErrorStatus: Server [lxfsra26a03.cern.ch:1095]
declared: Unable to signal open to the corresponding LSF stagerjob ;
Input/output error(error code: 3007)
130403 16:35:29 001 Xrd: OpenFileWhenRedirected: File open failed.
130403 16:35:29 001 Xrd: SendGenCommand: Got (and maybe recovered) an error
from lxfsra26a03.cern.ch:1095
130403 16:35:29 001 Xrd: SendGenCommand: Recovery failure detected. Aborting
request.lxfsra26a03.cern.ch:1095
Error writing to output server.
Last server error 3007 ('Unable to signal open to the corresponding LSF
stagerjob ; Input/output error')


- castor stager log:


[root@lxfsra26a03 castor]# grep d7a54302-70a8-2b5f-e043-6aa18a892e44
stagerjob.log*
stagerjob.log.1:2013-04-03T16:35:07.488229+02:00 lxfsra26a03
stagerjob[27189]: LVL=Info TID=27189 MSG="Job Started"
REQID=7a49e241-9a7b-4f7b-826f-eecc4629ab15 NSHOSTNAME=castorns
NSFILEID=1289998589 Arguments="/usr/bin/stagerjob -r
7a49e241-9a7b-4f7b-826f-eecc4629ab15 -s d7a54302-70a8-2b5f-e043-6aa18a892e44
-F 1289998589 -H castorns -p xroot -i 12864861298.0 -T 40 -m w -C
129:128.142.171.178:34764 -u 3563 -g 1107 -X 0 -S default -t 1364999707 -R
file:///tmp/tmpJRNoi6" Type="StagePutRequest" Protocol="xroot"
SvcClass="default" TotalWaitTime=0.488204
SUBREQID=d7a54302-70a8-2b5f-e043-6aa18a892e44
stagerjob.log.1:2013-04-03T16:35:18.616308+02:00 lxfsra26a03
stagerjob[27189]: LVL=Warn TID=27189 MSG="Failed to get checksum information
from extended attributes" REQID=7a49e241-9a7b-4f7b-826f-eecc4629ab15
NSHOSTNAME=castorns NSFILEID=1289998589
Path="[log in to unmask]" Error="No data
available" SUBREQID=d7a54302-70a8-2b5f-e043-6aa18a892e44
stagerjob.log.1:2013-04-03T16:35:18.640257+02:00 lxfsra26a03
stagerjob[27189]: LVL=Info TID=27189 MSG="Job finished successfully"
REQID=7a49e241-9a7b-4f7b-826f-eecc4629ab15 NSHOSTNAME=castorns
NSFILEID=1289998589 ElapsedTime=11.155123
SUBREQID=d7a54302-70a8-2b5f-e043-6aa18a892e44
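
- Combined timeline (illustrative, not part of the original logs):

To read the xrootd and stagerjob excerpts together, here is a minimal Python
sketch that puts a few key events on one timeline. The timestamps and messages
are copied by hand from the logs above; the helper names (parse_xrd,
parse_stager, timeline) are hypothetical, and the sketch assumes both logs use
the same local time zone since they come from the same host.

```python
from datetime import datetime

# xrootd log timestamps look like "130403 16:35:29" (YYMMDD HH:MM:SS, 1 s resolution).
XRD_FMT = "%y%m%d %H:%M:%S"


def parse_xrd(ts):
    """Parse an xrootd log timestamp into a naive local datetime."""
    return datetime.strptime(ts, XRD_FMT)


def parse_stager(ts):
    """Parse a stagerjob ISO 8601 timestamp, keeping the local wall-clock time.

    Assumption: the UTC offset can be dropped because both logs are local time.
    """
    return datetime.fromisoformat(ts).replace(tzinfo=None)


# (source, time, message) tuples copied from the excerpts above.
events = [
    ("stagerjob", parse_stager("2013-04-03T16:35:07.488229+02:00"), "Job Started"),
    ("xrootd", parse_xrd("130403 16:35:17"), "castor2ofs_open (first attempt)"),
    ("xrootd", parse_xrd("130403 16:35:18"), "connection reset by peer"),
    ("stagerjob", parse_stager("2013-04-03T16:35:18.640257+02:00"),
     "Job finished successfully"),
    ("xrootd", parse_xrd("130403 16:35:29"),
     "Unable to signal open to the corresponding LSF stagerjob"),
]


def timeline(evts):
    """Return the events sorted by time."""
    return sorted(evts, key=lambda e: e[1])


ordered = timeline(events)
for source, when, message in ordered:
    print(f"{when.time()}  {source:9s}  {message}")

# Seconds between the stagerjob exiting and the second, failing open.
job_end = next(t for s, t, m in ordered if m == "Job finished successfully")
failed_open = next(t for s, t, m in ordered if m.startswith("Unable to signal"))
gap_seconds = (failed_open - job_end).total_seconds()
print(f"failing open came {gap_seconds:.1f} s after the stagerjob finished")
```

On these excerpts the failing open at 16:35:29 comes roughly ten seconds after
the stagerjob logged "Job finished successfully". That ordering may be why the
open could not be signalled, but this is an inference from the timestamps, not
something the logs state.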






    _______________________________________________________

Reply to this item at:

  <http://savannah.cern.ch/support/?136850>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/
