Hi,

I've run out of ideas for where to debug further. We've hit a very odd case in which the `xrdcp` client receives SIGSEGV. The full backtrace, with some additional information, is below [*].

The `xrootd-client` version is:
```
Name        : xrootd-client
Version     : 4.8.3
Release     : 1.osg34.el7
```

The `xrootd-server` version is:
```
Name        : xrootd-server
Version     : 4.8.1
Release     : 1.osg34.el7
```

The server was verified to be an identical installation to the other xrootd nodes at the site, configuration included. Fetching the file from other nodes at the site works fine. We can reproduce the "`Segmentation fault`" fairly quickly by running the command manually (see the backtrace below).

In addition to the backtrace, here is the output with an increased debug level (`$ xrdcp -d 3 -f root://...`):
http://t2.unl.edu/store/zvada/andy/xrdcp-segfault.txt

We are out of ideas for what to debug next. Our best guess is that the node sends a malformed message to the client, which then segfaults. We haven't seen this happen anywhere else, though.

Thanks in advance for any help.

-Marian

[*]
```
(gdb) r
Starting program: /usr/bin/xrdcp -d 2 -f root://xrootd.accre.vanderbilt.edu:1094//store/mc/SAM/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v1/00000/A64CCCF2-5C76-E711-B359-0CC47A78A3F8.root /dev/null
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[2018-06-21 16:26:15.408510 -0500][Debug  ][Utility           ] CopyProcess: 2 jobs to prepare
[2018-06-21 16:26:15.408619 -0500][Debug  ][Utility           ] Creating a classic copy job, from root://xrootd.accre.vanderbilt.edu:1094//store/mc/SAM/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v1/00000/A64CCCF2-5C76-E711-B359-0CC47A78A3F8.root to file://localhost/dev/null
[2018-06-21 16:26:15.408650 -0500][Debug  ][Utility           ] Monitor library name not set. No monitoring
[2018-06-21 16:26:15.408721 -0500][Debug  ][Poller            ] Available pollers: built-in
[2018-06-21 16:26:15.408728 -0500][Debug  ][Poller            ] Attempting to create a poller according to preference: built-in
[2018-06-21 16:26:15.408734 -0500][Debug  ][Poller            ] Creating poller: built-in
[2018-06-21 16:26:15.408745 -0500][Debug  ][Poller            ] Creating and starting the built-in poller...
[New Thread 0x7ffff4a25700 (LWP 11113)]
[2018-06-21 16:26:15.413698 -0500][Debug  ][Poller            ] Using 1 poller threads
[2018-06-21 16:26:15.413726 -0500][Debug  ][TaskMgr           ] Starting the task manager...
[New Thread 0x7ffff4224700 (LWP 11114)]
[2018-06-21 16:26:15.414280 -0500][Debug  ][TaskMgr           ] Task manager started
[2018-06-21 16:26:15.414297 -0500][Debug  ][JobMgr            ] Starting the job manager...
[New Thread 0x7ffff3a23700 (LWP 11115)]
[New Thread 0x7ffff3222700 (LWP 11116)]
[New Thread 0x7ffff2a21700 (LWP 11117)]
[2018-06-21 16:26:15.416346 -0500][Debug  ][JobMgr            ] Job manager started, 3 workers
[2018-06-21 16:26:15.416364 -0500][Debug  ][TaskMgr           ] Registering task: "FileTimer task" to be run at: [2018-06-21 16:26:15 -0500]
[2018-06-21 16:26:15.416378 -0500][Debug  ][Utility           ] Opening root://xrootd.accre.vanderbilt.edu:1094//store/mc/SAM/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v1/00000/A64CCCF2-5C76-E711-B359-0CC47A78A3F8.root for reading
[2018-06-21 16:26:15.416421 -0500][Debug  ][File              ] [0x61b6b0@root://xrootd.accre.vanderbilt.edu:1094//store/mc/SAM/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v1/00000/A64CCCF2-5C76-E711-B359-0CC47A78A3F8.root] Sending an open command
[2018-06-21 16:26:15.416479 -0500][Debug  ][PostMaster        ] Creating new channel to: xrootd.accre.vanderbilt.edu:1094 1 stream(s)
[2018-06-21 16:26:15.416499 -0500][Debug  ][PostMaster        ] [xrootd.accre.vanderbilt.edu:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2018-06-21 16:26:15.417059 -0500][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: xrootd.accre.vanderbilt.edu:1094" to be run at: [2018-06-21 16:26:30 -0500]
[2018-06-21 16:26:15.418326 -0500][Debug  ][PostMaster        ] [xrootd.accre.vanderbilt.edu:1094] Found 1 address(es): [::ffff:129.59.197.121]:1094
[2018-06-21 16:26:15.418417 -0500][Debug  ][AsyncSock         ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Attempting connection to [::ffff:129.59.197.121]:1094
[2018-06-21 16:26:15.418473 -0500][Debug  ][Poller            ] Adding socket 0x621290 to the poller
[2018-06-21 16:26:15.459293 -0500][Debug  ][AsyncSock         ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Async connection call returned
[2018-06-21 16:26:15.459368 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Sending out the initial hand shake + kXR_protocol
[2018-06-21 16:26:15.500320 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Got the server hand shake response (type: manager [], protocol version 310)
[2018-06-21 16:26:15.500380 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] kXR_protocol successful (type: manager [], protocol version 310)
[2018-06-21 16:26:15.501518 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Sending out kXR_login request, username: zvada, cgi: ?xrd.cc=us&xrd.tz=-6&xrd.appname=xrdcp&xrd.info=&xrd.hostname=hcc-marian.unl.edu&xrd.rn=v4.8.3, dual-stack: true, private IPv4: false, private IPv6: false
[2018-06-21 16:26:15.542406 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Logged in, session: 6e4c0000093a000037000000164e0000
[2018-06-21 16:26:15.542456 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Authentication is required: &P=gsi,v:10300,c:ssl,ca:70d35895.0|d690e530.0
[2018-06-21 16:26:15.542470 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Sending authentication data
[2018-06-21 16:26:15.558881 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Trying to authenticate using gsi
[2018-06-21 16:26:15.692276 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Sending more authentication data for gsi
[2018-06-21 16:26:15.738889 -0500][Debug  ][XRootDTransport   ] [xrootd.accre.vanderbilt.edu:1094 #0.0] Authenticated with gsi.
[2018-06-21 16:26:15.738954 -0500][Debug  ][PostMaster        ] [xrootd.accre.vanderbilt.edu:1094 #0] Stream 0 connected.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2a21700 (LWP 11117)]
operator<< <char, std::char_traits<char>, std::allocator<char> > (__str=<error reading variable: Cannot access memory at address 0x0>, __os=...) at /usr/include/c++/4.8.2/bits/basic_string.h:2758
warning: Source file is more recent than executable.
2758	      return __ostream_insert(__os, __str.data(), __str.size());
Missing separate debuginfos, use: debuginfo-install libstdc++-4.8.5-28.el7_5.1.x86_64
(gdb) bt
#0  operator<< <char, std::char_traits<char>, std::allocator<char> > (__str=<error reading variable: Cannot access memory at address 0x0>, __os=...) at /usr/include/c++/4.8.2/bits/basic_string.h:2758
#1  XrdCl::XRootDMsgHandler::Process (this=0x6215d0, msg=<optimized out>) at /usr/src/debug/xrootd-4.8.3/src/XrdCl/XrdClXRootDMsgHandler.cc:480
#2  0x00007ffff7b2c6ee in XrdCl::Stream::HandleIncMsgJob::Run (this=0x7fffec0556d0, arg=<optimized out>) at /usr/src/debug/xrootd-4.8.3/src/XrdCl/XrdClStream.hh:284
#3  0x00007ffff7b90daf in XrdCl::JobManager::RunJobs (this=0x61a5e0) at /usr/src/debug/xrootd-4.8.3/src/XrdCl/XrdClJobManager.cc:148
#4  0x00007ffff7b91009 in RunRunnerThread (arg=<optimized out>) at /usr/src/debug/xrootd-4.8.3/src/XrdCl/XrdClJobManager.cc:33
#5  0x00007ffff6620dc5 in start_thread (arg=0x7ffff2a21700) at pthread_create.c:308
#6  0x00007ffff6b4473d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) frame 2
#2  0x00007ffff7b2c6ee in XrdCl::Stream::HandleIncMsgJob::Run (this=0x7fffec0556d0, arg=<optimized out>) at /usr/src/debug/xrootd-4.8.3/src/XrdCl/XrdClStream.hh:284
284	            pHandler->Process( msg );
(gdb) frame 1
#1  XrdCl::XRootDMsgHandler::Process (this=0x6215d0, msg=<optimized out>) at /usr/src/debug/xrootd-4.8.3/src/XrdCl/XrdClXRootDMsgHandler.cc:480
480	        o << urlComponents[0];
```
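For what it's worth, frame 1 (`o << urlComponents[0];` with `__str` at address 0x0) would be consistent with `urlComponents` being an empty container at that point, so that indexing element 0 yields an invalid reference. I have not checked the 4.8.3 source to confirm the type of `urlComponents` or how it is filled, so the following is only a sketch of the general pattern and of a defensive check. The helper names `SplitNonEmpty` and `DescribeFirst` are hypothetical and are not part of XrdCl.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not XrdCl code): split `input` on `delim`,
// dropping empty pieces. An empty or delimiter-only input yields an
// empty vector.
std::vector<std::string> SplitNonEmpty(const std::string &input, char delim)
{
  std::vector<std::string> parts;
  std::istringstream stream(input);
  std::string piece;
  while (std::getline(stream, piece, delim))
    if (!piece.empty())
      parts.push_back(piece);
  return parts;
}

// Hypothetical guard: stream the first component only if it exists.
// Calling parts[0] on an empty vector is undefined behaviour and can
// show up as a SIGSEGV inside operator<<, similar to frame 0 above.
std::string DescribeFirst(const std::vector<std::string> &parts)
{
  std::ostringstream o;
  if (parts.empty())
    o << "<no components>";
  else
    o << parts[0];
  return o.str();
}
```

If the server's response carried an empty or unexpected field, a split of this kind would return no components, and an unguarded `[0]` access would then fail in the way the backtrace shows. This is an assumption about the cause, not a confirmed diagnosis.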

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/752