About 25000 "xrdcp" coredumps were recorded in ntofpro batch jobs this
morning, on 9 worker nodes between 01:34 and 03:56. The commands
issued look like:
xrdcp -f -N root://eospublic.cern.ch//eos/experiment/ntof/processing/TAC_DATA/114471/run114471.idx.finished /tmp/ntofpro/run114471.idx.finished
Note that we did not record any "xrdcp" coredumps from other users.
A similar burst of "xrdcp" coredumps happened on Sep 28, when we
recorded more than 12000 cores on 2 servers between 02:42 and 03:50.
Below are some details of the crash, including a backtrace:
PID: 125003 (xrdcp)
UID: 95759 (ntofpro)
GID: 2348 (za)
Signal: 11 (SEGV)
Timestamp: Tue 2022-10-04 01:34:49 CEST (9h ago)
Command Line: xrdcp -f -N root://eospublic.cern.ch//eos/experiment/ntof/processing/TAC_DATA/114471/run114471.idx.finished /tmp/ntofpro/run114471.idx.finished
Executable: /usr/bin/xrdcp
Control Group: /system.slice/condor.service
Unit: condor.service
Slice: system.slice
Boot ID: 1615e66df0174904b3821ad44a980f28
Machine ID: 7aea158db12945aebe58d21e3d64855a
Hostname: b7g02p8793.cern.ch
Message: Process 125003 (xrdcp) of user 95759 dumped core.
Stack trace of thread 30919:
#0 0x00002b1fa7f93b30 _ZN5XrdCl10PostMaster13GetJobManagerEv (libXrdCl.so.3)
#1 0x00002b1fa804de17 _ZN5XrdCl16LocalFileHandlerC1Ev (libXrdCl.so.3)
#2 0x00002b1fa7febce6 _ZN5XrdCl16FileStateHandlerC1ERPNS_10FilePlugInE (libXrdCl.so.3)
#3 0x00002b1fa7fdeb75 _ZN5XrdCl4FileC2Eb (libXrdCl.so.3)
#4 0x00002b1fa801107f _ZN12_GLOBAL__N_112XRootDSourceC2EPKN5XrdCl3URLEjhRKSsRKSt6vectorISsSaISsEEb (libXrdCl.so.3)
#5 0x00002b1fa8014feb _ZN5XrdCl14ClassicCopyJob3RunEPNS_19CopyProgressHandlerE (libXrdCl.so.3)
#6 0x00002b1fa7ffca02 _ZN12_GLOBAL__N_113QueuedCopyJob3RunEPv (libXrdCl.so.3)
#7 0x00002b1fa7ffe808 _ZN5XrdCl11CopyProcess3RunEPNS_19CopyProgressHandlerE (libXrdCl.so.3)
#8 0x000000000040b5c3 main (xrdcp)
#9 0x00002b1fa9042555 __libc_start_main (libc.so.6)
#10 0x000000000040cdc3 _start (xrdcp)
(gdb) bt
#0 0x00002b1fa7f93b30 in XrdCl::PostMaster::GetJobManager() () from /lib64/libXrdCl.so.3
#1 0x00002b1fa804de17 in XrdCl::LocalFileHandler::LocalFileHandler() () from /lib64/libXrdCl.so.3
#2 0x00002b1fa7febce6 in XrdCl::FileStateHandler::FileStateHandler(XrdCl::FilePlugIn*&) () from /lib64/libXrdCl.so.3
#3 0x00002b1fa7fdeb75 in XrdCl::File::File(bool) () from /lib64/libXrdCl.so.3
#4 0x00002b1fa801107f in (anonymous namespace)::XRootDSource::XRootDSource(XrdCl::URL const*, unsigned int, unsigned char, std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool) () from /lib64/libXrdCl.so.3
#5 0x00002b1fa8014feb in XrdCl::ClassicCopyJob::Run(XrdCl::CopyProgressHandler*) () from /lib64/libXrdCl.so.3
#6 0x00002b1fa7ffca02 in (anonymous namespace)::QueuedCopyJob::Run(void*) () from /lib64/libXrdCl.so.3
#7 0x00002b1fa7ffe808 in XrdCl::CopyProcess::Run(XrdCl::CopyProgressHandler*) () from /lib64/libXrdCl.so.3
#8 0x000000000040b5c3 in main ()
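
With tens of thousands of cores, it may help to check whether they all share the same top frames as the trace above. The following is only a minimal sketch, assuming the stack trace text has the same "#N address symbol (module)" layout as the systemd-coredump output shown here; the names `parse_frames` and `crash_signature` are hypothetical helpers, not part of any XRootD or systemd tooling, and the gdb-style "in ... from ..." lines are not handled.

```python
import re

# Matches systemd-coredump style frames, e.g.
# "#0 0x00002b1fa7f93b30 _ZN5XrdCl10PostMaster13GetJobManagerEv (libXrdCl.so.3)"
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\s+\(([^)]+)\)\s*$")


def parse_frames(text):
    """Return a list of (index, address, symbol, module) tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            index, address, symbol, module = match.groups()
            frames.append((int(index), int(address, 16), symbol, module))
    return frames


def crash_signature(frames, depth=3):
    """Use the innermost `depth` symbols as a simple grouping key."""
    return tuple(symbol for _, _, symbol, _ in sorted(frames)[:depth])


# A few frames copied from the trace above (hypothetical sample input).
SAMPLE = """\
#0 0x00002b1fa7f93b30 _ZN5XrdCl10PostMaster13GetJobManagerEv (libXrdCl.so.3)
#1 0x00002b1fa804de17 _ZN5XrdCl16LocalFileHandlerC1Ev (libXrdCl.so.3)
#2 0x00002b1fa7febce6 _ZN5XrdCl16FileStateHandlerC1ERPNS_10FilePlugInE (libXrdCl.so.3)
#8 0x000000000040b5c3 main (xrdcp)
"""

if __name__ == "__main__":
    print(crash_signature(parse_frames(SAMPLE)))
```

Counting identical signatures across all recorded cores would show whether every crash goes through XrdCl::PostMaster::GetJobManager() or whether several distinct failures are mixed together.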