~25000 "xrdcp" coredumps have been recorded in ntofpro Batch jobs this morning, happening on 9 worker nodes between 01:34 and 03:56. The commands issued look like

xrdcp -f -N root://eospublic.cern.ch//eos/experiment/ntof/processing/TAC_DATA/114471/run114471.idx.finished /tmp/ntofpro/run114471.idx.finished

Note that we did not record any "xrdcp" coredumps from other users.

A similar burst of "xrdcp" coredumps happened on Sep 28, when we recorded >12000 cores on 2 servers, between 02:42 and 03:50.

Attached you can see some details of the crash, including a backtrace:

```
PID: 125003 (xrdcp)
UID: 95759 (ntofpro)
GID: 2348 (za)
Signal: 11 (SEGV)
Timestamp: Tue 2022-10-04 01:34:49 CEST (9h ago)
Command Line: xrdcp -f -N root://eospublic.cern.ch//eos/experiment/ntof/processing/TAC_DATA/114471/run114471.idx.finished /tmp/ntofpro/run114471.idx.finished
Executable: /usr/bin/xrdcp
Control Group: /system.slice/condor.service
Unit: condor.service
Slice: system.slice
Boot ID: 1615e66df0174904b3821ad44a980f28
Machine ID: 7aea158db12945aebe58d21e3d64855a
Hostname: b7g02p8793.cern.ch
Message: Process 125003 (xrdcp) of user 95759 dumped core.

Stack trace of thread 30919:
#0  0x00002b1fa7f93b30 _ZN5XrdCl10PostMaster13GetJobManagerEv (libXrdCl.so.3)
#1  0x00002b1fa804de17 _ZN5XrdCl16LocalFileHandlerC1Ev (libXrdCl.so.3)
#2  0x00002b1fa7febce6 _ZN5XrdCl16FileStateHandlerC1ERPNS_10FilePlugInE (libXrdCl.so.3)
#3  0x00002b1fa7fdeb75 _ZN5XrdCl4FileC2Eb (libXrdCl.so.3)
#4  0x00002b1fa801107f _ZN12_GLOBAL__N_112XRootDSourceC2EPKN5XrdCl3URLEjhRKSsRKSt6vectorISsSaISsEEb (libXrdCl.so.3)
#5  0x00002b1fa8014feb _ZN5XrdCl14ClassicCopyJob3RunEPNS_19CopyProgressHandlerE (libXrdCl.so.3)
#6  0x00002b1fa7ffca02 _ZN12_GLOBAL__N_113QueuedCopyJob3RunEPv (libXrdCl.so.3)
#7  0x00002b1fa7ffe808 _ZN5XrdCl11CopyProcess3RunEPNS_19CopyProgressHandlerE (libXrdCl.so.3)
#8  0x000000000040b5c3 main (xrdcp)
#9  0x00002b1fa9042555 __libc_start_main (libc.so.6)
#10 0x000000000040cdc3 _start (xrdcp)

(gdb) bt
#0  0x00002b1fa7f93b30 in XrdCl::PostMaster::GetJobManager() () from /lib64/libXrdCl.so.3
#1  0x00002b1fa804de17 in XrdCl::LocalFileHandler::LocalFileHandler() () from /lib64/libXrdCl.so.3
#2  0x00002b1fa7febce6 in XrdCl::FileStateHandler::FileStateHandler(XrdCl::FilePlugIn*&) () from /lib64/libXrdCl.so.3
#3  0x00002b1fa7fdeb75 in XrdCl::File::File(bool) () from /lib64/libXrdCl.so.3
#4  0x00002b1fa801107f in (anonymous namespace)::XRootDSource::XRootDSource(XrdCl::URL const*, unsigned int, unsigned char, std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool) () from /lib64/libXrdCl.so.3
#5  0x00002b1fa8014feb in XrdCl::ClassicCopyJob::Run(XrdCl::CopyProgressHandler*) () from /lib64/libXrdCl.so.3
#6  0x00002b1fa7ffca02 in (anonymous namespace)::QueuedCopyJob::Run(void*) () from /lib64/libXrdCl.so.3
#7  0x00002b1fa7ffe808 in XrdCl::CopyProcess::Run(XrdCl::CopyProgressHandler*) () from /lib64/libXrdCl.so.3
#8  0x000000000040b5c3 in main ()
```

--
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1797
You are receiving this because you are subscribed to this thread.