We need two pieces of information at this point: 1) which version of
xrootd you are running, and 2) which version of Linux you are running (i.e.
the distro, its release, and the kernel version).

On Sun, 9 Apr 2023, Geonmo Ryu wrote:

> Hello, everyone.
>
> Recently, our xrootd server has been restarting frequently, and I would like to report the error. The server supports the CMS AAA service and normally has around 100 sessions open, rising to around 1000 at peak times. The xrootd process keeps aborting and being restarted by systemd. I checked the generated core dump files and they all appear to fail in the same place, so I thought I should report it. Below is the output of the bt command using a core dump.
> ```bash
> (gdb) bt
> #0 0x00007fd5d7315387 in raise () from /lib64/libc.so.6
> #1 0x00007fd5d7316a78 in abort () from /lib64/libc.so.6
> #2 0x00007fd5d76baf2f in futex_wake (private=<optimized out>, processes_to_wake=1, futex=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/sem_post.c:49
> #3 __new_sem_post (sem=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/sem_post.c:80
> #4 0x00007fd5d886b2a5 in Post (this=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.hh:505
> #5 XrdXrootdFile::Ref (this=this@entry=0x7fd300065010, num=num@entry=-1) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdFile.cc:173
> #6 0x00007fd5d888204d in XrdXrootdProtocol::do_Close (this=this@entry=0x7fd30001e100) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdXeq.cc:574
> #7 0x00007fd5d88780ee in XrdXrootdProtocol::Process2 (this=0x7fd30001e100) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdProtocol.cc:492
> #8 0x00007fd5d85bade6 in XrdLinkXeq::DoIt (this=0x7fd364132800) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLinkXeq.cc:320
> #9 0x00007fd5d85be22a in XrdScheduler::Run (this=0x614e60 <XrdGlobal::Sched>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:406
> #10 0x00007fd5d85be349 in XrdStartWorking (carg=<optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:89
> #11 0x00007fd5d854b6d7 in XrdSysThread_Xeq (myargs=0x7fd39c007d80) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86
> #12 0x00007fd5d76b4ea5 in start_thread (arg=0x7fd1668e1700) at pthread_create.c:307
> #13 0x00007fd5d73dd98d in clone () from /lib64/libc.so.6
> ```
> The abort() call seen in frame 1 is made by futex_wake (frame 2), whose code is shown below.
>
> ```c
> 31 futex_wake (unsigned int* futex, int processes_to_wake, int private)
> 32 {
> 33 int res = lll_futex_wake (futex, processes_to_wake, private);
> 34 /* No error. Ignore the number of woken processes. */
> 35 if (res >= 0)
> 36 return;
> 37 switch (res)
> 38 {
> 39 case -EFAULT: /* Could have happened due to memory reuse. */
> 40 case -EINVAL: /* Could be either due to incorrect alignment (a bug in
> 41 glibc or in the application) or due to memory being
> 42 reused for a PI futex. We cannot distinguish between the
> 43 two causes, and one of them is correct use, so we do not
> 44 act in this case. */
> 45 return;
> 46 case -ENOSYS: /* Must have been caused by a glibc bug. */
> 47 /* No other errors are documented at this time. */
> 48 default:
> 49 abort ();
> 50 }
> 51 }
> ```
> This function is called from glibc's sem_post. As far as I know, sem_post defines no errors other than EINVAL, yet here the program appears to have been aborted because of ENOSYS or some other unexpected error. In frame 5, the h_semaphore member of syncWait holds what looks like a normal sem_t union value, so I don't think an invalid sem_t was passed to sem_post.
>
> ```bash
> #5 XrdXrootdFile::Ref (this=this@entry=0x7fd300065010, num=num@entry=-1) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdFile.cc:173
> 173 if (num < 0 && syncWait && refCount <= 0) syncWait->Post();
> (gdb) p* syncWait
> $1 = {h_semaphore = {__size = "\021P\006\000\323\177\000\000\245\262\206\330\325\177\000\000\000\341\001\000\323\177\000\000\000\341\001\000\323\177\000", __align = 140544215240721}}
>
> ```
> I cannot investigate any further, so I hope the xrootd developers can look into it. If you need the core dump file, contact me privately and I will upload it for you; it is about 2-3 GB.
>
> --
> Reply to this email directly or view it on GitHub:
> https://github.com/xrootd/xrootd/issues/1995
> You are receiving this because you are subscribed to this thread.
>
> Message ID: ***@***.***>


