.... and here is the logging output of thread apply all bt full https://www.dropbox.com/s/imdvj8ba5knlvbg/andy.log?dl=0 On Tue, Aug 25, 2015 at 7:24 AM, Tommaso Boccali <[log in to unmask]> wrote: > ciao, andrew, I asked the spanish guys for the cmsd version. > in the meantime i replicated a machine with the 4.2.2 setup, and analyzed > a core file. > > I get: > > [root@xrootd-redic-bck xrootd]# gdb cmsd core.63160 > GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6) > Copyright (C) 2010 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-redhat-linux-gnu". > For bug reporting instructions, please see: > <http://www.gnu.org/software/gdb/bugs/>... > Reading symbols from /usr/bin/cmsd...Reading symbols from > /usr/lib/debug/usr/bin/cmsd.debug...done. > done. > [New Thread 63535] > ... > ... > ... > Missing separate debuginfo for > Try: yum --disablerepo='*' --enablerepo='*-debug*' install > /usr/lib/debug/.build-id/9e/e239647f77340975f782611a1fa728c355ecda > Reading symbols from /usr/lib64/libXrdServer.so.2.0.0...Reading symbols > from /usr/lib/debug/usr/lib64/libXrdServer.so.2.0.0.debug...done. > done. > Loaded symbols for /usr/lib64/libXrdServer.so.2.0.0 > Reading symbols from /usr/lib64/libXrdUtils.so.2.0.0...Reading symbols > from /usr/lib/debug/usr/lib64/libXrdUtils.so.2.0.0.debug...done. > done. > Loaded symbols for /usr/lib64/libXrdUtils.so.2.0.0 > Reading symbols from /lib64/libpthread-2.12.so...Reading symbols from > /usr/lib/debug/lib64/libpthread-2.12.so.debug...done. > [Thread debugging using libthread_db enabled] > done. > Loaded symbols for /lib64/libpthread-2.12.so > Reading symbols from /lib64/librt-2.12.so...Reading symbols from > /usr/lib/debug/lib64/librt-2.12.so.debug...done. > done. > Loaded symbols for /lib64/librt-2.12.so > Reading symbols from /usr/lib64/libstdc++.so.6.0.13...Reading symbols from > /usr/lib/debug/usr/lib64/libstdc++.so.6.0.13.debug...done. > done. > Loaded symbols for /usr/lib64/libstdc++.so.6.0.13 > Reading symbols from /lib64/libm-2.12.so...Reading symbols from > /usr/lib/debug/lib64/libm-2.12.so.debug...done. > done. > Loaded symbols for /lib64/libm-2.12.so > Reading symbols from /lib64/libgcc_s-4.4.6-20120305.so.1...Reading symbols > from /usr/lib/debug/lib64/libgcc_s-4.4.6-20120305.so.1.debug...done. > done. > Loaded symbols for /lib64/libgcc_s-4.4.6-20120305.so.1 > Reading symbols from /lib64/libc-2.12.so...Reading symbols from > /usr/lib/debug/lib64/libc-2.12.so.debug...done. > done. > Loaded symbols for /lib64/libc-2.12.so > Reading symbols from /lib64/libdl-2.12.so...Reading symbols from > /usr/lib/debug/lib64/libdl-2.12.so.debug...done. > done. > Loaded symbols for /lib64/libdl-2.12.so > Reading symbols from /lib64/ld-2.12.so...Reading symbols from > /usr/lib/debug/lib64/ld-2.12.so.debug...done. > done. > Loaded symbols for /lib64/ld-2.12.so > Reading symbols from /lib64/libnss_files-2.12.so...Reading symbols from > /usr/lib/debug/lib64/libnss_files-2.12.so.debug...done. > done. > Loaded symbols for /lib64/libnss_files-2.12.so > Reading symbols from /lib64/libnss_dns-2.12.so...Reading symbols from > /usr/lib/debug/lib64/libnss_dns-2.12.so.debug...done. > done. > Loaded symbols for /lib64/libnss_dns-2.12.so > Reading symbols from /lib64/libresolv-2.12.so...Reading symbols from > /usr/lib/debug/lib64/libresolv-2.12.so.debug...done. > done. > Loaded symbols for /lib64/libresolv-2.12.so > Core was generated by `/usr/bin/cmsd -l /var/log/xrootd/cmsd.log -c > /etc/xrootd/xrootd-redir-cms.cfg -'. > Program terminated with signal 11, Segmentation fault. > #0 __pthread_mutex_lock (mutex=0x30) at pthread_mutex_lock.c:50 > 50 unsigned int type = PTHREAD_MUTEX_TYPE (mutex); > (gdb) where > #0 __pthread_mutex_lock (mutex=0x30) at pthread_mutex_lock.c:50 > #1 0x0000000000437519 in Lock (this=0x7f3038002160, lp=<value optimized > out>) > at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.hh:149 > #2 Lock (this=0x7f3038002160, lp=<value optimized out>) at > /usr/src/debug/xrootd/xrootd/src/XrdCms/XrdCmsNode.hh:143 > #3 XrdCmsProtocol::Process (this=0x7f3038002160, lp=<value optimized > out>) at /usr/src/debug/xrootd/xrootd/src/XrdCms/XrdCmsProtocol.cc:480 > #4 0x00007f30fa1e6149 in XrdLink::DoIt (this=0x7f3038000b98) at > /usr/src/debug/xrootd/xrootd/src/Xrd/XrdLink.cc:397 > #5 0x00007f30fa1e9625 in XrdScheduler::Run (this=0x647478) at > /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:333 > #6 0x00007f30fa1e9819 in XrdStartWorking (carg=<value optimized out>) at > /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:85 > #7 0x00007f30fa1ae3af in XrdSysThread_Xeq (myargs=0x7f30340008c0) at > /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86 > #8 0x00007f30f9f6f9d1 in start_thread (arg=0x7f30f0aca700) at > pthread_create.c:301 > #9 0x00007f30f9308b6d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 > > > > does it help? > > I have many core files, they should be all ~ equivalent. > > I have put one on > https://www.dropbox.com/s/6eqyf2th3oxmk4b/core.64945?dl=0 > > > > > tom > > On Mon, Aug 24, 2015 at 10:14 PM, Andrew Hanushevsky < > [log in to unmask]> wrote: > >> Hi Tommaso, >> >> OK, this is now not critical as I seem to have found the problem. >> However, a backtrace would still be good to have as assurance. Yes, it is a >> bug when invalid login data is encountered. Now we have tto find out why >> this actually happened. >> >> What is wngw.ifca.es actually running (i.e. what cmsd version). >> >> >> Andy >> >> On Mon, 24 Aug 2015, Tommaso Boccali wrote: >> >> ciao andrew, tomorrow I can try but the fact is taht today in the end I >>> had >>> to downgrade, since it is a production server. >>> >>> so i have to reupgrade, take the snapshot and go back as fast as >>> possible :( >>> >>> ciao ciao >>> >>> tom >>> >>> On Mon, Aug 24, 2015 at 9:47 PM, Andrew Hanushevsky < >>> [log in to unmask]> >>> wrote: >>> >>> Hi Tommaso, >>>> >>>> Both daemons (xrootd and cmsd) will exit if you attempt to run them as >>>> root. This is a security feature. You can run them as root but only >>>> after >>>> specifically confirming this via command line options (i.e. you accept >>>> the >>>> risks). As for the SEGV, that's clearly a bug. Is it possible to get a >>>> stack trace of the thread that got the SEGV? Please make sure to install >>>> the debug RPM so we can get actual line numbers. >>>> >>>> Andy >>>> >>>> >>>> On Mon, 24 Aug 2015, Tommaso Boccali wrote: >>>> >>>> uhm, >>>> >>>>> >>>>> - the strace line was just my fault, I was trying running as root on >>>>> the >>>>> command line >>>>> - when I retried with user xrootd, I get instead the lines below, which >>>>> terminate with a segv (*) >>>>> >>>>> so the last message is consistent with the one in the logs: >>>>> >>>>> [pid 65395] writev(2, [{"150824 10:09:53 65395 ", 22}, {"Pup", 3}, {": >>>>> ", >>>>> 2}, {"buffer overrun unpacking", 24}, {" ", 1}, {"short arg 0: ident.", >>>>> 19}, {"\n", 1}], 7) = 72 >>>>> [pid 65395] gettid() = 65395 >>>>> [pid 65395] writev(2, [{"150824 10:09:53 65395 ", 22}, {"Login", 5}, >>>>> {": >>>>> ", >>>>> 2}, {"wngw.ifca.es", 12}, {" ", 1}, {"login failed;", 13}, {" ", 1}, >>>>> {"invalid login data", 18}, {"\n", 1}], 9) = 75 >>>>> >>>>> >>>>> *: >>>>> [pid 65395] <... gettid resumed> ) = 65395 >>>>> [pid 65395] write(2, "150824 10:09:53 65395 ", 22) = 22 >>>>> [pid 65395] write(2, "Xrd", 3) = 3 >>>>> [pid 65395] write(2, "Inet", 4) = 4 >>>>> [pid 65395] write(2, ": ", 2) = 2 >>>>> [pid 65395] write(2, "Accepted connection from ", 25) = 25 >>>>> [pid 65395] write(2, "23", 2) = 2 >>>>> [pid 65395] write(2, "@", 1) = 1 >>>>> [pid 65395] write(2, "wngw.ifca.es", 12) = 12 >>>>> [pid 65395] write(2, "\n", 1) = 1 >>>>> [pid 65395] futex(0x6473c8, FUTEX_WAKE_PRIVATE, 1) = 0 >>>>> [pid 65395] poll([{fd=23, events=POLLIN|POLLRDNORM}], 1, 1000) = 1 >>>>> ([{fd=23, revents=POLLIN|POLLRDNORM}]) >>>>> [pid 65395] recvfrom(23, "\0\0\0\0\0\0\0\0", 8, MSG_PEEK, NULL, NULL) >>>>> = 8 >>>>> [pid 65395] gettid() = 65395 >>>>> [pid 65395] write(2, "150824 10:09:53 65395 ", 22) = 22 >>>>> [pid 65395] write(2, "Xrd", 3) = 3 >>>>> [pid 65395] write(2, "Protocol", 8) = 8 >>>>> [pid 65395] write(2, ": ", 2) = 2 >>>>> [pid 65395] write(2, "matched protocol ", 17) = 17 >>>>> [pid 65395] write(2, "cmsd", 4) = 4 >>>>> [pid 65395] write(2, "\n", 1) = 1 >>>>> [pid 65395] epoll_ctl(12, EPOLL_CTL_ADD, 23, {0, {u32=4160757352, >>>>> u64=140071634214504}}) = 0 >>>>> [pid 65395] gettid() = 65395 >>>>> [pid 65395] write(2, "150824 10:09:53 65395 ", 22) = 22 >>>>> [pid 65395] write(2, "?:[log in to unmask]", 17) = 17 >>>>> [pid 65395] write(2, " ", 1) = 1 >>>>> [pid 65395] write(2, "Xrd", 3) = 3 >>>>> [pid 65395] write(2, "Poll", 4) = 4 >>>>> [pid 65395] write(2, ": ", 2) = 2 >>>>> [pid 65395] write(2, "FD ", 3) = 3 >>>>> [pid 65395] write(2, "23", 2) = 2 >>>>> [pid 65395] write(2, " attached to poller ", 20) = 20 >>>>> [pid 65395] write(2, "2", 1) = 1 >>>>> [pid 65395] write(2, "; num=", 6) = 6 >>>>> [pid 65395] write(2, "1", 1) = 1 >>>>> [pid 65395] write(2, "\n", 1) = 1 >>>>> [pid 65395] poll([{fd=23, events=POLLIN|POLLRDNORM}], 1, 5000) = 1 >>>>> ([{fd=23, revents=POLLIN|POLLRDNORM}]) >>>>> [pid 65395] recvfrom(23, "\0\0\0\0\0\0\0\0", 8, 0, NULL, NULL) = 8 >>>>> [pid 65395] gettid() = 65395 >>>>> [pid 65395] writev(2, [{"150824 10:09:53 65395 ", 22}, {"Pup", 3}, {": >>>>> ", >>>>> 2}, {"buffer overrun unpacking", 24}, {" ", 1}, {"short arg 0: ident.", >>>>> 19}, {"\n", 1}], 7) = 72 >>>>> [pid 65395] gettid() = 65395 >>>>> [pid 65395] writev(2, [{"150824 10:09:53 65395 ", 22}, {"Login", 5}, >>>>> {": >>>>> ", >>>>> 2}, {"wngw.ifca.es", 12}, {" ", 1}, {"login failed;", 13}, {" ", 1}, >>>>> {"invalid login data", 18}, {"\n", 1}], 9) = 75 >>>>> [pid 65395] --- SIGSEGV (Segmentation fault) @ 0 (0) --- >>>>> Process 65395 detached >>>>> [pid 65394] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65389] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65397] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65396] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65379] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65383] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65391] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65392] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65393] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65390] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65386] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65385] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65388] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65387] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65384] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65382] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65381] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65380] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65378] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65377] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65376] +++ killed by SIGSEGV (core dumped) +++ >>>>> [pid 65375] +++ killed by SIGSEGV (core dumped) +++ >>>>> +++ killed by SIGSEGV (core dumped) +++ >>>>> >>>>> On Mon, Aug 24, 2015 at 10:02 AM, Jan Iven <[log in to unmask]> wrote: >>>>> >>>>> On 08/24/2015 09:51 AM, Tommaso Boccali wrote: >>>>> >>>>>> >>>>>> Ciao, I am trying to upgrade one of the CMS EU redirectors from 3.3.6 >>>>>> to >>>>>> >>>>>>> 4.2.2 (with no configuration changed at first approx) >>>>>>> >>>>>>> The problem is that the main cmsd seems to die soon after start. No >>>>>>> real >>>>>>> message in the logs, but with strace I see a very suspect >>>>>>> >>>>>>> writev(2, [{"Copr. 2007 Stanford University/"..., 42}, {"\n", 1}], >>>>>>> 2) = >>>>>>> 43 >>>>>>> geteuid() = 0 >>>>>>> gettid() = 53572 >>>>>>> writev(2, [{"150824 09:47:27 53572 ", 22}, {"Config", 6}, {": ", 2}, >>>>>>> {"Security reasons prohibit cmsd r"..., 73}, {"\n", 1}], 5) = 104 >>>>>>> >>>>>>> >>>>>>> Well, that line ought to end up either on STDERR somewhere or in >>>>>> some log >>>>>> file. Alternatively, suggest "strace -s 1024" to get the full error >>>>>> message.. >>>>>> >>>>>> Cheers >>>>>> jan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> -- >>>>> Tommaso Boccali >>>>> INFN Pisa >>>>> >>>>> >>>>> ######################################################################## >>>>> Use REPLY-ALL to reply to list >>>>> >>>>> To unsubscribe from the XROOTD-L list, click the following link: >>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1 >>>>> >>>>> >>>>> >>> >>> -- >>> Tommaso Boccali >>> INFN Pisa >>> >>> > > > -- > Tommaso Boccali > INFN Pisa > -- Tommaso Boccali INFN Pisa ######################################################################## Use REPLY-ALL to reply to list To unsubscribe from the XROOTD-L list, click the following link: https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1