Hi Fabrice,

It's great news! Thanks to all,

Dominique

On 03/13/2015 03:01 PM, Fabrice Jammes wrote:
> Thanks to both Andy and Daniel, the Qserv worker is now running on the
> in2p3 cluster.
>
> Whew, it's a good thing ;-)
>
> On 03/13/2015 02:40 PM, Salnikov, Andrei A. wrote:
>> Hi Fabrice,
>>
>> thanks! Check my other email, which I sent right before noon, and see
>> if it fixes the crash.
>>
>> Cheers,
>> Andy
>>
>> Fabrice Jammes wrote on 2015-03-13:
>>> Hi Andy,
>>>
>>> I just checked your email. I ran into this problem, but only on the cmsd
>>> worker. With Daniel's help, I have fixed it in u/fjammes/DM-213. See the
>>> last commits:
>>> https://github.com/lsst/qserv/commit/d5abe798cfceadbcafd0b555de5c986d4a3c
>>>
>>> 7 605
>>> https://github.com/lsst/qserv/commit/a784bce2f5bb9f1f1a4631250054838084c9
>>>
>>> 5 619
>>>
>>> I'll try what you propose for the logging problem. Thanks!
>>>
>>> On 03/13/2015 10:56 AM, Salnikov, Andrei A. wrote:
>>>> Hi Fabrice,
>>>>
>>>> is master (git master) supposed to work? I get a weird message
>>>> from cmsd when I try to start Qserv (both worker and master):
>>>>
>>>> ------ [log in to unmask] phase 1 server initialization completed.
>>>> ++++++ [log in to unmask] phase 2 server initialization started.
>>>> Plugin /usr/local/home/salnikov/dm-2294/lib/libxrdoss.so: undefined symbol: _ZN6google8protobuf8internal12kEmptyStringE osslib libxrdoss.so
>>>> ------ [log in to unmask] phase 2 server initialization failed.
>>>> 150313 12:30:02 15616 XrdProtocol: Protocol cmsd could not be loaded
>>>> ------ cmsd [log in to unmask]:1094 initialization failed.
>>>>
>>>> If I do 'ldd -r lib/libxrdoss.so', I indeed see a few unresolved
>>>> symbols in it.
>>>>
>>>> Cheers,
>>>> Andy
>>>>
>>>> Fabrice Jammes wrote on 2015-03-13:
>>>>> The way is to start a Qserv master and then a Qserv worker. cmsd will
>>>>> then fail on the worker.
>>>>> There's a draft script which does that for the in2p3 cluster here:
>>>>> https://github.com/lsst/qserv/blob/u/fjammes/DM-213/admin/tools/cluster/cc-in2p3/install-puppet-node.sh
>>>>> You can cut and paste the configuration part from this one (but not
>>>>> the install part, which is based on the in2p3 shared filesystem).
>>>>>
>>>>> You also have to install/setup xrootd built in debug mode if you
>>>>> want to debug the resulting core dump:
>>>>> eups distrib install xrootd xssi-1.0.0
>>>>>
>>>>> Please let me know. Vaikunth is installing multi-node Qserv at NCSA,
>>>>> so he may have additional information.
>>>>>
>>>>> I hope your in2p3 account will be ready soon. The Qserv setup is
>>>>> already done here.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On 03/13/2015 07:04 AM, Salnikov, Andrei A. wrote:
>>>>>> Hi Fabrice,
>>>>>>
>>>>>> I'd like to reproduce it to understand exactly what happens. Is
>>>>>> there a way to repeat it on NCSA machines?
>>>>>>
>>>>>> Cheers,
>>>>>> Andy
>>>>>>
>>>>>> Fabrice Jammes wrote on 2015-03-13:
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> The cmsd crash happens when starting the worker node. I can easily
>>>>>>> test the fix if needed. It is blocking the in2p3 cluster install.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> On 03/12/2015 11:04 PM, Salnikov, Andrei A. wrote:
>>>>>>>> Fabrice, can you tell me what you did to make it crash?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> Andrew Hanushevsky wrote on 2015-03-12:
>>>>>>>>> Hi Andy,
>>>>>>>>>
>>>>>>>>> Just run what Fabrice ran and it will dump core right away, since
>>>>>>>>> XrdSysError wasn't told about XrdSysLogger and eventually the
>>>>>>>>> message is routed to that object.
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>> On Fri, 13 Mar 2015, Salnikov, Andrei A. wrote:
>>>>>>>>>
>>>>>>>>>> Hi Andy,
>>>>>>>>>>
>>>>>>>>>> I'm confused: the little piece of code below is the fix, right?
>>>>>>>>>> Is it possible for me to reproduce the crash?
>>>>>>>>>> I'll try to remember tomorrow what I did for logging; maybe then I
>>>>>>>>>> could see it better.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Andy
>>>>>>>>>>
>>>>>>>>>> Andrew Hanushevsky wrote on 2015-03-12:
>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>
>>>>>>>>>>> I don't think there is a fix for this. Could you fix it? I
>>>>>>>>>>> already have a Jira entry for rewriting that portion of the code
>>>>>>>>>>> once I get a handle on how to deal with the repo.
>>>>>>>>>>>
>>>>>>>>>>> Andy
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 12 Mar 2015, Salnikov, Andrei A. wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>
>>>>>>>>>>>> sorry for the mess I created. I do not remember exactly what is
>>>>>>>>>>>> involved there, but if you say that it fixes it then I'm happy.
>>>>>>>>>>>> Are you going to make that fix or do you want me to do it? Is
>>>>>>>>>>>> there a ticket for it already?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Andy
>>>>>>>>>>>>
>>>>>>>>>>>> Andrew Hanushevsky wrote on 2015-03-12:
>>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel told me that you did the log4cxx integration for this
>>>>>>>>>>>>> part of qserv. Unfortunately, the underlying xrootd object
>>>>>>>>>>>>> that sends messages to the log was not initialized when
>>>>>>>>>>>>> XrdOssGetSS(Logger, ...) is called. This results in a SEGV
>>>>>>>>>>>>> when a message is issued. I know this is old code and there is
>>>>>>>>>>>>> now a better way of doing all of this (and I will do that
>>>>>>>>>>>>> later). For now, the way to fix this is simply to insert this
>>>>>>>>>>>>> code snippet in the initialization section:
>>>>>>>>>>>>>
>>>>>>>>>>>>> #include "XrdSys/XrdSysError.hh"
>>>>>>>>>>>>>
>>>>>>>>>>>>> namespace XrdSsi
>>>>>>>>>>>>> {
>>>>>>>>>>>>> extern XrdSysError Log;
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> XrdSsi::Log.logger(Logger); // Logger is the arg in XrdOssGetSS()
>>>>>>>>>>>>>
>>>>>>>>>>>>> OK?
>>>>>>>>>>>>> Like I said, this will become far simpler on the next iteration.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andy
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>> Date: Thu, 12 Mar 2015 14:14:12 -0700
>>>>>>>>>>>>> From: Fabrice Jammes <[log in to unmask]>
>>>>>>>>>>>>> To: Andrew Hanushevsky <[log in to unmask]>
>>>>>>>>>>>>> Subject: cmsd core log
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We introduced logging in this module a few months ago, but our
>>>>>>>>>>>>> mono-node integration test doesn't launch cmsd, so it was never
>>>>>>>>>>>>> tested. Could our logging library (lsst::log, which relies on
>>>>>>>>>>>>> log4cxx) be the faulty part?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fabrice
>>>>>>>>>>>>>
>>>>>>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>>>>>>> #0  0x00007fdb86a9fc10 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>>>>>>>>> Missing separate debuginfos, use: debuginfo-install expat-2.1.0-8.el7.x86_64
>>>>>>>>>>>>>   glibc-2.17-55.el7_0.5.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
>>>>>>>>>>>>>   krb5-libs-1.11.3-49.el7.x86_64 libcom_err-1.42.9-4.el7.x86_64
>>>>>>>>>>>>>   libgcc-4.8.2-16.2.el7_0.x86_64 libicu-50.1.2-11.el7.x86_64
>>>>>>>>>>>>>   libselinux-2.2.2-6.el7.x86_64 libstdc++-4.8.2-16.2.el7_0.x86_64
>>>>>>>>>>>>>   nss-softokn-freebl-3.16.2.3-1.el7_0.x86_64
>>>>>>>>>>>>>   openssl-libs-1.0.1e-34.el7_0.7.x86_64 pcre-8.32-12.el7.x86_64
>>>>>>>>>>>>>   xz-libs-5.1.2-9alpha.el7.x86_64 zlib-1.2.7-13.el7.x86_64
>>>>>>>>>>>>> (gdb) where
>>>>>>>>>>>>> #0  0x00007fdb86a9fc10 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>>>>>>>>> #1  0x000000000041ed54 in XrdSysMutex::Lock (this=0x10)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/./XrdSys/XrdSysPthread.hh:149
>>>>>>>>>>>>> #2  0x00007fdb86ceb2e6 in XrdSysLogger::Put (this=0x0, iovcnt=2,
>>>>>>>>>>>>>     iov=0x7fff589f1800)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdSys/XrdSysLogger.cc:270
>>>>>>>>>>>>> #3  0x00007fdb86cea817 in XrdSysError::Say (this=0x7fdb7f3aa7e0 <XrdSsi::Log>,
>>>>>>>>>>>>>     txt1=0x1f1efb8 "0312 20:09:50.451 [0x7fdb87464740] INFO QservOss (build/xrdoss/QservOss.cc:260) - QservOss Init",
>>>>>>>>>>>>>     txt2=0x0, txt3=0x0, txt4=0x0, txt5=0x0, txt6=txt6@entry=0x0)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdSys/XrdSysError.cc:163
>>>>>>>>>>>>> #4  0x00007fdb7f19e638 in XrdSsiLogger::Msg (pfx=<optimized out>,
>>>>>>>>>>>>>     txt1=<optimized out>, txt2=<optimized out>, txt3=<optimized out>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-4.0.0rc3-qsClient2.lsst1/xrootd-4.0.0rc3-qsClient2.lsst1/src/XrdSsi/XrdSsiLogger.cc:66
>>>>>>>>>>>>> #5  0x00007fdb84f38a0d in lsst::qserv::wlog::XrootdAppender::append (this=0x1f2b9e0,
>>>>>>>>>>>>>     event=..., p=...) at build/wlog/XrootdAppender.cc:57
>>>>>>>>>>>>> #6  0x00007fdb841ebd1e in log4cxx::AppenderSkeleton::doAppend (this=0x1f2b9e0,
>>>>>>>>>>>>>     event=..., pool1=...) at appenderskeleton.cpp:142
>>>>>>>>>>>>> #7  0x00007fdb841e9df4 in log4cxx::helpers::AppenderAttachableImpl::appendLoopOnAppenders
>>>>>>>>>>>>>     (this=0x1f2bd20, event=..., p=...) at appenderattachableimpl.cpp:69
>>>>>>>>>>>>> #8  0x00007fdb8422f6d5 in log4cxx::Logger::callAppenders (this=this@entry=0x1f2bd90,
>>>>>>>>>>>>>     event=..., p=...) at logger.cpp:93
>>>>>>>>>>>>> #9  0x00007fdb8422fa1f in log4cxx::Logger::forcedLog (this=0x1f2bd90, level1=...,
>>>>>>>>>>>>>     message="QservOss Init", location=...)
>>>>>>>>>>>>>     at logger.cpp:124
>>>>>>>>>>>>> #10 0x00007fdb84500412 in lsst::log::Log::vlog (logger=..., level=...,
>>>>>>>>>>>>>     filename="build/xrdoss/QservOss.cc",
>>>>>>>>>>>>>     funcname="virtual int lsst::qserv::xrdoss::QservOss::Init(XrdSysLogger*, const char*)",
>>>>>>>>>>>>>     lineno=lineno@entry=260, fmt=fmt@entry=0x7fdb84f5a7bb "QservOss Init",
>>>>>>>>>>>>>     args=args@entry=0x7fff589f2018) at src/Log.cc:357
>>>>>>>>>>>>> #11 0x00007fdb84500734 in lsst::log::Log::log (logger=..., level=...,
>>>>>>>>>>>>>     filename="build/xrdoss/QservOss.cc",
>>>>>>>>>>>>>     funcname="virtual int lsst::qserv::xrdoss::QservOss::Init(XrdSysLogger*, const char*)",
>>>>>>>>>>>>>     lineno=260, fmt=0x7fdb84f5a7bb "QservOss Init") at src/Log.cc:386
>>>>>>>>>>>>> #12 0x00007fdb84f4e4ac in lsst::qserv::xrdoss::QservOss::Init (this=0x1f1fab0, log=0x0,
>>>>>>>>>>>>>     cfgFn=0x0) at build/xrdoss/QservOss.cc:260
>>>>>>>>>>>>> #13 0x00007fdb84f4d4f7 in lsst::qserv::xrdoss::QservOss::QservOss (this=0x1f1fab0)
>>>>>>>>>>>>>     at build/xrdoss/QservOss.cc:135
>>>>>>>>>>>>> #14 0x00007fdb84f4d2f4 in lsst::qserv::xrdoss::QservOss::getInstance ()
>>>>>>>>>>>>>     at build/xrdoss/QservOss.cc:101
>>>>>>>>>>>>> #15 0x00007fdb84f4e8f8 in XrdOssGetStorageSystem (native_oss=0x7fdb87253100
>>>>>>>>>>>>>     <XrdOssGetSS(XrdSysLogger*, char const*, char const*, char const*, XrdOucEnv*, XrdVersionInfo&)::myOssSys>,
>>>>>>>>>>>>>     Logger=0x65b3b8 <XrdMain::Config+248>, config_fn=0x1efec40 "/qserv/qserv-run/etc/lsp.cf",
>>>>>>>>>>>>>     parms=0x0) at build/xrdoss/QservOss.cc:293
>>>>>>>>>>>>> #16 0x00007fdb86ff484d in XrdOssGetSS (Logger=0x65b3b8 <XrdMain::Config+248>,
>>>>>>>>>>>>>     config_fn=0x1efec40 "/qserv/qserv-run/etc/lsp.cf", OssLib=0x1efef70 "libxrdoss.so",
>>>>>>>>>>>>>     OssParms=0x0, envP=0x0, urVer=...)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdOss/XrdOssApi.cc:135
>>>>>>>>>>>>> #17 0x000000000042f08d in XrdCmsConfig::ConfigOSS (this=0x65d340 <XrdCms::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdCms/XrdCmsConfig.cc:777
>>>>>>>>>>>>> #18 0x000000000042dbc8 in XrdCmsConfig::Configure2 (this=0x65d340 <XrdCms::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdCms/XrdCmsConfig.cc:408
>>>>>>>>>>>>> #19 0x0000000000443754 in XrdgetProtocol (pname=0x1eef920 "cmsd", parms=0x0,
>>>>>>>>>>>>>     pi=0x65b2c0 <XrdMain::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdCms/XrdCmsProtocol.cc:121
>>>>>>>>>>>>> #20 0x000000000041de24 in XrdProtLoad::getProtocol (lname=0x0, pname=0x1eef920 "cmsd",
>>>>>>>>>>>>>     parms=0x0, pi=0x65b2c0 <XrdMain::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdProtLoad.cc:247
>>>>>>>>>>>>> #21 0x000000000041d8b4 in XrdProtLoad::Load (lname=0x0, pname=0x1eef920 "cmsd",
>>>>>>>>>>>>>     parms=0x0, pi=0x65b2c0 <XrdMain::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdProtLoad.cc:101
>>>>>>>>>>>>> #22 0x000000000041a5b7 in XrdConfig::Setup (this=0x65b2c0 <XrdMain::Config>,
>>>>>>>>>>>>>     dfltp=0x7fff589f3d4e "cmsd")
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdConfig.cc:1028
>>>>>>>>>>>>> #23 0x00000000004187de in XrdConfig::Configure (this=0x65b2c0 <XrdMain::Config>,
>>>>>>>>>>>>>     argc=9, argv=0x7fff589f3318) at
>>>>>>>>>>>>>     /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdConfig.cc:509
>>>>>>>>>>>>> #24 0x000000000041f13b in main (argc=9, argv=0x7fff589f3318)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdMain.cc:179
>>>>>>
>>>>>> ######################################################################
>>>>>> Use REPLY-ALL to reply to list
>>>>>>
>>>>>> To unsubscribe from the QSERV-L list, click the following link:
>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1

--
Dominique Boutigny - CNRS / CC-IN2P3
Now at SLAC National Accelerator Laboratory
Kavli Building - Room 221
Mail : [log in to unmask] - [log in to unmask]
Office : +1 650-926-5759 - Cellular : +1 774-232-0912