Hi Fabrice,

That's great news!

Thanks to all,

Dominique

On 03/13/2015 03:01 PM, Fabrice Jammes wrote:
> Thanks to both Andy and Daniel, the Qserv worker is now running on the
> in2p3 cluster.
>
> whew, it's a good thing ;-)
>
> On 03/13/2015 02:40 PM, Salnikov, Andrei A. wrote:
>> Hi Fabrice,
>>
>> thanks! Check my other email, which I sent right before noon, and see
>> if it fixes the crash.
>>
>> Cheers,
>> Andy
>>
>> Fabrice Jammes wrote on 2015-03-13:
>>> Hi Andy,
>>>
>>> I just checked your email. I hit this problem too, but only on the cmsd
>>> worker. With Daniel's help, I have fixed it in u/fjammes/DM-213. See the
>>> last commits:
>>> https://github.com/lsst/qserv/commit/d5abe798cfceadbcafd0b555de5c986d4a3c7605
>>> https://github.com/lsst/qserv/commit/a784bce2f5bb9f1f1a4631250054838084c95619
>>>
>>> I'll try what you propose for the log problem. Thanks!
>>>
>>>
>>>
>>> On 03/13/2015 10:56 AM, Salnikov, Andrei A. wrote:
>>>> Hi Fabrice,
>>>>
>>>> is master (git master) supposed to work? I get a weird message
>>>> from cmsd when I try to start Qserv (both worker and master):
>>>>
>>>> ------ [log in to unmask] phase 1 server initialization completed.
>>>> ++++++ [log in to unmask] phase 2 server initialization started.
>>>> Plugin /usr/local/home/salnikov/dm-2294/lib/libxrdoss.so: undefined symbol: _ZN6google8protobuf8internal12kEmptyStringE osslib libxrdoss.so
>>>> ------ [log in to unmask] phase 2 server initialization failed.
>>>> 150313 12:30:02 15616 XrdProtocol: Protocol cmsd could not be loaded
>>>> ------ cmsd [log in to unmask]:1094 initialization failed.
>>>>
>>>> If I do 'ldd -r lib/libxrdoss.so' I indeed see a few unresolved
>>>> symbols in it.
>>>>
>>>> Cheers,
>>>> Andy
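
The unresolved symbol in the cmsd message above is a mangled C++ name. A minimal sketch of how it could be demangled to see which library it belongs to, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`. The helper name `demangle` is hypothetical and not part of Qserv or xrootd.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if the name cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With null buffer arguments, __cxa_demangle allocates the result
    // with malloc(); the caller must release it with free().
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Calling `demangle("_ZN6google8protobuf8internal12kEmptyStringE")` yields `google::protobuf::internal::kEmptyString`, so the missing symbol lives in the protobuf namespace. That suggests, without proving it, that libxrdoss.so is not finding the protobuf library it was built against.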
>>>>
>>>> Fabrice Jammes wrote on 2015-03-13:
>>>>> The way to reproduce it is to start a Qserv master and then a Qserv
>>>>> worker. cmsd will then fail on the worker. There's a draft script which
>>>>> does that for the in2p3 cluster here:
>>>>> https://github.com/lsst/qserv/blob/u/fjammes/DM-213/admin/tools/cluster/cc-in2p3/install-puppet-node.sh
>>>>> You can cut and paste the configuration part from this one (but not the
>>>>> install part, which is based on the in2p3 shared fs).
>>>>>
>>>>> You also have to install/set up xrootd built in debug mode if you want
>>>>> to debug the produced core dump:
>>>>> eups distrib install xrootd xssi-1.0.0
>>>>>
>>>>> Please let me know. Vaikunth is installing multi-node Qserv at NCSA,
>>>>> so he may have additional information.
>>>>>
>>>>> I hope your in2p3 account will be ready soon. The Qserv setup is
>>>>> already done here.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> On 03/13/2015 07:04 AM, Salnikov, Andrei A. wrote:
>>>>>> Hi Fabrice,
>>>>>>
>>>>>> I'd like to reproduce it to understand exactly what happens,
>>>>>> is there a way to repeat it on NCSA machines?
>>>>>>
>>>>>> Cheers,
>>>>>> Andy
>>>>>>
>>>>>>
>>>>>> Fabrice Jammes wrote on 2015-03-13:
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> The cmsd crash happens when starting a worker node. I can easily test
>>>>>>> the fix if needed. It is blocking the in2p3 cluster install.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> On 03/12/2015 11:04 PM, Salnikov, Andrei A. wrote:
>>>>>>>> Fabrice, can you tell me what you did to make it crash?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Andy
>>>>>>>>
>>>>>>>>
>>>>>>>> Andrew Hanushevsky wrote on 2015-03-12:
>>>>>>>>> Hi Andy,
>>>>>>>>>
>>>>>>>>> Just run what Fabrice ran and it will core right away since
>>>>>>>>> XrdSysError wasn't told about XrdSysLogger and eventually the
>>>>>>>>> message is routed to that object.
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>> On Fri, 13 Mar 2015, Salnikov, Andrei A. wrote:
>>>>>>>>>
>>>>>>>>>> Hi Andy,
>>>>>>>>>>
>>>>>>>>>> I'm confused, the little piece of code below is the fix, right?
>>>>>>>>>> Is it possible for me to reproduce the crash? I'll try to remember
>>>>>>>>>> tomorrow what I did for logging, maybe I could see it better.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Andy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrew Hanushevsky wrote on 2015-03-12:
>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>
>>>>>>>>>>> I don't think there is a fix for this. Could you fix it? I
>>>>>>>>>>> already have a Jira entry for rewriting that portion of the code
>>>>>>>>>>> once I get a handle on how to deal with the repo.
>>>>>>>>>>>
>>>>>>>>>>> Andy
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 12 Mar 2015, Salnikov, Andrei A. wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>
>>>>>>>>>>>> sorry for the mess I created. I do not remember exactly what is
>>>>>>>>>>>> involved there, but if you say that it fixes it then I'm happy.
>>>>>>>>>>>> Are you going to make that fix or do you want me to do it? Is
>>>>>>>>>>>> there a ticket for it already?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Andy
>>>>>>>>>>>>
>>>>>>>>>>>> Andrew Hanushevsky wrote on 2015-03-12:
>>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniel told me that you did the log4cxx integration for this
>>>>>>>>>>>>> part of qserv. Unfortunately, the underlying xrootd object
>>>>>>>>>>>>> that sends messages to the log was not initialized when
>>>>>>>>>>>>> XrdOssGetSS(Logger, ...) is called. This results in a SEGV
>>>>>>>>>>>>> when a message is issued. I know this is old code and there is
>>>>>>>>>>>>> now a better way of doing all of this (and I will do that
>>>>>>>>>>>>> later). For now, the way to fix this is simply to insert this
>>>>>>>>>>>>> code snippet in the initialization section:
>>>>>>>>>>>>>
>>>>>>>>>>>>> #include "XrdSys/XrdSysError.hh"
>>>>>>>>>>>>>
>>>>>>>>>>>>> namespace XrdSsi
>>>>>>>>>>>>> {
>>>>>>>>>>>>> extern XrdSysError     Log;
>>>>>>>>>>>>> };
>>>>>>>>>>>>>
>>>>>>>>>>>>> XrdSsi::Log.logger(Logger); // Logger is the arg in XrdOssGetSS()
>>>>>>>>>>>>>
>>>>>>>>>>>>> OK? Like I said this will become far simpler on the next
>>>>>>>>>>>>> iteration.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andy
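
The failure mode and the one-line fix described above can be illustrated with a minimal standalone sketch. It does not use the real xrootd headers: `SketchLogger`, `SketchError`, `sketch::Log` and `initPlugin` are hypothetical stand-ins for XrdSysLogger, XrdSysError, XrdSsi::Log and the plugin entry point, chosen only so the example compiles on its own. Where the real code in the backtrace dereferenced a null logger (XrdSysLogger::Put with this=0x0), the sketch returns false instead.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for XrdSysLogger: simply collects messages.
class SketchLogger {
public:
    void put(const std::string& msg) { _lines.push_back(msg); }
    const std::vector<std::string>& lines() const { return _lines; }

private:
    std::vector<std::string> _lines;
};

// Hypothetical stand-in for XrdSysError: forwards messages to a logger
// that it does not own and that must be bound before the first message.
class SketchError {
public:
    // Bind a logger; returns the previously bound one (nullptr if none).
    SketchLogger* logger(SketchLogger* lp) {
        SketchLogger* old = _logger;
        _logger = lp;
        return old;
    }

    // Returns false when no logger is bound. This is the situation in
    // which the real code crashed by dereferencing a null pointer.
    bool say(const std::string& msg) {
        if (_logger == nullptr) {
            return false;
        }
        _logger->put(msg);
        return true;
    }

private:
    SketchLogger* _logger = nullptr;
};

namespace sketch {
SketchError Log;  // plays the role of XrdSsi::Log
}

// Plays the role of the plugin entry point that receives the Logger argument.
bool initPlugin(SketchLogger* loggerArg) {
    sketch::Log.logger(loggerArg);            // the one-line fix: bind first
    return sketch::Log.say("QservOss Init");  // first message issued at init
}
```

A `SketchError` that was never given a logger rejects every message, while `initPlugin(&someLogger)` binds the logger first and the "QservOss Init" message ends up in `someLogger.lines()`.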
>>>>>>>>>>>>>
>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>> Date: Thu, 12 Mar 2015 14:14:12 -0700
>>>>>>>>>>>>> From: Fabrice Jammes <[log in to unmask]>
>>>>>>>>>>>>> To: Andrew Hanushevsky <[log in to unmask]>
>>>>>>>>>>>>> Subject: cmsd core log
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Andy,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We introduced logging in this module a few months ago but our
>>>>>>>>>>>>> mono-node integration test doesn't launch cmsd, so it was
>>>>>>>>>>>>> never tested. It may be our logging library (lsst::log which
>>>>>>>>>>>>> relies on log4cxx) which is faulty?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fabrice
>>>>>>>>>>>>>
>>>>>>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>>>>>>> #0  0x00007fdb86a9fc10 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>>>>>>>>> Missing separate debuginfos, use: debuginfo-install
>>>>>>>>>>>>>     expat-2.1.0-8.el7.x86_64
>>>>>>>>>>>>>     glibc-2.17-55.el7_0.5.x86_64
>>>>>>>>>>>>>     keyutils-libs-1.5.8-3.el7.x86_64
>>>>>>>>>>>>>     krb5-libs-1.11.3-49.el7.x86_64
>>>>>>>>>>>>>     libcom_err-1.42.9-4.el7.x86_64
>>>>>>>>>>>>>     libgcc-4.8.2-16.2.el7_0.x86_64
>>>>>>>>>>>>>     libicu-50.1.2-11.el7.x86_64
>>>>>>>>>>>>>     libselinux-2.2.2-6.el7.x86_64
>>>>>>>>>>>>>     libstdc++-4.8.2-16.2.el7_0.x86_64
>>>>>>>>>>>>>     nss-softokn-freebl-3.16.2.3-1.el7_0.x86_64
>>>>>>>>>>>>>     openssl-libs-1.0.1e-34.el7_0.7.x86_64
>>>>>>>>>>>>>     pcre-8.32-12.el7.x86_64
>>>>>>>>>>>>>     xz-libs-5.1.2-9alpha.el7.x86_64
>>>>>>>>>>>>>     zlib-1.2.7-13.el7.x86_64
>>>>>>>>>>>>> (gdb) where
>>>>>>>>>>>>> #0  0x00007fdb86a9fc10 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>>>>>>>>> #1  0x000000000041ed54 in XrdSysMutex::Lock (this=0x10)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/./XrdSys/XrdSysPthread.hh:149
>>>>>>>>>>>>> #2  0x00007fdb86ceb2e6 in XrdSysLogger::Put (this=0x0, iovcnt=2, iov=0x7fff589f1800)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdSys/XrdSysLogger.cc:270
>>>>>>>>>>>>> #3  0x00007fdb86cea817 in XrdSysError::Say (this=0x7fdb7f3aa7e0 <XrdSsi::Log>, txt1=0x1f1efb8 "0312 20:09:50.451 [0x7fdb87464740] INFO QservOss (build/xrdoss/QservOss.cc:260) - QservOss Init", txt2=0x0, txt3=0x0, txt4=0x0, txt5=0x0, txt6=txt6@entry=0x0)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdSys/XrdSysError.cc:163
>>>>>>>>>>>>> #4  0x00007fdb7f19e638 in XrdSsiLogger::Msg (pfx=<optimized out>, txt1=<optimized out>, txt2=<optimized out>, txt3=<optimized out>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-4.0.0rc3-qsClient2.lsst1/xrootd-4.0.0rc3-qsClient2.lsst1/src/XrdSsi/XrdSsiLogger.cc:66
>>>>>>>>>>>>> #5  0x00007fdb84f38a0d in lsst::qserv::wlog::XrootdAppender::append (this=0x1f2b9e0, event=..., p=...)
>>>>>>>>>>>>>     at build/wlog/XrootdAppender.cc:57
>>>>>>>>>>>>> #6  0x00007fdb841ebd1e in log4cxx::AppenderSkeleton::doAppend (this=0x1f2b9e0, event=..., pool1=...)
>>>>>>>>>>>>>     at appenderskeleton.cpp:142
>>>>>>>>>>>>> #7  0x00007fdb841e9df4 in log4cxx::helpers::AppenderAttachableImpl::appendLoopOnAppenders (this=0x1f2bd20, event=..., p=...)
>>>>>>>>>>>>>     at appenderattachableimpl.cpp:69
>>>>>>>>>>>>> #8  0x00007fdb8422f6d5 in log4cxx::Logger::callAppenders (this=this@entry=0x1f2bd90, event=..., p=...)
>>>>>>>>>>>>>     at logger.cpp:93
>>>>>>>>>>>>> #9  0x00007fdb8422fa1f in log4cxx::Logger::forcedLog (this=0x1f2bd90, level1=..., message="QservOss Init", location=...)
>>>>>>>>>>>>>     at logger.cpp:124
>>>>>>>>>>>>> #10 0x00007fdb84500412 in lsst::log::Log::vlog (logger=..., level=..., filename="build/xrdoss/QservOss.cc", funcname="virtual int lsst::qserv::xrdoss::QservOss::Init(XrdSysLogger*, const char*)", lineno=lineno@entry=260, fmt=fmt@entry=0x7fdb84f5a7bb "QservOss Init", args=args@entry=0x7fff589f2018)
>>>>>>>>>>>>>     at src/Log.cc:357
>>>>>>>>>>>>> #11 0x00007fdb84500734 in lsst::log::Log::log (logger=..., level=..., filename="build/xrdoss/QservOss.cc", funcname="virtual int lsst::qserv::xrdoss::QservOss::Init(XrdSysLogger*, const char*)", lineno=260, fmt=0x7fdb84f5a7bb "QservOss Init")
>>>>>>>>>>>>>     at src/Log.cc:386
>>>>>>>>>>>>> #12 0x00007fdb84f4e4ac in lsst::qserv::xrdoss::QservOss::Init (this=0x1f1fab0, log=0x0, cfgFn=0x0)
>>>>>>>>>>>>>     at build/xrdoss/QservOss.cc:260
>>>>>>>>>>>>> #13 0x00007fdb84f4d4f7 in lsst::qserv::xrdoss::QservOss::QservOss (this=0x1f1fab0)
>>>>>>>>>>>>>     at build/xrdoss/QservOss.cc:135
>>>>>>>>>>>>> #14 0x00007fdb84f4d2f4 in lsst::qserv::xrdoss::QservOss::getInstance ()
>>>>>>>>>>>>>     at build/xrdoss/QservOss.cc:101
>>>>>>>>>>>>> #15 0x00007fdb84f4e8f8 in XrdOssGetStorageSystem (native_oss=0x7fdb87253100 <XrdOssGetSS(XrdSysLogger*, char const*, char const*, char const*, XrdOucEnv*, XrdVersionInfo&)::myOssSys>, Logger=0x65b3b8 <XrdMain::Config+248>, config_fn=0x1efec40 "/qserv/qserv-run/etc/lsp.cf", parms=0x0)
>>>>>>>>>>>>>     at build/xrdoss/QservOss.cc:293
>>>>>>>>>>>>> #16 0x00007fdb86ff484d in XrdOssGetSS (Logger=0x65b3b8 <XrdMain::Config+248>, config_fn=0x1efec40 "/qserv/qserv-run/etc/lsp.cf", OssLib=0x1efef70 "libxrdoss.so", OssParms=0x0, envP=0x0, urVer=...)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdOss/XrdOssApi.cc:135
>>>>>>>>>>>>> #17 0x000000000042f08d in XrdCmsConfig::ConfigOSS (this=0x65d340 <XrdCms::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdCms/XrdCmsConfig.cc:777
>>>>>>>>>>>>> #18 0x000000000042dbc8 in XrdCmsConfig::Configure2 (this=0x65d340 <XrdCms::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdCms/XrdCmsConfig.cc:408
>>>>>>>>>>>>> #19 0x0000000000443754 in XrdgetProtocol (pname=0x1eef920 "cmsd", parms=0x0, pi=0x65b2c0 <XrdMain::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/XrdCms/XrdCmsProtocol.cc:121
>>>>>>>>>>>>> #20 0x000000000041de24 in XrdProtLoad::getProtocol (lname=0x0, pname=0x1eef920 "cmsd", parms=0x0, pi=0x65b2c0 <XrdMain::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdProtLoad.cc:247
>>>>>>>>>>>>> #21 0x000000000041d8b4 in XrdProtLoad::Load (lname=0x0, pname=0x1eef920 "cmsd", parms=0x0, pi=0x65b2c0 <XrdMain::Config>)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdProtLoad.cc:101
>>>>>>>>>>>>> #22 0x000000000041a5b7 in XrdConfig::Setup (this=0x65b2c0 <XrdMain::Config>, dfltp=0x7fff589f3d4e "cmsd")
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdConfig.cc:1028
>>>>>>>>>>>>> #23 0x00000000004187de in XrdConfig::Configure (this=0x65b2c0 <XrdMain::Config>, argc=9, argv=0x7fff589f3318)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdConfig.cc:509
>>>>>>>>>>>>> #24 0x000000000041f13b in main (argc=9, argv=0x7fff589f3318)
>>>>>>>>>>>>>     at /qserv/stack/EupsBuildDir/Linux64/xrootd-xssi-1.0.0/xrootd-xssi-1.0.0/src/Xrd/XrdMain.cc:179
>>>>>>
>>>>>> ###################################################################### 
>>>>>>
>>>>>> ## Use REPLY-ALL to reply to list
>>>>>>
>>>>>> To unsubscribe from the QSERV-L list, click the following link:
>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1

-- 
Dominique Boutigny  -  CNRS / CC-IN2P3
Now at SLAC National Accelerator Laboratory

Kavli Building - Room 221
Mail     : [log in to unmask] -  [log in to unmask]
Office   : +1 650-926-5759   -  Cellular : +1 774-232-0912


