I should have a fix within an hour. However, it will have to be pulled into
the LSST XrdSsi branch, and you will need to recompile xrootd (just
like you did last time).

Andy

On Wed, 11 Nov 2015, Fabrice Jammes wrote:

> Hi Andy,
>
> I'm afraid our distributed setup is broken for a while... Thanks for your
> help and for the upcoming fix ;-)
>
> Regards,
>
> On 11/11/2015 03:13 PM, Andrew Hanushevsky wrote:
>> Hi Fabrice,
>> 
>> Ah, OK, I see. This is a problem. There is no easy solution here. I need to
>> rework a bit of code to get the cmsd running. It has to do with the way the
>> initialization is ordered, sigh. I won't have something immediately, and it
>> will require code changes in the SSI.
>> 
>> Andy
>> 
>> On Wed, 11 Nov 2015, Fabrice Jammes wrote:
>> 
>>> Hi Andy,
>>> 
>>> Here are the requested traces:
>>> 
>>> *cmsd starts successfully with the first config:*
>>> 
>>> qserv@ccqserv126:~$ cat cmsd.conf
>>> all.role server
>>> all.manager ccqserv125.in2p3.fr:2131
>>> ssi.svclib libxrdsvc.so
>>> #oss.statlib -2 libXrdSsi.so
>>> qserv@ccqserv126:~$ cmsd -d -c cmsd.conf
>>> 151111 22:45:19 103 Starting on Linux 3.10.0-229.20.1.el7.x86_64
>>> Copr.  2004-2012 Stanford University, xrd version unknown
>>> ++++++ cmsd [log in to unmask] initialization started.
>>> Config using configuration file cmsd.conf
>>> Config maximum number of connections restricted to 1048576
>>> Config maximum number of threads restricted to 1048576
>>> 151111 22:45:19 103 XrdConfig: sendfile enabled.
>>> 151111 22:45:19 103 XrdSched: scheduling underused thread monitor in 780 
>>> seconds
>>> 151111 22:45:19 104 XrdXeq: Buffer Manager reshaper thread started
>>> 151111 22:45:19 105 XrdXeq: Time scheduler thread started
>>> 151111 22:45:19 103 XrdSched: Starting with 2 workers
>>> 151111 22:45:19 103 XrdLink: Allocating 8 link objects at a time
>>> 151111 22:45:19 107 XrdXeq: Worker thread started
>>> 151111 22:45:19 106 XrdXeq: Worker thread started
>>> 151111 22:45:19 103 XrdPoll: Starting poller 0
>>> 151111 22:45:19 108 XrdXeq: Poller thread started
>>> 151111 22:45:19 103 XrdPoll: Starting poller 1
>>> 151111 22:45:19 109 XrdXeq: Poller thread started
>>> 151111 22:45:19 103 XrdPoll: Starting poller 2
>>> 151111 22:45:19 110 XrdXeq: Poller thread started
>>> 151111 22:45:19 103 XrdProtocol: getting port from protocol cmsd
>>> Copr.  2007 Stanford University/SLAC cmsd.
>>> ++++++ [log in to unmask] phase 1 initialization started.
>>> =====> all.role server
>>> =====> all.manager ccqserv125.in2p3.fr:2131
>>> The following paths are available to the redirector:
>>> r  /
>>> 
>>> ------ [log in to unmask] phase 1 server initialization completed.
>>> 151111 22:45:19 103 XrdConfig: LCL port 37568 wsz=87380 (87380)
>>> 151111 22:45:19 103 XrdProtocol: getting protocol object cmsd
>>> ++++++ [log in to unmask] phase 2 server initialization started.
>>> Config warning: adminpath resides in /tmp and may be unstable!
>>> 151111 22:45:19 103 Configure2 Global System Identification: anon-s 
>>> 2131ccqserv125.in2p3.fr
>>> ++++++ Storage system initialization started.
>>> ++++++ Configuring standalone mode . . .
>>> 151111 22:45:19 103 oss_AioInit: started AIO read signal thread; 
>>> tid=1278469888
>>> 151111 22:45:19 103 oss_AioInit: started AIO write signal thread; 
>>> tid=1277417216
>>> Config effective cmsd.conf oss configuration:
>>>       oss.alloc        0 0 0
>>>       oss.cachescan    600
>>>       oss.fdlimit      524288 1048576
>>>       oss.maxsize      0
>>>       oss.trace        fff
>>>       oss.xfr          1 deny 10800 keep 1200
>>>       oss.memfile off  max 8355569664
>>>       oss.defaults  r/w  nocheck nodread nomig norcreate nopurge nostage 
>>> xattr
>>> ------ Storage system initialization completed.
>>> 151111 22:45:19 103 Start Srv=0 dfs=0 lcl=0 Pre=1 dmLife=0 0
>>> 151111 22:45:19 103 Start Lim=0 0 fix=0 Qmax=1
>>> 151111 22:45:19 103 Meter: Warning! No writable filesystems found.
>>> 151111 22:45:19 103 Update Space Parm1=0 Parm2=0
>>> 151111 22:45:19 103 Meter: Write access and staging prohibited.
>>> ------ [log in to unmask] phase 2 server initialization completed.
>>> 151111 22:45:19 107 XrdSched: running cmsd startup inq=0
>>> 151111 22:45:19 113 XrdXeq: Notification handler thread started
>>> 151111 22:45:19 115 XrdXeq: Admin traffic thread started
>>> 151111 22:45:19 114 XrdXeq: Prep handler thread started
>>> 151111 22:45:19 115 Start: Waiting for primary server to login.
>>> ------ cmsd [log in to unmask]:37568 initialization completed.
>>> 151111 22:45:19 106 XrdSched: Now have 3 workers
>>> 151111 22:45:19 106 XrdSched: running main accept inq=0
>>> 151111 22:45:19 117 XrdXeq: Worker thread started
>>> 
>>> *cmsd crashes with the second config:*
>>> 
>>> qserv@ccqserv126:~$ cat cmsd.conf
>>> all.role server
>>> all.manager ccqserv125.in2p3.fr:2131
>>> ssi.svclib libxrdsvc.so
>>> oss.statlib -2 libXrdSsi.so
>>> qserv@ccqserv126:~$
>>> qserv@ccqserv126:~$ cmsd -d -c cmsd.conf
>>> 151111 22:58:54 137 Starting on Linux 3.10.0-229.20.1.el7.x86_64
>>> Copr.  2004-2012 Stanford University, xrd version unknown
>>> ++++++ cmsd [log in to unmask] initialization started.
>>> Config using configuration file cmsd.conf
>>> Config maximum number of connections restricted to 1048576
>>> Config maximum number of threads restricted to 1048576
>>> 151111 22:58:54 137 XrdConfig: sendfile enabled.
>>> 151111 22:58:54 137 XrdSched: scheduling underused thread monitor in 780 
>>> seconds
>>> 151111 22:58:54 138 XrdXeq: Buffer Manager reshaper thread started
>>> 151111 22:58:54 141 XrdXeq: Worker thread started
>>> 151111 22:58:54 137 XrdSched: Starting with 2 workers
>>> 151111 22:58:54 137 XrdLink: Allocating 8 link objects at a time
>>> 151111 22:58:54 139 XrdXeq: Time scheduler thread started
>>> 151111 22:58:54 140 XrdXeq: Worker thread started
>>> 151111 22:58:54 137 XrdPoll: Starting poller 0
>>> 151111 22:58:54 142 XrdXeq: Poller thread started
>>> 151111 22:58:54 137 XrdPoll: Starting poller 1
>>> 151111 22:58:54 143 XrdXeq: Poller thread started
>>> 151111 22:58:54 137 XrdPoll: Starting poller 2
>>> 151111 22:58:54 144 XrdXeq: Poller thread started
>>> 151111 22:58:54 137 XrdProtocol: getting port from protocol cmsd
>>> Copr.  2007 Stanford University/SLAC cmsd.
>>> ++++++ [log in to unmask] phase 1 initialization started.
>>> =====> all.role server
>>> =====> all.manager ccqserv125.in2p3.fr:2131
>>> The following paths are available to the redirector:
>>> r  /
>>> 
>>> ------ [log in to unmask] phase 1 server initialization completed.
>>> 151111 22:58:54 137 XrdConfig: LCL port 52851 wsz=87380 (87380)
>>> 151111 22:58:54 137 XrdProtocol: getting protocol object cmsd
>>> ++++++ [log in to unmask] phase 2 server initialization started.
>>> Config warning: adminpath resides in /tmp and may be unstable!
>>> 151111 22:58:54 137 Configure2 Global System Identification: anon-s 
>>> 2131ccqserv125.in2p3.fr
>>> ++++++ Storage system initialization started.
>>> =====> oss.statlib -2 libXrdSsi.so
>>> Plugin No such file or directory loading statlib libXrdSsi-4.so
>>> Config Falling back to using libXrdSsi.so
>>> ++++++ ssi phase 1 initialization started.
>>> =====> all.role server
>>> =====> ssi.svclib libxrdsvc.so
>>> ------ ssi phase 1 initialization completed.
>>> ++++++ ssi phase 2 initialization started.
>>> 151111 22:58:54 137 sysFinder: Network i/f undefined; unable to 
>>> self-locate.
>>> ------ ssi phase 2 initialization failed.
>>> ++++++ Configuring standalone mode . . .
>>> ------ Storage system initialization failed.
>>> ------ [log in to unmask] phase 2 server initialization failed.
>>> 151111 22:58:54 137 XrdProtocol: Protocol cmsd could not be loaded
>>> ------ cmsd [log in to unmask]:-1 initialization failed.
>>> 
>>> Hope it'll help.
>>> 
>>> Thanks
>>> 
>>> 
>>> On 11/11/2015 02:10 PM, Andrew Hanushevsky wrote:
>>>> Hi Fabrice,
>>>> 
>>>> Odd. OK, my answers....
>>>> 
>>>> On Wed, 11 Nov 2015, Fabrice Jammes wrote:
>>>> 
>>>>>> 1) Who is producing the following messages?
>>>>> These messages are in the cmsd logs and are produced by xrootd:
>>>> Got it. OK, this is because of static initialization of something we will 
>>>> not use but cannot easily avoid initializing. It should be OK.
>>>> 
>>>>>> 2) The "statlib" uses the libXrdSsi.so because we packaged it there as 
>>>>>> a convenience since we need to use the file registry. Do you have a 
>>>>>> static initialization section that expects it will fire up all of 
>>>>>> qserv? We don't want that.
>>>>> I don't really understand this question, sorry. Here's our configuration 
>>>>> file, it may help?
>>>> I just answered it in (1). This is the xrootd client doing static
>>>> initialization, and this is because the SSI library uses the client so it
>>>> is forced to be initialized when the client library is loaded.
>>>> 
>>>>>> 3) This is a container, right?
>>>>> Yes. FYI, our previous cmsd version was running fine under the same sort 
>>>>> of container with same network setting.
>>>> Then it should run here.
>>>> 
>>>>>> 5) I assume things are registered in DNS or at least appear correctly 
>>>>>> in /etc/hosts otherwise we will have a problem. The container has to 
>>>>>> look like an actual machine.
>>>>> # run inside the container
>>>>> root@ccqserv126:/qserv# ping ccqserv126
>>>>> PING ccqserv126.in2p3.fr (172.17.0.7): 56 data bytes
>>>>> 64 bytes from 172.17.0.7: icmp_seq=0 ttl=64 time=0.061 ms
>>>>> 64 bytes from 172.17.0.7: icmp_seq=1 ttl=64 time=0.049 ms
>>>> OK, it's properly registered. So, type up a small config file, as 
>>>> follows:
>>>> 
>>>> all.role server
>>>> all.manager ccqserv125.in2p3.fr:2131
>>>> ssi.svclib libxrdsvc.so
>>>> #oss.statlib -2 libXrdSsi.so
>>>> 
>>>> Set up the environment as you normally would but don't start anything. By
>>>> hand do:
>>>> 
>>>> <path>/cmsd -d -c <path to config file above>
>>>> 
>>>> Send the output to me. Then uncomment the "statlib" directive and do the
>>>> same thing again. Send that output to me as well.
>>>> 
>>>> Andy
>>> 
>>> 
>
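
The two traces in this thread differ only in which phases end with a
"failed." line. As a rough aid for comparing such traces, here is a minimal
sketch that lists the failed phases in a pasted cmsd log and shows the lines
just before the first failure. It assumes only the "------ ... failed." marker
format visible in the logs above; the function names and the SAMPLE text are
hypothetical and not part of xrootd.

```python
import re

# Assumption: a phase that fails is closed by a line of the form
# "------ <phase name> failed." as seen in the second trace.
FAILED = re.compile(r"^-{6} (?P<what>.+) failed\.$")


def strip_quote(line):
    """Drop leading mail-quote markers (e.g. '>>> ') and outer whitespace."""
    return re.sub(r"^(?:>\s?)+", "", line).strip()


def failed_phases(log_text):
    """Return the names of phases whose closing line reports failure, in order."""
    phases = []
    for raw in log_text.splitlines():
        match = FAILED.match(strip_quote(raw))
        if match:
            phases.append(match.group("what"))
    return phases


def first_failure_context(log_text, before=2):
    """Return the first failure line plus the `before` lines preceding it."""
    lines = [strip_quote(raw) for raw in log_text.splitlines()]
    for index, line in enumerate(lines):
        if FAILED.match(line):
            return lines[max(0, index - before):index + 1]
    return []


# Hypothetical sample: an excerpt copied from the failing trace, quoting kept.
SAMPLE = """\
>>> ++++++ ssi phase 2 initialization started.
>>> 151111 22:58:54 137 sysFinder: Network i/f undefined; unable to
>>> self-locate.
>>> ------ ssi phase 2 initialization failed.
>>> ++++++ Configuring standalone mode . . .
>>> ------ Storage system initialization failed.
"""

if __name__ == "__main__":
    for name in failed_phases(SAMPLE):
        print("failed:", name)
    print("\n".join(first_failure_context(SAMPLE)))
```

Run against the second trace, the context lines point at the sysFinder
message ("Network i/f undefined; unable to self-locate.") as the last thing
logged before the ssi phase 2 failure; the first trace yields no failed
phases at all.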
