On Fri, 18 Feb 2005, Peter Elmer wrote:

>  Hi Gregory,
>
>  Just to be clear, it is the olbd that seems to be crashing, not the xrootd.

Yes, sorry, I meant olbd, not xrootd.

>  Note that the "olb.allow" directives adding the two new servers (if they
> can't be specified with wildcards) also need to be added to the
> _redirector_ config file, since the redirector is the entity that will
> accept the connection from (and provide access to) the two new dataservers.
> That might be the problem, but it is a guess.
>
>  (That said, the olbd shouldn't just crash if it doesn't succeed in
> connecting to the redirector olbd. That sounds like a bug.)

I added the olb.allow lines to the redirector config as well, but the olbd
still crashes.
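
As a sanity check on the allow list (not a diagnosis of the crash itself), the
small Python sketch below tests which olb.allow patterns cover a given host.
It assumes the "*" in an olb.allow host entry behaves like a shell-style
glob, which may not be exactly how olbd interprets it; the function names
are made up for illustration. The patterns are the ones from dataserver.cf.

```python
from fnmatch import fnmatchcase

# Host patterns copied from the olb.allow lines in dataserver.cf.
ALLOW_PATTERNS = [
    "l01-001-122.gridka.de",
    "f01-001-1*.gridka.de",
    "f01-010-110.gridka.de",
    "f01-005-115.gridka.de",
]


def matching_patterns(host, patterns=ALLOW_PATTERNS):
    """Return the allow patterns that cover `host`.

    Assumption: '*' is treated as a shell-style glob. This is only an
    approximation of whatever matching olbd really performs.
    """
    host = host.lower()
    return [p for p in patterns if fnmatchcase(host, p.lower())]


def is_allowed(host, patterns=ALLOW_PATTERNS):
    """True if at least one allow pattern covers `host`."""
    return bool(matching_patterns(host, patterns))


if __name__ == "__main__":
    for host in ("f01-010-110.gridka.de",
                 "f01-005-115.gridka.de",
                 "f01-001-121.gridka.de"):
        print(host, matching_patterns(host))
```

Under that assumption, the existing wildcard f01-001-1*.gridka.de does not
cover either of the two new hosts, so they do need their own olb.allow lines
in every config file that has to accept them.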

-- gregory

>
>                                   Pete
>
> On Fri, Feb 18, 2005 at 09:52:18AM +0100, Gregory Schott wrote:
>> Hello,
>>
>>   Does anyone have an idea why xrootd would crash on the SL3 redirector
>> when adding the SL3 GPFS to the pool of already running RH72 NAS boxes? I
>> installed the September SL3 binaries on GPFS. The olb logfile from the
>> dataserver is below (the crash is silent on the redirector).
>>
>> 050218 09:38:17 10389 olb_Config: (c) 2004 SLAC olbd version 20040907-0403
>> initializing as Server
>> 050218 09:38:17 10389 setupServer Config: thread 3063385008 assigned to
>> ping monitor
>> 050218 09:38:17 10389 olb_Config: Server initialization completed.
>> 050218 09:38:17 10389 main Main: Thread 3052895152 handling notification
>> traffic.
>> 050218 09:38:17 10389 olb_Start: Waiting for primary server to login.
>> 050218 09:38:17 10389 main Main: Thread 3042405296 handling admin traffic.
>> 050218 09:38:17 10389 Admin_Login Initial admin request: 'login p 10388
>> port 1094'
>> 050218 09:38:17 10389 olb_Admin_Login: Primary server 10388 logged in
>> 050218 09:38:17 10389 AddManager Manager: Added babar2 to config; id=0
>> 050218 09:38:17 10389 FreeSpace Updated fs info; old=0K new=0K tot=0K
>> 050218 09:38:17 10389 olb_Server: Logged into babar2
>> 050218 09:38:17 10389 olb_GetLine: Unable to reading request ; connection
>> reset by peer
>> 050218 09:38:17 10389 Receive Null line from babar2
>> 050218 09:38:17 10389 olb_Server: Unable to read response from babar2;
>> connection reset by peer
>> 050218 09:38:17 10389 Remove_Manager Removed babar2 manager 0.1 FD=10
>> 050218 09:38:32 10389 olb_Connect: Unable to connect to babar2; connection
>> refused
>> 050218 09:38:42 10389 olb_Connect: Unable to connect to babar2; connection
>> refused
>> 050218 09:38:53 10389 olb_Connect: Unable to connect to babar2; connection
>> refused
>>
>> And the core file says:
>>
>> gdb /opt/xrootd/bin/olbd core.31310
>> GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols
>> found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
>>
>> Core was generated by `/opt/xrootd//bin/olbd -m -l /tmp/babar2.olblog -c
>> config/redirector.cf'.
>> Program terminated with signal 11, Segmentation fault.
>> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libnsl.so.1
>> Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/tls/libpthread.so.0
>> Reading symbols from /lib/tls/librt.so.1...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/tls/librt.so.1
>> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libdl.so.2
>> Reading symbols from /usr/lib/libstdc++.so.5...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib/libstdc++.so.5
>> Reading symbols from /lib/tls/libm.so.6...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/tls/libm.so.6
>> Reading symbols from /lib/tls/libc.so.6...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/tls/libc.so.6
>> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/libgcc_s.so.1
>> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/ld-linux.so.2
>> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/libnss_files.so.2
>> Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/libnss_dns.so.2
>> Reading symbols from /lib/libresolv.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/libresolv.so.2
>> #0  0x08067adb in XrdOucSecurity::Authorize ()
>> (gdb) backtrace
>> #0  0x08067adb in XrdOucSecurity::Authorize ()
>> #1  0x0806673b in XrdOucNetwork::do_Accept ()
>> #2  0x08065dec in XrdOucNetwork::Accept ()
>> #3  0x08058024 in main ()
>>
>> I also added two lines to the config file for the two new GPFS (see the
>> full dataserver config file below):
>>
>> olb.allow host f01-010-110.gridka.de
>> olb.allow host f01-005-115.gridka.de
>>
>>
>> Regards,
>>   Gregory
>>
>>
>> #
>> # dataserver.cf
>> #
>>
>> # The Open Distributed Cache Section
>> #
>> odc.manager babar2 3121
>>
>> # The Open Load Balancer Section
>> #
>> olb.allow host l01-001-122.gridka.de
>> olb.allow host f01-001-1*.gridka.de
>> olb.allow host f01-010-110.gridka.de
>> olb.allow host f01-005-115.gridka.de
>> olb.port 3121
>> olb.path r /store
>> olb.sched cpu 100
>> olb.subscribe babar2 3121
>> olb.wait
>>
>> # The Open File System Section
>> #
>> ofs.redirect remote if l01-001-122.gridka.de
>> ofs.redirect target
>> #ofs.redirect target if f01-001-121.gridka.de
>> #ofs.redirect target if f01-001-1*.gridka.de
>>
>> # The Open Storage System Section (cache & localroot are used by olb)
>> #
>> oss.alloc * * 80
>> oss.fdlimit * max
>> oss.localroot /home/xrootd/disk/kanga/EventStore/
>> #oss.path /data/read r/o
>>
>> # The XRD Section
>> #
>> xrd.protocol xrootd *
>>
>> # The XROOTD Section
>> #
>> xrootd.fslib /home/xrootd/software/20040907-0403/lib/libXrdOfs.so
>> xrootd.export /store
>> xrootd.export /prod
>>
>> # Switch on debugging output
>> #
>> odc.trace redirect
>> xrd.trace all
>> xrootd.trace all
>> olb.trace all
>> oss.trace all
>>
>>
>> -------------- Dr. Gregory Schott --------------
>>  Institut fuer Experimentelle Kernphysik (IEKP)
>>      Universitaet Karlsruhe - Postfach 3640
>>            76021 Karlsruhe  (Germany)
>>             tel.: +49-(0)724782-3537
>>             fax.: +49-(0)724782-3414
>>            e-mail: [log in to unmask]
>> -----------------------------------------------
>>
>
>
>
> -------------------------------------------------------------------------
> Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------
>

-------------- Dr. Gregory Schott --------------
  Institut fuer Experimentelle Kernphysik (IEKP)
      Universitaet Karlsruhe - Postfach 3640
            76021 Karlsruhe  (Germany)
             tel.: +49-(0)724782-3537
             fax.: +49-(0)724782-3414
            e-mail: [log in to unmask]
-----------------------------------------------