  Hi Gregory,

  Just to be clear, it is the olbd that seems to be crashing, not the xrootd.

  Note that the "olb.allow" directives for the two new servers (if they
can't be covered by wildcards) also need to be added to the _redirector_
config file, since the redirector is the entity that will accept the
connection from (and provide access to) the two new dataservers. That might
be the problem, but it is a guess.
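
  For concreteness, here is a sketch of what I mean. The two host names are
the ones from your mail, and the file name comes from the olbd command line
shown in your core dump. I have not seen the rest of the redirector file, so
where exactly the lines go is an assumption on my part. Also note that the
existing pattern "f01-001-1*.gridka.de" does not match either of the new
host names, so they are not picked up by that wildcard.

```
# config/redirector.cf (fragment; the rest of the file is not shown here)
#
# The Open Load Balancer Section
#
# Assumed addition: allow the two new GPFS dataservers to log in
# to the redirector olbd.
olb.allow host f01-010-110.gridka.de
olb.allow host f01-005-115.gridka.de
```

  The redirector olbd would presumably need a restart to pick up the change.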

  (That said, the olbd shouldn't just crash if it doesn't succeed in 
connecting to the redirector olbd. That sounds like a bug.)
 
                                   Pete

On Fri, Feb 18, 2005 at 09:52:18AM +0100, Gregory Schott wrote:
> Hello,
> 
>   Does someone have an idea as to why xrootd would crash on the SL3 
> redirector when adding the SL3 GPFS to the already running RH72 NAS boxes 
> pool? I installed the September SL3 binaries on GPFS. I have the following 
> olb logfile on the dataserver (the crash is silent on the redirector).
> 
> 050218 09:38:17 10389 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 
> initializing as Server
> 050218 09:38:17 10389 setupServer Config: thread 3063385008 assigned to 
> ping monitor
> 050218 09:38:17 10389 olb_Config: Server initialization completed.
> 050218 09:38:17 10389 main Main: Thread 3052895152 handling notification 
> traffic.
> 050218 09:38:17 10389 olb_Start: Waiting for primary server to login.
> 050218 09:38:17 10389 main Main: Thread 3042405296 handling admin traffic.
> 050218 09:38:17 10389 Admin_Login Initial admin request: 'login p 10388 
> port 1094'
> 050218 09:38:17 10389 olb_Admin_Login: Primary server 10388 logged in
> 050218 09:38:17 10389 AddManager Manager: Added babar2 to config; id=0
> 050218 09:38:17 10389 FreeSpace Updated fs info; old=0K new=0K tot=0K
> 050218 09:38:17 10389 olb_Server: Logged into babar2
> 050218 09:38:17 10389 olb_GetLine: Unable to reading request ; connection 
> reset by peer
> 050218 09:38:17 10389 Receive Null line from babar2
> 050218 09:38:17 10389 olb_Server: Unable to read response from babar2; 
> connection reset by peer
> 050218 09:38:17 10389 Remove_Manager Removed babar2 manager 0.1 FD=10
> 050218 09:38:32 10389 olb_Connect: Unable to connect to babar2; connection 
> refused
> 050218 09:38:42 10389 olb_Connect: Unable to connect to babar2; connection 
> refused
> 050218 09:38:53 10389 olb_Connect: Unable to connect to babar2; connection 
> refused
> 
> And the core file says:
> 
> gdb /opt/xrootd/bin/olbd core.31310
> GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols 
> found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
> 
> Core was generated by `/opt/xrootd//bin/olbd -m -l /tmp/babar2.olblog -c 
> config/redirector.cf'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/tls/libpthread.so.0
> Reading symbols from /lib/tls/librt.so.1...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/tls/librt.so.1
> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /usr/lib/libstdc++.so.5...(no debugging symbols 
> found)...done.
> Loaded symbols for /usr/lib/libstdc++.so.5
> Reading symbols from /lib/tls/libm.so.6...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/tls/libm.so.6
> Reading symbols from /lib/tls/libc.so.6...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/tls/libc.so.6
> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/libnss_files.so.2
> Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/libnss_dns.so.2
> Reading symbols from /lib/libresolv.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib/libresolv.so.2
> #0  0x08067adb in XrdOucSecurity::Authorize ()
> (gdb) backtrace
> #0  0x08067adb in XrdOucSecurity::Authorize ()
> #1  0x0806673b in XrdOucNetwork::do_Accept ()
> #2  0x08065dec in XrdOucNetwork::Accept ()
> #3  0x08058024 in main ()
> 
> I also added two lines to the config file for the 2 new GPFS (see full 
> dataserver config file below):
> 
> olb.allow host f01-010-110.gridka.de
> olb.allow host f01-005-115.gridka.de
> 
> 
> Regards,
>   Gregory
> 
> 
> #
> # dataserver.cf
> #
> 
> # The Open Distributed Cache Section
> #
> odc.manager babar2 3121
> 
> # The Open Load Balancer Section
> #
> olb.allow host l01-001-122.gridka.de
> olb.allow host f01-001-1*.gridka.de
> olb.allow host f01-010-110.gridka.de
> olb.allow host f01-005-115.gridka.de
> olb.port 3121
> olb.path r /store
> olb.sched cpu 100
> olb.subscribe babar2 3121
> olb.wait
> 
> # The Open File System Section
> #
> ofs.redirect remote if l01-001-122.gridka.de
> ofs.redirect target
> #ofs.redirect target if f01-001-121.gridka.de
> #ofs.redirect target if f01-001-1*.gridka.de
> 
> # The Open Storage System Section (cache & localroot are used by olb)
> #
> oss.alloc * * 80
> oss.fdlimit * max
> oss.localroot /home/xrootd/disk/kanga/EventStore/
> #oss.path /data/read r/o
> 
> # The XRD Section
> #
> xrd.protocol xrootd *
> 
> # The XROOTD Section
> #
> xrootd.fslib /home/xrootd/software/20040907-0403/lib/libXrdOfs.so
> xrootd.export /store
> xrootd.export /prod
> 
> # Switch on debugging output
> #
> odc.trace redirect
> xrd.trace all
> xrootd.trace all
> olb.trace all
> oss.trace all
> 
> 
> -------------- Dr. Gregory Schott --------------
>  Institut fuer Experimentelle Kernphysik (IEKP)
>      Universitaet Karlsruhe - Postfach 3640
>            76021 Karlsruhe  (Germany)
>             tel.: +49-(0)724782-3537
>             fax.: +49-(0)724782-3414
>            e-mail: [log in to unmask]
> -----------------------------------------------
> 



-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------