
Hello,

1) I ran nslookup on the redirector and on one of the two GPFS nodes:

[xrootd@babar2 xrootd]$ nslookup babar2
Server:         10.97.1.191
Address:        10.97.1.191#53

babar2.gridka.de        canonical name = l01-001-122.gridka.de.
Name:   l01-001-122.gridka.de
Address: 10.33.1.122

[xrootd@babar2 xrootd]$ nslookup  f01-010-110.gridka.de
Server:         10.97.1.191
Address:        10.97.1.191#53

Name:   f01-010-110.gridka.de
Address: 10.65.10.110

[xrootd@babar2 xrootd]$ nslookup f01-005-115.gridka.de
Server:         10.97.1.191
Address:        10.97.1.191#53

Name:   f01-005-115.gridka.de
Address: 10.65.5.115

---

[xrootd@babar2 xrootd]$ ssh f01-005-115.gridka.de

-bash-2.05b$ nslookup f01-005-115.gridka.de
Server:         10.97.1.191
Address:        10.97.1.191#53

Name:   f01-005-115.gridka.de
Address: 10.65.5.115

-bash-2.05b$ nslookup babar2
Server:         10.97.1.191
Address:        10.97.1.191#53

babar2.gridka.de        canonical name = l01-001-122.gridka.de.
Name:   l01-001-122.gridka.de
Address: 10.33.1.122
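
Note that nslookup queries the DNS server (10.97.1.191) directly, while olbd resolves names through the C library, which follows /etc/nsswitch.conf (typically /etc/hosts before DNS; libnss_files and libnss_dns show up in the gdb library list). The two can therefore disagree. Here is a minimal C++ sketch of a lookup that takes the same path as the daemon. It assumes a POSIX system with getaddrinfo(). The names resolve() and Resolved are hypothetical and are not part of xrootd.

```cpp
// Hypothetical sketch: resolve a host through the system resolver
// (nsswitch: /etc/hosts, then DNS), not through a direct DNS query
// the way nslookup does. Not xrootd code.
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <string>
#include <vector>

struct Resolved {
    bool ok = false;
    std::string canonical;           // canonical name, e.g. the CNAME target
    std::vector<std::string> addrs;  // dotted-quad IPv4 addresses
};

Resolved resolve(const std::string& host) {
    Resolved r;
    addrinfo hints{};
    hints.ai_family = AF_INET;       // IPv4 only
    hints.ai_socktype = SOCK_STREAM; // one entry per address
    hints.ai_flags = AI_CANONNAME;   // also report the canonical name

    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0) return r;

    if (res->ai_canonname != nullptr) r.canonical = res->ai_canonname;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        char buf[INET_ADDRSTRLEN];
        const auto* sin = reinterpret_cast<const sockaddr_in*>(p->ai_addr);
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf) != nullptr)
            r.addrs.emplace_back(buf);
    }
    freeaddrinfo(res);
    r.ok = !r.addrs.empty();
    return r;
}
```

If both paths agree, resolve("babar2") run on either machine should report l01-001-122.gridka.de and 10.33.1.122, matching the nslookup output.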

---

2) I rebuilt xrootd with debugging info and the core file says:

[xrootd@babar2 ~]$ gdb software/xrootd-20040907-0403/bin/olbd core.25794
GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Core was generated by `/home/xrootd/software/xrootd-20040907-0403//bin/olbd -m -l /tmp/babar2.olblog -'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/tls/librt.so.1...done.
Loaded symbols for /lib/tls/librt.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libstdc++.so.5...done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0  0x080678cf in XrdOucSecurity::Authorize (this=0x883fa70, addr=0x0)
     at XrdOucSecurity.cc:180
180     XrdOucSecurity.cc: No such file or directory.
         in XrdOucSecurity.cc
(gdb)
(gdb) backtrace
#0  0x080678cf in XrdOucSecurity::Authorize (this=0x883fa70, addr=0x0)
     at XrdOucSecurity.cc:180
#1  0x0806652f in XrdOucNetwork::do_Accept (this=0x883f490, opts=0)
     at XrdOucNetwork.cc:393
#2  0x08065be0 in XrdOucNetwork::Accept (this=0x883f490, opts=0, timeout=-1)
     at XrdOucNetwork.cc:79
#3  0x08057e88 in main (argc=6, argv=0xbfffd4d4) at XrdOlbMain.cc:190
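
Frame #0 shows XrdOucSecurity::Authorize() receiving addr=0x0 from XrdOucNetwork::do_Accept(), so the segfault is most likely a null-pointer dereference on the accepted peer's address. Below is a hypothetical C++ sketch of a guard that would refuse the connection instead of crashing. AllowList and authorize() are illustrative names, not the real XrdOuc classes. Matching on the dotted-quad IP rather than the host name is a simplifying assumption.

```cpp
// Hypothetical sketch; NOT the actual XrdOucSecurity::Authorize code.
// It rejects a null peer address instead of dereferencing it.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <set>
#include <string>

class AllowList {
public:
    void allow(const std::string& ip) { ips_.insert(ip); }

    // Returns false (deny) for a null or unprintable address.
    bool authorize(const sockaddr_in* addr) const {
        if (addr == nullptr) return false;  // the addr=0x0 case in frame #0
        char buf[INET_ADDRSTRLEN];
        if (inet_ntop(AF_INET, &addr->sin_addr, buf, sizeof buf) == nullptr)
            return false;
        return ips_.count(buf) > 0;
    }

private:
    std::set<std::string> ips_;  // allowed peers as dotted-quad strings
};
```

A guard like this only keeps the redirector alive. Why do_Accept() ends up with a null address is a separate question.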

---

3) I also tried 20050209-0257 instead of 20040907-0403 and saw the same
problem.

Cheers,
   Gregory




On Fri, 18 Feb 2005, Peter Elmer wrote:

>  [Add back Gregory and the mailing list, since Andy accidentally
>   dropped them.]
>
>  Gregory, please see Andy's request below...
>
>
> On Fri, Feb 18, 2005 at 02:43:44AM -0800, Andrew Hanushevsky wrote:
>> Hi Pete,
>>
>>
>> On Fri, 18 Feb 2005, Peter Elmer wrote:
>>>   (That said, the olbd shouldn't just crash if it doesn't succeed in
>>> connecting to the redirector olbd. That sounds like a bug.)
>> Agreed, Gregory could you please compile with "--build=debug" so that we
>> can at least see where it's crashing?
>>
>> Andy
>>
>>>                                    Pete
>>>
>>> On Fri, Feb 18, 2005 at 09:52:18AM +0100, Gregory Schott wrote:
>>>> Hello,
>>>>
>>>>   Does anyone have an idea why xrootd would crash on the SL3 redirector
>>>> when adding the SL3 GPFS nodes to the already running pool of RH72 NAS
>>>> boxes? I installed the September SL3 binaries on the GPFS nodes. I have
>>>> the following olb logfile on the dataserver (the crash is silent on the
>>>> redirector).
>>>>
>>>> 050218 09:38:17 10389 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Server
>>>> 050218 09:38:17 10389 setupServer Config: thread 3063385008 assigned to ping monitor
>>>> 050218 09:38:17 10389 olb_Config: Server initialization completed.
>>>> 050218 09:38:17 10389 main Main: Thread 3052895152 handling notification traffic.
>>>> 050218 09:38:17 10389 olb_Start: Waiting for primary server to login.
>>>> 050218 09:38:17 10389 main Main: Thread 3042405296 handling admin traffic.
>>>> 050218 09:38:17 10389 Admin_Login Initial admin request: 'login p 10388 port 1094'
>>>> 050218 09:38:17 10389 olb_Admin_Login: Primary server 10388 logged in
>>>> 050218 09:38:17 10389 AddManager Manager: Added babar2 to config; id=0
>>>> 050218 09:38:17 10389 FreeSpace Updated fs info; old=0K new=0K tot=0K
>>>> 050218 09:38:17 10389 olb_Server: Logged into babar2
>>>> 050218 09:38:17 10389 olb_GetLine: Unable to reading request ; connection reset by peer
>>>> 050218 09:38:17 10389 Receive Null line from babar2
>>>> 050218 09:38:17 10389 olb_Server: Unable to read response from babar2; connection reset by peer
>>>> 050218 09:38:17 10389 Remove_Manager Removed babar2 manager 0.1 FD=10
>>>> 050218 09:38:32 10389 olb_Connect: Unable to connect to babar2; connection refused
>>>> 050218 09:38:42 10389 olb_Connect: Unable to connect to babar2; connection refused
>>>> 050218 09:38:53 10389 olb_Connect: Unable to connect to babar2; connection refused
>>>>
>>>> And the core file says:
>>>>
>>>> gdb /opt/xrootd/bin/olbd core.31310
>>>> GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
>>>> Copyright 2004 Free Software Foundation, Inc.
>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>> Type "show copying" to see the conditions.
>>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>>> This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
>>>>
>>>> Core was generated by `/opt/xrootd//bin/olbd -m -l /tmp/babar2.olblog -c config/redirector.cf'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/libnsl.so.1
>>>> Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/tls/libpthread.so.0
>>>> Reading symbols from /lib/tls/librt.so.1...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/tls/librt.so.1
>>>> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/libdl.so.2
>>>> Reading symbols from /usr/lib/libstdc++.so.5...(no debugging symbols found)...done.
>>>> Loaded symbols for /usr/lib/libstdc++.so.5
>>>> Reading symbols from /lib/tls/libm.so.6...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/tls/libm.so.6
>>>> Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/tls/libc.so.6
>>>> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/libgcc_s.so.1
>>>> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/ld-linux.so.2
>>>> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/libnss_files.so.2
>>>> Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/libnss_dns.so.2
>>>> Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
>>>> Loaded symbols for /lib/libresolv.so.2
>>>> #0  0x08067adb in XrdOucSecurity::Authorize ()
>>>> (gdb) backtrace
>>>> #0  0x08067adb in XrdOucSecurity::Authorize ()
>>>> #1  0x0806673b in XrdOucNetwork::do_Accept ()
>>>> #2  0x08065dec in XrdOucNetwork::Accept ()
>>>> #3  0x08058024 in main ()
>>>>
>>>> I also added two lines to the config file for the two new GPFS nodes
>>>> (see the full dataserver config file below):
>>>>
>>>> olb.allow host f01-010-110.gridka.de
>>>> olb.allow host f01-005-115.gridka.de
>>>>
>>>>
>>>> Regards,
>>>>   Gregory
>>>>
>>>>
>>>> #
>>>> # dataserver.cf
>>>> #
>>>>
>>>> # The Open Distributed Cache Section
>>>> #
>>>> odc.manager babar2 3121
>>>>
>>>> # The Open Load Balancer Section
>>>> #
>>>> olb.allow host l01-001-122.gridka.de
>>>> olb.allow host f01-001-1*.gridka.de
>>>> olb.allow host f01-010-110.gridka.de
>>>> olb.allow host f01-005-115.gridka.de
>>>> olb.port 3121
>>>> olb.path r /store
>>>> olb.sched cpu 100
>>>> olb.subscribe babar2 3121
>>>> olb.wait
>>>>
>>>> # The Open File System Section
>>>> #
>>>> ofs.redirect remote if l01-001-122.gridka.de
>>>> ofs.redirect target
>>>> #ofs.redirect target if f01-001-121.gridka.de
>>>> #ofs.redirect target if f01-001-1*.gridka.de
>>>>
>>>> # The Open Storage System Section (cache & localroot are used by olb)
>>>> #
>>>> oss.alloc * * 80
>>>> oss.fdlimit * max
>>>> oss.localroot /home/xrootd/disk/kanga/EventStore/
>>>> #oss.path /data/read r/o
>>>>
>>>> # The XRD Section
>>>> #
>>>> xrd.protocol xrootd *
>>>>
>>>> # The XROOTD Section
>>>> #
>>>> xrootd.fslib /home/xrootd/software/20040907-0403/lib/libXrdOfs.so
>>>> xrootd.export /store
>>>> xrootd.export /prod
>>>>
>>>> # Switch on debugging output
>>>> #
>>>> odc.trace redirect
>>>> xrd.trace all
>>>> xrootd.trace all
>>>> olb.trace all
>>>> oss.trace all
>>>>
>
>
>
> -------------------------------------------------------------------------
> Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
> Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
> -------------------------------------------------------------------------
>
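
The olb.allow lines in the dataserver config mix exact names with a wildcard (f01-001-1*.gridka.de). Suppose a single '*' matches any run of characters. Then that pattern does not cover f01-010-110 or f01-005-115, which is why those nodes need their own lines. If matching is done on the resolved canonical name, the redirector is accepted as l01-001-122.gridka.de rather than as the alias babar2. Below is a hypothetical C++ sketch of that matching rule. host_matches() is an illustrative name, and the single-wildcard semantics are an assumption, not xrootd's documented behaviour.

```cpp
// Hypothetical sketch of matching a host name against an olb.allow-style
// pattern. Assumes at most one '*', matching any (possibly empty) run of
// characters. Illustration only, not xrootd's implementation.
#include <cassert>
#include <string>

bool host_matches(const std::string& host, const std::string& pattern) {
    const auto star = pattern.find('*');
    if (star == std::string::npos) return host == pattern;  // exact entry

    const std::string prefix = pattern.substr(0, star);
    const std::string suffix = pattern.substr(star + 1);
    if (host.size() < prefix.size() + suffix.size()) return false;

    // Host must start with the text before '*' and end with the text after it.
    return host.compare(0, prefix.size(), prefix) == 0 &&
           host.compare(host.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

For example, host_matches("f01-001-123.gridka.de", "f01-001-1*.gridka.de") is true, while the same call with "f01-010-110.gridka.de" is false.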