  Hi Andy and Gregory,

  Is it possible that we are running into something GPFS-specific here?
Recall that GPFS itself attempts to provide a global-filesystem-like setup
across these multiple servers (and may do so via DNS load balancing or some
similar scheme).
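
  For what it's worth, the forward/reverse lookup check Andy describes below
could also be scripted, which makes it easier to repeat for every server in
the pool. This is only a minimal sketch, assuming Python 3 is available on the
redirector machine. The helper name check_host is my own invention and is not
part of xrootd or olbd; it only uses the standard resolver calls
socket.gethostbyname_ex and socket.gethostbyaddr, so it shows what the system
resolver returns, not what olbd does internally.

```python
import socket


def check_host(hostname,
               forward=socket.gethostbyname_ex,
               reverse=socket.gethostbyaddr):
    """Forward-resolve hostname, then reverse-resolve each address found.

    Hypothetical helper (not part of xrootd). Returns a list of
    (ip, reverse_name_or_None, matches) tuples, where `matches` is True
    only if the reverse lookup gives back the original hostname.
    """
    try:
        # gethostbyname_ex returns (canonical_name, aliases, ip_addresses)
        _, _, addresses = forward(hostname)
    except OSError:
        # Forward lookup failed entirely (socket.gaierror/herror are OSErrors)
        return [(None, None, False)]

    results = []
    for ip in addresses:
        try:
            # gethostbyaddr returns (primary_name, aliases, ip_addresses)
            name, aliases, _ = reverse(ip)
        except OSError:
            # No reverse (PTR) record for this address
            results.append((ip, None, False))
            continue
        known = {n.lower() for n in [name, *aliases]}
        results.append((ip, name, hostname.lower() in known))
    return results
```

Calling check_host("f01-010-110.gridka.de") on the redirector and looking for
entries whose last field is False would point at addresses with a missing or
inconsistent reverse record. If GPFS is doing some kind of DNS load balancing,
I would expect more than one address to come back from the forward lookup.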

                                   Pete

On Fri, Feb 18, 2005 at 02:41:11AM -0800, Andrew Hanushevsky wrote:
> Hi Gregory,
> 
> This looks like a problem in the DNS setup. It's certainly crashing in
> the IP address to DNS name lookup. Try using nslookup on the IP address
> associated with the data server, as well as on the new ones you added, to
> see whether a reverse lookup works. You should do that on the redirector
> machine.
> 
> Andy
> 
> On Fri, 18 Feb 2005, Gregory Schott wrote:
> 
> >
> > Hello,
> >
> >    Does someone have an idea as to why xrootd would crash on the SL3
> > redirector when adding the SL3 GPFS servers to the already running pool of
> > RH72 NAS boxes? I installed the September SL3 binaries on GPFS. I have the
> > following olb logfile on the dataserver (the crash is silent on the
> > redirector).
> >
> > 050218 09:38:17 10389 olb_Config: (c) 2004 SLAC olbd version 20040907-0403 initializing as Server
> > 050218 09:38:17 10389 setupServer Config: thread 3063385008 assigned to ping monitor
> > 050218 09:38:17 10389 olb_Config: Server initialization completed.
> > 050218 09:38:17 10389 main Main: Thread 3052895152 handling notification traffic.
> > 050218 09:38:17 10389 olb_Start: Waiting for primary server to login.
> > 050218 09:38:17 10389 main Main: Thread 3042405296 handling admin traffic.
> > 050218 09:38:17 10389 Admin_Login Initial admin request: 'login p 10388 port 1094'
> > 050218 09:38:17 10389 olb_Admin_Login: Primary server 10388 logged in
> > 050218 09:38:17 10389 AddManager Manager: Added babar2 to config; id=0
> > 050218 09:38:17 10389 FreeSpace Updated fs info; old=0K new=0K tot=0K
> > 050218 09:38:17 10389 olb_Server: Logged into babar2
> > 050218 09:38:17 10389 olb_GetLine: Unable to reading request ; connection reset by peer
> > 050218 09:38:17 10389 Receive Null line from babar2
> > 050218 09:38:17 10389 olb_Server: Unable to read response from babar2; connection reset by peer
> > 050218 09:38:17 10389 Remove_Manager Removed babar2 manager 0.1 FD=10
> > 050218 09:38:32 10389 olb_Connect: Unable to connect to babar2; connection refused
> > 050218 09:38:42 10389 olb_Connect: Unable to connect to babar2; connection refused
> > 050218 09:38:53 10389 olb_Connect: Unable to connect to babar2; connection refused
> >
> > And the core file says:
> >
> > gdb /opt/xrootd/bin/olbd core.31310
> > GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols
> > found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
> >
> > Core was generated by `/opt/xrootd//bin/olbd -m -l /tmp/babar2.olblog -c config/redirector.cf'.
> > Program terminated with signal 11, Segmentation fault.
> > Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libnsl.so.1
> > Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols found)...done.
> > Loaded symbols for /lib/tls/libpthread.so.0
> > Reading symbols from /lib/tls/librt.so.1...(no debugging symbols found)...done.
> > Loaded symbols for /lib/tls/librt.so.1
> > Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libdl.so.2
> > Reading symbols from /usr/lib/libstdc++.so.5...(no debugging symbols found)...done.
> > Loaded symbols for /usr/lib/libstdc++.so.5
> > Reading symbols from /lib/tls/libm.so.6...(no debugging symbols found)...done.
> > Loaded symbols for /lib/tls/libm.so.6
> > Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done.
> > Loaded symbols for /lib/tls/libc.so.6
> > Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libgcc_s.so.1
> > Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
> > Loaded symbols for /lib/ld-linux.so.2
> > Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libnss_files.so.2
> > Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libnss_dns.so.2
> > Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
> > Loaded symbols for /lib/libresolv.so.2
> > #0  0x08067adb in XrdOucSecurity::Authorize ()
> > (gdb) backtrace
> > #0  0x08067adb in XrdOucSecurity::Authorize ()
> > #1  0x0806673b in XrdOucNetwork::do_Accept ()
> > #2  0x08065dec in XrdOucNetwork::Accept ()
> > #3  0x08058024 in main ()
> >
> > I also added two lines to the config file for the two new GPFS servers
> > (see the full dataserver config file below):
> >
> > olb.allow host f01-010-110.gridka.de
> > olb.allow host f01-005-115.gridka.de
> >
> >
> > Regards,
> >    Gregory
> >
> >
> > #
> > # dataserver.cf
> > #
> >
> > # The Open Distributed Cache Section
> > #
> > odc.manager babar2 3121
> >
> > # The Open Load Balancer Section
> > #
> > olb.allow host l01-001-122.gridka.de
> > olb.allow host f01-001-1*.gridka.de
> > olb.allow host f01-010-110.gridka.de
> > olb.allow host f01-005-115.gridka.de
> > olb.port 3121
> > olb.path r /store
> > olb.sched cpu 100
> > olb.subscribe babar2 3121
> > olb.wait
> >
> > # The Open File System Section
> > #
> > ofs.redirect remote if l01-001-122.gridka.de
> > ofs.redirect target
> > #ofs.redirect target if f01-001-121.gridka.de
> > #ofs.redirect target if f01-001-1*.gridka.de
> >
> > # The Open Storage System Section (cache & localroot are used by olb)
> > #
> > oss.alloc * * 80
> > oss.fdlimit * max
> > oss.localroot /home/xrootd/disk/kanga/EventStore/
> > #oss.path /data/read r/o
> >
> > # The XRD Section
> > #
> > xrd.protocol xrootd *
> >
> > # The XROOTD Section
> > #
> > xrootd.fslib /home/xrootd/software/20040907-0403/lib/libXrdOfs.so
> > xrootd.export /store
> > xrootd.export /prod
> >
> > # Switch on debugging output
> > #
> > odc.trace redirect
> > xrd.trace all
> > xrootd.trace all
> > olb.trace all
> > oss.trace all
> >
> >
> > -------------- Dr. Gregory Schott --------------
> >   Institut fuer Experimentelle Kernphysik (IEKP)
> >       Universitaet Karlsruhe - Postfach 3640
> >             76021 Karlsruhe  (Germany)
> >              tel.: +49-(0)724782-3537
> >              fax.: +49-(0)724782-3414
> >             e-mail: [log in to unmask]
> > -----------------------------------------------
> >
> >



-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------