Hi Gregory,

The reason is that xrootd cannot translate the NAS IP address back to a
DNS name, i.e. the reverse lookup fails. Are you sure the NAS boxes are
properly registered in your name server? The next version will give you an
error message when this happens instead of crashing.
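
One way to check the registration is to try the reverse lookup yourself for
each address involved. The following is only a rough sketch in Python, not
what xrootd does internally; the function names are made up for illustration,
and the address in the usage note is the dataserver address from the xrdcp
log below, standing in for the NAS addresses.

```python
import socket


def reverse_lookup(ip):
    """Return the DNS name registered for ip, or None if the lookup fails."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(ip)
        return name
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError;
        # either one means the address could not be mapped back to a name.
        return None


def check_hosts(ips, lookup=reverse_lookup):
    """Map each IP address to its resolved name, or None if unresolved."""
    return {ip: lookup(ip) for ip in ips}


def missing_reverse_entries(ips, lookup=reverse_lookup):
    """Return the addresses (in input order) that have no reverse entry."""
    return [ip for ip, name in check_hosts(ips, lookup).items() if name is None]
```

Calling missing_reverse_entries(["10.65.1.116"]) on a dataserver returns the
addresses that still need a reverse entry in the name server; an empty list
means every address resolved.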
Andy

On Tue, 22 Mar 2005, Gregory Schott wrote:

> Hello,
>
>    While I could xrdcp a file when running version 20050110-1339, now that
> I am running version 20050321-0425 the xrd process crashes on the
> dataservers (RH7.3 and SL3).
>    I have appended some logfiles below. The xrd and olb logfiles don't show
> anything unusual, and xrdcp reports an error.
>    The core file on SL3 says: (see below)
>
> -- Gregory
>
>
> # xrd logfile
>
> 050322 16:30:23 001 (c) 2004 Stanford University/SLAC xrd version
> 20050321-0425_dbg
> 050322 16:30:23 001 xrd@f01-001-116 initialization started.
> 050322 16:30:23 001 Using configuration file config/dataserver.cf
> 050322 16:30:23 001 Optimizing for 256 connections; maximum is 1024
> 050322 16:30:23 001 XrdSched: Set min_Workers=4 max_Workers=32
> 050322 16:30:23 001 XrdSched: Set stk_Workers=26 max_Workidl=780
> 050322 16:30:23 001 XrdSched: scheduling underused thread monitor in 780
> seconds
> 050322 16:30:23 001 XrdSched: Now have 1 workers
> 050322 16:30:23 001 XrdLink: Allocating 16 link objects at a time
> 050322 16:30:23 001 XrdPoll: Starting poller 0
> 050322 16:30:23 001 XrdPoll: Starting poller 1
> 050322 16:30:23 001 XrdPoll: Starting poller 2
> 050322 16:30:23 001 XrdProtocol: loading protocol xrootd
> 050322 16:30:23 001 (c) 2004 Stanford University/SLAC XRootd (eXtended Root
> Daemon).
> 050322 16:30:23 001 XrootdAioReq: Max aio/req=8; aio/srv=4096; Quantum=65536
> 050322 16:30:23 001 XrootdAioReq: Adding 30 aioreq objects.
> 050322 16:30:23 001 XrootdAio: Adding 24 aio objects; 4096 pending.
> 050322 16:30:23 001 XRootd seclib not specified; strong authentication disabled
> 050322 16:30:23 001 XrootdProtocol: Loading filesystem library
> /home/xrootd/software/current/lib/libXrdOfs.so
> 050322 16:30:23 001 ofs_Init: (c) 2005 Stanford University/SLAC, Ofs Version
> 20050321-0425_dbg
> 050322 16:30:23 001 ofs_Config: File system initialization started.
> 050322 16:30:23 001 ofs_Config: redirect remote ignored; not applicable host.
> 050322 16:30:23 001 odc_Config: Target redirection initialization started
> 050322 16:30:23 001 odc_Config: Target redirection initialization completed.
> 050322 16:30:23 001 ofs_Config: File system initialization completed.
> config/dataserver.cf ofs configuration:
> ofs.authorize
> ofs.redirect target
> ofs.fdscan     9 120 1200
> ofs.maxdelay   60
> ofs.trace      0
> 050322 16:30:23 001 oss_Init: (c) 2004, Stanford University, oss Version
> 20050321-0425_dbg
> 050322 16:30:23 001 oss_config: Storage system initialization started.
> 050322 16:30:23 001 oss_AioInit: started AIO read signal thread; tid=8201
> 050322 16:30:23 24756 odc_olb: Connected to olb via /tmp/.olb/olbd.admin
> 050322 16:30:23 001 oss_AioInit: started AIO write signal thread; tid=9226
> 050322 16:30:23 001 oss_config: Storage system initialization completed.
> config/dataserver.cf oss configuration:
> oss.alloc        0 0 80
> oss.cachescan    600
> oss.compdetect   *
> oss.fdlimit      512 1024
> oss.maxdbsize    0
> oss.localroot /home/xrootd/disk/kanga/EventStore/
> oss.trace        fff
> oss.xfr          1 9437184 30 10800
> oss.memfile off  max 527738880
> oss.path / r/w  nocheck nodread nomig nomkeep nomlock nommap norcreate nostage
> 050322 16:30:23 001 XrdSched: scheduling xrootd protocol anchor in 3600 seconds
> 050322 16:30:23 001 Prep log directory not specified; prepare tracking
> disabled.
> 050322 16:30:23 001 Exporting /prod
> 050322 16:30:23 001 Exporting /store
> 050322 16:30:23 001 XRootd protocol version 2.3.0 build 20050321-0425
> successfully loaded.
> 050322 16:30:23 001 xrd@f01-001-116:1094 initialization completed.
>
> # olb logfile
>
> 050322 16:30:23 001 olb_Config: (c) 2004 SLAC olbd version 20050321-0425_dbg
> initializing as Server
> 050322 16:30:23 001 olb_Config: Server initialization completed.
> 050322 16:30:23 24748 olb_Start: Waiting for primary server to login.
> 050322 16:30:23 24758 Admin_Login Initial admin request: 'login p 24742 port
> 1094'
> 050322 16:30:23 24758 olb_Admin_Login: Primary server 24742 logged in
> 050322 16:30:23 001 AddManager Manager: Added babar2 to config; id=0
> 050322 16:30:23 001 FreeSpace Updated fs info; old=0K new=0K tot=0K
> 050322 16:30:23 001 olb_Server: Logged into babar2
> 050322 16:31:01 001 Receive From babar2: 1@0 ping
> 050322 16:31:04 24758 Admin_Login received admin request: ''
> 050322 16:31:04 24758 olb_Login: Primary server 24742 logged out
>
> # xrdcp output (using version 20050316-1316)
>
> 050322 16:22:46 001 Xrd: GetDomainToMatch GetHostName(f01-001-116.gridka.de)
> returned name=f01-001-116.gridka.de
> 050322 16:22:46 001 Xrd: GetDomainToMatch GetDomain(f01-001-116.gridka.de) -->
> gridka.de
> 050322 16:22:46 001 Xrd: CheckHostDomain Resolved [f01-001-116.gridka.de]'s
> domain name into [gridka.de]
> 050322 16:22:46 001 Xrd: CheckHostDomain Access granted to the domain of
> [f01-001-116.gridka.de].
> 050322 16:22:46 001 Xrd: GetDomainToMatch GetHostName(f01-001-116.gridka.de)
> returned name=f01-001-116.gridka.de
> 050322 16:22:46 001 Xrd: GetDomainToMatch GetDomain(f01-001-116.gridka.de) -->
> gridka.de
> 050322 16:22:46 001 Xrd: CheckHostDomain Resolved [f01-001-116.gridka.de]'s
> domain name into [gridka.de]
> 050322 16:22:46 001 Xrd: CheckHostDomain Access granted to the domain of
> [f01-001-116.gridka.de].
> 050322 16:22:46 001 Xrd: CreateTXNf Trying to connect to
> f01-001-116.gridka.de:1094. Connect try 1
> 050322 16:22:46 001 Xrd: Connect Creating a logical connection...
> 050322 16:22:46 001 Xrd: Connect Physical connection not found. Creating a new
> one...
> 050322 16:22:46 001 Xrd: Connect Connecting to [f01-001-116.gridka.de:1094]
> 050322 16:22:46 001 Xrd: ClientSock::TryConnect Trying to connect
> tof01-001-116.gridka.de(10.65.1.116):1094 Timeout=60
> 050322 16:22:46 001 Xrd: Connect Connected to [f01-001-116.gridka.de:1094]
> 050322 16:22:46 001 Xrd: Connect New physical connection to server
> f01-001-116.gridka.de:1094 succesfully created.
> 050322 16:22:46 001 Xrd: Connect LogConn: size:1 count: 1PhyConn: size:1 count:
> 1
> 050322 16:22:46 001 Xrd: Connect Connect(f01-001-116.gridka.de, 1094) returned
> 0
> 050322 16:22:46 001 Xrd: CreateTXNf The logical connection id is 0. This will
> be the streamid for this client
> 050322 16:22:46 001 Xrd: CreateTXNf Working url is f01-001-116.gridka.de:1094//
> 050322 16:22:46 001 Xrd: DoHandShake HandShake step 1: Sending 20 bytes to the
> server [f01-001-116.gridka.de:1094]
> 050322 16:22:46 001 Xrd: DoHandShake HandShake step 2: Reading 4 bytes from
> server [f01-001-116.gridka.de:1094].
> 050322 16:22:48 001 Xrd: ClientSock::RecvRaw Disconnection detected reading 4
> bytes from socket 4 (server[f01-001-116.gridka.de:1094]). Revents=25
> 050322 16:22:48 001 Xrd: ReadRaw Read error on f01-001-116.gridka.de:1094.
> errno=22
> 050322 16:22:48 001 Xrd: ReadRaw Disconnection reported
> onf01-001-116.gridka.de:1094
> 050322 16:22:48 001 Xrd: DoHandShake Error reading 4 bytes from server
> [f01-001-116.gridka.de:1094].
> 050322 16:22:48 001 Xrd: StartReader Starting reader thread...
> 050322 16:22:48 000 Xrd: SocketReaderThread Reader Thread starting.
> 050322 16:22:48 000 Xrd: ReadRaw Socket is disconnected.
> 050322 16:22:48 001 Xrd: GetAccessToSrv HandShake failed with server
> [f01-001-116.gridka.de:1094]
> 050322 16:22:48 001 Xrd: CreateTXNf Access to server failed
> 050322 16:22:48 001 Xrd: CreateTXNf Disconnecting.
> 050322 16:22:48 001 Xrd: Disconnect Destroying nonexistent logconn 0
> 050322 16:22:48 001 Xrd: Create Connection attempt failed. Sleeping 10 seconds.
>
> # core file
>
> -bash-2.05b$ gdb software/current/bin/xrootd  core.14044
> GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for
> details.
> This GDB was configured as "i386-redhat-linux-gnu"...(no debugging symbols
> found)...Using host libthread_db library "/lib/tls/libthread_db.so.1".
>
> Core was generated by `/home/xrootd/software/current/bin/xrootd -p 1094 -l
> /tmp/f01-010-110.xrdlog -c'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libnsl.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/tls/libpthread.so.0
> Reading symbols from /lib/tls/librt.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/tls/librt.so.1
> Reading symbols from /lib/libdl.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /usr/lib/libstdc++.so.5...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib/libstdc++.so.5
> Reading symbols from /lib/tls/libm.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/tls/libm.so.6
> Reading symbols from /lib/tls/libc.so.6...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/tls/libc.so.6
> Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libgcc_s.so.1
> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libnss_files.so.2
> Reading symbols from
> /home/xrootd/software/20050316-1316/lib/libXrdOfs.so...(no debugging
> symbols found)...done.
> Loaded symbols for /home/xrootd/software/current/lib/libXrdOfs.so
> Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libnss_dns.so.2
> Reading symbols from /lib/libresolv.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libresolv.so.2
> #0  0xb7407198 in strcmp () from /lib/tls/libc.so.6
> (gdb) backtrace
> #0  0xb7407198 in strcmp () from /lib/tls/libc.so.6
> #1  0x08079903 in XrdNet::Trim ()
> #2  0x0806d165 in XrdLink::Alloc ()
> #3  0x08078cd1 in XrdInet::Accept ()
> #4  0x0806f58b in main ()
>