Print

Print


This is probably only interesting to Andy and Fabrizio. (You were 
warned. Here be tedious dragons!)

I finally reproduced the error by turning off ALL tracing in ... the 
redirector. At a certain point way below you see that the client timed 
out its connections and tried unsuccessfully to reconnect.  In the 
section after the client debug, you will see the director debug - and 
it definitely is not closing connections from the client, even though 
the client is bravely making new ones. It even hits its thread limit. 
My naive guess is to suspect that the client thinks the connection 
failed, while the server does not. (Not much insight there, but I give 
it to you free of charge.)

Something I would like to try:

Andy, can you please give me the file# and approximate line# for the 
problem you mentioned yesterday about an "==" that should be a "!=". I 
would like to fix that and see if it makes any substantive difference.

One extra data point - I am able to reproduce this bug with the 
20041212 release, as well as the 20041214 release. So it isn't due to a 
change between those two releases.

The way I generally trigger this problem is to have 5 or 6 clients beat 
on a single director with a single server.

---- XRDCLIENT OUTPUT @ debug level 1 -----
This is the output of the client.
Fri Dec 17 14:31:42 2004
%src/bin/pdsTest-INFO: Will use 
root://sol199:1094//cleo/test/mc_test_tiny.pds t
o read from
041217 14:31:42 001 Xrd: Create (C) 2004 SLAC INFN XrdClient 0.3
041217 14:31:42 001 Xrd: XrdClientUrlSet List of servers to connect to 
is [sol19
9:1094]
041217 14:31:42 001 Xrd: ShowUrls The converted URLs count is 1
041217 14:31:42 001 Xrd: ShowUrls URL n.1: 
sol199.lns.cornell.edu:1094//.
041217 14:31:42 001 Xrd: Create Access to server granted.
041217 14:31:42 001 Xrd: Create Opening the remote file 
/cleo/test/mc_test_tiny.
pds
041217 14:33:42 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:33:42 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:33:42 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:33:42 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:35:42 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:35:42 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:35:42 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:35:42 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:37:42 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:37:42 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:37:42 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:37:42 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:39:42 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:39:42 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:39:42 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:39:42 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:41:44 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:41:44 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:41:44 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:41:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:43:43 000 Xrd: GarbageCollect Purging physical connection 0
041217 14:43:43 000 Xrd: GarbageCollect Purged physical connection 0
041217 14:43:44 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:43:44 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:43:44 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:43:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:45:43 000 Xrd: GarbageCollect Purging physical connection 1
041217 14:45:43 000 Xrd: GarbageCollect Purged physical connection 1
041217 14:45:44 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:45:44 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:45:44 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:45:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:47:43 000 Xrd: GarbageCollect Purging physical connection 2
041217 14:47:43 000 Xrd: GarbageCollect Purged physical connection 2
041217 14:47:44 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:47:44 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:47:44 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:47:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:49:43 000 Xrd: GarbageCollect Purging physical connection 3
041217 14:49:43 000 Xrd: GarbageCollect Purged physical connection 3
041217 14:49:44 001 Xrd: ReadPartialAnswer Error reading msg from 
connmgr (serve
r [sol199.lns.cornell.edu:1094]).
041217 14:49:44 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:49:44 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:49:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:50:45 001 Xrd: DoHandShake Error reading 4 bytes from server 
[sol199.l
ns.cornell.edu:1094].
041217 14:50:45 001 Xrd: GetAccessToSrv HandShake failed with server 
[sol199.lns
.cornell.edu:1094]
041217 14:50:45 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:50:45 001 Xrd: GoToAnotherServer Error handshaking to 
[sol199.lns.corn
ell.edu:1094]
041217 14:50:45 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:50:55 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:51:45 000 Xrd: GarbageCollect Purging physical connection 4
041217 14:51:45 000 Xrd: GarbageCollect Purged physical connection 4
041217 14:51:56 001 Xrd: DoHandShake Error reading 4 bytes from server 
[sol199.l
ns.cornell.edu:1094].
041217 14:51:56 001 Xrd: GetAccessToSrv HandShake failed with server 
[sol199.lns
.cornell.edu:1094]
041217 14:51:56 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:51:56 001 Xrd: GoToAnotherServer Error handshaking to 
[sol199.lns.corn
ell.edu:1094]
041217 14:51:56 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:52:06 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:53:07 001 Xrd: DoHandShake Error reading 4 bytes from server 
[sol199.l
ns.cornell.edu:1094].
041217 14:53:07 001 Xrd: GetAccessToSrv HandShake failed with server 
[sol199.lns
.cornell.edu:1094]
041217 14:53:07 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:53:07 001 Xrd: GoToAnotherServer Error handshaking to 
[sol199.lns.corn
ell.edu:1094]
041217 14:53:07 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:53:17 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:53:45 000 Xrd: GarbageCollect Purging physical connection 5
041217 14:53:45 000 Xrd: GarbageCollect Purged physical connection 5
041217 14:54:18 001 Xrd: DoHandShake Error reading 4 bytes from server 
[sol199.l
ns.cornell.edu:1094].
041217 14:54:18 001 Xrd: GetAccessToSrv HandShake failed with server 
[sol199.lns
.cornell.edu:1094]
041217 14:54:18 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:54:18 001 Xrd: GoToAnotherServer Error handshaking to 
[sol199.lns.corn
ell.edu:1094]
041217 14:54:18 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:54:28 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:55:29 001 Xrd: DoHandShake Error reading 4 bytes from server 
[sol199.l
ns.cornell.edu:1094].
041217 14:55:29 001 Xrd: GetAccessToSrv HandShake failed with server 
[sol199.lns
.cornell.edu:1094]
041217 14:55:29 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:55:29 001 Xrd: GoToAnotherServer Error handshaking to 
[sol199.lns.corn
ell.edu:1094]
041217 14:55:29 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:55:39 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:55:45 000 Xrd: GarbageCollect Purging physical connection 6
041217 14:55:45 000 Xrd: GarbageCollect Purged physical connection 6
041217 14:55:46 000 Xrd: GarbageCollect Purging physical connection 9
041217 14:55:46 000 Xrd: GarbageCollect Purged physical connection 9
041217 14:56:40 001 Xrd: DoHandShake Error reading 4 bytes from server 
[sol199.l
ns.cornell.edu:1094].
041217 14:56:40 001 Xrd: GetAccessToSrv HandShake failed with server 
[sol199.lns
.cornell.edu:1094]
041217 14:56:40 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:56:40 001 Xrd: GoToAnotherServer Error handshaking to 
[sol199.lns.corn
ell.edu:1094]
041217 14:56:40 000 Xrd: XrdClientMessage::ReadRaw Error reading 8 
bytes.
041217 14:56:50 001 Xrd: HandleServerError Received redirection to 
[sol199.lns.c
ornell.edu:1094]. Token=[].
041217 14:56:57 000 Xrd: GarbageCollect Purging physical connection 10
041217 14:56:57 000 Xrd: GarbageCollect Purged physical connection 10
041217 14:57:45 000 Xrd: GarbageCollect Purging physical connection 7
041217 14:57:45 000 Xrd: GarbageCollect Purged physical connection 7
041217 14:57:51 001 Xrd: DoHandShake Error reading 4 bytes from server 
[sol199.l
ns.cornell.edu:1094].
041217 14:57:51 001 Xrd: GetAccessToSrv HandShake failed with server 
[sol199.lns
.cornell.edu:1094]
041217 14:57:51 000 Xrd: ClientSock::RecvRaw Disconnection detected 
reading 8 by
tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
041217 14:57:51 001 Xrd: GoToAnotherServer Error handshaking to 
[sol199.lns.corn
ell.edu:1094]

----- DIRECTOR XROOTD SVR LOG FILE ----- (no messages in the OLBD 
director logfile)
041217 14:31:14 016 XrdLink: gregor.9186:17@lnx243 disconnected after 
0:00:01 (l
ink read error)
041217 14:31:14 007 XrootdXeq: User logged in as gregor.26612:21@lnx6211
041217 14:31:14 015 XrootdXeq: User logged in as gregor.9191:17@lnx243
041217 14:31:21 016 XrdLink: gregor.26612:21@lnx6211 disconnected after 
0:00:07
(link read error)
041217 14:31:21 007 XrootdXeq: User logged in as gregor.26617:20@lnx6211
041217 14:31:41 016 XrdLink: gregor.8740:18@sol184 disconnected after 
0:04:32 (l
ink read error)
041217 14:31:42 017 XrootdXeq: User logged in as gregor.8748:21@sol184
041217 14:31:56 016 XrdLink: gregor.3065:19@sol199 disconnected after 
0:03:33 (l
ink read error)
041217 14:33:21 016 XrootdXeq: User logged in as gregor.26617:18@lnx6211
041217 14:33:22 018 XrootdXeq: User logged in as gregor.9191:19@lnx243
041217 14:33:42 019 XrootdXeq: User logged in as gregor.8748:22@sol184
041217 14:35:21 020 XrootdXeq: User logged in as gregor.26617:23@lnx6211
041217 14:35:22 021 XrootdXeq: User logged in as gregor.9191:24@lnx243
041217 14:35:42 022 XrootdXeq: User logged in as gregor.8748:25@sol184
041217 14:37:21 023 XrootdXeq: User logged in as gregor.26617:26@lnx6211
041217 14:37:22 024 XrootdXeq: User logged in as gregor.9191:27@lnx243
041217 14:37:42 025 XrootdXeq: User logged in as gregor.8748:28@sol184
041217 14:39:21 026 XrootdXeq: User logged in as gregor.26617:29@lnx6211
041217 14:39:22 027 XrootdXeq: User logged in as gregor.9191:30@lnx243
041217 14:39:42 028 XrootdXeq: User logged in as gregor.8748:31@sol184
041217 14:41:11 029 XrootdXeq: User logged in as gregor.3086:32@sol199
041217 14:41:21 030 XrootdXeq: User logged in as gregor.26617:33@lnx6211
041217 14:41:22 031 XrootdXeq: User logged in as gregor.9191:34@lnx243
041217 14:41:44 032 XrootdXeq: User logged in as gregor.8748:35@sol184
041217 14:43:11 033 XrootdXeq: User logged in as gregor.3086:36@sol199
041217 14:43:21 034 XrootdXeq: User logged in as gregor.26617:37@lnx6211
041217 14:43:22 035 XrootdXeq: User logged in as gregor.9191:38@lnx243
041217 14:43:44 036 XrootdXeq: User logged in as gregor.8748:39@sol184
041217 14:45:11 037 XrootdXeq: User logged in as gregor.3086:40@sol199
041217 14:45:21 038 XrootdXeq: User logged in as gregor.26617:41@lnx6211
041217 14:45:22 038 XrootdXeq: User logged in as gregor.9191:42@lnx243
041217 14:45:44 038 XrootdXeq: User logged in as gregor.8748:43@sol184
041217 14:47:11 038 XrootdXeq: User logged in as gregor.3086:44@sol199
041217 14:47:21 038 XrootdXeq: User logged in as gregor.26617:45@lnx6211
041217 14:47:22 038 XrootdXeq: User logged in as gregor.9191:46@lnx243
041217 14:47:44 038 XrootdXeq: User logged in as gregor.8748:47@sol184
041217 14:49:11 038 XrdScheduler: Thread limit has been reached!
041217 14:49:11 038 XrootdXeq: User logged in as gregor.3086:48@sol199


--
Gregory J. Sharp                   email: [log in to unmask]
Wilson Synchrotron Laboratory      url: 
http://www.lepp.cornell.edu/~gregor
Dryden Rd                          ph:  +1 607 255 4882
Ithaca, NY 14853                   fax: +1 607 255 8062