Hi Gregory,
If I could see a gcore of the xrootd after it gets into the "not close"
state, that would give me more information. This assumes that xrootd was
compiled with -g. I would need the core and matching executable.
Andy
On Fri, 17 Dec 2004, Gregory J. Sharp wrote:
> This is probably only interesting to Andy and Fabrizio. (You were
> warned. Here be tedious dragons!)
>
> I finally reproduced the error by turning off ALL tracing in ... the
> redirector. At a certain point way below you see that the client timed
> out its connections and tried unsuccessfully to reconnect. In the
> section after the client debug, you will see the director debug - and
> it definitely is not closing connections from the client, even though
> the client is bravely making new ones. It even hits its thread limit.
> My naive guess is to suspect that the client thinks the connection
> failed, while the server does not. (Not much insight there, but I give
> it to you free of charge.)
>
> Something I would like to try:
>
> Andy, can you please give me the file# and approximate line# for the
> problem you mentioned yesterday about an "==" that should be a "!=". I
> would like to fix that and see if it makes any substantive difference.
>
> One extra data point - I am able to reproduce this bug with the
> 20041212 release, as well as the 20041214 release. So it isn't due to a
> change between those two releases.
>
> The way I generally trigger this problem is to have 5 or 6 clients beat
> on a single director with a single server.
>
> ---- XRDCLIENT OUTPUT @ debug level 1 -----
> This is the output of the client.
> Fri Dec 17 14:31:42 2004
> %src/bin/pdsTest-INFO: Will use
> root://sol199:1094//cleo/test/mc_test_tiny.pds t
> o read from
> 041217 14:31:42 001 Xrd: Create (C) 2004 SLAC INFN XrdClient 0.3
> 041217 14:31:42 001 Xrd: XrdClientUrlSet List of servers to connect to
> is [sol19
> 9:1094]
> 041217 14:31:42 001 Xrd: ShowUrls The converted URLs count is 1
> 041217 14:31:42 001 Xrd: ShowUrls URL n.1:
> sol199.lns.cornell.edu:1094//.
> 041217 14:31:42 001 Xrd: Create Access to server granted.
> 041217 14:31:42 001 Xrd: Create Opening the remote file
> /cleo/test/mc_test_tiny.
> pds
> 041217 14:33:42 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:33:42 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:33:42 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:33:42 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:35:42 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:35:42 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:35:42 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:35:42 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:37:42 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:37:42 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:37:42 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:37:42 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:39:42 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:39:42 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:39:42 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:39:42 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:41:44 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:41:44 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:41:44 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:41:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:43:43 000 Xrd: GarbageCollect Purging physical connection 0
> 041217 14:43:43 000 Xrd: GarbageCollect Purged physical connection 0
> 041217 14:43:44 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:43:44 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:43:44 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:43:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:45:43 000 Xrd: GarbageCollect Purging physical connection 1
> 041217 14:45:43 000 Xrd: GarbageCollect Purged physical connection 1
> 041217 14:45:44 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:45:44 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:45:44 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:45:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:47:43 000 Xrd: GarbageCollect Purging physical connection 2
> 041217 14:47:43 000 Xrd: GarbageCollect Purged physical connection 2
> 041217 14:47:44 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:47:44 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:47:44 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:47:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:49:43 000 Xrd: GarbageCollect Purging physical connection 3
> 041217 14:49:43 000 Xrd: GarbageCollect Purged physical connection 3
> 041217 14:49:44 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (serve
> r [sol199.lns.cornell.edu:1094]).
> 041217 14:49:44 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:49:44 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:49:44 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:50:45 001 Xrd: DoHandShake Error reading 4 bytes from server
> [sol199.l
> ns.cornell.edu:1094].
> 041217 14:50:45 001 Xrd: GetAccessToSrv HandShake failed with server
> [sol199.lns
> .cornell.edu:1094]
> 041217 14:50:45 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:50:45 001 Xrd: GoToAnotherServer Error handshaking to
> [sol199.lns.corn
> ell.edu:1094]
> 041217 14:50:45 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:50:55 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:51:45 000 Xrd: GarbageCollect Purging physical connection 4
> 041217 14:51:45 000 Xrd: GarbageCollect Purged physical connection 4
> 041217 14:51:56 001 Xrd: DoHandShake Error reading 4 bytes from server
> [sol199.l
> ns.cornell.edu:1094].
> 041217 14:51:56 001 Xrd: GetAccessToSrv HandShake failed with server
> [sol199.lns
> .cornell.edu:1094]
> 041217 14:51:56 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:51:56 001 Xrd: GoToAnotherServer Error handshaking to
> [sol199.lns.corn
> ell.edu:1094]
> 041217 14:51:56 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:52:06 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:53:07 001 Xrd: DoHandShake Error reading 4 bytes from server
> [sol199.l
> ns.cornell.edu:1094].
> 041217 14:53:07 001 Xrd: GetAccessToSrv HandShake failed with server
> [sol199.lns
> .cornell.edu:1094]
> 041217 14:53:07 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:53:07 001 Xrd: GoToAnotherServer Error handshaking to
> [sol199.lns.corn
> ell.edu:1094]
> 041217 14:53:07 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:53:17 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:53:45 000 Xrd: GarbageCollect Purging physical connection 5
> 041217 14:53:45 000 Xrd: GarbageCollect Purged physical connection 5
> 041217 14:54:18 001 Xrd: DoHandShake Error reading 4 bytes from server
> [sol199.l
> ns.cornell.edu:1094].
> 041217 14:54:18 001 Xrd: GetAccessToSrv HandShake failed with server
> [sol199.lns
> .cornell.edu:1094]
> 041217 14:54:18 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:54:18 001 Xrd: GoToAnotherServer Error handshaking to
> [sol199.lns.corn
> ell.edu:1094]
> 041217 14:54:18 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:54:28 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:55:29 001 Xrd: DoHandShake Error reading 4 bytes from server
> [sol199.l
> ns.cornell.edu:1094].
> 041217 14:55:29 001 Xrd: GetAccessToSrv HandShake failed with server
> [sol199.lns
> .cornell.edu:1094]
> 041217 14:55:29 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:55:29 001 Xrd: GoToAnotherServer Error handshaking to
> [sol199.lns.corn
> ell.edu:1094]
> 041217 14:55:29 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:55:39 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:55:45 000 Xrd: GarbageCollect Purging physical connection 6
> 041217 14:55:45 000 Xrd: GarbageCollect Purged physical connection 6
> 041217 14:55:46 000 Xrd: GarbageCollect Purging physical connection 9
> 041217 14:55:46 000 Xrd: GarbageCollect Purged physical connection 9
> 041217 14:56:40 001 Xrd: DoHandShake Error reading 4 bytes from server
> [sol199.l
> ns.cornell.edu:1094].
> 041217 14:56:40 001 Xrd: GetAccessToSrv HandShake failed with server
> [sol199.lns
> .cornell.edu:1094]
> 041217 14:56:40 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:56:40 001 Xrd: GoToAnotherServer Error handshaking to
> [sol199.lns.corn
> ell.edu:1094]
> 041217 14:56:40 000 Xrd: XrdClientMessage::ReadRaw Error reading 8
> bytes.
> 041217 14:56:50 001 Xrd: HandleServerError Received redirection to
> [sol199.lns.c
> ornell.edu:1094]. Token=[].
> 041217 14:56:57 000 Xrd: GarbageCollect Purging physical connection 10
> 041217 14:56:57 000 Xrd: GarbageCollect Purged physical connection 10
> 041217 14:57:45 000 Xrd: GarbageCollect Purging physical connection 7
> 041217 14:57:45 000 Xrd: GarbageCollect Purged physical connection 7
> 041217 14:57:51 001 Xrd: DoHandShake Error reading 4 bytes from server
> [sol199.l
> ns.cornell.edu:1094].
> 041217 14:57:51 001 Xrd: GetAccessToSrv HandShake failed with server
> [sol199.lns
> .cornell.edu:1094]
> 041217 14:57:51 000 Xrd: ClientSock::RecvRaw Disconnection detected
> reading 8 by
> tes from socket 5 (server[sol199.lns.cornell.edu:1094]). Revents=32
> 041217 14:57:51 001 Xrd: GoToAnotherServer Error handshaking to
> [sol199.lns.corn
> ell.edu:1094]
>
> ----- DIRECTOR XROOTD SVR LOG FILE ----- (no messages in the OLBD
> director logfile)
> 041217 14:31:14 016 XrdLink: gregor.9186:17@lnx243 disconnected after
> 0:00:01 (l
> ink read error)
> 041217 14:31:14 007 XrootdXeq: User logged in as gregor.26612:21@lnx6211
> 041217 14:31:14 015 XrootdXeq: User logged in as gregor.9191:17@lnx243
> 041217 14:31:21 016 XrdLink: gregor.26612:21@lnx6211 disconnected after
> 0:00:07
> (link read error)
> 041217 14:31:21 007 XrootdXeq: User logged in as gregor.26617:20@lnx6211
> 041217 14:31:41 016 XrdLink: gregor.8740:18@sol184 disconnected after
> 0:04:32 (l
> ink read error)
> 041217 14:31:42 017 XrootdXeq: User logged in as gregor.8748:21@sol184
> 041217 14:31:56 016 XrdLink: gregor.3065:19@sol199 disconnected after
> 0:03:33 (l
> ink read error)
> 041217 14:33:21 016 XrootdXeq: User logged in as gregor.26617:18@lnx6211
> 041217 14:33:22 018 XrootdXeq: User logged in as gregor.9191:19@lnx243
> 041217 14:33:42 019 XrootdXeq: User logged in as gregor.8748:22@sol184
> 041217 14:35:21 020 XrootdXeq: User logged in as gregor.26617:23@lnx6211
> 041217 14:35:22 021 XrootdXeq: User logged in as gregor.9191:24@lnx243
> 041217 14:35:42 022 XrootdXeq: User logged in as gregor.8748:25@sol184
> 041217 14:37:21 023 XrootdXeq: User logged in as gregor.26617:26@lnx6211
> 041217 14:37:22 024 XrootdXeq: User logged in as gregor.9191:27@lnx243
> 041217 14:37:42 025 XrootdXeq: User logged in as gregor.8748:28@sol184
> 041217 14:39:21 026 XrootdXeq: User logged in as gregor.26617:29@lnx6211
> 041217 14:39:22 027 XrootdXeq: User logged in as gregor.9191:30@lnx243
> 041217 14:39:42 028 XrootdXeq: User logged in as gregor.8748:31@sol184
> 041217 14:41:11 029 XrootdXeq: User logged in as gregor.3086:32@sol199
> 041217 14:41:21 030 XrootdXeq: User logged in as gregor.26617:33@lnx6211
> 041217 14:41:22 031 XrootdXeq: User logged in as gregor.9191:34@lnx243
> 041217 14:41:44 032 XrootdXeq: User logged in as gregor.8748:35@sol184
> 041217 14:43:11 033 XrootdXeq: User logged in as gregor.3086:36@sol199
> 041217 14:43:21 034 XrootdXeq: User logged in as gregor.26617:37@lnx6211
> 041217 14:43:22 035 XrootdXeq: User logged in as gregor.9191:38@lnx243
> 041217 14:43:44 036 XrootdXeq: User logged in as gregor.8748:39@sol184
> 041217 14:45:11 037 XrootdXeq: User logged in as gregor.3086:40@sol199
> 041217 14:45:21 038 XrootdXeq: User logged in as gregor.26617:41@lnx6211
> 041217 14:45:22 038 XrootdXeq: User logged in as gregor.9191:42@lnx243
> 041217 14:45:44 038 XrootdXeq: User logged in as gregor.8748:43@sol184
> 041217 14:47:11 038 XrootdXeq: User logged in as gregor.3086:44@sol199
> 041217 14:47:21 038 XrootdXeq: User logged in as gregor.26617:45@lnx6211
> 041217 14:47:22 038 XrootdXeq: User logged in as gregor.9191:46@lnx243
> 041217 14:47:44 038 XrootdXeq: User logged in as gregor.8748:47@sol184
> 041217 14:49:11 038 XrdScheduler: Thread limit has been reached!
> 041217 14:49:11 038 XrootdXeq: User logged in as gregor.3086:48@sol199
>
>
> --
> Gregory J. Sharp email: [log in to unmask]
> Wilson Synchrotron Laboratory url:
> http://www.lepp.cornell.edu/~gregor
> Dryden Rd ph: +1 607 255 4882
> Ithaca, NY 14853 fax: +1 607 255 8062
>
>
|