Hi Gregory,

rereading this message, I realized that this scenario is very similar to the tricky one that Andy and I debugged and fixed together last week.

So, it would be useful for us to know:

- which client/server version you are using (or the date of the CVS head you checked out, if you normally build from the head)
- what the client side is doing (xrdcp or some other program?), in particular which flags/options you specified in the Open request.

In any case, from the logs you provided, I see no evidence that the client is creating many connections. I suggest you increase the debug level on both sides, just to make things (or bugs) clearer.

Fabrizio

Gregory J. Sharp wrote:
> I have stared at the code for nearly a day, and I can't figure this one
> out. (Maybe 4 hours sleep last night just wasn't enough?) The message is
> long, but hopefully the log extracts hold the clues to solve it.
>
> My setup is that sol199 is the director and lnx6211 is the dataserver.
>
> My xrootd data director on sol199 produces the following messages for
> every connection. It looks to my naive eye that the connections are not
> being closed cleanly, but perhaps "link read error" is just a poor
> choice of error message. It occurs in two places in the code, so it
> isn't clear which piece of code produces the error. Anyway, things
> pretty much work okay while this is going on...
>
> 041216 12:46:49 017 XrootdXeq: User logged in as gregor.31733:17@lnx7108
> 041216 12:47:28 020 XrdLink: gregor.31733:17@lnx7108 disconnected after
> 0:00:39
> (link read error)
> 041216 12:48:53 019 XrootdXeq: User logged in as gregor.31739:18@lnx7108
> 041216 12:48:54 019 XrdLink: gregor.31739:18@lnx7108 disconnected after
> 0:00:01
> (link read error)
>
> Then suddenly I get this in the xrootd data server log... lots of
> connections being made but never terminated.
>
> 041216 12:49:14 020 XrootdXeq: User logged in as gregor.31754:17@lnx7108
> 041216 12:51:22 016 XrootdXeq: User logged in as gregor.31754:18@lnx7108
> 041216 12:53:22 018 XrootdXeq: User logged in as gregor.31754:19@lnx7108
> 041216 12:53:51 017 XrootdXeq: User logged in as gregor.31764:20@lnx7108
> 041216 12:55:51 019 XrootdXeq: User logged in as gregor.31764:21@lnx7108
> 041216 12:55:54 021 XrootdXeq: User logged in as gregor.31769:22@lnx7108
> 041216 12:55:56 022 XrootdXeq: User logged in as gregor.31773:23@lnx7108
>
> Meanwhile, the client doing the connecting keeps printing
>
> 041216 12:51:22 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (server [sol199.lns.cornell.edu:1094]).
> 041216 12:53:22 001 Xrd: ReadPartialAnswer Error reading msg from
> connmgr (server [sol199.lns.cornell.edu:1094]).
>
> until I kill it.
>
> When I do an ls /proc/21843/fd on lnx6211 I see the following:
> total 0
> lr-x------ 1 gregor cleo 64 Dec 16 13:00 0 -> /dev/null
> l-wx------ 1 gregor cleo 64 Dec 16 13:00 1 ->
> /A/lns101/nfs/homes/cleo/gregor/xrootd.inst/logs/xrd.nohup.lnx6211
> lrwx------ 1 gregor cleo 64 Dec 16 13:00 10 ->
> socket:[10855876]
> lrwx------ 1 gregor cleo 64 Dec 16 13:00 11 ->
> socket:[10855893]
> l-wx------ 1 gregor cleo 64 Dec 16 13:00 2 ->
> /A/lns101/nfs/homes/cleo/gregor/xrootd.inst/logs/xrootd-lnx6211
> l-wx------ 1 gregor cleo 64 Dec 16 13:00 3 ->
> /A/lns101/nfs/homes/cleo/gregor/xrootd.inst/logs/xrd.nohup.lnx6211
> lr-x------ 1 gregor cleo 64 Dec 16 13:00 4 ->
> pipe:[10855851]
> l-wx------ 1 gregor cleo 64 Dec 16 13:00 5 ->
> pipe:[10855851]
> lr-x------ 1 gregor cleo 64 Dec 16 13:00 6 ->
> pipe:[10855852]
> l-wx------ 1 gregor cleo 64 Dec 16 13:00 7 ->
> pipe:[10855852]
> lr-x------ 1 gregor cleo 64 Dec 16 13:00 8 ->
> pipe:[10855853]
> l-wx------ 1 gregor cleo 64 Dec 16 13:00 9 ->
> pipe:[10855853]
>
> But ALL the socket and pipe lines are flashing red to indicate broken
> symlinks.
> It would seem to have lost contact with the director, since it
> received no new messages after the point where the client started to
> complain.
>
> On sol199 (solaris 8) there are 24 open files, but none of them
> particularly enlightening to me.
>
> total 32
> c--------- 1 root sys 13, 2 Dec 16 11:07 0
> --w------- 1 gregor cleo 0 Dec 16 11:32 1
> p--------- 0 gregor cleo 0 Dec 16 12:29 10
> c--------- 0 root root 138, 2 Dec 16 12:30 11
> p--------- 0 gregor cleo 0 Dec 16 12:29 12
> p--------- 0 gregor cleo 0 Dec 16 12:29 13
> c--------- 0 root root 41,997 Dec 16 11:32 14
> s--------- 0 root root 0 Dec 16 11:32 15
> s--------- 0 root root 0 Dec 16 12:49 16
> s--------- 0 root root 0 Dec 16 12:49 17
> s--------- 0 root root 0 Dec 16 12:51 18
> s--------- 0 root root 0 Dec 16 12:53 19
> --w------- 1 gregor cleo 16282 Dec 16 12:55 2
> s--------- 0 root root 0 Dec 16 12:53 20
> s--------- 0 root root 0 Dec 16 12:55 21
> s--------- 0 root root 0 Dec 16 12:55 22
> s--------- 0 root root 0 Dec 16 12:55 23
> s--------- 0 root root 0 Dec 16 12:55 24
> D--------- 1 root root 0 Jul 17 2002 3
> --w------- 1 gregor cleo 0 Dec 16 11:32 4
> c--------- 1 root sys 138, 0 Dec 16 12:49 5
> p--------- 0 gregor cleo 0 Dec 16 12:49 6
> p--------- 0 gregor cleo 0 Dec 16 12:49 7
> c--------- 0 root root 138, 1 Dec 16 12:30 8
> p--------- 0 gregor cleo 0 Dec 16 12:29 9
>
> --
> Gregory J. Sharp email: [log in to unmask]
> Wilson Synchrotron Laboratory url: http://www.lepp.cornell.edu/~gregor
> Dryden Rd ph: +1 607 255 4882
> Ithaca, NY 14853 fax: +1 607 255 8062
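
One way to check the "connections being made but never terminated" observation mechanically is to pair each login record in a server log with its disconnect record. What follows is only a minimal Python sketch and not part of xrootd: the two regular expressions are modelled solely on the log lines quoted above, the names open_links and SAMPLE are hypothetical, and it assumes each log record sits on a single line (the mail client wrapped some of them in the quote, so the disconnect line in SAMPLE has been rejoined by hand).

```python
import re

# Patterns modelled only on the log lines quoted in this message (assumption:
# the real log format has no other variants of these two records).
LOGIN_RE = re.compile(r"XrootdXeq: User logged in as (\S+)")
DISCONNECT_RE = re.compile(r"XrdLink: (\S+) disconnected after")


def open_links(lines):
    """Return link names that logged in but have no later disconnect record."""
    pending = []
    for line in lines:
        match = LOGIN_RE.search(line)
        if match:
            pending.append(match.group(1))
            continue
        match = DISCONNECT_RE.search(line)
        if match and match.group(1) in pending:
            pending.remove(match.group(1))
    return pending


# Illustrative input: lines copied from the two logs quoted above and put
# into one list, with the wrapped disconnect record rejoined on one line.
SAMPLE = [
    "041216 12:46:49 017 XrootdXeq: User logged in as gregor.31733:17@lnx7108",
    "041216 12:47:28 020 XrdLink: gregor.31733:17@lnx7108 disconnected after 0:00:39 (link read error)",
    "041216 12:49:14 020 XrootdXeq: User logged in as gregor.31754:17@lnx7108",
    "041216 12:51:22 016 XrootdXeq: User logged in as gregor.31754:18@lnx7108",
]

if __name__ == "__main__":
    for link in open_links(SAMPLE):
        print("no disconnect seen for", link)
```

Running the same function over the full director and data server logs separately would show on which side the links stay open, which may help when comparing against the higher debug level output.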