Hi Andrew,

I read the document and wrote a config file (http://wisconsin.cern.ch/~wguan/xrdcluster.cfg). With my config I can see the manager dispatching messages to the supervisor, but I cannot see any data server trying to connect to the supervisor. At the same time, in the manager's log, I can see some data servers being dropped. How does xrootd decide which data servers will connect to a supervisor? Should I explicitly point some data servers at the supervisor?

(*) supervisor log

091211 15:07:00 30028 Dispatch manager.0:20@atlas-bkp2 for state dlen=42
091211 15:07:00 30028 manager.0:20@atlas-bkp2 do_State: /atlas/xrootd/users/wguan/test/test131141
091211 15:07:00 30028 manager.0:20@atlas-bkp2 do_StateFWD: Path find failed for state /atlas/xrootd/users/wguan/test/test131141

(*) manager log

091211 04:13:24 15661 Admit c185.chtc.wisc.edu TSpace=5587GB NumFS=1 FSpace=5693644MB MinFR=57218MB Util=0
091211 04:13:24 15661 Admit c185.chtc.wisc.edu adding path: w /atlas
091211 04:13:24 15661 server.10585:[log in to unmask]:1094 do_Space: 5696231MB free; 0% util
091211 04:13:24 15661 Protocol: server.10585:[log in to unmask]:1094 logged in.
091211 04:13:24 001 XrdInet: Accepted connection from [log in to unmask]
091211 04:13:24 15661 XrdSched: running ?:[log in to unmask] inq=0
091211 04:13:24 15661 XrdProtocol: matched protocol cmsd
091211 04:13:24 15661 ?:[log in to unmask] XrdPoll: FD 79 attached to poller 2; num=22
091211 04:13:24 15661 Add server.21739:[log in to unmask] bumps server.15905:[log in to unmask]:1094 #63
091211 04:13:24 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:24 15661 Drop_Node: server.15905:[log in to unmask]:1094 dropped.
091211 04:13:24 15661 Add Shoved server.21739:[log in to unmask]:1094 to cluster; id=63.78; num=64; min=51
091211 04:13:24 15661 Update Counts Parm1=1 Parm2=0
091211 04:13:24 15661 Admit c187.chtc.wisc.edu TSpace=5587GB NumFS=1 FSpace=5721854MB MinFR=57218MB Util=0
091211 04:13:24 15661 Admit c187.chtc.wisc.edu adding path: w /atlas
091211 04:13:24 15661 server.21739:[log in to unmask]:1094 do_Space: 5721854MB free; 0% util
091211 04:13:24 15661 Protocol: server.21739:[log in to unmask]:1094 logged in.
091211 04:13:24 15661 XrdLink: Unable to recieve from c187.chtc.wisc.edu; connection reset by peer
091211 04:13:24 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:24 15661 XrdSched: scheduling drop node in 60 seconds
091211 04:13:24 15661 Remove_Node server.21739:[log in to unmask]:1094 node 63.78
091211 04:13:24 15661 Protocol: server.21739:[log in to unmask] logged out.
091211 04:13:24 15661 server.21739:[log in to unmask] XrdPoll: FD 79 detached from poller 2; num=21
091211 04:13:27 15661 Dispatch server.24718:[log in to unmask]:1094 for status dlen=0
091211 04:13:27 15661 server.24718:[log in to unmask]:1094 do_Status: suspend
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Node: c177.chtc.wisc.edu service suspended
091211 04:13:27 15661 XrdLink: No RecvAll() data from c177.chtc.wisc.edu FD=16
091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
091211 04:13:27 15661 Remove_Node server.24718:[log in to unmask]:1094 node 0.3
091211 04:13:27 15661 Protocol: server.21656:[log in to unmask] logged out.
091211 04:13:27 15661 server.21656:[log in to unmask] XrdPoll: FD 16 detached from poller 2; num=20
091211 04:13:27 15661 XrdLink: No RecvAll() data from c179.chtc.wisc.edu FD=21
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Remove_Node server.17065:[log in to unmask]:1094 node 1.4
091211 04:13:27 15661 Protocol: server.7978:[log in to unmask] logged out.
091211 04:13:27 15661 server.7978:[log in to unmask] XrdPoll: FD 21 detached from poller 1; num=21
091211 04:13:27 15661 State: Status changed to suspended
091211 04:13:27 15661 Send status to redirector.15656:14@atlas-bkp2
091211 04:13:27 15661 Dispatch server.12937:[log in to unmask]:1094 for status dlen=0
091211 04:13:27 15661 server.12937:[log in to unmask]:1094 do_Status: suspend
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Node: c182.chtc.wisc.edu service suspended
091211 04:13:27 15661 XrdLink: No RecvAll() data from c182.chtc.wisc.edu FD=19
091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
091211 04:13:27 15661 Remove_Node server.12937:[log in to unmask]:1094 node 7.10
091211 04:13:27 15661 Protocol: server.26620:[log in to unmask] logged out.
091211 04:13:27 15661 server.26620:[log in to unmask] XrdPoll: FD 19 detached from poller 2; num=19
091211 04:13:27 15661 Dispatch server.10842:[log in to unmask]:1094 for status dlen=0
091211 04:13:27 15661 server.10842:[log in to unmask]:1094 do_Status: suspend
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Node: c178.chtc.wisc.edu service suspended
091211 04:13:27 15661 XrdLink: No RecvAll() data from c178.chtc.wisc.edu FD=15
091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
091211 04:13:27 15661 Remove_Node server.10842:[log in to unmask]:1094 node 9.12
091211 04:13:27 15661 Protocol: server.11901:[log in to unmask] logged out.
091211 04:13:27 15661 server.11901:[log in to unmask] XrdPoll: FD 15 detached from poller 1; num=20
091211 04:13:27 15661 Dispatch server.5535:[log in to unmask]:1094 for status dlen=0
091211 04:13:27 15661 server.5535:[log in to unmask]:1094 do_Status: suspend
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Node: c181.chtc.wisc.edu service suspended
091211 04:13:27 15661 XrdLink: No RecvAll() data from c181.chtc.wisc.edu FD=17
091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
091211 04:13:27 15661 Remove_Node server.5535:[log in to unmask]:1094 node 5.8
091211 04:13:27 15661 Protocol: server.13984:[log in to unmask] logged out.
091211 04:13:27 15661 server.13984:[log in to unmask] XrdPoll: FD 17 detached from poller 0; num=21
091211 04:13:27 15661 Dispatch server.23711:[log in to unmask]:1094 for status dlen=0
091211 04:13:27 15661 server.23711:[log in to unmask]:1094 do_Status: suspend
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Node: c183.chtc.wisc.edu service suspended
091211 04:13:27 15661 XrdLink: No RecvAll() data from c183.chtc.wisc.edu FD=22
091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
091211 04:13:27 15661 Remove_Node server.23711:[log in to unmask]:1094 node 8.11
091211 04:13:27 15661 Protocol: server.27735:[log in to unmask] logged out.
091211 04:13:27 15661 server.27735:[log in to unmask] XrdPoll: FD 22 detached from poller 2; num=18
091211 04:13:27 15661 XrdLink: No RecvAll() data from c184.chtc.wisc.edu FD=20
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Remove_Node server.4131:[log in to unmask]:1094 node 3.6
091211 04:13:27 15661 Protocol: server.26787:[log in to unmask] logged out.
091211 04:13:27 15661 server.26787:[log in to unmask] XrdPoll: FD 20 detached from poller 0; num=20
091211 04:13:27 15661 Dispatch server.10585:[log in to unmask]:1094 for status dlen=0
091211 04:13:27 15661 server.10585:[log in to unmask]:1094 do_Status: suspend
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Node: c185.chtc.wisc.edu service suspended
091211 04:13:27 15661 XrdLink: No RecvAll() data from c185.chtc.wisc.edu FD=23
091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
091211 04:13:27 15661 Remove_Node server.10585:[log in to unmask]:1094 node 6.9
091211 04:13:27 15661 Protocol: server.8524:[log in to unmask] logged out.
091211 04:13:27 15661 server.8524:[log in to unmask] XrdPoll: FD 23 detached from poller 0; num=19
091211 04:13:27 15661 Dispatch server.20264:[log in to unmask]:1094 for status dlen=0
091211 04:13:27 15661 server.20264:[log in to unmask]:1094 do_Status: suspend
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Node: c180.chtc.wisc.edu service suspended
091211 04:13:27 15661 XrdLink: No RecvAll() data from c180.chtc.wisc.edu FD=18
091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
091211 04:13:27 15661 Remove_Node server.20264:[log in to unmask]:1094 node 4.7
091211 04:13:27 15661 Protocol: server.14636:[log in to unmask] logged out.
091211 04:13:27 15661 server.14636:[log in to unmask] XrdPoll: FD 18 detached from poller 1; num=19
091211 04:13:27 15661 Dispatch server.1656:[log in to unmask]:1094 for status dlen=0
091211 04:13:27 15661 server.1656:[log in to unmask]:1094 do_Status: suspend
091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
091211 04:13:27 15661 Node: c186.chtc.wisc.edu service suspended
091211 04:13:27 15661 XrdLink: No RecvAll() data from c186.chtc.wisc.edu FD=24
091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
091211 04:13:27 15661 Remove_Node server.1656:[log in to unmask]:1094 node 2.5
091211 04:13:27 15661 Protocol: server.7849:[log in to unmask] logged out.
091211 04:13:27 15661 server.7849:[log in to unmask] XrdPoll: FD 24 detached from poller 1; num=18
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: running drop node inq=0
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:14 15661 XrdSched: scheduling drop node in 13 seconds
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.66 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.68 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.69 cancelled.
091211 04:14:24 15661 Drop_Node 63.67 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.70 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.71 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.72 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.73 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.74 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.75 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=1
091211 04:14:24 15661 Drop_Node 63.76 cancelled.
091211 04:14:24 15661 XrdSched: Now have 68 workers
091211 04:14:24 15661 XrdSched: running drop node inq=0
091211 04:14:24 15661 Drop_Node 63.77 cancelled.
091211 04:14:24 15661 XrdSched: running drop node inq=0

Wen

On Fri, Dec 11, 2009 at 9:50 PM, Andrew Hanushevsky <[log in to unmask]> wrote:
> Hi Wen,
>
> To go past 64 data servers you will need to set up one or more supervisors.
> This does not logically change the current configuration you have. You only
> need to configure one or more *new* servers (or at least xrootd processes)
> whose role is supervisor. We'd like them to run on separate machines for
> reliability purposes, but they could run on the manager node as long as you
> give each one a unique instance name (i.e., the -n option).
>
> The front part of the cmsd reference explains how to do this.
>
> http://xrootd.slac.stanford.edu/doc/prod/cms_config.htm
>
> Andy
>
> On Fri, 11 Dec 2009, wen guan wrote:
>
>> Hi Andrew,
>>
>> Is there any chance to configure xrootd with more than 65
>> machines? I used the configuration below but it doesn't work.
>> Should I configure some machines' managers to be supervisors?
>>
>> http://wisconsin.cern.ch/~wguan/xrdcluster.cfg
>>
>>
>> Wen
>>
>
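
For concreteness, a rough sketch of the supervisor setup Andy describes, assuming one config file shared by every node. This is only a sketch: super1.chtc.wisc.edu is a placeholder hostname, the full domain for the manager atlas-bkp2 (taken from the logs above) is a guess, and the directives should be checked against the cms_config reference linked above:

    # Everyone defaults to a data server; the named hosts override their role.
    all.role server
    all.role manager    if atlas-bkp2.chtc.wisc.edu
    all.role supervisor if super1.chtc.wisc.edu

    # Supervisors and data servers alike locate the cluster through the
    # manager's cmsd (3121 is the default manager port).
    all.manager atlas-bkp2.chtc.wisc.edu:3121

    all.export /atlas

If a supervisor instead runs on the manager node, it would need the unique instance name Andy mentions, e.g. (paths are placeholders):

    # -n gives the extra instance its own name, so its logs and admin
    # paths don't collide with the manager instance on the same host
    xrootd -n super1 -c /etc/xrootd/xrdcluster.cfg &
    cmsd   -n super1 -c /etc/xrootd/xrdcluster.cfg &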