Hi Andrew, It still doesn't work. The log file is in higgs03.cs.wisc.edu/wguan/. The name is *.20091216. The manager complains there are too many subscribers and then removes nodes. (*) Add server.10040:[log in to unmask] redirected; too many subscribers. Wen On Wed, Dec 16, 2009 at 4:25 AM, Andrew Hanushevsky <[log in to unmask]> wrote: > Hi Wen, > > It will be easier for me to retrofit as the changes were pretty minor. Please > lift the new XrdCmsNode.cc file from > > http://www.slac.stanford.edu/~abh/cmsd > > Andy > > ----- Original Message ----- From: "wen guan" <[log in to unmask]> > To: "Andrew Hanushevsky" <[log in to unmask]> > Cc: <[log in to unmask]> > Sent: Tuesday, December 15, 2009 5:12 PM > Subject: Re: xrootd with more than 65 machines > > > Hi Andy, > > I can switch to 20091104-1102. Then you don't need to patch > another version. How can I download v20091104-1102? > > Thanks > Wen > > On Wed, Dec 16, 2009 at 12:52 AM, Andrew Hanushevsky <[log in to unmask]> > wrote: >> >> Hi Wen, >> >> Ah yes, I see that now. The file I gave you is based on v20091104-1102. >> Let >> me see if I can retrofit the patch for you. >> >> Andy >> >> ----- Original Message ----- From: "wen guan" <[log in to unmask]> >> To: "Andrew Hanushevsky" <[log in to unmask]> >> Cc: <[log in to unmask]> >> Sent: Tuesday, December 15, 2009 1:04 PM >> Subject: Re: xrootd with more than 65 machines >> >> >> Hi Andy, >> >> Which xrootd version are you using? XrdCmsConfig.hh is different. >> XrdCmsConfig.hh is downloaded from >> http://xrootd.slac.stanford.edu/download/20091028-1003/. >> >> [root@c121 xrootd]# md5sum src/XrdCms/XrdCmsNode.cc >> 6fb3ae40fe4e10bdd4d372818a341f2c src/XrdCms/XrdCmsNode.cc >> [root@c121 xrootd]# md5sum src/XrdCms/XrdCmsConfig.hh >> 7d57753847d9448186c718f98e963cbe src/XrdCms/XrdCmsConfig.hh >> >> Thanks >> Wen >> >> On Tue, Dec 15, 2009 at 10:50 PM, Andrew Hanushevsky <[log in to unmask]> >> wrote: >>> >>> Hi Wen, >>> >>> Just compiled on Linux and it was clean. Something is really wrong with >>> your >>> source files, specifically XrdCmsConfig.cc >>> >>> The MD5 checksums on the relevant files are: >>> >>> MD5 (XrdCmsNode.cc) = 6fb3ae40fe4e10bdd4d372818a341f2c >>> >>> MD5 (XrdCmsConfig.hh) = 4a7d655582a7cd43b098947d0676924b >>> >>> Andy >>> >>> ----- Original Message ----- From: "wen guan" <[log in to unmask]> >>> To: "Andrew Hanushevsky" <[log in to unmask]> >>> Cc: <[log in to unmask]> >>> Sent: Tuesday, December 15, 2009 4:24 AM >>> Subject: Re: xrootd with more than 65 machines >>> >>> >>> Hi Andy, >>> >>> No problem. Thanks for the fix. But it cannot be compiled. The >>> version I am using is >>> http://xrootd.slac.stanford.edu/download/20091028-1003/. >>> >>> Making cms component... 
>>> Compiling XrdCmsNode.cc >>> XrdCmsNode.cc: In member function `const char* >>> XrdCmsNode::do_Chmod(XrdCmsRRData&)': >>> XrdCmsNode.cc:268: error: `fsExec' was not declared in this scope >>> XrdCmsNode.cc:268: warning: unused variable 'fsExec' >>> XrdCmsNode.cc:269: error: 'class XrdCmsConfig' has no member named >>> 'ossFS' >>> XrdCmsNode.cc:273: error: `fsFail' was not declared in this scope >>> XrdCmsNode.cc:273: warning: unused variable 'fsFail' >>> XrdCmsNode.cc: In member function `const char* >>> XrdCmsNode::do_Mkdir(XrdCmsRRData&)': >>> XrdCmsNode.cc:600: error: `fsExec' was not declared in this scope >>> XrdCmsNode.cc:600: warning: unused variable 'fsExec' >>> XrdCmsNode.cc:601: error: 'class XrdCmsConfig' has no member named >>> 'ossFS' >>> XrdCmsNode.cc:605: error: `fsFail' was not declared in this scope >>> XrdCmsNode.cc:605: warning: unused variable 'fsFail' >>> XrdCmsNode.cc: In member function `const char* >>> XrdCmsNode::do_Mkpath(XrdCmsRRData&)': >>> XrdCmsNode.cc:640: error: `fsExec' was not declared in this scope >>> XrdCmsNode.cc:640: warning: unused variable 'fsExec' >>> XrdCmsNode.cc:641: error: 'class XrdCmsConfig' has no member named >>> 'ossFS' >>> XrdCmsNode.cc:645: error: `fsFail' was not declared in this scope >>> XrdCmsNode.cc:645: warning: unused variable 'fsFail' >>> XrdCmsNode.cc: In member function `const char* >>> XrdCmsNode::do_Mv(XrdCmsRRData&)': >>> XrdCmsNode.cc:704: error: `fsExec' was not declared in this scope >>> XrdCmsNode.cc:704: warning: unused variable 'fsExec' >>> XrdCmsNode.cc:705: error: 'class XrdCmsConfig' has no member named >>> 'ossFS' >>> XrdCmsNode.cc:709: error: `fsFail' was not declared in this scope >>> XrdCmsNode.cc:709: warning: unused variable 'fsFail' >>> XrdCmsNode.cc: In member function `const char* >>> XrdCmsNode::do_Rm(XrdCmsRRData&)': >>> XrdCmsNode.cc:831: error: `fsExec' was not declared in this scope >>> XrdCmsNode.cc:831: warning: unused variable 'fsExec' >>> XrdCmsNode.cc:832: error: 'class XrdCmsConfig' has no member named >>> 'ossFS' >>> XrdCmsNode.cc:836: error: `fsFail' was not declared in this scope >>> XrdCmsNode.cc:836: warning: unused variable 'fsFail' >>> XrdCmsNode.cc: In member function `const char* >>> XrdCmsNode::do_Rmdir(XrdCmsRRData&)': >>> XrdCmsNode.cc:873: error: `fsExec' was not declared in this scope >>> XrdCmsNode.cc:873: warning: unused variable 'fsExec' >>> XrdCmsNode.cc:874: error: 'class XrdCmsConfig' has no member named >>> 'ossFS' >>> XrdCmsNode.cc:878: error: `fsFail' was not declared in this scope >>> XrdCmsNode.cc:878: warning: unused variable 'fsFail' >>> XrdCmsNode.cc: In member function `const char* >>> XrdCmsNode::do_Trunc(XrdCmsRRData&)': >>> XrdCmsNode.cc:1377: error: `fsExec' was not declared in this scope >>> XrdCmsNode.cc:1377: warning: unused variable 'fsExec' >>> XrdCmsNode.cc:1378: error: 'class XrdCmsConfig' has no member named >>> 'ossFS' >>> XrdCmsNode.cc:1382: error: `fsFail' was not declared in this scope >>> XrdCmsNode.cc:1382: warning: unused variable 'fsFail' >>> XrdCmsNode.cc: At global scope: >>> XrdCmsNode.cc:1524: error: no `int XrdCmsNode::fsExec(XrdOucProg*, >>> char*, char*)' member function declared in class `XrdCmsNode' >>> XrdCmsNode.cc: In member function `int XrdCmsNode::fsExec(XrdOucProg*, >>> char*, char*)': >>> XrdCmsNode.cc:1533: error: `fsL2PFail1' was not declared in this scope >>> XrdCmsNode.cc:1533: warning: unused variable 'fsL2PFail1' >>> XrdCmsNode.cc:1537: error: `fsL2PFail2' was not declared in this scope >>> XrdCmsNode.cc:1537: warning: 
unused variable 'fsL2PFail2' >>> XrdCmsNode.cc: At global scope: >>> XrdCmsNode.cc:1553: error: no `const char* XrdCmsNode::fsFail(const >>> char*, const char*, const char*, int)' member function declared in >>> class `XrdCmsNode' >>> XrdCmsNode.cc: In member function `const char* >>> XrdCmsNode::fsFail(const char*, const char*, const char*, int)': >>> XrdCmsNode.cc:1559: error: `fsL2PFail1' was not declared in this scope >>> XrdCmsNode.cc:1559: warning: unused variable 'fsL2PFail1' >>> XrdCmsNode.cc:1560: error: `fsL2PFail2' was not declared in this scope >>> XrdCmsNode.cc:1560: warning: unused variable 'fsL2PFail2' >>> XrdCmsNode.cc: In static member function `static int >>> XrdCmsNode::isOnline(char*, int)': >>> XrdCmsNode.cc:1608: error: 'class XrdCmsConfig' has no member named >>> 'ossFS' >>> make[4]: *** [../../obj/XrdCmsNode.o] Error 1 >>> make[3]: *** [Linuxall] Error 2 >>> make[2]: *** [all] Error 2 >>> make[1]: *** [XrdCms] Error 2 >>> make: *** [all] Error 2 >>> >>> >>> Wen >>> >>> On Tue, Dec 15, 2009 at 2:08 AM, Andrew Hanushevsky <[log in to unmask]> >>> wrote: >>>> >>>> Hi Wen, >>>> >>>> I have developed a permanent fix. You will find the source files in >>>> >>>> http://www.slac.stanford.edu/~abh/cmsd/ >>>> >>>> There are three files: XrdCmsCluster.cc XrdCmsNode.cc XrdCmsProtocol.cc >>>> >>>> Please do a source replacement and recompile. Unfortunately, the cmsd >>>> will >>>> need to be replaced on each node regardless of role. My apologies for >>>> the >>>> disruption. Please let me know how it goes. >>>> >>>> Andy >>>> >>>> ----- Original Message ----- From: "wen guan" <[log in to unmask]> >>>> To: "Andrew Hanushevsky" <[log in to unmask]> >>>> Cc: <[log in to unmask]> >>>> Sent: Sunday, December 13, 2009 7:04 AM >>>> Subject: Re: xrootd with more than 65 machines >>>> >>>> >>>> Hi Andrew, >>>> >>>> >>>> Thanks. >>>> I used the new cmsd at atlas-bkp1 manager. But it's still dropping >>>> nodes. And in supervisor's log, I cannot find any dataserver to >>>> register to it. >>>> >>>> The new logs are in http://higgs03.cs.wisc.edu/wguan/*.20091213. >>>> The manager is patched at 091213 08:38:15. >>>> >>>> Wen >>>> >>>> On Sun, Dec 13, 2009 at 1:52 AM, Andrew Hanushevsky >>>> <[log in to unmask]> wrote: >>>>> >>>>> Hi Wen >>>>> >>>>> You will find the source replacement at: >>>>> >>>>> http://www.slac.stanford.edu/~abh/cmsd/ >>>>> >>>>> It's XrdCmsCluster.cc and it replaces >>>>> xrootd/src/XrdCms/XrdCmsCluster.cc >>>>> >>>>> I'm stepping out for a couple of hours but will be back to see how >>>>> things >>>>> went. Sorry for the issues :-( >>>>> >>>>> Andy >>>>> >>>>> On Sun, 13 Dec 2009, wen guan wrote: >>>>> >>>>>> Hi Andrew, >>>>>> >>>>>> I prefer a source replacement. Then I can compile it. >>>>>> >>>>>> Thanks >>>>>> Wen >>>>>>> >>>>>>> I can do one of two things here: >>>>>>> >>>>>>> 1) Supply a source replacement and then you would recompile, or >>>>>>> >>>>>>> 2) Give me the uname -a of where the cmsd will run and I'll supply a >>>>>>> binary >>>>>>> replacement for you. >>>>>>> >>>>>>> Your choice. >>>>>>> >>>>>>> Andy >>>>>>> >>>>>>> On Sun, 13 Dec 2009, wen guan wrote: >>>>>>> >>>>>>>> Hi Andrew >>>>>>>> >>>>>>>> The problem is found. Great. Thanks. >>>>>>>> >>>>>>>> Where can I find the patched cmsd? >>>>>>>> >>>>>>>> Wen >>>>>>>> >>>>>>>> On Sat, Dec 12, 2009 at 11:36 PM, Andrew Hanushevsky >>>>>>>> <[log in to unmask]> wrote: >>>>>>>>> >>>>>>>>> Hi Wen, >>>>>>>>> >>>>>>>>> I found the problem. Looks like a regression from way back when. 
>>>>>>>>> There >>>>>>>>> is >>>>>>>>> a >>>>>>>>> missing flag on the redirect. This will require a patched cmsd but >>>>>>>>> you >>>>>>>>> need >>>>>>>>> only to replace the redirector's cmsd as this only affects the >>>>>>>>> redirector. >>>>>>>>> How would you like to proceed? >>>>>>>>> >>>>>>>>> Andy >>>>>>>>> >>>>>>>>> On Sat, 12 Dec 2009, wen guan wrote: >>>>>>>>> >>>>>>>>>> Hi Andrew, >>>>>>>>>> >>>>>>>>>> It doesn't work. atlas-bkp1 manager still dropping nodes again. >>>>>>>>>> In supervisor, I still haven't seen any dataserver registered. I >>>>>>>>>> said >>>>>>>>>> "I updated the ntp" because you said "the log timestamp do not >>>>>>>>>> overlap". >>>>>>>>>> >>>>>>>>>> Wen >>>>>>>>>> >>>>>>>>>> On Sat, Dec 12, 2009 at 9:33 PM, Andrew Hanushevsky >>>>>>>>>> <[log in to unmask]> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Wen, >>>>>>>>>>> >>>>>>>>>>> Do you mean that everything is now working? It could be that you >>>>>>>>>>> removed >>>>>>>>>>> the >>>>>>>>>>> xrd.timeout directive. That really could cause problems. As for >>>>>>>>>>> the >>>>>>>>>>> delays, >>>>>>>>>>> that is normal when the redirector thinks something is going >>>>>>>>>>> wrong. >>>>>>>>>>> The >>>>>>>>>>> strategy is to delay clients until it can get back to a stable >>>>>>>>>>> configuration. This usually prevents jobs from crashing during >>>>>>>>>>> stressful >>>>>>>>>>> periods. >>>>>>>>>>> >>>>>>>>>>> Andy >>>>>>>>>>> >>>>>>>>>>> On Sat, 12 Dec 2009, wen guan wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>> >>>>>>>>>>>> I restarted it to do supervisor test. Also because xrootd >>>>>>>>>>>> manager >>>>>>>>>>>> frequently doesn't response. (*) is the cms.log, the file select >>>>>>>>>>>> is >>>>>>>>>>>> delayed again and again. When do a restart, all things are fine. >>>>>>>>>>>> Now >>>>>>>>>>>> I >>>>>>>>>>>> am trying to find a clue about it. >>>>>>>>>>>> >>>>>>>>>>>> (*) >>>>>>>>>>>> 091212 00:00:19 21318 slot3.14949:[log in to unmask] >>>>>>>>>>>> do_Select: >>>>>>>>>>>> wc >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2 >>>>>>>>>>>> 091212 00:00:19 21318 Select seeking >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2 >>>>>>>>>>>> 091212 00:00:19 21318 UnkFile rc=1 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> path=/atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2 >>>>>>>>>>>> 091212 00:00:19 21318 slot3.14949:[log in to unmask] >>>>>>>>>>>> do_Select: >>>>>>>>>>>> delay 5 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2 >>>>>>>>>>>> 091212 00:00:19 21318 XrdLink: Setting link ref to 2+-1 post=0 >>>>>>>>>>>> 091212 00:00:19 21318 Dispatch redirector.21313:14@atlas-bkp2 >>>>>>>>>>>> for >>>>>>>>>>>> select dlen=166 >>>>>>>>>>>> 091212 00:00:19 21318 XrdLink: Setting link ref to 1+1 post=0 >>>>>>>>>>>> 091212 00:00:19 21318 XrdSched: running redirector inq=0 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> There is no core file. 
I copied a new copies of the logs to the >>>>>>>>>>>> link >>>>>>>>>>>> below. >>>>>>>>>>>> http://higgs03.cs.wisc.edu/wguan/ >>>>>>>>>>>> >>>>>>>>>>>> Wen >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Dec 12, 2009 at 3:16 AM, Andrew Hanushevsky >>>>>>>>>>>> <[log in to unmask]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Wen, >>>>>>>>>>>>> >>>>>>>>>>>>> I see in the server log that it is restarting often. Could you >>>>>>>>>>>>> take >>>>>>>>>>>>> a >>>>>>>>>>>>> look >>>>>>>>>>>>> in the c193 to see if you have any core files? Also please make >>>>>>>>>>>>> sure >>>>>>>>>>>>> that >>>>>>>>>>>>> core files are enabled as Linux defaults the size to 0. The >>>>>>>>>>>>> first >>>>>>>>>>>>> step >>>>>>>>>>>>> here >>>>>>>>>>>>> is to find out why your servers are restarting. >>>>>>>>>>>>> >>>>>>>>>>>>> Andy >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, 12 Dec 2009, wen guan wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>>>> >>>>>>>>>>>>>> the logs can be found here. From the log you can see >>>>>>>>>>>>>> atlas-bkp1 >>>>>>>>>>>>>> manager are dropping nodes again and again which tries to >>>>>>>>>>>>>> connect >>>>>>>>>>>>>> to >>>>>>>>>>>>>> it. >>>>>>>>>>>>>> http://higgs03.cs.wisc.edu/wguan/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Dec 11, 2009 at 11:41 PM, Andrew Hanushevsky >>>>>>>>>>>>>> <[log in to unmask]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Wen, Could you start everything up and provide me a >>>>>>>>>>>>>>> pointer >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> manager log file, supervisor log file, and one data server >>>>>>>>>>>>>>> logfile >>>>>>>>>>>>>>> all >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> which cover the same time-frame (from start to some point >>>>>>>>>>>>>>> where >>>>>>>>>>>>>>> you >>>>>>>>>>>>>>> think >>>>>>>>>>>>>>> things are working or not). That way I can see what is >>>>>>>>>>>>>>> happening. >>>>>>>>>>>>>>> At >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> moment I only see two "bad" things in the config file: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1) Only atlas-bkp1.cs.wisc.edu is designated as a manager but >>>>>>>>>>>>>>> you >>>>>>>>>>>>>>> claim, >>>>>>>>>>>>>>> via >>>>>>>>>>>>>>> the all.manager directive, that there are three (bkp2 and >>>>>>>>>>>>>>> bkp3). >>>>>>>>>>>>>>> While >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> should work, the log file will be dense with error messages. >>>>>>>>>>>>>>> Please >>>>>>>>>>>>>>> correct >>>>>>>>>>>>>>> this to be consistent and make it easier to see real errors. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is not a problem for me. Because this config is used in >>>>>>>>>>>>>> dataserver. In manager, I updated the if >>>>>>>>>>>>>> atlas-bkp1.cs.wisc.edu >>>>>>>>>>>>>> to >>>>>>>>>>>>>> atlas-bkp2 or something. This is a history problem. at first >>>>>>>>>>>>>> only >>>>>>>>>>>>>> atlas-bkp1 is used. atlas-bkp2 and atlas-bkp3 are added later. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2) Please use cms.space not olb.space (for historical reasons >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> latter >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> still accepted and over-rides the former, but that will soon >>>>>>>>>>>>>>> end), >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> please use only one (the config file uses both directives). >>>>>>>>>>>>>> >>>>>>>>>>>>>> yes. I should remove this line. in fact cms.space is in the >>>>>>>>>>>>>> cfg >>>>>>>>>>>>>> too. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>> Wen >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The xrootd has an internal mechanism to connect servers with >>>>>>>>>>>>>>> supervisors >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> allow for maximum reliability. You cannot change that >>>>>>>>>>>>>>> algorithm >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> there >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> no need to do so. You should *never* tell anyone to directly >>>>>>>>>>>>>>> connect >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> a >>>>>>>>>>>>>>> supervisor. If you do, you will likely get unreachable nodes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> As for dropping data servers, it would appear to me, given >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> flurry >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> such activity, that something either crashed or was >>>>>>>>>>>>>>> restarted. >>>>>>>>>>>>>>> That's >>>>>>>>>>>>>>> why >>>>>>>>>>>>>>> it >>>>>>>>>>>>>>> would be good to see the complete log of each one of the >>>>>>>>>>>>>>> entities. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Andy >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, 11 Dec 2009, wen guan wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I read the document. and write a config >>>>>>>>>>>>>>>> file(http://wisconsin.cern.ch/~wguan/xrdcluster.cfg). >>>>>>>>>>>>>>>> I used my conf, I can see manager is dispatch message to >>>>>>>>>>>>>>>> supervisor. But I cannot see any dataserver tries to connect >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> supervisor. At the same time, in the manager's log, I can >>>>>>>>>>>>>>>> see >>>>>>>>>>>>>>>> some >>>>>>>>>>>>>>>> dataserver are Dropped. >>>>>>>>>>>>>>>> How does xrootd decide which dataserver will connect >>>>>>>>>>>>>>>> supervisor? >>>>>>>>>>>>>>>> should I specify some dataservers to connect the supervisor? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (*) supervisor log >>>>>>>>>>>>>>>> 091211 15:07:00 30028 Dispatch manager.0:20@atlas-bkp2 for >>>>>>>>>>>>>>>> state >>>>>>>>>>>>>>>> dlen=42 >>>>>>>>>>>>>>>> 091211 15:07:00 30028 manager.0:20@atlas-bkp2 do_State: >>>>>>>>>>>>>>>> /atlas/xrootd/users/wguan/test/test131141 >>>>>>>>>>>>>>>> 091211 15:07:00 30028 manager.0:20@atlas-bkp2 do_StateFWD: >>>>>>>>>>>>>>>> Path >>>>>>>>>>>>>>>> find >>>>>>>>>>>>>>>> failed for state /atlas/xrootd/users/wguan/test/test131141 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (*)manager log >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c185.chtc.wisc.edu TSpace=5587GB >>>>>>>>>>>>>>>> NumFS=1 >>>>>>>>>>>>>>>> FSpace=5693644MB MinFR=57218MB Util=0 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c185.chtc.wisc.edu adding path: >>>>>>>>>>>>>>>> w >>>>>>>>>>>>>>>> /atlas >>>>>>>>>>>>>>>> 091211 04:13:24 15661 >>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Space: 5696231MB free; 0% util >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol: >>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 logged in. 
>>>>>>>>>>>>>>>> 091211 04:13:24 001 XrdInet: Accepted connection from >>>>>>>>>>>>>>>> [log in to unmask] >>>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdSched: running >>>>>>>>>>>>>>>> ?:[log in to unmask] >>>>>>>>>>>>>>>> inq=0 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdProtocol: matched protocol cmsd >>>>>>>>>>>>>>>> 091211 04:13:24 15661 ?:[log in to unmask] XrdPoll: FD 79 >>>>>>>>>>>>>>>> attached >>>>>>>>>>>>>>>> to poller 2; num=22 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Add server.21739:[log in to unmask] >>>>>>>>>>>>>>>> bumps >>>>>>>>>>>>>>>> server.15905:[log in to unmask]:1094 #63 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Drop_Node: >>>>>>>>>>>>>>>> server.15905:[log in to unmask]:1094 dropped. >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Add Shoved >>>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 to cluster; >>>>>>>>>>>>>>>> id=63.78; >>>>>>>>>>>>>>>> num=64; >>>>>>>>>>>>>>>> min=51 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c187.chtc.wisc.edu TSpace=5587GB >>>>>>>>>>>>>>>> NumFS=1 >>>>>>>>>>>>>>>> FSpace=5721854MB MinFR=57218MB Util=0 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c187.chtc.wisc.edu adding path: >>>>>>>>>>>>>>>> w >>>>>>>>>>>>>>>> /atlas >>>>>>>>>>>>>>>> 091211 04:13:24 15661 >>>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Space: 5721854MB free; 0% util >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol: >>>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 logged in. >>>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdLink: Unable to recieve from >>>>>>>>>>>>>>>> c187.chtc.wisc.edu; connection reset by peer >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdSched: scheduling drop node in 60 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Remove_Node >>>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 node 63.78 >>>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol: >>>>>>>>>>>>>>>> server.21739:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>>> 091211 04:13:24 15661 server.21739:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 79 detached from poller 2; num=21 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094 >>>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c177.chtc.wisc.edu service >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c177.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=16 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094 node 0.3 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.21656:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. 
>>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.21656:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 16 detached from poller 2; num=20 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c179.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=21 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.17065:[log in to unmask]:1094 node 1.4 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.7978:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.7978:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 21 >>>>>>>>>>>>>>>> detached from poller 1; num=21 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 State: Status changed to suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Send status to >>>>>>>>>>>>>>>> redirector.15656:14@atlas-bkp2 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094 >>>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c182.chtc.wisc.edu service >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c182.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=19 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094 node 7.10 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.26620:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.26620:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 19 detached from poller 2; num=19 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094 >>>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c178.chtc.wisc.edu service >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c178.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=15 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094 node 9.12 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.11901:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. 
>>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.11901:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 15 detached from poller 1; num=20 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>>> server.5535:[log in to unmask]:1094 >>>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.5535:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c181.chtc.wisc.edu service >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c181.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=17 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.5535:[log in to unmask]:1094 node 5.8 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.13984:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.13984:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 17 detached from poller 0; num=21 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094 >>>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c183.chtc.wisc.edu service >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c183.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=22 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094 node 8.11 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.27735:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.27735:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 22 detached from poller 2; num=18 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c184.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=20 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.4131:[log in to unmask]:1094 node 3.6 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.26787:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. 
>>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.26787:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 20 detached from poller 0; num=20 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 >>>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c185.chtc.wisc.edu service >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c185.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=23 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 node 6.9 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.8524:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.8524:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 23 >>>>>>>>>>>>>>>> detached from poller 0; num=19 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094 >>>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c180.chtc.wisc.edu service >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c180.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=18 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094 node 4.7 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.14636:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.14636:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 18 detached from poller 1; num=19 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>>> server.1656:[log in to unmask]:1094 >>>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.1656:[log in to unmask]:1094 >>>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c186.chtc.wisc.edu service >>>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>>> c186.chtc.wisc.edu >>>>>>>>>>>>>>>> FD=24 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>>> server.1656:[log in to unmask]:1094 node 2.5 >>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>>> server.7849:[log in to unmask] >>>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>>> out. 
>>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.7849:[log in to unmask] >>>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>>> 24 >>>>>>>>>>>>>>>> detached from poller 1; num=18 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.66 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.68 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.69 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.67 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.70 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.71 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.72 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.73 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.74 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.75 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.76 cancelled. 
>>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: Now have 68 workers >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.77 cancelled. >>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Wen >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Dec 11, 2009 at 9:50 PM, Andrew Hanushevsky >>>>>>>>>>>>>>>> <[log in to unmask]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Wen, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> To go past 64 data servers you will need to setup one or >>>>>>>>>>>>>>>>> more >>>>>>>>>>>>>>>>> supervisors. >>>>>>>>>>>>>>>>> This does not logically change the current configuration >>>>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>>> have. >>>>>>>>>>>>>>>>> You >>>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>> need to configure one or more *new* servers (or at least >>>>>>>>>>>>>>>>> xrootd >>>>>>>>>>>>>>>>> processes) >>>>>>>>>>>>>>>>> whose role is supervisor. We'd like them to run in separate >>>>>>>>>>>>>>>>> machines >>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> reliability purposes, but they could run on the manager >>>>>>>>>>>>>>>>> node >>>>>>>>>>>>>>>>> as >>>>>>>>>>>>>>>>> long >>>>>>>>>>>>>>>>> as >>>>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>>> give each one a unique instance name (i.e., -n option). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The front part of the cmsd reference explains how to do >>>>>>>>>>>>>>>>> this. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> http://xrootd.slac.stanford.edu/doc/prod/cms_config.htm >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Andy >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, 11 Dec 2009, wen guan wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Is there any way to configure xrootd with more than 65 >>>>>>>>>>>>>>>>>> machines? I used the configuration below but it doesn't work. >>>>>>>>>>>>>>>>>> Should >>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>> configure some machines' manager to be a supervisor? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> http://wisconsin.cern.ch/~wguan/xrdcluster.cfg >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Wen
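For anyone else hitting the same 64-server ceiling, the supervisor arrangement Andy describes above comes down to a few extra lines in the shared cmsd/xrootd configuration. The sketch below is only an illustration: the supervisor host name (super1.cs.wisc.edu), the manager port, and the cms.space values are placeholders, not settings taken from the actual xrdcluster.cfg.

    # Every node lists the same manager host(s); keep all.manager consistent
    # with the hosts that really run a manager cmsd.
    all.manager atlas-bkp1.cs.wisc.edu:3121

    # Choose the role per host; everything else stays common to all nodes.
    if atlas-bkp1.cs.wisc.edu
       all.role manager
    else if super1.cs.wisc.edu
       all.role supervisor
    else
       all.role server
    fi

    # Use cms.space rather than the deprecated olb.space, and only once;
    # the values shown are placeholders.
    cms.space min 2% 10g

A supervisor's cmsd and xrootd are started the same way as a data server's; if a supervisor shares a machine with a manager, give it a unique instance name, for example cmsd -n super1 -c xrdcluster.cfg. Data servers are paired with supervisors by xrootd's internal mechanism, so nothing is ever pointed at a supervisor by hand.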
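On the core-file question: Linux typically defaults the core size limit to 0, so a crashing daemon leaves no core behind. A generic way to check and lift the limit in the shell that launches the daemons (site init scripts may need the equivalent setting) is:

    # Show the current core size limit; 0 means no core file is ever written.
    ulimit -c
    # Remove the limit, then start xrootd and cmsd from this same shell so
    # the daemons inherit it.
    ulimit -c unlimited
    # After a crash, look for "core" or "core.<pid>" in the daemon's working
    # directory.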
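The source replacement itself is mechanical; roughly the following, assuming the patched files can be fetched one at a time from Andy's cmsd directory and the source tree has already been configured and built once (which files to take beyond XrdCmsNode.cc depends on which of Andy's notes applies):

    cd xrootd/src/XrdCms
    # Fetch the patched source posted at http://www.slac.stanford.edu/~abh/cmsd/
    wget http://www.slac.stanford.edu/~abh/cmsd/XrdCmsNode.cc
    # For the earlier, fuller patch also XrdCmsCluster.cc and XrdCmsProtocol.cc.
    # Optionally compare against the checksums quoted earlier in the thread.
    md5sum XrdCmsNode.cc
    # Rebuild from the top of the tree and put the new cmsd on every node,
    # regardless of role.
    cd ../..
    make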