Hi Wen, It will be easier for me to retrofit as the changes were pretty minor. Please lift the new XrdCmsNode.cc file from http://www.slac.stanford.edu/~abh/cmsd Andy ----- Original Message ----- From: "wen guan" <[log in to unmask]> To: "Andrew Hanushevsky" <[log in to unmask]> Cc: <[log in to unmask]> Sent: Tuesday, December 15, 2009 5:12 PM Subject: Re: xrootd with more than 65 machines Hi Andy, I can switch to 20091104-1102, so you don't need to patch another version. How can I download v20091104-1102? Thanks Wen On Wed, Dec 16, 2009 at 12:52 AM, Andrew Hanushevsky <[log in to unmask]> wrote: > Hi Wen, > > Ah yes, I see that now. The file I gave you is based on v20091104-1102. > Let me see if I can retrofit the patch for you. > > Andy > > ----- Original Message ----- From: "wen guan" <[log in to unmask]> > To: "Andrew Hanushevsky" <[log in to unmask]> > Cc: <[log in to unmask]> > Sent: Tuesday, December 15, 2009 1:04 PM > Subject: Re: xrootd with more than 65 machines > > > Hi Andy, > > Which xrootd version are you using? My XrdCmsConfig.hh is different; > it was downloaded from > http://xrootd.slac.stanford.edu/download/20091028-1003/. > > [root@c121 xrootd]# md5sum src/XrdCms/XrdCmsNode.cc > 6fb3ae40fe4e10bdd4d372818a341f2c src/XrdCms/XrdCmsNode.cc > [root@c121 xrootd]# md5sum src/XrdCms/XrdCmsConfig.hh > 7d57753847d9448186c718f98e963cbe src/XrdCms/XrdCmsConfig.hh > > Thanks > Wen > > On Tue, Dec 15, 2009 at 10:50 PM, Andrew Hanushevsky <[log in to unmask]> > wrote: >> >> Hi Wen, >> >> Just compiled on Linux and it was clean. Something is really wrong with >> your source files, specifically XrdCmsConfig.cc >> >> The MD5 checksums on the relevant files are: >> >> MD5 (XrdCmsNode.cc) = 6fb3ae40fe4e10bdd4d372818a341f2c >> >> MD5 (XrdCmsConfig.hh) = 4a7d655582a7cd43b098947d0676924b >> >> Andy >> >> ----- Original Message ----- From: "wen guan" <[log in to unmask]> >> To: "Andrew Hanushevsky" <[log in to unmask]> >> Cc: <[log in to unmask]> >> Sent: Tuesday, December 15, 2009 4:24 AM >> Subject: Re: xrootd with more than 65 machines >> >> >> Hi Andy, >> >> No problem. Thanks for the fix. But it does not compile. The >> version I am using is >> http://xrootd.slac.stanford.edu/download/20091028-1003/. >> >> Making cms component... 
>> Compiling XrdCmsNode.cc >> XrdCmsNode.cc: In member function `const char* >> XrdCmsNode::do_Chmod(XrdCmsRRData&)': >> XrdCmsNode.cc:268: error: `fsExec' was not declared in this scope >> XrdCmsNode.cc:268: warning: unused variable 'fsExec' >> XrdCmsNode.cc:269: error: 'class XrdCmsConfig' has no member named >> 'ossFS' >> XrdCmsNode.cc:273: error: `fsFail' was not declared in this scope >> XrdCmsNode.cc:273: warning: unused variable 'fsFail' >> XrdCmsNode.cc: In member function `const char* >> XrdCmsNode::do_Mkdir(XrdCmsRRData&)': >> XrdCmsNode.cc:600: error: `fsExec' was not declared in this scope >> XrdCmsNode.cc:600: warning: unused variable 'fsExec' >> XrdCmsNode.cc:601: error: 'class XrdCmsConfig' has no member named >> 'ossFS' >> XrdCmsNode.cc:605: error: `fsFail' was not declared in this scope >> XrdCmsNode.cc:605: warning: unused variable 'fsFail' >> XrdCmsNode.cc: In member function `const char* >> XrdCmsNode::do_Mkpath(XrdCmsRRData&)': >> XrdCmsNode.cc:640: error: `fsExec' was not declared in this scope >> XrdCmsNode.cc:640: warning: unused variable 'fsExec' >> XrdCmsNode.cc:641: error: 'class XrdCmsConfig' has no member named >> 'ossFS' >> XrdCmsNode.cc:645: error: `fsFail' was not declared in this scope >> XrdCmsNode.cc:645: warning: unused variable 'fsFail' >> XrdCmsNode.cc: In member function `const char* >> XrdCmsNode::do_Mv(XrdCmsRRData&)': >> XrdCmsNode.cc:704: error: `fsExec' was not declared in this scope >> XrdCmsNode.cc:704: warning: unused variable 'fsExec' >> XrdCmsNode.cc:705: error: 'class XrdCmsConfig' has no member named >> 'ossFS' >> XrdCmsNode.cc:709: error: `fsFail' was not declared in this scope >> XrdCmsNode.cc:709: warning: unused variable 'fsFail' >> XrdCmsNode.cc: In member function `const char* >> XrdCmsNode::do_Rm(XrdCmsRRData&)': >> XrdCmsNode.cc:831: error: `fsExec' was not declared in this scope >> XrdCmsNode.cc:831: warning: unused variable 'fsExec' >> XrdCmsNode.cc:832: error: 'class XrdCmsConfig' has no member named >> 'ossFS' >> XrdCmsNode.cc:836: error: `fsFail' was not declared in this scope >> XrdCmsNode.cc:836: warning: unused variable 'fsFail' >> XrdCmsNode.cc: In member function `const char* >> XrdCmsNode::do_Rmdir(XrdCmsRRData&)': >> XrdCmsNode.cc:873: error: `fsExec' was not declared in this scope >> XrdCmsNode.cc:873: warning: unused variable 'fsExec' >> XrdCmsNode.cc:874: error: 'class XrdCmsConfig' has no member named >> 'ossFS' >> XrdCmsNode.cc:878: error: `fsFail' was not declared in this scope >> XrdCmsNode.cc:878: warning: unused variable 'fsFail' >> XrdCmsNode.cc: In member function `const char* >> XrdCmsNode::do_Trunc(XrdCmsRRData&)': >> XrdCmsNode.cc:1377: error: `fsExec' was not declared in this scope >> XrdCmsNode.cc:1377: warning: unused variable 'fsExec' >> XrdCmsNode.cc:1378: error: 'class XrdCmsConfig' has no member named >> 'ossFS' >> XrdCmsNode.cc:1382: error: `fsFail' was not declared in this scope >> XrdCmsNode.cc:1382: warning: unused variable 'fsFail' >> XrdCmsNode.cc: At global scope: >> XrdCmsNode.cc:1524: error: no `int XrdCmsNode::fsExec(XrdOucProg*, >> char*, char*)' member function declared in class `XrdCmsNode' >> XrdCmsNode.cc: In member function `int XrdCmsNode::fsExec(XrdOucProg*, >> char*, char*)': >> XrdCmsNode.cc:1533: error: `fsL2PFail1' was not declared in this scope >> XrdCmsNode.cc:1533: warning: unused variable 'fsL2PFail1' >> XrdCmsNode.cc:1537: error: `fsL2PFail2' was not declared in this scope >> XrdCmsNode.cc:1537: warning: unused variable 'fsL2PFail2' >> XrdCmsNode.cc: At global scope: >> 
XrdCmsNode.cc:1553: error: no `const char* XrdCmsNode::fsFail(const >> char*, const char*, const char*, int)' member function declared in >> class `XrdCmsNode' >> XrdCmsNode.cc: In member function `const char* >> XrdCmsNode::fsFail(const char*, const char*, const char*, int)': >> XrdCmsNode.cc:1559: error: `fsL2PFail1' was not declared in this scope >> XrdCmsNode.cc:1559: warning: unused variable 'fsL2PFail1' >> XrdCmsNode.cc:1560: error: `fsL2PFail2' was not declared in this scope >> XrdCmsNode.cc:1560: warning: unused variable 'fsL2PFail2' >> XrdCmsNode.cc: In static member function `static int >> XrdCmsNode::isOnline(char*, int)': >> XrdCmsNode.cc:1608: error: 'class XrdCmsConfig' has no member named >> 'ossFS' >> make[4]: *** [../../obj/XrdCmsNode.o] Error 1 >> make[3]: *** [Linuxall] Error 2 >> make[2]: *** [all] Error 2 >> make[1]: *** [XrdCms] Error 2 >> make: *** [all] Error 2 >> >> >> Wen >> >> On Tue, Dec 15, 2009 at 2:08 AM, Andrew Hanushevsky <[log in to unmask]> >> wrote: >>> >>> Hi Wen, >>> >>> I have developed a permanent fix. You will find the source files in >>> >>> http://www.slac.stanford.edu/~abh/cmsd/ >>> >>> There are three files: XrdCmsCluster.cc XrdCmsNode.cc XrdCmsProtocol.cc >>> >>> Please do a source replacement and recompile. Unfortunately, the cmsd >>> will >>> need to be replaced on each node regardless of role. My apologies for >>> the >>> disruption. Please let me know how it goes. >>> >>> Andy >>> >>> ----- Original Message ----- From: "wen guan" <[log in to unmask]> >>> To: "Andrew Hanushevsky" <[log in to unmask]> >>> Cc: <[log in to unmask]> >>> Sent: Sunday, December 13, 2009 7:04 AM >>> Subject: Re: xrootd with more than 65 machines >>> >>> >>> Hi Andrew, >>> >>> >>> Thanks. >>> I used the new cmsd at atlas-bkp1 manager. But it's still dropping >>> nodes. And in supervisor's log, I cannot find any dataserver to >>> register to it. >>> >>> The new logs are in http://higgs03.cs.wisc.edu/wguan/*.20091213. >>> The manager is patched at 091213 08:38:15. >>> >>> Wen >>> >>> On Sun, Dec 13, 2009 at 1:52 AM, Andrew Hanushevsky >>> <[log in to unmask]> wrote: >>>> >>>> Hi Wen >>>> >>>> You will find the source replacement at: >>>> >>>> http://www.slac.stanford.edu/~abh/cmsd/ >>>> >>>> It's XrdCmsCluster.cc and it replaces >>>> xrootd/src/XrdCms/XrdCmsCluster.cc >>>> >>>> I'm stepping out for a couple of hours but will be back to see how >>>> things >>>> went. Sorry for the issues :-( >>>> >>>> Andy >>>> >>>> On Sun, 13 Dec 2009, wen guan wrote: >>>> >>>>> Hi Andrew, >>>>> >>>>> I prefer a source replacement. Then I can compile it. >>>>> >>>>> Thanks >>>>> Wen >>>>>> >>>>>> I can do one of two things here: >>>>>> >>>>>> 1) Supply a source replacement and then you would recompile, or >>>>>> >>>>>> 2) Give me the uname -a of where the cmsd will run and I'll supply a >>>>>> binary >>>>>> replacement for you. >>>>>> >>>>>> Your choice. >>>>>> >>>>>> Andy >>>>>> >>>>>> On Sun, 13 Dec 2009, wen guan wrote: >>>>>> >>>>>>> Hi Andrew >>>>>>> >>>>>>> The problem is found. Great. Thanks. >>>>>>> >>>>>>> Where can I find the patched cmsd? >>>>>>> >>>>>>> Wen >>>>>>> >>>>>>> On Sat, Dec 12, 2009 at 11:36 PM, Andrew Hanushevsky >>>>>>> <[log in to unmask]> wrote: >>>>>>>> >>>>>>>> Hi Wen, >>>>>>>> >>>>>>>> I found the problem. Looks like a regression from way back when. >>>>>>>> There >>>>>>>> is >>>>>>>> a >>>>>>>> missing flag on the redirect. 
This will require a patched cmsd, but >>>>>>>> you only need to replace the redirector's cmsd, as this only affects the >>>>>>>> redirector. >>>>>>>> How would you like to proceed? >>>>>>>> >>>>>>>> Andy >>>>>>>> >>>>>>>> On Sat, 12 Dec 2009, wen guan wrote: >>>>>>>> >>>>>>>>> Hi Andrew, >>>>>>>>> >>>>>>>>> It doesn't work. The atlas-bkp1 manager is still dropping nodes. >>>>>>>>> In the supervisor, I still haven't seen any dataserver registered. I said >>>>>>>>> "I updated the ntp" because you said "the log timestamps do not overlap". >>>>>>>>> >>>>>>>>> Wen >>>>>>>>> >>>>>>>>> On Sat, Dec 12, 2009 at 9:33 PM, Andrew Hanushevsky >>>>>>>>> <[log in to unmask]> wrote: >>>>>>>>>> >>>>>>>>>> Hi Wen, >>>>>>>>>> >>>>>>>>>> Do you mean that everything is now working? It could be that you removed >>>>>>>>>> the xrd.timeout directive. That really could cause problems. As for the delays, >>>>>>>>>> that is normal when the redirector thinks something is going wrong. The >>>>>>>>>> strategy is to delay clients until it can get back to a stable >>>>>>>>>> configuration. This usually prevents jobs from crashing during stressful >>>>>>>>>> periods. >>>>>>>>>> >>>>>>>>>> Andy >>>>>>>>>> >>>>>>>>>> On Sat, 12 Dec 2009, wen guan wrote: >>>>>>>>>> >>>>>>>>>>> Hi Andrew, >>>>>>>>>>> >>>>>>>>>>> I restarted it to do the supervisor test, and also because the xrootd >>>>>>>>>>> manager frequently doesn't respond. (*) is from the cms.log; the file select >>>>>>>>>>> is delayed again and again. After a restart, everything is fine. Now >>>>>>>>>>> I am trying to find a clue about it. >>>>>>>>>>> >>>>>>>>>>> (*) >>>>>>>>>>> 091212 00:00:19 21318 slot3.14949:[log in to unmask] do_Select: wc /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2 >>>>>>>>>>> 091212 00:00:19 21318 Select seeking /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2 >>>>>>>>>>> 091212 00:00:19 21318 UnkFile rc=1 path=/atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2 >>>>>>>>>>> 091212 00:00:19 21318 slot3.14949:[log in to unmask] do_Select: delay 5 /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2 >>>>>>>>>>> 091212 00:00:19 21318 XrdLink: Setting link ref to 2+-1 post=0 >>>>>>>>>>> 091212 00:00:19 21318 Dispatch redirector.21313:14@atlas-bkp2 for select dlen=166 >>>>>>>>>>> 091212 00:00:19 21318 XrdLink: Setting link ref to 1+1 post=0 >>>>>>>>>>> 091212 00:00:19 21318 XrdSched: running redirector inq=0 >>>>>>>>>>> >>>>>>>>>>> There is no core file. I copied new copies of the logs to the link below. 
>>>>>>>>>>> http://higgs03.cs.wisc.edu/wguan/ >>>>>>>>>>> >>>>>>>>>>> Wen >>>>>>>>>>> >>>>>>>>>>> On Sat, Dec 12, 2009 at 3:16 AM, Andrew Hanushevsky >>>>>>>>>>> <[log in to unmask]> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Wen, >>>>>>>>>>>> >>>>>>>>>>>> I see in the server log that it is restarting often. Could you take a look >>>>>>>>>>>> on c193 to see if you have any core files? Also please make sure that >>>>>>>>>>>> core files are enabled, as Linux defaults the size to 0. The first step here >>>>>>>>>>>> is to find out why your servers are restarting. >>>>>>>>>>>> >>>>>>>>>>>> Andy >>>>>>>>>>>> >>>>>>>>>>>> On Sat, 12 Dec 2009, wen guan wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>>> >>>>>>>>>>>>> The logs can be found here. From the log you can see the atlas-bkp1 >>>>>>>>>>>>> manager is dropping, again and again, the nodes that try to connect to it. >>>>>>>>>>>>> http://higgs03.cs.wisc.edu/wguan/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Dec 11, 2009 at 11:41 PM, Andrew Hanushevsky >>>>>>>>>>>>> <[log in to unmask]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Wen, Could you start everything up and provide me a pointer to the >>>>>>>>>>>>>> manager log file, supervisor log file, and one data server logfile, all of >>>>>>>>>>>>>> which cover the same time-frame (from start to some point where you think >>>>>>>>>>>>>> things are working or not). That way I can see what is happening. At the >>>>>>>>>>>>>> moment I only see two "bad" things in the config file: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) Only atlas-bkp1.cs.wisc.edu is designated as a manager but you claim, via >>>>>>>>>>>>>> the all.manager directive, that there are three (bkp2 and bkp3). While it >>>>>>>>>>>>>> should work, the log file will be dense with error messages. Please correct >>>>>>>>>>>>>> this to be consistent and make it easier to see real errors. >>>>>>>>>>>>> >>>>>>>>>>>>> This is not a problem for me, because this config is used on the >>>>>>>>>>>>> dataservers. On the managers, I change the "if atlas-bkp1.cs.wisc.edu" to >>>>>>>>>>>>> atlas-bkp2 and so on. This is a historical leftover: at first only >>>>>>>>>>>>> atlas-bkp1 was used; atlas-bkp2 and atlas-bkp3 were added later. >>>>>>>>>>>>> >>>>>>>>>>>>>> 2) Please use cms.space not olb.space (for historical reasons the latter is >>>>>>>>>>>>>> still accepted and overrides the former, but that will soon end), and >>>>>>>>>>>>>> please use only one (the config file uses both directives). >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, I should remove this line. In fact cms.space is in the cfg too. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> Wen >>>>>>>>>>>>> >>>>>>>>>>>>>> The xrootd has an internal mechanism to connect servers with supervisors to >>>>>>>>>>>>>> allow for maximum reliability. You cannot change that algorithm and there is >>>>>>>>>>>>>> no need to do so. 
You should *never* tell anyone to directly >>>>>>>>>>>>>> connect >>>>>>>>>>>>>> to >>>>>>>>>>>>>> a >>>>>>>>>>>>>> supervisor. If you do, you will likely get unreachable nodes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As for dropping data servers, it would appear to me, given >>>>>>>>>>>>>> the >>>>>>>>>>>>>> flurry >>>>>>>>>>>>>> of >>>>>>>>>>>>>> such activity, that something either crashed or was >>>>>>>>>>>>>> restarted. >>>>>>>>>>>>>> That's >>>>>>>>>>>>>> why >>>>>>>>>>>>>> it >>>>>>>>>>>>>> would be good to see the complete log of each one of the >>>>>>>>>>>>>> entities. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Andy >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, 11 Dec 2009, wen guan wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I read the document. and write a config >>>>>>>>>>>>>>> file(http://wisconsin.cern.ch/~wguan/xrdcluster.cfg). >>>>>>>>>>>>>>> I used my conf, I can see manager is dispatch message to >>>>>>>>>>>>>>> supervisor. But I cannot see any dataserver tries to connect >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> supervisor. At the same time, in the manager's log, I can >>>>>>>>>>>>>>> see >>>>>>>>>>>>>>> some >>>>>>>>>>>>>>> dataserver are Dropped. >>>>>>>>>>>>>>> How does xrootd decide which dataserver will connect >>>>>>>>>>>>>>> supervisor? >>>>>>>>>>>>>>> should I specify some dataservers to connect the supervisor? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> (*) supervisor log >>>>>>>>>>>>>>> 091211 15:07:00 30028 Dispatch manager.0:20@atlas-bkp2 for >>>>>>>>>>>>>>> state >>>>>>>>>>>>>>> dlen=42 >>>>>>>>>>>>>>> 091211 15:07:00 30028 manager.0:20@atlas-bkp2 do_State: >>>>>>>>>>>>>>> /atlas/xrootd/users/wguan/test/test131141 >>>>>>>>>>>>>>> 091211 15:07:00 30028 manager.0:20@atlas-bkp2 do_StateFWD: >>>>>>>>>>>>>>> Path >>>>>>>>>>>>>>> find >>>>>>>>>>>>>>> failed for state /atlas/xrootd/users/wguan/test/test131141 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> (*)manager log >>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c185.chtc.wisc.edu TSpace=5587GB >>>>>>>>>>>>>>> NumFS=1 >>>>>>>>>>>>>>> FSpace=5693644MB MinFR=57218MB Util=0 >>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c185.chtc.wisc.edu adding path: >>>>>>>>>>>>>>> w >>>>>>>>>>>>>>> /atlas >>>>>>>>>>>>>>> 091211 04:13:24 15661 >>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Space: 5696231MB free; 0% util >>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol: >>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 logged in. >>>>>>>>>>>>>>> 091211 04:13:24 001 XrdInet: Accepted connection from >>>>>>>>>>>>>>> [log in to unmask] >>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdSched: running >>>>>>>>>>>>>>> ?:[log in to unmask] >>>>>>>>>>>>>>> inq=0 >>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdProtocol: matched protocol cmsd >>>>>>>>>>>>>>> 091211 04:13:24 15661 ?:[log in to unmask] XrdPoll: FD 79 >>>>>>>>>>>>>>> attached >>>>>>>>>>>>>>> to poller 2; num=22 >>>>>>>>>>>>>>> 091211 04:13:24 15661 Add server.21739:[log in to unmask] >>>>>>>>>>>>>>> bumps >>>>>>>>>>>>>>> server.15905:[log in to unmask]:1094 #63 >>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:24 15661 Drop_Node: >>>>>>>>>>>>>>> server.15905:[log in to unmask]:1094 dropped. 
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Add Shoved >>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 to cluster; >>>>>>>>>>>>>>> id=63.78; >>>>>>>>>>>>>>> num=64; >>>>>>>>>>>>>>> min=51 >>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c187.chtc.wisc.edu TSpace=5587GB >>>>>>>>>>>>>>> NumFS=1 >>>>>>>>>>>>>>> FSpace=5721854MB MinFR=57218MB Util=0 >>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c187.chtc.wisc.edu adding path: >>>>>>>>>>>>>>> w >>>>>>>>>>>>>>> /atlas >>>>>>>>>>>>>>> 091211 04:13:24 15661 >>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Space: 5721854MB free; 0% util >>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol: >>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 logged in. >>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdLink: Unable to recieve from >>>>>>>>>>>>>>> c187.chtc.wisc.edu; connection reset by peer >>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdSched: scheduling drop node in 60 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:13:24 15661 Remove_Node >>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 node 63.78 >>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol: >>>>>>>>>>>>>>> server.21739:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>> 091211 04:13:24 15661 server.21739:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 79 detached from poller 2; num=21 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094 >>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c177.chtc.wisc.edu service >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c177.chtc.wisc.edu >>>>>>>>>>>>>>> FD=16 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094 node 0.3 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.21656:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>> 091211 04:13:27 15661 server.21656:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 16 detached from poller 2; num=20 >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c179.chtc.wisc.edu >>>>>>>>>>>>>>> FD=21 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.17065:[log in to unmask]:1094 node 1.4 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.7978:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. 
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.7978:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 21 >>>>>>>>>>>>>>> detached from poller 1; num=21 >>>>>>>>>>>>>>> 091211 04:13:27 15661 State: Status changed to suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 Send status to >>>>>>>>>>>>>>> redirector.15656:14@atlas-bkp2 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094 >>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c182.chtc.wisc.edu service >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c182.chtc.wisc.edu >>>>>>>>>>>>>>> FD=19 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094 node 7.10 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.26620:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>> 091211 04:13:27 15661 server.26620:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 19 detached from poller 2; num=19 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094 >>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c178.chtc.wisc.edu service >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c178.chtc.wisc.edu >>>>>>>>>>>>>>> FD=15 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094 node 9.12 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.11901:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>> 091211 04:13:27 15661 server.11901:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 15 detached from poller 1; num=20 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>> server.5535:[log in to unmask]:1094 >>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 server.5535:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c181.chtc.wisc.edu service >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c181.chtc.wisc.edu >>>>>>>>>>>>>>> FD=17 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.5535:[log in to unmask]:1094 node 5.8 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.13984:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. 
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.13984:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 17 detached from poller 0; num=21 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094 >>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c183.chtc.wisc.edu service >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c183.chtc.wisc.edu >>>>>>>>>>>>>>> FD=22 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094 node 8.11 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.27735:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>> 091211 04:13:27 15661 server.27735:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 22 detached from poller 2; num=18 >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c184.chtc.wisc.edu >>>>>>>>>>>>>>> FD=20 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.4131:[log in to unmask]:1094 node 3.6 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.26787:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>> 091211 04:13:27 15661 server.26787:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 20 detached from poller 0; num=20 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 >>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c185.chtc.wisc.edu service >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c185.chtc.wisc.edu >>>>>>>>>>>>>>> FD=23 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 node 6.9 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.8524:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. 
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.8524:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 23 >>>>>>>>>>>>>>> detached from poller 0; num=19 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094 >>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 >>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c180.chtc.wisc.edu service >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c180.chtc.wisc.edu >>>>>>>>>>>>>>> FD=18 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094 node 4.7 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.14636:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. >>>>>>>>>>>>>>> 091211 04:13:27 15661 server.14636:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 18 detached from poller 1; num=19 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch >>>>>>>>>>>>>>> server.1656:[log in to unmask]:1094 >>>>>>>>>>>>>>> for status dlen=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 server.1656:[log in to unmask]:1094 >>>>>>>>>>>>>>> do_Status: >>>>>>>>>>>>>>> suspend >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c186.chtc.wisc.edu service >>>>>>>>>>>>>>> suspended >>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from >>>>>>>>>>>>>>> c186.chtc.wisc.edu >>>>>>>>>>>>>>> FD=24 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node >>>>>>>>>>>>>>> server.1656:[log in to unmask]:1094 node 2.5 >>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol: >>>>>>>>>>>>>>> server.7849:[log in to unmask] >>>>>>>>>>>>>>> logged >>>>>>>>>>>>>>> out. 
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.7849:[log in to unmask] >>>>>>>>>>>>>>> XrdPoll: >>>>>>>>>>>>>>> FD >>>>>>>>>>>>>>> 24 >>>>>>>>>>>>>>> detached from poller 1; num=18 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13 >>>>>>>>>>>>>>> seconds >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.66 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.68 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.69 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.67 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.70 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.71 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.72 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.73 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.74 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.75 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.76 cancelled. 
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: Now have 68 workers >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.77 cancelled. >>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=0 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Wen >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Dec 11, 2009 at 9:50 PM, Andrew Hanushevsky >>>>>>>>>>>>>>> <[log in to unmask]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Wen, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> To go past 64 data servers you will need to set up one or more supervisors. >>>>>>>>>>>>>>>> This does not logically change the current configuration you have. You only >>>>>>>>>>>>>>>> need to configure one or more *new* servers (or at least xrootd processes) >>>>>>>>>>>>>>>> whose role is supervisor. We'd like them to run on separate machines for >>>>>>>>>>>>>>>> reliability purposes, but they could run on the manager node as long as you >>>>>>>>>>>>>>>> give each one a unique instance name (i.e., the -n option). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The front part of the cmsd reference explains how to do this. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> http://xrootd.slac.stanford.edu/doc/prod/cms_config.htm >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Andy >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, 11 Dec 2009, wen guan wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there any way to configure xrootd with more than 65 >>>>>>>>>>>>>>>>> machines? I used the configuration below but it doesn't work. Should I >>>>>>>>>>>>>>>>> configure some machines' manager to be a supervisor? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> http://wisconsin.cern.ch/~wguan/xrdcluster.cfg >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Wen >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>> >>> >>> >>> >> >> >> > > >
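For reference, a minimal sketch of the supervisor setup Andy describes above. The supervisor host name (super1.chtc.wisc.edu), the manager port number, and the instance name are invented for illustration; the authoritative directives and defaults are in the cms_config reference linked in Andy's message.

    # Sketch only -- host names, port, and export path are hypothetical examples.
    # The same file is shared by all nodes; the role is selected by host name.
    all.manager atlas-bkp1.cs.wisc.edu 1213
    all.export  /atlas

    if atlas-bkp1.cs.wisc.edu
       all.role manager
    else if super1.chtc.wisc.edu
       all.role supervisor
    else
       all.role server
    fi

    # If a supervisor shares a machine with the manager, give its cmsd/xrootd
    # pair a unique instance name, e.g.:  cmsd -n super -c xrdcluster.cfg

With the role chosen this way, the data servers need no extra configuration to find the supervisor; as Andy notes above, the cmsd decides internally which servers attach to it.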