Hi Wen,

It will be easier for me to retrofit it, as the changes were pretty minor.
Please lift the new XrdCmsNode.cc file from

http://www.slac.stanford.edu/~abh/cmsd
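
For example, something along these lines should do it (the exact file URL and
the plain "make" are assumptions on my part; adjust them to your source tree
and your usual build procedure):

    cd xrootd                                            # your xrootd source tree
    wget http://www.slac.stanford.edu/~abh/cmsd/XrdCmsNode.cc \
         -O src/XrdCms/XrdCmsNode.cc                     # drop in the new file
    make                                                 # rebuild, then restart the cmsd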

Andy

----- Original Message ----- 
From: "wen guan" <[log in to unmask]>
To: "Andrew Hanushevsky" <[log in to unmask]>
Cc: <[log in to unmask]>
Sent: Tuesday, December 15, 2009 5:12 PM
Subject: Re: xrootd with more than 65 machines


Hi Andy,

      I can switch to 20091104-1102. Then you don't need to patch
another version. How can I download v20091104-1102?

Thanks
Wen

On Wed, Dec 16, 2009 at 12:52 AM, Andrew Hanushevsky <[log in to unmask]> 
wrote:
> Hi Wen,
>
> Ah yes, I see that now. The file I gave you is based on v20091104-1102. 
> Let
> me see if I can retrofit the patch for you.
>
> Andy
>
> ----- Original Message ----- From: "wen guan" <[log in to unmask]>
> To: "Andrew Hanushevsky" <[log in to unmask]>
> Cc: <[log in to unmask]>
> Sent: Tuesday, December 15, 2009 1:04 PM
> Subject: Re: xrootd with more than 65 machines
>
>
> Hi Andy,
>
> Which xrootd version are you using? My XrdCmsConfig.hh is different; it
> was downloaded from
> http://xrootd.slac.stanford.edu/download/20091028-1003/.
>
> [root@c121 xrootd]# md5sum src/XrdCms/XrdCmsNode.cc
> 6fb3ae40fe4e10bdd4d372818a341f2c src/XrdCms/XrdCmsNode.cc
> [root@c121 xrootd]# md5sum src/XrdCms/XrdCmsConfig.hh
> 7d57753847d9448186c718f98e963cbe src/XrdCms/XrdCmsConfig.hh
>
> Thanks
> Wen
>
> On Tue, Dec 15, 2009 at 10:50 PM, Andrew Hanushevsky <[log in to unmask]>
> wrote:
>>
>> Hi Wen,
>>
>> Just compiled on Linux and it was clean. Something is really wrong with
>> your
>> source files, specifically XrdCmsConfig.cc
>>
>> The MD5 checksums on the relevant files are:
>>
>> MD5 (XrdCmsNode.cc) = 6fb3ae40fe4e10bdd4d372818a341f2c
>>
>> MD5 (XrdCmsConfig.hh) = 4a7d655582a7cd43b098947d0676924b
>>
>> Andy
>>
>> ----- Original Message ----- From: "wen guan" <[log in to unmask]>
>> To: "Andrew Hanushevsky" <[log in to unmask]>
>> Cc: <[log in to unmask]>
>> Sent: Tuesday, December 15, 2009 4:24 AM
>> Subject: Re: xrootd with more than 65 machines
>>
>>
>> Hi Andy,
>>
>> No problem, and thanks for the fix. But it does not compile. The
>> version I am using is
>> http://xrootd.slac.stanford.edu/download/20091028-1003/.
>>
>> Making cms component...
>> Compiling XrdCmsNode.cc
>> XrdCmsNode.cc: In member function `const char* XrdCmsNode::do_Chmod(XrdCmsRRData&)':
>> XrdCmsNode.cc:268: error: `fsExec' was not declared in this scope
>> XrdCmsNode.cc:268: warning: unused variable 'fsExec'
>> XrdCmsNode.cc:269: error: 'class XrdCmsConfig' has no member named 'ossFS'
>> XrdCmsNode.cc:273: error: `fsFail' was not declared in this scope
>> XrdCmsNode.cc:273: warning: unused variable 'fsFail'
>> XrdCmsNode.cc: In member function `const char* XrdCmsNode::do_Mkdir(XrdCmsRRData&)':
>> XrdCmsNode.cc:600: error: `fsExec' was not declared in this scope
>> XrdCmsNode.cc:600: warning: unused variable 'fsExec'
>> XrdCmsNode.cc:601: error: 'class XrdCmsConfig' has no member named 'ossFS'
>> XrdCmsNode.cc:605: error: `fsFail' was not declared in this scope
>> XrdCmsNode.cc:605: warning: unused variable 'fsFail'
>> XrdCmsNode.cc: In member function `const char* XrdCmsNode::do_Mkpath(XrdCmsRRData&)':
>> XrdCmsNode.cc:640: error: `fsExec' was not declared in this scope
>> XrdCmsNode.cc:640: warning: unused variable 'fsExec'
>> XrdCmsNode.cc:641: error: 'class XrdCmsConfig' has no member named 'ossFS'
>> XrdCmsNode.cc:645: error: `fsFail' was not declared in this scope
>> XrdCmsNode.cc:645: warning: unused variable 'fsFail'
>> XrdCmsNode.cc: In member function `const char* XrdCmsNode::do_Mv(XrdCmsRRData&)':
>> XrdCmsNode.cc:704: error: `fsExec' was not declared in this scope
>> XrdCmsNode.cc:704: warning: unused variable 'fsExec'
>> XrdCmsNode.cc:705: error: 'class XrdCmsConfig' has no member named 'ossFS'
>> XrdCmsNode.cc:709: error: `fsFail' was not declared in this scope
>> XrdCmsNode.cc:709: warning: unused variable 'fsFail'
>> XrdCmsNode.cc: In member function `const char* XrdCmsNode::do_Rm(XrdCmsRRData&)':
>> XrdCmsNode.cc:831: error: `fsExec' was not declared in this scope
>> XrdCmsNode.cc:831: warning: unused variable 'fsExec'
>> XrdCmsNode.cc:832: error: 'class XrdCmsConfig' has no member named 'ossFS'
>> XrdCmsNode.cc:836: error: `fsFail' was not declared in this scope
>> XrdCmsNode.cc:836: warning: unused variable 'fsFail'
>> XrdCmsNode.cc: In member function `const char* XrdCmsNode::do_Rmdir(XrdCmsRRData&)':
>> XrdCmsNode.cc:873: error: `fsExec' was not declared in this scope
>> XrdCmsNode.cc:873: warning: unused variable 'fsExec'
>> XrdCmsNode.cc:874: error: 'class XrdCmsConfig' has no member named 'ossFS'
>> XrdCmsNode.cc:878: error: `fsFail' was not declared in this scope
>> XrdCmsNode.cc:878: warning: unused variable 'fsFail'
>> XrdCmsNode.cc: In member function `const char* XrdCmsNode::do_Trunc(XrdCmsRRData&)':
>> XrdCmsNode.cc:1377: error: `fsExec' was not declared in this scope
>> XrdCmsNode.cc:1377: warning: unused variable 'fsExec'
>> XrdCmsNode.cc:1378: error: 'class XrdCmsConfig' has no member named 'ossFS'
>> XrdCmsNode.cc:1382: error: `fsFail' was not declared in this scope
>> XrdCmsNode.cc:1382: warning: unused variable 'fsFail'
>> XrdCmsNode.cc: At global scope:
>> XrdCmsNode.cc:1524: error: no `int XrdCmsNode::fsExec(XrdOucProg*, char*, char*)' member function declared in class `XrdCmsNode'
>> XrdCmsNode.cc: In member function `int XrdCmsNode::fsExec(XrdOucProg*, char*, char*)':
>> XrdCmsNode.cc:1533: error: `fsL2PFail1' was not declared in this scope
>> XrdCmsNode.cc:1533: warning: unused variable 'fsL2PFail1'
>> XrdCmsNode.cc:1537: error: `fsL2PFail2' was not declared in this scope
>> XrdCmsNode.cc:1537: warning: unused variable 'fsL2PFail2'
>> XrdCmsNode.cc: At global scope:
>> XrdCmsNode.cc:1553: error: no `const char* XrdCmsNode::fsFail(const char*, const char*, const char*, int)' member function declared in class `XrdCmsNode'
>> XrdCmsNode.cc: In member function `const char* XrdCmsNode::fsFail(const char*, const char*, const char*, int)':
>> XrdCmsNode.cc:1559: error: `fsL2PFail1' was not declared in this scope
>> XrdCmsNode.cc:1559: warning: unused variable 'fsL2PFail1'
>> XrdCmsNode.cc:1560: error: `fsL2PFail2' was not declared in this scope
>> XrdCmsNode.cc:1560: warning: unused variable 'fsL2PFail2'
>> XrdCmsNode.cc: In static member function `static int XrdCmsNode::isOnline(char*, int)':
>> XrdCmsNode.cc:1608: error: 'class XrdCmsConfig' has no member named 'ossFS'
>> make[4]: *** [../../obj/XrdCmsNode.o] Error 1
>> make[3]: *** [Linuxall] Error 2
>> make[2]: *** [all] Error 2
>> make[1]: *** [XrdCms] Error 2
>> make: *** [all] Error 2
>>
>>
>> Wen
>>
>> On Tue, Dec 15, 2009 at 2:08 AM, Andrew Hanushevsky <[log in to unmask]>
>> wrote:
>>>
>>> Hi Wen,
>>>
>>> I have developed a permanent fix. You will find the source files in
>>>
>>> http://www.slac.stanford.edu/~abh/cmsd/
>>>
>>> There are three files: XrdCmsCluster.cc XrdCmsNode.cc XrdCmsProtocol.cc
>>>
>>> Please do a source replacement and recompile. Unfortunately, the cmsd
>>> will
>>> need to be replaced on each node regardless of role. My apologies for 
>>> the
>>> disruption. Please let me know how it goes.
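>>>
>>> For example, on each node something like this (a sketch only -- the per-file
>>> download URLs and the plain "make" are assumptions; use whatever rebuild
>>> procedure you normally follow):
>>>
>>>    cd xrootd/src/XrdCms
>>>    for f in XrdCmsCluster.cc XrdCmsNode.cc XrdCmsProtocol.cc; do
>>>        wget http://www.slac.stanford.edu/~abh/cmsd/$f -O $f
>>>    done
>>>    cd ../.. && make      # rebuild, then restart the cmsd on that node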
>>>
>>> Andy
>>>
>>> ----- Original Message ----- From: "wen guan" <[log in to unmask]>
>>> To: "Andrew Hanushevsky" <[log in to unmask]>
>>> Cc: <[log in to unmask]>
>>> Sent: Sunday, December 13, 2009 7:04 AM
>>> Subject: Re: xrootd with more than 65 machines
>>>
>>>
>>> Hi Andrew,
>>>
>>>
>>> Thanks.
>>> I used the new cmsd on the atlas-bkp1 manager, but it is still dropping
>>> nodes, and in the supervisor's log I cannot find any data server
>>> registering with it.
>>>
>>> The new logs are in http://higgs03.cs.wisc.edu/wguan/*.20091213.
>>> The manager was patched at 091213 08:38:15.
>>>
>>> Wen
>>>
>>> On Sun, Dec 13, 2009 at 1:52 AM, Andrew Hanushevsky
>>> <[log in to unmask]> wrote:
>>>>
>>>> Hi Wen
>>>>
>>>> You will find the source replacement at:
>>>>
>>>> http://www.slac.stanford.edu/~abh/cmsd/
>>>>
>>>> It's XrdCmsCluster.cc and it replaces 
>>>> xrootd/src/XrdCms/XrdCmsCluster.cc
>>>>
>>>> I'm stepping out for a couple of hours but will be back to see how
>>>> things
>>>> went. Sorry for the issues :-(
>>>>
>>>> Andy
>>>>
>>>> On Sun, 13 Dec 2009, wen guan wrote:
>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> I prefer a source replacement. Then I can compile it.
>>>>>
>>>>> Thanks
>>>>> Wen
>>>>>>
>>>>>> I can do one of two things here:
>>>>>>
>>>>>> 1) Supply a source replacement and then you would recompile, or
>>>>>>
>>>>>> 2) Give me the uname -a of where the cmsd will run and I'll supply a
>>>>>> binary
>>>>>> replacement for you.
>>>>>>
>>>>>> Your choice.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> On Sun, 13 Dec 2009, wen guan wrote:
>>>>>>
>>>>>>> Hi Andrew
>>>>>>>
>>>>>>> The problem is found. Great. Thanks.
>>>>>>>
>>>>>>> Where can I find the patched cmsd?
>>>>>>>
>>>>>>> Wen
>>>>>>>
>>>>>>> On Sat, Dec 12, 2009 at 11:36 PM, Andrew Hanushevsky
>>>>>>> <[log in to unmask]> wrote:
>>>>>>>>
>>>>>>>> Hi Wen,
>>>>>>>>
>>>>>>>> I found the problem. Looks like a regression from way back when.
>>>>>>>> There
>>>>>>>> is
>>>>>>>> a
>>>>>>>> missing flag on the redirect. This will require a patched cmsd but
>>>>>>>> you
>>>>>>>> need
>>>>>>>> only to replace the redirector's cmsd as this only affects the
>>>>>>>> redirector.
>>>>>>>> How would you like to proceed?
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> On Sat, 12 Dec 2009, wen guan wrote:
>>>>>>>>
>>>>>>>>> Hi Andrew,
>>>>>>>>>
>>>>>>>>> It doesn't work; the atlas-bkp1 manager is still dropping nodes.
>>>>>>>>> In the supervisor, I still haven't seen any data server register. I
>>>>>>>>> said "I updated the ntp" because you said "the log timestamps do not
>>>>>>>>> overlap".
>>>>>>>>>
>>>>>>>>> Wen
>>>>>>>>>
>>>>>>>>> On Sat, Dec 12, 2009 at 9:33 PM, Andrew Hanushevsky
>>>>>>>>> <[log in to unmask]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Wen,
>>>>>>>>>>
>>>>>>>>>> Do you mean that everything is now working? It could be that you
>>>>>>>>>> removed
>>>>>>>>>> the
>>>>>>>>>> xrd.timeout directive. That really could cause problems. As for
>>>>>>>>>> the
>>>>>>>>>> delays,
>>>>>>>>>> that is normal when the redirector thinks something is going
>>>>>>>>>> wrong.
>>>>>>>>>> The
>>>>>>>>>> strategy is to delay clients until it can get back to a stable
>>>>>>>>>> configuration. This usually prevents jobs from crashing during
>>>>>>>>>> stressful
>>>>>>>>>> periods.
>>>>>>>>>>
>>>>>>>>>> Andy
>>>>>>>>>>
>>>>>>>>>> On Sat, 12 Dec 2009, wen guan wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Andrew,
>>>>>>>>>>>
>>>>>>>>>>> I restarted it to do the supervisor test, and also because the
>>>>>>>>>>> xrootd manager frequently doesn't respond. (*) below is from the
>>>>>>>>>>> cms.log; the file select is delayed again and again. After a
>>>>>>>>>>> restart, everything is fine. Now I am trying to find a clue about it.
>>>>>>>>>>>
>>>>>>>>>>> (*)
>>>>>>>>>>> 091212 00:00:19 21318 slot3.14949:[log in to unmask] do_Select: wc /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2
>>>>>>>>>>> 091212 00:00:19 21318 Select seeking /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2
>>>>>>>>>>> 091212 00:00:19 21318 UnkFile rc=1 path=/atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2
>>>>>>>>>>> 091212 00:00:19 21318 slot3.14949:[log in to unmask] do_Select: delay 5 /atlas/xrootd/users/fang/MC8.108004.PythiaPhotonJet4.7TeV.e444_s479_r635_dmp81_tid001090/LOG/dig.001090._000066.log.2
>>>>>>>>>>> 091212 00:00:19 21318 XrdLink: Setting link ref to 2+-1 post=0
>>>>>>>>>>> 091212 00:00:19 21318 Dispatch redirector.21313:14@atlas-bkp2 for select dlen=166
>>>>>>>>>>> 091212 00:00:19 21318 XrdLink: Setting link ref to 1+1 post=0
>>>>>>>>>>> 091212 00:00:19 21318 XrdSched: running redirector inq=0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There is no core file. I copied new copies of the logs to the
>>>>>>>>>>> link below.
>>>>>>>>>>> http://higgs03.cs.wisc.edu/wguan/
>>>>>>>>>>>
>>>>>>>>>>> Wen
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Dec 12, 2009 at 3:16 AM, Andrew Hanushevsky
>>>>>>>>>>> <[log in to unmask]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Wen,
>>>>>>>>>>>>
>>>>>>>>>>>> I see in the server log that it is restarting often. Could you
>>>>>>>>>>>> take
>>>>>>>>>>>> a
>>>>>>>>>>>> look
>>>>>>>>>>>> in the c193 to see if you have any core files? Also please make
>>>>>>>>>>>> sure
>>>>>>>>>>>> that
>>>>>>>>>>>> core files are enabled as Linux defaults the size to 0. The
>>>>>>>>>>>> first
>>>>>>>>>>>> step
>>>>>>>>>>>> here
>>>>>>>>>>>> is to find out why your servers are restarting.
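>>>>>>>>>>>>
>>>>>>>>>>>> (For instance, in the shell that starts the daemons you could check and
>>>>>>>>>>>> raise the limit -- illustrative only, and it only affects processes
>>>>>>>>>>>> started from that shell:
>>>>>>>>>>>>
>>>>>>>>>>>>    ulimit -c             # current core-file size limit; 0 means disabled
>>>>>>>>>>>>    ulimit -c unlimited   # allow full core dumps
>>>>>>>>>>>>
>>>>>>>>>>>> then look for core* files in the daemon's working directory after a crash.)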
>>>>>>>>>>>>
>>>>>>>>>>>> Andy
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, 12 Dec 2009, wen guan wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Andrew,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The logs can be found here. From the logs you can see that the
>>>>>>>>>>>>> atlas-bkp1 manager is dropping, again and again, the nodes that
>>>>>>>>>>>>> try to connect to it.
>>>>>>>>>>>>> http://higgs03.cs.wisc.edu/wguan/
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Dec 11, 2009 at 11:41 PM, Andrew Hanushevsky
>>>>>>>>>>>>> <[log in to unmask]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Wen, Could you start everything up and provide me a 
>>>>>>>>>>>>>> pointer
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> manager log file, supervisor log file, and one data server
>>>>>>>>>>>>>> logfile
>>>>>>>>>>>>>> all
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>> which cover the same time-frame (from start to some point
>>>>>>>>>>>>>> where
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>> think
>>>>>>>>>>>>>> things are working or not). That way I can see what is
>>>>>>>>>>>>>> happening.
>>>>>>>>>>>>>> At
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> moment I only see two "bad" things in the config file:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Only atlas-bkp1.cs.wisc.edu is designated as a manager but
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>> claim,
>>>>>>>>>>>>>> via
>>>>>>>>>>>>>> the all.manager directive, that there are three (bkp2 and
>>>>>>>>>>>>>> bkp3).
>>>>>>>>>>>>>> While
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>> should work, the log file will be dense with error messages.
>>>>>>>>>>>>>> Please
>>>>>>>>>>>>>> correct
>>>>>>>>>>>>>> this to be consistent and make it easier to see real errors.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is not a problem for me, because this config is used on the
>>>>>>>>>>>>> data servers. On the managers, I changed the "if
>>>>>>>>>>>>> atlas-bkp1.cs.wisc.edu" clause to atlas-bkp2 and so on. This is a
>>>>>>>>>>>>> historical leftover: at first only atlas-bkp1 was used;
>>>>>>>>>>>>> atlas-bkp2 and atlas-bkp3 were added later.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) Please use cms.space not olb.space (for historical reasons
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> latter
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> still accepted and over-rides the former, but that will soon
>>>>>>>>>>>>>> end),
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> please use only one (the config file uses both directives).
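>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For example (the parameter values below are only placeholders -- keep
>>>>>>>>>>>>>> whatever values you already have in your config):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    # deprecated spelling, remove:
>>>>>>>>>>>>>>    # olb.space min 2% 1g
>>>>>>>>>>>>>>    # current spelling, keep exactly one such line:
>>>>>>>>>>>>>>    cms.space min 2% 1g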
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I should remove this line. In fact, cms.space is in the cfg
>>>>>>>>>>>>> too.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Wen
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The xrootd has an internal mechanism to connect servers with
>>>>>>>>>>>>>> supervisors
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> allow for maximum reliability. You cannot change that
>>>>>>>>>>>>>> algorithm
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> there
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> no need to do so. You should *never* tell anyone to directly
>>>>>>>>>>>>>> connect
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>> supervisor. If you do, you will likely get unreachable nodes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for dropping data servers, it would appear to me, given 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> flurry
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>> such activity, that something either crashed or was 
>>>>>>>>>>>>>> restarted.
>>>>>>>>>>>>>> That's
>>>>>>>>>>>>>> why
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>> would be good to see the complete log of each one of the
>>>>>>>>>>>>>> entities.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andy
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, 11 Dec 2009, wen guan wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Andrew,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I read the document and wrote a config file
>>>>>>>>>>>>>>> (http://wisconsin.cern.ch/~wguan/xrdcluster.cfg).
>>>>>>>>>>>>>>> With my conf, I can see the manager dispatching messages to the
>>>>>>>>>>>>>>> supervisor, but I cannot see any data server trying to connect
>>>>>>>>>>>>>>> to the supervisor. At the same time, in the manager's log, I
>>>>>>>>>>>>>>> can see some data servers being dropped.
>>>>>>>>>>>>>>> How does xrootd decide which data servers will connect to the
>>>>>>>>>>>>>>> supervisor? Should I specify some data servers to connect to
>>>>>>>>>>>>>>> the supervisor?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (*) supervisor log
>>>>>>>>>>>>>>> 091211 15:07:00 30028 Dispatch manager.0:20@atlas-bkp2 for
>>>>>>>>>>>>>>> state
>>>>>>>>>>>>>>> dlen=42
>>>>>>>>>>>>>>> 091211 15:07:00 30028 manager.0:20@atlas-bkp2 do_State:
>>>>>>>>>>>>>>> /atlas/xrootd/users/wguan/test/test131141
>>>>>>>>>>>>>>> 091211 15:07:00 30028 manager.0:20@atlas-bkp2 do_StateFWD:
>>>>>>>>>>>>>>> Path
>>>>>>>>>>>>>>> find
>>>>>>>>>>>>>>> failed for state /atlas/xrootd/users/wguan/test/test131141
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (*)manager log
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c185.chtc.wisc.edu TSpace=5587GB
>>>>>>>>>>>>>>> NumFS=1
>>>>>>>>>>>>>>> FSpace=5693644MB MinFR=57218MB Util=0
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c185.chtc.wisc.edu adding path: 
>>>>>>>>>>>>>>> w
>>>>>>>>>>>>>>> /atlas
>>>>>>>>>>>>>>> 091211 04:13:24 15661 
>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Space: 5696231MB free; 0% util
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol:
>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 logged in.
>>>>>>>>>>>>>>> 091211 04:13:24 001 XrdInet: Accepted connection from
>>>>>>>>>>>>>>> [log in to unmask]
>>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdSched: running
>>>>>>>>>>>>>>> ?:[log in to unmask]
>>>>>>>>>>>>>>> inq=0
>>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdProtocol: matched protocol cmsd
>>>>>>>>>>>>>>> 091211 04:13:24 15661 ?:[log in to unmask] XrdPoll: FD 79
>>>>>>>>>>>>>>> attached
>>>>>>>>>>>>>>> to poller 2; num=22
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Add server.21739:[log in to unmask]
>>>>>>>>>>>>>>> bumps
>>>>>>>>>>>>>>> server.15905:[log in to unmask]:1094 #63
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Drop_Node:
>>>>>>>>>>>>>>> server.15905:[log in to unmask]:1094 dropped.
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Add Shoved
>>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 to cluster; 
>>>>>>>>>>>>>>> id=63.78;
>>>>>>>>>>>>>>> num=64;
>>>>>>>>>>>>>>> min=51
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c187.chtc.wisc.edu TSpace=5587GB
>>>>>>>>>>>>>>> NumFS=1
>>>>>>>>>>>>>>> FSpace=5721854MB MinFR=57218MB Util=0
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Admit c187.chtc.wisc.edu adding path: 
>>>>>>>>>>>>>>> w
>>>>>>>>>>>>>>> /atlas
>>>>>>>>>>>>>>> 091211 04:13:24 15661 
>>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Space: 5721854MB free; 0% util
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol:
>>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 logged in.
>>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdLink: Unable to recieve from
>>>>>>>>>>>>>>> c187.chtc.wisc.edu; connection reset by peer
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:24 15661 XrdSched: scheduling drop node in 60
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Remove_Node
>>>>>>>>>>>>>>> server.21739:[log in to unmask]:1094 node 63.78
>>>>>>>>>>>>>>> 091211 04:13:24 15661 Protocol:
>>>>>>>>>>>>>>> server.21739:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:24 15661 server.21739:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 79 detached from poller 2; num=21
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch
>>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094
>>>>>>>>>>>>>>> for status dlen=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 
>>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Status:
>>>>>>>>>>>>>>> suspend
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c177.chtc.wisc.edu service
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c177.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=16
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.24718:[log in to unmask]:1094 node 0.3
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.21656:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.21656:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 16 detached from poller 2; num=20
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c179.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=21
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.17065:[log in to unmask]:1094 node 1.4
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.7978:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.7978:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 21
>>>>>>>>>>>>>>> detached from poller 1; num=21
>>>>>>>>>>>>>>> 091211 04:13:27 15661 State: Status changed to suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Send status to
>>>>>>>>>>>>>>> redirector.15656:14@atlas-bkp2
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch
>>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094
>>>>>>>>>>>>>>> for status dlen=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 
>>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Status:
>>>>>>>>>>>>>>> suspend
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c182.chtc.wisc.edu service
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c182.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=19
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.12937:[log in to unmask]:1094 node 7.10
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.26620:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.26620:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 19 detached from poller 2; num=19
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch
>>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094
>>>>>>>>>>>>>>> for status dlen=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 
>>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Status:
>>>>>>>>>>>>>>> suspend
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c178.chtc.wisc.edu service
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c178.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=15
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.10842:[log in to unmask]:1094 node 9.12
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.11901:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.11901:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 15 detached from poller 1; num=20
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch
>>>>>>>>>>>>>>> server.5535:[log in to unmask]:1094
>>>>>>>>>>>>>>> for status dlen=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.5535:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Status:
>>>>>>>>>>>>>>> suspend
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c181.chtc.wisc.edu service
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c181.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=17
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.5535:[log in to unmask]:1094 node 5.8
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.13984:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.13984:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 17 detached from poller 0; num=21
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch
>>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094
>>>>>>>>>>>>>>> for status dlen=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 
>>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Status:
>>>>>>>>>>>>>>> suspend
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c183.chtc.wisc.edu service
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c183.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=22
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.23711:[log in to unmask]:1094 node 8.11
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.27735:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.27735:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 22 detached from poller 2; num=18
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c184.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=20
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.4131:[log in to unmask]:1094 node 3.6
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.26787:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.26787:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 20 detached from poller 0; num=20
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch
>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094
>>>>>>>>>>>>>>> for status dlen=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 
>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Status:
>>>>>>>>>>>>>>> suspend
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c185.chtc.wisc.edu service
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c185.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=23
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.10585:[log in to unmask]:1094 node 6.9
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.8524:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.8524:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 23
>>>>>>>>>>>>>>> detached from poller 0; num=19
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch
>>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094
>>>>>>>>>>>>>>> for status dlen=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 
>>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Status:
>>>>>>>>>>>>>>> suspend
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c180.chtc.wisc.edu service
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c180.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=18
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.20264:[log in to unmask]:1094 node 4.7
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.14636:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.14636:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 18 detached from poller 1; num=19
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Dispatch
>>>>>>>>>>>>>>> server.1656:[log in to unmask]:1094
>>>>>>>>>>>>>>> for status dlen=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.1656:[log in to unmask]:1094
>>>>>>>>>>>>>>> do_Status:
>>>>>>>>>>>>>>> suspend
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=-1 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Node: c186.chtc.wisc.edu service
>>>>>>>>>>>>>>> suspended
>>>>>>>>>>>>>>> 091211 04:13:27 15661 XrdLink: No RecvAll() data from
>>>>>>>>>>>>>>> c186.chtc.wisc.edu
>>>>>>>>>>>>>>> FD=24
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Update Counts Parm1=0 Parm2=0
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Remove_Node
>>>>>>>>>>>>>>> server.1656:[log in to unmask]:1094 node 2.5
>>>>>>>>>>>>>>> 091211 04:13:27 15661 Protocol:
>>>>>>>>>>>>>>> server.7849:[log in to unmask]
>>>>>>>>>>>>>>> logged
>>>>>>>>>>>>>>> out.
>>>>>>>>>>>>>>> 091211 04:13:27 15661 server.7849:[log in to unmask]
>>>>>>>>>>>>>>> XrdPoll:
>>>>>>>>>>>>>>> FD
>>>>>>>>>>>>>>> 24
>>>>>>>>>>>>>>> detached from poller 1; num=18
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:14 15661 XrdSched: scheduling drop node in 13
>>>>>>>>>>>>>>> seconds
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.66 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.68 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.69 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.67 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.70 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.71 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.72 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.73 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.74 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.75 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=1
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.76 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: Now have 68 workers
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>> 091211 04:14:24 15661 Drop_Node 63.77 cancelled.
>>>>>>>>>>>>>>> 091211 04:14:24 15661 XrdSched: running drop node inq=0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Wen
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Dec 11, 2009 at 9:50 PM, Andrew Hanushevsky
>>>>>>>>>>>>>>> <[log in to unmask]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Wen,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To go past 64 data servers you will need to setup one or
>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>> supervisors.
>>>>>>>>>>>>>>>> This does not logically change the current configuration 
>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>> have.
>>>>>>>>>>>>>>>> You
>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>> need to configure one or more *new* servers (or at least
>>>>>>>>>>>>>>>> xrootd
>>>>>>>>>>>>>>>> processes)
>>>>>>>>>>>>>>>> whose role is supervisor. We'd like them to run in separate
>>>>>>>>>>>>>>>> machines
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> reliability purposes, but they could run on the manager 
>>>>>>>>>>>>>>>> node
>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>> long
>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>> give each one a unique instance name (i.e., -n option).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The front part of the cmsd reference explains how to do
>>>>>>>>>>>>>>>> this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://xrootd.slac.stanford.edu/doc/prod/cms_config.htm
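>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As a rough illustration (the supervisor hostname, port number, and file
>>>>>>>>>>>>>>>> name below are made up -- see the reference above for the exact
>>>>>>>>>>>>>>>> directives), the supervisor can share the same config file and pick its
>>>>>>>>>>>>>>>> role by host:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    all.manager atlas-bkp1.cs.wisc.edu:3121
>>>>>>>>>>>>>>>>    if atlas-bkp1.cs.wisc.edu
>>>>>>>>>>>>>>>>       all.role manager
>>>>>>>>>>>>>>>>    else if super1.chtc.wisc.edu
>>>>>>>>>>>>>>>>       all.role supervisor
>>>>>>>>>>>>>>>>    else
>>>>>>>>>>>>>>>>       all.role server
>>>>>>>>>>>>>>>>    fi
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> and each cmsd/xrootd pair sharing a machine gets its own instance name,
>>>>>>>>>>>>>>>> e.g. "cmsd -c xrdcluster.cfg -n super1".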
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andy
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, 11 Dec 2009, wen guan wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Andrew,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there any way to configure xrootd with more than 65
>>>>>>>>>>>>>>>>> machines? I used the configuration below but it doesn't work.
>>>>>>>>>>>>>>>>> Should I configure some machines' manager to be a supervisor?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://wisconsin.cern.ch/~wguan/xrdcluster.cfg
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Wen
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>