Hi,

Won't the client try other IPs if the first one fails?
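
Below is a minimal sketch of the client-side behaviour asked about here. It assumes the client resolves the alias to all of its addresses and tries them in order until one accepts a connection. This is not the xrootd client's code. `resolve_all`, `connect_first` and `tcp_connect` are hypothetical names, and 1094 is used only as the usual xrootd data port.

```python
import socket
from typing import Any, Callable, List, Tuple

Address = Tuple[str, int]


def resolve_all(host: str, port: int) -> List[Address]:
    """Return every (ip, port) the resolver gives for host, deduplicated, in resolver order."""
    addresses: List[Address] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def connect_first(
    addresses: List[Address], try_connect: Callable[[Address], Any]
) -> Tuple[Address, Any]:
    """Try each address in turn and return the first one that works, with its connection.

    A dead redirector costs one failed attempt instead of failing the job."""
    errors = []
    for addr in addresses:
        try:
            return addr, try_connect(addr)
        except OSError as exc:  # refused, unreachable, timed out, ...
            errors.append(f"{addr[0]}:{addr[1]}: {exc}")
    raise ConnectionError("all addresses failed: " + "; ".join(errors))


def tcp_connect(addr: Address, timeout: float = 5.0) -> socket.socket:
    """Plain TCP connect with a timeout, so a dead host is skipped quickly."""
    return socket.create_connection(addr, timeout=timeout)
```

A call would look something like `connect_first(resolve_all("xrootd.infn.it", 1094), tcp_connect)`. If both redirectors stay in DNS, a dead one costs a single failed connect attempt rather than a failed job.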

I'd also recommend not messing with DNS: changes can take a really long time
to propagate, probably longer than it takes to fix the problem :) Or do you want
to mess with the local DNS at each cluster?
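
To put a rough number on "a really long time", here is a back-of-the-envelope sketch of how long a dead redirector can keep being handed out if monitoring pulls it from DNS. All four parameters are hypothetical: the probe interval, the number of failures before removal, the zone publish delay, and the record TTL. None of them are values from either site. Resolvers that ignore or clamp TTLs would make the real window longer, and re-adding a repaired server takes about as long again.

```python
from dataclasses import dataclass


@dataclass
class DnsRemovalPlan:
    check_interval_s: int      # how often the monitor probes each redirector
    failed_checks_needed: int  # consecutive failed probes before the record is pulled
    publish_delay_s: int       # time for the zone change to reach the authoritative servers
    record_ttl_s: int          # how long resolvers may keep serving the cached old answer

    def worst_case_stale_s(self) -> int:
        """Seconds a dead server may still be returned, in the worst case.

        Worst case: the server dies right after a successful probe, so the
        first failing probe comes one interval later and the Nth one N intervals later."""
        detection = self.check_interval_s * self.failed_checks_needed
        return detection + self.publish_delay_s + self.record_ttl_s


plan = DnsRemovalPlan(check_interval_s=300, failed_checks_needed=2,
                      publish_delay_s=60, record_ttl_s=3600)
print(plan.worst_case_stale_s())  # 4260 seconds, about 71 minutes
```

Any job that resolves the alias inside that window can still be sent to the dead address.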

In short, I'd expect that one RR-DNS entry will work ok for both cases:
- jobs falling back to xrootd;
- managers connecting to meta managers.
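
The manager-side case can be modelled in the same way. The toy model below follows Andy's replies further down the thread. It assumes that a data server configured with the `xrootd.infn.it+` form of `all.manager` expands the alias once at startup and never re-resolves it. `Manager`, `restart_scenario` and the dict standing in for DNS are hypothetical and are not xrootd internals. The model shows why a record pulled at restart time is lost for the lifetime of that process, while a stable entry listing both redirectors always yields the full set.

```python
from typing import Callable, Dict, List


class Manager:
    """Toy data server: resolves its meta-manager alias once, at construction."""

    def __init__(self, alias: str, resolve: Callable[[str], List[str]]) -> None:
        # Copy the answer: later DNS changes never reach this process.
        self.meta_managers: List[str] = list(resolve(alias))


def restart_scenario(zone: Dict[str, List[str]], alias: str) -> List[str]:
    """Start a manager against 'zone', then put 2.2.2.2 back into DNS."""
    manager = Manager(alias, lambda name: zone[name])
    if "2.2.2.2" not in zone[alias]:
        zone[alias].append("2.2.2.2")  # monitoring re-adds the repaired redirector
    return manager.meta_managers


# Restart while monitoring had removed 2.2.2.2: it is never picked up again.
print(restart_scenario({"xrootd.infn.it": ["1.1.1.1"]}, "xrootd.infn.it"))
# Both redirectors always left in DNS: the manager knows about both.
print(restart_scenario({"xrootd.infn.it": ["1.1.1.1", "2.2.2.2"]}, "xrootd.infn.it"))
```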

Matevz

On 03/28/14 13:20, Andrew Hanushevsky wrote:
> Hi Tommaso,
>
> Indeed, if this is what you plan to implement then you would need one entry that
> is stable and another that is not. While I don't quite understand what you are
> trying to accomplish I can say that anything that relies on the presence or
> absence of an entry in DNS will likely not work. Why? Because imagine that there
> are jobs running between the time a server dies and the time the dead server is
> removed from DNS, which could be a substantial delay in the eyes of the jobs.
> What happens to those jobs? Likely they will fail because they rely on the DNS
> entries to be correct. So, any scheme to avoid dead servers really has to be
> handled by the application, not by some external agent.
>
> Andy
>
> On Fri, 28 Mar 2014, Tommaso Boccali wrote:
>
>> Ciao Andrew, understood, and it makes sense, thanks.
>> But then I need a different solution for stageout.
>> As you explained to me here,
>>
>> all.manager meta all xrootd.infn.it+ 1213  (probably wrong since gmail is
>> playing with the text, but never mind ... it is what you wrote ;)
>>
>> needs xrootd.infn.it to be the list of all possible redirectors, regardless
>> of their state. Fine.
>>
>> On the other hand, for CMSSW fallback I need to specify something like
>>
>> if file /store/file.root not locally available --> try root://xrootd.infn.it//store/file.root
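
The fallback rule above comes down to one decision per file. In CMSSW this is normally configured per site rather than written as user code, so the Python below is only an illustration. `fallback_url` and `local_prefix` are hypothetical names. The function hard-codes the alias, so the job gets whatever `xrootd.infn.it` resolves to when the file is opened.

```python
import os


def fallback_url(lfn: str, local_prefix: str, redirector: str = "xrootd.infn.it") -> str:
    """Return the local path if the file exists on site storage, else an xrootd URL.

    lfn is a logical file name such as "/store/file.root"; the resulting URL keeps
    the double slash, e.g. root://xrootd.infn.it//store/file.root."""
    local_path = os.path.join(local_prefix, lfn.lstrip("/"))
    if os.path.exists(local_path):
        return local_path
    return f"root://{redirector}/{lfn}"
```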
>>
>> In this case, instead, I want xrootd.infn.it to resolve only to those
>> redirectors that are _currently_ OK, right?
>>
>> Then I'm afraid I need 2 DNS aliases:
>>
>> - xrootd-fulllist.infn.it : the machines which are installed as redirector,
>> from which nagios never removes anything
>> - xrootd.infn.it : the subset of the previous with only currently working
>> redirectors, to be used in the fallback statement.
>>
>> Is this correct?
>>
>> thanks again!
>>
>> tom
>>
>>
>>
>> On Fri, Mar 28, 2014 at 8:41 PM, Andrew Hanushevsky
>> <[log in to unmask]> wrote:
>>
>>> Hi Tommaso,
>>>
>>> See below...
>>>
>>>
>>> On Fri, 28 Mar 2014, Tommaso Boccali wrote:
>>>
>>>> Let's say we prepare 2 regional redirectors, 1.1.1.1 and 2.2.2.2, and we
>>>> put them in DNS as xrootd.infn.it (no round robin: "host xrootd.infn.it"
>>>> will return 2 IP addresses).
>>>> Since we want to use xrootd.infn.it as fallback, we plan to have a nagios
>>>> test which checks 1.1.1.1 and 2.2.2.2 periodically, and in case one is NOT
>>>> ok, it is removed from the DNS.
>>>>
>>> You should never remove anything from DNS; it will break all the
>>> recoverability aspects of xrootd. Even if it's broken, it should remain in
>>> DNS. Hence, you don't need anything special. Just leave both servers in DNS
>>> all the time.
>>>
>>>> So, the question was:
>>>
>>>> Let's say that site ABCD needs to restart its local xrootd servers, which
>>>> are configured as
>>>>
>>>> all.manager meta all *xrootd.infn.it <http://xrootd.infn.it>*+ 1213
>>>>
>>> Uhm, the above won't work. Perhaps you really wanted to say
>>>
>>> all.manager meta all xrootd.infn.it+ 1213
>>>
>>>> what happens if AT THE RESTART MOMENT xrootd.infn.it only resolves to
>>>> 1.1.1.1 (because 2.2.2.2 happens to be broken)? And even more, what if 2
>>>> hours later 2.2.2.2 re-enters the DNS resolution for
>>>>
>>> It won't re-resolve. That's why you always leave both addresses in DNS.
>>>
>>> Andy
>>>
>>
>>
>>
>> --
>> Tommaso Boccali
>> INFN Pisa
>>
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1