Hi,
Won't the client try other IPs if the first one fails?
I'd also recommend not messing with DNS: changes can take a really long time to propagate, probably longer than it takes to fix the problem :) Or do you want to mess with local DNS at each cluster?
In short, I'd expect that one RR-DNS entry will work ok for both cases:
- jobs falling back to xrootd;
- managers connecting to meta managers.
Matevz
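
The behaviour asked about above (a client moving on to the next address when the first one fails) can be sketched as follows. This is only an illustration in Python, not the xrootd client code; the function names and the port 1094 used in the usage note are assumptions made for the example.

```python
import socket

def resolve_all(host, port):
    """Return every distinct (ip, port) pair the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        addr = sockaddr[:2]  # keep (ip, port); drops the extra IPv6 fields
        if addr not in addresses:
            addresses.append(addr)
    return addresses

def connect_first_alive(addresses, connect=socket.create_connection, timeout=5.0):
    """Try each address in turn; return (address, connection) for the first success."""
    errors = []
    for addr in addresses:
        try:
            return addr, connect(addr, timeout)
        except OSError as exc:
            # Remember the failure and fall through to the next address.
            errors.append((addr, exc))
    raise ConnectionError(f"all addresses failed: {errors}")
```

A caller would do something like `connect_first_alive(resolve_all("xrootd.infn.it", 1094))`. With this pattern a dead address left in DNS costs one failed connection attempt, which is why a single round-robin entry can serve both cases.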
On 03/28/14 13:20, Andrew Hanushevsky wrote:
Hi Tommaso,
Indeed, if this is what you plan to implement then you would need one entry that
is stable and another that is not. While I don't quite understand what you are
trying to accomplish, I can say that anything that relies on the presence or
absence of an entry in DNS will likely not work. Why? Because imagine that there
are jobs running between the time a server dies and the time the dead server is
removed from DNS, which could be a substantial delay in the eyes of the jobs.
What happens to those jobs? Likely they will fail because they rely on the DNS
entries to be correct. So, any scheme to avoid dead servers really has to be
handled by the application, not some external agent.
Andy
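
The delay described above can be put into rough numbers. The sketch below is a simplified model, not a measurement: it assumes the monitor notices the failure at its next periodic check at the latest, and that a resolver which cached the record just before removal keeps it for one full TTL. The function names and the figures in the usage note are hypothetical.

```python
def stale_window(died_at, check_interval, ttl):
    """Worst-case (start, end) in seconds during which the dead address is still served."""
    detected_at = died_at + check_interval  # monitor sees the failure at its next check
    expired_at = detected_at + ttl          # caches filled just before removal last one TTL
    return died_at, expired_at

def jobs_at_risk(job_resolve_times, window):
    """Return the jobs that resolve the name while the dead address may still be returned."""
    start, end = window
    return [t for t in job_resolve_times if start <= t < end]
```

For example, with an assumed 300 s check interval and 3600 s TTL, a server dying at t=1000 s may still be handed out until t=4900 s, so every job resolving the name in that window needs its own way of coping with the dead address.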
On Fri, 28 Mar 2014, Tommaso Boccali wrote:
Ciao Andrew, understood, and it makes sense, thanks.
But then I need a different solution for stageout.
As you explained to me here,
all.manager meta all xrootd.infn.it+ 1213 (probably wrong since gmail is
playing with the text, but never mind ... it is what you wrote ;)
needs xrootd.infn.it to be the list of all possible redirectors, regardless
of their state. Fine.
On the other hand, for CMSSW fallback I need to specify something like
if file /store/file.root not locally available --> try
root://xrootd.infn.it//store/file.root
in this case, instead, I want xrootd.infn.it to resolve only to those
redirectors which are _currently_ ok, right?
then I am afraid I need 2 DNS aliases
- xrootd-fulllist.infn.it : the machines which are installed as redirector,
from which nagios never removes anything
- xrootd.infn.it : the subset of the previous with only currently working
redirectors, to be used in the fallback statement.
Is this correct?
thanks again!
tom
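
The fallback rule quoted above ("if the file is not locally available, try the redirector") can be sketched as follows. This is an illustration only and not CMSSW configuration syntax; `open_target` and `local_prefix` are hypothetical names, and only the URL form is taken from the message.

```python
import os

def open_target(lfn, local_prefix, redirector="xrootd.infn.it"):
    """Return the local path if the file exists there, else the xrootd fallback URL."""
    local_path = os.path.join(local_prefix, lfn.lstrip("/"))
    if os.path.exists(local_path):
        return local_path
    # The logical file name keeps its leading slash, giving the double slash
    # after the host, as in root://xrootd.infn.it//store/file.root
    return f"root://{redirector}/{lfn}"
```

Note that the fallback URL names the alias, not a specific redirector, so which redirector actually answers is decided at connection time rather than when the URL is built.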
On Fri, Mar 28, 2014 at 8:41 PM, Andrew Hanushevsky wrote:
Hi Tommaso,
See below...
On Fri, 28 Mar 2014, Tommaso Boccali wrote:
> Let's say we prepare 2 regional redirectors, 1.1.1.1 and 2.2.2.2, and we
> put them in the DNS as xrootd.infn.it (no round robin: "host
> xrootd.infn.it" will return 2 IP addresses).
> Since we want to use xrootd.infn.it as fallback, we plan to have a nagios
> test which checks 1.1.1.1 and 2.2.2.2 periodically, and in case one is NOT
> ok, it is removed from the DNS.
You should never remove anything from DNS; it will break all the
recoverability aspects of xrootd. Even if it's broken it should remain in
DNS. Hence, you don't need anything special. Just leave both servers in DNS
all the time.
> So, question was:
> let's say that site ABCD needs to restart the xrootd local servers, which
> are configured as
> all.manager meta all *xrootd.infn.it <http://xrootd.infn.it>*+ 1213
Uhm, the above won't work. Perhaps you really wanted to say
all.manager meta all xrootd.infn.it+ 1213
> what happens if AT THE RESTART MOMENT xrootd.infn.it only resolves to
> 1.1.1.1 (since eventually 2.2.2.2 is broken)? And even more, what if 2
> hours later 2.2.2.2 enters again the DNS resolution for
It won't re-resolve. That's why you always leave both addresses in DNS.
Andy
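
The point about not re-resolving can be modelled with a toy example. This is a sketch of the behaviour described in the reply above, not xrootd code; `MetaManagerLinks`, `resolver` and `is_up` are hypothetical names introduced for illustration.

```python
class MetaManagerLinks:
    """Toy model of a manager that resolves its meta-manager name once, at startup."""

    def __init__(self, name, resolver):
        # Snapshot of the addresses at startup; never refreshed afterwards.
        self.targets = list(resolver(name))

    def reachable(self, is_up):
        """Return the startup-time targets that currently respond."""
        return [t for t in self.targets if is_up(t)]


# Case 1: DNS was pruned to one address when the manager restarted.
dns = {"xrootd.infn.it": ["1.1.1.1"]}
pruned = MetaManagerLinks("xrootd.infn.it", lambda name: dns[name])
dns["xrootd.infn.it"] = ["1.1.1.1", "2.2.2.2"]  # 2.2.2.2 re-added later
# pruned.targets is still ["1.1.1.1"]: the recovered server is never contacted.

# Case 2: both addresses were always in DNS.
full = MetaManagerLinks("xrootd.infn.it", lambda name: dns[name])
# full.targets holds both, so 2.2.2.2 is picked up again as soon as it answers.
```

In the second case a broken 2.2.2.2 is simply unreachable for a while and gets used again once it recovers, with no DNS change and no restart.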
--
Tommaso Boccali
INFN Pisa
########################################################################
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1