Hi,

Won't the client try the other IPs if the first one fails?

I'd also recommend not messing with DNS ... changes can take a really long time
to propagate ... probably longer than it takes to fix the problem :) Or do you
want to mess with local DNS at each cluster?

In short, I'd expect that one RR-DNS entry will work OK for both cases:
- jobs falling back to xrootd;
- managers connecting to meta managers.

Matevz

On 03/28/14 13:20, Andrew Hanushevsky wrote:
> Hi Tommaso,
>
> Indeed, if this is what you plan to implement then you would need one entry that
> is stable and another that is not. While I don't quite understand what you are
> trying to accomplish, I can say that anything that relies on the presence or
> absence of an entry in DNS will likely not work. Why? Because imagine that there
> are jobs running between the time a server dies and the time the dead server is
> removed from DNS, which could be a substantial delay in the eyes of the jobs.
> What happens to those jobs? Likely they will fail because they rely on the DNS
> entries to be correct. So, any scheme to avoid dead servers really has to be
> handled by the application, not some external agent.
>
> Andy
>
> On Fri, 28 Mar 2014, Tommaso Boccali wrote:
>
>> ciao andrew, understood & it makes sense, thanks.
>> But then I need a different solution for stageout.
>> As you explained to me here,
>>
>> all.manager meta all xrootd.infn.it+ 1213 (probably wrong since gmail is
>> playing with the text, but never mind ... it is what you wrote ;)
>>
>> needs xrootd.infn.it to be the list of all possible redirectors, regardless
>> of their state. Fine.
>>
>> On the other hand, for CMSSW fallback I need to specify something like
>>
>> if file /store/file.root not locally available --> try
>> root://xrootd.infn.it//store/file.root
>>
>> in this case, instead, I want xrootd.infn.it to resolve only to those
>> redirectors which are _currently_ ok, right?
>>
>> then I am afraid I need 2 DNS aliases
>>
>> - xrootd-fulllist.infn.it : the machines which are installed as redirectors,
>> from which nagios never removes anything
>> - xrootd.infn.it : the subset of the previous with only currently working
>> redirectors, to be used in the fallback statement.
>>
>> Is this correct?
>>
>> thanks again!
>>
>> tom
>>
>> On Fri, Mar 28, 2014 at 8:41 PM, Andrew Hanushevsky wrote:
>>
>>> Hi Tommaso,
>>>
>>> See below...
>>>
>>> On Fri, 28 Mar 2014, Tommaso Boccali wrote:
>>>
>>>> Let's say we prepare 2 regional redirectors, 1.1.1.1 and 2.2.2.2, and we
>>>> put them in the DNS as xrootd.infn.it (no round robin: "host
>>>> xrootd.infn.it" will return 2 IP addresses).
>>>> Since we want to use xrootd.infn.it as fallback, we plan to have a nagios
>>>> test which checks 1.1.1.1 and 2.2.2.2 periodically, and in case one is NOT
>>>> ok, it is removed from the DNS.
>>>>
>>> You should never remove anything from DNS; it will break all the
>>> recoverability aspects of xrootd. Even if it's broken it should remain in
>>> DNS. Hence, you don't need anything special. Just leave both servers in DNS
>>> all the time.
>>>
>>>> So, question was:
>>>>
>>>> let's say that site ABCD needs to restart the xrootd local servers, which
>>>> are configured as
>>>>
>>>> all.manager meta all *xrootd.infn.it <http://xrootd.infn.it>*+ 1213
>>>>
>>> Uhm, the above won't work. Perhaps you really wanted to say
>>>
>>> all.manager meta all xrootd.infn.it+ 1213
>>>
>>>> what happens if AT THE RESTART MOMENT xrootd.infn.it only resolves to
>>>> 1.1.1.1 (since eventually 2.2.2.2 is broken)? And even more, what if 2
>>>> hours later 2.2.2.2 enters again the DNS resolution for
>>>>
>>> It won't re-resolve. That's why you always leave both addresses in DNS.
>>>
>>> Andy
>>>
>>
>> --
>> Tommaso Boccali
>> INFN Pisa
>>
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
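
The advice in this thread (keep every redirector address in DNS and let the application skip the dead ones) can be illustrated with a small Python sketch. This is not how the xrootd client or cmsd is implemented; it only shows the general pattern of resolving a multi-address name once and then trying each address in turn. The helper names resolve_all and connect_first_alive, the timeout, and the port 1094 in the usage note are assumptions made for illustration; 1.1.1.1 and 2.2.2.2 are the placeholder addresses used in the thread.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple

Address = Tuple[str, int]


def resolve_all(host: str, port: int) -> List[Address]:
    """Return every distinct (ip, port) pair the name currently resolves to.

    Hypothetical helper: a name with several A records yields several entries.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[Address] = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def connect_first_alive(
    addresses: Iterable[Address],
    timeout: float = 5.0,
    connect: Callable = socket.create_connection,
):
    """Try each address in order and return the first successful connection.

    A dead server costs one failed attempt; it does not need to be removed
    from DNS for the caller to reach a healthy one.
    """
    last_error: Optional[OSError] = None
    for addr in addresses:
        try:
            return connect(addr, timeout)
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

Under these assumptions, a caller would run something like connect_first_alive(resolve_all("xrootd.infn.it", 1094)). Python's own socket.create_connection already behaves this way when given a host name instead of an IP address: it tries all resolved addresses in turn until one connects.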