OK, it can be done even more easily, because the logic is structured like
this:

    <Resolve URLs>;
    for (try = 0; try < maxretry; ...) {
        while <resolved url-left> {
            // try connect
            if (failed) <remove tried url>;
        }
    }

So I propose to resolve inside the retry loop; all aliases will still be
tried in the inner loop:

    for (try = 0; try < maxretry; ...) {
        <Resolve URLs>;
        while <resolved url-left> {
            // try connect
            if (failed) <remove tried url>;
        }
    }

Is that ok?

Cheers Andreas.

On Wed, Jun 8, 2011 at 11:00 AM, Andrew Hanushevsky wrote:

> Hi Andreas,
>
> If you limit it to a list size of 1 then it would be perfect. Please submit
> the patch.
>
> Andy
>
> On Wed, 8 Jun 2011, Andreas-Joachim Peters wrote:
>
>> Hi Andy,
>>
>> the DNS is manipulated to always point to the working machine (the machine
>> in production).
>>
>> If we could do it only in the case where the list size = 1, that would be
>> perfectly fine, and the default case for real load balancing would still
>> be like before.
>>
>> Cheers Andreas.
>>
>> On Wed, Jun 8, 2011 at 5:02 AM, Andrew Hanushevsky wrote:
>>
>>> Hi Andreas,
>>>
>>> I think the issue here is that the list is not re-translated every time
>>> because the client picks (in a random order) through the list when it
>>> reconnects. If the DNS returns the list in random order itself (as most
>>> do now) then the process may, at best, become inefficient (i.e., failing
>>> hosts being unnecessarily retried) and, at worst, never converge.
>>>
>>> Most larger sites set up a DNS entry with multiple addresses so that they
>>> can automatically fail over in this mode of operation. As I look at the
>>> code, this process works only if the list of addresses is stable (at
>>> least until all of them have been tried once). Hence, that's why the list
>>> is translated only once.
>>>
>>> Three options exist: a) the easy one is to have an option (e.g.
>>> envar) control the behaviour with old behaviour being the default, b) only
>>> retranslate the list after all entries have been tried (I think this is
>>> much harder), and c) retranslate only if the DNS call returned a single
>>> entry (don't know if this is really what you are after but it's the
>>> safest).
>>>
>>> I still don't quite understand how retranslating the DNS will allow you
>>> to be more fault tolerant unless the DNS knows to return an address that
>>> is working when the one it has is not working. Hint?
>>>
>>> Andy
>>>
>>> On Wed, 8 Jun 2011, Andreas-Joachim Peters wrote:
>>>
>>>> Hi,
>>>>
>>>> I have the following request to change some basic behaviour of the
>>>> xrootd client code.
>>>>
>>>> Currently, when an XrdClient::Open or XrdClientAdmin::Connect is issued,
>>>> any DNS alias is resolved only once at the beginning and then stays
>>>> forever as the target in the loop which honours the settings for
>>>> retry/reconnect etc.
>>>>
>>>> Andy ... are there any objections to changing this behaviour and
>>>> resolving the alias again before each retry? I would need that behaviour
>>>> to have an active/passive failover via DNS alias configured.
>>>>
>>>> The code change is trivial (two lines inserted).
>>>>
>>>> Cheers Andreas.
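
The variant discussed in this thread (resolve the alias again before a retry, but only when the previous lookup returned a single address) could look roughly like the sketch below. This is not the actual XrdClient code: connectWithRetry, Resolver and Connector are hypothetical names, and the DNS lookup and the connect step are injected as callbacks so the logic can be exercised without a real DNS. Unlike the client, which picks through the list in random order, the sketch simply walks the list from the back.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback types (not part of the xrootd API):
// a Resolver maps an alias to its current address list,
// a Connector reports whether connecting to one address succeeded.
using Resolver  = std::function<std::vector<std::string>(const std::string&)>;
using Connector = std::function<bool(const std::string&)>;

// Returns the address that accepted the connection, or an empty string
// if all retries failed.
std::string connectWithRetry(const std::string& alias, int maxRetry,
                             const Resolver& resolve,
                             const Connector& tryConnect)
{
    // Initial translation of the alias.
    std::vector<std::string> resolved = resolve(alias);

    for (int attempt = 0; attempt < maxRetry; ++attempt) {
        // Re-translate before a retry only if the last lookup returned
        // exactly one address (active/passive failover). A multi-address
        // list stays stable, as in the old behaviour.
        if (attempt > 0 && resolved.size() == 1)
            resolved = resolve(alias);

        // Addresses still left to try in this attempt.
        std::vector<std::string> left = resolved;
        while (!left.empty()) {
            const std::string addr = left.back();
            left.pop_back();                 // remove the tried address
            if (tryConnect(addr))
                return addr;
        }
    }
    return std::string();
}
```

With a single-address alias that the DNS repoints from a dead host to a live one, the second attempt picks up the new address; with a multi-address alias the resolver is called only once, so load-balanced setups behave as before.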