Hi Andy,
The DNS is manipulated so that it always points to the working machine (the machine in production).

If we could retranslate only in the case where the list size = 1, that would be perfectly fine, and the default case for real load balancing would still behave as before.

Cheers Andreas.

On Wed, Jun 8, 2011 at 5:02 AM, Andrew Hanushevsky <[log in to unmask]> wrote:
Hi Andreas,

I think the issue here is that the list is not retranslated every time because the client picks through the list (in a random order) when it reconnects. If the DNS itself returns the list in random order (as most do now) then the process may, at best, become inefficient (i.e., failing hosts being unnecessarily retried) and, at worst, never converge.

Most larger sites set up a DNS entry with multiple addresses so that they can automatically fail over in this mode of operation. As I look at the code, this process works only if the list of addresses is stable (at least until all of them have been tried once). Hence, the list is translated only once.

Three options exist: a) the easy one is to have an option (e.g. an envar) control the behaviour, with the old behaviour being the default; b) retranslate the list only after all entries have been tried (I think this is much harder); and c) retranslate only if the DNS call returned a single entry (I don't know if this is really what you are after, but it's the safest).
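
Option (c) could be sketched roughly as follows. This is an illustration under stated assumptions, not the xrootd client code: `AddressList`, `Resolver`, `Connector` and `connectWithRetry` are hypothetical names, the resolver is injected so the example needs no real DNS, and addresses are tried round-robin rather than in random order to keep the behaviour deterministic. Treating an empty answer like a single-entry answer is also an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of the xrootd client API.
using AddressList = std::vector<std::string>;
using Resolver    = std::function<AddressList(const std::string&)>;
using Connector   = std::function<bool(const std::string&)>;

// Try to connect to `alias`, making at most `maxAttempts` attempts.
// Returns the address that accepted the connection, or "" if none did.
std::string connectWithRetry(const std::string& alias,
                             const Resolver& resolve,
                             const Connector& tryConnect,
                             int maxAttempts)
{
    assert(resolve && tryConnect);

    AddressList addrs = resolve(alias);  // initial translation
    std::size_t next = 0;

    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        // Option (c): retranslate before a retry only if the last DNS
        // answer held a single entry (or none, an assumption made here).
        // Multi-address lists stay stable, as in the old behaviour.
        if (attempt > 0 && addrs.size() <= 1) {
            addrs = resolve(alias);
            next = 0;
        }
        if (addrs.empty()) {
            continue;  // nothing to try on this attempt
        }
        // Round-robin through the list (the real client picks randomly).
        const std::string& addr = addrs[next % addrs.size()];
        ++next;
        if (tryConnect(addr)) {
            return addr;
        }
    }
    return "";
}
```

With a resolver that returns one address, every retry sees whatever the alias currently points to, which is what an active/passive setup needs. With a resolver that returns several addresses, the list is translated once and walked as before.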

I still don't quite understand how retranslating the DNS will allow you to be more fault tolerant unless the DNS knows to return an address that is working when the one it has is not working. Hint?

Andy


On Wed, 8 Jun 2011, Andreas-Joachim Peters wrote:

Hi,
I have the following request to change some basic behaviour of the xrootd
client code.

Currently, when XrdClient::Open or XrdClientAdmin::Connect is issued, any
DNS alias is resolved only once at the beginning and then stays forever as
the target in the loop which honours the settings for retry/reconnect etc.

Andy ... are there any objections to changing this behaviour and resolving
the alias again before each retry? I would need that behaviour to have an
active/passive failover via DNS alias configured.

The code change is trivial (two lines inserted).

Cheers Andreas.