OK,
this can be done even more easily, because the logic is structured like this:

<resolve URLs>;
for (try = 0; try < maxretry; ....) {
   while (<resolved URLs left>) {
      // try to connect
      if (failed) <remove tried URL>;
   }
}

So I propose to resolve inside the retry loop instead. All aliases will still be tried in the inner loop:

for (try = 0; try < maxretry; ....) {
   <resolve URLs>;
   while (<resolved URLs left>) {
      // try to connect
      if (failed) <remove tried URL>;
   }
}
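
For illustration only, the proposed loop could look roughly like the C++ sketch below. The names connectWithRetry, Resolver and Connector are hypothetical stand-ins and not part of the XrdClient code; the resolver and the connect attempt are passed in as callbacks so the sketch stays self-contained.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for DNS resolution and a connect attempt;
// these are NOT the XrdClient API.
using Resolver  = std::function<std::vector<std::string>(const std::string&)>;
using Connector = std::function<bool(const std::string&)>;

// Returns the address that accepted the connection,
// or an empty string if every retry failed.
std::string connectWithRetry(const std::string& alias, int maxRetry,
                             const Resolver& resolve,
                             const Connector& tryConnect)
{
    for (int attempt = 0; attempt < maxRetry; ++attempt) {
        // Resolve inside the retry loop so that a changed alias is picked up.
        std::vector<std::string> left = resolve(alias);

        // All addresses of this resolution are still tried in the inner loop.
        while (!left.empty()) {
            std::string addr = left.back();
            left.pop_back();                  // remove the tried address
            if (tryConnect(addr)) return addr;
        }
    }
    return "";
}
```

With a resolver that returns a dead host on the first call and the production host afterwards, the second retry connects to the production host.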

Is that ok?

Cheers Andreas.

On Wed, Jun 8, 2011 at 11:00 AM, Andrew Hanushevsky wrote:
Hi Andreas,

If you limit it to a list size of 1 then it would be perfect. Please submit the patch.


Andy

On Wed, 8 Jun 2011, Andreas-Joachim Peters wrote:

Hi Andy,
the DNS is manipulated so that it always points to the working machine (the
machine in production).

If we could do it only in the case where the list size = 1, that would be
perfectly fine and the default case for real load balancing would still be
like before.

Cheers Andreas.

On Wed, Jun 8, 2011 at 5:02 AM, Andrew Hanushevsky wrote:

Hi Andreas,

I think the issue here is that the list is not re-translated every time
because the client picks (in random order) through the list when it
reconnects. If the DNS itself returns the list in random order (as most
do now), then the process may, at best, become inefficient (i.e., failing
hosts being unnecessarily retried) and, at worst, never converge.

Most larger sites set up a DNS entry with multiple addresses so that they
can automatically fail over using this mode of operation. As I look at the
code, this process works only if the list of addresses is stable (at least
until all of them have been tried once). That is why the list is translated
only once.

Three options exist: a) the easy one is to have an option (e.g. an envar)
control the behaviour, with the old behaviour being the default; b)
retranslate the list only after all entries have been tried (I think this is
much harder); and c) retranslate only if the DNS call returned a single
entry (I don't know if this is really what you are after, but it's the safest).
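
As a rough sketch only, options (a) and (c) could be expressed as a small policy check in C++. The names Reresolve, parsePolicy and shouldReresolve, and the setting values "always" and "single", are hypothetical and do not exist in the xrootd client; an unset or unrecognised value keeps the old behaviour.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical policy values; none of these names exist in xrootd.
enum class Reresolve { Never, Always, SingleEntryOnly };

// Option (a): an environment-style setting selects the behaviour.
// 'value' is the raw setting (may be null); the old behaviour is the default.
Reresolve parsePolicy(const char* value)
{
    if (value == nullptr) return Reresolve::Never;
    if (std::strcmp(value, "always") == 0) return Reresolve::Always;
    if (std::strcmp(value, "single") == 0) return Reresolve::SingleEntryOnly;
    return Reresolve::Never;
}

// Option (c): retranslate only if the last DNS call returned one entry.
bool shouldReresolve(Reresolve policy, std::size_t lastListSize)
{
    switch (policy) {
        case Reresolve::Always:          return true;
        case Reresolve::SingleEntryOnly: return lastListSize == 1;
        case Reresolve::Never:           return false;
    }
    return false;
}
```

A caller might obtain the setting with std::getenv on a variable name of its choosing and consult shouldReresolve before each retry.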

I still don't quite understand how retranslating the DNS will allow you to
be more fault tolerant unless the DNS knows to return an address that is
working when the one it has is not working. Hint?

Andy


On Wed, 8 Jun 2011, Andreas-Joachim Peters wrote:

Hi,
I have the following request to change some basic behaviour of the xrootd
client code.

Currently, when an XrdClient::Open or XrdClientAdmin::Connect is issued, any
DNS alias is resolved only once at the beginning and then stays forever as
the target in the loop which honours the settings for retry/reconnect etc.

Andy ... are there any objections to changing this behaviour and resolving
the alias again before each retry? I would need that behaviour to configure
an active/passive failover via a DNS alias.

The code change is trivial (two lines inserted).

Cheers Andreas.