ok, I hope I understand this better now (after some more tests).

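The easy change I had tried

break ;
->
  if (locallogid != -1) break;

is not safe, since it can lead to an infinite loop: a failed url is never removed from urlArray, so the while condition stays true if every connect fails. Something like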

                    if (locallogid != -1) break;
                    else urlArray.EraseUrl(thisUrl);
                } else {

makes more sense.
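
To make the idea concrete, here is a minimal, self-contained sketch of the "break on success, erase on failure" loop. It is not the real client code: UrlSet, TryConnect and the connect callback are hypothetical stand-ins for urlArray and fConnModule->Connect(), and the "seed modulo size" pick is only an assumption about how a deterministic choice could behave.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the client's url set (not the real xrootd class).
struct UrlSet {
    std::vector<std::string> hosts;

    std::size_t Size() const { return hosts.size(); }

    // Assumed deterministic pick: same seed and same list give the same entry.
    const std::string& Get(unsigned int seed) const {
        assert(!hosts.empty());
        return hosts[seed % hosts.size()];
    }

    // Remove the first entry equal to `host`, if any.
    void Erase(const std::string& host) {
        for (std::size_t i = 0; i < hosts.size(); ++i) {
            if (hosts[i] == host) {
                hosts.erase(hosts.begin() + i);
                return;
            }
        }
    }
};

// Returns the log id of the first successful connect, or -1 if all urls failed.
// `connect` is a hypothetical callback playing the role of Connect().
// `tried`, if given, records the hosts in the order they were attempted.
int TryConnect(UrlSet urls, unsigned int seed,
               const std::function<int(const std::string&)>& connect,
               std::vector<std::string>* tried = nullptr) {
    while (urls.Size() > 0) {
        // Copy the host: Erase() below would invalidate a reference.
        const std::string host = urls.Get(seed);
        if (tried) tried->push_back(host);

        const int logid = connect(host);
        if (logid != -1) return logid;  // connected, stop here

        // Failed: drop this url so the set shrinks and the loop terminates.
        urls.Erase(host);
    }
    return -1;
}
```

With two hosts and a callback that fails on the first one, the loop moves on to the second; if both fail, it returns -1 after two attempts instead of spinning forever.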

That said, I think I understand the general behaviour now:

1- the client gets an array of possible urls (size 2 in our tests).
2- from that, it always chooses the same entry # in the array (getRandom is not random for a given file, since the seed is derived from the file name)
3- a connect failure does not lead to “try another”, but to “try again from scratch”, from point #1
4- if the list of urls you get back at #1 is always in the same order (due to local NSCP caching), you always choose the same entry #, which maps to the same url, so if you failed at first, you fail forever

If I am not wrong, the only HA mechanism here relies on the fact that DNS randomizes the order of the aliased IPs. It does, but unfortunately NSCP caching cancels that out.
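
A small sketch of why the ordering matters, under stated assumptions: SeedFromFile uses an FNV-1a hash purely as a stand-in for the CRC32 of the file name, and PickHost assumes the choice is "seed modulo list size". Both names are hypothetical and the real client may pick differently; the point is only that a fixed seed plus a fixed list order gives a fixed host.

```cpp
#include <cassert>
#include <string>
#include <vector>

// FNV-1a hash of the file name; a stand-in for the CRC32 seed in the client.
unsigned int SeedFromFile(const std::string& file) {
    unsigned int h = 2166136261u;
    for (unsigned char c : file) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Assumed selection rule: the index depends only on the file name and list size.
const std::string& PickHost(const std::vector<std::string>& hosts,
                            const std::string& file) {
    assert(!hosts.empty());
    return hosts[SeedFromFile(file) % hosts.size()];
}
```

For a given file, PickHost returns the same host as long as the list comes back in the same order; if the resolver rotates a two-entry list, the same index lands on the other host, which is the only thing that lets a retry from scratch reach a different server.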

tom
On 24 Apr 2014, at 11:57, Tommaso Boccali <[log in to unmask]> wrote:

> ciao fabrizio, thanks, indeed that makes sense. I was just saying that there is then no intrinsic randomness, and you (the client) have to rely on the “else” to try all the clients.
> 
> I may well be chasing ghosts, but I read
> 
>        while (urlArray.Size() > 0) {
> 
>          unsigned int seed = XrdOucCRC::CRC32((const unsigned char*)unfo.File.c_str(), unfo.File.length());
> 
>            // Get an url from the available set
>            if ((thisUrl = urlArray.GetARandomUrl(seed))) {
>                std::cout <<" TOMMASO GOT "<<thisUrl->Host<<std::endl;
> 
>                if (fConnModule->CheckHostDomain(thisUrl->Host)) {
>                    nogoodurl = FALSE;
> 
>                    Info(XrdClientDebug::kHIDEBUG, "Open", "Trying to connect to " <<
>                         thisUrl->Host << ":" << thisUrl->Port << ". Connect try " <<
>                         connectTry+1);
>                    locallogid = fConnModule->Connect(*thisUrl, this);
>                    // To find out if we have tried the whole URLs set
>                    std::cout <<" TOMMASO GOT2 "<<locallogid<<std::endl;
>                    urlstried++;
>>>>>>                   break;
>                } else {
>                    // Invalid domain: drop the url and move to next, if any
>                    urlArray.EraseUrl(thisUrl);
>                    continue;
>                }
>            }
>        }
> 
> so we “break” in any case, even if fConnModule->Connect() failed.
> 
> I tentatively changed this to 
> 
>                    if (locallogid != -1) break;
> 
> and now it tries again with the other server if the first connection fails
> 
> 
> probably what I say does not make sense, but otherwise I cannot see how you spot a failed connection without checking the result of “connect” …
> 
> tom
> 
> On 24 Apr 2014, at 11:29, Fabrizio Furano <[log in to unmask]> wrote:
> 
>> Hi Matevz,
>> 
>> I think that I can comment on this historical item.
>> Yes, the reproducibility was explicitly requested a couple of
>> aeons ago. It made sense to me too, easing the debugging.
>> 
>> For the other issue with attempt #7 I don't know...
>> 
>> Fabrizio
>> 
>> 
>> On 24/04/14 11:20, Tommaso Boccali wrote:
>>> please note another thing:
>>> 
>>>          unsigned int seed = XrdOucCRC::CRC32((const unsigned
>>> char*)unfo.File.c_str(), unfo.File.length());
>>> 
>>> so for a given file the same random number is always spit out, is this wanted?
>>> clearly to have this working one needs the “else” to be operational
>>> above, otherwise there is neither randomness nor retry ….
>> 
>> ########################################################################
>> Use REPLY-ALL to reply to list
>> 
>> To unsubscribe from the XROOTD-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
> 
