OK, I hope I understand this better now (after some more tests).
The easy change I had tried
break ;
->
if (locallogid != -1) break;
is not safe, since it can lead to an infinite loop: a url that fails to connect is never erased from urlArray, so the while loop may never end. Something like
if (locallogid != -1) break;
else urlArray.EraseUrl(thisUrl);
} else {
makes more sense.
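
To make the termination argument concrete, here is a minimal self-contained sketch of that loop shape. It is not the XrdClient code: Url, connectToAny, pick and connect are hypothetical stand-ins for urlArray, GetARandomUrl and fConnModule->Connect, and only the "erase on failure" idea is taken from the snippet above.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of urlArray.
struct Url {
    std::string host;
    int port;
};

// Tries candidates until one connects.
// `pick` maps the number of remaining candidates to an index (stand-in for
// GetARandomUrl); `connect` returns a log id, or -1 on failure (stand-in for
// fConnModule->Connect). Both are assumptions made for this sketch.
int connectToAny(std::vector<Url> urls,
                 const std::function<std::size_t(std::size_t)>& pick,
                 const std::function<int(const Url&)>& connect) {
    while (!urls.empty()) {
        const std::size_t i = pick(urls.size()) % urls.size();
        const int logid = connect(urls[i]);
        if (logid != -1) {
            return logid;  // connected: stop here
        }
        // Failed: erase the candidate so it cannot be picked again.
        // The list shrinks on every failure, so the loop must terminate.
        urls.erase(urls.begin() + static_cast<std::ptrdiff_t>(i));
    }
    return -1;  // every candidate failed
}
```

With this shape the number of connect attempts is bounded by the number of urls, even when the picker always returns the same index.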
That said, I think I understand the general behaviour now:
1- the client gets an array of possible urls (size 2 in our tests).
2- from that, it always chooses the same entry # in the array (GetARandomUrl is seeded with the CRC32 of the file name, so for a given file it is not random at all)
3- a connect failure does not lead to “try another”, but to “try again from scratch”, from point #1
4- if the list of urls you get back at #1 is always in the same order (due to local NSCP caching), you always choose the same entry #, which maps to the same url, so if you failed at first, you fail forever
If I am not wrong, the only HA mechanism here relies on the fact that DNS randomizes the order of the aliased IPs - which it does, but unfortunately NSCP caching cancels that out
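
Points 2 and 4 can be illustrated with a small sketch. The assumptions are loud ones: fileSeed uses FNV-1a only as a self-contained stand-in for XrdOucCRC::CRC32, and pickHost assumes the seed is simply reduced to an index, which may not be how GetARandomUrl really works. The only property that matters is that the choice is a pure function of the seed and of the list order.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for XrdOucCRC::CRC32 on the file name (FNV-1a, chosen only to keep
// the sketch self-contained). Same file name in, same seed out.
std::uint32_t fileSeed(const std::string& file) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : file) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Assumed selection rule: reduce the seed to an index into the host list.
// Precondition: hosts is not empty.
const std::string& pickHost(const std::vector<std::string>& hosts,
                            std::uint32_t seed) {
    return hosts[seed % hosts.size()];
}
```

For a fixed file name the index never changes, so the chosen host changes only if the order of the list changes. If a cache always returns the list in the same order, every retry lands on the same host.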
tom
On 24 Apr 2014, at 11:57, Tommaso Boccali <[log in to unmask]> wrote:
> ciao fabrizio, thanks, indeed that makes sense. I was just saying that then there is no intrinsic randomness, and you (the client) have to rely on the “else” to try all the clients.
>
> Indeed I can be chasing ghosts, but I read
>
> while (urlArray.Size() > 0) {
>
> unsigned int seed = XrdOucCRC::CRC32((const unsigned char*)unfo.File.c_str(), unfo.File.length());
>
> // Get an url from the available set
> if ((thisUrl = urlArray.GetARandomUrl(seed))) {
> std::cout <<" TOMMASO GOT "<<thisUrl->Host<<std::endl;
>
> if (fConnModule->CheckHostDomain(thisUrl->Host)) {
> nogoodurl = FALSE;
>
> Info(XrdClientDebug::kHIDEBUG, "Open", "Trying to connect to " <<
> thisUrl->Host << ":" << thisUrl->Port << ". Connect try " <<
> connectTry+1);
> locallogid = fConnModule->Connect(*thisUrl, this);
> // To find out if we have tried the whole URLs set
> std::cout <<" TOMMASO GOT2 "<<locallogid<<std::endl;
> urlstried++;
>>>>>> break;
> } else {
> // Invalid domain: drop the url and move to next, if any
> urlArray.EraseUrl(thisUrl);
> continue;
> }
> }
> }
>
> so we “break” in any case, even if fConnModule->Connect() failed.
>
> I tentatively changed this to
>
> if (locallogid != -1) break;
>
> and now it tries again with the other server if the first connection fails
>
>
> probably what I say does not make sense, but I cannot see otherwise how you spot a failed connection without checking the result of “connect” …
>
> tom
>
> On 24 Apr 2014, at 11:29, Fabrizio Furano <[log in to unmask]> wrote:
>
>> Hi Matevz,
>>
>> I think that I can comment on this historical item.
>> Yes, the reproducibility was explicitly requested a couple of
>> aeons ago. It made sense to me too, easing the debugging.
>>
>> For the other issue with attempt #7 I don't know...
>>
>> Fabrizio
>>
>>
>> On 24/04/14 11:20, Tommaso Boccali wrote:
>>> please note another thing:
>>>
>>> unsigned int seed = XrdOucCRC::CRC32((const unsigned
>>> char*)unfo.File.c_str(), unfo.File.length());
>>>
>>> so per file always the same random number is spit out, is this wanted?
>>> clearly to have this working one needs the “else” to be operational
>>> above, otherwise there is neither randomness nor retry ….
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the XROOTD-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>