ok, I hope I understood better now (plus some tests).

The easy change I had tried,

    break;
    ->
    if (locallogid != -1) break;

is not safe, since it can lead to an infinite loop. Something like

    if (locallogid != -1) break;
    else urlArray.EraseUrl(thisUrl);
    } else {

makes more sense.

That said, I sort of understand the general behaviour now:

1- the client gets an array of possible urls (size 2 in our tests).
2- from that, it always chooses the same entry # in the array (GetARandomUrl is not random for a given file, since its seed is a CRC32 computed from the file)
3- a connect failure does not lead to “try another”, but to “try again from scratch, from point #1”
4- if the list of urls you get back at #1 is always in the same order (due to local NSCP caching), you always choose the same entry #, which maps to the same url, so if you failed at first, you fail forever

If I am not wrong, the only HA mechanism here relies on the fact that DNS randomizes the order of the aliased IPs - which it does, but unfortunately NSCP caching cancels that out.

tom

On 24 Apr 2014, at 11:57, Tommaso Boccali wrote:

> ciao fabrizio, thanks, indeed that makes sense. I was just saying that then there is no intrinsic randomness, and you (the client) have to rely on the “else” to try all the clients.
>
> Indeed I may be chasing ghosts, but I read
>
> while (urlArray.Size() > 0) {
>
>   unsigned int seed = XrdOucCRC::CRC32((const unsigned char*)unfo.File.c_str(), unfo.File.length());
>
>   // Get an url from the available set
>   if ((thisUrl = urlArray.GetARandomUrl(seed))) {
>     std::cout <<" TOMMASO GOT "<<thisUrl->Host<<std::endl;
>
>     if (fConnModule->CheckHostDomain(thisUrl->Host)) {
>       nogoodurl = FALSE;
>
>       Info(XrdClientDebug::kHIDEBUG, "Open", "Trying to connect to " <<
>            thisUrl->Host << ":" << thisUrl->Port << ". Connect try " <<
>            connectTry+1);
>       locallogid = fConnModule->Connect(*thisUrl, this);
>       // To find out if we have tried the whole URLs set
>       std::cout <<" TOMMASO GOT2 "<<locallogid<<std::endl;
>       urlstried++;
>>>>>>       break;
>     } else {
>       // Invalid domain: drop the url and move to next, if any
>       urlArray.EraseUrl(thisUrl);
>       continue;
>     }
>   }
> }
>
> so we “break” in any case, even if fConnModule->Connect( failed.
>
> I tentatively changed this to
>
> if (locallogid != -1) break;
>
> and now it tries again with the other server if the first connection fails
>
> probably what I say does not make sense, but I cannot see otherwise how you spot a failed connection without checking the result of “connect” …
>
> tom
>
> On 24 Apr 2014, at 11:29, Fabrizio Furano wrote:
>
>> Hi Matevz,
>>
>> I think that I can comment on this historical item.
>> Yes, the reproducibility was explicitly requested a couple of
>> aeons ago. It made sense to me too, easing the debugging.
>>
>> For the other issue with attempt #7 I don't know...
>>
>> Fabrizio
>>
>>
>> On 24/04/14 11:20, Tommaso Boccali wrote:
>>> please note another thing:
>>>
>>> unsigned int seed = XrdOucCRC::CRC32((const unsigned
>>> char*)unfo.File.c_str(), unfo.File.length());
>>>
>>> so per file always the same random number is spit out, is this wanted?
>>> clearly to have this working one needs the “else” to be operational
>>> above, otherwise there is neither randomness, nor retry ….
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the XROOTD-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
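
For reference, here is a minimal standalone sketch of the "break on success, erase on failure" idea discussed at the top of this thread. It is not XrdClient code: Url, pickIndex and connectWithFailover are hypothetical names made up for illustration, the seed % size pick is only an assumed stand-in for a deterministic per-file choice, and the connect callback stands in for fConnModule->Connect, returning -1 on failure as locallogid does in the quoted snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the client's URL set.
struct Url {
    std::string host;
    int port;
};

// Deterministic pick: the same seed and list size always give the same
// index, mimicking a seed derived from the file (assumption, not the
// real GetARandomUrl logic).
std::size_t pickIndex(unsigned int seed, std::size_t size) {
    assert(size > 0);
    return seed % size;
}

// Try candidates until one connects. A failed connect erases that URL,
// so the list shrinks on every failure and the loop must terminate.
// Returns the connection id, or -1 if every URL failed.
// `tried` records the hosts in the order they were attempted.
int connectWithFailover(std::vector<Url> urls, unsigned int seed,
                        const std::function<int(const Url&)>& connect,
                        std::vector<std::string>& tried) {
    while (!urls.empty()) {
        const std::size_t i = pickIndex(seed, urls.size());
        tried.push_back(urls[i].host);

        const int logid = connect(urls[i]);
        if (logid != -1) {
            return logid;  // connected: stop here
        }
        // failed: drop this URL so the next pick cannot return it again
        urls.erase(urls.begin() + static_cast<std::ptrdiff_t>(i));
    }
    return -1;
}
```

With two candidates where only the second accepts connections, the first attempt fails, that entry is erased, and the second attempt picks the remaining server; if both fail, the list runs empty and the function returns -1 instead of looping.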