I thought that DNS RR served as lightweight load balancing between 
multiple redirectors. But now I see that each redirector knows about 
the others, and that they can be configured either for fail-over or to 
distribute the load among themselves.

Thanks, I will try that
Pavel
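
For reference, the fail-over behaviour discussed in this thread (get every address behind one name, retry each a fixed number of times, then move on to the next) can be sketched as follows. This is only an illustrative sketch using the Python standard library, not XrdClient code; the helper names `resolve_all`, `tcp_connect` and `connect_first_available`, and the defaults for `tries_per_addr` and `timeout`, are assumptions made for the example.

```python
import socket


def resolve_all(host, port):
    """Return every distinct (family, sockaddr) the resolver reports for host:port."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addrs = []
    for family, _socktype, _proto, _canon, sockaddr in infos:
        if (family, sockaddr) not in addrs:
            addrs.append((family, sockaddr))
    return addrs


def tcp_connect(addr, timeout=5.0):
    """Open a TCP connection to one (family, sockaddr) pair."""
    family, sockaddr = addr
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_first_available(addrs, connect=tcp_connect, tries_per_addr=2):
    """Try each address up to tries_per_addr times, then move to the next one."""
    last_error = None
    for addr in addrs:
        for _attempt in range(tries_per_addr):
            try:
                return addr, connect(addr)
            except OSError as err:
                # Remember the failure and keep going: retry, then next address.
                last_error = err
    raise ConnectionError(f"no address reachable out of {len(addrs)}") from last_error
```

With the names from this thread, one would call `resolve_all("xrdstar.rcf.bnl.gov", 1097)` and pass the result to `connect_first_available`. If the first address refuses the connection, the second is tried instead of sleeping and retrying the same host.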


Fabrizio Furano wrote:
> Hi Pavel,
>
>  the "problem" is the DNS RR. The client receives only one IP address 
> and keeps it. Instead of using RR at the DNS level, you should create 
> an alias and make sure that the DNS returns all of the aliased 
> addresses when asked to translate the name.
>
>  For example, this is the output of nslookup in the case of the 
> redirectors at SLAC:
>
> fabrizio@bradipo 10:01:26 ~>nslookup kanolb-a.slac.stanford.edu
> Server:         192.84.143.16
> Address:        192.84.143.16#53
>
> Non-authoritative answer:
> Name:   kanolb-a.slac.stanford.edu
> Address: 134.79.85.23
> Name:   kanolb-a.slac.stanford.edu
> Address: 134.79.85.24
>
>
>  as you can see, both IPs are returned to the client at the same time, 
> and they will be considered during the connection phase.
>
>  Fabrizio
>
> Pavel Jakl wrote:
>> Hi Fabrizio and Andy,
>>
>> I am not sure if we discussed this before, but let me explain my 
>> problem. When Andy implemented multiple redirectors for clusters 
>> bigger than 64 servers, I thought that it would give us full 
>> recoverability in case something happened to the host acting as 
>> redirector.
>> I ran a few tests and found that the client is not ready for that. 
>> Maybe I am doing something wrong, so let me explain.
>> We have a DNS RR entry containing 2 servers configured as full 
>> redirectors and managers of the cluster.
>> [The example: DNS RR - xrdstar.rcf.bnl.gov and 2 redirectors 
>> (rcas6132, rcas6182)]
>>
>> The problem is that the client initially resolves one of the 
>> redirectors, but if that particular host is down, the client doesn't 
>> try to connect to the second redirector. It doesn't even keep track 
>> of the servers that are available under the DNS RR. I don't mind the 
>> client trying to connect a fixed number of times, but when it is not 
>> successful, it should move on to the other server under the DNS RR.
>>
>> As you can see in the example, it resolves rcas6132, which was 
>> temporarily down, but doesn't try the second one, rcas6182... How do 
>> you handle this at SLAC? What am I doing wrong?
>>
>> Thanks
>> Pavel
>>
>> CINT/ROOT C/C++ Interpreter version 5.16.13, June 8, 2006
>> Type ? for help. Commands must be C++ statements.
>> Enclose multiple statements between { }.
>> *** Float Point Exception is OFF ***
>> *** Start at Date : Sun Apr 22 08:07:06 2007
>> QAInfo:You are using STAR_LEVEL : dev, ROOT_LEVEL : 5.12.00 and node 
>> : rcas6009.rcf.bnl.gov
>> root4star [0]
>> Processing XROOTD_macro.C...
>> 070422 08:07:06 001 Xrd: Create: (C) 2004 SLAC INFN XrdClient 
>> kXR_ver002+kXR_asyncap
>> 070422 08:07:06 001 Xrd: TakeUrl: parsing url:
>> 070422 08:07:06 001 Xrd: GetDomainToMatch: 
>> GetHostName(rcas6009.rcf.bnl.gov) returned name=rcas6009.rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: GetDomainToMatch: 
>> GetDomain(rcas6009.rcf.bnl.gov) --> rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: parsing: 
>> root://xrdstar.rcf.bnl.gov:1097//data1/reco/productionCentral/FullField/P05ic/2004/053/st_physics_5053078_raw_3020016.MuDst.root 
>>
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: protocol: root
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: file: 
>> /data1/reco/productionCentral/FullField/P05ic/2004/053/st_physics_5053078_raw_3020016.MuDst.root 
>>
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: list of [host:port] : 
>> xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: Remote file to open is 
>> '/data1/reco/productionCentral/FullField/P05ic/2004/053/st_physics_5053078_raw_3020016.MuDst.root' 
>>
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: parsing entity: 
>> xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: TakeUrl: parsing url: xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: TakeUrl:    HostWPort:   
>> xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: TakeUrl:    File:   /
>> 070422 08:07:06 001 Xrd: TakeUrl:    Host:   xrdstar.rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: TakeUrl:    Port:   1097
>> 070422 08:07:06 001 Xrd: ConvertDNSAlias: resolving 
>> xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: CheckPort: specified port (1097) potentially 
>> valid.
>> 070422 08:07:06 001 Xrd: ConvertDNSAlias: found host 
>> rcas6132.rcf.bnl.gov with addr 130.199.206.182
>> 070422 08:07:06 001 Xrd: ShowUrls: The converted URLs count is 1
>> 070422 08:07:06 001 Xrd: ShowUrls: URL n.1: 
>> root://rcas6132.rcf.bnl.gov:1097//data1/reco/productionCentral/FullField/P05ic/2004/053/st_physics_5053078_raw_3020016.MuDst.root. 
>>
>> 070422 08:07:06 001 Xrd: GetDomainToMatch: 
>> GetHostName(rcas6132.rcf.bnl.gov) returned name=rcas6132.rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: GetDomainToMatch: 
>> GetDomain(rcas6132.rcf.bnl.gov) --> rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: CheckHostDomain: Resolved 
>> [rcas6132.rcf.bnl.gov]'s domain name into [rcf.bnl.gov]
>> 070422 08:07:06 001 Xrd: DomainMatcher: search for 'rcf.bnl.gov' in 
>> '<unknown>'
>> 070422 08:07:06 001 Xrd: DomainMatcher: checking domain: <unknown>
>> 070422 08:07:06 001 Xrd: DomainMatcher: no domain matching 
>> 'rcf.bnl.gov' found in '<unknown>'
>> 070422 08:07:06 001 Xrd: DomainMatcher: search for 'rcf.bnl.gov' in 
>> 'rcf.bnl.gov|usatlas.bnl.gov'
>> 070422 08:07:06 001 Xrd: DomainMatcher: checking domain: rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: DomainMatcher: domain: rcf.bnl.gov matches 
>> 'rcf.bnl.gov' (matching chars: 11)
>> 070422 08:07:06 001 Xrd: CheckHostDomain: Access granted to the 
>> domain of [rcas6132.rcf.bnl.gov].
>> 070422 08:07:06 001 Xrd: Open: Trying to connect to 
>> rcas6132.rcf.bnl.gov:1097. Connect try 1
>> 070422 08:07:06 001 Xrd: XrdClientConn: Trying to connect to 
>> 130.199.206.182:1097
>> 070422 08:07:06 001 Xrd: Connect: Creating a logical connection...
>> 070422 08:07:06 001 Xrd: Connect: Physical connection not found. 
>> Creating a new one...
>> 070422 08:07:06 001 Xrd: Touch: Setting last use to current 
>> time1177243626
>> 070422 08:07:06 001 Xrd: Connect: Connecting to 
>> [rcas6132.rcf.bnl.gov:1097]
>> 070422 08:07:06 001 Xrd: ClientSock::TryConnect: Trying to connect 
>> torcas6132.rcf.bnl.gov(130.199.206.182):1097 Timeout=60
>> 070422 08:07:06 001 Xrd: ClientSock::TryConnect: Connection 
>> torcas6132.rcf.bnl.gov:1097 failed. (-1)
>> 070422 08:07:06 001 Xrd: Connect: can't open connection to 
>> [rcas6132.rcf.bnl.gov:1097]
>> 070422 08:07:06 001 Xrd: PhyConnection: Disconnecting socket...
>> 070422 08:07:06 001 Xrd: XrdClientPhyConnection: Destroying. [:-1]
>> 070422 08:07:06 001 Xrd: PhyConnection: Disconnecting socket...
>> 070422 08:07:06 001 Xrd: Connect: Connect(rcas6132.rcf.bnl.gov, 1097) 
>> returned -1
>> 070422 08:07:06 001 Xrd: XrdNetFile: Error creating logical 
>> connection to rcas6132.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: Open: Disconnecting.
>> 070422 08:07:06 001 Xrd: Open: Connection attempt failed. Sleeping 20 
>> seconds.
>>
>>
>>
>>