I thought that DNS RR served as lightweight load balancing between
multiple redirectors. But now I see that each redirector knows about
the others and that they can be configured for fail-over or to
distribute the load among themselves.

Thanks, I will try that.

Pavel

Fabrizio Furano wrote:
> Hi Pavel,
>
> the "problem" is the DNS RR. The client receives only one IP addr and
> keeps that. Instead of using RR at the DNS level, you should only
> create an alias and make sure that the DNS returns all the aliases
> when asked to translate the addr.
>
> For example, this is the output of nslookup in the case of the
> redirectors at SLAC:
>
> fabrizio@bradipo 10:01:26 ~>nslookup kanolb-a.slac.stanford.edu
> Server: 192.84.143.16
> Address: 192.84.143.16#53
>
> Non-authoritative answer:
> Name: kanolb-a.slac.stanford.edu
> Address: 134.79.85.23
> Name: kanolb-a.slac.stanford.edu
> Address: 134.79.85.24
>
> As you can see, both IPs are returned to the client at the same time,
> and they will be considered during the connection phase.
>
> Fabrizio
>
> Pavel Jakl wrote:
>> Hi Fabrizio and Andy,
>>
>> I am not sure if we discussed this before, but let me explain my
>> problem. When Andy implemented multiple redirectors for clusters
>> bigger than 64 servers, I thought that it would bring us full
>> recoverability in case something happened to the host acting as
>> redirector.
>> I did a few tests and found that the client is not ready for that.
>> Maybe I am doing something wrong, so let me explain it.
>> We have a DNS RR containing 2 servers configured as full redirectors
>> and managers of the cluster.
>> [The example: DNS RR - xrdstar.rcf.bnl.gov and 2 redirectors
>> (rcas6132, rcas6182)]
>>
>> The problem is that the client initially resolves one of the
>> redirectors, but if that particular host is down, the client doesn't
>> try to connect to the second redirector.
>> It doesn't even keep track of the servers which are available under
>> the DNS RR. I have nothing against the client trying to connect a
>> fixed number of times, but when it is not successful, it should move
>> on to the other server under the DNS RR.
>>
>> As you can see in the example, it resolves rcas6132, which was
>> temporarily down, but doesn't try the second one, rcas6182. How do
>> you handle this at SLAC? What am I doing wrong?
>>
>> Thanks
>> Pavel
>>
>> CINT/ROOT C/C++ Interpreter version 5.16.13, June 8, 2006
>> Type ? for help. Commands must be C++ statements.
>> Enclose multiple statements between { }.
>> *** Float Point Exception is OFF ***
>> *** Start at Date : Sun Apr 22 08:07:06 2007
>> QAInfo:You are using STAR_LEVEL : dev, ROOT_LEVEL : 5.12.00 and node : rcas6009.rcf.bnl.gov
>> root4star [0]
>> Processing XROOTD_macro.C...
>> 070422 08:07:06 001 Xrd: Create: (C) 2004 SLAC INFN XrdClient kXR_ver002+kXR_asyncap
>> 070422 08:07:06 001 Xrd: TakeUrl: parsing url:
>> 070422 08:07:06 001 Xrd: GetDomainToMatch: GetHostName(rcas6009.rcf.bnl.gov) returned name=rcas6009.rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: GetDomainToMatch: GetDomain(rcas6009.rcf.bnl.gov) --> rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: parsing: root://xrdstar.rcf.bnl.gov:1097//data1/reco/productionCentral/FullField/P05ic/2004/053/st_physics_5053078_raw_3020016.MuDst.root
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: protocol: root
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: file: /data1/reco/productionCentral/FullField/P05ic/2004/053/st_physics_5053078_raw_3020016.MuDst.root
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: list of [host:port] : xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: Remote file to open is '/data1/reco/productionCentral/FullField/P05ic/2004/053/st_physics_5053078_raw_3020016.MuDst.root'
>> 070422 08:07:06 001 Xrd: XrdClientUrlSet: parsing entity: xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: TakeUrl: parsing url: xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: TakeUrl: HostWPort: xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: TakeUrl: File: /
>> 070422 08:07:06 001 Xrd: TakeUrl: Host: xrdstar.rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: TakeUrl: Port: 1097
>> 070422 08:07:06 001 Xrd: ConvertDNSAlias: resolving xrdstar.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: CheckPort: specified port (1097) potentially valid.
>> 070422 08:07:06 001 Xrd: ConvertDNSAlias: found host rcas6132.rcf.bnl.gov with addr 130.199.206.182
>> 070422 08:07:06 001 Xrd: ShowUrls: The converted URLs count is 1
>> 070422 08:07:06 001 Xrd: ShowUrls: URL n.1: root://rcas6132.rcf.bnl.gov:1097//data1/reco/productionCentral/FullField/P05ic/2004/053/st_physics_5053078_raw_3020016.MuDst.root.
>> 070422 08:07:06 001 Xrd: GetDomainToMatch: GetHostName(rcas6132.rcf.bnl.gov) returned name=rcas6132.rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: GetDomainToMatch: GetDomain(rcas6132.rcf.bnl.gov) --> rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: CheckHostDomain: Resolved [rcas6132.rcf.bnl.gov]'s domain name into [rcf.bnl.gov]
>> 070422 08:07:06 001 Xrd: DomainMatcher: search for 'rcf.bnl.gov' in '<unknown>'
>> 070422 08:07:06 001 Xrd: DomainMatcher: checking domain: <unknown>
>> 070422 08:07:06 001 Xrd: DomainMatcher: no domain matching 'rcf.bnl.gov' found in '<unknown>'
>> 070422 08:07:06 001 Xrd: DomainMatcher: search for 'rcf.bnl.gov' in 'rcf.bnl.gov|usatlas.bnl.gov'
>> 070422 08:07:06 001 Xrd: DomainMatcher: checking domain: rcf.bnl.gov
>> 070422 08:07:06 001 Xrd: DomainMatcher: domain: rcf.bnl.gov matches 'rcf.bnl.gov' (matching chars: 11)
>> 070422 08:07:06 001 Xrd: CheckHostDomain: Access granted to the domain of [rcas6132.rcf.bnl.gov].
>> 070422 08:07:06 001 Xrd: Open: Trying to connect to rcas6132.rcf.bnl.gov:1097. Connect try 1
>> 070422 08:07:06 001 Xrd: XrdClientConn: Trying to connect to 130.199.206.182:1097
>> 070422 08:07:06 001 Xrd: Connect: Creating a logical connection...
>> 070422 08:07:06 001 Xrd: Connect: Physical connection not found. Creating a new one...
>> 070422 08:07:06 001 Xrd: Touch: Setting last use to current time1177243626
>> 070422 08:07:06 001 Xrd: Connect: Connecting to [rcas6132.rcf.bnl.gov:1097]
>> 070422 08:07:06 001 Xrd: ClientSock::TryConnect: Trying to connect torcas6132.rcf.bnl.gov(130.199.206.182):1097 Timeout=60
>> 070422 08:07:06 001 Xrd: ClientSock::TryConnect: Connection torcas6132.rcf.bnl.gov:1097 failed. (-1)
>> 070422 08:07:06 001 Xrd: Connect: can't open connection to [rcas6132.rcf.bnl.gov:1097]
>> 070422 08:07:06 001 Xrd: PhyConnection: Disconnecting socket...
>> 070422 08:07:06 001 Xrd: XrdClientPhyConnection: Destroying. [:-1]
>> 070422 08:07:06 001 Xrd: PhyConnection: Disconnecting socket...
>> 070422 08:07:06 001 Xrd: Connect: Connect(rcas6132.rcf.bnl.gov, 1097) returned -1
>> 070422 08:07:06 001 Xrd: XrdNetFile: Error creating logical connection to rcas6132.rcf.bnl.gov:1097
>> 070422 08:07:06 001 Xrd: Open: Disconnecting.
>> 070422 08:07:06 001 Xrd: Open: Connection attempt failed. Sleeping 20 seconds.
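
For illustration only, here is a minimal Python sketch of the behaviour discussed in this thread: ask the resolver for every address behind a name (as in the nslookup output for kanolb-a.slac.stanford.edu) and try each one in turn until a connection succeeds. This is not the XrdClient code. The function names `resolve_all` and `connect_first_available` are hypothetical, the sketch is restricted to IPv4 for brevity, and it assumes the DNS really returns all addresses in one answer rather than rotating a single one.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IPv4 (address, port) pair the resolver gives for host."""
    infos = socket.getaddrinfo(host, port,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    candidates = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr not in candidates:  # keep resolver order, drop duplicates
            candidates.append(sockaddr)
    return candidates


def connect_first_available(candidates, timeout=5.0):
    """Try each (address, port) in order and return the first connected socket."""
    last_error = None
    for sockaddr in candidates:
        try:
            return socket.create_connection(sockaddr, timeout=timeout)
        except OSError as exc:
            # This host is down or unreachable: remember why and move on.
            last_error = exc
    raise ConnectionError(f"no candidate reachable, last error: {last_error}")


# Hypothetical usage with the alias and port from this thread:
#   sock = connect_first_available(resolve_all("xrdstar.rcf.bnl.gov", 1097))
```

With a plain DNS RR that hands out one address per query, `resolve_all` would return a single candidate and the loop would have nothing to fall back to, which matches the "converted URLs count is 1" line in the log above.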