Ciao Andrew,
thanks a lot. What worries me is the test I did with TFile::Open, which seems to be broken as well. Since that is what we use in CMSSW, I am afraid we have a problem in the real use case, the one that counts.

Could you try that? Unfortunately I have now had to restart the redirector, since I got MANY failures during the night: a couple of sites are already using the redundant setup.

tom

On Thu, Apr 17, 2014 at 7:09 AM, Andrew Hanushevsky <[log in to unmask]> wrote:

> Hi Tommaso,
>
> I have checked the new client and it works as expected. I couldn't copy the
> file because I am not authorized, but it did find a working redirector.
> Oddly enough, it also found things in Germany and the like. I assume you
> redirect upstream? If not, then I need to check with Lukasz how he managed
> to get to your global redirector.
>
> I agree that the old client is broken. I don't know when that regression
> happened, as it worked the last time I checked this. The question is
> whether we should fix the old client, since with release 4 the old client is
> deprecated and no one would actually use it even if it got fixed. R4 is due
> out in the middle of May.
>
> Andy
>
> On Thu, 17 Apr 2014, Tommaso Boccali wrote:
>
>> ciao Andrew!
>> I think you do not need one at this level, because by choice we have left the
>> redirectors w/o authentication.
>>
>> So if you look at a command like
>>
>> xrdcp -d 10 root://xrootd-cms.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root. >& log
>>
>> (after a voms-proxy-destroy in my case) I still see the usual fixed order
>>
>> -bash-3.2$ grep ShowUrls log
>> 140417 01:12:15 001 Xrd: ShowUrls: The converted URLs count is 2
>> 140417 01:12:15 001 Xrd: ShowUrls: URL n.1: root://xrootd-redic.pi.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:12:15 001 Xrd: ShowUrls: URL n.2: root://xrootd.ba.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:12:15 001 Xrd: ShowUrls: The converted URLs count is 2
>> 140417 01:12:15 001 Xrd: ShowUrls: URL n.1: root://xrootd-redic.pi.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:12:15 001 Xrd: ShowUrls: URL n.2: root://xrootd.ba.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:12:20 001 Xrd: ShowUrls: The converted URLs count is 2
>> 140417 01:12:20 001 Xrd: ShowUrls: URL n.1: root://xrootd-redic.pi.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:12:20 001 Xrd: ShowUrls: URL n.2: root://xrootd.ba.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> ...
>>
>> Eventually, if this did not fail, you would get an error when trying
>> to access the real file, but at least in my case I "die" before that.
>>
>> Of course, for this to make sense I need to leave one of the
>> redirectors off (xrootd-redic.pi.infn.it). Also, you can test the same
>> behavior with
>>
>> xrdcp -d 10 root://xrootd-redic.pi.infn.it,xrootd.ba.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root .
>> >& log1 &
>>
>> again, I get
>>
>> -bash-3.2$ grep ShowUrls log1
>> 140417 01:15:59 001 Xrd: ShowUrls: The converted URLs count is 2
>> 140417 01:15:59 001 Xrd: ShowUrls: URL n.1: root://xrootd-redic.pi.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:15:59 001 Xrd: ShowUrls: URL n.2: root://xrootd.ba.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:15:59 001 Xrd: ShowUrls: The converted URLs count is 2
>> 140417 01:15:59 001 Xrd: ShowUrls: URL n.1: root://xrootd-redic.pi.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:15:59 001 Xrd: ShowUrls: URL n.2: root://xrootd.ba.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:16:04 001 Xrd: ShowUrls: The converted URLs count is 2
>> 140417 01:16:04 001 Xrd: ShowUrls: URL n.1: root://xrootd-redic.pi.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> 140417 01:16:04 001 Xrd: ShowUrls: URL n.2: root://xrootd.ba.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>> .
>> ...
>>
>> I will try to leave the redirector OFF for the night, if you want to try. I
>> hope I will not get big side effects :(
>>
>> tom
>>
>> On Thu, Apr 17, 2014 at 12:33 AM, Andrew Hanushevsky <[log in to unmask]> wrote:
>>
>>> Hi Tommaso,
>>>
>>> So I need a certificate to reproduce your test from here? I also can
>>> supply you with access to xrdcopy if you happen to have AFS installed.
>>>
>>> Andy
>>>
>>> *From:* Tommaso Boccali <[log in to unmask]>
>>> *Sent:* Tuesday, April 15, 2014 10:02 PM
>>> *To:* Andrew Hanushevsky <[log in to unmask]>
>>> *Cc:* [log in to unmask]
>>> *Subject:* Re: problem with aliased redirectors
>>>
>>> ciao Andrew!
>>> I have problems checking with xrdcopy: since it is not distributed with
>>> the CMS software, I have to find a way. For the moment, here is another
>>> hint that something is not OK in the randomization in xrdcp:
>>>
>>> I tried (with xrootd.ba.infn.it ON and xrootd-redic.pi.infn.it OFF)
>>>
>>> xrdcp -d 10 root://xrootd-redic.pi.infn.it,xrootd.ba.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root .
>>>
>>> that is, putting the list of servers explicitly on the command line.
>>> This always fails (xrootd-redic.pi.infn.it is always tried, 8 times,
>>> and the other is never reached).
>>>
>>> Instead
>>>
>>> xrdcp -d 10 root://xrootd.ba.infn.it,xrootd-redic.pi.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root .
>>>
>>> always works at the first attempt.
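
The behaviour expected from a comma-separated host list can be sketched as follows: each host is tried once, in list order, and the next one is used when a connection fails, rather than the first host being retried 8 times. This is only a minimal Python illustration under stated assumptions, not how xrdcp is implemented; `tcp_probe` and `first_reachable` are hypothetical names, and port 1094 is simply the port that appears in the logs.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def tcp_probe(address: Address, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


def first_reachable(
    hosts: Iterable[str],
    port: int = 1094,  # assumption: the xrootd port seen in the logs
    probe: Callable[[Address], bool] = tcp_probe,
) -> Optional[str]:
    """Try each host once, in list order; return the first that answers."""
    for host in hosts:
        if probe((host, port)):
            return host
    return None  # every host in the list failed
```

With xrootd-redic.pi.infn.it switched off, a client behaving like this sketch would end up on xrootd.ba.infn.it whichever order the two hosts are listed in.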
>>>
>>> In any case, I think we basically care about the behavior of
>>> TFile::Open() from our SW, not direct copy commands.
>>>
>>> This for example should not fail:
>>>
>>> root [5] TFile* ii = TFile::Open("root://xrootd-redic.pi.infn.it,xrootd.ba.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root")
>>>
>>> 140416 06:58:05 001 Xrd: Connect: can't open connection to [xrootd-redic.pi.infn.it:1094]
>>> 140416 06:58:05 001 Xrd: XrdNetFile: Error creating logical connection to xrootd-redic.pi.infn.it:1094
>>> Error in <TXNetFile::CreateXClient>: open attempt failed on root://xrootd-redic.pi.infn.it,xrootd.ba.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>>>
>>> (does not seem to give a second try to the other server)
>>>
>>> and this seems even worse:
>>>
>>> root [7] TFile* ii = TFile::Open("root://xrootd-cms.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root")
>>>
>>> 140416 06:59:11 001 Xrd: Connect: can't open connection to [xrootd-redic.pi.infn.it:1094]
>>> 140416 06:59:11 001 Xrd: XrdNetFile: Error creating logical connection to xrootd-redic.pi.infn.it:1094
>>> Error in <TXNetFile::CreateXClient>: open attempt failed on root://xrootd-cms.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>>>
>>> so not even a second attempt is tried ....
>>>
>>> this instead works
>>>
>>> root [1] TFile* ii = TFile::Open("root://xrootd.ba.infn.it,xrootd-redic.pi.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root")
>>>
>>> 140416 07:01:18 001 Xrd: GoToAnotherServer: Going to: t2-cms-xrootd01.desy.de:1094
>>> 140416 07:01:18 001 Xrd: GoToAnotherServer: Going to: dcache-cms-xrootd.desy.de:1094
>>> 140416 07:01:18 001 Xrd: GoToAnotherServer: Going to: 131.169.191.230:20982
>>>
>>> tommaso
>>>
>>> On Tue, Apr 15, 2014 at 11:08 PM, Andrew Hanushevsky <[log in to unmask]> wrote:
>>>
>>>> Hi Tommaso,
>>>>
>>>> DNS round-robin, while it looks good in small scale tests, rarely works
>>>> all that well. The reason is that DNS round-robins whenever a look-up is
>>>> made, regardless of the reason for the lookup. With a lot of clients that
>>>> may very well lead to suboptimal ordering. So, the xrootd client gets all
>>>> of the addresses and uses an algorithm that better spreads the access.
>>>>
>>>> As for why xrdcp didn't go after the second entry, that is mysterious, but
>>>> I would say it's a bug. Could you try the same test again but use xrdcopy?
>>>> That's the new version of the client.
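
The idea described here, a client that fetches every address behind the alias and picks the order itself instead of trusting the order of the DNS reply, can be sketched in Python as below. This is an illustration under assumptions, not the xrootd client's actual algorithm, which the thread does not spell out: a plain random shuffle stands in for "an algorithm that better spreads the access", and `resolve_all` and `spread_order` are hypothetical names.

```python
import random
import socket
from typing import List, Optional, Sequence


def resolve_all(alias: str, port: int = 1094) -> List[str]:
    """Return every distinct IPv4 address published for a DNS alias."""
    infos = socket.getaddrinfo(
        alias, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: List[str] = []
    for info in infos:
        ip = info[4][0]  # sockaddr is (ip, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def spread_order(
    addresses: Sequence[str], rng: Optional[random.Random] = None
) -> List[str]:
    """Return the addresses in an order chosen by the client itself.

    Assumption: a uniform shuffle is used here purely for illustration.
    """
    if rng is None:
        rng = random.Random()
    shuffled = list(addresses)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled
```

Calling `spread_order(resolve_all("xrootd-cms.infn.it"))` would give each client its own ordering of the two redirectors, independent of whatever order the resolver happened to return.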
>>>>
>>>> Andy
>>>>
>>>> *From:* Tommaso Boccali <[log in to unmask]>
>>>> *Sent:* Tuesday, April 15, 2014 3:48 AM
>>>> *To:* [log in to unmask]
>>>> *Subject:* Re: problem with aliased redirectors
>>>>
>>>> As additional info, the DNS seems to do its RR job well: from the same
>>>> machine
>>>>
>>>> -bash-3.2$ host xrootd-cms.infn.it
>>>> xrootd-cms.infn.it has address 193.205.76.83
>>>> xrootd-cms.infn.it has address 90.147.66.75
>>>> -bash-3.2$ host xrootd-cms.infn.it
>>>> xrootd-cms.infn.it has address 90.147.66.75
>>>> xrootd-cms.infn.it has address 193.205.76.83
>>>> -bash-3.2$ host xrootd-cms.infn.it
>>>> xrootd-cms.infn.it has address 90.147.66.75
>>>> xrootd-cms.infn.it has address 193.205.76.83
>>>> -bash-3.2$ host xrootd-cms.infn.it
>>>> xrootd-cms.infn.it has address 90.147.66.75
>>>> xrootd-cms.infn.it has address 193.205.76.83
>>>> -bash-3.2$ host xrootd-cms.infn.it
>>>> xrootd-cms.infn.it has address 193.205.76.83
>>>> xrootd-cms.infn.it has address 90.147.66.75
>>>>
>>>> So the order returned is random each time, in case xrootd needs to
>>>> depend on this.
>>>>
>>>> BUT: inside the xrdcp log, the order always seems to be the same (*)
>>>>
>>>> Is some caching done inside xrdcp killing the RR?
>>>>
>>>> tom
>>>>
>>>> *:
>>>>
>>>> -bash-3.2$ grep DNS log
>>>> 140415 12:32:21 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:21 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:21 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:32:21 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:21 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:21 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:32:26 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:26 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:26 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:32:31 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:31 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:31 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:32:36 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:36 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:36 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:32:41 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:41 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:41 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:32:46 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:46 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:46 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:32:51 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:51 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:51 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:32:56 001 Xrd: ConvertDNSAlias: resolving xrootd-cms.infn.it
>>>> 140415 12:32:56 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:32:56 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> On 15 Apr 2014, at 12:33, Tommaso Boccali <[log in to unmask]> wrote:
>>>>
>>>> Ciao,
>>>> following a previous discussion, we have set up an aliased DNS xrootd
>>>> redirector,
>>>>
>>>> which is
>>>>
>>>> -bash-3.2$ host xrootd-cms.infn.it
>>>> xrootd-cms.infn.it has address 90.147.66.75
>>>> xrootd-cms.infn.it has address 193.205.76.83
>>>>
>>>> I was playing with some crash tests, and I do not get the result.
>>>>
>>>> So: I switched off the redirector 193.205.76.83, while keeping it in
>>>> the alias, and I issued a
>>>>
>>>> xrdcp -d 10 root://xrootd-cms.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root.
>>>>
>>>> I was assuming that the client would have recognized the alias, and
>>>> eventually tried a second host if the first was not available.
>>>>
>>>> In the log ( https://www.dropbox.com/s/zmp9uyreqm4qwhg/xrootd.log )
>>>> I see that the client does eventually recognize the situation:
>>>>
>>>> 140415 12:25:11 001 Xrd: ConvertDNSAlias: found host xrootd-redic.pi.infn.it with addr 193.205.76.83
>>>> 140415 12:25:11 001 Xrd: ConvertDNSAlias: found host xrootd.ba.infn.it with addr 90.147.66.75
>>>>
>>>> 140415 12:25:11 001 Xrd: ShowUrls: The converted URLs count is 2
>>>> 140415 12:25:11 001 Xrd: ShowUrls: URL n.1: root://xrootd-redic.pi.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>>>> .
>>>> 140415 12:25:11 001 Xrd: ShowUrls: URL n.2: root://xrootd.ba.infn.it:1094//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>>>> .
>>>>
>>>> but then
>>>>
>>>> 140415 12:25:46 001 Xrd: Open: Trying to connect to xrootd-redic.pi.infn.it:1094. Connect try 8
>>>> 140415 12:25:46 001 Xrd: XrdClientConn: Trying to connect to 193.205.76.83:1094
>>>> 140415 12:25:46 001 Xrd: Connect: Creating a logical connection...
>>>> 140415 12:25:46 001 Xrd: Connect: Physical connection not found. Creating a new one...
>>>> 140415 12:25:46 001 Xrd: Connect: Connecting to [xrootd-redic.pi.infn.it:1094]
>>>> 140415 12:25:46 001 Xrd: ClientSock::TryConnect_low: Trying to connect to xrootd-redic.pi.infn.it(193.205.76.83):1094 Windowsize=0 Timeout=120
>>>> 140415 12:25:46 001 Xrd: ClientSock::TryConnect_low: Connection toxrootd-redic.pi.infn.it:1094 failed. (-1)
>>>> 140415 12:25:46 001 Xrd: Connect: can't open connection to [xrootd-redic.pi.infn.it:1094]
>>>> 140415 12:25:46 001 Xrd: PhyConnection: Disconnecting socket...
>>>> 140415 12:25:46 001 Xrd: Connect: Connect(xrootd-redic.pi.infn.it, 1094) returned -1
>>>> 140415 12:25:46 001 Xrd: XrdNetFile: Error creating logical connection to xrootd-redic.pi.infn.it:1094
>>>> 140415 12:25:46 001 Xrd: Open: Disconnecting.
>>>> 140415 12:25:46 001 Xrd: Cache: Cache Status --------------------------
>>>> 140415 12:25:46 001 Xrd: Cache: --------------------------------------
>>>> fTotalByteCount = 0
>>>> Last server error 10000 ('')
>>>> Error accessing path/file for root://xrootd-cms.infn.it//store/data/Run2013A/MinimumBias/RECO/PromptReco-v1/000/212/188/00000/6C246B92-C67B-E211-BE02-003048D2BC62.root
>>>>
>>>> so no attempt is done on the other. What is wrong here? all in all it
>>>> tries 8 times to connect to the SAME server, and 0 times to the other
>>>> ...
>>>>
>>>> thanks a lot
>>>>
>>>> tom
>>>>
>>>> --
>>>> Tommaso Boccali
>>>> INFN Pisa
>>>>
>>>> ------------------------------
>>>>
>>>> Use REPLY-ALL to reply to list
>>>>
>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>>
>>> --
>>> Tommaso Boccali
>>> INFN Pisa
>>
>> --
>> Tommaso Boccali
>> INFN Pisa
>

--
Tommaso Boccali
INFN Pisa

########################################################################

Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1