Andy's reply ...

-------- Forwarded Message --------
Subject: Re: tried= ignored on redirection
Date: Thu, 23 Oct 2014 17:16:49 -0700
From: Andrew Hanushevsky
Organization: Stanford University/SLAC
To: Matevz Tadel

Matevz,

I am not surprised you get back to UCSD. There are two paths to UCSD and the
tried only identifies one of them in a particular circuit. So, you are almost
guaranteed to trip over UCSD again unless the file is found elsewhere.

Andy

-----Original Message-----
From: Matevz Tadel
Sent: Thursday, October 23, 2014 4:26 PM
To: Andrew Hanushevsky
Cc: Lukasz Janyst; xrootd-dev; Jeff Dost
Subject: Re: tried= ignored on redirection

On 10/23/14 16:22, Matevz Tadel wrote:
> Thanks Andy!
>
> I'm still confused about why the UNL redirects me back to UCSD for the
> initial stat (and cmsxrootd1.fnal.gov does not).
>
> Retrying the same xrdcps now, I never get redirected back to UCSD for the
> actual open request -- but this is probably because the file can always be
> opened in EU now.

Yes, indeed ... if I take a file that is only at UCSD I still get back to UCSD
despite the initial tried. Shouldn't open fail then?

Did we manage to criss-cross the redirectors, meta-managers and meta-meta
managers beyond failure? :)

Matevz

> I managed to gather all configs and versions from involved redirectors,
> they are all here:
> http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrd-tried/
>
> \m
>
> On 10/20/14 22:28, Andrew Hanushevsky wrote:
>>
>> xrdfed09.cern.ch
>> xrdfed10.cern.ch
>>
>> for 2:
>> cmsxrootd.fnal.gov is actually two hosts:
>> cmsxrootd1.fnal.gov
>> xrootd.unl.edu
>>
>> In such cases, to handle 'tried' correctly, the client must specify the
>> cluster ID on the tried, not the actual host that it used, as that still
>> leaves the other host free to be tried. So, it may look to you as if the
>> 'tried' was ignored. That didn't happen; it was honored, but the other path
>> was free to be used and likely chosen.
>>
>> The whole idea of using cluster ID is good but from the client's
>> perspective it is problematic as the client needs to ask one of the servers
>> (another interaction) what its cluster ID is and use that in the 'tried'
>> string. Servers will actually use their cluster ID when they need to tack on
>> a tried on a static redirect, which is why that works the same way every
>> time.
>>
>> Now, that is the case in all releases prior to 4.x. In 4.x we realized that
>> this would be a problem and the cmsd resolves the cluster ID ahead of time.
>> If two hosts have the same cluster ID then one is considered the primary and
>> the other is considered the backup (this extends to 3 and so on). This makes
>> it impossible for the 'tried' with a host name to get back into a cluster
>> when you are trying to ignore the cluster.
>>
>> While this solves the 'tried' problem, it does have side-effects. It means
>> that only one of n redirectors will always be used for all requests and we
>> won't switch to another one unless that one fails. In many ways that's good
>> because it dramatically cuts down on duplicate queries. The alternative
>> would be to resolve a host name to the group of hosts that are actually the
>> 'same' and automatically exclude the group. However, that would increase the
>> duplicate queries and we made the trade-off that duplicate queries were
>> worse than always using the same redirector until it failed.
>>
>> Andy
>>
>> On Mon, 20 Oct 2014, Matevz Tadel wrote:
>>
>>> Hi Andy,
>>>
>>> I guess this scrolled off the context window :) Do these logs help? Any
>>> ideas what I should still try?
>>>
>>> I can try getting config files from all cms meta managers ... I guess this
>>> would come in handy in any case :)
>>>
>>> Cheers,
>>> Matevz
>>>
>>> On 10/10/14 14:20, Matevz Tadel wrote:
>>>> I ran the same xrdcp to UNL and FNAL, 3 times each, all within a span of a
>>>> couple minutes [1].
>>>> Here are the logs (.txt) and results of grep -e kXR_stat -e kXR_open
>>>> -e kXR_redirect (.grep):
>>>>
>>>> http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrd-tried/
>>>>
>>>> Observations:
>>>>
>>>> The initial stat has two modes:
>>>> 1. it fails in fnal-1 and fnal-3;
>>>> 2. it is redirected back to UCSD for fnal-2 and all unls.
>>>> I find it really strange fnal-2 is different than 1 and 3 in this respect.
>>>>
>>>> For 1, redirection then goes -> cms-xrd-global.cern.ch ->
>>>> xrootd-redic.pi.infn.it -> madhatter.csc.fi -> server where file is opened
>>>> ok.
>>>>
>>>> For 2 redirection to xrootd-redic.pi.infn.it doesn't happen and we get
>>>> redirected back to cmsxrootd1.fnal.gov (for both fnal-2 and all unls) which
>>>> then sends us to UCSD where we open the file -- but this is the place we
>>>> were not supposed to come back to.
>>>>
>>>> I assume the real question is why I get redirected back to
>>>> cmsxrootd1.fnal.gov (despite tried=). Another thing ... why don't I get
>>>> sent to pisa on other connection attempts?
>>>>
>>>> Could it be that cms-xrd-global.cern.ch has:
>>>> a) too short timeouts (but it should be in the cache!);
>>>> b) wrong address for the US peer metamanager (cmsxrootd1.fnal.gov instead
>>>> of the DNS alias cmsxrootd.fnal.gov+)?
>>>>
>>>> Matevz
>>>>
>>>>
>>>> [1] The commands that were run:
>>>>
>>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>>> /dev/null > ~/buf/xrdcp-tried-unl-1.txt 2>&1
>>>>
>>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>>> /dev/null > ~/buf/xrdcp-tried-fnal-1.txt 2>&1
>>>>
>>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>>> /dev/null > ~/buf/xrdcp-tried-unl-2.txt 2>&1
>>>>
>>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>>> /dev/null > ~/buf/xrdcp-tried-fnal-2.txt 2>&1
>>>>
>>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>>> /dev/null > ~/buf/xrdcp-tried-unl-3.txt 2>&1
>>>>
>>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>>> /dev/null > ~/buf/xrdcp-tried-fnal-3.txt 2>&1
>>>>
>>>> On 10/03/14 13:01, Andrew Hanushevsky wrote:
>>>>> On Fri, 3 Oct 2014, Matevz Tadel wrote:
>>>>>
>>>>>>> Could you clean up the log and follow through with all of the
>>>>>>> redirections?
>>>>>>
>>>>>> You want me to run with debug 3 and only grep out redirection and
>>>>>> stat/open messages?
>>>>> Yes, that would give us the request and the response only.
>>>>>
>>>>>>> I still think the client version you are using may be dropping the
>>>>>>> tried history.
>>>>>> OK, I will take the head of master next time, I had 4.0.x-stable now (but
>>>>>> maybe forgot to pull in latest changes).
>>>>> You could try that but according to Lukasz, that should not happen in the
>>>>> new client.
>>>>>
>>>>>> What I've noticed:
>>>>>> 1. If I go to UNL redirector it will send me back to UCSD (v4.0.2).
>>>>>> 2. If I go to FNAL one, it sends me off to EU, as it should (v3.3.3).
>>>>>> 3. If I use the DNS alias for both of those, one of the two happens,
>>>>>> obviously.
>>>>> Odd, there shouldn't be a difference between versions here. Then again,
>>>>> from the above you aren't doing exactly the same thing. If you go to UNL
>>>>> what's the difference between V4 and V3, if any? Same question for FNAL.
>>>>>
>>>>>> Is it possible UNL has the file in cache and tried= gets ignored in this
>>>>>> case?
>>>>> Nope, the tried gets processed before the cache is inspected. So, even if
>>>>> the location has been cached, it is ignored. Now the big difference
>>>>> between V3 and V4 is that if your cluster has two replicated redirectors
>>>>> subscribing to a manager, V3 would treat both as separate entities.
>>>>> In V4, it picks one of the two and only uses that one while the other is
>>>>> held as a hot backup. So, if the one fails it will automatically switch to
>>>>> the other one.
>>>>>
>>>>> Andy
>>>>>
>>>>> ########################################################################
>>>>> Use REPLY-ALL to reply to list
>>>>>
>>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
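
For anyone wanting to repeat the six runs listed under [1], here is a minimal sketch of a wrapper that loops over both redirectors three times and keeps only the stat/open/redirect messages, as asked for in the thread. The xrdcp options, the grep patterns, the file path and the tried= value are taken from the commands quoted above. The function names, the OUT_DIR and DRY_RUN variables, and the default of only printing the commands are assumptions made for this sketch; they are not part of xrootd.

```shell
#!/bin/sh
# Sketch only: repeats the xrdcp test from this thread against two redirectors.
# build_url, run_one, OUT_DIR and DRY_RUN are illustrative names (assumptions).

LFN='/store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root'
OPAQUE='hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
OUT_DIR="${OUT_DIR:-.}"    # the original runs wrote to ~/buf
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands; set to 0 to run xrdcp

# Build root://<redirector>//<path>?<opaque>; $1 = redirector host[:port].
build_url() {
    printf 'root://%s/%s?%s' "$1" "$LFN" "$OPAQUE"
}

# Run (or print) one transfer; $1 = label, $2 = redirector, $3 = attempt number.
run_one() {
    log="$OUT_DIR/xrdcp-tried-$1-$3.txt"
    url=$(build_url "$2")
    if [ "$DRY_RUN" = 1 ]; then
        echo "XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force '$url' /dev/null > $log"
        return 0
    fi
    XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force "$url" /dev/null > "$log" 2>&1
    # Keep only the request/response lines of interest, next to the full log.
    grep -e kXR_stat -e kXR_open -e kXR_redirect "$log" > "${log%.txt}.grep"
}

# Alternate UNL and FNAL, three attempts each, as in the original test.
for attempt in 1 2 3; do
    run_one unl  xrootd.unl.edu:1094 "$attempt"
    run_one fnal cmsxrootd.fnal.gov  "$attempt"
done
```

With the default DRY_RUN=1 the script only prints the six command lines, which makes it easy to check the URLs before anything is sent to the redirectors.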