Thanks Andy!

I'm still confused about why UNL redirects me back to UCSD for the initial stat (and cmsxrootd1.fnal.gov does not).

Retrying the same xrdcps now, I never get redirected back to UCSD for the actual open request -- but this is probably because the file can always be opened in EU now.

I managed to gather all configs and versions from the involved redirectors; they are all here:
http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrd-tried/

\m

On 10/20/14 22:28, Andrew Hanushevsky wrote:
>
> xrdfed09.cern.ch
> xrdfed10.cern.ch
>
> for 2:
> cmsxrootd.fnal.gov is actually two hosts:
> cmsxrootd1.fnal.gov
> xrootd.unl.edu
>
> In such cases, to handle 'tried' correctly, the client must specify the cluster
> ID in the tried, not the actual host that it used, as that still leaves the other
> host free to be tried. So, it may look to you as if the 'tried' was ignored.
> That didn't happen; it was honored but the other path was free to be used and
> likely chosen.
>
> The whole idea of using cluster ID is good but from the client's perspective it
> is problematic as the client needs to ask one of the servers (another
> interaction) what its cluster ID is and use that in the 'tried' string. Servers
> will actually use their cluster ID when they need to tack on a tried on a static
> redirect, which is why that works the same way every time.
>
> Now, that is the case in all releases prior to 4.x. In 4.x we realized that this
> would be a problem and the cmsd resolves the cluster ID ahead of time. If two
> hosts have the same cluster ID then one is considered the primary and the other
> is considered the backup (this extends to 3 and so on). This makes it impossible
> for the 'tried' with a host name to get back into a cluster when you are trying
> to ignore the cluster.
>
> While this solves the 'tried' problem, it does have side-effects. It means that
> only one of n redirectors will always be used for all requests and we won't
> switch to another one unless that one fails.
> In many ways that's good because it
> dramatically cuts down on duplicate queries. The alternative would be to resolve
> a host name to the group of hosts that are actually the 'same' and automatically
> exclude the group. However, that would increase the duplicate queries and we
> made the trade-off that duplicate queries were worse than always using the same
> redirector until it failed.
>
> Andy
>
> On Mon, 20 Oct 2014, Matevz Tadel wrote:
>
>> Hi Andy,
>>
>> I guess this scrolled off the context window :) Do these logs help? Any ideas
>> what I should still try?
>>
>> I can try getting config files from all cms meta managers ... I guess this
>> would come in handy in any case :)
>>
>> Cheers,
>> Matevz
>>
>> On 10/10/14 14:20, Matevz Tadel wrote:
>>> I ran the same xrdcp to UNL and FNAL, 3 times each, all within a span of a
>>> couple minutes [1]. Here are the logs (.txt) and results of grep -e kXR_stat
>>> -e kXR_open -e kXR_redirect (.grep):
>>>
>>> http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrd-tried/
>>>
>>> Observations:
>>>
>>> The initial stat has two modes:
>>> 1. it fails in fnal-1 and fnal-3;
>>> 2. it is redirected back to UCSD for fnal-2 and all unls.
>>> I find it really strange fnal-2 is different than 1 and 3 in this respect.
>>>
>>> For 1, redirection then goes -> cms-xrd-global.cern.ch ->
>>> xrootd-redic.pi.infn.it -> madhatter.csc.fi -> server where file is opened ok.
>>>
>>> For 2, redirection to xrootd-redic.pi.infn.it doesn't happen and we get
>>> redirected back to cmsxrootd1.fnal.gov (for both fnal-2 and all unls) which
>>> then sends us to UCSD where we open the file -- but this is the place we were
>>> not supposed to come back to.
>>>
>>> I assume the real question is why I get redirected back to
>>> cmsxrootd1.fnal.gov (despite tried=). Another thing ... why don't I get sent
>>> to Pisa on other connection attempts?
>>>
>>> Could it be that cms-xrd-global.cern.ch has:
>>> a) too short timeouts (but it should be in the cache!);
>>> b) wrong address for the US peer metamanager (cmsxrootd1.fnal.gov instead of
>>> the DNS alias cmsxrootd.fnal.gov+)?
>>>
>>> Matevz
>>>
>>>
>>> [1] The commands that were run:
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>> /dev/null > ~/buf/xrdcp-tried-unl-1.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>> /dev/null > ~/buf/xrdcp-tried-fnal-1.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>> /dev/null > ~/buf/xrdcp-tried-unl-2.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>> /dev/null > ~/buf/xrdcp-tried-fnal-2.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>> /dev/null > ~/buf/xrdcp-tried-unl-3.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force
>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu'
>>> /dev/null > ~/buf/xrdcp-tried-fnal-3.txt 2>&1
>>>
>>>
>>>
>>> On 10/03/14 13:01, Andrew Hanushevsky wrote:
>>>> On Fri, 3 Oct 2014, Matevz Tadel wrote:
>>>>
>>>>>> Could you clean up the log and follow through with all of the redirections?
>>>>>
>>>>> You want me to run with debug 3 and only grep out redirection and stat/open
>>>>> messages?
>>>> Yes, that would give us the request and the response only.
>>>>
>>>>>> I still think the client version you are using may be dropping the tried
>>>>>> history.
>>>>> OK, I will take the head of master next time, I had 4.0.x-stable now (but
>>>>> maybe forgot to pull in latest changes).
>>>> You could try that but according to Lukasz, that should not happen in the new
>>>> client.
>>>>
>>>>> What I've noticed:
>>>>> 1. If I go to the UNL redirector it will send me back to UCSD (v4.0.2).
>>>>> 2. If I go to the FNAL one, it sends me off to EU, as it should (v3.3.3).
>>>>> 3. If I use the DNS alias for both of those, one of the two happens,
>>>>> obviously.
>>>> Odd, there shouldn't be a difference between versions here. Then again, from the
>>>> above you aren't doing exactly the same thing. If you go to UNL what is the
>>>> difference between V4 and V3, if any? Same question for FNAL.
>>>>
>>>>> Is it possible UNL has the file in cache and tried= gets ignored in this case?
>>>> Nope, the tried gets processed before the cache is inspected. So, even if the
>>>> location has been cached, it is ignored. Now the big difference between V3 and
>>>> V4 is that if your cluster has two replicated redirectors subscribing to a
>>>> manager, V3 would treat both as separate entities. In V4, it picks one of the
>>>> two and only uses that one while the other is held as a hot backup.
>>>> So, if the one fails it will automatically switch to the other one.
>>>>
>>>> Andy
>>>>
>>>> ########################################################################
>>>> Use REPLY-ALL to reply to list
>>>>
>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
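
For anyone reading this thread later, Andy's point about host names versus cluster IDs in tried= can be illustrated with a toy Python model. This is only a sketch and not XRootD code: the cluster ID strings, the CLUSTER_OF table and both function names are made up for illustration; only the host names are taken from the thread. It assumes that the pre-4.x behaviour amounts to excluding literal host names, and that 4.x amounts to excluding the whole cluster and using one primary host per cluster.

```python
from typing import Dict, List

# Hypothetical host -> cluster ID table; the IDs are invented for illustration.
CLUSTER_OF: Dict[str, str] = {
    "cmsxrootd1.fnal.gov": "us-cluster",
    "xrootd.unl.edu": "us-cluster",
    "xrootd-redic.pi.infn.it": "eu-cluster",
}


def eligible_by_host(tried: List[str]) -> List[str]:
    """Pre-4.x style (as described above): only literal host names are excluded."""
    return [host for host in CLUSTER_OF if host not in tried]


def eligible_by_cluster(tried: List[str]) -> List[str]:
    """4.x style (as described above): a tried host excludes its whole cluster,
    and only the first (primary) host of each remaining cluster is used."""
    excluded = {CLUSTER_OF[host] for host in tried if host in CLUSTER_OF}
    primaries: Dict[str, str] = {}
    for host, cluster in CLUSTER_OF.items():
        if cluster not in excluded:
            # Keep the first host seen per cluster; later ones act as backups.
            primaries.setdefault(cluster, host)
    return list(primaries.values())


if __name__ == "__main__":
    tried = ["cmsxrootd1.fnal.gov"]
    print(eligible_by_host(tried))     # the sibling xrootd.unl.edu is still eligible
    print(eligible_by_cluster(tried))  # the whole cluster is skipped
```

With tried=cmsxrootd1.fnal.gov, the host-name variant still offers xrootd.unl.edu, which is how a request can end up back in a cluster the client meant to avoid; the cluster-ID variant cannot.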