Thanks Andy!

I'm still confused about why the UNL redirector sends me back to UCSD for the initial
stat (while cmsxrootd1.fnal.gov does not).

Retrying the same xrdcps now, I never get redirected back to UCSD for the actual 
open request -- but this is probably because the file can always be opened in EU 
now.

I managed to gather the configs and versions of all involved redirectors; they are
all here:
http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrd-tried/

\m

On 10/20/14 22:28, Andrew Hanushevsky wrote:
>
> xrdfed09.cern.ch
> xrdfed10.cern.ch
>
> for 2:
> cmsxrootd.fnal.gov is actually two hosts:
> cmsxrootd1.fnal.gov
> xrootd.unl.edu
>
> In such cases, to handle "tried" correctly, the client must put the cluster ID
> in the tried list, not the actual host it used, as naming the host still leaves
> the other host free to be tried. So it may look to you as if the "tried" was
> ignored. That didn't happen: it was honored, but the other path was free to be
> used and was likely chosen.
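A small POSIX shell model may make the host-versus-cluster-ID point concrete. It is not cmsd code: the function select_subscriber, the "host=clusterid" argument format, the cluster ID us-meta and the comma-separated tried list are assumptions made for illustration; only the two host names come from this thread.

```shell
# Hypothetical model (not cmsd source) of a manager choosing a subscriber
# while honoring a tried list. Each subscriber is passed as "host=clusterid".
#
# select_subscriber MODE TRIED SUBSCRIBER...
#   MODE=host    : a subscriber is excluded only if its host name is in TRIED
#   MODE=cluster : a subscriber is excluded if its cluster ID is in TRIED
#   TRIED        : comma-separated names (assumed format of the tried= value)
# Prints the first subscriber host that is not excluded; returns 1 if none is.
select_subscriber() {
  local mode="$1" tried=",$2," entry host cid key
  shift 2
  for entry in "$@"; do
    host=${entry%%=*}
    cid=${entry#*=}
    if [ "$mode" = cluster ]; then key=$cid; else key=$host; fi
    case $tried in
      *",$key,"*) continue ;;   # listed in tried: skip this subscriber
    esac
    echo "$host"
    return 0
  done
  return 1
}
```

With tried=cmsxrootd1.fnal.gov, `select_subscriber host cmsxrootd1.fnal.gov cmsxrootd1.fnal.gov=us-meta xrootd.unl.edu=us-meta` still prints xrootd.unl.edu, which is the "honored, but the other path was free" case. `select_subscriber cluster us-meta` with the same subscribers prints nothing, because the whole cluster is excluded.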
>
> The whole idea of using the cluster ID is good, but from the client's perspective
> it is problematic, as the client needs to ask one of the servers (another
> interaction) what its cluster ID is and use that in the "tried" string. Servers
> actually use their cluster ID when they need to tack a tried onto a static
> redirect, which is why that works the same way every time.
>
> Now, that is the case in all releases prior to 4.x. In 4.x we realized that this
> would be a problem and the cmsd resolves the cluster ID ahead of time. If two
> hosts have the same cluster ID then one is considered the primary and the other
> is considered the backup (this extends to 3 and so on). This makes it impossible
> for the "tried" with a host name to get back into a cluster when you are trying
> to ignore the cluster.
>
> While this solves the "tried" problem, it does have side-effects. It means that
> only one of n redirectors will always be used for all requests, and we won't
> switch to another one unless that one fails. In many ways that's good, because it
> dramatically cuts down on duplicate queries. The alternative would be to resolve
> a host name to the group of hosts that are actually the "same" and automatically
> exclude the group. However, that would increase the duplicate queries, and we
> made the trade-off that duplicate queries were worse than always using the same
> redirector until it failed.
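The 4.x grouping and its trade-off can be sketched the same way, again as a hypothetical model rather than cmsd internals: subscribers reporting the same cluster ID collapse into one group, and a single member per group takes every query until it fails. Treating the first live argument as the primary is an assumption; the thread does not say how the primary is chosen.

```shell
# Hypothetical model of the 4.x grouping: subscribers that report the same
# cluster ID form one group, and only one member per group is active.
# Here the first live member in argument order plays the primary; how cmsd
# really orders them is an assumption.
#
# active_members DOWN SUBSCRIBER...
#   DOWN       : comma-separated hosts treated as failed (may be empty)
#   SUBSCRIBER : "host=clusterid"
# Prints "clusterid -> host" for the member that would get every query.
active_members() {
  local down=",$1," seen="," entry host cid
  shift
  for entry in "$@"; do
    host=${entry%%=*}
    cid=${entry#*=}
    case $down in *",$host,"*) continue ;; esac  # failed host: fall through to a backup
    case $seen in *",$cid,"*) continue ;; esac   # group already has an active member
    seen="$seen$cid,"
    echo "$cid -> $host"
  done
}
```

For example, `active_members "" cmsxrootd1.fnal.gov=us-meta xrootd.unl.edu=us-meta` prints `us-meta -> cmsxrootd1.fnal.gov`, and passing `cmsxrootd1.fnal.gov` as DOWN switches the group to `us-meta -> xrootd.unl.edu`. Because only one member per group is ever queried, duplicate queries disappear, at the cost of the group never sharing load.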
>
> Andy
>
> On Mon, 20 Oct 2014, Matevz Tadel wrote:
>
>> Hi Andy,
>>
>> I guess this scrolled off the context window :) Do these logs help? Any ideas
>> what I should still try?
>>
>> I can try getting the config files from all CMS meta-managers ... I guess this
>> would come in handy in any case :)
>>
>> Cheers,
>> Matevz
>>
>> On 10/10/14 14:20, Matevz Tadel wrote:
>>> I ran the same xrdcp to UNL and FNAL, 3 times each, all within a span of a
>>> couple of minutes [1]. Here are the logs (.txt) and the results of
>>> grep -e kXR_stat -e kXR_open -e kXR_redirect (.grep):
>>>
>>>    http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrd-tried/
>>>
>>> Observations:
>>>
>>> The initial stat has two modes:
>>>     1. it fails in fnal-1 and fnal-3;
>>>     2. it is redirected back to UCSD for fnal-2 and all unls.
>>> I find it really strange that fnal-2 is different from 1 and 3 in this respect.
>>>
>>> For 1, redirection then goes -> cms-xrd-global.cern.ch ->
>>> xrootd-redic.pi.infn.it -> madhatter.csc.fi -> server where file is opened ok.
>>>
>>> For 2, the redirection to xrootd-redic.pi.infn.it doesn't happen, and we get
>>> redirected back to cmsxrootd1.fnal.gov (for both fnal-2 and all unls), which
>>> then sends us to UCSD, where we open the file -- but this is the place we were
>>> not supposed to come back to.
>>>
>>> I assume the real question is why I get redirected back to
>>> cmsxrootd1.fnal.gov (despite tried=). Another thing ... why don't I get sent
>>> to Pisa on the other connection attempts?
>>>
>>> Could it be that cms-xrd-global.cern.ch has:
>>> a) too short timeouts (but it should be in the cache!);
>>> b) wrong address for the US peer metamanager (cmsxrootd1.fnal.gov instead of
>>> the DNS alias cmsxrootd.fnal.gov+)?
>>>
>>> Matevz
>>>
>>>
>>> [1] The commands that were run:
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force \
>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' \
>>> /dev/null > ~/buf/xrdcp-tried-unl-1.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force \
>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' \
>>> /dev/null > ~/buf/xrdcp-tried-fnal-1.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force \
>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' \
>>> /dev/null > ~/buf/xrdcp-tried-unl-2.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force \
>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' \
>>> /dev/null > ~/buf/xrdcp-tried-fnal-2.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force \
>>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' \
>>> /dev/null > ~/buf/xrdcp-tried-unl-3.txt 2>&1
>>>
>>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force \
>>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' \
>>> /dev/null > ~/buf/xrdcp-tried-fnal-3.txt 2>&1
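The six runs above, plus the grep summaries described earlier, can be scripted as a loop. This is a sketch: tried_url and run_all are made-up helper names, the output directory defaults to ~/buf as in the original commands, and nothing runs until run_all is called.

```shell
# Sketch reproducing the six runs in [1]: three rounds of UNL then FNAL,
# each followed by the stat/open/redirect grep used for the .grep files.
# tried_url and run_all are illustrative helper names.
LFN=/store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root
TRIED=xrootd.t2.ucsd.edu

# tried_url SITE: print the root:// URL used for SITE (unl or fnal)
tried_url() {
  local endpoint
  case $1 in
    unl)  endpoint=root://xrootd.unl.edu:1094 ;;
    fnal) endpoint=root://cmsxrootd.fnal.gov ;;
    *)    return 1 ;;
  esac
  printf '%s/%s?hdfs_block_size=134217728&tried=%s\n' "$endpoint" "$LFN" "$TRIED"
}

# run_all [DIR]: run xrdcp for each round/site, logging into DIR (default ~/buf)
run_all() {
  local dir="${1:-$HOME/buf}" i site log
  mkdir -p "$dir"
  for i in 1 2 3; do
    for site in unl fnal; do
      log="$dir/xrdcp-tried-$site-$i.txt"
      XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force "$(tried_url "$site")" /dev/null >"$log" 2>&1
      grep -e kXR_stat -e kXR_open -e kXR_redirect "$log" >"${log%.txt}.grep"
    done
  done
}
```

Calling `run_all` should leave the xrdcp-tried-SITE-N.txt logs next to matching .grep files, in the same interleaved unl/fnal order as the commands above.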
>>>
>>>
>>>
>>> On 10/03/14 13:01, Andrew Hanushevsky wrote:
>>>> On Fri, 3 Oct 2014, Matevz Tadel wrote:
>>>>
>>>>>> Could you clean up the log and follow through with all of the redirections?
>>>>>
>>>>> You want me to run with debug 3 and only grep out redirection and stat/open
>>>>> messages?
>>>> Yes, that would give us the request and the response only.
>>>>
>>>>>> I still think the client version you are using may be dropping the tried
>>>>>> history.
>>>>> OK, I will take the head of master next time; I had 4.0.x-stable now (but
>>>>> maybe forgot to pull in the latest changes).
>>>> You could try that, but according to Lukasz, that should not happen in the new
>>>> client.
>>>>
>>>>> What I've noticed:
>>>>> 1. If I go to UNL redirector it will send me back to UCSD (v4.0.2).
>>>>> 2. If I go to FNAL one, it sends me off to EU, as it should (v3.3.3).
>>>>> 3. If I use the DNS alias for both of those, one of the two happens,
>>>>> obviously.
>>>> Odd, there shouldn't be a difference between versions here. Then again, from the
>>>> above, you aren't doing exactly the same thing. If you go to UNL, what is the
>>>> difference between V4 and V3, if any? Same question for FNAL.
>>>>
>>>>> Is it possible UNL has the file in cache and tried= gets ignored in this case?
>>>> Nope, the tried gets processed before the cache is inspected. So, even if the
>>>> location has been cached, it is ignored. Now the big difference between V3 and
>>>> V4 is that if your cluster has two replicated redirectors subscribing to a
>>>> manager, V3 would treat both as separate entities. In V4, it picks one of the
>>>> two and only uses that one while the other is held as a hot backup. So, if that
>>>> one fails, it will automatically switch to the other one.
>>>>
>>>> Andy
>>>>
>>>> ########################################################################
>>>> Use REPLY-ALL to reply to list
>>>>
>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1