xrdfed09.cern.ch
xrdfed10.cern.ch

for 2:
cmsxrootd.fnal.gov is actually two hosts:
cmsxrootd1.fnal.gov
xrootd.unl.edu

In such cases, to handle 'tried' correctly, the client must specify the 
cluster ID in the tried list, not the actual host that it used, because 
naming only the host still leaves the other host free to be tried. So it 
may look to you as if the 'tried' was ignored. That didn't happen: it was 
honored, but the other path was still free to be used and was likely chosen.
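To make the failure mode concrete, here is a minimal Python sketch of the pre-4.x behavior described above. It is a simplified model under stated assumptions, not cmsd code: each subscribed redirector is a separate candidate, a candidate is skipped only if its own host name or its cluster ID is in the tried list, and "us-meta" is a made-up cluster ID shared by the two hosts behind cmsxrootd.fnal.gov.

```python
# Simplified model (an assumption, not cmsd source code) of pre-4.x
# 'tried' handling: every subscribed redirector is a separate candidate,
# and a candidate is skipped only if its own host name or its cluster ID
# appears in the tried list.
from typing import List, NamedTuple, Set


class Subscriber(NamedTuple):
    host: str
    cluster_id: str  # hypothetical label shared by replicated redirectors


def eligible_pre4(subs: List[Subscriber], tried: Set[str]) -> List[str]:
    """Return the host names still eligible for a redirect."""
    return [s.host for s in subs
            if s.host not in tried and s.cluster_id not in tried]


# The two hosts behind cmsxrootd.fnal.gov, sharing a made-up cluster ID.
US = [Subscriber("cmsxrootd1.fnal.gov", "us-meta"),
      Subscriber("xrootd.unl.edu", "us-meta")]

print(eligible_pre4(US, {"cmsxrootd1.fnal.gov"}))  # ['xrootd.unl.edu']
print(eligible_pre4(US, {"us-meta"}))              # []
```

Naming the host only removes that one entry, so the sibling stays eligible; naming the cluster ID removes both.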

The whole idea of using the cluster ID is good, but from the client's 
perspective it is problematic: the client needs to ask one of the servers 
(another interaction) what its cluster ID is and use that in the 'tried' 
string. Servers, on the other hand, do use their cluster ID when they need 
to tack a tried entry onto a static redirect, which is why that works the 
same way every time.
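A server that tacks its cluster ID onto a static redirect is, in effect, editing the opaque (CGI) part of the URL. The sketch below assumes that tried holds a comma-separated list of names; add_tried is a hypothetical helper, "us-meta" is a made-up cluster ID, and the file path is shortened for illustration.

```python
# Sketch of appending an entry to the 'tried' opaque parameter of an
# xroot URL, as a server might do with its cluster ID on a static
# redirect. Assumption: tried holds a comma-separated list of names.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_tried(url: str, entry: str) -> str:
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    out, found = [], False
    for key, value in params:
        if key == "tried":
            names = [n for n in value.split(",") if n]
            if entry not in names:  # do not list the same name twice
                names.append(entry)
            value, found = ",".join(names), True
        out.append((key, value))
    if not found:
        out.append(("tried", entry))
    # safe="," keeps the list separator readable instead of %2C.
    return urlunsplit(parts._replace(query=urlencode(out, safe=",")))


url = ("root://xrootd.unl.edu:1094//store/f.root"
       "?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu")
print(add_tried(url, "us-meta"))
```

The client-side difficulty is the missing input: a server knows its own cluster ID, while a client would first have to ask for it.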

Now, that is the case in all releases prior to 4.x. In 4.x we realized 
that this would be a problem, so the cmsd resolves the cluster ID ahead of 
time. If two hosts have the same cluster ID, then one is considered the 
primary and the other the backup (this extends to three or more hosts). 
This makes it impossible for a 'tried' entry with a host name to let a 
request back into a cluster that you are trying to exclude.
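The 4.x approach can be modeled in a few lines of Python. This is a simplified sketch, not the actual cmsd logic: it assumes the first host seen for a cluster ID becomes the primary, and that a host name in tried is translated to its cluster ID so the whole group is excluded. The cluster IDs are made-up labels.

```python
# Simplified model (an assumption, not the actual cmsd code) of 4.x
# behavior: cluster IDs are resolved up front, replicated redirectors
# collapse into one node per cluster, and a host name in 'tried' is
# translated to its cluster so the whole group is excluded.
from typing import Dict, List, Set, Tuple


def build_clusters(subs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (host, cluster_id) pairs; the first host seen is the primary."""
    clusters: Dict[str, List[str]] = {}
    for host, cid in subs:
        clusters.setdefault(cid, []).append(host)
    return clusters


def eligible_v4(clusters: Dict[str, List[str]], tried: Set[str]) -> List[str]:
    """Return the primary host of every cluster not excluded by 'tried'."""
    host_to_cid = {h: cid for cid, hosts in clusters.items() for h in hosts}
    excluded = {host_to_cid.get(name, name) for name in tried}
    return [hosts[0] for cid, hosts in clusters.items() if cid not in excluded]


clusters = build_clusters([("cmsxrootd1.fnal.gov", "us-meta"),
                           ("xrootd.unl.edu", "us-meta"),
                           ("xrootd-redic.pi.infn.it", "eu-meta")])
print(eligible_v4(clusters, {"xrootd.unl.edu"}))  # ['xrootd-redic.pi.infn.it']
```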

While this solves the 'tried' problem, it does have side effects. It means 
that only one of n redirectors will be used for all requests, and we won't 
switch to another one unless that one fails. In many ways that's good 
because it dramatically cuts down on duplicate queries. The alternative 
would be to resolve a host name to the group of hosts that are actually 
the 'same' and automatically exclude the whole group. However, that would 
increase the duplicate queries, and we made the trade-off that duplicate 
queries were worse than always using the same redirector until it failed.
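The "same redirector until it fails" trade-off amounts to a deterministic primary-then-backup choice. A minimal sketch, assuming member order defines preference and that "down" is a hypothetical set of hosts currently known to have failed:

```python
# Sketch of primary/backup selection: always pick the primary and fall
# back to the next member only when the current one has failed.
# Assumption: member order defines preference; 'down' is hypothetical.
from typing import Iterable, List, Optional


def pick_redirector(members: List[str], down: Iterable[str]) -> Optional[str]:
    failed = set(down)
    for host in members:  # members[0] is the primary, the rest are backups
        if host not in failed:
            return host
    return None  # every member of the cluster is down


members = ["cmsxrootd1.fnal.gov", "xrootd.unl.edu"]
print(pick_redirector(members, []))                       # cmsxrootd1.fnal.gov
print(pick_redirector(members, ["cmsxrootd1.fnal.gov"]))  # xrootd.unl.edu
```

Because the choice is deterministic, repeated requests reach the same redirector, which avoids duplicate queries; the other members simply wait as hot backups.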

Andy

On Mon, 20 Oct 2014, Matevz Tadel wrote:

> Hi Andy,
>
> I guess this scrolled off the context window :) Do these logs help? Any ideas 
> what I should still try?
>
> I can try getting config files from all CMS meta-managers ... I guess this 
> would come in handy in any case :)
>
> Cheers,
> Matevz
>
> On 10/10/14 14:20, Matevz Tadel wrote:
>> I ran the same xrdcp to UNL and FNAL, 3 times each, all within a span of a 
>> couple of minutes [1]. Here are the logs (.txt) and the results of grep -e 
>> kXR_stat -e kXR_open -e kXR_redirect (.grep):
>>
>>    http://uaf-2.t2.ucsd.edu/~matevz/tmp/xrd-tried/
>> 
>> Observations:
>> 
>> The initial stat has two modes:
>>     1. it fails in fnal-1 and fnal-3;
>>     2. it is redirected back to UCSD for fnal-2 and all unls.
>> I find it really strange that fnal-2 is different from 1 and 3 in this respect.
>> 
>> For 1, redirection then goes -> cms-xrd-global.cern.ch -> 
>> xrootd-redic.pi.infn.it -> madhatter.csc.fi -> server where file is opened 
>> ok.
>> 
>> For 2, redirection to xrootd-redic.pi.infn.it doesn't happen and we get 
>> redirected back to cmsxrootd1.fnal.gov (for both fnal-2 and all unls) which 
>> then sends us to UCSD where we open the file -- but this is the place we 
>> were not supposed to come back to.
>> 
>> I assume the real question is why I get redirected back to 
>> cmsxrootd1.fnal.gov (despite tried=). Another thing: why don't I get 
>> sent to Pisa on other connection attempts?
>> 
>> Could it be that cms-xrd-global.cern.ch has:
>> a) too short timeouts (but it should be in the cache!);
>> b) wrong address for the US peer metamanager (cmsxrootd1.fnal.gov instead 
>> of the DNS alias cmsxrootd.fnal.gov+)?
>> 
>> Matevz
>> 
>> 
>> [1] The commands that were run:
>> 
>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force 
>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' 
>> /dev/null > ~/buf/xrdcp-tried-unl-1.txt 2>&1
>> 
>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force 
>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' 
>> /dev/null > ~/buf/xrdcp-tried-fnal-1.txt 2>&1
>> 
>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force 
>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' 
>> /dev/null > ~/buf/xrdcp-tried-unl-2.txt 2>&1
>> 
>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force 
>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' 
>> /dev/null > ~/buf/xrdcp-tried-fnal-2.txt 2>&1
>> 
>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force 
>> 'root://xrootd.unl.edu:1094//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' 
>> /dev/null > ~/buf/xrdcp-tried-unl-3.txt 2>&1
>> 
>> XRD_NETWORKSTACK=IPv4 xrdcp --debug 3 --force 
>> 'root://cmsxrootd.fnal.gov//store/mc/Summer12_DR53X/DYJetsToLL_M-10To50_TuneZ2Star_8TeV-madgraph/AODSIM/PU_S10_START53_V7A-v1/00000/064C50C4-DA1B-E211-BA43-848F69FD289B.root?hdfs_block_size=134217728&tried=xrootd.t2.ucsd.edu' 
>> /dev/null > ~/buf/xrdcp-tried-fnal-3.txt 2>&1
>> 
>> 
>> 
>> On 10/03/14 13:01, Andrew Hanushevsky wrote:
>>> On Fri, 3 Oct 2014, Matevz Tadel wrote:
>>> 
>>>>> Could you clean up the log and follow through with all of the 
>>>>> redirections?
>>>> 
>>>> You want me to run with debug 3 and only grep out redirection and
>>>> stat/open messages?
>>> Yes, that would give us the request and the response only.
>>> 
>>>>> I still think the client version you are using may be dropping the tried
>>>>> history.
>>>> OK, I will take the head of master next time, I had 4.0.x-stable now (but
>>>> maybe forgot to pull in latest changes).
>>> You could try that, but according to Lukasz, that should not happen in 
>>> the new client.
>>> 
>>>> What I've noticed:
>>>> 1. If I go to UNL redirector it will send me back to UCSD (v4.0.2).
>>>> 2. If I go to FNAL one, it sends me off to EU, as it should (v3.3.3).
>>>> 3. If I use the DNS alias for both of those, one of the two happens, 
>>>> obviously.
>>> Odd, there shouldn't be a difference between versions here. Then again, 
>>> from the above you aren't doing exactly the same thing. If you go to UNL, 
>>> what's the difference between V4 and V3, if any? Same question for FNAL.
>>> 
>>>> Is it possible UNL has the file in cache and tried= gets ignored in this 
>>>> case?
>>> Nope, the tried gets processed before the cache is inspected. So, even if 
>>> the location has been cached, it is ignored. Now, the big difference 
>>> between V3 and V4 is that if your cluster has two replicated redirectors 
>>> subscribing to a manager, V3 would treat both as separate entities. In V4, 
>>> it picks one of the two and only uses that one while the other is held as 
>>> a hot backup. So, if that one fails, it will automatically switch to the 
>>> other one.
>>> 
>>> Andy
>>> 
>>> ########################################################################
>>> Use REPLY-ALL to reply to list
>>> 
>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>> 
>> ########################################################################
>> Use REPLY-ALL to reply to list
>> 
>> To unsubscribe from the XROOTD-DEV list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
>> 
>
