Hi Matevz,

Yes, all is OK except for the poor single-node HDFS performance. Is it that
most file system calls are single-threaded? I suspect that is the case;
otherwise I would not see an order-of-magnitude difference in performance.
Something to ask the HDFS guys.

Andy

-----Original Message----- 
From: Matevz Tadel
Sent: Tuesday, February 04, 2014 6:44 PM
To: Andrew Hanushevsky ; xrootd-dev
Subject: Re: cms.dfs question

Hi Andy, everybody,

The scaling issues indeed only showed when doing open requests through the
meta-manager ... so all is OK. Right? :)

Sorry for all this noise ... I'll buy a round at the Federated Storage
workshop :)

Matevz

On 2/4/14 5:43 PM, Matevz Tadel wrote:
> Hi Andy,
>
> Before we go too far ... let me check with the guy who ran the tests whether
> he indeed went directly through the site redirectors and not through the
> meta-manager.
>
> With the meta-manager in the game, the results would make sense, right?
>
> Matevz
>
> On 2/4/14 5:29 PM, Matevz Tadel wrote:
>> Hi Andy,
>>
>> All versions are for the local redirectors.
>>
>> UCSD (3.3.3) had "lookup distrib redirect immed" and performs well (linear
>> scaling up to 300 Hz).
>>
>> Wisconsin (3.3.3), Nebraska (3.3.1, I think) and Purdue (3.3.2) had
>> "lookup central redirect immed" and performed poorly (clogged up at
>> 10-20 Hz). Wisconsin and Nebraska have already switched to the UCSD
>> settings and are now scaling OK, too.
>>
>> Matevz
>>
>> On 2/4/14 4:58 PM, Andrew Hanushevsky wrote:
>>> Hi Matevz,
>>>
>>> OK, then I am actually confused as well. So, which site has which option,
>>> and which release is each of them running?
>>>
>>> Andy
>>>
>>> -----Original Message----- From: Matevz Tadel
>>> Sent: Tuesday, February 04, 2014 4:49 PM
>>> To: Andrew Hanushevsky ; xrootd-dev
>>> Subject: Re: cms.dfs question
>>>
>>> Hi Andy,
>>>
>>> b) was actually the case: no meta-managers involved, just open requests on
>>> the local manager for a file we know (believe) is available at the site.
>>> Both sites had redirect immed. It was lookup distrib vs. central that made
>>> the difference, which I did not expect (I also thought that redirect immed
>>> was the only thing that mattered).
>>>
>>> Matevz
>>>
>>> On 2/4/14 4:10 PM, Andrew Hanushevsky wrote:
>>>> Hi Matevz,
>>>>
>>>> Ah, I see you are as confused as Wilko (and perhaps me :-). We just spent
>>>> some time understanding what is going on for you (yes, my explanation made
>>>> some assumptions).
>>>>
>>>> a) I assumed you first talked to a meta-redirector that asked several
>>>> Hadoop sites whether they had the file. You then got redirected to some
>>>> xrootd server and opened the file there. In this case the "redirect"
>>>> setting is indeed immaterial, as the lookup has already been done.
>>>>
>>>> b) If (a) was not true, and instead you went directly to a particular
>>>> Hadoop cluster and opened the file on its local redirector, then the
>>>> "redirect" setting makes a big difference. In this case, immed means no
>>>> lookup is done and you get sent to some server, which will either honor
>>>> or fail your request.
>>>>
>>>> Andy
>>>>
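The two cases above can be summarized as a small decision function. This is a hypothetical, simplified reading of the behaviour described in this thread, not the actual cmsd logic; the function name `who_does_the_lookup` and its parameters are illustrative assumptions. It only encodes which node, if any, has to touch HDFS for a given request.

```python
# Simplified reading of the cms.dfs behaviour described in this thread.
# Hypothetical model for illustration only, not the xrootd cmsd implementation.

def who_does_the_lookup(via_meta_manager: bool, lookup: str,
                        redirect_immed: bool, location_cached: bool) -> str:
    """Return which node (if any) must query HDFS to answer the request."""
    if lookup not in ("central", "distrib"):
        raise ValueError("lookup must be 'central' or 'distrib'")
    # Case (b): a direct open on the local redirector with "redirect immed"
    # skips the lookup; the chosen server honors or fails the open itself.
    if not via_meta_manager and redirect_immed:
        return "none"
    # Location already known (cached): no new lookup is needed.
    if location_cached:
        return "none"
    # Case (a): the redirector must find out whether the file exists (e.g. to
    # answer a meta-manager), so the "lookup" setting decides who does the work.
    return "redirector" if lookup == "central" else "data servers"


if __name__ == "__main__":
    for setting in ("central", "distrib"):
        who = who_does_the_lookup(via_meta_manager=True, lookup=setting,
                                  redirect_immed=True, location_cached=False)
        print(f"lookup {setting} via meta-manager -> {who}")
```

Under this model, uncached opens arriving through a meta-manager force a lookup, and with "lookup central" that whole load lands on the single redirector node.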
>>>> -----Original Message----- From: Matevz Tadel
>>>> Sent: Tuesday, February 04, 2014 3:53 PM
>>>> To: Andrew Hanushevsky ; xrootd-dev
>>>> Subject: Re: cms.dfs question
>>>>
>>>> Hi Andy,
>>>>
>>>> Thanks for the explanation! To make sure I understand:
>>>>
>>>> 1. When the redirector does not know whether a file exists, it still has to
>>>> perform the lookup, as configured.
>>>>
>>>> 2. When using "lookup central", we are actually "measuring" the limit of
>>>> HDFS lookup on a single node (the redirector).
>>>>
>>>> Now I understand that the redirector has to perform the lookup when it
>>>> doesn't know whether a file exists ... otherwise it cannot report to the
>>>> meta-manager(s).
>>>>
>>>> Would it make sense to have the equivalent of "lookup none" for open
>>>> requests? The client could then deal directly with a data server. It's
>>>> true that the redirector does not "learn" anything useful in this case, so
>>>> it can lead to more trouble down the road, especially with misbehaving
>>>> users/clients.
>>>>
>>>> Matevz
>>>>
>>>> On 2/4/14 3:20 PM, Andrew Hanushevsky wrote:
>>>>> Hi Matevz,
>>>>>
>>>>> You are getting caught in the "lookup" phase. Distributed lookup will
>>>>> always scale better than central lookup when a lookup *has* to be
>>>>> performed. The redirect part is what to do when a lookup can be avoided
>>>>> because the information is already cached. Immed is always the best
>>>>> option if you have a true distributed file system underneath.
>>>>>
>>>>> Anyway, I can't say that I have convinced people that distributed designs
>>>>> normally scale better, and I have tried. Unfortunately, the majority
>>>>> still seems to gravitate toward centralized, vertical design options
>>>>> because they are more comforting.
>>>>>
>>>>> Andy
>>>>>
>>>>> -----Original Message----- From: Matevz Tadel
>>>>> Sent: Tuesday, February 04, 2014 2:46 PM
>>>>> To: xrootd-dev
>>>>> Subject: cms.dfs question
>>>>>
>>>>> Hi,
>>>>>
>>>>> We (AAA) are doing redirection rate scaling tests and noticed a large
>>>>> difference between *hadoop* sites based on how cms.dfs is set up.
>>>>>
>>>>> This works great (scaling beyond 300 Hz):
>>>>>    cms.dfs lookup distrib redirect immed
>>>>> and this saturates at ~20 Hz:
>>>>>    cms.dfs lookup central redirect immed
>>>>>
>>>>> I'm puzzled, because I'd expect "redirect immed" to trump whatever lookup
>>>>> setting one might choose. We were lucky: we had two sites that chose
>>>>> different values for lookup :)
>>>>>
>>>>> Matevz
>>>>>
>>>>> ########################################################################
>>>>> Use REPLY-ALL to reply to list
>>>>>
>>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1