Hi Brian,

On 2/4/14 6:54 PM, Brian Bockelman wrote:
> Hi Matevz,
>
> Not sure if I'm so convinced all is well.

What I meant was that the configuration change and the related change in
performance make (some) sense ... I agree we should understand what breaks in
the "lookup central" case for hdfs.

Matevz
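
As a rough sanity check on the numbers quoted in this thread, here is a small Python sketch that compares the reported rates. The rates themselves (4.7 kHz for FUSE, 10-20 Hz for "lookup central", 300 Hz for "lookup distrib") come from the messages below; the function names are made up for illustration, and the per-request latency figure assumes the saturated path handles requests strictly one at a time, which nobody in the thread has established.

```python
import math

# Rates quoted in this thread, in Hz.
FUSE_NN_QUERY_HZ = 4700.0             # single-threaded HDFS client via FUSE
CENTRAL_SATURATION_HZ = (10.0, 20.0)  # "lookup central" sites clogged at 10-20 Hz
DISTRIB_OBSERVED_HZ = 300.0           # "lookup distrib" scaled linearly up to this


def orders_of_magnitude(fast_hz, slow_hz):
    """Return log10 of the ratio between two rates."""
    return math.log10(fast_hz / slow_hz)


def implied_latency_ms(rate_hz):
    """Time per request in ms IF requests are served strictly one at a time.

    Assumption only: the thread does not show that the path is serialized.
    """
    return 1000.0 / rate_hz


def summarize():
    """Collect the derived figures in one dictionary."""
    low, high = CENTRAL_SATURATION_HZ
    return {
        # Gap between FUSE and "lookup central", best and worst case.
        "gap_vs_fuse_orders": (
            orders_of_magnitude(FUSE_NN_QUERY_HZ, high),
            orders_of_magnitude(FUSE_NN_QUERY_HZ, low),
        ),
        # Implied time per lookup under the serialization assumption.
        "central_latency_ms": (implied_latency_ms(high), implied_latency_ms(low)),
        "fuse_latency_ms": implied_latency_ms(FUSE_NN_QUERY_HZ),
        # Observed "distrib" rate relative to the best "central" rate.
        "distrib_over_central": DISTRIB_OBSERVED_HZ / high,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

The gap to FUSE comes out between two and three orders of magnitude, consistent with Brian's remark, so the namenode query rate alone does not seem to explain the "lookup central" ceiling.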

> If I use the HDFS client (via FUSE) in a single thread to query the NN as fast as possible, I get a query rate of about 4.7 kHz.  Why is the Xrootd single-client performance 2 orders of magnitude slower than FUSE?
>
> Brian
>
> On Feb 4, 2014, at 8:44 PM, Matevz Tadel wrote:
>
>> Hi Andy, everybody,
>>
>> The scaling issues indeed only showed when doing open requests through the meta-manager ... so all is OK. Right? :)
>>
>> Sorry for all this noise ... I'll pay a round at the Federated Storage workshop :)
>>
>> Matevz
>>
>> On 2/4/14 5:43 PM, Matevz Tadel wrote:
>>> Hi Andy,
>>>
>>> Before we go too far ... let me check with the guy who ran the tests if he
>>> indeed went through site redirectors directly and not through the meta manager.
>>>
>>> With meta manager in the game --- the results would make sense, right?
>>>
>>> Matevz
>>>
>>> On 2/4/14 5:29 PM, Matevz Tadel wrote:
>>>> Hi Andy,
>>>>
>>>> All versions are for the local redirectors.
>>>>
>>>> UCSD (3.3.3) had "lookup distrib redirect immed" and performs well (linear
>>>> scaling up to 300 Hz).
>>>>
>>>> Wisconsin (3.3.3), Nebraska (3.3.1, I think) and Purdue (3.3.2) had "lookup
>>>> central redirect immed" and performed poorly (clogged up at 10-20Hz). Wisconsin
>>>> and Nebraska already made the change to UCSD settings and are now scaling ok,
>>>> too.
>>>>
>>>> Matevz
>>>>
>>>> On 2/4/14 4:58 PM, Andrew Hanushevsky wrote:
>>>>> Hi Matevz,
>>>>>
>>>>> OK, then I am actually confused as well. So, which site has which option and
>>>>> what release is each of them running?
>>>>>
>>>>> Andy
>>>>>
>>>>> -----Original Message----- From: Matevz Tadel
>>>>> Sent: Tuesday, February 04, 2014 4:49 PM
>>>>> To: Andrew Hanushevsky ; xrootd-dev
>>>>> Subject: Re: cms.dfs question
>>>>>
>>>>> Hi Andy,
>>>>>
>>>>> b) was actually the case, no meta-managers involved, just open requests on the
>>>>> local manager, for a file we know (believe) is available on the site. And both
>>>>> sites had redirect immed. It was lookup distrib vs. central that made the difference
>>>>> which I did not expect (I also thought that redirect immed is the only thing
>>>>> that matters).
>>>>>
>>>>> Matevz
>>>>>
>>>>> On 2/4/14 4:10 PM, Andrew Hanushevsky wrote:
>>>>>> Hi Matevz,
>>>>>>
>>>>>> Ah, I see you are as confused as Wilko (and perhaps me :-). We just spent some
>>>>>> time understanding what is going on for you (yes, my explanation made some
>>>>>> assumptions).
>>>>>>
>>>>>> a) I assumed you first talked to a meta-redirector that asked several hadoop
>>>>>> sites whether they had the file. You then got redirected to some xrootd and
>>>>>> opened the file there. Indeed the "redirect" setting in this case is immaterial
>>>>>> as the lookup has already been done.
>>>>>>
>>>>>> b) If (a) was not true, instead you went directly to a particular Hadoop cluster
>>>>>> and opened the file on its local redirector then the "redirect" makes a big
>>>>>> difference. In this case, immed means no lookup is done and you get sent to some
>>>>>> server which will honor or fail your request.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> -----Original Message----- From: Matevz Tadel
>>>>>> Sent: Tuesday, February 04, 2014 3:53 PM
>>>>>> To: Andrew Hanushevsky ; xrootd-dev
>>>>>> Subject: Re: cms.dfs question
>>>>>>
>>>>>> Hi Andy,
>>>>>>
>>>>>> Thanks for the explanation! To make sure I understand:
>>>>>>
>>>>>> 1. When redirector does not know if a file exists, it still has to perform the
>>>>>> lookup, as configured.
>>>>>>
>>>>>> 2. When using "lookup central", we are actually "measuring" the limit of hdfs
>>>>>> lookup on a single node (the redirector).
>>>>>>
>>>>>> Now I understand that redirector has to perform the lookup when it doesn't know
>>>>>> if a file exists ... otherwise it can not report to meta-manager(s).
>>>>>>
>>>>>> Would it make sense to have the equivalent of "lookup none" for open requests?
>>>>>> The client can then deal directly with a data server. It's true that the
>>>>>> redirector does not "learn" anything useful in this case so it can lead to more
>>>>>> trouble down the road, especially with mis-behaving users/clients.
>>>>>>
>>>>>> Matevz
>>>>>>
>>>>>> On 2/4/14 3:20 PM, Andrew Hanushevsky wrote:
>>>>>>> Hi Matevz,
>>>>>>>
>>>>>>> You are getting caught in the "lookup" phase. Distributed lookup will always
>>>>>>> scale better than central lookup, when a lookup *has* to be performed. The
>>>>>>> redirect part is what to do when a lookup can be avoided because the information
>>>>>>> is already cached. Immed is always the best option if you have a true
>>>>>>> distributed file system underneath.
>>>>>>>
>>>>>>> Anyway, I can't say that I have convinced people that distributed normally has
>>>>>>> better scaling, and I have tried.  Unfortunately, the majority still seems to
>>>>>>> gravitate to centralized vertical design options because they are more
>>>>>>> comforting.
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> -----Original Message----- From: Matevz Tadel
>>>>>>> Sent: Tuesday, February 04, 2014 2:46 PM
>>>>>>> To: xrootd-dev
>>>>>>> Subject: cms.dfs question
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We (AAA) are doing redirection rate scaling tests and noticed a large difference
>>>>>>> between *hadoop* sites based on how cms.dfs is set up.
>>>>>>>
>>>>>>> This works great (scaling beyond 300Hz):
>>>>>>>    cms.dfs lookup distrib redirect immed
>>>>>>> and this saturates at ~20Hz:
>>>>>>>    cms.dfs lookup central redirect immed
>>>>>>>
>>>>>>> I'm puzzled, because I'd expect that "redirect immed" trumps whatever lookup
>>>>>>> setting one might choose. We were lucky -- we had two sites that chose different
>>>>>>> values for lookup :)
>>>>>>>
>>>>>>> Matevz
>>>>>>>
>>>>>>> ########################################################################
>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>
>>>>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1