Hi Matevz,

Yes, all is OK except for the poor showing of single-node HDFS performance. Is it that most file system calls are single-threaded? I suspect that is the case; otherwise I should not see an order-of-magnitude difference in performance. Something to ask the HDFS guys.

Andy

-----Original Message-----
From: Matevz Tadel
Sent: Tuesday, February 04, 2014 6:44 PM
To: Andrew Hanushevsky ; xrootd-dev
Subject: Re: cms.dfs question

Hi Andy, everybody,

The scaling issues indeed only showed when doing open requests through the meta-manager ... so all is OK. Right? :)

Sorry for all this noise ... I'll pay a round at the Federated Storage workshop :)

Matevz

On 2/4/14 5:43 PM, Matevz Tadel wrote:
> Hi Andy,
>
> Before we go too far ... let me check with the guy who ran the tests whether
> he indeed went through the site redirectors directly and not through the meta
> manager.
>
> With the meta manager in the game, the results would make sense, right?
>
> Matevz
>
> On 2/4/14 5:29 PM, Matevz Tadel wrote:
>> Hi Andy,
>>
>> All versions are for the local redirectors.
>>
>> UCSD (3.3.3) had "lookup distrib redirect immed" and performs well (linear
>> scaling up to 300 Hz).
>>
>> Wisconsin (3.3.3), Nebraska (3.3.1, I think) and Purdue (3.3.2) had "lookup
>> central redirect immed" and performed poorly (clogged up at 10-20 Hz).
>> Wisconsin and Nebraska have already switched to the UCSD settings and are
>> now scaling OK, too.
>>
>> Matevz
>>
>> On 2/4/14 4:58 PM, Andrew Hanushevsky wrote:
>>> Hi Matevz,
>>>
>>> OK, then I am actually confused as well. So, which site has which option,
>>> and what release is each of them running?
>>>
>>> Andy
>>>
>>> -----Original Message-----
>>> From: Matevz Tadel
>>> Sent: Tuesday, February 04, 2014 4:49 PM
>>> To: Andrew Hanushevsky ; xrootd-dev
>>> Subject: Re: cms.dfs question
>>>
>>> Hi Andy,
>>>
>>> b) was actually the case: no meta-managers involved, just an open request on
>>> the local manager for a file we know (believe) is available on the site. And
>>> both sites had "redirect immed". It was "lookup distrib" vs. "lookup central"
>>> that made the difference, which I did not expect (I also thought that
>>> "redirect immed" is the only thing that matters).
>>>
>>> Matevz
>>>
>>> On 2/4/14 4:10 PM, Andrew Hanushevsky wrote:
>>>> Hi Matevz,
>>>>
>>>> Ah, I see you are as confused as Wilko (and perhaps me :-). We just spent
>>>> some time understanding what is going on for you (yes, my explanation made
>>>> some assumptions).
>>>>
>>>> a) I assumed you first talked to a meta-redirector that asked several Hadoop
>>>> sites whether they had the file. You then got redirected to some xrootd and
>>>> opened the file there. Indeed, the "redirect" setting in this case is
>>>> immaterial, as the lookup has already been done.
>>>>
>>>> b) If (a) was not true, and instead you went directly to a particular Hadoop
>>>> cluster and opened the file on its local redirector, then the "redirect"
>>>> setting makes a big difference. In this case, immed means no lookup is done
>>>> and you get sent to some server, which will honor or fail your request.
>>>>
>>>> Andy
>>>>
>>>> -----Original Message-----
>>>> From: Matevz Tadel
>>>> Sent: Tuesday, February 04, 2014 3:53 PM
>>>> To: Andrew Hanushevsky ; xrootd-dev
>>>> Subject: Re: cms.dfs question
>>>>
>>>> Hi Andy,
>>>>
>>>> Thanks for the explanation! To make sure I understand:
>>>>
>>>> 1. When the redirector does not know if a file exists, it still has to
>>>> perform the lookup, as configured.
>>>>
>>>> 2. When using "lookup central", we are actually "measuring" the limit of
>>>> HDFS lookup on a single node (the redirector).
>>>>
>>>> Now I understand that the redirector has to perform the lookup when it
>>>> doesn't know if a file exists ... otherwise it cannot report to the
>>>> meta-manager(s).
>>>>
>>>> Would it make sense to have the equivalent of "lookup none" for open
>>>> requests? The client could then deal directly with a data server. It's true
>>>> that the redirector does not "learn" anything useful in this case, so it can
>>>> lead to more trouble down the road, especially with misbehaving
>>>> users/clients.
>>>>
>>>> Matevz
>>>>
>>>> On 2/4/14 3:20 PM, Andrew Hanushevsky wrote:
>>>>> Hi Matevz,
>>>>>
>>>>> You are getting caught in the "lookup" phase. Distributed lookup will
>>>>> always scale better than central lookup when a lookup *has* to be
>>>>> performed. The "redirect" part is what to do when a lookup can be avoided
>>>>> because the information is already cached. Immed is always the best option
>>>>> if you have a true distributed file system underneath.
>>>>>
>>>>> Anyway, I can't say that I have convinced people that distributed normally
>>>>> has better scaling, and I have tried. Unfortunately, the majority still
>>>>> seems to gravitate to centralized vertical design options because they are
>>>>> more comforting.
>>>>>
>>>>> Andy
>>>>>
>>>>> -----Original Message-----
>>>>> From: Matevz Tadel
>>>>> Sent: Tuesday, February 04, 2014 2:46 PM
>>>>> To: xrootd-dev
>>>>> Subject: cms.dfs question
>>>>>
>>>>> Hi,
>>>>>
>>>>> We (AAA) are doing redirection rate scaling tests and noticed a large
>>>>> difference between *hadoop* sites based on how cms.dfs is set up.
>>>>>
>>>>> This works great (scaling beyond 300 Hz):
>>>>> cms.dfs lookup distrib redirect immed
>>>>> and this saturates at ~20 Hz:
>>>>> cms.dfs lookup central redirect immed
>>>>>
>>>>> I'm puzzled, because I'd expect that "redirect immed" trumps whatever
>>>>> lookup setting one might choose. We were lucky: we had two sites that chose
>>>>> different values for lookup :)
>>>>>
>>>>> Matevz
>>>>>
>>>>> ########################################################################
>>>>> Use REPLY-ALL to reply to list
>>>>>
>>>>> To unsubscribe from the XROOTD-DEV list, click the following link:
>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1
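To make the split Andy describes between the "lookup" phase and the "redirect" phase a bit more concrete, here is a small Python toy model. It is a sketch under stated assumptions, not XRootD code. All names (DfsConfig, needs_lookup, lookup_capacity_hz, served_rate_hz) are hypothetical. The per-lookup latency and the server count are made-up numbers, not measurements. Two further points are assumptions, not documented behavior. First, central lookups are treated as serialized on the redirector, following Andy's single-threaded suspicion. Second, distrib lookups are treated as spread across data servers and run in parallel. The model encodes what the thread settles on. A query from a meta-manager forces a lookup when the location is not cached. A direct open with "redirect immed" skips the lookup entirely.

```python
"""Toy model of the cms.dfs "lookup" vs "redirect" split discussed in this thread.

All names and numbers are hypothetical and illustrative; this is not XRootD code.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class DfsConfig:
    lookup: str           # "central" or "distrib", as in "cms.dfs lookup ..."
    redirect_immed: bool  # True models "cms.dfs ... redirect immed"


def needs_lookup(cfg: DfsConfig, from_meta_manager: bool, location_cached: bool) -> bool:
    """Decide whether the site redirector must look the file up."""
    if location_cached:
        # Location already known: the lookup can be avoided.
        return False
    if from_meta_manager:
        # The meta-manager asks whether this site has the file, so the
        # redirector cannot answer without performing a lookup.
        return True
    # Direct open on the local redirector: "immed" skips the lookup and
    # sends the client straight to some data server.
    return not cfg.redirect_immed


def lookup_capacity_hz(cfg: DfsConfig, lookup_seconds: float, n_data_servers: int) -> float:
    """Upper bound on lookups per second under the assumed serialization."""
    if cfg.lookup == "central":
        # Assumption: the redirector performs every lookup itself, one at a time.
        return 1.0 / lookup_seconds
    if cfg.lookup == "distrib":
        # Assumption: lookups are spread across data servers and run in parallel.
        return n_data_servers / lookup_seconds
    raise ValueError(f"unknown lookup mode: {cfg.lookup!r}")


def served_rate_hz(cfg: DfsConfig, offered_hz: float, from_meta_manager: bool,
                   location_cached: bool, lookup_seconds: float,
                   n_data_servers: int) -> float:
    """Request rate the redirector can sustain for one kind of request."""
    if not needs_lookup(cfg, from_meta_manager, location_cached):
        return offered_hz  # no lookup on this path, so the lookup mode is immaterial
    return min(offered_hz, lookup_capacity_hz(cfg, lookup_seconds, n_data_servers))


if __name__ == "__main__":
    LOOKUP_SECONDS = 0.05  # hypothetical per-lookup latency, not a measurement
    N_SERVERS = 20         # hypothetical number of data servers
    for mode in ("central", "distrib"):
        cfg = DfsConfig(lookup=mode, redirect_immed=True)
        direct = served_rate_hz(cfg, 300.0, False, False, LOOKUP_SECONDS, N_SERVERS)
        via_mm = served_rate_hz(cfg, 300.0, True, False, LOOKUP_SECONDS, N_SERVERS)
        print(f"{mode:8s} direct open: {direct:6.1f} Hz  via meta-manager: {via_mm:6.1f} Hz")
```

With these invented parameters, "lookup central" caps the meta-manager path at 1/0.05 = 20 lookups per second. "lookup distrib" keeps up with the offered 300 Hz. Direct opens with "redirect immed" are unaffected by the lookup mode in either case, which matches the thread's conclusion that the difference only shows up when a lookup actually has to be performed. The numbers are chosen only to make that shape visible.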