Hi Fons,

  yes, in principle I agree, and I believe that the 'shortcut' feature 
in PROOF is a cool idea which enhances the overall processing rate. 
However, to make this possible without having to track file locations 
(in a slow DB? And what about spreading replicas? First you have to make 
them, and count this pre-processing in the benchmarks :-D) you chose to 
make the xrootd cluster and the worker nodes coexist, to exploit the 
fast file location of the xrootd storage part. That is a cool idea too, 
and I do not know whether it will be possible or easy to accomplish 
with other systems.

  The only thing is that I am skeptical that the PROOF scheduler will 
always be able to keep the most efficient matching between worker nodes 
and their local storage. I suppose that, with many concurrent users and 
many different files, the gain will be reduced because the scheduler 
cannot always find the optimal match. I suppose that the efficiency 
would decrease asymptotically to a middle point between the performance 
with and without the shortcut. But that is not a loss, it's an advantage 
over the 'without' case, i.e. with the storage system completely 
detached from the PROOF cluster.
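
  A minimal sketch of the 'middle point' argument above, under a 
simplifying assumption that is mine and not a measurement: a worker 
spends a fraction of its time reading matched local files and the rest 
reading over the LAN, so its average rate is a linear blend of the two. 
The function name is hypothetical, the 15 MB/s local rate is the figure 
from Fons's message quoted below, and the 10 MB/s remote rate is purely 
illustrative.

```python
def blended_rate(local_fraction, local_rate, remote_rate):
    """Average per-worker read rate (same unit as the inputs).

    Assumes local_fraction is the share of processing *time* spent on
    locally matched files; the remainder is spent on remote reads.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_rate + (1.0 - local_fraction) * remote_rate


# Illustrative numbers only: 15 MB/s local, hypothetical 10 MB/s remote.
for fraction in (1.0, 0.75, 0.5, 0.0):
    print(fraction, blended_rate(fraction, 15.0, 10.0))
```

  Whatever the real matching fraction turns out to be, the result stays 
between the 'without shortcut' and 'with shortcut' rates, which is the 
point made above.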

  So, I don't think that you can do without the local storage or the 
shortcut feature, unless you have a very powerful network/storage 
behind it. Otherwise you would end up putting more horsepower into the 
storage/network than into the worker pool.
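
  To make the 'horsepower' comparison concrete, here is a small sketch 
that redoes the rack arithmetic from Fons's message quoted below (20 
nodes, 8 disks each, 15 MB/s per disk, dual 10 Gb/s uplink). The 
function names are hypothetical, and the uplink figure assumes decimal 
units and ignores protocol overhead, so it is an upper bound.

```python
import math


def rack_read_rate_mb_s(nodes, disks_per_node, mb_s_per_disk):
    """Aggregate local-disk read rate of one rack, in MB/s."""
    return nodes * disks_per_node * mb_s_per_disk


def uplink_capacity_mb_s(links, gbit_per_link):
    """Raw uplink capacity in MB/s (1 Gb/s = 125 MB/s, no overhead)."""
    return links * gbit_per_link * 1000 / 8


def links_needed(rate_mb_s, gbit_per_link):
    """Number of links needed to carry rate_mb_s over the network."""
    return math.ceil(rate_mb_s / uplink_capacity_mb_s(1, gbit_per_link))


local = rack_read_rate_mb_s(20, 8, 15)   # figures from the quoted message
uplink = uplink_capacity_mb_s(2, 10)     # dual 10 Gb/s uplink
print(local, uplink, links_needed(local, 10))
```

  With these inputs the local disks deliver 2400 MB/s, and it takes the 
whole dual 10 Gb/s uplink (2500 MB/s raw) to bring the same amount in 
over the network, per rack.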

  The advantage that storage systems typically give, however, is 
greater flexibility. With xrootd you share the nodes and get the best 
of both worlds; with other storage systems I really do not know.

Fabrizio



Fons Rademakers wrote:
> Hi Fabrizio,
> 
>   assume a small rack of 20 1U/2U dual-quad-cores + 8 disks each. Such a 
> rack can process: 20 * 8 * 15 = 2.4 GB/s (15 MB/s ROOT compressed file 
> reading speed, I/O bound query). Now such a rack would need a switch 
> with a dual 10 Gb uplink to get just 2 GB/s in over the network. Now add 
> another couple of such racks. You would need a disk pool + a lot of 10 Gb 
> eth equipment per rack. You still think it scales better than having 
> disks close to the CPUs?
> 
> Cheers, Fons.
> 
> 
> Fabrizio Furano wrote:
>> Hi Pablo,
>>
>>  that's very interesting, and I agree completely with your conclusion, 
>> i.e. in most cases LAN data access is more efficient and scales 
>> better than local disk access. This is often not well understood by 
>> people, who always strive to keep files local at any cost.
>>
>>  It would be very interesting to have a comparison of the 
>> performance in PROOF between a dCache storage and an analogous xrootd 
>> storage, which is the default solution for that. With the same pool of 
>> workers of course.
>>
>>  From what I've understood, dCache uses a read-ahead mechanism (on the 
>> client side), while xrootd uses a mixed scheme that includes informed 
>> async prefetching.
>>
>>  Fabrizio
>>
>> Pablo Fernandez wrote:
>>> Hi all,
>>>
>>> I would like to share with you some information about my tests of 
>>> performance in PROOF with different storage schemes.
>>> http://root.cern.ch/phpBB2/viewtopic.php?t=6236
>>>
>>> I have moved this topic to the PROOF Forum since it seems to me 
>>> more PROOF-related than just xrootd, I hope you don't mind.
>>>
>>> BR/Pablo
>>
>