Hi Fons,
yes, in principle I agree, and I believe that the 'shortcut' feature
in PROOF is a cool idea which enhances the overall processing rate.
However, to make this possible without having to track file locations
(in a slow DB? And what about spreading replicas? First you have to make
them, and count this pre-processing in the benchmarks :-D), you chose to
make the xrootd cluster and the worker nodes coexist, to exploit the
fast file location provided by the xrootd storage part. That is a cool
idea too, and I do not know whether it would be possible or easy to
accomplish with other systems.
The only thing is that I am skeptical that the PROOF scheduler will
*in every case* be able to keep the most efficient matching between
worker nodes and their local storage. I suppose that, with many
concurrent users and many different files, the gain will be reduced
because the scheduler cannot always find the optimal match. I suppose
that the efficiency would decrease asymptotically towards a middle
point between the performance with and without the shortcut. But that
is not a loss: it is still an advantage over the 'without' case, i.e.
with the storage system completely detached from the PROOF cluster.
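To make the "middle point" idea concrete, here is a minimal sketch,
assuming a deliberately simple model: a fraction of the data is read
from the worker's local disk and the rest over the LAN, and the read
times simply add up. The function name and the 10 MB/s remote rate are
hypothetical; 15 MB/s is the local reading speed quoted further down in
this thread. It is not a measurement of PROOF.

```python
def effective_rate(local_fraction, local_mb_s, remote_mb_s):
    """Average read rate (MB/s) when only part of the data is local.

    Assumed model: time per MB is the weighted sum of the local and
    remote times per MB, so the result is a weighted harmonic mean.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    seconds_per_mb = (local_fraction / local_mb_s
                      + (1.0 - local_fraction) / remote_mb_s)
    return 1.0 / seconds_per_mb


if __name__ == "__main__":
    # 15 MB/s local (from the thread), 10 MB/s remote (made-up value).
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        rate = effective_rate(fraction, 15.0, 10.0)
        print(f"{fraction:.2f} local -> {rate:.1f} MB/s")
```

In this toy model, any locality above zero gives a rate above the fully
remote case, which is the point made above: a scheduler that only
sometimes finds the optimal match still beats the detached setup.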
So, I don't think that you can do without the local storage or the
shortcut feature, unless you have a very powerful network/storage
behind. Without them, you would end up putting more horsepower into
the storage/network than into the worker pool.
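As a back-of-the-envelope check of how much network that means, here is
a small sketch using the rack figures quoted further down in this
thread (20 nodes, 8 disks each, 15 MB/s per disk). The function names
are hypothetical, and the 1000 MB/s usable per 10 Gb link is an
assumption derived from the "dual 10 Gb uplink to get just 2 GB/s"
remark, not a measured value.

```python
import math


def rack_demand_mb_s(nodes=20, disks_per_node=8, mb_s_per_disk=15):
    """Aggregate read rate (MB/s) one rack can consume from its disks."""
    return nodes * disks_per_node * mb_s_per_disk


def uplinks_needed(demand_mb_s, usable_mb_s_per_link=1000):
    """Number of uplinks needed to feed the same rate over the network.

    usable_mb_s_per_link is an assumed effective throughput per link.
    """
    return math.ceil(demand_mb_s / usable_mb_s_per_link)


if __name__ == "__main__":
    for racks in (1, 2, 3):
        demand = racks * rack_demand_mb_s()
        print(f"{racks} rack(s): {demand} MB/s, "
              f"{uplinks_needed(demand)} uplink(s)")
```

Under these assumptions a single rack already wants more than a dual
uplink delivers, and the number of links grows with every rack added.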
The advantage that storage systems typically give, however, is greater
flexibility. With xrootd you share the nodes and get the best of both
worlds; with other storage systems I really do not know.
Fabrizio
Fons Rademakers wrote:
> Hi Fabrizio,
>
> assume a small rack of 20 1U/2U dual-quad-core machines with 8 disks
> each. Such a rack can process 20 * 8 * 15 MB/s = 2.4 GB/s (15 MB/s is
> the ROOT compressed file reading speed per disk, for an I/O-bound
> query). Such a rack would need a switch with a dual 10 Gb uplink to
> get just 2 GB/s in over the network. Now add another couple of such
> racks. You would need a disk pool plus a lot of 10 Gb Ethernet
> equipment per rack. Do you still think it scales better than having
> disks close to the CPUs?
>
> Cheers, Fons.
>
>
> Fabrizio Furano wrote:
>> Hi Pablo,
>>
>> that's very interesting, and I agree completely with your conclusion,
>> i.e. in most cases LAN data access is more efficient and scales
>> better than local disk access. This is often not well understood by
>> people, who always strive to keep files local at any cost.
>>
>> It would be very interesting to have a comparison of the performance
>> in PROOF between a dCache storage and an analogous xrootd storage,
>> which is the default solution for that. With the same pool of
>> workers, of course.
>>
>> From what I've understood, dCache uses a read-ahead mechanism (at the
>> client side), while xrootd uses a mixed scheme that includes informed
>> asynchronous prefetching.
>>
>> Fabrizio
>>
>> Pablo Fernandez wrote:
>>> Hi all,
>>>
>>> I would like to share with you some information about my performance
>>> tests in PROOF with different storage schemas.
>>> http://root.cern.ch/phpBB2/viewtopic.php?t=6236
>>>
>>> I have moved this topic to the PROOF Forum since it seems to me
>>> more PROOF-related than just xrootd; I hope you don't mind.
>>>
>>> BR/Pablo
>>
>