  Hi Niall,

On Mon, Apr 10, 2006 at 06:12:11PM +0100, Niall Dalton wrote:
> I've been reading the documentation on xrootd, and I'm wondering how it 
> might work in practice for a particular workload I'm trying to optimize.
> 
> xrootd performs very well to provide high aggregate read performance for 
> one or many clients. Can someone provide information, or point me 
> towards published results, on write intensive and concurrent read/write 
> intensive applications? 

  This may not be what you are looking for, but here is an example note 
on high rate I/O disk server benchmarks: 

  http://cmsdoc.cern.ch/cms/cpt/Computing/Technical/documents/tier0/20060321/Disk_server_benchmarks_v2.doc

That study used a different protocol, rfio, rather than xrootd, but that 
matters little since it was essentially a _disk server_ benchmark. The 
reads/writes were large (1.5MB blocks). For xrootd-specific benchmarks
with smaller reads/writes, one of the others on the Cc might be able to
answer.

> How are large data sets normally loaded into xrootd?

  The typical high-energy-physics use case involves serving data to
large numbers of applications which may do many small reads. On the write 
side, though, data is _usually_ streamed into the system, e.g.:

   o An application on some machine writes a file to the machine's local 
     disk and, when it finishes, copies it over into some larger storage 
     space via xrootd. (Our applications are also cpu-intensive, so the write
     would often be "slow" if it went directly to the larger storage; hence
     we often just write to the local disk and after N hours push the 
     resulting file over.)

   o An application tries to read a file via xrootd and xrootd (transparently,
     behind the scenes) triggers the staging (streaming write) of a file 
     from a tape system onto a disk xrootd sees, after which xrootd serves 
     it (for read) to the application across the network.

This isn't to say that more fine-grained writes won't be performant, it
is just that we normally do streaming writes.
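
  As a rough illustration of the first pattern (write locally, then push 
the finished file in one streaming copy), here is a minimal Python sketch. 
The function name produce_then_push and its parameters are made up for this 
example, and shutil.copy2 is only a stand-in for the real transfer step, 
which in practice would be a copy via an xrootd client (for instance xrdcp; 
that choice is an assumption here, not something stated above).

```python
import shutil
import tempfile
from pathlib import Path


def produce_then_push(chunks, dest_dir, name, push=shutil.copy2):
    """Write all chunks to a local scratch file, then push the finished
    file to dest_dir in a single streaming copy.

    `push` is a hypothetical hook: any callable taking (source, destination).
    The default, shutil.copy2, is a plain local copy used as a placeholder
    for a copy into the larger storage space.
    """
    dest = Path(dest_dir) / name
    with tempfile.TemporaryDirectory() as scratch:
        local = Path(scratch) / name
        # Slow, incremental writes go to fast local disk only.
        with open(local, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        # One bulk transfer once the file is complete.
        push(local, dest)
    return dest
```

The point of the split is that the large storage only ever sees one 
sequential write per file, however slowly the application produced it.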

> I'm trying to understand how well xrootd would perform if there was say 
> a stream of data (about 25GB over 8 hours each day, peak write event 
> rates of 125,000 messages a second) which had to be written while 
> simultaneously there was a number of clients attempting to analysis the 
> data - either large historical analyses or perhaps just interested in 
> the last 10 minutes worth of events.

  25GB isn't actually a lot over 8 hours: less than 1MB/s average, so perhaps 
the disk server benchmarks above aren't really what you are looking for. It 
sounds like you are more interested in very high transaction rates for very 
small reads/writes, is that correct? Here I would expect that xrootd would 
perform much better than other technologies (e.g. rfio mentioned above), but 
Andy or someone else on this list can probably better respond as to what 
they've benchmarked for that regime.
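
  For reference, the "less than 1MB/s" figure is just the daily volume 
divided by the length of the window. A quick check in Python, assuming 
decimal units (1GB = 10^9 bytes, 1MB = 10^6 bytes); the variable names are 
only for this example:

```python
# Average write rate for 25GB spread evenly over 8 hours.
total_bytes = 25 * 10**9        # 25GB, decimal units (assumption)
window_seconds = 8 * 3600       # 8 hours

avg_rate_mb_s = total_bytes / window_seconds / 10**6
print(f"{avg_rate_mb_s:.2f} MB/s")  # prints "0.87 MB/s"
```

With binary units (1GB = 2^30 bytes) the result is still below 1MB/s, so 
the conclusion does not depend on that choice. Note this is only the 
average; it says nothing about the peak rate during bursts.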

  (Could you tell us a bit more about your application? ;-)

                                   Pete

-------------------------------------------------------------------------
Peter Elmer     E-mail: [log in to unmask]      Phone: +41 (22) 767-4644
Address: CERN Division PPE, Bat. 32 2C-14, CH-1211 Geneva 23, Switzerland
-------------------------------------------------------------------------