2011/5/16 Charles G Waldman:
> > When the traffic optimization is supposed to happen in XRootD, you
> > would see things like a XRootD::Read of len=489, because any external
> > application (ROOT in this case) just needs to state what data it needs
> > and the task of figuring out how to fetch it in the most optimal way
> > is entrusted to XRootD which will optimize the actual network reads.
> > The tweaking can then be done by the parameters you mention. Since
> > XRootD is a generic data access software it has no knowledge of the
> > underlying data file format so all it can do is some statistical guess
> > work that may be more or less optimal.
> Yes, but for sites which are running a particular type of job - for
> example, a USATLAS site - the jobs have similar data-access patterns
> and it makes sense to have one set of read-ahead params for all the
> jobs, controlled by an environment variable or setup file. For
> example at the MWT2 site, we were using dcap access, and were having
> good results with WAN reads once we set a few environment variables:
> export DC_LOCAL_CACHE_MEMORY_PER_FILE=10000000
> export DC_LOCAL_CACHE_BLOCK_SIZE=32768
> export DCACHE_RA_BUFFER=16000
> which were applied to all ATLAS jobs. We have not been able to get
> the same level of performance with xrootd across the WAN as we had
> gotten with dcache with the above env. vars. There should be controls
> which can be adjusted by (knowledgeable) site admins, who can tune
> this to get optimal performance for their given operating parameters
> (network latency, disk I/O performance, memory limitation, etc).
Yes, this is quite correct. You can make similar adjustments with
XRootD. Unfortunately, in the older versions (coming with ROOT <5.28)
you can do that only by putting a .rootrc file in the CWD of the job
that you're running or in $ROOTSYS/etc. In versions coming with
ROOT >5.28 you can set it via envvars.
You can have three values for ReadAheadStrategy:
0) no read ahead
1) sequential - read some data ahead of currently requested buffer
(how much should be read is specified by ReadAheadSize)
2) sliding window - a window centered on the average offset of recent
requests slides through the file, following the stream of requests
ReadCacheSize denotes how much data should be cached per file.
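
To make the difference between strategies 0 and 1 concrete, here is a rough toy model of what gets fetched for a single request. It is only a sketch: plan_fetch and ByteRange are names made up for this illustration and are not part of XRootD, the real client logic is more involved, and the sliding window strategy is not modelled at all.

```cpp
#include <algorithm>
#include <cstdint>

// Toy model only: these names are hypothetical and are NOT XRootD code.
struct ByteRange {
    std::int64_t begin;  // first byte to fetch
    std::int64_t end;    // one past the last byte to fetch
};

// strategy 0: fetch exactly the requested bytes.
// strategy 1: also fetch readAheadSize bytes beyond the request.
// Any other strategy value is treated like 0 in this sketch.
// The result is clipped to the file, assuming fileSize >= 0.
ByteRange plan_fetch(int strategy, std::int64_t offset, std::int64_t len,
                     std::int64_t readAheadSize, std::int64_t fileSize) {
    const std::int64_t zero = 0;
    // Clip the start of the request to the file boundaries.
    std::int64_t begin = std::min(std::max(offset, zero), fileSize);
    // Only the sequential strategy adds bytes beyond the request.
    std::int64_t extra = (strategy == 1) ? std::max(readAheadSize, zero) : zero;
    // Never plan to read past the end of the file.
    std::int64_t end = std::min(begin + std::max(len, zero) + extra, fileSize);
    return ByteRange{begin, end};
}
```

With the len=489 read mentioned earlier in the thread, at an arbitrary offset of 1000 in a 1000000-byte file, strategy 0 plans bytes 1000 to 1489. Strategy 1 with an example ReadAheadSize of 32768 plans bytes 1000 to 34257, so later small reads in that range can be served from the cache instead of the network.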
> > 2) When the traffic optimization is supposed to happen in ROOT, you
> > would most probably see xrootd being asked to perform vector reads
> > (XrdClient::ReadV) of around 30 MB. This is far more efficient because
> > ROOT knows its file format and can easily predict which parts of the
> > file it will need in the nearest future so it is able to prefetch data
> > before it is needed by the application. The parameters here are the
> > ones that you mention below: tree->SetCacheSize and friends.
> OK, but as I understand it, there is no way to turn this on at the
> site level - this can't be enabled in system.rootrc or via an env.
> var, can it? It seems this requires users to rewrite their code.
> > > Question 1)
> > > Why is ReadAheadSize set to 0 here? And what's the best way to override this?
> > Because most probably it is assumed that the optimization will be
> > done in ROOT and XRootD should not bother to do anything.
> > > We'd love to turn it on by default. But the read-ahead needs to put the data somewhere: we need some extra memory; 30 to 100MB is enough. But with the experiments' never-ending quest for memory resources, turning this on by default is not an option: a 4-core machine with 4 jobs would eat an additional 400MB. Too much.
> > Well, if you turn on the read-ahead or any prefetching you need to
> > store the additional data somewhere and RAM is the easiest target. We
> > work on some code that will hopefully be committed to ROOT soon which
> > will enable it to prefetch the data blocks and store them on disk.
> This could affect performance negatively - for a multi-core host, you
> are now creating more I/O to the local disk, which can become a
> bottleneck. This is exactly the reason we prefer remote-access to
> stage-in for job inputs - having 24 or more jobs all access the same
> local disk can create I/O contention. For worker nodes which are not
> short on RAM, the additional memory usage of a (modest) cache is not
> a problem. (Using 10MB/file for dCache gave a huge performance boost
> and did not cause undue memory pressure. We did not experiment with
> lower values but I suspect even 1MB/file would be enough to help
Depends on the use case. It also introduces parallel prefetching of
TreeCache buffers, so the first results are quite promising. In any
case it will be possible to disable/tweak it depending on the
> > > Instead you need to turn it on by yourself by calling
> > >
> > > tree->SetCacheSize(10000000);
> > > tree->AddBranchToCache("*");
> > >
> > >
> > > I don't think we can force users to do this, is there somewhere else in the stack
> > > that this code could be inserted?
> > Not really, since it's dependent on the user data that is being
> > read. You could argue that it should use some caching by default but
> > it's debatable.
> Well, we are comparing performance of xrootd and dcache. For jobs
> accessing data across LAN, the performance is comparable. But once
> the WAN enters the picture, the xrootd performance is poor compared to
> what we get with dCache. So, I'm trying to at least reproduce the
> dCache performance results ... (using ATLAS "Hammercloud" job
> performance as my metric).
Do tweaking similar to what you have done for dCache :)
> > > Question 2)
> > > Is this "session not found" the cause of the failures?
> > This is a request to end a session that apparently does not
> > exist... I will have a closer look. What is your access pattern? Do
> > you have long standing jobs that keep the connections open for a long
> > time, or is it more like the jobs fetching the data they need to
> > process and quitting after the processing is over?
> This is a "canned" ATLAS job from the Hammercloud testing system - using
> Test template: 73 (stress) - Muon 220.127.116.11 PANDA default data-access
> I'll do some research to find out more about what this job is actually
As already commented by Andy: this is a bug that will be fixed.
Thanks for reporting!