2011/5/16 Charles G Waldman <[log in to unmask]>:
> > When the traffic optimization is supposed to happen in XRootD, you
> > would see things like an XRootD::Read of len=489, because any
> > external application (ROOT in this case) just needs to state what
> > data it needs, and the task of figuring out how to fetch it in the
> > most optimal way is entrusted to XRootD, which will optimize the
> > actual network reads. The tweaking can then be done with the
> > parameters you mention. Since XRootD is generic data access
> > software, it has no knowledge of the underlying data file format,
> > so all it can do is some statistical guesswork that may be more or
> > less optimal.
>
> Yes, but for sites which are running a particular type of job - for
> example, a USATLAS site - the jobs have similar data-access patterns
> and it makes sense to have one set of read-ahead params for all the
> jobs, controlled by an environment variable or setup file. For
> example, at the MWT2 site we were using dcap access, and were having
> good results with WAN reads once we set a few environment variables:
>
> export DC_LOCAL_CACHE_MEMORY_PER_FILE=10000000
> export DC_LOCAL_CACHE_BLOCK_SIZE=32768
> export DCACHE_RA_BUFFER=16000
>
> which were applied to all ATLAS jobs. We have not been able to get
> the same level of performance with xrootd across the WAN as we had
> gotten with dcache and the above env. vars. There should be controls
> which can be adjusted by (knowledgeable) site admins, who can tune
> this to get optimal performance for their given operating parameters
> (network latency, disk I/O performance, memory limitation, etc).

Yes, this is quite correct. You can do similar adjustments with XRootD.
Unfortunately, in the older versions (coming with ROOT < 5.28) you can
do that only by putting a .rootrc file in the CWD of the job that you
are running, or in $ROOTSYS/etc; in the versions coming with ROOT > 5.28
you can also set it via environment variables.

You can have three values for ReadAheadStrategy:

 0) no read-ahead
 1) sequential - read some data ahead of the currently requested buffer
    (how much should be read is specified by ReadAheadSize)
 2) sliding window - a window centered on the recent average slides
    through the file, following the stream of requests

ReadCacheSize denotes how much data should be cached per file.
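For example, something along these lines in the job's .rootrc should do
the trick. I am quoting the XNet resource names from memory, so please
double-check them against the XNet section of $ROOTSYS/etc/system.rootrc
for your ROOT version; the numbers are only illustrative starting
points, not a recommendation:

  # .rootrc dropped into the job's working directory (names/values to
  # be verified against your ROOT version's system.rootrc)
  XNet.ReadAheadStrategy:  2          # 0 = off, 1 = sequential, 2 = sliding window
  XNet.ReadAheadSize:      1000000    # bytes to read ahead of the requested buffer
  XNet.ReadCacheSize:      10000000   # per-file client-side cache, in bytes

Like the dcap variables you quote above, such a file can be put in
place by the site for all jobs, without touching user code.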
> > 2) When the traffic optimization is supposed to happen in ROOT, you
> > would most probably see xrootd being asked to perform vector reads
> > (XrdClient::ReadV) of around 30 megs. This is far more optimal,
> > because ROOT knows its file format and can easily predict which
> > parts of the file it will need in the near future, so it is able to
> > prefetch data before it is needed by the application. The parameters
> > here are the ones that you mention below: tree->SetCacheSize and
> > friends.
>
> OK, but as I understand it, there is no way to turn this on at the
> site level - this can't be enabled in system.rootrc or via an env.
> var, can it? It seems this requires users to rewrite their code.

Correct.

> > > Question 1)
> > > Why is ReadAheadSize set to 0 here? And what's the best way to
> > > override this?
> >
> > Because most probably it is assumed that the optimization will be
> > done in ROOT and XRootD should not bother to do anything.
> >
> > > We'd love to turn it on by default. But the read-ahead needs to
> > > put the data somewhere: we need some extra memory; 30 to 100MB are
> > > enough. But with the experiments' never-ending quest for memory
> > > resources, turning this on by default is not an option: a 4-core
> > > machine with 4 jobs would eat an additional 400MB. Too much.
> >
> > Well, if you turn on the read-ahead or any prefetching, you need to
> > store the additional data somewhere, and RAM is the easiest target.
> > We are working on some code that will hopefully be committed to
> > ROOT soon, which will enable it to prefetch the data blocks and
> > store them on disk.
>
> This could affect performance negatively - for a multi-core host, you
> are now creating more I/O to the local disk, which can become a
> bottleneck. This is exactly the reason we prefer remote access to
> stage-in for job inputs - having 24 or more jobs all access the same
> local disk can create I/O contention. For worker nodes which are not
> short on RAM, the additional memory usage of a (modest) cache is not
> a problem. (Using 10MB/file for dCache gave a huge performance boost
> and did not cause undue memory pressure. We did not experiment with
> lower values, but I suspect even 1MB/file would be enough to help
> significantly.)

Depends on the use case. It also introduces parallel prefetching of the
TreeCache buffers, and the first results are quite promising. In any
case it will be possible to disable/tweak it depending on your
particular needs.

> > > Instead you need to turn it on by yourself by calling
> > >
> > > tree->SetCacheSize(10000000);
> > > tree->AddBranchToCache("*");
> > >
> > > I don't think we can force users to do this; is there somewhere
> > > else in the stack that this code could be inserted?
> >
> > Not really, since it's dependent on the user data that is being
> > read. You could argue that it should use some caching by default,
> > but it's debatable.
>
> Well, we are comparing performance of xrootd and dcache. For jobs
> accessing data across the LAN, the performance is comparable. But
> once the WAN enters the picture, the xrootd performance is poor
> compared to what we get with dCache. So, I'm trying to at least
> reproduce the dCache performance results ... (using ATLAS
> "Hammercloud" job performance as my metric).

Do similar tweaking to what you have done for dCache :)

> > > Question 2)
> > > Is this "session not found" the cause of the failures?
> >
> > This is a request to end a session that apparently does not
> > exist... I will have a closer look. What is your access pattern? Do
> > you have long-standing jobs that keep the connections open for a
> > long time, or is it more like the jobs fetching the data they need
> > to process and quitting after the processing is over?
>
> This is a "canned" ATLAS job from the Hammercloud testing system - using
>
> Test template: 73 (stress) - Muon 16.0.3.3 PANDA default data-access
>
> I'll do some research to find out more about what this job is
> actually doing.

As already commented by Andy: this is a bug that will be fixed. Thanks
for reporting!

Cheers,
Lukasz
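PS. In case it helps your users, here is a minimal sketch of what
turning the TTreeCache on looks like in a user macro. The file URL,
tree name and plain entry loop are placeholders for illustration only,
not anything from your setup:

  // read_with_cache.C - illustrative only; adjust URL, tree name, branches.
  #include "TFile.h"
  #include "TTree.h"

  void read_with_cache()
  {
     // Open the file through the xrootd client (placeholder URL).
     TFile *f = TFile::Open("root://some.server//path/to/file.root");
     if (!f || f->IsZombie()) return;

     TTree *t = (TTree *) f->Get("mytree");   // placeholder tree name
     if (!t) return;

     // The two calls discussed above: a 10 MB TTreeCache with all
     // branches registered, so that ROOT issues large vector reads
     // (XrdClient::ReadV) instead of many small ones.
     t->SetCacheSize(10000000);
     t->AddBranchToCache("*");

     for (Long64_t i = 0; i < t->GetEntries(); ++i)
        t->GetEntry(i);

     f->Close();
  }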