> From what I understand, the storage admins at RAL see that Ceph is healthy from the storage point of view.

Right. Ceph can be perfectly healthy and performant and still not be fast enough to beat the request timeout. One thing you can monitor is the ratio of data flowing into versus out of the cache process; you will likely see this effect when much more data is coming in than going out.
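As a rough illustration of that check, here is a minimal sketch that computes the in/out ratio per sampling interval. It assumes that "in" means bytes the cache reads from the backend storage and "out" means bytes it serves to clients, and that you already have per-interval byte counts from whatever monitoring the cache exposes. The function names, the sample numbers, and the threshold of 2.0 are all hypothetical.

```python
from typing import List, Tuple


def ingress_egress_ratio(bytes_in: int, bytes_out: int) -> float:
    """Ratio of bytes pulled into the cache vs bytes served out of it."""
    if bytes_out == 0:
        # Nothing served yet: any inflow at all counts as unbounded.
        return float("inf") if bytes_in > 0 else 0.0
    return bytes_in / bytes_out


def looks_backlogged(bytes_in: int, bytes_out: int, threshold: float = 2.0) -> bool:
    """Flag intervals where far more data flows in than out.

    The default threshold is an arbitrary, hypothetical choice.
    """
    return ingress_egress_ratio(bytes_in, bytes_out) > threshold


# Hypothetical per-interval (bytes_in, bytes_out) deltas.
samples: List[Tuple[int, int]] = [
    (500_000_000, 480_000_000),
    (4_000_000_000, 300_000_000),
]
for b_in, b_out in samples:
    ratio = ingress_egress_ratio(b_in, b_out)
    print(f"in/out = {ratio:.2f}, flagged = {looks_backlogged(b_in, b_out)}")
```

A sustained high ratio would suggest the cache is fetching much more from storage than clients are consuming, which is the situation described above.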

From your debug statements above, it's not clear when the timeout is being hit. Andy mentioned that the internal default is 30 minutes, but there are several ways this can be overridden (CMS does this, for example).
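To pin down which value is actually in effect, a first step might look like the sketch below. It assumes the timeout in question is the XrdCl client request timeout, whose built-in default is 1800 s (30 minutes) and which can be overridden with the `XRD_REQUESTTIMEOUT` environment variable. It only inspects that variable; overrides set in a client configuration file or programmatically by the application (as CMS does) are not visible to it. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumption: XrdCl's built-in request timeout, in seconds (30 minutes).
DEFAULT_REQUEST_TIMEOUT_S = 1800


def effective_request_timeout(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the request timeout implied by XRD_REQUESTTIMEOUT, else the default.

    Does NOT account for config-file or in-application overrides.
    """
    env = os.environ if env is None else env
    raw = env.get("XRD_REQUESTTIMEOUT")
    if raw is None:
        return DEFAULT_REQUEST_TIMEOUT_S
    try:
        return int(raw)
    except ValueError:
        # A malformed value is treated as "not set" here; this is a sketch
        # and does not mirror how XrdCl itself parses the variable.
        return DEFAULT_REQUEST_TIMEOUT_S


print(effective_request_timeout({}))
print(effective_request_timeout({"XRD_REQUESTTIMEOUT": "600"}))
```

Comparing this value with the timestamps in the debug output would show whether failures line up with the expected timeout.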

> For my understanding, why doesn't xrootd reopen the file? We see quite a few "socket errors" in general, but these are (apparently) never fatal.

There's a complex set of retry rules for recovering from errors; I'm not an expert on them. In general, though, if a server is overloaded and you retry, you just end up with an even more overloaded server. It's not clear to me that retrying on timeouts is an obviously good strategy.
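A toy model makes the overload argument concrete. Assume, purely for illustration, that each attempt independently times out with probability `p` and that the client retries up to `max_retries` times. The expected number of attempts per logical request is then the sum of `p**k` for `k = 0..max_retries`, so the load offered to the server grows as timeouts become more common. Real timeouts are not independent: extra load raises `p`, so this sketch understates the feedback. The base load figure is hypothetical.

```python
def expected_attempts(p_timeout: float, max_retries: int) -> float:
    """Expected attempts per request if each attempt times out independently.

    Attempt k (0-based) is only made when the previous k attempts all timed
    out, which happens with probability p_timeout ** k.
    """
    return sum(p_timeout ** k for k in range(max_retries + 1))


base_load = 100.0  # hypothetical requests/s arriving from clients
for p in (0.0, 0.2, 0.5):
    offered = base_load * expected_attempts(p, max_retries=3)
    print(f"p_timeout={p}: offered load = {offered:.1f} req/s")
```

Even with only three retries, a 50% timeout rate nearly doubles the offered load on a server that is already struggling.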

> The file reads are in general not random, I suppose: in normal operation we read the file from beginning to end, looping over the events, and my test jobs do exactly that.

Unfortunately, this really depends on the LHCb files themselves. In CMS, for example, some of the metadata read up front is scattered throughout the file: it might only be a few MB of data in total, but that access pattern works about as poorly as possible with caching strategies. I believe you can get the Xrootd client or server to log each individual offset in a vectored read, so you can see what ROOT is actually trying to do.
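Once you have those offsets, a quick way to see why a few MB of scattered reads can hurt a block-based cache is to count how many cache blocks they touch. The sketch below assumes a hypothetical fixed 1 MiB block size and that the cache fetches whole blocks; real cache block sizes and prefetch behaviour are configurable and may differ. The read patterns are made up for illustration.

```python
from typing import Iterable, List, Set, Tuple

BLOCK_SIZE = 1024 * 1024  # hypothetical cache block size (1 MiB)


def blocks_touched(reads: Iterable[Tuple[int, int]], block_size: int = BLOCK_SIZE) -> Set[int]:
    """Indices of cache blocks overlapped by (offset, length) reads."""
    touched: Set[int] = set()
    for offset, length in reads:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return touched


def amplification(reads: Iterable[Tuple[int, int]], block_size: int = BLOCK_SIZE) -> float:
    """Bytes fetched as whole blocks divided by bytes actually requested."""
    read_list: List[Tuple[int, int]] = list(reads)
    requested = sum(length for _, length in read_list if length > 0)
    fetched = len(blocks_touched(read_list, block_size)) * block_size
    return fetched / requested if requested else 0.0


# Hypothetical: 100 reads of 16 KiB, either back to back or spread 50 MiB apart.
contiguous = [(i * 16 * 1024, 16 * 1024) for i in range(100)]
scattered = [(i * 50 * BLOCK_SIZE, 16 * 1024) for i in range(100)]
print(amplification(contiguous), amplification(scattered))
```

Both patterns request the same 1.6 MB, but under these assumptions the scattered one forces the cache to pull roughly 50 times as much data from storage.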


View this issue on GitHub: https://github.com/xrootd/xrootd/issues/1259#issuecomment-663102286
