@bbockelm

In that case, I'm a bit befuddled where to turn next. I assume that you have already eliminated various standard network problems?

Firewall, connection limits, latency, TCP wrappers: yes, we tested a lot of them.
A month ago we still had some transfer issues, identified by our perfSONAR instance (a separate server on the same network), but fixing those upstream did not change the checksum issues.

Are there any idle timeouts in the server you've set that might be terminating the connection unexpectedly?

We are back to using defaults - could you please name any particular settings you would recommend we test?

It's not really a step toward a solution but, in terms of having fewer moving parts, can you reproduce in a data-server-only setup?

We can try stand-alone again, but I expect the current redirector (we would use that, since it is registered everywhere) to struggle with the load (it is a tiny VM for just redirecting). I will synchronize the configs and we can test it for a few hours.

@wyang007

My GUESS from what I see is that when many checksum requests are coming in, checksum calculations are actually slow (CPU limit, or the HD prioritizing writes over reads? I don't know).

We currently allow for 100 simultaneous checksums (which we have not yet reached). The data is read over the network and as far as I can see, we do not hit any hardware limitations (CPU, RAM, network, disk).

In that case, FTS may time out the transfer after 1 minute (and in the FTS logs I saw, all checksum timeouts happened after exactly 1 minute).

Where do you see that? From the FTS logs I can see a timeout after 15 minutes, not 1.
For very big files (e.g. 8 GB), we would expect the checksum to take a bit longer than 60 s, but for most of the failing files, the checksum finishes within 20 s.
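
To compare checksum duration against a given timeout independently of FTS, a rough local timing could help. This is only a minimal sketch, assuming Adler-32 (the thread does not say which algorithm is configured) and a file that is readable at a local path. `timed_adler32` and `exceeds_timeout` are hypothetical helper names, and the 60 s default is simply the figure quoted above, not a confirmed FTS setting.

```python
import sys
import time
import zlib


def timed_adler32(path, chunk_size=1024 * 1024):
    """Return (checksum_hex, elapsed_seconds) for the file at `path`.

    Assumption: Adler-32 is the checksum of interest. The file is read
    in chunks so that large files do not need to fit in memory.
    """
    checksum = 1  # Adler-32 initial value
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            checksum = zlib.adler32(chunk, checksum)
    elapsed = time.monotonic() - start
    return f"{checksum & 0xFFFFFFFF:08x}", elapsed


def exceeds_timeout(elapsed_seconds, timeout_seconds=60.0):
    """True if the measured duration is longer than the assumed timeout."""
    return elapsed_seconds > timeout_seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py /path/to/file
    digest, seconds = timed_adler32(sys.argv[1])
    print(f"adler32={digest} elapsed={seconds:.2f}s "
          f"over_timeout={exceeds_timeout(seconds)}")
```

Running this against a few of the failing files and one of the very large ones would show whether the local read-and-checksum time alone comes anywhere near the timeout; it does not reproduce the concurrent load on the server.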

