Hi again,
Unfortunately, I still don't have any further information on the crashes, because we have currently taken the production queue that processed files via xcache offline.
However, I can provide two new pieces of feedback:
The ciosync workaround seems to partially solve the problem, but not 100%: sometimes we still observed the number of open files starting to rise steadily again, as seen in this plot: plot_openfiles_05-11_05_18.pdf
This might be a different issue, but I suspect it also happens when the file limit is reached: we see some files that end up corrupted in the cache (wrong checksum). These files are marked as "complete" in the .cinfo files, and when they are downloaded via the xcache server, the client receives the corrupted file. Out of 200k "complete" files in our cache, we saw 91 such cases. From a quick check of two of these files, they seem to have these things in common:
The .cinfo file contains a certain number of bytesMissed, but not matching the size of the empty block

Cheers,
Nikolai