Subject: File exists error when reading from XCache server
Date: Fri, 2 Oct 2020 12:25:21 +0200
Dear all,

we are using XCache to test the integration of CINECA HPC with CMS workflows. Overall everything looks quite good, except for a strange problem that appears rarely but seems to be related to the number of connected clients.

What we see is a client failing to read a file with the error "File exists". After that, the same file keeps failing for all other requests, so it is in a certain sense "corrupted".
We see this happening with the file in different states in the cache: sometimes it has size==0, sometimes it is only partially there, so there is no particular pattern. A typical error on the XCache machine (with a debug level for xrd and ofs that is, I hope, high enough) looks like this one (*).

Some additional details:

- the fs where the data are stored is a high-performance GPFS (SSD underneath)
- network bandwidth is 40 Gbps (less than 2 used)
- neither the machine nor the disk looks busy at all. The load is less than 1 on a 16-core machine, and the iowait is practically zero
- connected clients are around 500-1000
- the number of open files is stable at ~"lsof -p 3100", with the limit for xrootd at 65k
- the configuration used is here (**) and the xrootd version is 4.12.3
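
For reference, the open-file figure above could be cross-checked against the limit with something like the sketch below. It is only an illustration, assuming a Linux host where /proc is readable for the xrootd process; the function names are made up for this example and the PID is a placeholder. Note that lsof also lists entries that are not file descriptors (cwd, txt, mem), so its line count is usually higher than the number obtained this way.

```python
import os


def count_open_fds(pid):
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_nofile_soft_limit(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the limit is reported as 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line found")


def fd_usage_fraction(open_fds, soft_limit):
    """Fraction of the soft limit currently in use."""
    return open_fds / soft_limit


if __name__ == "__main__":
    pid = os.getpid()  # placeholder: replace with the PID of the xrootd process
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_nofile_soft_limit(f.read())
    used = count_open_fds(pid)
    if limit is None:
        print(f"pid {pid}: {used} open fds, no soft limit")
    else:
        print(f"pid {pid}: {used} open fds, "
              f"{fd_usage_fraction(used, limit):.1%} of soft limit {limit}")
```

Running this periodically while the clients are connected would show whether the descriptor count really stays flat around the time the "File exists" error appears.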

Have you had any previous experience with anything similar? Could you help us understand what is happening?

Please let me know if you need any additional information.




