Dear experts,

I'm using a proxy file cache server to serve inputs to a computing
cluster deployed in a cloud environment for CMS experiment workflows.
It actually consists of a cluster of 3 machines behind a common
redirector. Each machine has quite a lot of RAM (256GB), a low-latency,
high-IO volume (10TB), and 10Gbps of bandwidth.

The setup works quite nicely but, starting from around 800-1000
concurrent jobs, we began to see, with increasing frequency,
server-side connection errors like (*), which correspond to client
failures like (**). Unfortunately I did not manage to find any more
debugging information :/ Moreover, the errors do not seem to be
correlated with the origin server chosen, so I suspect they could be
related to the network or to the configuration of the cache host
machine.
The relevant part of the cache xrd configuration is in (***). Do you
have any ideas, guidance, or previous experience with this kind of
issue?

Cheers,
Diego

(*)
[2018-07-02 13:51:27.968006 +0000][Error  ][AsyncSock         ] 
[[log in to unmask]:1094 #0.0] Unable to initiate the connection: 
[ERROR] Socket error: Network is unreachable
(**)
failure when reading from 192.168.77.20:32294 (unknown site); failed 
with error '[ERROR] Operation expired' (errno=0, code=206).
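
To check whether the errors in (*) really grow with the number of concurrent jobs, one option is to count error lines per minute and per log topic. The following is only a minimal sketch in Python, assuming every log entry starts with the same bracketed header as the line in (*). The function name `errors_per_minute` is hypothetical, and all sample lines except the first are synthetic.

```python
import re
from collections import Counter

# Header format assumed from the line quoted in (*):
# [YYYY-MM-DD HH:MM:SS.ffffff +ZZZZ][Level  ][Topic            ]
LINE_RE = re.compile(
    r"^\[(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ [+-]\d{4}\]"
    r"\[(?P<level>\w+)\s*\]\[(?P<topic>\w+)\s*\]"
)

def errors_per_minute(lines):
    """Count 'Error' level lines, keyed by (minute, topic)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "Error":
            counts[(match.group("minute"), match.group("topic"))] += 1
    return counts

sample = [
    # Header taken from (*); message shortened.
    "[2018-07-02 13:51:27.968006 +0000][Error  ][AsyncSock         ] Unable to initiate the connection",
    # Synthetic lines, for illustration only.
    "[2018-07-02 13:51:41.120000 +0000][Error  ][AsyncSock         ] Unable to initiate the connection",
    "[2018-07-02 13:52:03.500000 +0000][Info   ][AsyncSock         ] synthetic non-error line",
]

for (minute, topic), count in sorted(errors_per_minute(sample).items()):
    print(minute, topic, count)
```

Lines that do not start with the header (such as the wrapped continuation lines above) are simply skipped, so the counts can be compared directly with the number of running jobs in each minute.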
(***)
set rdtrCache=192.168.72.247
set rdtrPortCmsd=31112
set rdtrGlobal=xrootd-cms.infn.it
set rdtrGlobalPort=1094
set cacheLowWm=0.8
set cacheHiWm=0.9
set cacheLogLevel=error
set cachePath=/storage
set cacheRam=60
set cacheStreams=256
set prefetch=0
set blkSize=512k

all.export /
all.role  server
oss.localroot $cachePath

xrd.port 32294
ofs.osslib   libXrdPss.so
pss.cachelib libXrdFileCache.so

pss.origin $rdtrGlobal:$rdtrGlobalPort

pss.config streams 256 workers 16
pss.setopt ConnectTimeout 30
pss.setopt DebugLevel 3
pss.setopt RequestTimeout 30

xrootd.seclib /usr/lib64/libXrdSec.so

pfc.diskusage $cacheLowWm $cacheHiWm
pfc.ram       ${cacheRam}g

pfc.blocksize   $blkSize
pfc.prefetch    $prefetch
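
For reference, this is roughly what the numbers above mean on each server. It is just a back-of-the-envelope sketch in Python, assuming that `g` and `k` are binary units, that the 10TB volume is 10 decimal terabytes fully dedicated to the cache, and that (as I understand `pfc.diskusage`) purging starts above the high watermark and stops at the low one. The helper names are hypothetical.

```python
# Values copied from the configuration in (***) and the hardware description.
CACHE_RAM_GIB = 60          # set cacheRam=60, used as: pfc.ram ${cacheRam}g
BLOCK_SIZE_KIB = 512        # set blkSize=512k
DISK_TB = 10                # "10TB each"; assumed to be decimal terabytes
LOW_WM, HIGH_WM = 0.8, 0.9  # pfc.diskusage $cacheLowWm $cacheHiWm

def ram_blocks(ram_gib, block_kib):
    """Upper bound on the number of cache blocks that fit in the RAM buffer."""
    return (ram_gib * 1024 * 1024) // block_kib

def purge_thresholds_tb(disk_tb, low, high):
    """Disk usage (TB) at which purging starts and at which it stops."""
    return disk_tb * high, disk_tb * low

blocks = ram_blocks(CACHE_RAM_GIB, BLOCK_SIZE_KIB)
start_tb, stop_tb = purge_thresholds_tb(DISK_TB, LOW_WM, HIGH_WM)
print(f"RAM buffer holds at most {blocks} blocks of {BLOCK_SIZE_KIB} KiB")
print(f"Purge starts above {start_tb:.1f} TB and stops at {stop_tb:.1f} TB")
```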

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1