Dear experts,
I'm using a proxy file cache server to serve input files to a computing
cluster deployed in a cloud environment for CMS experiment workflows.
It currently consists of a cluster of 3 machines under a common
redirector; each machine has quite a lot of RAM (256GB), a low-latency,
high-IO storage volume (10TB) and 10Gbps of network bandwidth.
The setup works quite nicely, but starting from around 800-1000
concurrent jobs we have been seeing, with increasing frequency,
server-side connection errors like (*), corresponding to client
failures like (**). Unfortunately I did not manage to get any more
debugging information :/ Moreover, the errors do not seem to be
correlated with the origin server chosen, so I suspect the cause is
related to the network or to the configuration of the cache host machines.
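To put the job counts in perspective, here is a rough sketch that divides the aggregate bandwidth quoted above (3 servers at 10Gbps each) among 800 and 1000 concurrent jobs. It assumes, as a simplification, that jobs are spread evenly over the servers and all read at the same time; the function name is purely illustrative.

```python
# Back-of-the-envelope fair share of network bandwidth per job, using the
# numbers from the setup description: 3 cache servers, 10 Gbps each.
# Assumption: jobs are evenly spread and all reading simultaneously.

N_SERVERS = 3
LINK_GBPS = 10.0  # per server


def per_job_mbps(n_jobs: int, n_servers: int = N_SERVERS,
                 link_gbps: float = LINK_GBPS) -> float:
    """Return the even-share bandwidth per job in Mbit/s."""
    aggregate_mbps = n_servers * link_gbps * 1000.0
    return aggregate_mbps / n_jobs


for jobs in (800, 1000):
    share = per_job_mbps(jobs)
    print(f"{jobs} jobs: ~{jobs / N_SERVERS:.0f} per server, "
          f"~{share:.1f} Mbit/s (~{share / 8:.2f} MB/s) each")
```

This ignores traffic between the cache and the origin on cache misses, so it is only an upper bound on what each job could get.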
You can find the relevant part of the cache server's xrd configuration
in (***). Do you have any ideas, guidance or previous experience with
this kind of issue?
Cheers,
Diego
(*)
[2018-07-02 13:51:27.968006 +0000][Error ][AsyncSock ]
[[log in to unmask]:1094 #0.0] Unable to initiate the connection:
[ERROR] Socket error: Network is unreachable
(**)
failure when reading from 192.168.77.20:32294 (unknown site); failed
with error '[ERROR] Operation expired' (errno=0, code=206).
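One way to check the impression that the errors are not tied to a particular origin is to tally them per target and per reason. The sketch below assumes every error is logged on a single line with the layout of the example in (*); that layout is inferred from one line only, and the host names in the sample are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred from the single example in (*):
# [timestamp][Error ][Topic ] [target #stream] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[Error\s*\]\[(?P<topic>[^\]]+?)\s*\]\s*"
    r"\[(?P<target>[^\]#]+?)\s*#[^\]]*\]\s*(?P<msg>.*)$"
)


def tally_errors(lines):
    """Count error lines per (target, reason); reason is the text after the last '[ERROR]'."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not error lines in this format
        reason = m.group("msg").rsplit("[ERROR]", 1)[-1].strip()
        counts[(m.group("target"), reason)] += 1
    return counts


# Hypothetical sample lines (host names are made up for illustration).
SAMPLE = [
    "[2018-07-02 13:51:27.968006 +0000][Error  ][AsyncSock         ] "
    "[cms@origin-a.example:1094 #0.0] Unable to initiate the connection: "
    "[ERROR] Socket error: Network is unreachable",
    "[2018-07-02 13:52:03.120000 +0000][Error  ][AsyncSock         ] "
    "[cms@origin-b.example:1094 #0.0] Unable to initiate the connection: "
    "[ERROR] Socket error: Network is unreachable",
    "[2018-07-02 13:53:10.500000 +0000][Error  ][AsyncSock         ] "
    "[cms@origin-a.example:1094 #0.0] Unable to initiate the connection: "
    "[ERROR] Socket error: Network is unreachable",
    "some unrelated line",
]

for (target, reason), n in tally_errors(SAMPLE).most_common():
    print(n, target, reason)
```

If the counts are spread roughly in proportion to how often each origin is used, that supports the suspicion that the problem is on the cache host or network side.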
(***)
set rdtrCache=192.168.72.247
set rdtrPortCmsd=31112
set rdtrGlobal=xrootd-cms.infn.it
set rdtrGlobalPort=1094
set cacheLowWm=0.8
set cacheHiWm=0.9
set cacheLogLevel=error
set cachePath=/storage
set cacheRam=60
set cacheStreams=256
set prefetch=0
set blkSize=512k
all.export /
all.role server
oss.localroot $cachePath
xrd.port 32294
ofs.osslib libXrdPss.so
pss.cachelib libXrdFileCache.so
pss.origin $rdtrGlobal:$rdtrGlobalPort
pss.config streams 256 workers 16
pss.setopt ConnectTimeout 30
pss.setopt DebugLevel 3
pss.setopt RequestTimeout 30
xrootd.seclib /usr/lib64/libXrdSec.so
pfc.diskusage $cacheLowWm $cacheHiWm
pfc.ram ${cacheRam}g
pfc.blocksize $blkSize
pfc.prefetch $prefetch
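Since most directives above are driven by `set` variables, it can help to review the effective values after substitution. Below is a hypothetical helper (not XRootD's own parser, and it may differ from it in edge cases) that handles only the forms used in this file: `set name=value`, `$name` and `${name}`.

```python
import re

# Hypothetical helper: expand the `set name=value` variables so the
# effective directives can be reviewed. Only $name and ${name} are handled.
SET_RE = re.compile(r"^set\s+(\w+)\s*=\s*(.*?)\s*$")
VAR_RE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand_config(text: str) -> list[str]:
    """Return the non-`set` directives with known variables substituted."""
    variables: dict[str, str] = {}
    directives: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = SET_RE.match(line)
        if m:
            variables[m.group(1)] = m.group(2)
            continue

        def substitute(v: re.Match) -> str:
            name = v.group(1) or v.group(2)
            return variables.get(name, v.group(0))  # leave unknown vars as-is

        directives.append(VAR_RE.sub(substitute, line))
    return directives


CONFIG = """\
set cachePath=/storage
set cacheRam=60
set blkSize=512k
set cacheLowWm=0.8
set cacheHiWm=0.9
oss.localroot $cachePath
pfc.ram ${cacheRam}g
pfc.blocksize $blkSize
pfc.diskusage $cacheLowWm $cacheHiWm
"""

for directive in expand_config(CONFIG):
    print(directive)
```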
########################################################################
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1