Hi Diego,

How do you have the ulimits set? The nofile limit also caps the number of
sockets.

This is what we use at UCSD:

[0741] root@xcache-01 ~# cat /etc/security/limits.d/50-xrootd.conf
xrootd soft nproc 20000
xrootd hard nproc 21000
xrootd soft nofile 99000
xrootd hard nofile 100000
* soft core unlimited

Cheers,
Matevz

On 7/2/18 4:23 PM, Diego Ciangottini wrote:
> Dear experts,
>
> I'm using a proxy file cache server to serve inputs for a computing cluster
> deployed in a cloud environment for CMS experiment workflows.
> It currently consists of a cluster of 3 machines under a common redirector.
> Each machine has quite high RAM (256GB) and a low-latency, high-IO
> volume (10TB each); the bandwidth is 10Gbps for each server.
>
> The setup is working quite nicely but, starting from around 800-1000
> concurrent jobs, we began to see, with increasing frequency, server-side
> connection errors like this (*), corresponding to client failures like (**).
> Unfortunately I did not manage to find any more debugging information :/
> Moreover, they do not look correlated with the origin server chosen, so I
> suspect it could be something related to the network or the cache host machine
> configuration.
> You can find the relevant part of the cache xrd configuration here (***). Do you
> have any ideas/guidance or previous experience with this kind of issue?
>
> Cheers,
> Diego
>
> (*)
> [2018-07-02 13:51:27.968006 +0000][Error ][AsyncSock ]
> [[log in to unmask]:1094 #0.0] Unable to initiate the connection: [ERROR]
> Socket error: Network is unreachable
> (**)
> failure when reading from 192.168.77.20:32294 (unknown site); failed with error
> '[ERROR] Operation expired' (errno=0, code=206).
> (***)
> set rdtrCache=192.168.72.247
> set rdtrPortCmsd=31112
> set rdtrGlobal=xrootd-cms.infn.it
> set rdtrGlobalPort=1094
> set cacheLowWm=0.8
> set cacheHiWm=0.9
> set cacheLogLevel=error
> set cachePath=/storage
> set cacheRam=60
> set cacheStreams=256
> set prefetch=0
> set blkSize=512k
>
> all.export /
> all.role server
> oss.localroot $cachePath
>
> xrd.port 32294
> ofs.osslib libXrdPss.so
> pss.cachelib libXrdFileCache.so
>
> pss.origin $rdtrGlobal:$rdtrGlobalPort
>
> pss.config streams 256 workers 16
> pss.setopt ConnectTimeout 30
> pss.setopt DebugLevel 3
> pss.setopt RequestTimeout 30
>
> xrootd.seclib /usr/lib64/libXrdSec.so
>
> pfc.diskusage $cacheLowWm $cacheHiWm
> pfc.ram ${cacheRam}g
>
> pfc.blocksize $blkSize
> pfc.prefetch $prefetch
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
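
On the ulimit question raised in the reply: one way to see which nofile limits a running process actually has, and how many descriptors (files and sockets) it currently holds, is to read /proc on the cache host. The following is only a minimal sketch, assuming a Linux host with /proc mounted; the function names parse_nofile and fd_usage are hypothetical helpers, not part of XRootD.

```python
import os
import re


def parse_nofile(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because a limit may read 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns are separated by runs of spaces:
            # name, soft limit, hard limit, units.
            fields = re.split(r"\s{2,}", line.strip())
            return fields[1], fields[2]
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft, hard) for a process (Linux only).

    Listing /proc/<pid>/fd for another user's process usually needs root.
    """
    with open(f"/proc/{pid}/limits") as fh:
        soft, hard = parse_nofile(fh.read())
    # Every entry in /proc/<pid>/fd is one open descriptor,
    # sockets included.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft, hard


# Example (assumption: run on the cache host, passing the xrootd PID):
#   open_fds, soft, hard = fd_usage(12345)
#   print(f"{open_fds} descriptors open, soft={soft}, hard={hard}")
```

If the open descriptor count sits close to the soft limit while the connection errors appear, that would point toward the nofile setting; if it does not, the limit is probably not the cause.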