Hi Diego,

How are your ulimits set? The nofile limit also caps the number of sockets,
since every open socket uses a file descriptor.

This is what we use at UCSD:

[0741] root@xcache-01 ~# cat /etc/security/limits.d/50-xrootd.conf
xrootd     soft    nproc     20000
xrootd     hard    nproc     21000
xrootd     soft    nofile    99000
xrootd     hard    nofile    100000
*          soft    core      unlimited
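
To see which nofile limit a running process actually has, and how close it is
to that limit, something like the following can help. It is only a minimal
sketch, assuming a Linux host with /proc mounted; the function names and the
script itself are hypothetical, not part of XRootD. Files under limits.d are
applied through PAM when a session starts, so a daemon started another way may
be running with different values than the file suggests.

```python
import os
import sys


def parse_nofile(limits_text):
    """Extract the (soft, hard) 'Max open files' values from the text of
    /proc/<pid>/limits. Values are returned as strings because the kernel
    may report 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> files
            soft, hard = line.split()[3:5]
            return soft, hard
    raise ValueError("no 'Max open files' line found")


def count_open_fds(pid):
    """Count descriptors currently held by a process. Sockets are counted
    too, since each one occupies an entry in /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Hypothetical usage: pass the PID of the xrootd process, or no
    # argument to inspect this script's own process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/limits") as fh:
        soft, hard = parse_nofile(fh.read())
    print(f"pid {pid}: {count_open_fds(pid)} descriptors open, "
          f"nofile soft={soft} hard={hard}")
```

Reading /proc/<pid>/fd for a process owned by another user normally requires
root. If the descriptor count sits near the soft limit while the errors
appear, the limit is a likely suspect.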

Cheers,
Matevz

On 7/2/18 4:23 PM, Diego Ciangottini wrote:
> Dear experts,
> 
> I'm using a proxy file cache server to serve inputs to a computing cluster 
> deployed in a cloud environment for CMS experiment workflows.
> It currently consists of a cluster of 3 machines under a common redirector. Each 
> machine has quite a lot of RAM (256GB), a low-latency, high-IO volume (10TB), 
> and 10Gbps of bandwidth.
> 
> The setup works quite nicely but, starting from around 800-1000 
> concurrent jobs, we began to see, with increasing frequency, server-side 
> connection errors like this (*), corresponding to client failures like (**). 
> Unfortunately I did not manage to find more debugging information :/ 
> Moreover, they do not appear to be correlated with the origin server chosen, so I 
> suspect it could be something related to the network or the cache host machine 
> configuration.
> You can find the relevant part of the cache xrd configuration here (***). Do you 
> have any ideas, guidance, or previous experience with this kind of issue?
> 
> Cheers,
> Diego
> 
> (*)
> [2018-07-02 13:51:27.968006 +0000][Error  ][AsyncSock         ] 
> [[log in to unmask]:1094 #0.0] Unable to initiate the connection: [ERROR] 
> Socket error: Network is unreachable
> (**)
> failure when reading from 192.168.77.20:32294 (unknown site); failed with error 
> '[ERROR] Operation expired' (errno=0, code=206).
> (***)
> set rdtrCache=192.168.72.247
> set rdtrPortCmsd=31112
> set rdtrGlobal=xrootd-cms.infn.it
> set rdtrGlobalPort=1094
> set cacheLowWm=0.8
> set cacheHiWm=0.9
> set cacheLogLevel=error
> set cachePath=/storage
> set cacheRam=60
> set cacheStreams=256
> set prefetch=0
> set blkSize=512k
> 
> all.export /
> all.role  server
> oss.localroot $cachePath
> 
> xrd.port 32294
> ofs.osslib   libXrdPss.so
> pss.cachelib libXrdFileCache.so
> 
> pss.origin $rdtrGlobal:$rdtrGlobalPort
> 
> pss.config streams 256 workers 16
> pss.setopt ConnectTimeout 30
> pss.setopt DebugLevel 3
> pss.setopt RequestTimeout 30
> 
> xrootd.seclib /usr/lib64/libXrdSec.so
> 
> pfc.diskusage $cacheLowWm $cacheHiWm
> pfc.ram       ${cacheRam}g
> 
> pfc.blocksize   $blkSize
> pfc.prefetch    $prefetch
> 
> ########################################################################
> Use REPLY-ALL to reply to list
> 
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
