Hi Diego,
How are your ulimits set? The nofile limit also caps the number of sockets.
This is what we use at UCSD:
[0741] root@xcache-01 ~# cat /etc/security/limits.d/50-xrootd.conf
xrootd soft nproc 20000
xrootd hard nproc 21000
xrootd soft nofile 99000
xrootd hard nofile 100000
* soft core unlimited
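Entries under limits.d are applied through PAM at session start, so a daemon launched another way (for example as a systemd service) may not pick them up. It can be worth checking what the running xrootd process actually has. Below is a minimal sketch, assuming a Linux host with /proc mounted; the function names, the WATCHED tuple, and the PID in the usage note are made up for illustration and are not part of XRootD.

```python
import os
from pathlib import Path

# Rows of /proc/<pid>/limits that correspond to nofile and nproc.
WATCHED = ("Max open files", "Max processes")


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}.

    Values are ints; the string "unlimited" is mapped to None.
    """
    limits = {}
    for line in text.splitlines():
        for name in WATCHED:
            if line.startswith(name):
                # The first two columns after the name are soft and hard.
                soft, hard = line[len(name):].split()[:2]
                limits[name] = tuple(
                    None if value == "unlimited" else int(value)
                    for value in (soft, hard)
                )
    return limits


def read_limits(pid):
    """Read the limits the kernel is enforcing for a running process."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())


def open_fd_count(pid):
    """Count open descriptors (files and sockets alike) of a process.

    Listing /proc/<pid>/fd needs root or the process owner.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))
```

Usage, with a hypothetical PID: read_limits(12345) shows the soft and hard values in effect, and open_fd_count(12345) shows how close the process is to the soft nofile limit while the jobs are running.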
Cheers,
Matevz
On 7/2/18 4:23 PM, Diego Ciangottini wrote:
> Dear experts,
>
> I'm using a proxy file cache server to serve inputs for a computing cluster
> deployed in a cloud environment for CMS experiment workflows.
> It currently consists of a cluster of 3 machines under a common redirector. Each
> machine has a large amount of RAM (256GB) and a low-latency, high-IO volume
> (10TB each); the bandwidth is 10Gbps per server.
>
> The setup works quite nicely but, starting from around 800-1000 concurrent
> jobs, we began to see, with increasing frequency, server-side connection
> errors like this (*), corresponding to client failures like (**).
> Unfortunately I did not manage to find any more debugging information :/
> Moreover, they do not seem correlated with the origin server chosen, so I
> suspect it could be something related to the network or to the cache host
> machine configuration.
> You can find the relevant part of the cache xrd configuration here (***). Do you
> have any ideas, guidance, or previous experience with this kind of issue?
>
> Cheers,
> Diego
>
> (*)
> [2018-07-02 13:51:27.968006 +0000][Error ][AsyncSock ]
> [[log in to unmask]:1094 #0.0] Unable to initiate the connection: [ERROR]
> Socket error: Network is unreachable
> (**)
> failure when reading from 192.168.77.20:32294 (unknown site); failed with error
> '[ERROR] Operation expired' (errno=0, code=206).
> (***)
> set rdtrCache=192.168.72.247
> set rdtrPortCmsd=31112
> set rdtrGlobal=xrootd-cms.infn.it
> set rdtrGlobalPort=1094
> set cacheLowWm=0.8
> set cacheHiWm=0.9
> set cacheLogLevel=error
> set cachePath=/storage
> set cacheRam=60
> set cacheStreams=256
> set prefetch=0
> set blkSize=512k
>
> all.export /
> all.role server
> oss.localroot $cachePath
>
> xrd.port 32294
> ofs.osslib libXrdPss.so
> pss.cachelib libXrdFileCache.so
>
> pss.origin $rdtrGlobal:$rdtrGlobalPort
>
> pss.config streams 256 workers 16
> pss.setopt ConnectTimeout 30
> pss.setopt DebugLevel 3
> pss.setopt RequestTimeout 30
>
> xrootd.seclib /usr/lib64/libXrdSec.so
>
> pfc.diskusage $cacheLowWm $cacheHiWm
> pfc.ram ${cacheRam}g
>
> pfc.blocksize $blkSize
> pfc.prefetch $prefetch
>
> ########################################################################
> Use REPLY-ALL to reply to list
>
> To unsubscribe from the XROOTD-L list, click the following link:
> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1