Well, frankly this is quite odd. My presumption is that the application that 
gets the errors is actually xrdcp, right? If so, then the amount of space it 
is asking to reserve exceeds the amount available. Given the figures below, 
that would indicate a serious bug in that calculation. Unfortunately, there 
is no debugging level one can set to display this information. So, can one 
trap the failing copy (if it is a copy) so we can see what it is actually 
doing? Is this a third party copy? If so, what is the actual command being 
used?
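You can reproduce by hand the check Andy describes on a data server: does the space a request asks to reserve exceed what the partition actually has? Below is a minimal Python sketch using `os.statvfs`. It is not the xrootd or xrdcp code path. The `fits` helper and its `margin_bytes` parameter are hypothetical and only illustrate the comparison.

```python
import os


def fits(path: str, request_bytes: int, margin_bytes: int = 0) -> bool:
    """Return True if a request of `request_bytes` fits on the filesystem
    holding `path` while still leaving `margin_bytes` free.

    `margin_bytes` is a hypothetical safety margin for illustration only.
    """
    st = os.statvfs(path)
    # f_bavail counts blocks available to unprivileged users (what a
    # non-root daemon can actually use); f_frsize is the unit size of
    # f_bavail, so the product is the usable free space in bytes.
    available = st.f_bavail * st.f_frsize
    return request_bytes + margin_bytes <= available
```

For example, `fits("/xrootd", 4 * 1024**3, 32 * 1024**3)` asks whether a 4 GiB file would fit on the `/xrootd` mount from Yvan's df output while keeping 32 GiB free.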

Andy

-----Original Message----- 
From: Adrian Sevcenco
Sent: Friday, June 16, 2017 2:29 PM
To: Andrew Hanushevsky
Cc: [log in to unmask]
Subject: Re: Unable to create : no space left on device

On 06/16/2017 01:34 PM, Andrew Hanushevsky wrote:
> Hi Yvan,
Hi everyone!
We have the same "full space" problem even though there is enough space.

The load reports sent to the redirector show this:

aliprod@rd: manager $ grep do_Load cmslog
170617 00:06:45 16327 server.2586:23@[::XXX.22]:1094 do_Load: cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=98% 188141MB load=0 mass=78
170617 00:06:45 16345 server.32460:20@[::XXX.27]:1094 do_Load: cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=85% 5846385MB load=0 mass=68
170617 00:06:45 25209 server.6851:18@[::XXX.24]:1094 do_Load: cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=98% 314084MB load=0 mass=78
170617 00:06:45 16564 server.21467:19@[::XXX.25]:1094 do_Load: cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=98% 291060MB load=0 mass=78
170617 00:06:45 29507 server.15118:22@[::XXX.23]:1094 do_Load: cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=96% 598157MB load=0 mass=76
170617 00:06:45 16323 server.25599:21@[::XXX.26]:1094 do_Load: cpu=0 net=0 xeq=0 mem=0 pag=0 dsk=98% 404141MB load=0 mass=78
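These reports can be compared against a cms.space threshold with a short Python sketch. It extracts the host, the `dsk=` percentage and the MB figure from each do_Load entry, and it also works when an entry is wrapped over two lines. It assumes the `NNNNNMB` value after `dsk=` is the free space the server reported. That reading of the log format, and the helper names `parse_do_load` and `below_min`, are assumptions, not documented cmsd behaviour.

```python
import re

# Assumption: "dsk=NN%" is disk utilisation and the following "<N>MB" is the
# free space the server reported. re.S lets ".*?" cross a wrapped line break.
PATTERN = re.compile(
    r"server\.\S+@\[::([^\]]+)\]:\d+ do_Load:.*?dsk=(\d+)%\s+(\d+)MB",
    re.S,
)


def parse_do_load(log_text: str) -> dict:
    """Return {host: (dsk_percent, free_mb)} for every do_Load entry."""
    return {
        host: (int(pct), int(mb))
        for host, pct, mb in PATTERN.findall(log_text)
    }


def below_min(reports: dict, min_mb: int) -> list:
    """Sorted hosts whose reported MB figure is under `min_mb`."""
    return sorted(h for h, (_, mb) in reports.items() if mb < min_mb)
```

For instance, `below_min(parse_do_load(text), 64 * 1024)` lists the hosts reporting less than 64 × 1024 MB. Under the assumption above, none of the six entries here falls below that.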

and the xrootd space situation is as follows:
[Saturday 17.06.17 00:17] adrian@sev : ~  $ xrd_status rd.MYDOMAIN
Xrootd cluster name is : ALICE::ISS::FILE and is running xrootd version v4.5.0
Total space in xrootd cluster : rd.MYDOMAIN
Total space (GiB) :     471919.25
Free space (GiB) :      45949.40

xrootd storage server : storage04.MYDOMAIN:1094
Total space (GiB) :     40334.79
Free space (GiB) :      919.06

xrootd storage server : storage02.MYDOMAIN:1094
Total space (GiB) :     18333.92
Free space (GiB) :      367.12

xrootd storage server : storage05.MYDOMAIN:1094
Total space (GiB) :     40334.79
Free space (GiB) :      851.97

xrootd storage server : storage07.MYDOMAIN:1094
Total space (GiB) :     249087.66
Free space (GiB) :      39693.96

xrootd storage server : storage06.MYDOMAIN:1094
Total space (GiB) :     83485.25
Free space (GiB) :      2365.25

xrootd storage server : storage03.MYDOMAIN:1094
Total space (GiB) :     40342.84
Free space (GiB) :      1752.05
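The cluster-wide figure hides how unevenly the free space is spread. The small Python sketch below recomputes the totals from the per-server numbers above, which add up to the reported cluster totals within rounding, and shows each server's free fraction. Every server except storage07 is below 5% free, and storage07 alone holds roughly 86% of the cluster's free space.

```python
# Per-server figures copied from the xrd_status output above, in GiB:
# name -> (total, free)
SERVERS = {
    "storage04": (40334.79, 919.06),
    "storage02": (18333.92, 367.12),
    "storage05": (40334.79, 851.97),
    "storage07": (249087.66, 39693.96),
    "storage06": (83485.25, 2365.25),
    "storage03": (40342.84, 1752.05),
}


def cluster_totals(servers):
    """Sum (total, free) over all servers."""
    total = sum(t for t, _ in servers.values())
    free = sum(f for _, f in servers.values())
    return total, free


def free_pct(total, free):
    """Free space as a percentage of capacity."""
    return 100.0 * free / total


if __name__ == "__main__":
    total, free = cluster_totals(SERVERS)
    print(f"cluster: {free:.2f} / {total:.2f} GiB free ({free_pct(total, free):.1f}%)")
    # Least free space first, to show how close each server is to full.
    for name, (t, f) in sorted(SERVERS.items(), key=lambda kv: kv[1][1]):
        print(f"{name}: {f:9.2f} GiB free ({free_pct(t, f):4.1f}%)")
    share = 100.0 * SERVERS["storage07"][1] / free
    print(f"storage07 holds {share:.1f}% of the cluster's free space")
```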

I would like to use all available space, leaving 32 GB free on each partition.

I tried this on the redirector:
# http://xrootd.org/doc/dev45/cms_config.htm#_Toc454223038
cms.space min 64g 32g

and this on the servers (even though it is the default):
oss.alloc 0 0 0

but I still got full-disk messages...

Are there any other knobs for tuning space usage?
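To reason about the `min`/`hwm` pair, here is a simplified Python model of a free-space threshold with hysteresis. It assumes `min` is the free-space floor below which a server stops being selected and `hwm` is the level free space must climb back above before the server is selected again, as in Yvan's `min 100g 150g` setting quoted below. This is an assumed reading of cms.space, not the cmsd implementation. The model requires hwm >= min, and the `SpaceGate` class and the GiB interpretation of the `g` suffix are hypothetical.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # assumption: the "g" suffix in cms.space means GiB


@dataclass
class SpaceGate:
    """Hypothetical model of a free-space threshold with hysteresis.

    Assumed semantics, not the cmsd implementation: once free space drops
    below `min_free` the server stops being selected for new files, and it
    is selected again only after free space rises above `hwm`.
    """

    min_free: int
    hwm: int
    selectable: bool = True

    def __post_init__(self) -> None:
        if self.hwm < self.min_free:
            raise ValueError("this model assumes hwm >= min_free")

    def update(self, free_bytes: int) -> bool:
        """Apply one periodic free-space report; return selectability."""
        if self.selectable and free_bytes < self.min_free:
            self.selectable = False
        elif not self.selectable and free_bytes > self.hwm:
            self.selectable = True
        return self.selectable
```

Take Yvan's `min 100g 150g` and the 109G available in his df output. In this model, `SpaceGate(100 * GIB, 150 * GIB).update(109 * GIB)` returns True, so the server is still eligible although df shows it at 100% use. The model also only looks at free space when a report arrives. Nothing in it limits the size of a file sent to a server that is just above `min`.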

Thank you!!
Adrian


>
> If that is the case, then cms.space is set too low and the redirector is
> selecting servers that have "enough" space, where "enough" is not that
> much. You will see periodic reports of free-space statistics from all
> your servers in the log. The lines look something like:
>
> 170616 01:17:28 26648 Node: xxxxxx.slac.stanford.edu load=0; cpu=0 net=0 inq=0 mem=0 pag=0 dsk=0 utl=0 shr=[100 73 0]
>
> Could you collect all of them for your servers and send them to me (you
> can hide any sensitive info)?
>
> Andy
>
> On Fri, 16 Jun 2017, Yvan Calas wrote:
>
>>> On 15 Jun 2017, at 18:53, Yvan Calas <[log in to unmask]> wrote:
>>>
>>> I would like to understand how to correctly set up the parameter 
>>> cms.space in XRootD. We currently observe error messages like this one 
>>> on our servers:
>>>
>>> 170615 18:42:06 285690 XrootdXeq: alisgm76.15597:26@[::xxx.xxx.xx.x] pub IPv4 login as alisgm76
>>> 170615 18:42:06 285690 ofs_open: alisgm76.15597:26@[::xxx.xxx.xx.x] Unable to create /03/01818/8cb7a7f4-51e9-11e7-a169-ef6caac907fd; no space left on device
>>> 170615 18:42:06 285690 XrootdXeq: alisgm76.15597:26@[::xxx.xxx.xx.x] disc 0:00:00
>>>
>>> To solve this issue, I reduced the recalc parameter to 1 minute and
>>> tried different thresholds for the "min" and "hwm" parameters as
>>> described in [1]. Currently we have:
>>>
>>> cms.space linger 0 recalc 1 min 100g 150g
>>>
>>> and the xrootd partition is as follows:
>>>
>>> Filesystem            Size  Used Avail Use% Mounted on
>>> /dev/mapper/datavg-data
>>>                      110T  110T  109G 100% /xrootd
>>>
>>> I tried changing the value of the hwm parameter (below and above 100g),
>>> but without success; the error messages remain.
>>>
>>> Could you please tell me if there is something wrong with my setting?
>>
>> I would like to add that some of our servers have a lot of free
>> (available) space, so IMHO the redirector should choose them. Do you
>> think this is a bug? (We are currently running XRootD v4.2.3-3.)
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the XROOTD-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>
>


-- 
----------------------------------------------
Adrian Sevcenco, Ph.D.                       |
Institute of Space Science - ISS, Romania    |
adrian.sevcenco at {cern.ch,spacescience.ro} |
----------------------------------------------

