Hi Sam,

The problem is that the returned system error doesn't differentiate the 
two cases. We could add an additional check and manually differentiate them. 
If that is valuable to you, please cut a ticket.
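
A minimal sketch of what such an additional check could look like, written
in Python rather than in XRootD's own code and not taken from it. It assumes
the two cases are "out of inodes" versus "out of data blocks" on the
filesystem that returned "no space left on device". The function names
classify_enospc and diagnose are hypothetical; os.statvfs is the standard
POSIX wrapper.

```python
import os


def classify_enospc(f_files, f_favail, f_bavail):
    """Guess why a filesystem returned ENOSPC, from its statvfs counters.

    f_files  -- total number of inodes (0 if the filesystem does not report them)
    f_favail -- inodes available to unprivileged users
    f_bavail -- data blocks available to unprivileged users
    """
    # Only trust the inode counters if the filesystem reports a total.
    if f_files > 0 and f_favail == 0:
        return "out of inodes"
    if f_bavail == 0:
        return "out of data blocks"
    # Quotas or a condition that has already cleared would land here.
    return "unknown"


def diagnose(path):
    """Hypothetical helper: classify an ENOSPC seen on the filesystem holding path."""
    st = os.statvfs(path)
    return classify_enospc(st.f_files, st.f_favail, st.f_bavail)
```

Calling diagnose() on the metadata filesystem path right after a failed open
would let the log message name the exhausted resource instead of only
repeating the system error text.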

Andy


On Wed, 20 Jan 2021, Sam Skipsey wrote:

> Hi everyone,
>
> So the initial reply gave me an idea, and I did check this - the problem
> was outside of Xrootd as Matevz predicted.
> (The actual problem is that the metadata filesystem [the one that xrootd
> spaces need to put a directory hierarchy in to build their symlink
> redirections into the spaces themselves] had run out of inodes - it was
> full of empty directories left by ATLAS' very silly path hierarchy, and
> adding an additional endpoint into the cache had tipped it over into using
> up all the inodes available. I recreated the metadata filesystem (taking a
> copy of the contents) with four times as many inodes, and it's all working
> again.)
>
> I'm not sure this is very obvious as a failure mode from the logs - the
> whole "no space left on device" error implies that the spaces have run out
> of capacity, not the mandatory metadata/symlinks filesystem.
>
> Sam
>
> On Wed, 20 Jan 2021 at 21:09, Matevz Tadel wrote:
>
>> On 1/20/21 12:47 PM, Andrew Hanushevsky wrote:
>>>
>>>
>>> On Wed, 20 Jan 2021, Matevz Tadel wrote:
>>>
>>>> Andy, what would be the simplest way to reproduce this? Running standalone
>>>> mini-server with localroot on a ssd partition and then writing in with xrdcp?
>>>> Are there some tracing options that would help?
>>> That's likely the best way but it won't recreate the original environment and
>>> that may be the problem. Imagine that the oldest file accounted for 30% of the
>>> cache space. Well, purge will still purge that file and fall far below the low
>>> water mark. Then it will just sit there until the space gets used up. Sounds
>>> like a rational explanation?
>>
>> No, see my first reply. Opening a file and purge are completely separate. So the
>> open error is a real error from the FS.
>>
>> Sam says all partitions are at 70% so this makes no sense at all ... open
>> shouldn't even be bothered about the lack of space. Oh, unless there is something
>> strange going on with directory entries and the maximum number is reached there,
>> somehow.
>>
>> Sam, what FS are you using? Do you do some parameter tuning?
>>
>> For what VO is this? Do they have a flat namespace like ATLAS, i.e., everything
>> gets cached into the same directory?
>>
>> In your test, did you try writing into the cache directory itself or somewhere
>> else / top-level?
>>
>> Matevz
>>
>> ########################################################################
>> Use REPLY-ALL to reply to list
>>
>> To unsubscribe from the XROOTD-L list, click the following link:
>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>>
>
