Hi Alessandra and James,

I'm adding Fabio to the loop.
I don't think there is any problem with copying the tarball to CC-IN2P3.
By the way, what is this tarball? Is it the instance catalog or the 
imSim code? If it is imSim, I thought we had decided to download it 
from GitHub and build it locally, since that is very fast to do.
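
To illustrate what I mean (the repository URL and the pip-based build 
below are assumptions on my part, not something we have pinned down), 
something like this at the start of the job should be enough:

    # Sketch only: the repository URL and the build command are assumptions.
    import subprocess

    IMSIM_REPO = "https://github.com/LSSTDESC/imSim.git"  # assumed location

    def build_imsim_locally(workdir="imSim"):
        """Clone imSim and build it in the job's work directory."""
        subprocess.run(["git", "clone", "--depth", "1", IMSIM_REPO, workdir],
                       check=True)
        subprocess.run(["pip", "install", "--user", "."], cwd=workdir,
                       check=True)

    if __name__ == "__main__":
        build_imsim_locally()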

Cheers,

Dominique

On 07/06/2019 16:30, Alessandra Forti wrote:
> We upgraded the system and changed the storage system configuration, so
> there may be other factors at play, but this was the first thing that
> jumped out, and until we reduce it we cannot tell whether other things
> are affecting the responsiveness of the storage.
>
> That said, 1500 processes trying to access one file on one machine is
> not healthy.
>
> cheers
> alessandra
>
> On 07/06/2019 15:25, PERRY James wrote:
>> Hi Alessandra,
>>
>> The site is CC. They didn't seem to want to mount the cvmfs repository
>> but maybe we could convince them to.
>>
>> I can download the file explicitly instead when required. Sorry, I
>> hadn't realised that this would put such a load on the system.
>>
>> Thanks,
>> James
>>
>>
>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>> Hi James,
>>>
>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>
>>> I would recommend that you don't pass the software as an input file, but
>>> download it explicitly from the job if you cannot find it in CVMFS.
>>> And/or the tarball should be copied to the French site storage closest
>>> to their nodes.
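>>>
>>> Roughly what I have in mind in the job wrapper is something like the
>>> sketch below (the CVMFS path, the tarball location and the copy command
>>> are only placeholders, adapt them to what the jobs actually use):
>>>
>>>     # Sketch only: the paths, the tarball URL and the gfal-copy call are
>>>     # placeholders, not a tested recipe.
>>>     import os
>>>     import subprocess
>>>
>>>     CVMFS_DIR = "/cvmfs/<lsst-repository>"             # placeholder
>>>     TARBALL_SRC = "<closest-SE>/path/to/imsim.tar.gz"  # placeholder
>>>
>>>     def get_software(dest="imsim.tar.gz"):
>>>         """Prefer CVMFS; fall back to an explicit download in the job."""
>>>         if os.path.isdir(CVMFS_DIR):
>>>             return CVMFS_DIR                # software found in CVMFS
>>>         subprocess.run(["gfal-copy", TARBALL_SRC, dest], check=True)
>>>         return dest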
>>>
>>> The tarball on our storage was being accessed by 1500 processes
>>> concurrently on the same machine earlier today, and I have already had
>>> to replicate the file 3 times to try to spread the load onto other
>>> servers. I'm surprised you didn't have timeouts.
>>>
>>> cheers
>>> alessandra
>>>
>>> On 07/06/2019 14:59, PERRY James wrote:
>>>> Hi Alessandra,
>>>>
>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>> doesn't mount our CVMFS repository so we need the tarball for that 
>>>> one.
>>>> Unfortunately, because I can't predict when I submit a job whether it
>>>> will go to that node or not, all the jobs have the tarball listed as
>>>> an input file. I tried uploading copies to other storage elements as
>>>> well
>>>> when I first put it on the grid, but at the time only Manchester was
>>>> working for me. I'm happy to discuss other solutions to this if it's
>>>> causing problems.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>
>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>> Hi James,
>>>>>
>>>>> Can you let me know how you do software distribution? It seems you
>>>>> have a single tarball on the Manchester storage that is creating a
>>>>> large number of connections.
>>>>>
>>>>> They might be among the causes of the current load we are experiencing.
>>>>> Manchester isn't running anything at the moment, so either those are
>>>>> ill-closed connections (which is possible) or the tarball you have on
>>>>> the Manchester storage is the only source being accessed by WNs at
>>>>> other sites in the UK.
>>>>>
>>>>> We always said that while the software was in development and LSST ran
>>>>> at a smaller scale the storage was fine, but it wouldn't work if too
>>>>> many jobs tried to access the same file on one storage. Have you
>>>>> thought about using CVMFS, or at the very least replicating the
>>>>> tarball at other sites?
>>>>>
>>>>> thanks
>>>>>
>>>>> cheers
>>>>> alessandra
>>>>>
>>>> -- 
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> James Perry                Room 2.41, Bayes Centre
>>>> Software Architect         The University of Edinburgh
>>>> EPCC                       47 Potterrow
>>>> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.
>

-- 
Dominique Boutigny

Laboratoire d'Annecy-le-Vieux de Physique des Particules
Université Savoie Mont Blanc - CNRS/IN2P3
Tel: 04.50.09.17.81 - 07.83.59.06.65
Office: 1-116
