James,

I propose we explore storing the imSim tarball in DIRAC and making several replicas, including one at a storage element (SE) at CC-IN2P3.

Do you think that would be compatible with the mechanism that you use to submit and execute the jobs?
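
For concreteness, this is roughly what I have in mind, as a sketch against the DIRAC DataManager Python API (the LFN and storage-element names below are placeholders, not the ones we would actually use):

    # Sketch only: upload the imSim tarball once, then register extra replicas.
    # The LFN and SE names are placeholders.
    from DIRAC.Core.Base import Script
    Script.parseCommandLine(ignoreErrors=True)

    from DIRAC.DataManagementSystem.Client.DataManager import DataManager

    dm = DataManager()
    lfn = "/lsst/user/i/imsim/imsim.tar.gz"      # placeholder LFN
    local_file = "imsim.tar.gz"

    # Initial upload (the CLI equivalent would be dirac-dms-add-file).
    result = dm.putAndRegister(lfn, local_file, "UKI-NORTHGRID-MAN-HEP-disk")
    if not result["OK"]:
        raise RuntimeError(result["Message"])

    # Additional replicas, e.g. one at a CC-IN2P3 SE
    # (CLI equivalent: dirac-dms-replicate-lfn).
    for se in ["IN2P3-CC-disk", "UKI-SCOTGRID-ECDF-disk"]:  # placeholder SE names
        result = dm.replicateAndRegister(lfn, se)
        if not result["OK"]:
            print("Replication to %s failed: %s" % (se, result["Message"]))

Jobs could then fetch whichever replica is closest to them rather than all hitting the single copy at Manchester.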

Cheers,
Fabio Hernandez
CNRS – IN2P3 computing centre <http://cc.in2p3.fr/> · Lyon (France)     ·     e-mail: [log in to unmask] <mailto:[log in to unmask]>     ·     tel: +33 4 78 93 08 80
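
P.S. On the worker-node side, the fallback Alessandra suggests below (use CVMFS when it is mounted, otherwise fetch a replica from grid storage) could look roughly like this sketch; the CVMFS path and LFN are again placeholders:

    # Sketch only: prefer CVMFS, fall back to downloading a replica via DIRAC.
    import os
    import tarfile

    CVMFS_DIR = "/cvmfs/sw.lsst.eu"              # placeholder repository path
    LFN = "/lsst/user/i/imsim/imsim.tar.gz"      # placeholder LFN

    if os.path.isdir(CVMFS_DIR):
        imsim_dir = CVMFS_DIR                    # use the CVMFS installation directly
    else:
        from DIRAC.Core.Base import Script
        Script.parseCommandLine(ignoreErrors=True)
        from DIRAC.DataManagementSystem.Client.DataManager import DataManager

        # getFile resolves a replica through the catalogue, so the job no
        # longer depends on the single copy held at one site.
        result = DataManager().getFile(LFN, destinationDir=".")
        if not result["OK"]:
            raise RuntimeError("Could not fetch %s: %s" % (LFN, result["Message"]))

        with tarfile.open(os.path.basename(LFN)) as tarball:
            tarball.extractall("imsim")
        imsim_dir = "imsim"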

> On 7 Jun 2019, at 16:52, PERRY James <[log in to unmask]> wrote:
> 
> Hi Dominique,
> 
> It's the ImSim code. I think someone recommended not to clone it from 
> GitHub as this can result in GitHub blacklisting the worker nodes if 
> they do this too many times, so I went with the tarball instead. We 
> could try it using GitHub directly if you think it would be safe.
> 
> Cheers,
> James
> 
> 
> On 07/06/2019 15:49, Dominique Boutigny wrote:
>> Hi Alessandra and James,
>> 
>> I add Fabio in the loop.
>> I don't think there is any problem with copying the tarball to CC-IN2P3.
>> By the way, what is this tarball? Is it the instance catalog or the
>> imSim code? If it is imSim, I thought we had decided to download it
>> from GitHub and build it locally, as that is very quick to do.
>> 
>> Cheers,
>> 
>> Dominique
>> 
>> On 07/06/2019 16:30, Alessandra Forti wrote:
>>> We upgraded the system and changed the storage configuration, so there
>>> might be other factors at play, but this was the first thing that
>>> jumped out, and until we reduce it we cannot know whether other things
>>> are affecting the responsiveness of the storage.
>>>
>>> That said, 1500 processes trying to access one file on one machine is
>>> not healthy.
>>> 
>>> cheers
>>> alessandra
>>> 
>>> On 07/06/2019 15:25, PERRY James wrote:
>>>> Hi Alessandra,
>>>> 
>>>> The site is CC. They didn't seem to want to mount the cvmfs repository
>>>> but maybe we could convince them to.
>>>> 
>>>> I can download the file explicitly instead when required. Sorry, I
>>>> hadn't realised that this would put such a load on the system.
>>>> 
>>>> Thanks,
>>>> James
>>>> 
>>>> 
>>>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>>>> Hi James,
>>>>> 
>>>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>>> 
>>>>> I would recommend that you don't pass the software as an input file but
>>>>> download it explicitly from the job if you cannot find it in CVMFS.
>>>>> And/or the tarball should be copied to the French site storage closest
>>>>> to their nodes.
>>>>>
>>>>> The tarball on our storage was being accessed concurrently by 1500
>>>>> processes on the same machine earlier today, and I have already had to
>>>>> replicate the file 3 times to try to spread the load across the copies.
>>>>> I'm surprised you didn't have timeouts.
>>>>> 
>>>>> cheers
>>>>> alessandra
>>>>> 
>>>>> On 07/06/2019 14:59, PERRY James wrote:
>>>>>> Hi Alessandra,
>>>>>> 
>>>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>>>> doesn't mount our CVMFS repository, so we need the tarball for that one.
>>>>>> Unfortunately, because I can't predict at submission time whether a job
>>>>>> will go to that node or not, all the jobs have the tarball listed as an
>>>>>> input file. I tried uploading copies to other storage elements as well
>>>>>> when I first put it on the grid, but at the time only Manchester was
>>>>>> working for me. I'm happy to discuss other solutions to this if it's
>>>>>> causing problems.
>>>>>> 
>>>>>> Thanks,
>>>>>> James
>>>>>> 
>>>>>> 
>>>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>>>> Hi James,
>>>>>>> 
>>>>>>> can you let me know how you do software distribution? It seems you
>>>>>>> have a single tarball on the Manchester storage that is creating a
>>>>>>> large number of connections.
>>>>>>>
>>>>>>> These might be among the causes of the current load we are experiencing.
>>>>>>> Manchester isn't running anything at the moment, so either those are
>>>>>>> badly closed connections (which is possible) or the tarball you have on
>>>>>>> the Manchester storage is the only source accessed by worker nodes at
>>>>>>> other sites in the UK.
>>>>>>>
>>>>>>> We always said that while the software was in development and LSST ran
>>>>>>> at a smaller scale the storage was fine, but it wouldn't work if too
>>>>>>> many jobs tried to access the same file on one storage element. Have
>>>>>>> you thought about using CVMFS, or at the very least replicating the
>>>>>>> tarball at other sites?
>>>>>>> 
>>>>>>> thanks
>>>>>>> 
>>>>>>> cheers
>>>>>>> alessandra
>>>>>>> 
>>>>>> -- 
>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>> James Perry                Room 2.41, Bayes Centre
>>>>>> Software Architect         The University of Edinburgh
>>>>>> EPCC                       47 Potterrow
>>>>>> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>> Scotland, with registration number SC005336.
>>> 
>> 
> 
> -- 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> James Perry                Room 2.41, Bayes Centre
> Software Architect         The University of Edinburgh
> EPCC                       47 Potterrow
> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

