Hi Dominique,

It's the ImSim code. I think someone recommended not to clone it from 
GitHub as this can result in GitHub blacklisting the worker nodes if 
they do this too many times, so I went with the tarball instead. We 
could try it using GitHub directly if you think it would be safe.
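
In case it helps the discussion, the CVMFS-first fallback could be sketched roughly like this (the repository path, directory names, and URL below are placeholders I've made up, not the real ones):

```shell
#!/bin/sh
# Sketch of the CVMFS-first fallback: prefer the software tree in CVMFS
# when the repository is mounted on the worker node, and only fall back
# to downloading the tarball when it is not.
# All paths and the URL are illustrative placeholders.

CVMFS_DIR="/cvmfs/example.opensciencegrid.org/imsim"
TARBALL_URL="https://example.org/releases/imsim.tar.gz"

# Decide where the software should come from on this node:
# echoes "cvmfs" if the given directory is mounted, "download" otherwise.
pick_source() {
    if [ -d "$1" ]; then
        echo "cvmfs"
    else
        echo "download"
    fi
}

case "$(pick_source "$CVMFS_DIR")" in
    cvmfs)
        echo "Using CVMFS copy at $CVMFS_DIR"
        ;;
    download)
        echo "CVMFS not mounted; fetching $TARBALL_URL"
        # curl -sSfL "$TARBALL_URL" | tar -xz   # actual fetch, left commented here
        ;;
esac
```

That way the tarball only needs to be listed as a fallback for the node without the CVMFS mount, rather than as an input for every job.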

Cheers,
James


On 07/06/2019 15:49, Dominique Boutigny wrote:
> Hi Alessandra and James,
> 
> I'm adding Fabio to the loop.
> I don't think there is any problem with copying the tarball at CC-IN2P3.
> By the way, what is this tarball? Is it the instance catalog or the 
> imsim code? If it is imsim, I thought we had decided to download it 
> from GitHub and build it locally, as that is very fast to do.
> 
> Cheers,
> 
> Dominique
> 
> On 07/06/2019 16:30, Alessandra Forti wrote:
>> We upgraded the system and changed the storage configuration, so there 
>> might be other factors at play, but this was the first thing that 
>> jumped out, and until we reduce it we cannot tell whether other things 
>> are affecting the responsiveness of the storage.
>>
>> That said, 1500 processes trying to access one file on one machine is 
>> not healthy.
>>
>> cheers
>> alessandra
>>
>> On 07/06/2019 15:25, PERRY James wrote:
>>> Hi Alessandra,
>>>
>>> The site is CC. They didn't seem to want to mount the cvmfs repository
>>> but maybe we could convince them to.
>>>
>>> I can download the file explicitly instead when required. Sorry, I
>>> hadn't realised that this would put such a load on the system.
>>>
>>> Thanks,
>>> James
>>>
>>>
>>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>>> Hi James,
>>>>
>>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>>
>>>> I would recommend that you don't ship the software as an input file
>>>> but download it explicitly from the job if you cannot find it in
>>>> CVMFS. And/or the tarball should be copied to the French site storage
>>>> closest to their nodes.
>>>>
>>>> The tarball on our storage was being accessed by 1500 processes
>>>> concurrently on the same machine earlier today, and I have already
>>>> had to replicate the file three times to try to spread the load
>>>> across other servers. I'm surprised you didn't get timeouts.
>>>>
>>>> cheers
>>>> alessandra
>>>>
>>>> On 07/06/2019 14:59, PERRY James wrote:
>>>>> Hi Alessandra,
>>>>>
>>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>>> doesn't mount our CVMFS repository so we need the tarball for that 
>>>>> one.
>>>>> Unfortunately, because I can't predict at submission time whether a
>>>>> job will go to that node, all the jobs have the tarball listed as an
>>>>> input file. I tried uploading copies to other storage elements as well
>>>>> when I first put it on the grid, but at the time only Manchester was
>>>>> working for me. I'm happy to discuss other solutions to this if it's
>>>>> causing problems.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>>> Hi James,
>>>>>>
>>>>>> can you let me know how you do software distribution? It seems you
>>>>>> have a single tarball on the Manchester storage that is creating a
>>>>>> large number of connections.
>>>>>>
>>>>>> They might be among the causes of the load we are currently
>>>>>> experiencing. Manchester isn't running anything at the moment, so
>>>>>> either those are ill-closed connections (possible) or the tarball
>>>>>> you have on the Manchester storage is the only source accessed by
>>>>>> WNs at other sites in the UK.
>>>>>>
>>>>>> We always said the storage was fine while the software was in
>>>>>> development and LSST ran at a smaller scale, but it wouldn't work
>>>>>> if too many jobs tried to access the same file on one storage
>>>>>> element. Have you thought about using CVMFS, or at the very least
>>>>>> replicating the tarball at other sites?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> cheers
>>>>>> alessandra
>>>>>>
>>>>> -- 
>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> James Perry                Room 2.41, Bayes Centre
>>>>> Software Architect         The University of Edinburgh
>>>>> EPCC                       47 Potterrow
>>>>> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>
> 

