Hi Fabio,

I don't understand what the problem is with using the gridpp CVMFS
repository. I think it is available everywhere in EGI, as it should be
part of the EGI CVMFS configuration RPM. I can certainly see it from
lxplus (CERN) as well:

aforti@lxplus783> ls /cvmfs/gridpp.egi.eu/lsst
sims_2_8_0  sims_2_9_0  sims_w_2019_10_1
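
So a job can simply test whether the repository is mounted before deciding
how to get the software; a minimal sketch (IMSIM_DIR and the choice of
release are just placeholders for however the job actually sets itself up):

if [ -d /cvmfs/gridpp.egi.eu/lsst/sims_2_9_0 ]; then
    # repository is mounted: use the release that is already on CVMFS
    IMSIM_DIR=/cvmfs/gridpp.egi.eu/lsst/sims_2_9_0
else
    # no CVMFS on this node: fall back to downloading the tarball
    echo "gridpp.egi.eu not mounted, will fetch the tarball instead"
fi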

cheers
alessandra

On 07/06/2019 16:06, Fabio Hernandez wrote:
> James,
>
> I propose we explore storing the imSim tarball in DIRAC and making several 
> replicas, including one at an SE at CC-IN2P3.
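>
> Roughly, with the DIRAC data-management tools, that could look like this 
> (the LFN and the SE names below are only placeholders):
>
> # upload the tarball once and register it in the DIRAC file catalogue
> dirac-dms-add-file /lsst/software/imsim.tar.gz imsim.tar.gz UKI-MANCHESTER-disk
> # add a replica at an SE at CC-IN2P3 (and at any other site we like)
> dirac-dms-replicate-lfn /lsst/software/imsim.tar.gz IN2P3-CC-disk
> # check where the replicas ended up
> dirac-dms-lfn-replicas /lsst/software/imsim.tar.gz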
>
> Do you think that would be compatible with the mechanism that you use 
> to submit and execute the jobs?
>
> Cheers,
> Fabio Hernandez
> CNRS – IN2P3 computing centre <http://cc.in2p3.fr> · Lyon (France) ·
> e-mail: [log in to unmask] · tel: +33 4 78 93 08 80
>
>> On 7 Jun 2019, at 16:52, PERRY James <[log in to unmask]> wrote:
>>
>> Hi Dominique,
>>
>> It's the ImSim code. I think someone recommended not to clone it from
>> GitHub as this can result in GitHub blacklisting the worker nodes if
>> they do this too many times, so I went with the tarball instead. We
>> could try it using GitHub directly if you think it would be safe.
>>
>> Cheers,
>> James
>>
>>
>> On 07/06/2019 15:49, Dominique Boutigny wrote:
>>> Hi Alessandra and James,
>>>
>>> I'm adding Fabio to the loop.
>>> I don't think there is any problem with copying the tarball to
>>> CC-IN2P3.
>>> By the way, what is this tarball? Is it the instance catalog or the
>>> imSim code? If it is imSim, I thought we had decided to download it
>>> from GitHub and build it locally, as that is very fast to do.
>>>
>>> Cheers,
>>>
>>> Dominique
>>>
>>> On 07/06/2019 16:30, Alessandra Forti wrote:
>>>> We upgraded the system and changed the storage system configuration, so
>>>> there might be other factors at play, but this was the first thing that
>>>> jumped out, and until we reduce it we cannot know whether other things
>>>> are affecting the responsiveness of the storage.
>>>>
>>>> That said, 1500 processes trying to access 1 file on 1 machine is not
>>>> healthy.
>>>>
>>>> cheers
>>>> alessandra
>>>>
>>>> On 07/06/2019 15:25, PERRY James wrote:
>>>>> Hi Alessandra,
>>>>>
>>>>> The site is CC. They didn't seem to want to mount the cvmfs repository
>>>>> but maybe we could convince them to.
>>>>>
>>>>> I can download the file explicitly instead when required. Sorry, I
>>>>> hadn't realised that this would put such a load on the system.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>>>>> Hi James,
>>>>>>
>>>>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>>>>
>>>>>> I would recommend that you don't ship the software as an input file,
>>>>>> but that you download it explicitly from the job if you cannot find it
>>>>>> in CVMFS. And/or the tarball should be copied to the French site
>>>>>> storage closest to their nodes.
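>>>>>>
>>>>>> As a rough sketch of that job-side fallback (the LFN below is just a
>>>>>> placeholder for wherever the tarball gets registered):
>>>>>>
>>>>>> if [ ! -d /cvmfs/gridpp.egi.eu/lsst ]; then
>>>>>>     # no CVMFS on this node: fetch the tarball from one of the SE replicas
>>>>>>     dirac-dms-get-file /lsst/software/imsim.tar.gz
>>>>>>     tar xzf imsim.tar.gz
>>>>>> fi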
>>>>>>
>>>>>> The tarball on our storage was being accessed by 1500 processes
>>>>>> concurrently on the same machine earlier today, and I have already had
>>>>>> to replicate the file 3 times to try to spread the load onto others.
>>>>>> I'm surprised you didn't have timeouts.
>>>>>>
>>>>>> cheers
>>>>>> alessandra
>>>>>>
>>>>>> On 07/06/2019 14:59, PERRY James wrote:
>>>>>>> Hi Alessandra,
>>>>>>>
>>>>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>>>>> doesn't mount our CVMFS repository, so we need the tarball for that
>>>>>>> one. Unfortunately, because I can't predict when I submit a job
>>>>>>> whether it will go to that node or not, all the jobs have the tarball
>>>>>>> listed as an input file. I tried uploading copies to other storage
>>>>>>> elements as well when I first put it on the grid, but at the time only
>>>>>>> Manchester was working for me. I'm happy to discuss other solutions to
>>>>>>> this if it's causing problems.
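>>>>>>>
>>>>>>> For context, this is roughly how the tarball ends up attached to every
>>>>>>> job at the moment, assuming it goes in through the DIRAC JDL input
>>>>>>> sandbox as an LFN entry (the file names and the LFN are placeholders):
>>>>>>>
>>>>>>> cat > imsim_job.jdl <<'EOF'
>>>>>>> Executable = "run_imsim.sh";
>>>>>>> InputSandbox = {"run_imsim.sh", "LFN:/lsst/software/imsim.tar.gz"};
>>>>>>> EOF
>>>>>>> dirac-wms-job-submit imsim_job.jdl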
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>>
>>>>>>>
>>>>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>>>>> Hi James,
>>>>>>>>
>>>>>>>> Can you let me know how you do software distribution? It seems you
>>>>>>>> have 1 single tarball on the Manchester storage that is creating a
>>>>>>>> large number of connections.
>>>>>>>>
>>>>>>>> They might be among the causes of the current load we are
>>>>>>>> experiencing. Manchester isn't running anything at the moment, so
>>>>>>>> either those are badly closed connections (which is possible) or the
>>>>>>>> tarball you have on the Manchester storage is the only source
>>>>>>>> accessed by WNs at other sites in the UK.
>>>>>>>>
>>>>>>>> We always said that while the software was in development and LSST
>>>>>>>> ran at a smaller scale the storage was fine, but that it wouldn't
>>>>>>>> work if too many jobs tried to access the same file on one storage
>>>>>>>> element. Have you thought about using CVMFS, or at the very least
>>>>>>>> replicating the tarball at other sites?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> cheers
>>>>>>>> alessandra
>>>>>>>>
>>>>>>> --
>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>> James Perry                Room 2.41, Bayes Centre
>>>>>>> Software Architect         The University of Edinburgh
>>>>>>> EPCC                       47 Potterrow
>>>>>>> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>
>>>
>>
>> --
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> James Perry                Room 2.41, Bayes Centre
>> Software Architect         The University of Edinburgh
>> EPCC                       47 Potterrow
>> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>

-- 
Respect is a rational process. \\//
For Ur-Fascism, disagreement is treason. (U. Eco)

