James,

I propose we explore storing the imSim tarball in DIRAC and making several replicas, including one at an SE at CC-IN2P3. Do you think that would be compatible with the mechanism you use to submit and execute the jobs?

Cheers,
Fabio Hernandez
CNRS – IN2P3 computing centre <http://cc.in2p3.fr/> · Lyon (France)
e-mail: [log in to unmask] · tel: +33 4 78 93 08 80


> On 7 Jun 2019, at 16:52, PERRY James <[log in to unmask]> wrote:
>
> Hi Dominique,
>
> It's the imSim code. I think someone recommended not cloning it from
> GitHub, as doing so too many times can get the worker nodes
> blacklisted by GitHub, so I went with the tarball instead. We could
> try using GitHub directly if you think it would be safe.
>
> Cheers,
> James
>
>
> On 07/06/2019 15:49, Dominique Boutigny wrote:
>> Hi Alessandra and James,
>>
>> I am adding Fabio to the loop.
>> I don't think there is any problem with copying the tarball to CC-IN2P3.
>> By the way, what is this tarball? Is it the instance catalog or the
>> imSim code? If it is imSim, I thought we had decided to download it
>> from GitHub and build it locally, as that is very fast to do.
>>
>> Cheers,
>>
>> Dominique
>>
>> On 07/06/2019 16:30, Alessandra Forti wrote:
>>> We upgraded the system and changed the storage system configuration,
>>> so there might be other factors at play, but this was the first
>>> thing that jumped out, and until we reduce it we cannot know whether
>>> other things are affecting the responsiveness of the storage.
>>>
>>> That said, 1500 processes trying to access 1 file on 1 machine is
>>> not healthy.
>>>
>>> cheers
>>> alessandra
>>>
>>> On 07/06/2019 15:25, PERRY James wrote:
>>>> Hi Alessandra,
>>>>
>>>> The site is CC. They didn't seem to want to mount the CVMFS
>>>> repository, but maybe we could convince them to.
>>>>
>>>> I can download the file explicitly when required instead. Sorry, I
>>>> hadn't realised that this would put such a load on the system.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>
>>>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>>>> Hi James,
>>>>>
>>>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>>>
>>>>> I would recommend that you don't use the software as an input file
>>>>> but download it explicitly from the job if you cannot find it in
>>>>> CVMFS. And/or the tarball should be copied to the French site
>>>>> storage closest to their nodes.
>>>>>
>>>>> The tarball on our storage was being accessed by 1500 processes
>>>>> concurrently on the same machine earlier today, and I have already
>>>>> had to replicate the file 3 times to try to spread the load across
>>>>> other servers. I'm surprised you didn't get timeouts.
>>>>>
>>>>> cheers
>>>>> alessandra
>>>>>
>>>>> On 07/06/2019 14:59, PERRY James wrote:
>>>>>> Hi Alessandra,
>>>>>>
>>>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>>>> doesn't mount our CVMFS repository, so we need the tarball for
>>>>>> that one. Unfortunately, because I can't predict at submission
>>>>>> time whether a job will go to that node or not, all the jobs have
>>>>>> the tarball listed as an input file. I tried uploading copies to
>>>>>> other storage elements as well when I first put it on the grid,
>>>>>> but at the time only Manchester was working for me. I'm happy to
>>>>>> discuss other solutions to this if it's causing problems.
>>>>>>
>>>>>> Thanks,
>>>>>> James
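A minimal sketch of the run-time fallback discussed above: prefer CVMFS when it is mounted, otherwise fetch the tarball from a grid replica. The CVMFS path and LFN are illustrative placeholders, and the sketch assumes the standard dirac-dms-get-file command is available in the worker-node environment:

    import os
    import subprocess
    import tarfile

    CVMFS_IMSIM = "/cvmfs/lsst.example.org/imsim"  # hypothetical CVMFS path
    TARBALL_LFN = "/lsst/software/imsim.tar.gz"    # hypothetical LFN

    def locate_imsim(workdir="."):
        """Prefer CVMFS; otherwise download and unpack the tarball."""
        if os.path.isdir(CVMFS_IMSIM):
            return CVMFS_IMSIM
        # CVMFS is not mounted on this node: pull the tarball at run time
        # instead of listing it as an input file for every job, so only
        # the jobs that actually land here touch the storage.
        subprocess.check_call(["dirac-dms-get-file", TARBALL_LFN], cwd=workdir)
        local_tar = os.path.join(workdir, os.path.basename(TARBALL_LFN))
        with tarfile.open(local_tar) as tar:
            tar.extractall(path=os.path.join(workdir, "imsim"))
        return os.path.join(workdir, "imsim")

With replicas registered at several SEs, the download should resolve to a nearby copy rather than concentrating all reads on the single Manchester file.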
>>>>>>
>>>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>>>> Hi James,
>>>>>>>
>>>>>>> Can you let me know how you do software distribution? It seems
>>>>>>> you have a single tarball on the Manchester storage that is
>>>>>>> creating a large number of connections.
>>>>>>>
>>>>>>> They might be among the causes of the load we are currently
>>>>>>> experiencing. Manchester isn't running anything at the moment,
>>>>>>> so either those are badly closed connections (which is possible)
>>>>>>> or the tarball you have on the Manchester storage is the only
>>>>>>> source accessed by worker nodes at other sites in the UK.
>>>>>>>
>>>>>>> We always said that the storage was fine while the software was
>>>>>>> in development and LSST ran at a smaller scale, but that it
>>>>>>> wouldn't work if too many jobs tried to access the same file on
>>>>>>> one storage element. Have you thought about using CVMFS, or at
>>>>>>> the very least replicating the tarball at other sites?
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> cheers
>>>>>>> alessandra
>>>>>>>
>>>>>> --
>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>> James Perry              Room 2.41, Bayes Centre
>>>>>> Software Architect       The University of Edinburgh
>>>>>> EPCC                     47 Potterrow
>>>>>> Tel: +44 131 650 5173    Edinburgh, EH8 9BT
>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>> Scotland, with registration number SC005336.
>>>
>>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> James Perry              Room 2.41, Bayes Centre
> Software Architect       The University of Edinburgh
> EPCC                     47 Potterrow
> Tel: +44 131 650 5173    Edinburgh, EH8 9BT
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
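For reference, Fabio's replication proposal at the top of the thread could take roughly the following shape with the standard DIRAC data-management commands (dirac-dms-add-file, dirac-dms-replicate-lfn). The LFN and storage-element names below are hypothetical placeholders rather than the sites' real SE names:

    import subprocess

    TARBALL = "imsim.tar.gz"                      # local tarball to upload
    LFN = "/lsst/software/imsim.tar.gz"           # hypothetical LFN
    PRIMARY_SE = "UKI-MAN-disk"                   # hypothetical SE name
    REPLICA_SES = ["IN2P3-CC-disk", "UKI-ECDF-disk"]  # hypothetical SE names

    # Upload the tarball once and register it in the DIRAC file catalogue.
    subprocess.check_call(["dirac-dms-add-file", LFN, TARBALL, PRIMARY_SE])

    # Replicate it to the other SEs so that worker nodes can read a nearby
    # copy instead of 1500 processes hitting a single file on one machine.
    for se in REPLICA_SES:
        subprocess.check_call(["dirac-dms-replicate-lfn", LFN, se])

Combined with the run-time fallback sketched earlier in the thread, this would remove the need to list the Manchester copy as an input file for every job.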