Hi Fabio,

I don't understand what the problem is with using the gridpp CVMFS
repository. I think it is available everywhere in EGI, as it should be
part of the EGI CVMFS configuration RPM. I can certainly see it from
lxplus (CERN) as well:

aforti@lxplus783> ls /cvmfs/gridpp.egi.eu/lsst
sims_2_8_0  sims_2_9_0  sims_w_2019_10_1

cheers
alessandra

On 07/06/2019 16:06, Fabio Hernandez wrote:
> James,
>
> I propose we explore storing the imSim tarball in DIRAC and making
> several replicas, including one at an SE at CC-IN2P3.
>
> Do you think that would be compatible with the mechanism that you use
> to submit and execute the jobs?
>
> Cheers,
> Fabio Hernandez
> CNRS – IN2P3 computing centre <http://cc.in2p3.fr> · Lyon (France)
> · e-mail: [log in to unmask] <mailto:[log in to unmask]> · tel:
> +33 4 78 93 08 80
>
>> On 7 Jun 2019, at 16:52, PERRY James <[log in to unmask]
>> <mailto:[log in to unmask]>> wrote:
>>
>> Hi Dominique,
>>
>> It's the imSim code. I think someone recommended not cloning it from
>> GitHub, as this can result in GitHub blacklisting the worker nodes if
>> they do it too many times, so I went with the tarball instead. We
>> could try using GitHub directly if you think it would be safe.
>>
>> Cheers,
>> James
>>
>>
>> On 07/06/2019 15:49, Dominique Boutigny wrote:
>>> Hi Alessandra and James,
>>>
>>> I am adding Fabio to the loop.
>>> I don't think there is any problem with copying the tarball to
>>> CC-IN2P3.
>>> By the way, what is this tarball? Is it the instance catalog or the
>>> imSim code? If it is imSim, I thought we had decided to download it
>>> from GitHub and build it locally, as it is very fast to do so.
>>>
>>> Cheers,
>>>
>>> Dominique
>>>
>>> On 07/06/2019 16:30, Alessandra Forti wrote:
>>>> We upgraded the system and changed the storage system configuration,
>>>> so there might be other factors at play, but this was the first
>>>> thing that jumped out, and until we reduce it we cannot know whether
>>>> other things are affecting the responsiveness of the storage.
>>>>
>>>> That said, 1500 processes trying to access 1 file on 1 machine is
>>>> not healthy.
>>>>
>>>> cheers
>>>> alessandra
>>>>
>>>> On 07/06/2019 15:25, PERRY James wrote:
>>>>> Hi Alessandra,
>>>>>
>>>>> The site is CC. They didn't seem to want to mount the CVMFS
>>>>> repository, but maybe we could convince them to.
>>>>>
>>>>> I can download the file explicitly instead when required. Sorry, I
>>>>> hadn't realised that this would put such a load on the system.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>>>>> Hi James,
>>>>>>
>>>>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>>>>
>>>>>> I would recommend that you don't use the software as an input file
>>>>>> but download it explicitly from the job if you cannot find it in
>>>>>> CVMFS. And/or the tarball should be copied to the French site
>>>>>> storage closest to their nodes.
>>>>>>
>>>>>> The tarball on our storage was being accessed by 1500 processes
>>>>>> concurrently on the same machine earlier today, and I already had
>>>>>> to replicate the file 3 times to try to spread the load onto
>>>>>> others. I'm surprised you didn't have timeouts.
>>>>>>
>>>>>> cheers
>>>>>> alessandra
>>>>>>
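A rough sketch of the job-side fallback suggested above (use the gridpp
CVMFS installation where the worker node mounts it, otherwise fetch the
tarball from grid storage with the DIRAC data-management client). The
LFN, the release directory and the unpack step below are placeholders
for illustration, not the ones actually in use:

    #!/bin/bash
    # Prefer the imSim release on the gridpp CVMFS repository when the
    # worker node mounts it (release name taken from the listing above).
    IMSIM_DIR=/cvmfs/gridpp.egi.eu/lsst/sims_w_2019_10_1

    if [ -d "$IMSIM_DIR" ]; then
        echo "Using imSim from CVMFS: $IMSIM_DIR"
    else
        # Fall back to downloading the tarball from a storage element
        # with the DIRAC client; the LFN below is a placeholder.
        echo "CVMFS not mounted, fetching the tarball with DIRAC"
        dirac-dms-get-file /lsst/software/imsim.tar.gz || exit 1
        tar -xzf imsim.tar.gz
        IMSIM_DIR=$PWD/imsim
    fi

Fetching on demand like this means that only the jobs which actually land
on a node without CVMFS touch the storage, instead of every job listing
the tarball as an input file.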
>>>>>> On 07/06/2019 14:59, PERRY James wrote:
>>>>>>> Hi Alessandra,
>>>>>>>
>>>>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>>>>> doesn't mount our CVMFS repository, so we need the tarball for
>>>>>>> that one. Unfortunately, because I can't predict when I submit a
>>>>>>> job whether it will go to that node or not, all the jobs have the
>>>>>>> tarball listed as an input file. I tried uploading copies to other
>>>>>>> storage elements as well when I first put it on the grid, but at
>>>>>>> the time only Manchester was working for me. I'm happy to discuss
>>>>>>> other solutions to this if it's causing problems.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>>
>>>>>>>
>>>>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>>>>> Hi James,
>>>>>>>>
>>>>>>>> Can you let me know how you do software distribution? It seems
>>>>>>>> you have a single tarball on the Manchester storage that is
>>>>>>>> creating a large number of connections.
>>>>>>>>
>>>>>>>> These might be among the causes of the load we are currently
>>>>>>>> experiencing. Manchester isn't running anything at the moment, so
>>>>>>>> either those are improperly closed connections (could be) or the
>>>>>>>> tarball you have on the Manchester storage is the only source
>>>>>>>> being accessed by WNs at other sites in the UK.
>>>>>>>>
>>>>>>>> We always said that while the software was in development and
>>>>>>>> LSST ran at a smaller scale the storage was fine, but it wouldn't
>>>>>>>> work if too many jobs tried to access the same file on one
>>>>>>>> storage. Have you thought about using CVMFS, or at the very least
>>>>>>>> replicating the tarball at other sites?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> cheers
>>>>>>>> alessandra
>>>>>>>>
>>>>>>> --
>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>> James Perry              Room 2.41, Bayes Centre
>>>>>>> Software Architect       The University of Edinburgh
>>>>>>> EPCC                     47 Potterrow
>>>>>>> Tel: +44 131 650 5173    Edinburgh, EH8 9BT
>>>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>>> Scotland, with registration number SC005336.
>>>>
>>>
>>
>> --
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> James Perry              Room 2.41, Bayes Centre
>> Software Architect       The University of Edinburgh
>> EPCC                     47 Potterrow
>> Tel: +44 131 650 5173    Edinburgh, EH8 9BT
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>

--
Respect is a rational process. \\//
For Ur-Fascism, disagreement is treason. (U. Eco)
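For the replication that Fabio and Alessandra propose, a sketch using the
standard DIRAC data-management commands could look like the following.
The LFN and the storage-element names are placeholders; the real SE names
would come from the DIRAC configuration in use:

    # Upload the tarball once and register it in the file catalogue
    # (run from a machine with the DIRAC client and a valid proxy).
    dirac-dms-add-file /lsst/software/imsim.tar.gz imsim.tar.gz MANCHESTER-SE

    # Replicate it to further storage elements, e.g. one close to the
    # CC-IN2P3 worker nodes, so the load is spread across sites.
    dirac-dms-replicate-lfn /lsst/software/imsim.tar.gz CCIN2P3-SE

    # Check which replicas exist.
    dirac-dms-lfn-replicas /lsst/software/imsim.tar.gz

With several replicas registered, jobs that do need the tarball can pull a
nearby copy instead of all hitting the single file on the Manchester
storage.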