Hi Dominique,

It's the ImSim code. I think someone recommended not cloning it from
GitHub, as this can result in GitHub blacklisting the worker nodes if they
do it too many times, so I went with the tarball instead. We could try
using GitHub directly if you think it would be safe.

Cheers,
James

On 07/06/2019 15:49, Dominique Boutigny wrote:
> Hi Alessandra and James,
>
> I am adding Fabio to the loop.
> I don't think that there is any problem with copying the tarball to
> CC-IN2P3. By the way, what is this tarball? Is it the instance catalog
> or the imsim code? If it is imsim, I thought that we decided to download
> it from GitHub and to build it locally as it is very fast to do so.
>
> Cheers,
>
> Dominique
>
> On 07/06/2019 16:30, Alessandra Forti wrote:
>> We upgraded the system and changed the storage system configuration so
>> there might be other factors at play, but this was the first thing
>> that jumped out and until we reduce it we cannot know if other things
>> are affecting the responsiveness of the storage.
>>
>> That said, 1500 processes trying to access 1 file on 1 machine is not
>> healthy.
>>
>> cheers
>> alessandra
>>
>> On 07/06/2019 15:25, PERRY James wrote:
>>> Hi Alessandra,
>>>
>>> The site is CC. They didn't seem to want to mount the cvmfs repository
>>> but maybe we could convince them to.
>>>
>>> I can download the file explicitly instead when required. Sorry, I
>>> hadn't realised that this would put such a load on the system.
>>>
>>> Thanks,
>>> James
>>>
>>>
>>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>>> Hi James,
>>>>
>>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>>
>>>> I would recommend that you don't use the software as an input but
>>>> download it explicitly from the job if you cannot find it in CVMFS.
>>>> And/or the tarball should be copied to the French site storage closest
>>>> to their nodes.
>>>>
>>>> The tarball on our storage was being accessed by 1500 processes
>>>> concurrently on the same machine earlier today and I already had to
>>>> replicate the file 3 times to try to spread the load on others. I'm
>>>> surprised you didn't have timeouts.
>>>>
>>>> cheers
>>>> alessandra
>>>>
>>>> On 07/06/2019 14:59, PERRY James wrote:
>>>>> Hi Alessandra,
>>>>>
>>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>>> doesn't mount our CVMFS repository so we need the tarball for that
>>>>> one. Unfortunately, because I can't predict when I submit a job
>>>>> whether it will go to that node or not, all the jobs have the tarball
>>>>> listed as an input file. I tried uploading copies to other storage
>>>>> elements as well when I first put it on the grid, but at the time only
>>>>> Manchester was working for me. I'm happy to discuss other solutions to
>>>>> this if it's causing problems.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>>> Hi James,
>>>>>>
>>>>>> can you let me know how you do software distribution? It seems you
>>>>>> have 1 single tarball on the Manchester storage that is creating a
>>>>>> large number of connections.
>>>>>>
>>>>>> They might be among the causes of the current load we are
>>>>>> experiencing. Manchester isn't running anything at the moment, so
>>>>>> either those are ill-closed connections (could be) or the tarball you
>>>>>> have on the Manchester storage is the only source accessed by WNs at
>>>>>> other sites in the UK.
>>>>>>
>>>>>> We always said that as long as the software was in development and
>>>>>> LSST ran at a smaller scale the storage was fine, but it wouldn't
>>>>>> work if too many jobs tried to access the same file on one storage.
>>>>>> Have you thought about using cvmfs or at the very least replicating
>>>>>> the tarball at other sites?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> cheers
>>>>>> alessandra
>>>>>>
>>>>> --
>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> James Perry                Room 2.41, Bayes Centre
>>>>> Software Architect         The University of Edinburgh
>>>>> EPCC                       47 Potterrow
>>>>> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.