Hi Alessandra and James,

I'm adding Fabio to the loop. I don't think there is any problem with
copying the tarball to CC-IN2P3. By the way, what is this tarball? Is it
the instance catalog or the imsim code? If it is imsim, I thought we had
decided to download it from GitHub and build it locally, as that is very
fast to do.

Cheers,

Dominique

On 07/06/2019 16:30, Alessandra Forti wrote:
> We upgraded the system and changed the storage system configuration, so
> there might be other factors at play, but this was the first thing that
> jumped out, and until we reduce it we cannot know whether other things
> are affecting the responsiveness of the storage.
>
> That said, 1500 processes trying to access 1 file on 1 machine is not
> healthy.
>
> cheers
> alessandra
>
> On 07/06/2019 15:25, PERRY James wrote:
>> Hi Alessandra,
>>
>> The site is CC. They didn't seem to want to mount the cvmfs
>> repository, but maybe we could convince them to.
>>
>> I can download the file explicitly instead when required. Sorry, I
>> hadn't realised that this would put such a load on the system.
>>
>> Thanks,
>> James
>>
>>
>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>> Hi James,
>>>
>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>
>>> I would recommend that you don't ship the software as an input file
>>> but download it explicitly from the job if you cannot find it in
>>> CVMFS. And/or the tarball should be copied to the French site storage
>>> closest to their nodes.
>>>
>>> The tarball on our storage was being accessed by 1500 processes
>>> concurrently on the same machine earlier today, and I have already
>>> had to replicate the file 3 times to try to spread the load. I'm
>>> surprised you didn't have timeouts.
>>>
>>> cheers
>>> alessandra
>>>
>>> On 07/06/2019 14:59, PERRY James wrote:
>>>> Hi Alessandra,
>>>>
>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>> doesn't mount our CVMFS repository, so we need the tarball for that
>>>> one. Unfortunately, because I can't predict when I submit a job
>>>> whether it will go to that node or not, all the jobs have the
>>>> tarball listed as an input file. I tried uploading copies to other
>>>> storage elements as well when I first put it on the grid, but at the
>>>> time only Manchester was working for me. I'm happy to discuss other
>>>> solutions to this if it's causing problems.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>
>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>> Hi James,
>>>>>
>>>>> Can you let me know how you do software distribution? It seems you
>>>>> have a single tarball on the Manchester storage that is creating a
>>>>> large number of connections.
>>>>>
>>>>> These might be among the causes of the load we are currently
>>>>> experiencing. Manchester isn't running anything at the moment, so
>>>>> either those are badly closed connections (which is possible) or
>>>>> the tarball you have on the Manchester storage is the only source
>>>>> being accessed by WNs at other sites in the UK.
>>>>>
>>>>> We always said that while the software was in development and LSST
>>>>> was running at a smaller scale the storage was fine, but that it
>>>>> wouldn't work if too many jobs tried to access the same file on one
>>>>> storage element. Have you thought about using cvmfs, or at the very
>>>>> least replicating the tarball to other sites?
>>>>>
>>>>> thanks
>>>>>
>>>>> cheers
>>>>> alessandra
>>>>>
>>>> --
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> James Perry                Room 2.41, Bayes Centre
>>>> Software Architect         The University of Edinburgh
>>>> EPCC                       47 Potterrow
>>>> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.
>

--
Dominique Boutigny
Laboratoire d'Annecy-le-Vieux de Physique des Particules
Université Savoie Mont Blanc - CNRS/IN2P3
Tel : 04.50.09.17.81 - 07.83.59.06.65
Bureau : 1-116
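For reference, below is a minimal sketch of the CVMFS-first,
download-fallback pattern recommended in the thread above: the job uses
the software directly from CVMFS when the repository is mounted, and only
otherwise downloads and unpacks the tarball, trying a list of replicas in
order so the connections are spread rather than all landing on one
storage element. Every path and URL in it is an illustrative placeholder,
not the actual DESC repository or storage locations.

#!/usr/bin/env python3
# Sketch of the CVMFS-first / download-fallback pattern.
# All paths and URLs are hypothetical placeholders.

import os
import tarfile
import urllib.request

CVMFS_PATH = "/cvmfs/example.desc.org/imsim"  # hypothetical CVMFS location
MIRRORS = [                                   # hypothetical replicas, closest first
    "https://se.example-fr.example/lsst/imsim.tar.gz",
    "https://se.example-uk.example/lsst/imsim.tar.gz",
]
WORK_DIR = os.path.join(os.getcwd(), "imsim")

def locate_software():
    # Prefer CVMFS: if the repository is mounted, there is no transfer
    # at all and no load on any storage element.
    if os.path.isdir(CVMFS_PATH):
        return CVMFS_PATH
    # Otherwise fetch the tarball, trying replicas in order so that no
    # single site takes every connection.
    for url in MIRRORS:
        try:
            tarball, _ = urllib.request.urlretrieve(url)
        except OSError:
            continue  # replica unreachable; try the next one
        with tarfile.open(tarball) as tar:
            tar.extractall(WORK_DIR)
        return WORK_DIR
    raise RuntimeError("no CVMFS mount and all replicas failed")

if __name__ == "__main__":
    print("using software from", locate_software())

The key point, as raised in the thread, is that jobs should list no
tarball as an input file at all: they should only pull it when CVMFS is
genuinely absent on the node, and even then should not all converge on
the single Manchester copy.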