Hi Alessandra,

The site is CC. They didn't seem to want to mount the CVMFS repository 
but maybe we could convince them to.

I can download the file explicitly instead when required. Sorry, I 
hadn't realised that this would put such a load on the system.
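For what it's worth, the fallback I have in mind would look roughly like 
the sketch below: prefer the CVMFS mount and only fetch the tarball on 
nodes that lack it. The repository path, tarball URL and unpack directory 
here are placeholders, and it assumes curl and tar are available on the 
worker node.

    # Rough sketch of a CVMFS-first fallback; all paths/URLs are placeholders.
    import os
    import subprocess

    CVMFS_DIR = "/cvmfs/example.repo/software"              # hypothetical repo path
    TARBALL_URL = "https://se.example.org/software.tar.gz"  # placeholder source
    TARBALL = "software.tar.gz"
    UNPACK_DIR = "software"                                 # hypothetical top-level dir

    def locate_software():
        """Prefer the CVMFS mount; only download the tarball when it is absent."""
        if os.path.isdir(CVMFS_DIR):
            return CVMFS_DIR
        # Explicit download only on nodes without the mount, so the storage
        # element no longer sees one connection per job.
        subprocess.run(["curl", "-fsSL", "-o", TARBALL, TARBALL_URL], check=True)
        subprocess.run(["tar", "-xzf", TARBALL], check=True)
        return os.path.abspath(UNPACK_DIR)

    print("Using software from:", locate_software())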

Thanks,
James


On 07/06/2019 15:16, Alessandra Forti wrote:
> Hi James,
> 
> Is there a reason why they can't mount it? Is it LAPP or CC?
> 
> I would recommend that you don't list the software as an input file but 
> download it explicitly from the job if you cannot find it in CVMFS. 
> And/or the tarball should be copied to the French site's storage closest 
> to their nodes.
> 
> The tarball on our storage was being accessed by 1500 processes 
> concurrently on the same machine earlier today, and I have already had 
> to replicate the file 3 times to try to spread the load onto others.  
> I'm surprised you didn't see timeouts.
> 
> cheers
> alessandra
> 
> On 07/06/2019 14:59, PERRY James wrote:
>> Hi Alessandra,
>>
>> We are mostly using CVMFS, but one of the compute nodes in France
>> doesn't mount our CVMFS repository so we need the tarball for that one.
>> Unfortunately, because I can't predict at submission time whether a
>> job will go to that node or not, all the jobs have the tarball listed as an
>> input file. I tried uploading copies to other storage elements as well
>> when I first put it on the grid, but at the time only Manchester was
>> working for me. I'm happy to discuss other solutions to this if it's
>> causing problems.
>>
>> Thanks,
>> James
>>
>>
>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>> Hi James,
>>>
>>> Can you let me know how you do software distribution? It seems you
>>> have a single tarball on the Manchester storage that is creating a
>>> large number of connections.
>>>
>>> These might be among the causes of the current load we are
>>> experiencing. Manchester isn't running anything at the moment, so
>>> either those are badly closed connections (which could be the case) or
>>> the tarball you have on the Manchester storage is the only source
>>> being accessed by WNs at other sites in the UK.
>>>
>>> We always said that the storage was fine while the software was in
>>> development and LSST ran at a smaller scale, but that it wouldn't work
>>> if too many jobs tried to access the same file on one storage. Have
>>> you thought about using CVMFS, or at the very least replicating the
>>> tarball at other sites?
>>>
>>> thanks
>>>
>>> cheers
>>> alessandra
>>>

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
James Perry                Room 2.41, Bayes Centre
Software Architect         The University of Edinburgh
EPCC                       47 Potterrow
Tel: +44 131 650 5173      Edinburgh, EH8 9BT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
