Hi Alessandra,
The site is CC. They didn't seem to want to mount the CVMFS repository,
but maybe we could convince them to.
I can download the file explicitly instead when required. Sorry, I
hadn't realised that this would put such a load on the system.
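Something like the following wrapper would do it, I think (the paths and URL below are just placeholders, not the real ones):

```shell
#!/bin/sh
# Sketch of a job-side fallback: prefer the tarball from CVMFS when the
# repository is mounted on the worker node, otherwise download it from a
# storage element. Paths and URL are illustrative placeholders.

pick_tarball() {
    # $1 = expected path under /cvmfs, $2 = fallback download URL
    if [ -f "$1" ]; then
        # CVMFS copy is visible on this node: use it directly
        printf '%s\n' "$1"
    else
        # No CVMFS mount: fetch the tarball into the job work directory
        curl -sSfL -o software.tar.gz "$2" || return 1
        printf '%s\n' software.tar.gz
    fi
}

# Demo call: the placeholder CVMFS path won't exist here, so on a node
# without the mount this falls through to the download branch.
pick_tarball /cvmfs/example.repo/lsst/software.tar.gz \
             https://se.example.org/lsst/software.tar.gz || true
```

That way the tarball only needs to be listed as an input (or fetched) for the one site without the mount.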
Thanks,
James
On 07/06/2019 15:16, Alessandra Forti wrote:
> Hi James,
>
> Is there a reason why they can't mount it? Is it LAPP or CC?
>
> I would recommend that you don't list the software tarball as a job
> input but download it explicitly from the job if you cannot find it in
> CVMFS. And/or the tarball should be copied to the French site storage
> closest to their nodes.
>
> The tarball on our storage was being accessed by 1500 processes
> concurrently on the same machine earlier today, and I have already had
> to replicate the file 3 times to try to spread the load across other
> machines. I'm surprised you didn't have timeouts.
>
> cheers
> alessandra
>
> On 07/06/2019 14:59, PERRY James wrote:
>> Hi Alessandra,
>>
>> We are mostly using CVMFS, but one of the compute nodes in France
>> doesn't mount our CVMFS repository so we need the tarball for that one.
>> Unfortunately, because I can't predict at submission time whether a
>> job will go to that node or not, all the jobs have the tarball listed
>> as an input file. I tried uploading copies to other storage elements as well
>> when I first put it on the grid, but at the time only Manchester was
>> working for me. I'm happy to discuss other solutions to this if it's
>> causing problems.
>>
>> Thanks,
>> James
>>
>>
>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>> Hi James,
>>>
>>> can you let me know how you do software distribution? It seems you
>>> have a single tarball on the Manchester storage that is creating a
>>> large number of connections.
>>>
>>> These might be among the causes of the current load we are
>>> experiencing. Manchester isn't running anything at the moment, so
>>> either those are badly closed connections (could be) or the tarball
>>> you have on the Manchester storage is the only source being accessed
>>> by worker nodes (WNs) at other sites in the UK.
>>>
>>> We always said that while the software was in development and LSST
>>> was running at a smaller scale the storage was fine, but it wouldn't
>>> work if too many jobs tried to access the same file on one storage
>>> element. Have you thought about using CVMFS, or at the very least
>>> replicating the tarball at other sites?
>>>
>>> thanks
>>>
>>> cheers
>>> alessandra
>>>
>> --
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> James Perry Room 2.41, Bayes Centre
>> Software Architect The University of Edinburgh
>> EPCC 47 Potterrow
>> Tel: +44 131 650 5173 Edinburgh, EH8 9BT
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>