LSST-DESC-GRID@LISTSERV.SLAC.STANFORD.EDU

Subject: Re: problems with LSST software tarball
From: PERRY James <[log in to unmask]>
Reply-To: Use of GRID computing resources within the Dark Energy Science Collaboration <[log in to unmask]>
Date: Fri, 7 Jun 2019 14:52:10 +0000
Content-Type: text/plain

Hi Dominique,

It's the ImSim code. I think someone recommended not cloning it from 
GitHub, since GitHub can end up blacklisting the worker nodes if they 
clone too many times, so I went with the tarball instead. We could try 
using GitHub directly if you think it would be safe.
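
If we do stick with the tarball route, the fallback Alessandra suggests 
below (only download the tarball explicitly from the job when CVMFS is 
missing) would look roughly like this on the job side. This is just a 
sketch; the CVMFS path and tarball URL are placeholders, not our real 
setup:

import os
import tarfile
import urllib.request

# Placeholder locations -- not the real repository path or storage URL.
CVMFS_PATH = "/cvmfs/example.org/lsst/imsim"
TARBALL_URL = "https://storage.example.org/lsst/imsim.tar.gz"
WORK_DIR = os.path.join(os.getcwd(), "imsim")

if os.path.isdir(CVMFS_PATH):
    # CVMFS is mounted on this worker node: use the shared installation.
    software_dir = CVMFS_PATH
else:
    # No CVMFS here: fetch and unpack the tarball once, inside the job.
    local_tarball = "imsim.tar.gz"
    urllib.request.urlretrieve(TARBALL_URL, local_tarball)
    with tarfile.open(local_tarball) as tar:
        tar.extractall(WORK_DIR)
    software_dir = WORK_DIR

print("Using ImSim from", software_dir)

That way only the node without CVMFS would ever touch the storage 
element.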

Cheers,
James


On 07/06/2019 15:49, Dominique Boutigny wrote:
> Hi Alessandra and James,
> 
> I'm adding Fabio to the loop.
> I don't think there is any problem with copying the tarball to CC-IN2P3.
> By the way, what is this tarball? Is it the instance catalog or the 
> imsim code? If it is imsim, I thought we had decided to download it 
> from GitHub and build it locally, as that is very fast to do.
> 
> Cheers,
> 
> Dominique
> 
> On 07/06/2019 16:30, Alessandra Forti wrote:
>> We upgraded the system and changed the storage system configuration, so 
>> there might be other factors at play, but this was the first thing that 
>> jumped out, and until we reduce it we cannot know whether other things 
>> are affecting the responsiveness of the storage.
>>
>> That said, 1500 processes trying to access 1 file on 1 machine is not 
>> healthy.
>>
>> cheers
>> alessandra
>>
>> On 07/06/2019 15:25, PERRY James wrote:
>>> Hi Alessandra,
>>>
>>> The site is CC. They didn't seem to want to mount the cvmfs repository
>>> but maybe we could convince them to.
>>>
>>> I can download the file explicitly instead when required. Sorry, I
>>> hadn't realised that this would put such a load on the system.
>>>
>>> Thanks,
>>> James
>>>
>>>
>>> On 07/06/2019 15:16, Alessandra Forti wrote:
>>>> Hi James,
>>>>
>>>> Is there a reason why they can't mount it? Is it LAPP or CC?
>>>>
>>>> I would recommend that you don't list the software as an input but
>>>> download it explicitly from the job if you cannot find it in CVMFS.
>>>> And/or the tarball should be copied to the French site storage closest
>>>> to their nodes.
>>>>
>>>> The tarball on our storage was being accessed by 1500 processes
>>>> concurrently on the same machine earlier today, and I already had to
>>>> replicate the file 3 times to try to spread the load onto others. I'm
>>>> surprised you didn't have timeouts.
>>>>
>>>> cheers
>>>> alessandra
>>>>
>>>> On 07/06/2019 14:59, PERRY James wrote:
>>>>> Hi Alessandra,
>>>>>
>>>>> We are mostly using CVMFS, but one of the compute nodes in France
>>>>> doesn't mount our CVMFS repository, so we need the tarball for that
>>>>> one. Unfortunately, because I can't predict when I submit a job
>>>>> whether it will go to that node or not, all the jobs have the tarball
>>>>> listed as an input file. I tried uploading copies to other storage
>>>>> elements as well when I first put it on the grid, but at the time
>>>>> only Manchester was working for me. I'm happy to discuss other
>>>>> solutions to this if it's causing problems.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>> On 07/06/2019 14:52, Alessandra Forti wrote:
>>>>>> Hi James,
>>>>>>
>>>>>> can you let me know how you do software distribution? It seems you
>>>>>> have a single tarball on the Manchester storage that is creating a
>>>>>> large number of connections.
>>>>>>
>>>>>> These connections might be among the causes of the load we are
>>>>>> currently experiencing. Manchester isn't running anything at the
>>>>>> moment, so either those are improperly closed connections (could be)
>>>>>> or the tarball you have on the Manchester storage is the only source
>>>>>> accessed by WNs at other sites in the UK.
>>>>>>
>>>>>> We always said that while the software was in development and LSST
>>>>>> ran at a smaller scale the storage was fine, but it wouldn't work if
>>>>>> too many jobs tried to access the same file on one storage element.
>>>>>> Have you thought about using CVMFS, or at the very least replicating
>>>>>> the tarball at other sites?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> cheers
>>>>>> alessandra
>>>>>>
>>>>> -- 
>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> James Perry                Room 2.41, Bayes Centre
>>>>> Software Architect         The University of Edinburgh
>>>>> EPCC                       47 Potterrow
>>>>> Tel: +44 131 650 5173      Edinburgh, EH8 9BT
>>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>
> 

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
James Perry                Room 2.41, Bayes Centre
Software Architect         The University of Edinburgh
EPCC                       47 Potterrow
Tel: +44 131 650 5173      Edinburgh, EH8 9BT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
