James,
I propose we explore storing the imSim tarball in DIRAC and making several replicas, including one replica at a storage element (SE) at CC-IN2P3.
Do you think that would be compatible with the mechanism that you use to submit and execute the jobs?
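For concreteness, a minimal sketch of what that replication could look like with the DIRAC data management command-line tools. The LFN and SE names below are hypothetical placeholders; the real names depend on the VO's file catalogue and site configuration.

```shell
#!/bin/sh
# Sketch: replicate the imSim tarball to an SE at CC-IN2P3 using the
# DIRAC DMS tools. All names below are hypothetical placeholders.

LFN="/lsst/user/j/james.perry/imsim.tar.gz"   # hypothetical LFN
SOURCE_SE="UKI-NORTHGRID-MAN-HEP-disk"        # hypothetical Manchester SE name
TARGET_SE="IN2P3-CC-disk"                     # hypothetical CC-IN2P3 SE name

replicate_tarball() {
    if ! command -v dirac-dms-replicate-lfn >/dev/null 2>&1; then
        # No DIRAC client on this machine; nothing to do in this sketch.
        echo "DIRAC client not available; skipping replication"
        return 0
    fi
    # Initial upload would be something like:
    #   dirac-dms-add-file "$LFN" imsim.tar.gz "$SOURCE_SE"
    # Replicate the already-registered file to the target SE:
    dirac-dms-replicate-lfn "$LFN" "$TARGET_SE" "$SOURCE_SE"
}

replicate_tarball
```

The same replication command could be repeated per target SE to spread the load across several sites.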
Cheers,
Hi Dominique,
It's the ImSim code. I think someone recommended not to clone it from GitHub, as this can result in GitHub blacklisting the worker nodes if they do this too many times, so I went with the tarball instead. We could try using GitHub directly if you think it would be safe.
Cheers,
James
On 07/06/2019 15:49, Dominique Boutigny wrote:
Hi Alessandra and James,
I add Fabio in the loop.
I don't think there is any problem with copying the tarball at CC-IN2P3.
By the way, what is this tarball? Is it the instance catalog or the
imSim code? If it is imSim, I thought we had decided to download it
from GitHub and build it locally, as that is very fast to do.
Cheers,
Dominique
On 07/06/2019 16:30, Alessandra Forti wrote:
We upgraded the system and changed the storage system configuration, so
there might be other factors at play, but this was the first thing
that jumped out, and until we reduce it we cannot know whether other
things are affecting the responsiveness of the storage.
That said, 1500 processes trying to access 1 file on 1 machine is not
healthy.
cheers
alessandra
On 07/06/2019 15:25, PERRY James wrote:
Hi Alessandra,
The site is CC. They didn't seem to want to mount the CVMFS repository,
but maybe we could convince them to.
I can download the file explicitly instead when required. Sorry, I
hadn't realised that this would put such a load on the system.
Thanks,
James
On 07/06/2019 15:16, Alessandra Forti wrote:
Hi James,
Is there a reason why they can't mount it? Is it LAPP or CC?
I would recommend that you don't list the software as an input file but
download it explicitly from the job if you cannot find it in CVMFS.
And/or the tarball should be copied to the French site's storage closest
to their nodes.
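A minimal sketch of that fallback logic, as it might run at job start: prefer the software from CVMFS when the repository is mounted, otherwise fetch the tarball into local scratch instead of listing it as a job input. The repository path and tarball URL are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: use CVMFS when mounted, otherwise fetch the tarball explicitly.
# CVMFS_PATH and TARBALL_URL are hypothetical placeholders.

CVMFS_PATH="${CVMFS_PATH:-/cvmfs/lsst.example.org/imsim}"       # hypothetical repo path
TARBALL_URL="${TARBALL_URL:-https://example.org/imsim.tar.gz}"  # hypothetical URL

locate_imsim() {
    if [ -d "$CVMFS_PATH" ]; then
        # CVMFS is mounted on this worker node: use it directly.
        echo "$CVMFS_PATH"
    else
        # No CVMFS: unpack the tarball into local scratch instead of
        # pulling it from one storage element as a job input file.
        workdir="${TMPDIR:-/tmp}/imsim.$$"
        mkdir -p "$workdir"
        # curl -sSfL "$TARBALL_URL" | tar -xz -C "$workdir"  # real fetch, disabled in this sketch
        echo "$workdir"
    fi
}

locate_imsim
```

This way only the (presumably few) nodes without CVMFS ever touch the tarball, rather than every job in the system.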
The tarball on our storage was being accessed by 1500 processes
concurrently on the same machine earlier today, and I have already had
to replicate the file 3 times to try to spread the load onto other
servers. I'm surprised you didn't have timeouts.
cheers
alessandra
On 07/06/2019 14:59, PERRY James wrote:
Hi Alessandra,
We are mostly using CVMFS, but one of the compute nodes in France
doesn't mount our CVMFS repository so we need the tarball for that
one.
Unfortunately, because I can't predict when I submit a job whether it
will go to that node or not, all the jobs have the tarball listed as an
input file. I tried uploading copies to other storage elements as well
when I first put it on the grid, but at the time only Manchester was
working for me. I'm happy to discuss other solutions to this if it's
causing problems.
Thanks,
James
On 07/06/2019 14:52, Alessandra Forti wrote:
Hi James,
can you let me know how you do software distribution? It seems you have
a single tarball on the Manchester storage that is creating a large
number of connections.
These might be among the causes of the current load we are experiencing.
Manchester isn't running anything at the moment, so either those are
badly closed connections (could be) or the tarball you have on the
Manchester storage is the only source being accessed by WNs at other
sites in the UK.
We always said that while the software was in development and LSST ran
at a smaller scale the storage was fine, but it wouldn't work if too
many jobs tried to access the same file on one storage element. Have you
thought about using CVMFS, or at the very least replicating the tarball
at other sites?
thanks
cheers
alessandra
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
James Perry Room 2.41, Bayes Centre
Software Architect The University of Edinburgh
EPCC 47 Potterrow
Tel: +44 131 650 5173 Edinburgh, EH8 9BT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.