Hi Fabio,
I don't understand what the problem is with using the gridpp CVMFS
repository. I think it is available everywhere in EGI, as it should
be part of the EGI CVMFS configuration RPM. I can certainly see it
from lxplus (CERN) as well:
aforti@lxplus783> ls /cvmfs/gridpp.egi.eu/lsst
sims_2_8_0 sims_2_9_0 sims_w_2019_10_1
cheers
alessandra
On 07/06/2019 16:06, Fabio Hernandez wrote:
James,
I propose we explore storing the imSim tarball in DIRAC and making
several replicas, including one at an SE at CC-IN2P3.
Do you think that would be compatible with the mechanism you use to
submit and execute the jobs?
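For what it's worth, a minimal sketch of how the proposal could look with the standard DIRAC data-management tools (`dirac-dms-add-file` to upload and register the file, `dirac-dms-replicate-lfn` to add replicas). The LFN and SE names below are made-up placeholders, not the real ones:

```shell
#!/bin/sh
# Sketch only: print the DIRAC commands that would upload the imSim
# tarball once and then replicate it to further storage elements.
# The LFN and SE names are illustrative assumptions.
build_replicate_cmds() {
    lfn=$1; local_file=$2; first_se=$3; shift 3
    # Upload and register the file at the first SE...
    echo "dirac-dms-add-file $lfn $local_file $first_se"
    # ...then make one replica per additional SE.
    for se in "$@"; do
        echo "dirac-dms-replicate-lfn $lfn $se"
    done
}

build_replicate_cmds /lsst/imsim.tar.gz imsim.tar.gz \
    UKI-NORTHGRID-MAN-HEP-disk CC-IN2P3-disk
```

Printing the commands rather than running them keeps the sketch safe to test; in a real session one would execute them directly after `dirac-proxy-init`.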
Cheers,
Hi Dominique,
It's the imSim code. I think someone recommended not cloning it from
GitHub, as this can result in GitHub blacklisting the worker nodes if
they do it too many times, so I went with the tarball instead. We
could try using GitHub directly if you think it would be safe.
Cheers,
James
On 07/06/2019 15:49, Dominique Boutigny wrote:
Hi Alessandra and James,
I add Fabio in the loop.
I don't think there is any problem with copying the tarball to
CC-IN2P3.
By the way, what is this tarball? Is it the instance catalog or the
imSim code? If it is imSim, I thought we had decided to download it
from GitHub and build it locally, as it is very fast to do so.
Cheers,
Dominique
On 07/06/2019 16:30, Alessandra Forti wrote:
We upgraded the system and changed the storage system configuration,
so there might be other factors at play, but this was the first thing
that jumped out, and until we reduce it we cannot know whether other
things are affecting the responsiveness of the storage.
That said, 1500 processes trying to access one file on one machine is
not healthy.
cheers
alessandra
On 07/06/2019 15:25, PERRY James wrote:
Hi Alessandra,
The site is CC. They didn't seem to want to mount the CVMFS
repository, but maybe we could convince them to.
I can download the file explicitly instead when required. Sorry, I
hadn't realised that this would put such a load on the system.
Thanks,
James
On 07/06/2019 15:16, Alessandra Forti wrote:
Hi James,
Is there a reason why they can't mount it? Is it LAPP or CC?
I would recommend that you don't use the software as an input file,
but download it explicitly from the job if you cannot find it in
CVMFS. And/or the tarball should be copied to the French site's
storage closest to their nodes.
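A tiny sketch of the job-side fallback being recommended here: prefer the CVMFS copy when the repository is mounted on the worker node, and only download the tarball otherwise. The release path is borrowed from the gridpp listing earlier in the thread; the download step is left as a comment, since the actual LFN is site-specific:

```shell
#!/bin/sh
# Sketch of the recommended fallback. The release directory below is
# an illustrative choice from /cvmfs/gridpp.egi.eu/lsst.
IMSIM_DIR=/cvmfs/gridpp.egi.eu/lsst/sims_2_9_0

pick_imsim_source() {
    # Echo "cvmfs" if the given directory exists, "download" otherwise.
    if [ -d "$1" ]; then
        echo cvmfs
    else
        echo download
    fi
}

if [ "$(pick_imsim_source "$IMSIM_DIR")" = cvmfs ]; then
    echo "using imSim from $IMSIM_DIR"
else
    echo "CVMFS not mounted; fetching tarball from the closest SE"
    # e.g. dirac-dms-get-file <LFN>   (LFN omitted; depends on the site)
fi
```

This keeps the tarball as a last resort, so only the one French node that lacks the mount ever hits the storage element.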
The tarball on our storage was being accessed by 1500 processes
concurrently on the same machine earlier today, and I already had to
replicate the file three times to try to spread the load onto others.
I'm surprised you didn't have timeouts.
cheers
alessandra
On 07/06/2019 14:59, PERRY James wrote:
Hi Alessandra,
We are mostly using CVMFS, but one of the compute nodes in France
doesn't mount our CVMFS repository, so we need the tarball for that
one.
Unfortunately, because I can't predict when I submit a job whether it
will go to that node or not, all the jobs have the tarball listed as
an input file. I tried uploading copies to other storage elements as
well when I first put it on the grid, but at the time only Manchester
was working for me. I'm happy to discuss other solutions if this is
causing problems.
Thanks,
James
On 07/06/2019 14:52, Alessandra Forti wrote:
Hi James,
can you let me know how you do software distribution? It seems you
have a single tarball on the Manchester storage that is creating a
large number of connections. They might be among the causes of the
current load we are experiencing.
Manchester isn't running anything at the moment, so either those are
ill-closed connections (could be) or the tarball you have on the
Manchester storage is the only source accessed by worker nodes at
other sites in the UK.
We always said that while the software was in development and LSST
ran at a smaller scale the storage was fine, but that it wouldn't
work if too many jobs tried to access the same file on one storage
element. Have you thought about using CVMFS, or at the very least
replicating the tarball at other sites?
thanks
cheers
alessandra
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
James Perry              Room 2.41, Bayes Centre
Software Architect       The University of Edinburgh
EPCC                     47 Potterrow
Tel: +44 131 650 5173    Edinburgh, EH8 9BT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
--
Respect is a rational process. \\//
For Ur-Fascism, disagreement is treason. (U. Eco)