Dear Fabio and Yvan,

Thanks for your answers. Fabio, your overview of the situation is good.

Here are some additional elements:

A. Regarding the 3 IN2P3 machines: Qserv has been built, installed, configured as a single node, and tested, on ccqserv001 only.
This step indeed allows us to reach the first goal you mention.
Please note that this procedure is complex because it requires setting up an internet-free LSST distribution server in /sps/lsst/Qserv/distserver. This means building and packaging Qserv on a third-party machine, then copying the distribution data to /sps/lsst/Qserv/distserver. The process has to be repeated each time we need to upgrade the Qserv software on the cluster. (FYI, this procedure is documented here: http://lsst-web.ncsa.illinois.edu/~fjammes/qserv-doc/2014_11.0/devel/package-internet-free-distserver.html and here: http://lsst-web.ncsa.illinois.edu/~fjammes/qserv-doc/2014_11.0/install/internet-free.htm).
Using an internet-free distribution server may be avoided:
- if we use RPMs,
- or if we have access to https://sw.lsstcorp.org/eupspkg/ on the node that builds the software.

B. Now, I agree with you: we need to converge together on an optimal deployment procedure.

The Qserv team wants to avoid rebuilding the software on each node, but we can't use our standard eups install procedure for this, as eups doesn't provide a binary install procedure yet.
So here are two solutions:

1. I agree that RPM would ease the management of Qserv versions on a production cluster, but I'm not sure it is the best way of
   deploying software on a development cluster. Nevertheless, if you require an RPM install, I gladly accept Yvan's help.
    - The Qserv team can also provide support if needed: [log in to unmask]
    - Qserv install documentation is here: http://lsst-web.ncsa.illinois.edu/~fjammes/qserv-doc/
    (Qserv packaging depends heavily on the LSST distribution tools, which evolve very quickly, so the documentation is hard to maintain and may be broken on some minor points; but I would be very happy to provide support to Yvan, and I can be reached by phone: 650-926-2849.) Please note that the Qserv team may not maintain this RPM packaging procedure in the near future.

2. If building an RPM proves too complex (Qserv has more than 20 source dependencies and the full binary package is larger than one GB) and can be avoided for now, an alternative is to:
    - build Qserv manually on one node (for example ccqserv001, preferably with access to https://sw.lsstcorp.org/eupspkg/, even over a temporary ssh tunnel),
    - rsync the Qserv binaries manually from ccqserv001 to /sps/lsst/Qserv/stack,
    - then, in the Puppet post-install script for a given node, rsync the Qserv binaries from /sps/lsst/Qserv/stack to the node and launch the configuration procedure on the node.

Here are some advantages of this solution:
- the shared file system would only be used for binary deployment (I think that even with RPMs we would have to use a shared repository to store/retrieve the packages). Please note that the shared file system won't be used at runtime, as the Qserv binaries are deployed locally on each node,
- on our side, we are confident we can implement it, and we can provide support for writing the Puppet post-install scripts,
- the load on the shared file system would be limited during software updates. Indeed, /sps/lsst/Qserv/stack is an eups stack, so it contains all the Qserv and dependency versions we have built on ccqserv001, and rsync would only transfer the binaries of the new Qserv component at each update, not the entire stack, whereas each RPM might contain and re-install the entire Qserv stack (1 GB) at each update. It is also more efficient to store a single stack directory with multiple Qserv versions than one RPM per Qserv version, since in the latter case the dependency binaries may be duplicated in each RPM,
- this procedure doesn't require root access for installing or upgrading a Qserv version, so we can upgrade the code on a set of nodes without requiring your help. This is not a good property for a production
cluster, but I think it's very useful for a development cluster,
- this procedure would allow us to easily override a node install, without privileged access, in order to perform debugging operations. The environment would be nearly the same as the build environment on our development machines, so we would be able to rebuild the code from source on one node and use debugging/monitoring tools.

This procedure is quite close to what was used for the 300-node install last year (one improvement is that it doesn't require rebuilding the code on each node), and the Qserv team was very happy with that solution, provided by CC-IN2P3 and LPC.

If you agree with this procedure, which doesn't require any additional development on your side, the Qserv team is confident it can implement it and can help you write the Puppet post-install scripts. The procedure looks quite easy to implement, doesn't require root access in the post-install script, and furthermore I think the post-install scripts could easily be adjusted to perform Qserv software updates and re-configuration.
Please note that we would still be interested in Yvan's expertise on RPM packaging in the long term, for production platforms.

I hope this is clear enough and that you're also convinced by the second solution. We can set up a short meeting to converge if you think I may have missed an important point.

Have a nice day,

Fabrice

On 11/13/2014 08:24 AM, Fabio Hernandez wrote:
Dear Fabrice,

I think it is very important to make sure we all have the same view of the goals of this work. I will express the perspective of CC-IN2P3.

At CC-IN2P3 we provided 3 machines installed with a typical software environment, very much like the one the future DELL machines will be installed with. The objective was for the Qserv team to validate that the software is installable and works on top of that reference environment and identify the possibly missing dependencies. I understand from your previous e-mails that this goal is reached, but please feel free to correct me if I'm wrong.

Now, the next step is to package Qserv so that we (CC-IN2P3) can deploy it on a larger cluster using the current tools we use for that, which are based currently on Puppet. For this step, we strongly prefer Qserv to be packaged in the form of RPMs. We intend the software to be installed in the local disk of the machines in the cluster. In particular, I think it is not wise at all to rely on the existence of a shared file system among the nodes in the Qserv cluster for deployment purposes. I understand that Qserv is designed to be a shared-nothing database, so I'm assuming a shared file system is not needed at runtime, but again, I may be wrong.

From your message I understand that the Qserv team does not intend to provide RPMs. Yvan offered help to build the RPMs based on the information and experience you have so far gathered deploying Qserv on the 3 test machines. Yvan's offer still holds.

In summary, from our perspective the next steps would be:


Once we have reached that stage, any update to Qserv would be delivered as a new release in the form of RPMs. At CC-IN2P3 we would take care of deploying those updates to the machines in the cluster.

We do intend to explore other techniques such as Docker containers but not before making sure we have reached the first milestone for us which is to have Qserv installed in the cluster using our current tools.

Best regards,


On 2014/11/12, at 23:07 , Fabrice Jammes <[log in to unmask]> wrote:

Hello,

While it is planned in the long term, the Qserv team will not work on RPM packaging in the short term.

With Andy Salnikov, from SLAC, we propose two strategies for installing Qserv on the future DELL cluster:

Note that the binaries used by the worker and the master are exactly the same; their total size with dependencies is around 1 GB, but not everything is used.

1. On the master node, we can install the binaries in /sps/lsst/Qserv/stack, and then, in the post-install script, configure the worker nodes to use these binaries.
But could you please confirm that the sps file system will be able to support the load of the 25-50 nodes using these binaries during Qserv execution?

2. We can build Qserv on the master node of the cluster, rsync the binaries to /sps/lsst/Qserv/stack, and then each worker node would run a post-install script that rsyncs the binaries from /sps to its local file system and then configures the node.

Which one of these solutions would you recommend?

Furthermore, do you think we could get access to the LSST distribution server (https://sw.lsstcorp.org/eupspkg/) on the master node of the cluster? This would ease our work a lot.

Cheers,

Fabrice


Fabio Hernandez

CNRS – IN2P3 Computing Centre · Lyon (France)     ·     e-mail: [log in to unmask]     ·     tel: +33 4 78 93 08 80




Use REPLY-ALL to reply to list

To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1



