Dear Fabio and Yvan,

Thanks for your answers. Fabio, your overview of the situation is good.

Here are some additional elements:

A. About the 3 IN2P3 machines: Qserv has been built, installed, 
configured as a single node, and tested, on ccqserv001 only.
This step indeed allows us to reach the first goal you mention.
Please note that *this procedure is complex*, because it requires 
setting up an internet-free LSST distribution server in 
/sps/lsst/Qserv/distserver. This in turn requires building and 
packaging Qserv on a third-party machine, and then copying the 
distribution data to /sps/lsst/Qserv/distserver. This process has to 
be repeated each time we need to upgrade the Qserv software on the 
cluster. (FYI, this procedure is documented here: 
http://lsst-web.ncsa.illinois.edu/~fjammes/qserv-doc/2014_11.0/devel/package-internet-free-distserver.html 
and here: 
http://lsst-web.ncsa.illinois.edu/~fjammes/qserv-doc/2014_11.0/install/internet-free.htm)
*Using an internet-free distribution server can be avoided:
- if we use RPMs,
- or if the node which builds the software has access to 
https://sw.lsstcorp.org/eupspkg/.*
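
For illustration, here is a minimal sketch of the two install paths 
using the eups distrib mechanism; the tag name "current" is 
illustrative, and the real procedure is in the documents linked above:

    # With direct access to the LSST distribution server:
    export EUPS_PKGROOT=https://sw.lsstcorp.org/eupspkg/
    eups distrib install qserv -t current    # tag name illustrative

    # Internet-free alternative: point eups at the local distribution
    # server populated from a third-party machine:
    export EUPS_PKGROOT=/sps/lsst/Qserv/distserver
    eups distrib install qserv -t current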

B. Now, I agree with you, we need to converge together in order to 
define an optimal deployment procedure.

The Qserv team wants to avoid rebuilding the software on each node, 
but we can't use our standard eups install procedure for this goal, as 
eups doesn't provide a binary install procedure yet.
So here are two solutions:

1. I agree that RPMs would ease our work to manage Qserv versions on a 
production cluster, but I'm not sure this is the best way of
    deploying software on a development cluster. Nevertheless, if you 
require an RPM install, I gladly accept Yvan's help.
     - The Qserv team can also provide support if needed: 
[log in to unmask]
     - The Qserv install documentation is here: 
http://lsst-web.ncsa.illinois.edu/~fjammes/qserv-doc/
     (Qserv packaging depends heavily on the LSST distribution tools, 
which evolve very quickly, so the documentation is hard to maintain 
and may be broken on some minor points; but I would be very happy to 
provide support to Yvan, and I can be reached by phone: 650-926-2849.) 
Please note that the Qserv team may not maintain this RPM packaging 
procedure in the near future.

2. If building an RPM proves too complex (Qserv has more than 20 
source dependencies and the full binary package is more than one GB in 
size) and can be avoided for now, then an alternative is to:
     - manually build Qserv on one node (for example ccqserv001, 
*preferably with access to https://sw.lsstcorp.org/eupspkg/*, even 
over a temporary ssh tunnel),
     - manually rsync the Qserv binaries from ccqserv001 to 
/sps/lsst/Qserv/stack,
     - then, in the Puppet post-install script of a given node, rsync 
the Qserv binaries from /sps/lsst/Qserv/stack to the node and launch 
the configuration procedure there. (A sketch of these steps follows.)
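
Here is a minimal sketch of these three steps; the local destination 
path /qserv/stack and the build-stack source path are illustrative, 
not decided:

    # (a) On ccqserv001: build Qserv with eups as sketched in A above,
    #     preferably with direct access to https://sw.lsstcorp.org/eupspkg/
    #     (possibly through a temporary ssh tunnel).

    # (b) Publish the resulting eups stack on the shared file system:
    rsync -av ~/stack/ /sps/lsst/Qserv/stack/

    # (c) In the Puppet post-install script of each node: copy the
    #     stack to the node's local disk, then configure Qserv there.
    rsync -av /sps/lsst/Qserv/stack/ /qserv/stack/
    # ... then launch the Qserv configuration procedure on the node ...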

Here are some advantages of this solution:
- The shared file system would only be used for binary deployment 
purposes. (I think that even with RPMs we would have to use a shared 
repository in order to store/retrieve the RPMs.) Please note that the 
shared file system won't be used at runtime, as the Qserv binaries are 
deployed locally on each node.
- On our side, we are confident we can implement it, and we can 
provide support for writing the Puppet post-install scripts.
- The shared file-system load would be limited in case of software 
updates. Indeed, /sps/lsst/Qserv/stack is an eups stack, so it 
contains every version of Qserv and its dependencies that we have 
built on ccqserv001; at each update, rsync would only transfer the 
binaries of the new Qserv components, not the entire stack, whereas 
each RPM might contain and re-install the entire Qserv stack (1 GB) at 
each update (see the sketch after this list). It is also more 
efficient to store a single stack directory holding multiple Qserv 
versions than one RPM per Qserv version, as the dependency binaries 
might be duplicated in each RPM in the latter case.
- This procedure doesn't require root access for installing or 
upgrading a Qserv version, so we could upgrade the code on a set of 
nodes without requiring your help. This would not be a good feature 
for a production cluster, but I think it's very useful for a 
development cluster.
- This procedure would allow us to easily override a node's install, 
without privileged access, in order to perform debugging operations. 
The environment would be nearly the same as the build environment on 
our development machines, so we would be able to rebuild the code from 
source on one node and use debugging/monitoring tools.
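
To make the update point concrete, here is what the per-node update 
would look like (same illustrative /qserv/stack path as above). Since 
rsync only copies files that are new or have changed, only the 
directories of the freshly built components are transferred:

    # Run on each node after a new Qserv version has been built on
    # ccqserv001 and pushed to /sps/lsst/Qserv/stack:
    rsync -av /sps/lsst/Qserv/stack/ /qserv/stack/   # transfers only the delta
    # ... then re-run the configuration procedure for the new version ...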

This procedure is quite close to the one used for the 300-node install 
last year (one improvement is that it doesn't require rebuilding the 
code on each node), and the Qserv team was very happy with that 
solution, provided by CC-IN2P3 and LPC.

If you agree with this procedure, which doesn't require any additional 
development on your side, the Qserv team is confident it can implement 
it and can provide you with support for writing the Puppet 
post-install scripts. This procedure looks quite easy to implement, 
doesn't require root access in the post-install script, and 
furthermore I think the post-install scripts could easily be adjusted 
to perform Qserv software updates and re-configuration.
Please note that we would still be interested in Yvan's expertise on 
RPM packaging in the long term, for production platforms.

I hope I was clear enough and that you are now also convinced by the 
second solution. We can set up a short meeting in order to converge if 
you think I may have missed some important point.

Have a nice day,

Fabrice

On 11/13/2014 08:24 AM, Fabio Hernandez wrote:
> Dear Fabrice,
>
> I think it is very important to make sure we all have the same view of 
> the goals of this work. I will express the perspective of CC-IN2P3.
>
> At CC-IN2P3 we provided 3 machines installed with a typical software 
> environment, very much like the one the future DELL machines will be 
> installed with. The objective was for the Qserv team to validate that 
> the software is installable and works on top of that reference 
> environment and identify the possibly missing dependencies. I 
> understand from your previous e-mails that this goal is reached, but 
> please feel free to correct me if I'm wrong.
>
> Now, the next step is to package Qserv so that we (CC-IN2P3) can 
> deploy it on a larger cluster using the current tools we use for that, 
> which are currently based on Puppet. For this step, we strongly prefer 
> Qserv to be packaged in the form of RPMs. We intend the software to 
> be installed in the local disk of the machines in the cluster. In 
> particular, I think it is not wise at all to rely on the existence of 
> a shared file system among the nodes in the Qserv cluster for 
> deployment purposes. I understand that Qserv is designed to be a 
> shared-nothing database, so I'm assuming a shared file system is not 
> needed at runtime, but again, I may be wrong.
>
> From your message I understand that the Qserv team does not intend to 
> provide RPMs. Yvan offered help to build the RPMs based on the 
> information and experience you have so far gathered deploying Qserv on 
> the 3 test machines. Yvan's offer still holds.
>
> In summary, from our perspective the next steps would be:
>
>   * With your contribution, package Qserv in the form of RPMs
>   * We at CC-IN2P3 will use those RPMs to validate that we can use
>     them to install Qserv with our existing tools
>
>
> Once we have reached that stage, any update to Qserv would be 
> delivered as a new release in the form of RPMs. At CC-IN2P3 we would 
> take care of deploying those updates to the machines in the cluster.
>
> We do intend to explore other techniques such as Docker containers, but 
> not before making sure we have reached the first milestone for us, 
> which is to have Qserv installed in the cluster using our current tools.
>
> Best regards,
>
>
> On 2014/11/12, at 23:07, Fabrice Jammes <[log in to unmask]> wrote:
>
>> Hello,
>>
>> Although it is planned in the long term, the Qserv team will not work 
>> on RPM packaging in the short term.
>>
>> With Andy Salnikov, from SLAC, we propose two strategies in order to 
>> install Qserv on the future DELL cluster:
>>
>> Note that the binaries used by the worker and the master are exactly 
>> the same; their total size with dependencies is around 1 GB, but not 
>> everything is used.
>>
>> 1. On the master node, we can install the binaries in 
>> /sps/lsst/Qserv/stack, and then in the post-install script, configure 
>> the worker nodes to use these binaries.
>> But could you please confirm that the sps file system will be able to 
>> support the load of the 25-50 nodes using these binaries during Qserv 
>> execution?
>>
>> 2. We can build Qserv on the master node of the cluster, rsync the 
>> binaries to /sps/lsst/Qserv/stack and then each worker node would run 
>> a post-install script which would rsync the binaries from /sps to its 
>> local filesystem and then configure the node.
>>
>> Which one of these solutions would you recommend?
>>
>> Furthermore, do you think we could get access to the LSST 
>> distribution server (https://sw.lsstcorp.org/eupspkg/) on the master 
>> node of the cluster? This would ease our work a lot.
>>
>> Cheers,
>>
>> Fabrice
>>
>
> Fabio Hernandez
>
> CNRS – IN2P3 Computing Centre <http://cc.in2p3.fr> · Lyon (France) · 
> e-mail: [log in to unmask] · tel: +33 4 78 93 08 80
>
>
>

