Serge et al.,

I am trying to implement code to avoid race conditions in ZooKeeper.
I figured I'd create an ephemeral node "/LOCKS/<dbName>" and proceed
with sensitive operations only after creating that node successfully.
However, I have the impression that ZooKeeper will allow multiple jobs
to create the same node, even in synchronous mode. The test I'm using
is below. If I run it concurrently (just two instances), I typically
see a collision (output pasted below the code).

Could you have a look and think about it? Let's discuss tomorrow.
Thanks!


===================================

import os
import socket
import time
from random import randint


from kazoo.client import KazooClient
from kazoo.exceptions import NodeExistsError, NoNodeError

def sleepABit():
     v = randint(1,100) / 1000.0
     print "sleep ", v
     time.sleep(v)


def createIt(zk, k, v):
     while True:
         try:
             print "create ", v
             zk.create(k, v, ephemeral=True, makepath=True)
         except:
             print "create failed"
             sleepABit()
         finally:
             print "create ok"
             return


k = "/LOCKS/x"
zk = KazooClient(hosts="127.0.0.1:12181")
zk.start()

for i in range(0,100):
     v = str(socket.gethostbyname(socket.gethostname())) + '_' + str(os.getpid()) + '_' + str(i)

     createIt(zk, k, v)

     sleepABit()

     d, s = zk.get(k)
     print "got ", d

     print "delete"
     zk.delete(k)

     print "---"


=====================

create  141.142.225.179_8831_29
create ok
sleep  0.04
got  141.142.225.179_8839_16
delete
Traceback (most recent call last):
   File "quickTest.py", line 45, in <module>
     zk.delete(k)
   File "/usr/local/home/becla/qserv/1/stack/Linux64/kazoo/1.3.1/lib/python/kazoo-1.3.1-py2.6.egg/kazoo/client.py", line 1159, in delete
     return self.delete_async(path, version).get()
   File "/usr/local/home/becla/qserv/1/stack/Linux64/kazoo/1.3.1/lib/python/kazoo-1.3.1-py2.6.egg/kazoo/handlers/threading.py", line 107, in get
     raise self._exception
kazoo.exceptions.NoNodeError: ((), {})


See? The job with pid 8831 reported creating the node successfully,
but the other job (pid 8839) managed to create it as well
before the job with pid 8831 deleted it.
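
One thing worth checking in the test itself before blaming ZooKeeper: in
Python, a "return" inside "finally" runs whether or not the "try" body
raised, so createIt() prints "create ok" and returns even when zk.create()
failed. If that is what happens here, a job whose create() failed would
still go on to get() and delete() the node owned by the other job, which
may account for the output above. Below is a minimal sketch of a variant
that returns only after create() really succeeded. FakeClient, the local
NodeExistsError class, create_it and max_tries are hypothetical stand-ins
so the snippet runs without a server; with kazoo one would pass a real
KazooClient and catch kazoo.exceptions.NodeExistsError instead.

```python
import time
from random import randint


class NodeExistsError(Exception):
    """Stand-in for kazoo.exceptions.NodeExistsError (assumption for this sketch)."""


class FakeClient(object):
    """Hypothetical in-memory stand-in for KazooClient: create/delete only."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, value, ephemeral=False, makepath=False):
        # Like ZooKeeper, refuse to create a node that already exists.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = value
        return path

    def delete(self, path):
        del self.nodes[path]


def create_it(zk, k, v, max_tries=None):
    """Retry until create() succeeds; return True only on real success.

    max_tries is a hypothetical knob so the sketch can give up;
    None means retry forever, as in the original loop.
    """
    tries = 0
    while max_tries is None or tries < max_tries:
        tries += 1
        try:
            zk.create(k, v, ephemeral=True, makepath=True)
        except NodeExistsError:
            # Somebody else holds the node: back off briefly and retry.
            time.sleep(randint(1, 100) / 1000.0)
        else:
            # Reached only when create() did not raise.
            return True
    return False
```

With this shape, the caller proceeds to get() and delete() only when
create_it() returned True, so a job never deletes a node it did not create.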
