I have just finished reviewing DM-3618, so it's up to you to decide which
you like better. Personally I'd prefer not to restart mysql (it's a slow
process with innodb tables).

Cheers,
Andy


From: Becla, Jacek
Sent: Thursday, August 27, 2015 11:06 PM
To: Salnikov, Andrei A. <[log in to unmask]>
Cc: qserv-l <[log in to unmask]>
Subject: Re: [QSERV-L] memoryLockDb troubles

Andy

I confirm that uncommenting these two lines helped. So, do you recommend keeping that
and forgetting about my commit 68b3ddc8f?

Thanks
Jacek



On Aug 27, 2015, at 10:46 PM, Salnikov, Andrei A. <[log in to unmask]> wrote:

I think the conclusion for DM-2930 was that if the worker (xrootd) is
restarted then the mysql server needs to be restarted as well. This is
not how we do things today in the integration test; we only restart workers
and not mysql.

We should probably re-discuss how to make things more reliable; in
the meantime you can fix it by un-commenting the "TRUNCATE TABLE ..."
statement in core/modules/wdb/ChunkResource.cc (lines 324-325).
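
To illustrate why a row left behind by the previous worker blocks the new one, here is a minimal sketch of the single-row lock. It is not the Qserv code: it uses Python's sqlite3 as a stand-in for the mysql MEMORY table, the function name acquire_lock and the clear_stale flag are hypothetical, and DELETE stands in for TRUNCATE TABLE (which SQLite does not have). The table and column names and the two uids are taken from the log in this thread.

```python
import sqlite3


def acquire_lock(conn, uid, clear_stale=False):
    """Try to insert the single lock row; return True if this uid now holds it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memoryLockTbl (keyId INT UNIQUE, uid INT)"
    )
    if clear_stale:
        # Stand-in for the commented-out "TRUNCATE TABLE ..." statement.
        conn.execute("DELETE FROM memoryLockTbl")
    try:
        conn.execute(
            "INSERT INTO memoryLockTbl (keyId, uid) VALUES (1, ?)", (uid,)
        )
        return True
    except sqlite3.IntegrityError:
        # Same situation as "Error 1062: Duplicate entry '1' for key 'keyId'".
        return False


# One connection plays the role of the mysql server that is never restarted.
conn = sqlite3.connect(":memory:")
first = acquire_lock(conn, 28572)                     # original worker
second = acquire_lock(conn, 29962)                    # restarted worker, stale row
third = acquire_lock(conn, 29962, clear_stale=True)   # restarted worker, table cleared
print(first, second, third)
```

In this sketch the table is cleared unconditionally, so it would also override a lock held by a process that is still alive.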

Cheers,
Andy


From: [log in to unmask] On Behalf Of Becla, Jacek
Sent: Thursday, August 27, 2015 8:55 PM
To: qserv-l <[log in to unmask]>
Subject: Re: [QSERV-L] memoryLockDb troubles

For the record, I have a fix in DM-3618

John, FYI, AndyS is restarting xrootd in wmgr because apparently that is the only way to force xrootd to refresh the chunk inventory (at the moment).

Jacek



On Aug 27, 2015, at 7:39 PM, Becla, Jacek <[log in to unmask]> wrote:

John

Your latest code is giving me trouble. The integration test killed xrootd; the tail of the log is similar to what you observed intermittently:

[2015-08-27T21:32:13.084-0500] [0x7f92c2df8720] INFO  root (build/xrdsvc/SsiService.cc:142) - Cleaning up scratchDb: qservScratch.
[2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] WARN  root (build/wdb/ChunkResource.cc:302) - memLockStatus LOCKED_OTHER wrong uid. Expected 29962 got 28572 err=
[2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] WARN  root (build/wdb/ChunkResource.cc:316) - Memory tables were not released cleanly! LockStatus=LOCKED_OTHER
[2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] DEBUG root (build/wdb/ChunkResource.cc:272) - execLockSql CREATE DATABASE IF NOT EXISTS q_memoryLockDb;CREATE TABLE IF NOT EXISTS q_memoryLockDb.memoryLockTbl ( keyId INT UNIQUE, uid INT ) ENGINE = MEMORY;
[2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] DEBUG root (build/wdb/ChunkResource.cc:272) - execLockSql INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 )
[2015-08-27T21:32:13.086-0500] [0x7f92c2df8720] ERROR root (build/wdb/ChunkResource.cc:373) - Lock failed, exiting. query=INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 ) err=Error 1062: Duplicate entry '1' for key 'keyId' Unable to execute query: INSERT INTO q_memoryLockDb.memoryLockTbl (keyId, uid) VALUES(1, 29962 )


I saved full log here:

/home/becla/qserv-run/2015_08/var/log/worker/xrootd.log

I hope we will be able to resolve it very soon. If not, the backup plan is:
a) back off all changes
b) disable the code that is causing xrootd to die while you investigate

I’ll try to debug it tonight

Jacek

________________________________
Use REPLY-ALL to reply to list
To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1

