I'm still looking into it. The worker log looks ok for 5299, but on the czar side it just kind of disappears :-/

On 11/24/15 12:27, Fabrice Jammes wrote:
Hi John,

Thanks for your answer,

Please let me know if you need additional support on the cc-in2p3 cluster (higher log level, addition of new code or debug tools, ...)
The 35 TB of data may also not be consistent, but nothing in the log shows that.

FYI, you can log as root in any container using:

sudo -u qserv docker exec -it -u root qserv bash

Feel free to play with it, it's very easy to re-install from scratch.
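
For instance, a quick root-level look at the ownership of the data directory (path as in the listing commands further down this thread):

  # one-off command as root inside the qserv container, no interactive shell needed
  sudo -u qserv docker exec -u root -t qserv ls -la /qserv/data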


Another thing: I didn't stop Qserv cleanly (using Qserv stop) but stopped the docker containers instead, so I think the in-memory tables weren't destroyed on the worker side, and I get the following warning messages at the top of the worker xrootd.log:

[2015-11-24T15:06:09.049Z] [0x7ffbd5c8a740] WARN  root (core/modules/wdb/ChunkResource.cc:295) - memLockStatus unexpected results, assuming LOCKED_OTHER. err=Error 0: Expecting one row, found no rows

[2015-11-24T15:06:09.049Z] [0x7ffbd5c8a740] WARN  root (core/modules/wdb/ChunkResource.cc:317) - Memory tables were not released cleanly! LockStatus=LOCKED_OTHER
[2015-11-24T15:06:09.049Z] [0x7ffbd5c8a740] DEBUG root (core/modules/wdb/ChunkResource.cc:273) - execLockSql CREATE DATABASE IF NOT EXISTS q_memoryLockDb;CREATE TABLE IF NOT EXISTS q_memoryLockDb.memoryLockTbl (
 keyId INT UNIQUE, uid INT ) ENGINE = MEMORY;

So I think the MySQL in-memory data was deleted when stopping the container, but not the in-memory tables themselves. I don't think this warning could cause the error, but I'll try to remove it by adding a clean Qserv stop procedure.
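
If needed, the lock database could also be dropped by hand before restarting the worker, so that ChunkResource recreates it from scratch. A rough sketch (the socket path and credentials are assumptions on my part, not the actual cluster config; the database and table names are the ones from the log above):

  # open a shell inside the worker container
  sudo -u qserv docker exec -it qserv bash
  # inspect and then drop the memory-lock database (connection details are assumptions)
  mysql --socket=/qserv/run/var/lib/mysql/mysql.sock -u root \
      -e "SELECT * FROM q_memoryLockDb.memoryLockTbl; DROP DATABASE IF EXISTS q_memoryLockDb;"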

Cheers,


On 11/24/2015 07:25 PM, John Gates wrote:
It looks like all the tasks on ccqserv148 have completed, and it is expected that all the threads in the pool will block on getCmd() as there are no Tasks in the scheduler to run. For this query, there were 321 tasks which all seem to have run to completion. The last line in the worker log is

   INFO  root (core/modules/xrdsvc/SsiSession.cc:153) - RequestFinished type=isStream

That seems pretty reasonable. I've logged into a couple of other workers (147 and 140) and their log files look very similar. The thing that bothers me about all three logs is that the SsiSession is not being destroyed (I think it might be waiting on something from the czar but I need to look into it).
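
To confirm the pool threads really are idle rather than stuck mid-Task, something like the following could be run against a worker container (matching the process by name xrootd is just a guess on my part; stat S means the thread is sleeping):

  # list xrootd threads with their scheduler state and wait channel
  sudo -u qserv docker exec -t qserv ps -L -o pid,lwp,stat,wchan:20,cmd -C xrootd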

For a quick check, I did

  /opt/shmux/bin/shmux -c "sudo -u qserv docker exec -t qserv tail /qserv/run/var/log/worker/xrootd.log" ccqserv{126..146} > ~/junk

and the ends of all the log files look very similar (despite being mashed together by shmux). They all end with at least one RequestFinished and none of them appear to be deleting SsiSession.
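
A variant of the same shmux check that counts the RequestFinished lines per worker instead of eyeballing the tails (the message text is taken from the log line quoted above; the output filename is just a scratch name):

  /opt/shmux/bin/shmux -c \
      "sudo -u qserv docker exec -t qserv grep -c RequestFinished /qserv/run/var/log/worker/xrootd.log" \
      ccqserv{126..146} > ~/requestfinished_counts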

On 11/24/15 08:23, Fabrice Jammes wrote:
On 11/19/2015 12:14 AM, Gates, John H wrote:

Hi Fabrice,


I'd like to test things out on the cluster, but I want to make sure that I'm not going to step on your toes. I also want to make sure that it's working before I get into it.


-John

Hi John

A recent Qserv version seems to run on ccqserv125...ccqserv149.
It works fine for small queries but stalls for large queries, on the worker side it seems...

I tried to diagnose it in this ticket description:
https://jira.lsstcorp.org/browse/DM-4407

Feel free to ask if you need information about connecting to docker machine.


I use:

ssh ccqservXXX
sudo -u qserv bash
docker exec -it qserv bash
# then I'm logged in to the container
ls /qserv/run/var/log
ps x
ls /qserv/data
...
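
A small wrapper along these lines can save a few keystrokes for one-off checks without opening a shell (the helper name qexec is made up here; the rest reuses the commands above):

  # hypothetical helper: run one command inside the qserv container on a given worker
  qexec() {
      ssh "$1" sudo -u qserv docker exec -t qserv "${@:2}"
  }
  # e.g.
  qexec ccqserv125 ls /qserv/run/var/log
  qexec ccqserv125 ps x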


