I did a grep for ‘openConnection’ and it is only called in the db init.

I can do some testing but I would probably need to make a clone db on my local machine so I don’t mess up production at SLAC or JLAB.
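
One thing that could be checked without touching production: the log quoted below shows 13 openings but only 8 closings, which is what would happen if something throws between the open and the close. This is only a guess at the cause, not a confirmed diagnosis. A minimal sketch of guarding the close with try/finally follows, using a hypothetical stand-in class (FakeConditionsManager and its counters are made up for illustration and are not the real conditions manager code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionGuardSketch {

    /** Hypothetical stand-in for the conditions manager; not the real class. */
    static class FakeConditionsManager {
        final AtomicInteger opened = new AtomicInteger();
        final AtomicInteger closed = new AtomicInteger();

        // Stand-ins for the real open/close calls; they only count invocations.
        void openConnection() { opened.incrementAndGet(); }
        void closeConnection() { closed.incrementAndGet(); }

        // Placeholder for the "do conditions init" step; can simulate a failure.
        void initConditions(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated failure during conditions init");
            }
        }

        void setDetector(boolean failDuringInit) {
            // Open the database connection.
            openConnection();
            try {
                // do conditions init
                initConditions(failDuringInit);
            } finally {
                // Close the connection even if the init step threw.
                closeConnection();
            }
        }
    }

    public static void main(String[] args) {
        FakeConditionsManager mgr = new FakeConditionsManager();
        mgr.setDetector(false);
        try {
            mgr.setDetector(true);
        } catch (IllegalStateException e) {
            // Expected: the simulated init failure still propagates to the caller.
        }
        System.out.println("opened=" + mgr.opened.get() + " closed=" + mgr.closed.get());
    }
}
```

With the finally block the two counters stay equal even when the init step fails. Without it, the failing call would leave one connection open, which would look like the mismatch in the log.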

> On Dec 7, 2017, at 1:25 PM, Maurik Holtrop <[log in to unmask]> wrote:
> 
> That seems to me to be the right way to do this. So then I have no explanation for all those open connections, except perhaps that there is a lot going on in the “do conditions init” step and so this whole thing takes a bit longer than expected.
> Are we sure no-one snuck in a bit of code that opens the DB again elsewhere?
> 
>> On Dec 7, 2017, at 4:23 PM, McCormick, Jeremy I. <[log in to unmask]> wrote:
>> 
>> The code in the db manager is
>> 
>> // Open the database connection.
>> this.openConnection();
>> 
>> // do conditions init
>> …
>> 
>> // Close the connection.
>> this.closeConnection();
>> 
>> The connection is only opened in the setDetector method of the conditions manager, not anywhere else, and it is closed after it is used.
>> 
>> So I am not sure what is going on to be honest...
>> 
>>> On Dec 7, 2017, at 1:20 PM, Maurik Holtrop <[log in to unmask]> wrote:
>>> 
>>> I agree that the code needs to keep trying to connect to the DB, at least for some time, instead of crashing right away. I don’t think that this is where the problem is. The DB reports that there are 900+ jobs that have a connection open, but the connections are in the “sleeping” state. What is up with that?
>>> 
>>> On the other hand, logging into the database did not have any delay associated with it. If the DB had too many connections open so it could not service our jobs, I would expect I could not connect with the mysql utility either. So I am not sure what is going on. Just reporting what I saw.
>>> 
>>> 
>>>> On Dec 7, 2017, at 4:15 PM, McCormick, Jeremy I. <[log in to unmask]> wrote:
>>>> 
>>>> The connection is supposed to be shut down after the conditions are initialized for the run.
>>>> 
>>>> There was some code added a while ago (not by me) that waits for an open connection by looping. Perhaps these jobs are just waiting for an open db connection because we are running so many concurrent jobs?
>>>> 
>>>>> On Dec 7, 2017, at 9:37 AM, Maurik Holtrop <[log in to unmask]> wrote:
>>>>> 
>>>>> Looks like a bug.
>>>>> 
>>>>> The SHOW PROCESSLIST command in mysql shows that there are 949 connections to the DB right now, most of them not doing anything but connecting to the hps_conditions database. Thus, I would conclude that the conditions database code is NOT closing its connections correctly.
>>>>> 
>>>>> Jeremy, this is your code, can you please look into this?
>>>>> 
>>>>> Best,
>>>>> 	Maurik
>>>>> 
>>>>> 
>>>>>> On Dec 7, 2017, at 10:58 AM, Rafayel Paremuzyan <[log in to unmask]> wrote:
>>>>>> 
>>>>>> Hi, in pass7, looking at the job statuses, I see that many jobs have several hours of wall time, while the CPU time is only a few minutes, or even less than a minute.
>>>>>> 
>>>>>> I just submitted a CCPR asking whether they can help investigate this,
>>>>>> but I also want to make sure that this is not related to the DB.
>>>>>> 
>>>>>> Looking at the output log files and counting the printouts of connection openings and closings,
>>>>>> I see 13 connection openings and 8 closings.
>>>>>> 
>>>>>> ifarm1402> grep -i DatabaseConditionsManager hps_005782.252_4.0.1_Recon.err | grep Opening | wc
>>>>>> 13 117 1599
>>>>>> ifarm1402> grep -i DatabaseConditionsManager hps_005782.252_4.0.1_Recon.err | grep Closing | wc
>>>>>> 8 72 1032
>>>>>> ifarm1402>
>>>>>> 
>>>>>> Can one of the experts comment on this? Are all connection opens and closes printed in the log file?
>>>>>> 
>>>>>> For example, you can have a look at this log file:
>>>>>> /lustre/expphy/work/hallb/hps/data/engrun2015/pass7/hps_005782.252_4.0.1_Recon.err
>>>>>> 
>>>>>> Rafo
>>>>>> 
>>>>>> ########################################################################
>>>>>> Use REPLY-ALL to reply to list
>>>>>> 
>>>>>> To unsubscribe from the HPS-SOFTWARE list, click the following link:
>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=HPS-SOFTWARE&A=1
>>>>> 
>>>> 
>>> 
>> 
> 

