Hi Marian,
Regarding the cmsd "dying": if it becomes unresponsive, a logfile and a gcore
would help. If it crashes, a logfile and core file would help as well. As
in all cases, send the config file along with them. For the core files, a
simple URL to where I could fetch them would work rather than sending
the actual core file.
Andy
On Wed, 6 Apr 2016, Marian Zvada wrote:
> Hi Lukasz,
>
> sorry I missed this email... We left it under the radar; my suspicion is also
> that we had tons (more than the usual load) of client requests coming in at
> the same time... By "left it under the radar" I mean our admin will watch for
> this particular behavior, and when it occurs again I'll try to do some quick
> parsing through the xrootd logs to see who is doing what at that moment.
>
> Since then I haven't heard of it occurring again. Though, we had other issues
> in the last few days, cmsd dying under big load etc., but that's something
> for another thread to open and discuss. Let's see what we find out, though.
>
> Thanks,
> Marian
>
> On 3/25/16 5:15 PM, Lukasz Janyst wrote:
>> Hi Marian,
>>
>> has this issue been resolved? If so, I would very much like to know how.
>>
>> Cheers,
>> Lukasz
>>
>> On Thu, Mar 17, 2016, at 11:06, Lukasz Janyst wrote:
>>> Hi Marian,
>>>
>>> the only other not-completely-improbable explanation is that a massive
>>> number of clients try to connect within a time slice of one RTT.
>>>
>>> Cheers,
>>> Lukasz
>>>
>>> On Wed, Mar 16, 2016, at 20:58, Marian Zvada wrote:
>>>> Hi Lukasz,
>>>>
>>>> thanks for the feedback, yep, it looks more like a system-wide scalability
>>>> issue which might or might not be connected to any bug in xrootd.
>>>> Though, xrootd is the service being hammered by something here, which needs
>>>> attention, too.
>>>>
>>>> We'll closely watch SYNs on the UNL host and try to debug live when this
>>>> occurs again.
>>>>
>>>> Thanks,
>>>> Marian
>>>>
>>>> On 3/16/16 5:34 AM, Lukasz Janyst wrote:
>>>>> One way to debug this would be to run wireshark to see where the bogus
>>>>> SYN packets are coming from.
>>>>>
>>>>> Lukasz
>>>>>
>>>>> On Wed, Mar 16, 2016, at 11:27, Lukasz Janyst wrote:
>>>>>> Isn't it a sign of either a DoS attack or a network problem? I would
>>>>>> guess that a restart of the service helps because, by closing the
>>>>>> listening socket, you close the corresponding kernel SYN queue.
>>>>>>
>>>>>> Lukasz
>>>>>>
>>>>>> On Wed, Mar 16, 2016, at 00:39, Marian Zvada wrote:
>>>>>>> Hi Folks,
>>>>>>>
>>>>>>> we're seeing these two types of kernel messages, which are obviously
>>>>>>> connected to the xrootd process on the US regional redirectors running
>>>>>>> on port 1094:
>>>>>>>
>>>>>>> ---
>>>>>>> kernel: TCPv6: Possible SYN flooding on port 1094. Sending cookies.
>>>>>>> kernel: possible SYN flooding on port 1094. Sending cookies.
>>>>>>> ---
>>>>>>>
>>>>>>> This is happening intermittently on both US regional redirectors,
>>>>>>> cmsxrootd1.fnal.gov and xrootd.unl.edu. Both are behind the DNS-aliased
>>>>>>> host cmsxrootd.fnal.gov. We're pretty confident that this typically
>>>>>>> shows up in syslog when the redirector is giving very long waits for
>>>>>>> access to files through xrootd.
>>>>>>>
>>>>>>> A simple restart of the service brings the response time back to normal.
>>>>>>> We also didn't notice any significant increase in memory or CPU use on
>>>>>>> the machines themselves, so we're wondering if anyone from the list or
>>>>>>> the developers could explain whether this is something to worry about.
>>>>>>> It is also hard to catch, so if you have any idea what to watch and
>>>>>>> record next time (besides a core file), that would help. Luckily, we at
>>>>>>> least know that when we get a warning state from the xrootd-fallback SAM
>>>>>>> test, this 'flooding' is likely happening again...
>>>>>>>
>>>>>>> The FNAL and UNL regional redirectors run xrootd-4.3.0-0.rc3.el6.x86_64,
>>>>>>> and apart from the slowness seen and the odd kernel records in the system
>>>>>>> logs there is nothing obvious in the xrootd and cmsd logs to report. Do
>>>>>>> you know which specific xrootd process chain might trigger these kernel
>>>>>>> errors?
>>>>>>>
>>>>>>> Any feedback is very welcome!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marian
>>>>>>>
>>>>>>> ########################################################################
>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>
>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
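
A note on catching the next occurrence: the two kernel messages quoted in the original report can be counted per port straight from syslog. What follows is only a minimal sketch in Python, not something posted in the thread. The function name and the regular expression are illustrative assumptions; the pattern is built from the two message variants quoted above and nothing else.

```python
import re
from collections import Counter

# Matches both variants quoted in the original report, e.g.
#   kernel: TCPv6: Possible SYN flooding on port 1094. Sending cookies.
#   kernel: possible SYN flooding on port 1094. Sending cookies.
# The pattern is an assumption derived from those two lines only.
SYN_FLOOD_RE = re.compile(r"possible SYN flooding on port (\d+)", re.IGNORECASE)


def count_syn_flood_warnings(lines):
    """Return a Counter mapping port number to the number of warnings seen."""
    counts = Counter()
    for line in lines:
        match = SYN_FLOOD_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

It accepts any iterable of lines, so an open syslog file can be passed in directly. Where that file lives depends on the distribution and is not stated in the thread.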
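
On Lukasz's suggestion to look at where the SYN packets are coming from: besides wireshark, one option is to tally SYN sources from the text output of `tcpdump -nn`. The sketch below is an illustration under assumptions, not a tool used in the thread. It assumes numeric output (`-nn`), where the last dot-separated field of each endpoint is the port; the function name is hypothetical, and the sample addresses in the comments come from documentation ranges.

```python
import re
from collections import Counter

# With -nn, tcpdump prints lines such as (addresses are illustrative):
#   12:00:00.000001 IP 192.0.2.10.40000 > 198.51.100.5.1094: Flags [S], seq 1, ...
#   12:00:00.000002 IP6 2001:db8::1.40001 > 2001:db8::5.1094: Flags [S], seq 2, ...
# "[S]" is a bare SYN; a SYN-ACK is printed as "[S.]" and is not matched here.
SYN_LINE_RE = re.compile(
    r"\sIP6?\s+(\S+)\.(\d+)\s+>\s+(\S+)\.(\d+):\s+Flags\s+\[S\]"
)


def count_syn_sources(lines, port=1094):
    """Count bare SYN packets sent to `port`, grouped by source address."""
    counts = Counter()
    for line in lines:
        match = SYN_LINE_RE.search(line)
        if match and int(match.group(4)) == port:
            counts[match.group(1)] += 1
    return counts
```

Feeding it a saved capture of `tcpdump -nn 'tcp port 1094'` and printing `count_syn_sources(...).most_common(20)` would show whether the SYNs come from a handful of hosts or from many clients connecting at once, which are the two explanations raised in the thread.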