Hi Marian,

When the cmsd "dies": if it becomes unresponsive, a logfile and a gcore dump
would help. If it crashes, a logfile and the core file would help here as
well. As in all cases, send the config file along with them. For the core
files, a simple URL I could fetch them from works better than sending the
actual core files.

Andy

On Wed, 6 Apr 2016, Marian Zvada wrote:

> Hi Lukasz,
>
> sorry I missed this email... We've just been keeping an eye on it; I also
> suspect we had tons of client requests (more than the usual load) coming in
> at the same time. By keeping an eye on it I mean our admin will watch for
> this particular behavior, and when it occurs again I'll do some quick
> parsing of the xrootd logs to see who is doing what at that moment.
>
> Since then I haven't heard of it occurring again. We did have other issues
> in the last few days (cmsd dying under heavy load, etc.), but that's
> something for another thread to open and discuss. Let's see what we find
> out, though.
>
> Thanks,
> Marian
>
> On 3/25/16 5:15 PM, Lukasz Janyst wrote:
>> Hi Marian,
>> 
>> has this issue been resolved? If so, I would very much like to know how.
>> 
>> Cheers,
>>     Lukasz
>> 
>> On Thu, Mar 17, 2016, at 11:06, Lukasz Janyst wrote:
>>> Hi Marian,
>>> 
>>> the only other not-completely-improbable explanation is that a massive
>>> number of clients tries to connect within a time slice of one RTT.
>>> 
>>> Cheers,
>>>     Lukasz
>>> 
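Lukasz's one-RTT argument can be put into rough numbers with Little's law (L = lambda * W): a half-open connection sits in the kernel SYN queue for about one round trip, until the client's final ACK arrives, so the steady-state number of entries is roughly the SYN arrival rate times the RTT. Below is a minimal Python sketch, assuming that simplified model (it ignores SYN-ACK retransmissions, which keep entries around longer). The function names are hypothetical, and the 20,000 SYN/s rate, 100 ms RTT and 1024-entry limit are illustrative numbers, not measurements from cmsxrootd1.fnal.gov or xrootd.unl.edu.

```python
def expected_half_open(syn_rate_per_s: float, rtt_s: float) -> float:
    """Little's law (L = lambda * W): SYN arrival rate times how long each
    half-open entry waits for the client's final ACK (roughly one RTT)."""
    return syn_rate_per_s * rtt_s


def syn_queue_overflows(syn_rate_per_s: float, rtt_s: float, syn_backlog: int) -> bool:
    """True if the steady-state number of half-open entries exceeds the limit."""
    return expected_half_open(syn_rate_per_s, rtt_s) > syn_backlog


if __name__ == "__main__":
    # Hypothetical burst: 20,000 SYN/s from clients about 100 ms away.
    print(expected_half_open(20_000, 0.1))         # roughly 2000 entries
    print(syn_queue_overflows(20_000, 0.1, 1024))  # True for a 1024-entry queue
```

On Linux the effective SYN queue limit is governed by net.ipv4.tcp_max_syn_backlog together with the backlog the application passes to listen(), so a burst whose estimate exceeds that limit could produce the "Sending cookies" messages without any bug in xrootd.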
>>> On Wed, Mar 16, 2016, at 20:58, Marian Zvada wrote:
>>>> Hi Lukasz,
>>>> 
>>>> thanks for the feedback. Yep, it looks more like a system-wide
>>>> scalability issue, which might or might not be connected to a bug in
>>>> xrootd. Still, xrootd is the service being hammered by something here,
>>>> and that needs attention, too.
>>>> 
>>>> We'll closely watch the SYNs on the UNL host and try to debug live when
>>>> this occurs again.
>>>> 
>>>> Thanks,
>>>> Marian
>>>> 
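For live debugging on the UNL host, the kernel's own TCP counters can be sampled instead of relying on catching the syslog message in time. On Linux, /proc/net/netstat exposes TcpExt counters such as SyncookiesSent, SyncookiesRecv, SyncookiesFailed, ListenOverflows and ListenDrops, laid out as pairs of header and value lines. A minimal Python sketch, assuming a Linux host that provides that file; the helper names are hypothetical:

```python
from pathlib import Path

# TcpExt counters related to SYN queue / accept queue pressure (Linux).
SYN_COUNTERS = ("SyncookiesSent", "SyncookiesRecv", "SyncookiesFailed",
                "ListenOverflows", "ListenDrops")


def parse_netstat(text: str) -> dict:
    """Parse /proc/net/netstat-style text into {"Prefix.Name": value}.

    The file consists of line pairs: 'TcpExt: Name1 Name2 ...' followed by
    'TcpExt: 1 2 ...'.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    for header, values in zip(rows[0::2], rows[1::2]):
        prefix = header[0].rstrip(":")
        for name, value in zip(header[1:], values[1:]):
            stats[f"{prefix}.{name}"] = int(value)
    return stats


def syn_counters(path: str = "/proc/net/netstat") -> dict:
    """Return the SYN-related TcpExt counters (0 if the kernel lacks one)."""
    stats = parse_netstat(Path(path).read_text())
    return {name: stats.get(f"TcpExt.{name}", 0) for name in SYN_COUNTERS}
```

Sampling syn_counters() every minute (for example from cron) and logging the deltas would show when cookies start being sent, which could then be lined up with the xrootd-fallback SAM warnings.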
>>>> On 3/16/16 5:34 AM, Lukasz Janyst wrote:
>>>>> One way to debug this would be to run Wireshark to see where the bogus
>>>>> SYN packets are coming from.
>>>>>
>>>>>      Lukasz
>>>>> 
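Wireshark works for this; on a headless redirector a similar tally can be made from tcpdump's text output instead, which is a swap-in for the Wireshark approach suggested here. Capturing with something like tcpdump -n -l 'tcp dst port 1094' (-n keeps addresses and ports numeric) prints pure SYNs as "Flags [S]," and SYN-ACKs as "Flags [S.],". A minimal Python sketch that counts pure SYNs per source address, assuming tcpdump's default one-line output format; the function name is hypothetical:

```python
import re
import sys
from collections import Counter

# tcpdump's default one-line TCP format, e.g.
#   "12:00:00.000001 IP 192.0.2.10.51234 > 198.51.100.5.1094: Flags [S], seq 1, ..."
# "Flags [S]," is a pure SYN; a SYN-ACK prints as "Flags [S.]," and is skipped.
SYN_RE = re.compile(r"\bIP6? (\S+) > (\S+): Flags \[S\],")


def tally_syn_sources(lines, port=1094):
    """Count pure SYNs per source address sent to the given destination port."""
    counts = Counter()
    for line in lines:
        match = SYN_RE.search(line)
        if not match:
            continue
        src, dst = match.groups()
        src_host = src.rsplit(".", 1)[0]   # drop the ".port" suffix
        dst_port = dst.rsplit(".", 1)[-1]
        if dst_port == str(port):
            counts[src_host] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as capture:
        for host, n in tally_syn_sources(capture).most_common(20):
            print(f"{n:8d}  {host}")
```

For example, capture for a minute with timeout 60 tcpdump -n -l 'tcp dst port 1094' > syn.txt and pass syn.txt to the script (file names hypothetical). A handful of addresses dominating the tally would point towards a misbehaving client or a DoS, while a broad spread would fit the many-clients explanation.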
>>>>> On Wed, Mar 16, 2016, at 11:27, Lukasz Janyst wrote:
>>>>>> Isn't it a sign of either a DoS attack or a network problem? I would
>>>>>> guess that a restart of the service helps because, by closing the
>>>>>> listening socket, you close the corresponding kernel SYN queue.
>>>>>>
>>>>>>      Lukasz
>>>>>> 
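The SYN queue mentioned here is per listening socket, and on Linux its size and the accept queue's size are bounded by sysctls: net.core.somaxconn caps the accept backlog an application can request, net.ipv4.tcp_max_syn_backlog caps the number of half-open requests remembered, and net.ipv4.tcp_syncookies set to 1 makes the kernel fall back to SYN cookies once a socket's SYN queue overflows, which is what the "Sending cookies" message reports. A minimal Python sketch that reads them via /proc/sys so they can be recorded alongside the other debugging data, assuming a Linux host; the helper name is hypothetical:

```python
from pathlib import Path

# Linux sysctls that bound the listen queues, mapped to their /proc/sys paths.
LISTEN_SYSCTLS = {
    "net.core.somaxconn": "net/core/somaxconn",
    "net.ipv4.tcp_max_syn_backlog": "net/ipv4/tcp_max_syn_backlog",
    "net.ipv4.tcp_syncookies": "net/ipv4/tcp_syncookies",
}


def read_listen_limits(proc_sys: str = "/proc/sys") -> dict:
    """Return {sysctl name: int value}, or None where a value can't be read."""
    limits = {}
    for name, rel_path in LISTEN_SYSCTLS.items():
        try:
            limits[name] = int((Path(proc_sys) / rel_path).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_listen_limits().items():
        print(f"{name} = {value}")
```

The accept queue actually used is the smaller of somaxconn and whatever backlog xrootd passes to listen(), which is not stated in this thread.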
>>>>>> On Wed, Mar 16, 2016, at 00:39, Marian Zvada wrote:
>>>>>>> Hi Folks,
>>>>>>> 
>>>>>>> we're seeing these two types of kernel messages, which are obviously
>>>>>>> connected to the xrootd process on the US regional redirectors running
>>>>>>> on port 1094:
>>>>>>> 
>>>>>>> ---
>>>>>>> kernel: TCPv6: Possible SYN flooding on port 1094. Sending cookies.
>>>>>>> kernel: possible SYN flooding on port 1094. Sending cookies.
>>>>>>> ---
>>>>>>> 
>>>>>>> This is happening intermittently on both US regional redirectors,
>>>>>>> cmsxrootd1.fnal.gov and xrootd.unl.edu. Both are behind the DNS-aliased
>>>>>>> host cmsxrootd.fnal.gov. We're pretty confident that these messages
>>>>>>> typically show up in syslog when the redirector is imposing very long
>>>>>>> waits for access to files through xrootd.
>>>>>>> 
>>>>>>> A simple restart of the service brings response times back to normal.
>>>>>>> We also didn't notice any significant increase in memory or CPU use on
>>>>>>> the machines themselves, so we're wondering if anyone on the list or
>>>>>>> among the developers can explain whether this is something to worry
>>>>>>> about. It is also hard to catch, so if you have any idea what to watch
>>>>>>> for and record next time (besides a core file), that would help.
>>>>>>> Luckily, we at least know that when the xrootd-fallback SAM test goes
>>>>>>> into a warning state, this 'flooding' is likely happening again...
>>>>>>> 
>>>>>>> The FNAL and UNL regional redirectors run xrootd-4.3.0-0.rc3.el6.x86_64,
>>>>>>> and apart from the slowness and the odd kernel records in the system
>>>>>>> logs, there is nothing obvious in the xrootd and cmsd logs to report.
>>>>>>> Do you perhaps know which specific xrootd process chain might trigger
>>>>>>> these kernel errors?
>>>>>>> 
>>>>>>> Any feedback is very welcome!
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Marian
>>>>>>> 
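Since the warning state of the xrootd-fallback SAM test is currently the main signal, the two kernel messages quoted above can also be counted per hour straight from syslog to see exactly when the flooding starts. A minimal Python sketch, assuming the traditional syslog timestamp format ("Mar 16 00:12:34 host kernel: ...") and the two message wordings shown in this thread; other kernel versions word the message differently, and the label only records whether the TCPv6 prefix was present. The function name is hypothetical:

```python
import re
import sys
from collections import Counter

# The two wordings quoted in this thread (EL6-era kernel), e.g.
#   "kernel: TCPv6: Possible SYN flooding on port 1094. Sending cookies."
#   "kernel: possible SYN flooding on port 1094. Sending cookies."
FLOOD_RE = re.compile(r"kernel: (?:(TCPv6): )?[Pp]ossible SYN flooding on port (\d+)\.")


def count_syn_flood_warnings(lines):
    """Count warnings per (hour, prefix, port).

    Assumes each line starts with an RFC 3164 timestamp such as
    'Mar 16 00:12:34', so the first 9 characters identify the hour.
    """
    counts = Counter()
    for line in lines:
        match = FLOOD_RE.search(line)
        if match:
            prefix = match.group(1) or "unprefixed"
            counts[(line[:9], prefix, int(match.group(2)))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:  # e.g. /var/log/messages
        for (hour, prefix, port), n in sorted(count_syn_flood_warnings(log).items()):
            print(f"{hour}  {prefix:10s}  port {port}: {n}")
```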
>>>>>>> ########################################################################
>>>>>>> Use REPLY-ALL to reply to list
>>>>>>> 
>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
>