Hi Andy,
yes, we'll keep in mind to generate a core file whenever we have such an
event, e.g. forced cmsd restarts. At Nebraska we have also set the motd on
the US redirector with the welcome message: REMINDER TO SELF: generate core file
:)
-Marian
PS: Btw, we had a crash of cmsd on the global CERN redirectors a couple of
days ago. I'm trying to find where Brian left the core file so I can send
it to you for further analysis...
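
For the "unresponsive cmsd" case discussed in this thread, a minimal sketch of how the gcore step could be scripted so it is not forgotten before a restart. It assumes gdb's gcore utility is installed; the helper names and the /var/tmp/cmsd-core prefix are made up for illustration and are not part of xrootd.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_gcore_command(pid: int, out_prefix: str) -> list[str]:
    """Return the argv that asks gcore to dump a live process.

    gcore writes the dump to "<out_prefix>.<pid>".
    """
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["gcore", "-o", out_prefix, str(pid)]


def dump_core(pid: int, out_prefix: str = "/var/tmp/cmsd-core") -> Path:
    """Dump a core of the (hung) process and return the expected file path.

    The default prefix is a hypothetical location; pick a filesystem with
    enough free space for the full process image.
    """
    if shutil.which("gcore") is None:
        raise RuntimeError("gcore not found; it normally ships with gdb")
    subprocess.run(build_gcore_command(pid, out_prefix), check=True)
    return Path(f"{out_prefix}.{pid}")
```

Calling dump_core(pid) with the cmsd pid before the restart leaves a file that can be published next to the logfile and config file, so that only a URL needs to be mailed.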
On 4/7/16 5:26 PM, Andrew Hanushevsky wrote:
> Hi Marian,
>
> When the cmsd "dies": if it gets unresponsive, a logfile and gcore would
> help. If it crashes, a logfile and core file would help as well. As in
> all cases, send the config file along with it. For the core files, a
> simple URL to where I could fetch it from would work, rather than sending
> the actual core file.
>
> Andy
>
> On Wed, 6 Apr 2016, Marian Zvada wrote:
>
>> Hi Lukasz,
>>
>> sorry I missed this email... We left it under the radar; my suspicion
>> is also that we had tons of client requests (more than the usual load)
>> coming in at the same time... By "under the radar" I mean our admin
>> will watch for this particular behavior, and when it occurs again I'll
>> try to do some quick parsing through the xrootd logs to see who is
>> doing what at that moment.
>>
>> Since then I haven't heard of it occurring again. We did have other
>> issues in the last few days (cmsd dying under heavy load, etc.), but
>> that's something for a separate thread. Let's see what we find out,
>> though.
>>
>> Thanks,
>> Marian
>>
>> On 3/25/16 5:15 PM, Lukasz Janyst wrote:
>>> Hi Marian,
>>>
>>> has this issue been resolved? If so, I would very much like to know how.
>>>
>>> Cheers,
>>> Lukasz
>>>
>>> On Thu, Mar 17, 2016, at 11:06, Lukasz Janyst wrote:
>>>> Hi Marian,
>>>>
>>>> the only other not-completely-improbable explanation is that a massive
>>>> number of clients tries to connect within a time slice of one RTT.
>>>>
>>>> Cheers,
>>>> Lukasz
>>>>
>>>> On Wed, Mar 16, 2016, at 20:58, Marian Zvada wrote:
>>>>> Hi Lukasz,
>>>>>
>>>>> thanks for the feedback. Yep, it looks more like a system-wide
>>>>> scalability issue, which might or might not be connected to a bug
>>>>> in xrootd. Still, xrootd is the service being hammered by something
>>>>> here, which needs attention too.
>>>>>
>>>>> We'll watch SYNs closely on the UNL host and try to debug live when
>>>>> this occurs again.
>>>>>
>>>>> Thanks,
>>>>> Marian
>>>>>
>>>>> On 3/16/16 5:34 AM, Lukasz Janyst wrote:
>>>>>> One way to debug this would be to run wireshark to see where the
>>>>>> bogus SYN packets are coming from.
>>>>>>
>>>>>> Lukasz
>>>>>>
>>>>>> On Wed, Mar 16, 2016, at 11:27, Lukasz Janyst wrote:
>>>>>>> Isn't it a sign of either a DOS attack or a network problem? I would
>>>>>>> guess that a restart of the service helps because, by closing the
>>>>>>> listening socket, you close the corresponding kernel SYN queue.
>>>>>>>
>>>>>>> Lukasz
>>>>>>>
>>>>>>> On Wed, Mar 16, 2016, at 00:39, Marian Zvada wrote:
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> we're seeing these two types of kernel messages, which are
>>>>>>>> obviously connected to the xrootd process running on port 1094
>>>>>>>> on the US regional redirectors:
>>>>>>>>
>>>>>>>> ---
>>>>>>>> kernel: TCPv6: Possible SYN flooding on port 1094. Sending cookies.
>>>>>>>> kernel: possible SYN flooding on port 1094. Sending cookies.
>>>>>>>> ---
>>>>>>>>
>>>>>>>> This is happening intermittently on both US regional redirectors,
>>>>>>>> cmsxrootd1.fnal.gov and xrootd.unl.edu. Both are behind the DNS
>>>>>>>> alias cmsxrootd.fnal.gov. We're pretty confident that these
>>>>>>>> messages typically show up in syslog when the redirector is
>>>>>>>> giving very long waits for access to files through xrootd.
>>>>>>>>
>>>>>>>> A simple restart of the service brings response time back to
>>>>>>>> normal. We also didn't notice any significant increase in memory
>>>>>>>> or CPU use on the machines themselves, so we're wondering if
>>>>>>>> anyone from the list or the developers can explain whether this
>>>>>>>> is something to worry about. It is also hard to catch, so if you
>>>>>>>> have any idea what to watch and record next time (besides a core
>>>>>>>> file), that would help. Luckily, we at least know that when the
>>>>>>>> xrootd-fallback SAM test goes into warning state, this 'flooding'
>>>>>>>> is likely happening again...
>>>>>>>>
>>>>>>>> The FNAL and UNL regional redirectors run
>>>>>>>> xrootd-4.3.0-0.rc3.el6.x86_64. Apart from the slowness seen and
>>>>>>>> the odd kernel records in the system logs, there is nothing
>>>>>>>> obvious in the xrootd and cmsd logs to report. Do you perhaps
>>>>>>>> know which specific xrootd process chain might trigger these
>>>>>>>> kernel errors?
>>>>>>>>
>>>>>>>> Any feedback is very welcome!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marian
>>>>>>>>
>>>>>>>> ########################################################################
>>>>>>>>
>>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>>
>>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
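
On the question earlier in the thread of what to watch and record besides a core file: the "Sending cookies" message is logged when the SYN queue of a listening socket fills up, and Linux also keeps running counters for that in the TcpExt section of /proc/net/netstat. Below is a minimal sketch that pulls out a few of them. The counter names used are standard Linux ones, but the exact set varies by kernel version, so missing names are reported as 0; the function names are illustrative only.

```python
from __future__ import annotations

# Counters related to SYN-queue and accept-queue pressure (TcpExt section).
WATCHED = ("SyncookiesSent", "SyncookiesRecv", "ListenOverflows", "ListenDrops")


def parse_netstat(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/netstat-style text into {section: {counter: value}}.

    The file consists of line pairs: a header line with counter names and a
    line with the matching values, both prefixed with "<Section>:".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    sections: dict[str, dict[str, int]] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        section, names = header.split(":", 1)
        value_section, numbers = values.split(":", 1)
        if section != value_section:
            raise ValueError(f"mismatched line pair: {section} / {value_section}")
        sections[section] = dict(zip(names.split(), map(int, numbers.split())))
    return sections


def syn_pressure(text: str) -> dict[str, int]:
    """Return only the watched TcpExt counters (0 if the kernel lacks one)."""
    tcpext = parse_netstat(text).get("TcpExt", {})
    return {name: tcpext.get(name, 0) for name in WATCHED}
```

Feeding it the contents of /proc/net/netstat once a minute and logging the differences would show whether SyncookiesSent or ListenOverflows jumps at the moments the redirector becomes slow, which helps separate a burst of legitimate clients from a service that has stopped accepting connections.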