Hi Andy,

yes, we keep in mind to generate a core file whenever we have an event 
that forces a cmsd restart. At Nebraska we have also set a motd on the US 
redirector with the welcome message: REMINDER TO SELF: generate core file

:)
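
For next time, something along these lines is what I have in mind to keep 
on the redirectors: grab a gcore of the running cmsd and tar it up together 
with the logfile and config, so there is a single bundle to put behind a 
URL for you. Just a rough Python sketch; the log and config paths below are 
placeholders for our local ones, and it assumes gdb's gcore is installed 
and it runs as root.

---
#!/usr/bin/env python
# Rough sketch: snapshot a hung cmsd with gcore and bundle it with the
# logfile and config so the whole thing can be put behind a URL.
# Assumptions: gdb's gcore is installed, run as root, and LOGFILE/CONFIG
# below are placeholder paths to adjust for the local setup.

import os
import subprocess
import tarfile
import time

LOGFILE = "/var/log/xrootd/cmsd.log"            # placeholder path
CONFIG = "/etc/xrootd/xrootd-clustered.cfg"     # placeholder path
OUTDIR = "/tmp"


def find_pid(name="cmsd"):
    """Return the pid of the first running process whose argv[0] is `name`."""
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                argv0 = f.read().split(b"\0")[0].decode()
        except IOError:
            continue
        if os.path.basename(argv0) == name:
            return int(pid)
    raise RuntimeError("no running %s found" % name)


def main():
    pid = find_pid()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    prefix = os.path.join(OUTDIR, "cmsd-gcore-%s" % stamp)

    # gcore writes <prefix>.<pid>; it only pauses the process briefly
    subprocess.check_call(["gcore", "-o", prefix, str(pid)])
    core = "%s.%d" % (prefix, pid)

    bundle = os.path.join(OUTDIR, "cmsd-debug-%s.tar.gz" % stamp)
    with tarfile.open(bundle, "w:gz") as tar:
        for path in (core, LOGFILE, CONFIG):
            if os.path.exists(path):
                tar.add(path)
    print("debug bundle: %s" % bundle)


if __name__ == "__main__":
    main()
---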

-Marian

PS: Btw, we had a crash of cmsd on the global CERN redirectors a couple 
of days ago; I'm trying to find where Brian left the core file so I can 
send it to you for further analysis...

On 4/7/16 5:26 PM, Andrew Hanushevsky wrote:
> Hi Marian,
>
> When the cmsd "dies": if it gets unresponsive, a logfile and a gcore would
> help. If it crashes, a logfile and a core file would help here as well. In
> all cases, send the config file along with it. For the core files, a
> simple URL where I could fetch them from would work rather than sending
> the actual core file.
>
> Andy
>
> On Wed, 6 Apr 2016, Marian Zvada wrote:
>
>> Hi Lukasz,
>>
>> sorry I missed this email... We left it under the radar; my suspicion is
>> also that we had tons (more than the usual load) of client requests coming
>> at the same time... By "left it under the radar" I mean our admin will
>> watch for this particular behavior, and when it occurs again I'll try
>> to do some quick parsing through the xrootd logs to see who is doing
>> what at that moment.
>>
>> Since then I haven't heard that it occurred again. Though we had other
>> issues in the last days, cmsd dying under big load etc., but that's
>> something for another thread to open and discuss. Let's see what we find
>> out, though.
>>
>> Thanks,
>> Marian
>>
>> On 3/25/16 5:15 PM, Lukasz Janyst wrote:
>>> Hi Marian,
>>>
>>> has this issue been resolved? If so, I would very much like to know how.
>>>
>>> Cheers,
>>>     Lukasz
>>>
>>> On Thu, Mar 17, 2016, at 11:06, Lukasz Janyst wrote:
>>>> Hi Marian,
>>>>
>>>> the only other not-completely-improbable explanation is that a massive
>>>> number of clients tries to connect within a time slice of one RTT.
>>>>
>>>> Cheers,
>>>>     Lukasz
>>>>
>>>> On Wed, Mar 16, 2016, at 20:58, Marian Zvada wrote:
>>>>> Hi Lukasz,
>>>>>
>>>>> thanks for the feedback, yep, it looks more like a system-wide
>>>>> scalability issue which might or might not be connected to any bug in
>>>>> xrootd. Though xrootd is the service being hammered by something here,
>>>>> which needs attention, too.
>>>>>
>>>>> We'll watch the SYNs on the UNL host closely and try to debug live when
>>>>> this occurs again.
>>>>>
>>>>> Thanks,
>>>>> Marian
>>>>>
>>>>> On 3/16/16 5:34 AM, Lukasz Janyst wrote:
>>>>>> One way to debug this would be to run wireshark to see where the bogus
>>>>>> SYN packets are coming from.
>>>>>>
>>>>>>      Lukasz
>>>>>>
>>>>>> On Wed, Mar 16, 2016, at 11:27, Lukasz Janyst wrote:
>>>>>>> Isn't it a sign of either a DOS attack or a network problem? I would
>>>>>>> guess that a restart of the service helps because, by closing the
>>>>>>> listening socket, you close the corresponding kernel SYN queue.
>>>>>>>
>>>>>>>      Lukasz
>>>>>>>
>>>>>>> On Wed, Mar 16, 2016, at 00:39, Marian Zvada wrote:
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> we're seeing these two types of kernel messages, which are obviously
>>>>>>>> connected to the xrootd process on the US regional redirectors
>>>>>>>> running on port 1094:
>>>>>>>>
>>>>>>>> ---
>>>>>>>> kernel: TCPv6: Possible SYN flooding on port 1094. Sending cookies.
>>>>>>>> kernel: possible SYN flooding on port 1094. Sending cookies.
>>>>>>>> ---
>>>>>>>>
>>>>>>>> This is happening intermittently on both US regional redirectors,
>>>>>>>> cmsxrootd1.fnal.gov and xrootd.unl.edu. Both are behind the
>>>>>>>> DNS-aliased host cmsxrootd.fnal.gov. We're pretty confident that
>>>>>>>> this typically shows up in syslog when the redirector is giving
>>>>>>>> very long waits for access to files through xrootd.
>>>>>>>>
>>>>>>>> A simple restart of the service brings the response time back to
>>>>>>>> normal. We also didn't notice any significant increase in memory or
>>>>>>>> CPU use on the machines themselves, so we're wondering if anyone
>>>>>>>> from the list or the developers can explain whether this is
>>>>>>>> something to worry about. It is also hard to catch, so if you have
>>>>>>>> any idea what to watch for and record next time (besides a core
>>>>>>>> file), that will help. Luckily, we at least know that when the
>>>>>>>> xrootd-fallback SAM test goes into warning state, this 'flooding'
>>>>>>>> is likely happening again...
>>>>>>>>
>>>>>>>> The FNAL and UNL regional redirectors run
>>>>>>>> xrootd-4.3.0-0.rc3.el6.x86_64, and apart from the slowness seen and
>>>>>>>> the odd kernel records in the system logs, there is nothing obvious
>>>>>>>> in the xrootd and cmsd logs to report. Maybe you know which specific
>>>>>>>> xrootd process chain might trigger these kernel errors?
>>>>>>>>
>>>>>>>> Any feedback is very welcome!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marian
>>>>>>>>
>>
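
PPS: on the SYN flooding discussed further down in the quotes: as an 
alternative to a full wireshark capture, a small watcher along these lines 
could be left running on the UNL redirector to count half-open (SYN-RECV) 
connections towards port 1094, see which peers they come from, and check 
whether SYN cookies are enabled. Only a rough sketch, assuming iproute2's 
ss is available on the box; the port and sampling interval are the only 
site-specific bits.

---
#!/usr/bin/env python
# Rough sketch: every few seconds, count half-open (SYN-RECV) connections
# towards port 1094 and report the top peer addresses, plus whether the
# kernel has SYN cookies enabled. Uses iproute2's `ss`; PORT and INTERVAL
# are assumptions to adjust.

import subprocess
import time
from collections import Counter

PORT = 1094
INTERVAL = 10  # seconds between samples


def syn_recv_peers(port):
    """Return a Counter of peer IPs currently in SYN-RECV towards `port`."""
    out = subprocess.check_output(
        ["ss", "-n", "state", "syn-recv", "(", "sport", "=", ":%d" % port, ")"],
        universal_newlines=True)
    peers = Counter()
    for line in out.splitlines():
        fields = line.split()
        # skip the header line and anything that doesn't look like a socket row
        if len(fields) < 4 or fields[0].lower() in ("state", "recv-q", "netid"):
            continue
        peer = fields[-1].rsplit(":", 1)[0]   # strip the peer port
        peers[peer] += 1
    return peers


def syncookies_enabled():
    with open("/proc/sys/net/ipv4/tcp_syncookies") as f:
        return f.read().strip() != "0"


def main():
    print("tcp_syncookies enabled: %s" % syncookies_enabled())
    while True:
        peers = syn_recv_peers(PORT)
        total = sum(peers.values())
        top = ", ".join("%s (%d)" % (ip, n) for ip, n in peers.most_common(5))
        print("%s half-open=%d top peers: %s"
              % (time.strftime("%H:%M:%S"), total, top or "-"))
        time.sleep(INTERVAL)


if __name__ == "__main__":
    main()
---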

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1