Hi Marian,

When the cmsd "dies": if it gets unresponsive, a logfile and gcore would
help. If it crashes, a logfile and core file would help here as well. As
in all cases, send the config file along with it. For the core files, a
simple URL to where I could fetch it from would work rather than sending
the actual core file.

Andy

On Wed, 6 Apr 2016, Marian Zvada wrote:

> Hi Lukasz,
>
> sorry I missed this email... We left it under radar, my suspicion is also that we
> had tons (more than usual load) of client requests coming at the same time...
> I said we left it under radar meaning our admin will watch for this
> particular behavior and when it occurs again I'll try to do some quick
> parsing through the xrootd logs to see who is doing what at that moment.
>
> Since then I didn't hear it occurred again. Though, we had other issues in
> the last days, cmsd dying under big load etc, but that's something for the other
> to open and discuss. Let's see what we find out, though.
>
> Thanks,
> Marian
>
> On 3/25/16 5:15 PM, Lukasz Janyst wrote:
>> Hi Marian,
>>
>> has this issue been resolved? If so, I would very much like to know how.
>>
>> Cheers,
>> Lukasz
>>
>> On Thu, Mar 17, 2016, at 11:06, Lukasz Janyst wrote:
>>> Hi Marian,
>>>
>>> the only other not-completely-improbable explanation is that a massive
>>> number of clients try to connect within a time slice of one RTT.
>>>
>>> Cheers,
>>> Lukasz
>>>
>>> On Wed, Mar 16, 2016, at 20:58, Marian Zvada wrote:
>>>> Hi Lukasz,
>>>>
>>>> thanks for the feedback, yep, it looks more like a system-wide scalability
>>>> issue which might or might not be connected to any bug in xrootd.
>>>> Though, the xrootd is the service hammered by something here which needs
>>>> attention, too.
>>>>
>>>> We'll watch closely SYNs on the UNL host and try to debug live when this
>>>> occurs again.
>>>>
>>>> Thanks,
>>>> Marian
>>>>
>>>> On 3/16/16 5:34 AM, Lukasz Janyst wrote:
>>>>> One way to debug this would be to run wireshark to see where the bogus
>>>>> SYN packets are coming from.
>>>>>
>>>>> Lukasz
>>>>>
>>>>> On Wed, Mar 16, 2016, at 11:27, Lukasz Janyst wrote:
>>>>>> Isn't it a sign of either a DoS attack or a network problem? I would
>>>>>> guess that a restart of the service helps because, by closing the
>>>>>> listening socket, you close the corresponding kernel SYN queue.
>>>>>>
>>>>>> Lukasz
>>>>>>
>>>>>> On Wed, Mar 16, 2016, at 00:39, Marian Zvada wrote:
>>>>>>> Hi Folks,
>>>>>>>
>>>>>>> we're seeing these two types of kernel messages which are obviously
>>>>>>> connected to the xrootd process on US regional redirectors running on
>>>>>>> port 1094:
>>>>>>>
>>>>>>> ---
>>>>>>> kernel: TCPv6: Possible SYN flooding on port 1094. Sending cookies.
>>>>>>> kernel: possible SYN flooding on port 1094. Sending cookies.
>>>>>>> ---
>>>>>>>
>>>>>>> This is happening intermittently on both US regional redirectors
>>>>>>> cmsxrootd1.fnal.gov and xrootd.unl.edu. Both are behind the DNS aliased
>>>>>>> host cmsxrootd.fnal.gov. We're pretty confident that this typically
>>>>>>> occurs in syslog when the redirector is giving very long waits for
>>>>>>> access to files through xrootd.
>>>>>>>
>>>>>>> A simple restart of the service brings response time back to normal. We
>>>>>>> also didn't notice any significant increase in use of memory nor cpu on
>>>>>>> the machines themselves so we're wondering if anyone from the list or
>>>>>>> the developers may explain if this is something to worry about. It is
>>>>>>> also hard to catch so maybe if you have any idea what to watch next time
>>>>>>> and record (besides core file) that'll help. Luckily, we at least know
>>>>>>> that when we're getting the warning state of the xrootd-fallback SAM
>>>>>>> test this 'flooding' is likely happening again...
>>>>>>>
>>>>>>> FNAL and UNL regional redirectors run xrootd-4.3.0-0.rc3.el6.x86_64
>>>>>>> and, besides the slowness seen and the odd kernel records in system
>>>>>>> logs, there is nothing obvious in the xrootd and cmsd logs to report.
>>>>>>> Do you maybe know which specific xrootd process chain might trigger
>>>>>>> these kernel errors?
>>>>>>>
>>>>>>> Any feedback is very welcome!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marian
>>>>>>>
>>>>>>> ########################################################################
>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>
>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
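
As a rough aid for the "watch SYNs on the host" idea discussed in this thread, here is a minimal sketch that is not from any of the posters. It assumes a Linux host where half-open connections show up in /proc/net/tcp with state code 03 (SYN_RECV), and a little-endian machine such as x86_64. The function names and the sample table are hypothetical, made up for illustration. It handles IPv4 only and does not parse /proc/net/tcp6, even though the kernel messages above also mention TCPv6.

```python
import socket
import struct
from collections import Counter

SYN_RECV = "03"  # state code for TCP_SYN_RECV as printed in /proc/net/tcp


def decode_ipv4(hex_addr):
    """Decode a field such as '0100007F:0446' into (ip, port).

    Assumption: little-endian host, where the kernel prints the IPv4
    address with its bytes reversed.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def half_open_by_peer(proc_net_tcp_text, port=1094):
    """Count SYN_RECV entries per remote IP for one local port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != SYN_RECV:
            continue
        _, local_port = decode_ipv4(fields[1])
        if local_port != port:
            continue
        peer_ip, _ = decode_ipv4(fields[2])
        counts[peer_ip] += 1
    return counts


# Hypothetical sample in /proc/net/tcp layout (trailing columns omitted).
SAMPLE = (
    "  sl  local_address rem_address   st\n"
    "   0: 00000000:0446 00000000:0000 0A\n"  # the listening socket
    "   1: 0100007F:0446 0A0200C0:C350 03\n"
    "   2: 0100007F:0446 0A0200C0:C351 03\n"
    "   3: 0100007F:0446 0B0200C0:C352 03\n"
)

if __name__ == "__main__":
    for peer, n in half_open_by_peer(SAMPLE).most_common():
        print(peer, n)
```

On a live redirector one would pass the contents of /proc/net/tcp instead of SAMPLE, for example `half_open_by_peer(open("/proc/net/tcp").read())`, and run it while the slowness is happening. A packet capture as suggested above remains the more complete way to see where the SYNs come from.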