Hi Andy,

yes, we remember to generate a core file whenever we have an event that
forces a cmsd restart. At Nebraska we have also set the motd on the US
redirector with the welcome message:

REMINDER TO SELF: generate core file :)

-Marian

PS: Btw, we had a cmsd crash on the global CERN redirectors a couple of
days ago; I'm trying to find where Brian left the core file so I can
send it to you for further analysis...

On 4/7/16 5:26 PM, Andrew Hanushevsky wrote:
> Hi Marian,
>
> When the cmsd "dies": if it becomes unresponsive, a logfile and a
> gcore would help. If it crashes, a logfile and the core file would
> help here as well. As in all cases, send the config file along with
> it. For core files, a simple URL from which I could fetch the file
> would work, rather than sending the actual core file.
>
> Andy
>
> On Wed, 6 Apr 2016, Marian Zvada wrote:
>
>> Hi Lukasz,
>>
>> sorry I missed this email... We left it under the radar; my suspicion
>> is also that we had tons of client requests (more than the usual
>> load) coming in at the same time... By "under the radar" I mean that
>> our admin will watch for this particular behavior, and when it occurs
>> again I'll try to do some quick parsing of the xrootd logs to see who
>> is doing what at that moment.
>>
>> I haven't heard of it occurring again since then. We did have other
>> issues in the last few days, cmsd dying under heavy load etc., but
>> that's something for another thread. Let's see what we find out,
>> though.
>>
>> Thanks,
>> Marian
>>
>> On 3/25/16 5:15 PM, Lukasz Janyst wrote:
>>> Hi Marian,
>>>
>>> has this issue been resolved? If so, I would very much like to know
>>> how.
>>>
>>> Cheers,
>>> Lukasz
>>>
>>> On Thu, Mar 17, 2016, at 11:06, Lukasz Janyst wrote:
>>>> Hi Marian,
>>>>
>>>> the only other not-completely-improbable explanation is that a
>>>> massive number of clients try to connect within a time slice of one
>>>> RTT.
>>>>
>>>> Cheers,
>>>> Lukasz
>>>>
>>>> On Wed, Mar 16, 2016, at 20:58, Marian Zvada wrote:
>>>>> Hi Lukasz,
>>>>>
>>>>> thanks for the feedback; yep, it looks more like a system-wide
>>>>> scalability issue, which might or might not be connected to a bug
>>>>> in xrootd. Still, xrootd is the service being hammered by
>>>>> something here, and that needs attention, too.
>>>>>
>>>>> We'll closely watch the SYNs on the UNL host and try to debug live
>>>>> when this occurs again.
>>>>>
>>>>> Thanks,
>>>>> Marian
>>>>>
>>>>> On 3/16/16 5:34 AM, Lukasz Janyst wrote:
>>>>>> One way to debug this would be to run wireshark to see where the
>>>>>> bogus SYN packets are coming from.
>>>>>>
>>>>>> Lukasz
>>>>>>
>>>>>> On Wed, Mar 16, 2016, at 11:27, Lukasz Janyst wrote:
>>>>>>> Isn't it a sign of either a DOS attack or a network problem? I
>>>>>>> would guess that a restart of the service helps because, by
>>>>>>> closing the listening socket, you close the corresponding kernel
>>>>>>> SYN queue.
>>>>>>>
>>>>>>> Lukasz
>>>>>>>
>>>>>>> On Wed, Mar 16, 2016, at 00:39, Marian Zvada wrote:
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> we're seeing these two types of kernel messages, which are
>>>>>>>> obviously connected to the xrootd process running on port 1094
>>>>>>>> on the US regional redirectors:
>>>>>>>>
>>>>>>>> ---
>>>>>>>> kernel: TCPv6: Possible SYN flooding on port 1094. Sending cookies.
>>>>>>>> kernel: possible SYN flooding on port 1094. Sending cookies.
>>>>>>>> ---
>>>>>>>>
>>>>>>>> This is happening intermittently on both US regional redirectors,
>>>>>>>> cmsxrootd1.fnal.gov and xrootd.unl.edu. Both are behind the
>>>>>>>> DNS-aliased host cmsxrootd.fnal.gov. We're pretty confident that
>>>>>>>> this typically shows up in syslog when the redirector is giving
>>>>>>>> very long waits for access to files through xrootd.
>>>>>>>>
>>>>>>>> A simple restart of the service brings the response time back to
>>>>>>>> normal.
>>>>>>>> We also didn't notice any significant increase in memory or CPU
>>>>>>>> usage on the machines themselves, so we're wondering whether
>>>>>>>> anyone on the list or any of the developers can explain if this is
>>>>>>>> something to worry about. It is also hard to catch, so if you have
>>>>>>>> any idea what to watch and record next time (besides a core file),
>>>>>>>> that would help. Luckily, we at least know that when the
>>>>>>>> xrootd-fallback SAM test goes into a warning state, this 'flooding'
>>>>>>>> is likely happening again...
>>>>>>>>
>>>>>>>> The FNAL and UNL regional redirectors run
>>>>>>>> xrootd-4.3.0-0.rc3.el6.x86_64, and apart from the slowness and the
>>>>>>>> odd kernel records in the system logs, there is nothing obvious in
>>>>>>>> the xrootd and cmsd logs to report. Do you maybe know which
>>>>>>>> specific xrootd process chain might trigger these kernel errors?
>>>>>>>>
>>>>>>>> Any feedback is very welcome!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marian
>>>>>>>>
>>>>>>>> ########################################################################
>>>>>>>>
>>>>>>>> Use REPLY-ALL to reply to list
>>>>>>>>
>>>>>>>> To unsubscribe from the XROOTD-L list, click the following link:
>>>>>>>> https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-L&A=1
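
On the question of what to watch and record next time: one lightweight complement to the wireshark capture suggested in the thread is to snapshot the half-open (SYN_RECV) connections on port 1094 while the warning is active and group them by client address. Below is a minimal Python sketch, assuming a Linux host whose kernel lists these entries in /proc/net/tcp and /proc/net/tcp6 with state code 03. The script and its helper names (decode_addr, syn_recv_by_remote) are hypothetical and not part of xrootd or any existing tool.

```python
#!/usr/bin/env python3
"""Hypothetical helper: count half-open (SYN_RECV) TCP entries on one port.

Assumes a Linux host; reads /proc/net/tcp and /proc/net/tcp6 and groups
SYN_RECV entries on the given local port by remote (client) address.
"""
import socket
import struct
import sys
from collections import Counter

SYN_RECV = "03"  # state code the kernel prints for TCP_SYN_RECV


def decode_addr(hex_addr: str) -> str:
    """Turn the hex address field of /proc/net/tcp{,6} into a printable IP.

    The kernel prints each 32-bit word of the address as an integer in host
    byte order, so every 8-hex-digit word is re-packed natively ("=I").
    """
    packed = b"".join(
        struct.pack("=I", int(hex_addr[i:i + 8], 16))
        for i in range(0, len(hex_addr), 8)
    )
    family = socket.AF_INET if len(packed) == 4 else socket.AF_INET6
    return socket.inet_ntop(family, packed)


def syn_recv_by_remote(port, tables=("/proc/net/tcp", "/proc/net/tcp6")):
    """Return a Counter of remote address -> number of SYN_RECV entries."""
    counts = Counter()
    for path in tables:
        try:
            with open(path) as f:
                rows = f.readlines()[1:]  # first line is the column header
        except FileNotFoundError:
            continue  # e.g. IPv6 disabled, or not running on Linux
        for row in rows:
            fields = row.split()
            if len(fields) < 4:
                continue
            local, remote, state = fields[1], fields[2], fields[3]
            local_port = int(local.rsplit(":", 1)[1], 16)  # port is plain hex
            if state == SYN_RECV and local_port == port:
                counts[decode_addr(remote.rsplit(":", 1)[0])] += 1
    return counts


if __name__ == "__main__":
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 1094
    for addr, n in syn_recv_by_remote(port).most_common(20):
        print(f"{n:6d}  {addr}")
```

Run it during an episode, e.g. `python3 synrecv.py 1094` (file name hypothetical). A few addresses dominating the counts might hint at a misbehaving client or a network problem, while many distinct addresses with small counts would fit the "many clients within one RTT" explanation, although spoofed sources could look the same. Also, while the kernel is answering with SYN cookies it keeps no queue entry for those SYNs, so the snapshot only shows what is still in the queue.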