During the last `XRootD` package upgrade, the services were down afterwards, and `journald` had logged:

```
Feb 24 04:49:24 systemd[1]: Current command vanished from the unit file, execution of the command list won't be resumed.
Feb 24 04:49:24 systemd[1]: Stopping XRootD xrootd daemon instance grid...
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:24 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:24 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:24 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:24 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:24 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:24 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:25 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:25 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:25 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:25 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:25 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:25 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:25 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:25 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:26 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:26 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:26 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:26 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: start request repeated too quickly for [log in to unmask]
Feb 24 04:49:26 systemd[1]: Failed to start XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:26 systemd[1]: [log in to unmask] failed.
```

This is expected behaviour with `RestartSec=0` while all the library packages are being upgraded: for 1-2 seconds the installed library versions may not match, and `xrootd` (and potentially also `cmsd`) may segfault when trying to load them. Once `systemd` hits the `start request repeated too quickly` limit, the unit stays in the `failed` state and is not restarted automatically any more until someone restarts it manually. So depending on the scale of the update (e.g. to the new major version `5.1.0`), the operator (or their configuration management) has to revive the service manually.

I wonder if `RestartSec=5` (or something similar) would be more "resilient" during upgrades?
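For illustration, the `RestartSec=5` idea could be tried locally with a drop-in before any change lands in the packaged unit. This is only a sketch: the drop-in path and the template unit name `xrootd@.service` are assumptions (the unit name is masked in the log above), and the reasoning assumes the stock `systemd` start limit of 5 starts within 10 seconds, which a distribution or site may have changed.

```ini
# Hypothetical drop-in file, e.g.
#   /etc/systemd/system/xrootd@.service.d/restart.conf
# The unit name is an assumption; adjust it to the unit shown by
# "systemctl status" on the affected host.

[Service]
# Wait 5 seconds before each automatic restart instead of restarting
# immediately. The packaged unit already restarts on failure (see the
# "scheduling restart" lines in the log), so only the delay is changed.
RestartSec=5
```

After adding such a file, `systemctl daemon-reload` makes `systemd` pick it up. With a 5 second delay, at most two or three start attempts fall into any 10 second window, so under the assumed default limit the unit would keep retrying until the library packages are consistent again, rather than giving up after about two seconds as in the log.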
View it on GitHub: https://github.com/xrootd/xrootd/issues/1410