During the last `XRootD` package upgrade, the services were down afterwards, and `journald` had logged:
```
Feb 24 04:49:24 systemd[1]: Current command vanished from the unit file, execution of the command list won't be resumed.
Feb 24 04:49:24 systemd[1]: Stopping XRootD xrootd daemon instance grid...
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: xrootd@grid.service: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:24 systemd[1]: Unit xrootd@grid.service entered failed state.
Feb 24 04:49:24 systemd[1]: xrootd@grid.service failed.
Feb 24 04:49:24 systemd[1]: xrootd@grid.service has no holdoff time, scheduling restart.
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: xrootd@grid.service: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:24 systemd[1]: Unit xrootd@grid.service entered failed state.
Feb 24 04:49:24 systemd[1]: xrootd@grid.service failed.
Feb 24 04:49:24 systemd[1]: xrootd@grid.service has no holdoff time, scheduling restart.
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: xrootd@grid.service: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:25 systemd[1]: Unit xrootd@grid.service entered failed state.
Feb 24 04:49:25 systemd[1]: xrootd@grid.service failed.
Feb 24 04:49:25 systemd[1]: xrootd@grid.service has no holdoff time, scheduling restart.
Feb 24 04:49:25 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: xrootd@grid.service: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:25 systemd[1]: Unit xrootd@grid.service entered failed state.
Feb 24 04:49:25 systemd[1]: xrootd@grid.service failed.
Feb 24 04:49:25 systemd[1]: xrootd@grid.service has no holdoff time, scheduling restart.
Feb 24 04:49:25 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: xrootd@grid.service: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:26 systemd[1]: Unit xrootd@grid.service entered failed state.
Feb 24 04:49:26 systemd[1]: xrootd@grid.service failed.
Feb 24 04:49:26 systemd[1]: xrootd@grid.service has no holdoff time, scheduling restart.
Feb 24 04:49:26 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: start request repeated too quickly for xrootd@grid.service
Feb 24 04:49:26 systemd[1]: Failed to start XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: Unit xrootd@grid.service entered failed state.
Feb 24 04:49:26 systemd[1]: xrootd@grid.service failed.
```

This is expected behaviour with `RestartSec=0` when all the library packages are upgraded: for 1-2 seconds the installed library versions may not match, and `xrootd` (or potentially also `cmsd`) may segfault while trying to load them.
Because the restarts happen without any delay, `systemd` hits its start rate limit (`start request repeated too quickly`), and the service stays in the failed state afterwards. It is not restarted automatically again until someone restarts it manually. So, depending on the scale of the update (e.g. to the new major version `5.1.0`), the operator (or their configuration management) has to revive the service by hand.

I wonder whether `RestartSec=5` (or something similar) would make the service more resilient during upgrades.
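To illustrate why a non-zero delay could help, here is a minimal Python model of `systemd`'s per-unit start rate limiting. It is a sketch, not `systemd`'s actual code, and it rests on several assumptions: the `systemd` defaults `StartLimitIntervalSec=10s` and `StartLimitBurst=5` are in effect (the log above, with five starts followed by a refusal, is consistent with that), the libraries stay mismatched for about 2 seconds, and each start crashes roughly 0.4 seconds later, which approximates the timestamps in the log. The names `RateLimit` and `simulate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RateLimit:
    """Simplified model of systemd's per-unit start rate limiting."""
    interval: float = 10.0  # assumed default StartLimitIntervalSec (seconds)
    burst: int = 5          # assumed default StartLimitBurst
    begin: float = 0.0      # start of the current counting window
    count: int = 0          # starts counted in the current window

    def allow(self, now: float) -> bool:
        # Open a new window on the first start or once the old one expired.
        if self.count == 0 or now - self.begin > self.interval:
            self.begin = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False  # "start request repeated too quickly"


def simulate(restart_sec: float, broken_for: float = 2.0,
             crash_after: float = 0.4) -> str:
    """Return the unit's final state when the libraries are inconsistent
    for `broken_for` seconds starting at time 0 (hypothetical model)."""
    limiter = RateLimit()
    now = 0.0
    while True:
        if not limiter.allow(now):
            return "failed"   # unit stays failed until a manual restart
        if now >= broken_for:
            return "running"  # libraries consistent again, start succeeds
        # Still mismatched: the daemon segfaults shortly after starting,
        # and the next start is scheduled RestartSec after the crash.
        now += crash_after + restart_sec


for delay in (0.0, 5.0):
    print(f"RestartSec={delay:g}: {simulate(delay)}")
```

Under these assumptions, `RestartSec=0` ends in `failed` (the sixth start within ten seconds is refused, as in the log), whereas `RestartSec=5` ends in `running`, because the second start already happens after the libraries are consistent again. On a real host the delay would go into a `[Service]` drop-in for the unit; the trade-off is that every crash, not only an upgrade-related one, is then followed by a 5-second pause before the restart.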

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/xrootd/xrootd/issues/1410

########################################################################
Use REPLY-ALL to reply to list

To unsubscribe from the XROOTD-DEV list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=XROOTD-DEV&A=1