During the last XRootD package upgrade, the services were down afterwards, with journald having logged:
Feb 24 04:49:24 systemd[1]: Current command vanished from the unit file, execution of the command list won't be resumed.
Feb 24 04:49:24 systemd[1]: Stopping XRootD xrootd daemon instance grid...
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:24 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:24 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:24 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:24 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:24 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:24 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:25 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:25 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:25 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:25 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:25 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:25 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:25 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:25 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:26 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:26 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:26 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:26 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: start request repeated too quickly for [log in to unmask]
Feb 24 04:49:26 systemd[1]: Failed to start XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:26 systemd[1]: [log in to unmask] failed.
This is expected behaviour with RestartSec=0 while all the library packages are being upgraded: for 1-2 seconds the library versions may mismatch, and xrootd (or potentially also cmsd) may segfault trying to load them.
Since systemd then hits its start rate limit ("start request repeated too quickly"), the service remains in the stopped / crashed (failed) state and is not restarted automatically until someone restarts it manually. So depending on the scale of the update (e.g. to the new major version 5.1.0), the operator (or their configuration management) has to manually revive the service.
I wonder if RestartSec=5 (or something similar) would be more "resilient" during upgrades?
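
As a rough sketch of what this could look like without touching the packaged unit file, a local drop-in might override only the restart delay. The drop-in path and the template unit name xrootd@.service are assumptions here (the unit name is masked in the log above), and the file name restart.conf is arbitrary:

```ini
# Hypothetical drop-in: /etc/systemd/system/xrootd@.service.d/restart.conf
# Assumes the template unit is called xrootd@.service; adjust to the real name.
[Service]
# Wait 5 seconds before each automatic restart instead of restarting
# immediately, so a short window of mismatched libraries can pass.
RestartSec=5
```

After a systemctl daemon-reload, restarts would be spaced 5 seconds apart. Assuming systemd's default start limit of 5 starts within 10 seconds has not been changed, that spacing should keep the unit below the limit, so it would keep retrying instead of ending up in the "start request repeated too quickly" state.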