After the last XRootD package upgrade, the services were down, and journald had logged:

Feb 24 04:49:24 systemd[1]: Current command vanished from the unit file, execution of the command list won't be resumed.
Feb 24 04:49:24 systemd[1]: Stopping XRootD xrootd daemon instance grid...
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:24 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:24 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:24 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:24 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:24 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:24 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:24 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:24 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:25 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:25 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:25 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:25 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:25 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:25 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:25 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:25 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:25 systemd[1]: Started XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: [log in to unmask]: main process exited, code=killed, status=11/SEGV
Feb 24 04:49:26 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:26 systemd[1]: [log in to unmask] failed.
Feb 24 04:49:26 systemd[1]: [log in to unmask] has no holdoff time, scheduling restart.
Feb 24 04:49:26 systemd[1]: Stopped XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: start request repeated too quickly for [log in to unmask]
Feb 24 04:49:26 systemd[1]: Failed to start XRootD xrootd daemon instance grid.
Feb 24 04:49:26 systemd[1]: Unit [log in to unmask] entered failed state.
Feb 24 04:49:26 systemd[1]: [log in to unmask] failed.

This is expected behaviour with RestartSec=0 while all the library packages are being upgraded: for 1-2 seconds the installed library versions may not match, and xrootd (and potentially also cmsd) may segfault while loading them.
Once systemd reports "start request repeated too quickly", the service stays in the failed state and is no longer restarted automatically until someone restarts it manually. With no holdoff time, the five starts that systemd's default start limit allows within 10 seconds are used up in about two seconds, as the log above shows. Depending on the scale of the update (e.g. to the new major version 5.1.0), the operator (or their configuration management) therefore has to revive the service by hand.
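
The failure mode can be reproduced with a small Python model of systemd's start rate limiting. This is a sketch under stated assumptions, not systemd's code: it uses the default limit (5 starts per 10 s) and a fixed window that approximates systemd's limiter, while the 2-second library-mismatch window, the 0.1-second time-to-crash and the names StartLimit and survives_upgrade are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLimit:
    """Approximation of systemd's per-unit start rate limiter.

    Fixed window: it opens at the first start and is reset only once more
    than `interval` seconds have passed since it opened.
    """
    interval: float = 10.0  # default start-limit interval (10 s)
    burst: int = 5          # default StartLimitBurst
    begin: Optional[float] = None
    num: int = 0

    def allow(self, now: float) -> bool:
        # Open a fresh window if there is none yet or the old one expired.
        if self.begin is None or now - self.begin > self.interval:
            self.begin = now
            self.num = 0
        if self.num >= self.burst:
            return False  # "start request repeated too quickly"
        self.num += 1
        return True


def survives_upgrade(restart_sec: float, mismatch_window: float = 2.0,
                     crash_after: float = 0.1) -> bool:
    """Return True if the daemon ends up running after the upgrade.

    Hypothetical model: libraries are mismatched for the first
    `mismatch_window` seconds; any start in that window segfaults after
    `crash_after` seconds, and systemd restarts it after `restart_sec`.
    """
    limit = StartLimit()
    t = 0.0  # first start: the restart triggered by the package upgrade
    while True:
        if not limit.allow(t):
            return False  # unit stays failed until restarted by hand
        if t >= mismatch_window:
            return True   # libraries are consistent again; daemon stays up
        t += crash_after + restart_sec  # segfault, then wait RestartSec


print(survives_upgrade(restart_sec=0))  # False: the sixth start is refused at ~0.5 s
print(survives_upgrade(restart_sec=5))  # True: the second start happens after the upgrade
```

As in the log above, the model allows five starts (the upgrade's own restart plus four automatic ones) and refuses the sixth.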

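Would RestartSec=5 (or something similar) make the service more "resilient" during upgrades? With five seconds between restarts, only two or three starts fit into the default 10-second window, so the limit of five would not be reached while the packages are being replaced.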
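
A back-of-the-envelope check of the proposal, as a hypothetical Python sketch under the same simplified model: the start that systemd refuses is number burst + 1, which happens burst * (RestartSec + time-to-crash) seconds after the first one, so a crash loop can only exhaust the limit if that product is at most the start-limit interval. The helper names are made up for illustration, and installing the resulting drop-in (e.g. via `systemctl edit` on the xrootd@.service template, an assumed unit name) should be checked against the packaged unit files.

```python
def start_limit_can_trip(restart_sec: float, interval: float = 10.0,
                         burst: int = 5, crash_after: float = 0.0) -> bool:
    """Return True if a long enough crash loop can exhaust the start limit.

    Simplified model (an assumption, not systemd's exact code): consecutive
    starts are `restart_sec + crash_after` seconds apart, and the refused
    start (number burst + 1) comes burst * step seconds after the first.
    """
    step = restart_sec + crash_after
    return burst * step <= interval


def render_dropin(restart_sec: int) -> str:
    """Render a hypothetical systemd drop-in that only overrides RestartSec."""
    return f"[Service]\nRestartSec={restart_sec}\n"


print(start_limit_can_trip(0))  # True: the current RestartSec=0 setup
print(start_limit_can_trip(5))  # False: 5 * 5 s = 25 s > 10 s
print(render_dropin(5), end="")
```

With the defaults, any RestartSec above 2 s (10 s / 5 starts) keeps a crash loop from exhausting the limit in this model, so 5 s leaves a comfortable margin.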

GitHub issue: https://github.com/xrootd/xrootd/issues/1410 (XROOTD-DEV mailing list)