Note: I wrote an article on Medium that explains how to create a service, and how to avoid this particular issue: Creating a Linux service with systemd.
Original question:
I’m using systemd to keep a worker script working at all times:
[Unit]
Description=My worker
After=mysqld.service

[Service]
Type=simple
Restart=always
ExecStart=/path/to/script

[Install]
WantedBy=multi-user.target
Although the restart works fine if the script exits normally after a few minutes, I’ve noticed that if it repeatedly fails to execute on startup, systemd will just give up trying to start it:
Jun 14 11:10:31 localhost systemd[1]: test.service: Main process exited, code=exited, status=1/FAILURE
Jun 14 11:10:31 localhost systemd[1]: test.service: Unit entered failed state.
Jun 14 11:10:31 localhost systemd[1]: test.service: Failed with result 'exit-code'.
Jun 14 11:10:31 localhost systemd[1]: test.service: Service hold-off time over, scheduling restart.
Jun 14 11:10:31 localhost systemd[1]: test.service: Start request repeated too quickly.
Jun 14 11:10:31 localhost systemd[1]: Failed to start My worker.
Jun 14 11:10:31 localhost systemd[1]: test.service: Unit entered failed state.
Jun 14 11:10:31 localhost systemd[1]: test.service: Failed with result 'start-limit'.
Similarly, if my worker script fails several times with an exit status of 255, systemd gives up trying to restart it:
Jun 14 11:25:51 localhost systemd[1]: test.service: Failed with result 'exit-code'.
Jun 14 11:25:51 localhost systemd[1]: test.service: Service hold-off time over, scheduling restart.
Jun 14 11:25:51 localhost systemd[1]: test.service: Start request repeated too quickly.
Jun 14 11:25:51 localhost systemd[1]: Failed to start My worker.
Jun 14 11:25:51 localhost systemd[1]: test.service: Unit entered failed state.
Jun 14 11:25:51 localhost systemd[1]: test.service: Failed with result 'start-limit'.
Is there a way to force systemd to always retry after a few seconds?
Answers:
Method 1
I would like to extend Rahul’s answer a bit.
systemd tries to restart the unit several times (StartLimitBurst) and stops trying once that attempt count is reached within StartLimitIntervalSec. Both options belong to the [Unit] section.
The default delay between restarts is 100ms (RestartSec), which causes the rate limit to be reached very quickly.
Once the limit is reached, systemd will not attempt any further automatic restarts for units with a Restart policy defined:
Note that units which are configured for Restart= and which reach the start limit are not attempted to be restarted anymore; however, they may still be restarted manually at a later point, from which point on, the restart logic is again activated.
Rahul’s answer helps because the longer delay keeps the attempt counter from reaching the limit within the StartLimitIntervalSec window. The correct answer, though, is to set both RestartSec and StartLimitBurst to reasonable values.
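As a rough illustration of that advice, a unit file might combine the settings as sketched here. It reuses the unit from the question; the specific values are assumptions picked for the example (the start-limit values simply restate the usual defaults, and RestartSec=3 is borrowed from Rahul’s answer), not tuned recommendations.

```ini
[Unit]
Description=My worker
After=mysqld.service
# Assumed values, matching the usual defaults:
# at most 5 starts are allowed within a 10-second window.
StartLimitIntervalSec=10
StartLimitBurst=5

[Service]
Type=simple
Restart=always
# Assumed value: wait 3 seconds between restarts.
# Even if the script fails immediately, five consecutive starts then
# span about 12 seconds, which is longer than the 10-second window,
# so automatic restarts alone should not trip the start limit.
RestartSec=3
ExecStart=/path/to/script

[Install]
WantedBy=multi-user.target
```

A rule of thumb that follows from this sketch: keep RestartSec multiplied by StartLimitBurst larger than StartLimitIntervalSec if the service should keep retrying indefinitely.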
Method 2
Yes, there is. You can specify a delay of x seconds before each retry under the [Service] section:
[Service]
Type=simple
Restart=always
RestartSec=3
ExecStart=/path/to/script
After saving the file, you need to reload the daemon configuration so that systemd is aware of the new file:
systemctl daemon-reload
then restart the service to apply the changes:
systemctl restart test
Also, looking at the documentation,
Restart=on-failure
sounds like a decent recommendation.
Method 3
systemd gives up trying to restart it
No. systemd gives up trying to restart it for a little while. This is clearly shown in the log that you supply:
Jun 14 11:25:51 localhost systemd[1]: test.service: Failed with result 'start-limit'.
This is rate limiting kicking in.
The length of the little while is specified in the service unit, using the StartLimitIntervalSec= setting. The number of starts needed within that interval to trigger the rate limiting mechanism is specified via the StartLimitBurst= setting. If nothing on your system differs from vanilla systemd, including the defaults for these two settings, then the limit is 5 starts within 10 seconds.
StartLimitIntervalSec=0 disables rate limiting, so systemd will retry forever rather than giving up. But making your service either not exit so often, or idle enough between exits and restarts that it does not exceed the rate limiting threshold, is a better approach.
Note that rate limiting does not care how your service exited. It triggers on the number of attempts to start/restart it, irrespective of their cause.
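For completeness, disabling the rate limit as described above might look like the following sketch. It is based on the unit from the question; the RestartSec value is an assumption added so that a crash loop does not restart the script ten times per second.

```ini
[Unit]
Description=My worker
After=mysqld.service
# 0 disables start rate limiting for this unit,
# so systemd keeps restarting it indefinitely.
StartLimitIntervalSec=0

[Service]
Type=simple
Restart=always
# Assumed value: pause briefly between restarts.
RestartSec=1
ExecStart=/path/to/script

[Install]
WantedBy=multi-user.target
```

On older systemd releases (before version 230, as far as I know) the setting was named StartLimitInterval= and belonged in the [Service] section, so it is worth checking the systemd.unit manual page for the version you are running.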
Further reading
- Lennart Poettering (2013-10-07). systemd.unit. systemd manual pages. freedesktop.org.
- Systemd’s StartLimitIntervalSec and StartLimitBurst never work
Method 4
If your service is not restarting after a reboot, please ensure you have enabled it beforehand:
sudo systemctl enable your.service
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.