Cron only occasionally sends e-mail on output and errors

On Debian 8.1, I’m using a Bash feature to detect whether the stackoverflow.com website is reachable:

(echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"

This is Bash-specific and will not work in sh, the default shell of cron.

If we, on purpose, try the script in sh, we get:

$ /bin/sh: 1: cannot create /dev/tcp/stackoverflow.com/80: Directory nonexistent

Hence, if I only put the following in my personal crontab (without setting SHELL to /bin/bash) via crontab -e, I expect that once per minute, the script will be executed, and I therefore expect to also get the above error sent per mail once per minute:

* * * * * (echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"

And indeed, exactly as expected, we see from /var/log/syslog that the entry is executed once per minute:

# sudo grep stackoverflow /var/log/syslog
Aug 24 18:58:01 localhost CRON[13719]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable")
Aug 24 18:59:01 localhost CRON[13723]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable")
Aug 24 19:00:01 localhost CRON[13727]: (mat) CMD ((echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable")
...

During the last ~2 hours, this was executed more than 120 times already, as I can verify with piping the output to wc -l.

However, from these >120 times the shell command (to repeat: the shell command is invalid for /bin/sh) has been executed, I only got three e-mails:

The first one at 19:10:01, the second at 20:15:01, and the third at 20:57:01.

The content of all three mails reads exactly as expected and contains exactly the error message that is to be expected from running the script in an incompatible shell (on purpose). For example, the second mail I received reads (and the other two are virtually identical):

From [email protected]  Mon Aug 24 20:15:01 2015
From: [email protected] (Cron Daemon)
To: [email protected]
Subject: Cron  (echo >/dev/tcp/stackoverflow.com/80)&>/dev/null || echo "stackoverflow unreachable"
...

/bin/sh: 1: cannot create /dev/tcp/stackoverflow.com/80: Directory nonexistent`

From /var/log/mail.log, I see that these three mails were the only mails sent and received in the last hours.

Thus, where are the >100 additional mails we would expect to receive from cron due to the above output that is created by the erroneous script?

To summarize:

  1. Mail is configured correctly on this system, I can send and receive mails without problem with /usr/bin/sendmail.
  2. Cron is set up correctly, notices the task as expected and executes it precisely at the configured times. I have tried many other tasks and scheduling options, and cron executed them all exactly as expected.
  3. The script always writes output (see below) and we thus expect cron to send the output to me via mail for each invocation.
  4. The output is mailed to me only occasionally, and apparently ignored in most cases.

There are many ways to work around the obvious mistake that led to the above observations:

  1. I can set SHELL=/bin/bash in my crontab.
  2. I can create a heartbeat.sh with #!/bin/bash, and invoke that.
  3. I can invoke the script with /bin/bash -c ... within crontab.
  4. etc., all fixing the mistake of using a Bash-specific feature within sh.

However, all of this does not address the core issue of this question, which is that in this case, cron does not reliably send mails even though the script always creates output.

I have verified that the script always creates output by creating wrong.sh (which again on purpose uses the unsuitable /bin/sh shell, to produce the same error that cron should see):

#!/bin/sh
(echo >/dev/tcp/stackoverflow.com/80) &>/dev/null || echo "stackoverflow unreachable"

Now I can invoke the script in a loop and see if there ever is a case where it finishes without creating output. Using Bash:

$ while true; do [[ -n $(./wrong.sh 2>&1 ) ]]; echo  $?; done | grep -v 0

Even in thousands of invocations, I could not reproduce a case where the script finishes without creating output.

What may be the cause of this unpredictable behaviour? Can anyone reproduce this? To me, it looks like there may be a race condition where cron can miss a script’s output, possibly primarily involving cases where the error stems from the shell itself. Thank you!

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Upon further testing, I suspect the & is messing with your results. As you point out, &>/dev/null is bash syntax, not sh syntax. As a result, sh is creating a subshell and backgrounding it. Sure, the subshell’s echo creates stderr, but my theory is that:

  1. cron is not catching the subshell’s stderr, and
  2. the backgrounding of the subshell always completes successfully, thus bypassing your || echo ....

… causing the cron job to have no output and thus no mail. Based on my reading of the vixie-cron source, it would seem that the job’s stderr and stdout would be captured by cron, but it must be getting lost by the subshell.

Test it yourself in a /bin/sh environment (assuming you do not have a file named ‘bar’ here):

(grep foo bar) &
echo $?

Method 2

I could reproduce the phenomenon on Ubuntu 15.04 with the following crontab:

* * * * * { echo job 0; } & sleep 5
* * * * * { echo job 1; } &
* * * * * { sleep 5; echo job 2; } &

I got mails from cron with job 0 every minute, mails with job 1 occasionally (5-6 times in last 10 minutes), no mails with job 2.

So it seems cron waits for the child process’s exit, then send a mail with all stdout/stderr output it can slurp at that time. Delayed output from orphaned grandchild process is simply discarded.

Method 3

In addition to the comments above, I wonder whether a slightly simpler explanation might be possible.

Recall that any shell will expand/process the command line before it does anything. As a result when you expand with sh the ‘&’ terminates the command line and tries to put it to background (and by default at a lower priority) and will not correctly assign stdin/stdout/stderr depending on what has been redirected. So maybe the “race condition” is on how fast the shell (sh) processes the line which will clearly depend on the load on your system (and subsequently how it then interacts with cron).


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x