Where can I see a list of kernel killed processes?

Is there some way I can check which of my processes the kernel has killed? Sometimes I log onto my server and find that something that should’ve run all night just stopped 8 hours in, and I’m unsure if it’s the application’s doing or the kernel’s.

Answers:


Method 1

If the kernel killed a process (because the system ran out of memory), there will be a kernel log message. Check /var/log/kern.log (on Debian/Ubuntu; other distributions might send kernel logs to a different file, but usually somewhere under /var/log).
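
As a rough sketch of that check, the function below filters a kernel log for lines that look like OOM-killer activity. The name oom_events is made up for this example, and the patterns are based on the usual kernel message wording (“invoked oom-killer”, “Out of memory”, “Killed process”), which can vary between kernel versions.

```shell
# Print kernel-log lines that look like OOM-killer activity.
# Reads the files given as arguments, or standard input if none are given.
# oom_events is a hypothetical helper name, not a standard command.
oom_events() {
    grep -E -i 'invoked oom-killer|out of memory|killed process' "$@"
}
```

You could call it as oom_events /var/log/kern.log, or pipe in the output of dmesg or journalctl -k on systems that keep kernel messages there. Reading these logs may require root.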

Note that if the OOM-killer (out-of-memory killer) triggered, it means you don’t have enough virtual memory. Add more swap (or perhaps more RAM).

Some process crashes are recorded in kernel logs as well (e.g. segmentation faults).

If the processes were started from cron, you should have a mail with error messages. If the processes were started from a shell in a terminal, check the errors in that terminal. Run the process in screen to see the terminal again in the morning. This might not help if the OOM-killer triggered, because it might have killed the cron or screen process as well; but if you ran into the OOM-killer, that’s the problem you need to fix.
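
If you’d rather have a record on disk than rely on mail or a terminal, one option is a small wrapper that notes how the job ended. This is only a sketch: run_logged and RUN_LOG are names invented for this example. It relies on the shell convention that a command killed by signal N is reported with exit status 128 + N, so a program that deliberately exits with a status above 128 would be misreported.

```shell
# Run a command and append a line describing how it ended to a log file.
# RUN_LOG is a hypothetical variable; the log defaults to ./run_logged.log.
run_logged() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        # Shells report "killed by signal N" as exit status 128 + N.
        result="killed by signal $((status - 128))"
    else
        result="exited with status $status"
    fi
    echo "$(date) $1: $result" >> "${RUN_LOG:-run_logged.log}"
    return "$status"
}
```

For example, run_logged ./nightly-job (a placeholder command name) appends one line to the log when the job stops, whether it exited on its own or was killed.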

Method 2

Process Accounting could help here.

In brief:

apt-get install acct

Then try commands like:
lastcomm
sa

or on Ubuntu:
lastcomm -f /var/log/account/pacct
sa /var/log/account/pacct

See:

UPDATE

Strangely, the pacct file has information about exit status, but neither lastcomm nor sa seems to print it.

So as far as I can see, you’d have to write your own C program to access the information.

UPDATE 2

Here’s a version that prints the exit code.

The last two fields are “S” for signaled and “E” for exited, followed by the signal number or exit status.

So in your case, you’re probably looking for “S 15”, meaning it got a SIGTERM. (A process killed by the OOM-killer would show “S 9” instead, because the kernel sends SIGKILL.)

sleep                X mikel    stdin      0.00 secs Fri Mar 25 20:15 S  15

Compared to “E 0” which means the process exited without an error.
true                   mikel    stdin      0.00 secs Fri Mar 25 20:16 E   0

Only minimally tested.
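
To pick out just the signaled processes from output in that format, a small awk filter might do. This is a sketch that assumes the column layout of the sample lines above (second-to-last field “S” or “E”, last field the number); signaled_only is a made-up name.

```shell
# Keep only the records whose second-to-last field is "S" (signaled)
# and print the command name together with the signal number.
# Assumes the column layout of the sample output shown earlier.
signaled_only() {
    awk 'NF >= 2 && $(NF-1) == "S" { print $1, "signal", $NF }' "$@"
}
```

Fed the two sample lines, it would print only the sleep record with signal 15 and skip the one for true.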

Method 3

sudo service --status-all

This command tells you which services are currently running and which are stopped or were never started.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
