After a recent upgrade to Fedora 15, I’m finding that a number of tools are failing with errors along the lines of:
tail: inotify resources exhausted
tail: inotify cannot be used, reverting to polling
It’s not just tail that’s reporting problems with inotify, either. Is there any way to interrogate the kernel to find out what process or processes are consuming the inotify resources? The current inotify-related sysctl settings look like this:
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
fs.inotify.max_queued_events = 16384
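For reference, the same limits can be read straight from procfs. The following is only a minimal sketch: the function name show_inotify_limits and its optional directory argument are made up for illustration (the argument exists so the function can be pointed at a copy of the files), and it assumes the limits live under /proc/sys/fs/inotify as they do on a typical Linux system.

```shell
# Print each inotify limit as "fs.inotify.<name> = <value>".
# The directory defaults to the usual procfs location; the optional
# argument is a hypothetical convenience for testing against a copy.
show_inotify_limits() {
    dir="${1:-/proc/sys/fs/inotify}"
    for f in "$dir"/max_*; do
        # Skip anything unreadable (or an unmatched glob pattern).
        [ -r "$f" ] || continue
        printf 'fs.inotify.%s = %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Calling show_inotify_limits with no argument should print roughly what `sysctl fs.inotify` prints.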
Answers:
Method 1
It seems that if a process creates an inotify instance via inotify_init(), the resulting file descriptor shows up in the /proc filesystem as a symlink to a (non-existent) ‘anon_inode:inotify’ file.
$ cd /proc/5317/fd
$ ls -l
total 0
lrwx------ 1 puzel users 64 Jun 24 10:36 0 -> /dev/pts/25
lrwx------ 1 puzel users 64 Jun 24 10:36 1 -> /dev/pts/25
lrwx------ 1 puzel users 64 Jun 24 10:36 2 -> /dev/pts/25
lr-x------ 1 puzel users 64 Jun 24 10:36 3 -> anon_inode:inotify
lr-x------ 1 puzel users 64 Jun 24 10:36 4 -> anon_inode:inotify
Unless I have misunderstood the concept, the following command should show a list of processes (as their paths in /proc), sorted by the number of inotify instances they use.
$ for foo in /proc/*/fd/*; do readlink -f $foo; done | grep inotify | sort | uniq -c | sort -nr
Finding the culprits
Via the comments below @markkcowan mentioned this:
$ find /proc/*/fd/* -type l -lname 'anon_inode:inotify' -exec sh -c 'cat $(dirname {})/../cmdline; echo ""' \; 2>/dev/null
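The two commands above can be combined so that the instance count, the PID and a readable command line end up on one line. This is only a sketch under a few assumptions: the function name inotify_instances_by_pid and its optional proc-root argument are hypothetical, and it swaps find for a plain loop with readlink, which should give the same result. Processes owned by other users are silently skipped unless it is run as root.

```shell
# Print "COUNT PID COMMAND" for every process holding inotify instances,
# highest count first. The proc root defaults to /proc; the optional
# argument is a hypothetical convenience for testing against a fake tree.
inotify_instances_by_pid() {
    proc_root="${1:-/proc}"
    for link in "$proc_root"/[0-9]*/fd/*; do
        # Each inotify instance appears as a symlink to anon_inode:inotify.
        [ "$(readlink "$link" 2>/dev/null)" = "anon_inode:inotify" ] || continue
        pid_dir="${link%/fd/*}"
        echo "${pid_dir##*/}"
    done | sort -n | uniq -c | sort -k1,1nr -k2,2n | while read -r count pid; do
        cmd=""
        if [ -r "$proc_root/$pid/cmdline" ]; then
            # cmdline is NUL-separated; turn the separators into spaces.
            cmd=$(tr '\0' ' ' < "$proc_root/$pid/cmdline" | sed 's/ *$//')
        fi
        printf '%s %s %s\n' "$count" "$pid" "$cmd"
    done
}
```

Kernel threads have an empty cmdline, so their COMMAND column would simply be blank.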
Method 2
As @Jonathan Kamens said, you are probably running out of watches. I have a premade script, inotify-consumers, that lists the top offenders for you (a newer version also lists the username owning the process, see below):
$ time inotify-consumers
INOTIFY
WATCHER
COUNT PID CMD
----------------------------------------
6688 27262 /home/dvlpr/apps/WebStorm-2018.3.4/WebStorm-183.5429.34/bin/fsnotifier64
411 27581 node /home/dvlpr/dev/kiwi-frontend/node_modules/.bin/webpack --config config/webpack.dev.js
79 1541 /usr/lib/gnome-settings-daemon/gsd-xsettings
30 1664 /usr/lib/gvfs/gvfsd-trash --spawner :1.22 /org/gtk/gvfs/exec_spaw/0
14 1630 /usr/bin/gnome-software --gapplication-service
....
7489 watches TOTAL COUNT
real 0m0.099s
user 0m0.042s
sys 0m0.062s
Here you can quickly see why the default limit of 8K watches is too low on a development machine: a single WebStorm instance quickly maxes it out when it encounters a node_modules folder with thousands of subfolders. Add a webpack watcher and problems are guaranteed …
Even though it was much faster than the other alternatives when I made it initially, Simon Matter added some speed enhancements for heavily loaded Big Iron Linux (hundreds of cores) that sped it up immensely, taking it down from ten minutes (!) to 15 seconds on his monster rig.
How to use
inotify-consumers --help 😊 To get it on your machine, just copy the contents of the script and put it somewhere in your $PATH, like /usr/local/bin. Alternatively, if you trust this stranger on the net, you can avoid copying it and pipe it into bash over http:
$ curl -s https://raw.githubusercontent.com/fatso83/dotfiles/master/utils/scripts/inotify-consumers | bash
INOTIFY
WATCHER
COUNT PID USER COMMAND
--------------------------------------
3044 3933 myuser node /usr/local/bin/tsserver
2965 3941 myuser /usr/local/bin/node /home/myuser/.config/coc/extensions/node_modules/coc-tsserver/bin/tsserverForkStart /hom
979 3954 myuser /usr/local/bin/node /home/myuser/.config/coc/extensions/node_modules/coc-tsserver/node_modules/typescript/li
1 7473 myuser /usr/local/bin/node --no-warnings /home/myuser/dev/dotfiles/common-setup/vim/dotvim/plugged/coc.nvim/build/i
1 3899 myuser /usr/local/bin/node --no-warnings /home/myuser/dev/dotfiles/common-setup/vim/dotvim/plugged/coc.nvim/build/i
6990 watches TOTAL COUNT
How does it work?
For reference, the main content of the script is simply this (inspired by this answer)
find /proc/*/fd \
    -lname anon_inode:inotify \
    -printf '%hinfo/%f\n' 2>/dev/null \
    | xargs grep -c '^inotify' \
    | sort -n -t: -k2 -r
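The pipeline above prints one line per inotify file descriptor, so a process with several instances appears several times. As a rough sketch (not the actual inotify-consumers code), the per-descriptor counts can be summed per PID like this. The function name inotify_watches_by_pid and its optional proc-root argument are hypothetical, and a readlink loop stands in for GNU find's -lname/-printf.

```shell
# Print "WATCHES PID" per process, highest first, by counting the lines
# that start with "inotify" in the fdinfo file of each inotify descriptor.
# The proc root defaults to /proc; the argument is only there for testing.
inotify_watches_by_pid() {
    proc_root="${1:-/proc}"
    for link in "$proc_root"/[0-9]*/fd/*; do
        [ "$(readlink "$link" 2>/dev/null)" = "anon_inode:inotify" ] || continue
        pid_dir="${link%/fd/*}"
        fd="${link##*/}"
        # grep -c prints the number of matching lines, one per watch.
        n=$(grep -c '^inotify' "$pid_dir/fdinfo/$fd" 2>/dev/null) || true
        echo "${pid_dir##*/} ${n:-0}"
    done | awk '{ total[$1] += $2 } END { for (pid in total) print total[pid], pid }' \
         | sort -k1,1nr -k2,2n
}
```

Run as root to see every process; as a normal user only your own processes are readable.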
Changing the limits
In case you are wondering how to increase the limits
$ inotify-consumers --limits

Current limits
-------------
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 524288

Changing settings permanently
-----------------------------
echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p # re-read config
Method 3
You are probably running out of inotify watches rather than instances. To find out who’s creating a lot of watches:
- Enable tracing of watch adds:
$ echo 1 > /sys/kernel/debug/tracing/events/syscalls/sys_exit_inotify_add_watch/enable
- Verify that tracing_on is set to 1:
$ cat /sys/kernel/debug/tracing/tracing_on
0
$ echo 1 > /sys/kernel/debug/tracing/tracing_on
- Restart the processes with inotify instances (determined as described in Petr Uzel’s answer) that you suspect of creating a lot of watches; and
- Set up ftrace:
$ cat /sys/kernel/debug/tracing/current_tracer
nop
$ cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####
$ echo function > /sys/kernel/debug/tracing/current_tracer
$ echo SyS_inotify_add_watch > /sys/kernel/debug/tracing/set_ftrace_filter
- Read the file /sys/kernel/debug/tracing/trace to watch how many watches are created and by which processes.
When you’re done, make sure to echo 0 into the enable file (and the tracing_on file if you had to enable that as well) to turn off tracing so you won’t incur the performance hit of continuing to trace.
NOTE: In older versions of the Linux kernel the /sys endpoint used to be called tracing_enabled, however it’s now called tracing_on. If you find you’re on an older edition of the kernel change /sys/kernel/debug/tracing/tracing_on to /sys/kernel/debug/tracing/tracing_enabled.
Method 4
To trace which processes consume inotify watches (not instances) you can use the dynamic ftrace feature of the kernel if it is enabled in your kernel.
The kernel option you need is CONFIG_DYNAMIC_FTRACE.
First mount the debugfs filesystem if it is not already mounted.
mount -t debugfs nodev /sys/kernel/debug
Go under the tracing subdirectory of this debugfs directory
cd /sys/kernel/debug/tracing
Enable tracing of function calls
echo function > current_tracer
Filter only SyS_inotify_add_watch system calls
echo SyS_inotify_add_watch > set_ftrace_filter
Clear the trace ring buffer if it wasn’t empty
echo > trace
Enable tracing if it is not already enabled
echo 1 > tracing_on
Restart the suspected process (in my case it was crashplan, a backup application)
Watch the inotify_watch being exhausted
wc -l trace
cat trace
Done
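Reading the raw trace gets tedious once there are thousands of entries. A small sketch like the one below can tally the entries per process. It assumes the usual function-tracer layout, where header lines start with "#" and the first column of each entry is TASK-PID; the function name summarize_inotify_trace is made up for illustration, and a process name containing spaces would be split incorrectly.

```shell
# Count trace entries per TASK-PID (first column of the ftrace output),
# skipping the "#" header lines. Highest count first.
# The trace file defaults to the debugfs location used above.
summarize_inotify_trace() {
    trace_file="${1:-/sys/kernel/debug/tracing/trace}"
    awk '!/^#/ && NF { calls[$1]++ } END { for (t in calls) print calls[t], t }' "$trace_file" \
        | sort -k1,1nr -k2,2
}
```

Since the filter only lets SyS_inotify_add_watch through, each counted entry corresponds to one attempt to add a watch.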
Method 5
I ran into this problem, and none of these answers give you the answer of “how many watches is each process currently using?” The one-liners all give you how many instances are open, which is only part of the story, and the trace stuff is only useful to see new watches being opened.
TL;DR: This will get you a file with a list of open inotify instances and the number of watches they have, along with the pids and binaries that spawned them, sorted in descending order by watch count:
sudo lsof | awk '/anon_inode/ { gsub(/[urw]$/,"",$4); print "/proc/"$2"/fdinfo/"$4; }' | while read fdi; do count=$(sudo grep -c inotify $fdi); exe=$(sudo readlink $(dirname $(dirname $fdi))/exe); echo -e $count"\t"$fdi"\t"$exe; done | sort -nr > watches
That’s a big ball of mess, so here’s how I got there. To start, I ran a tail on a test file, and looked at the fd’s it opened:
joel@gladstone:~$ tail -f test > /dev/null &
[3] 22734
joel@opx1:~$ ls -ahltr /proc/22734/fd
total 0
dr-xr-xr-x 9 joel joel 0 Feb 22 22:34 ..
dr-x------ 2 joel joel 0 Feb 22 22:34 .
lr-x------ 1 joel joel 64 Feb 22 22:35 4 -> anon_inode:inotify
lr-x------ 1 joel joel 64 Feb 22 22:35 3 -> /home/joel/test
lrwx------ 1 joel joel 64 Feb 22 22:35 2 -> /dev/pts/2
l-wx------ 1 joel joel 64 Feb 22 22:35 1 -> /dev/null
lrwx------ 1 joel joel 64 Feb 22 22:35 0 -> /dev/pts/2
So, 4 is the fd we want to investigate. Let’s see what’s in the fdinfo for that:
joel@opx1:~$ cat /proc/22734/fdinfo/4
pos: 0
flags: 00
mnt_id: 11
inotify wd:1 ino:15f51d sdev:ca00003 mask:c06 ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:1df51500a75e538c
That looks like an entry for the watch at the bottom!
Let’s try something with more watches, this time with the inotifywait utility, just watching whatever is in /tmp:
joel@gladstone:~$ inotifywait /tmp/* &
[4] 27862
joel@gladstone:~$ Setting up watches.
Watches established.
joel@gladstone:~$ ls -ahtlr /proc/27862/fd | grep inotify
lr-x------ 1 joel joel 64 Feb 22 22:41 3 -> anon_inode:inotify
joel@gladstone:~$ cat /proc/27862/fdinfo/3
pos: 0
flags: 00
mnt_id: 11
inotify wd:6 ino:7fdc sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:dc7f0000551e9d88
inotify wd:5 ino:7fcb sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:cb7f00005b1f9d88
inotify wd:4 ino:7fcc sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:cc7f00006a1d9d88
inotify wd:3 ino:7fc6 sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:c67f00005d1d9d88
inotify wd:2 ino:7fc7 sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:c77f0000461d9d88
inotify wd:1 ino:7fd7 sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:d77f00000053c98b
Aha! More entries! So we should have six things in /tmp then:
joel@opx1:~$ ls /tmp/ | wc -l
6
Excellent. My new inotifywait has one entry in its fd list (which is what the other one-liners here are counting), but six entries in its fdinfo file. So we can figure out how many watches a given fd for a given process is using by consulting its fdinfo file. Now to put it together with some of the above to grab a list of processes that have inotify watches open and use that to count the entries in each fdinfo. This is similar to above, so I’ll just dump the one-liner here:
sudo lsof | awk '/anon_inode/ { gsub(/[urw]$/,"",$4); print "/proc/"$2"/fdinfo/"$4; }' | while read fdi; do count=$(sudo grep -c inotify $fdi); echo -e $count"\t"$fdi; done
There’s some thick stuff in here, but the basics are that I use awk to build an fdinfo path from the lsof output, grabbing the pid and fd number, stripping the u/r/w flag from the latter. Then for each constructed fdinfo path, I count the number of inotify lines and output the count and the pid.
It would be nice if I had what processes these pids represent in the same place though, right? I thought so. So, in a particularly messy bit, I settled on calling dirname twice on the fdinfo path to get back to /proc/<pid>, adding /exe to it, and then running readlink on that to get the exe name of the process. Throw that in there as well, sort it by number of watches, and redirect it to a file for safe-keeping and we get:
sudo lsof | awk '/anon_inode/ { gsub(/[urw]$/,"",$4); print "/proc/"$2"/fdinfo/"$4; }' | while read fdi; do count=$(sudo grep -c inotify $fdi); exe=$(sudo readlink $(dirname $(dirname $fdi))/exe); echo -e $count"\t"$fdi"\t"$exe; done | sort -nr > watches
Running that without sudo to just show my processes I launched above, I get:
joel@gladstone:~$ cat watches
6 /proc/4906/fdinfo/3 /usr/bin/inotifywait
1 /proc/22734/fdinfo/4 /usr/bin/tail
Perfect! A list of processes, fd’s, and how many watches each is using, which is exactly what I needed.
Method 6
find /proc/*/fd/* -type l -lname 'anon_inode:inotify' 2>/dev/null | cut -f 1-4 -d'/' | sort | uniq -c | sort -nr
Method 7
I just wrote a C++ app to help track down inotify information. It should be able to display summary information along with the files and directories being watched.
https://github.com/mikesart/inotify-info
Hopefully this helps track down what the limits are and where they’re being hit.
Method 8
I have modified the script above to show the list of processes that are consuming inotify resources:
ps -p `find /proc/*/fd/* -type l -lname 'anon_inode:inotify' -print | sed s/'^\/proc\/'/''/ | sed s/'\/fd.*$'/''/`
I think there is a way to replace my double sed.
Yes. Use either
cut -f 3 -d '/'
or
sed -e 's/^\/proc\/\([0-9]*\)\/.*/\1/'
and you’ll only get the pid.
Also, if you add
2> /dev/null
in the find, you’ll get rid of any pesky error lines thrown by find. So this would work:
ps -p $(find /proc/*/fd/* -type l -lname 'anon_inode:inotify' -print 2> /dev/null | sed -e 's/^\/proc\/\([0-9]*\)\/.*/\1/')
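As a further alternative to the sed and cut variants, the PID can be pulled out of a path with plain POSIX parameter expansion, which avoids spawning an extra process per path. This is just a sketch, and the function name pid_from_fd_path is made up for illustration.

```shell
# Extract the PID from a path such as /proc/5317/fd/3 using only
# POSIX parameter expansion (no sed or cut needed).
pid_from_fd_path() {
    rest="${1#/proc/}"            # strip the leading /proc/      -> 5317/fd/3
    printf '%s\n' "${rest%%/*}"   # drop everything from the first / on
}
```

For example, pid_from_fd_path /proc/5317/fd/3 prints 5317.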
Method 9
We needed to run this script against a fleet of servers, so we wrote an Ansible playbook to accomplish this. It adapts several of the concepts in the other answers into a single playbook which runs the commands necessary to generate a report showing inotify watcher usage by PID.
$ ansible-playbook \
    -i systems-inventory/cluster2.lab1 \
    playbooks/show_inotify_watcher_cnt.yml \
    -l ocp-master-02a.lab1*
NOTE: An example of using this playbook is included as a comment towards the end of the playbook below.
$ cat show_inotify_watcher_cnt.yml
###########################################################
# References
###########################################################
# - https://stackoverflow.com/questions/40230184/how-to-do-multiline-shell-script-in-ansible
###########################################################
- hosts: compute infra masters
  tasks:
    - shell:
        cmd: |
          # The quoted 'EOF' stops the shell from expanding $(...), $vars and
          # backticks while the script is being written to disk.
          cat <<'EOF' > /tmp/inotify_cnt.sh
          #!/bin/bash
          ## Get the procs sorted by the number of inotify watchers
          ##
          ## From `man find`:
          ##    %h  Leading directories of file's name (all but the last element).
          ##        If the file name contains no slashes (since it is in the current directory)
          ##        the %h specifier expands to `.'.
          ##    %f  File's name with any leading directories removed (only the last element).
          lines=$(
              find /proc/*/fd \
              -lname anon_inode:inotify \
              -printf '%hinfo/%f\n' 2>/dev/null \
              | xargs grep -c '^inotify' \
              | sort -n -t: -k2 -r
              )
          printf "\n%10s\n" "INOTIFY"
          printf "%10s\n" "WATCHER"
          printf "%10s %5s %s\n" " COUNT " "PID" "CMD"
          printf -- "----------------------------------------\n"
          for line in $lines; do
              watcher_count=$(echo $line | sed -e 's/.*://')
              pid=$(echo $line | sed -e 's/\/proc\/\([0-9]*\)\/.*/\1/')
              cmdline=$(ps --columns 120 -o command -h -p $pid)
              printf "%8d %7d %s\n" "$watcher_count" "$pid" "$cmdline"
          done
          EOF
      become: yes
    - file:
        dest: /tmp/inotify_cnt.sh
        mode: a+x
    - shell: /tmp/inotify_cnt.sh
      become: yes
      register: output
    - debug:
        var: output.stdout_lines
    - shell: |
        sysctl fs.inotify
      become: yes
      register: output
    - debug:
        var: output.stdout_lines
##########
# USAGE
##########
## $ ansible-playbook -i systems-inventory/cluster2.lab1 playbooks/show_inotify_watcher_cnt.yml -l ocp-master-02a.lab1*
##
## PLAY [compute infra masters] *****************************************************************************************************************************************************
##
## TASK [Gathering Facts] ***********************************************************************************************************************************************************
## ok: [ocp-master-02a.lab1.mydomclec.local]
##
## TASK [shell] *********************************************************************************************************************************************************************
## changed: [ocp-master-02a.lab1.mydomclec.local]
##
## TASK [file] **********************************************************************************************************************************************************************
## ok: [ocp-master-02a.lab1.mydomclec.local]
##
## TASK [shell] *********************************************************************************************************************************************************************
## changed: [ocp-master-02a.lab1.mydomclec.local]
##
## TASK [debug] *********************************************************************************************************************************************************************
## ok: [ocp-master-02a.lab1.mydomclec.local] => {
## "output.stdout_lines": [
## "",
## " INOTIFY",
## " WATCHER",
## " COUNT PID CMD",
## "----------------------------------------",
## " 957 6553 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-",
## " 11 856 /usr/lib/systemd/systemd-udevd",
## " 11 1457 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid",
## " 10 1471 /usr/sbin/rpc.gssd",
## " 5 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
## " 5 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
## " 5 1508 /usr/sbin/NetworkManager --no-daemon",
## " 4 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
## " 4 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
## " 4 1508 /usr/sbin/NetworkManager --no-daemon",
## " 4 1378 /usr/lib/polkit-1/polkitd --no-debug",
## " 3 4211 tail --follow=name /var/log/openvswitch/ovs-vswitchd.log /var/log/openvswitch/ovsdb-server.log",
## " 3 1970 /usr/sbin/crond -n",
## " 3 1378 /usr/lib/polkit-1/polkitd --no-debug",
## " 2 1893 /usr/sbin/rsyslogd -n",
## " 2 1389 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation",
## " 1 9012 virt-handler -v 3 --port 8443 --hostname-override ocp-master-02a.lab1.mydomclec.local --pod-ip-address 172.20.0.201",
## " 1 9012 virt-handler -v 3 --port 8443 --hostname-override ocp-master-02a.lab1.mydomclec.local --pod-ip-address 172.20.0.201",
## " 1 9012 virt-handler -v 3 --port 8443 --hostname-override ocp-master-02a.lab1.mydomclec.local --pod-ip-address 172.20.0.201",
## " 1 9012 virt-handler -v 3 --port 8443 --hostname-override ocp-master-02a.lab1.mydomclec.local --pod-ip-address 172.20.0.201",
## " 1 6553 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-",
## " 1 6553 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-",
## " 1 1 /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
## " 1 1388 /usr/sbin/sssd -i --logger=files",
## " 1 1359 /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_de",
## " 1 1347 /usr/sbin/abrtd -d -s",
## " 0 6553 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-"
## ]
## }
##
## TASK [shell] *********************************************************************************************************************************************************************
## changed: [ocp-master-02a.lab1.mydomclec.local]
##
## TASK [debug] *********************************************************************************************************************************************************************
## ok: [ocp-master-02a.lab1.mydomclec.local] => {
## "output.stdout_lines": [
## "fs.inotify.max_queued_events = 16384",
## "fs.inotify.max_user_instances = 128",
## "fs.inotify.max_user_watches = 65536"
## ]
## }
##
## PLAY RECAP ***********************************************************************************************************************************************************************
## ocp-master-02a.lab1.mydomclec.local : ok=7 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
##
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.