Who’s consuming my inotify resources?

After a recent upgrade to Fedora 15, I’m finding that a number of tools are failing with errors along the lines of:

tail: inotify resources exhausted
tail: inotify cannot be used, reverting to polling

It’s not just tail that’s reporting problems with inotify, either. Is there any way to interrogate the kernel to find out what process or processes are consuming the inotify resources? The current inotify-related sysctl settings look like this:

fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
fs.inotify.max_queued_events = 16384
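
The same three limits can also be read straight from /proc/sys/fs/inotify, which is convenient in scripts. A minimal sketch, assuming a Linux /proc layout; the function name inotify_limits and its optional directory argument are made up for illustration:

```shell
# Print each inotify limit as "fs.inotify.<name> = <value>".
# The optional argument (hypothetical, mainly for testing) overrides the directory.
inotify_limits() {
    local dir="${1:-/proc/sys/fs/inotify}" name
    for name in max_user_instances max_user_watches max_queued_events; do
        # Skip limits that this kernel does not expose.
        [ -r "$dir/$name" ] || continue
        printf 'fs.inotify.%s = %s\n' "$name" "$(cat "$dir/$name")"
    done
}

inotify_limits
```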

Answers:


Method 1

It seems that if a process creates an inotify instance via inotify_init(), the resulting file descriptor appears in the /proc filesystem as a symlink to a (non-existent) ‘anon_inode:inotify’ file.

$ cd /proc/5317/fd
$ ls -l
total 0
lrwx------ 1 puzel users 64 Jun 24 10:36 0 -> /dev/pts/25
lrwx------ 1 puzel users 64 Jun 24 10:36 1 -> /dev/pts/25
lrwx------ 1 puzel users 64 Jun 24 10:36 2 -> /dev/pts/25
lr-x------ 1 puzel users 64 Jun 24 10:36 3 -> anon_inode:inotify
lr-x------ 1 puzel users 64 Jun 24 10:36 4 -> anon_inode:inotify

Unless I have misunderstood the concept, the following command should show you a list of processes (as represented in /proc), sorted by the number of inotify instances they use.

$ for foo in /proc/*/fd/*; do readlink -f "$foo"; done | grep inotify | sort | uniq -c | sort -nr
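
Because max_user_instances is enforced per user rather than per process, it can also help to group the instances by owner. The following is a rough sketch, assuming GNU find (for -lname and -printf) and that the fd symlinks are owned by the user running the process, which is usually but not always the case. The function name and its optional root argument are hypothetical:

```shell
# Count inotify instances per owning UID, highest first.
# Output lines are "<count> <uid>". Needs GNU find (-lname, -printf).
inotify_instances_per_uid() {
    local root="${1:-/proc}"
    # %U prints the numeric owner of each matching fd symlink.
    find "$root"/[0-9]*/fd -lname 'anon_inode:inotify' -printf '%U\n' 2>/dev/null \
        | sort -n | uniq -c | awk '{ print $1, $2 }' | sort -nr
}
```

Run it as root (inotify_instances_per_uid) to see every user's processes; without root you will only see your own.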

Finding the culprits

Via the comments below @markkcowan mentioned this:

$ find /proc/*/fd/* -type l -lname 'anon_inode:inotify' -exec sh -c 'cat $(dirname {})/../cmdline; echo ""' \; 2>/dev/null
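
The cmdline file separates arguments with NUL bytes, so the raw output of the command above runs the arguments together, and a process with several inotify instances is printed several times. A possible variant that prints each PID once with a readable command line is sketched below. It assumes GNU find, and the function name and optional root argument are invented for this example:

```shell
# Print "<pid><TAB><command line>" once per process holding an inotify fd.
inotify_cmdlines() {
    local root="${1:-/proc}" pid cmd
    find "$root"/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null \
        | awk -F/ '{ print $(NF-2) }' | sort -un \
        | while read -r pid; do
            # cmdline separates arguments with NUL bytes; turn them into spaces.
            cmd=$(cat "$root/$pid/cmdline" 2>/dev/null | tr '\0' ' ' | sed 's/ *$//')
            printf '%s\t%s\n' "$pid" "$cmd"
        done
}
```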

Method 2

As @Jonathan Kamens said, you are probably running out of watches. I have a premade script, inotify-consumers, that lists the top offenders for you (a newer version also lists the username owning the process, see below):

$ time inotify-consumers  

   INOTIFY
   WATCHER
    COUNT     PID     CMD
----------------------------------------
    6688    27262  /home/dvlpr/apps/WebStorm-2018.3.4/WebStorm-183.5429.34/bin/fsnotifier64
     411    27581  node /home/dvlpr/dev/kiwi-frontend/node_modules/.bin/webpack --config config/webpack.dev.js
      79     1541  /usr/lib/gnome-settings-daemon/gsd-xsettings
      30     1664  /usr/lib/gvfs/gvfsd-trash --spawner :1.22 /org/gtk/gvfs/exec_spaw/0
      14     1630  /usr/bin/gnome-software --gapplication-service
    ....

    7489  watches TOTAL COUNT

real    0m0.099s
user    0m0.042s
sys 0m0.062s

Here you can quickly see why the default limit of 8K watches is too low on a development machine: a single WebStorm instance quickly hits it when it encounters a node_modules folder with thousands of subfolders. Add a webpack watcher and problems are guaranteed …
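
inotify watches are not recursive, so a tool that watches a whole tree needs roughly one watch per directory. Counting the directories therefore gives a rough estimate of what a recursive watcher will consume. This is only a sketch: real tools may exclude some directories or watch individual files as well, and the function name is hypothetical:

```shell
# Print the number of directories under the given path (default: current
# directory), including the path itself.
count_dirs() {
    find "${1:-.}" -type d 2>/dev/null | awk 'END { print NR }'
}
```

For example, count_dirs node_modules shows how many watches that folder alone could require.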

Even though it was much faster than the other alternatives when I made it initially, Simon Matter added some speed enhancements for heavily loaded Big Iron Linux (hundreds of cores) that sped it up immensely, taking it down from ten minutes (!) to 15 seconds on his monster rig.

How to use

Run inotify-consumers --help 😊 To get it on your machine, just copy the contents of the script and put it somewhere in your $PATH, like /usr/local/bin. Alternatively, if you trust this stranger on the net, you can avoid copying it and pipe it into bash over HTTPS:

$ curl -s https://raw.githubusercontent.com/fatso83/dotfiles/master/utils/scripts/inotify-consumers | bash 

       INOTIFY
       WATCHER
        COUNT     PID USER     COMMAND
    --------------------------------------
        3044   3933 myuser node /usr/local/bin/tsserver
        2965   3941 myuser /usr/local/bin/node /home/myuser/.config/coc/extensions/node_modules/coc-tsserver/bin/tsserverForkStart /hom
         979   3954 myuser /usr/local/bin/node /home/myuser/.config/coc/extensions/node_modules/coc-tsserver/node_modules/typescript/li
           1   7473 myuser /usr/local/bin/node --no-warnings /home/myuser/dev/dotfiles/common-setup/vim/dotvim/plugged/coc.nvim/build/i
           1   3899 myuser /usr/local/bin/node --no-warnings /home/myuser/dev/dotfiles/common-setup/vim/dotvim/plugged/coc.nvim/build/i

        6990  watches TOTAL COUNT

How does it work?

For reference, the main content of the script is simply this (inspired by this answer):

find /proc/*/fd \
    -lname anon_inode:inotify \
    -printf '%hinfo/%f\n' 2>/dev/null \
    | xargs grep -c '^inotify' \
    | sort -n -t: -k2 -r
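
That pipeline prints one line per inotify file descriptor, so a process with several instances shows up several times. A small sketch that sums the watches per PID instead is given below. It assumes GNU find, grep and xargs; the function name and the optional root argument are not part of the original script:

```shell
# Print "<total watches> <pid>" per process, highest first.
watches_per_pid() {
    local root="${1:-/proc}"
    # Map each inotify fd to its fdinfo file, count the "inotify" lines in it,
    # then add the counts up per PID (third path component from the end).
    find "$root"/[0-9]*/fd -lname 'anon_inode:inotify' -printf '%hinfo/%f\n' 2>/dev/null \
        | xargs -r grep -cH '^inotify' 2>/dev/null \
        | awk -F/ '{ split($NF, last, ":"); sum[$(NF-2)] += last[2] }
                   END { for (pid in sum) print sum[pid], pid }' \
        | sort -nr
}
```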

Changing the limits

In case you are wondering how to increase the limits

$ inotify-consumers --limits 

Current limits
-------------
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 524288


Changing settings permanently
-----------------------------
echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p # re-read config
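
Note that tee -a appends a new line every time it is run, so repeating the command leaves duplicate entries in the file. One possible idempotent alternative is sketched here. The function name set_sysctl_value is made up, and the drop-in file name used in the usage note is only an example:

```shell
# Set "key = value" in a sysctl-style config file, replacing any earlier
# assignment of the same key instead of appending a duplicate.
set_sysctl_value() {
    local file="$1" key="$2" value="$3" tmp
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Copy every line except existing assignments of this key.
        awk -v key="$key" '{
            line = $0
            sub(/^[ \t]+/, "", line)
            split(line, kv, /[ \t]*=/)
            if (kv[1] != key) print
        }' "$file" > "$tmp"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

From a root shell this could be used as set_sysctl_value /etc/sysctl.d/90-inotify.conf fs.inotify.max_user_watches 524288, followed by sysctl --system to reload all configuration files.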

Method 3

You are probably running out of inotify watches rather than instances. To find out who’s creating a lot of watches:

  1. Enable tracing of watch adds:
$ echo 1 > /sys/kernel/debug/tracing/events/syscalls/sys_exit_inotify_add_watch/enable
  2. Verify that tracing_on is set to 1:
$ cat /sys/kernel/debug/tracing/tracing_on
0
$ echo 1 > /sys/kernel/debug/tracing/tracing_on
  3. Restart the processes with inotify instances (determined as described in Petr Uzel’s answer) that you suspect of creating a lot of watches.
  4. Set up ftrace:
$ cat /sys/kernel/debug/tracing/current_tracer
nop

$ cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####

$ echo function              > /sys/kernel/debug/tracing/current_tracer
$ echo SyS_inotify_add_watch > /sys/kernel/debug/tracing/set_ftrace_filter
  5. Read the file /sys/kernel/debug/tracing/trace to see how many watches are created and by which processes.

When you’re done, make sure to echo 0 into the enable file (and the tracing_on file if you had to enable that as well) to turn off tracing so you won’t incur the performance hit of continuing to trace.

NOTE: In older versions of the Linux kernel the /sys endpoint was called tracing_enabled; it is now called tracing_on. If you are on an older kernel, change /sys/kernel/debug/tracing/tracing_on to /sys/kernel/debug/tracing/tracing_enabled.
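
The symbol name SyS_inotify_add_watch is also kernel-dependent: on newer x86-64 kernels the syscall entry point is typically named __x64_sys_inotify_add_watch instead. A quick way to check which name your kernel offers is to search available_filter_functions, as in this sketch (the function name and its optional file argument are hypothetical):

```shell
# List the traceable symbols for the inotify_add_watch syscall on this kernel.
inotify_trace_symbols() {
    local list="${1:-/sys/kernel/debug/tracing/available_filter_functions}"
    # Case-insensitive so that both SyS_... and __x64_sys_... match;
    # keep only the symbol name in the first column.
    grep -i 'sys_inotify_add_watch' "$list" 2>/dev/null | awk '{ print $1 }'
}
```

Write whichever name it prints into set_ftrace_filter. If it prints nothing, debugfs may not be mounted or readable, or function tracing may not be built into the kernel.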

Method 4

To trace which processes consume inotify watches (not instances) you can use the dynamic ftrace feature of the kernel if it is enabled in your kernel.

The kernel option you need is CONFIG_DYNAMIC_FTRACE.

First mount the debugfs filesystem if it is not already mounted.

mount -t debugfs nodev /sys/kernel/debug

Go under the tracing subdirectory of this debugfs directory

cd /sys/kernel/debug/tracing

Enable tracing of function calls

echo function > current_tracer

Filter only SyS_inotify_add_watch system calls

echo SyS_inotify_add_watch > set_ftrace_filter

Clear the trace ring buffer if it wasn’t empty

echo > trace

Enable tracing if it is not already enabled

echo 1 > tracing_on

Restart the suspected process (in my case it was crashplan, a backup application)

Watch the inotify watches being exhausted

wc -l trace
cat trace

Done
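
Reading the raw trace gets tedious when there are thousands of entries. The sketch below summarises it by counting traced calls per task. It assumes the usual function-tracer layout in which header lines start with # and the first column of each event is TASK-PID; task names containing spaces would be split incorrectly. The function name is hypothetical:

```shell
# Count traced calls per "TASK-PID" column, highest first.
trace_calls_per_task() {
    local trace="${1:-/sys/kernel/debug/tracing/trace}"
    # Skip the "#" header lines; the first field of each event is TASK-PID.
    grep -v '^#' "$trace" 2>/dev/null \
        | awk 'NF { calls[$1]++ } END { for (t in calls) print calls[t], t }' \
        | sort -nr
}
```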

Method 5

I ran into this problem, and none of these answers give you the answer of “how many watches is each process currently using?” The one-liners all give you how many instances are open, which is only part of the story, and the trace stuff is only useful to see new watches being opened.

TL;DR: This will get you a file with a list of open inotify instances and the number of watches they have, along with the pids and binaries that spawned them, sorted in descending order by watch count:

sudo lsof | awk '/anon_inode/ { gsub(/[urw]$/,"",$4); print "/proc/"$2"/fdinfo/"$4; }' | while read fdi; do count=$(sudo grep -c inotify $fdi); exe=$(sudo readlink $(dirname $(dirname $fdi))/exe); echo -e $count"\t"$fdi"\t"$exe; done | sort -nr > watches

That’s a big ball of mess, so here’s how I got there. To start, I ran a tail on a test file, and looked at the fd’s it opened:

joel@gladstone:~$ tail -f test > /dev/null &
[3] 22734
joel@opx1:~$ ls -ahltr /proc/22734/fd
total 0
dr-xr-xr-x 9 joel joel  0 Feb 22 22:34 ..
dr-x------ 2 joel joel  0 Feb 22 22:34 .
lr-x------ 1 joel joel 64 Feb 22 22:35 4 -> anon_inode:inotify
lr-x------ 1 joel joel 64 Feb 22 22:35 3 -> /home/joel/test
lrwx------ 1 joel joel 64 Feb 22 22:35 2 -> /dev/pts/2
l-wx------ 1 joel joel 64 Feb 22 22:35 1 -> /dev/null
lrwx------ 1 joel joel 64 Feb 22 22:35 0 -> /dev/pts/2

So, 4 is the fd we want to investigate. Let’s see what’s in the fdinfo for that:

joel@opx1:~$ cat /proc/22734/fdinfo/4
pos:    0
flags:  00
mnt_id: 11
inotify wd:1 ino:15f51d sdev:ca00003 mask:c06 ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:1df51500a75e538c

That looks like an entry for the watch at the bottom!

Let’s try something with more watches, this time with the inotifywait utility, just watching whatever is in /tmp:

joel@gladstone:~$ inotifywait /tmp/* &
[4] 27862
joel@gladstone:~$ Setting up watches.
Watches established.
joel@gladstone:~$ ls -ahtlr /proc/27862/fd | grep inotify
lr-x------ 1 joel joel 64 Feb 22 22:41 3 -> anon_inode:inotify
joel@gladstone:~$ cat /proc/27862/fdinfo/3
pos:    0
flags:  00
mnt_id: 11
inotify wd:6 ino:7fdc sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:dc7f0000551e9d88
inotify wd:5 ino:7fcb sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:cb7f00005b1f9d88
inotify wd:4 ino:7fcc sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:cc7f00006a1d9d88
inotify wd:3 ino:7fc6 sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:c67f00005d1d9d88
inotify wd:2 ino:7fc7 sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:c77f0000461d9d88
inotify wd:1 ino:7fd7 sdev:ca00003 mask:fff ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:d77f00000053c98b

Aha! More entries! So we should have six things in /tmp then:

joel@opx1:~$ ls /tmp/ | wc -l
6

Excellent. My new inotifywait has one entry in its fd list (which is what the other one-liners here are counting), but six entries in its fdinfo file. So we can figure out how many watches a given fd for a given process is using by consulting its fdinfo file. Now to put it together with some of the above to grab a list of processes that have inotify watches open and use that to count the entries in each fdinfo. This is similar to above, so I’ll just dump the one-liner here:

sudo lsof | awk '/anon_inode/ { gsub(/[urw]$/,"",$4); print "/proc/"$2"/fdinfo/"$4; }' | while read fdi; do count=$(sudo grep -c inotify $fdi); echo -e $count"\t"$fdi; done

There’s some thick stuff in here, but the basics are that I use awk to build an fdinfo path from the lsof output, grabbing the pid and fd number, stripping the u/r/w flag from the latter. Then for each constructed fdinfo path, I count the number of inotify lines and output the count and the fdinfo path (which contains the pid).

It would be nice if I had what processes these pids represent in the same place though, right? I thought so. So, in a particularly messy bit, I settled on calling dirname twice on the fdinfo path to get back to /proc/<pid>, adding /exe to it, and then running readlink on that to get the exe name of the process. Throw that in there as well, sort it by number of watches, and redirect it to a file for safe-keeping and we get:

sudo lsof | awk '/anon_inode/ { gsub(/[urw]$/,"",$4); print "/proc/"$2"/fdinfo/"$4; }' | while read fdi; do count=$(sudo grep -c inotify $fdi); exe=$(sudo readlink $(dirname $(dirname $fdi))/exe); echo -e $count"\t"$fdi"\t"$exe; done | sort -nr > watches

Running that without sudo to just show my processes I launched above, I get:

joel@gladstone:~$ cat watches 
6   /proc/4906/fdinfo/3 /usr/bin/inotifywait
1   /proc/22734/fdinfo/4    /usr/bin/tail

Perfect! A list of processes, fd’s, and how many watches each is using, which is exactly what I needed.
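
The fdinfo lines can also hint at what is being watched. The ino: field is the watched inode number in hexadecimal, so converting it to decimal lets you look the file up with find -inum. This is a sketch under that assumption; the function name is hypothetical, and the lookup only works if you search the filesystem the inode actually lives on:

```shell
# Print the decimal inode number of every watch listed in an fdinfo file.
watched_inodes() {
    local hex
    # Pull the hex value of the "ino:" field out of each "inotify" line.
    sed -n 's/^inotify [^ ]* ino:\([0-9a-f]*\) .*/\1/p' "$1" \
        | while read -r hex; do
            printf '%d\n' "0x$hex"
        done
}
```

For example, watched_inodes /proc/27862/fdinfo/3 converts ino:7fdc to 32732, which can then be resolved with find /tmp -xdev -inum 32732.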

Method 6

find /proc/*/fd/* -type l -lname 'anon_inode:inotify' 2>/dev/null | cut -f 1-4 -d'/' |  sort | uniq -c  | sort -nr

Method 7

I just wrote a C++ app to help track down inotify information. It should be able to display summary information along with the files and directories being watched.

https://github.com/mikesart/inotify-info

Hopefully it helps track down what the limits are and where they’re being hit.

Method 8

I have modified the script above to show the list of processes that are consuming inotify resources:

ps -p `find /proc/*/fd/* -type l -lname 'anon_inode:inotify' -print | sed s/'^\/proc\/'/''/ | sed s/'\/fd.*$'/''/`

I think there is a way to replace my double sed.


Yes. Use either

cut -f 3 -d '/'

or

sed -e 's/^\/proc\/\([0-9]*\)\/.*/\1/'

and you’ll only get the pid.
Also, if you add

2> /dev/null

in the find, you’ll get rid of any pesky error lines thrown by find. So this would work:

ps -p $(find /proc/*/fd/* -type l -lname 'anon_inode:inotify' -print 2> /dev/null | sed -e 's/^\/proc\/\([0-9]*\)\/.*/\1/')

Method 9

We needed to run this script against a fleet of servers, so we wrote an Ansible playbook to do it. It adapts several of the concepts from the other answers into a single playbook that runs the commands necessary to generate a report showing inotify watcher usage by PID.

$ ansible-playbook \
    -i systems-inventory/cluster2.lab1 \
    playbooks/show_inotify_watcher_cnt.yml \
    -l ocp-master-02a.lab1*

NOTE: An example usage of this playbook is included as a comment towards the end of the playbook below.

$ cat show_inotify_watcher_cnt.yml
###########################################################
# References
###########################################################
# - https://stackoverflow.com/questions/40230184/how-to-do-multiline-shell-script-in-ansible
###########################################################


- hosts: compute infra masters
  tasks:
    - shell:
        cmd: |
            cat <<'EOF' > /tmp/inotify_cnt.sh
            #!/bin/bash

            ## Get the procs sorted by the number of inotify watchers
            ##
            ## From `man find`:
            ##    %h     Leading directories of file's name (all but the last element).
            ##           If the file name contains no slashes (since it is in the current directory)
            ##           the %h specifier expands to `.'.
            ##    %f     File's name with any leading directories removed (only the last element).
            lines=$(
                find /proc/*/fd \
                -lname anon_inode:inotify \
                -printf '%hinfo/%f\n' 2>/dev/null \
                | xargs grep -c '^inotify' \
                | sort -n -t: -k2 -r
                )

            printf "\n%10s\n" "INOTIFY"
            printf "%10s\n" "WATCHER"
            printf "%10s  %5s     %s\n" " COUNT " "PID" "CMD"
            printf -- "----------------------------------------\n"
            for line in $lines; do
                watcher_count=$(echo $line | sed -e 's/.*://')
                pid=$(echo $line | sed -e 's/\/proc\/\([0-9]*\)\/.*/\1/')
                cmdline=$(ps --columns 120 -o command -h -p $pid)
                printf "%8d  %7d  %s\n" "$watcher_count" "$pid" "$cmdline"
            done
            done
            EOF
      become: yes

    - file: 
        dest: /tmp/inotify_cnt.sh
        mode: a+x

    - shell: /tmp/inotify_cnt.sh
      become: yes
      register: output

    - debug:
        var: output.stdout_lines

    - shell: |
        sysctl fs.inotify
      become: yes
      register: output

    - debug:
        var: output.stdout_lines


##########
# USAGE
##########
## $ ansible-playbook -i systems-inventory/cluster2.lab1 playbooks/show_inotify_watcher_cnt.yml -l ocp-master-02a.lab1*
##   
##   PLAY [compute infra masters] *****************************************************************************************************************************************************
##   
##   TASK [Gathering Facts] ***********************************************************************************************************************************************************
##   ok: [ocp-master-02a.lab1.mydomclec.local]
##   
##   TASK [shell] *********************************************************************************************************************************************************************
##   changed: [ocp-master-02a.lab1.mydomclec.local]
##   
##   TASK [file] **********************************************************************************************************************************************************************
##   ok: [ocp-master-02a.lab1.mydomclec.local]
##   
##   TASK [shell] *********************************************************************************************************************************************************************
##   changed: [ocp-master-02a.lab1.mydomclec.local]
##   
##   TASK [debug] *********************************************************************************************************************************************************************
##   ok: [ocp-master-02a.lab1.mydomclec.local] => {
##       "output.stdout_lines": [
##           "",
##           "   INOTIFY",
##           "   WATCHER",
##           "    COUNT     PID     CMD",
##           "----------------------------------------",
##           "     957     6553  /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-",
##           "      11      856  /usr/lib/systemd/systemd-udevd",
##           "      11     1457  /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid",
##           "      10     1471  /usr/sbin/rpc.gssd",
##           "       5        1  /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
##           "       5        1  /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
##           "       5     1508  /usr/sbin/NetworkManager --no-daemon",
##           "       4        1  /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
##           "       4        1  /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
##           "       4     1508  /usr/sbin/NetworkManager --no-daemon",
##           "       4     1378  /usr/lib/polkit-1/polkitd --no-debug",
##           "       3     4211  tail --follow=name /var/log/openvswitch/ovs-vswitchd.log /var/log/openvswitch/ovsdb-server.log",
##           "       3     1970  /usr/sbin/crond -n",
##           "       3     1378  /usr/lib/polkit-1/polkitd --no-debug",
##           "       2     1893  /usr/sbin/rsyslogd -n",
##           "       2     1389  /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation",
##           "       1     9012  virt-handler -v 3 --port 8443 --hostname-override ocp-master-02a.lab1.mydomclec.local --pod-ip-address 172.20.0.201",
##           "       1     9012  virt-handler -v 3 --port 8443 --hostname-override ocp-master-02a.lab1.mydomclec.local --pod-ip-address 172.20.0.201",
##           "       1     9012  virt-handler -v 3 --port 8443 --hostname-override ocp-master-02a.lab1.mydomclec.local --pod-ip-address 172.20.0.201",
##           "       1     9012  virt-handler -v 3 --port 8443 --hostname-override ocp-master-02a.lab1.mydomclec.local --pod-ip-address 172.20.0.201",
##           "       1     6553  /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-",
##           "       1     6553  /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-",
##           "       1        1  /usr/lib/systemd/systemd --switched-root --system --deserialize 22",
##           "       1     1388  /usr/sbin/sssd -i --logger=files",
##           "       1     1359  /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_de",
##           "       1     1347  /usr/sbin/abrtd -d -s",
##           "       0     6553  /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-"
##       ]
##   }
##   
##   TASK [shell] *********************************************************************************************************************************************************************
##   changed: [ocp-master-02a.lab1.mydomclec.local]
##   
##   TASK [debug] *********************************************************************************************************************************************************************
##   ok: [ocp-master-02a.lab1.mydomclec.local] => {
##       "output.stdout_lines": [
##           "fs.inotify.max_queued_events = 16384",
##           "fs.inotify.max_user_instances = 128",
##           "fs.inotify.max_user_watches = 65536"
##       ]
##   }
##   
##   PLAY RECAP ***********************************************************************************************************************************************************************
##   ocp-master-02a.lab1.mydomclec.local : ok=7    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
##


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
