How can I use find to generate a list of the directories that contain the largest number of files? I’d like the list sorted from highest to lowest. I’d only like the listing to go 1 level deep, and I’d typically run this command from the top of my filesystem, i.e. /.
Answers:
Method 1
UPDATE: I did all of that below, which is cool, but I came up with a better way of sorting directories by inode use:
du --inodes -S | sort -rh | sed -n \
    '1,50{/^.\{71\}/s/^\(.\{30\}\).*\(.\{37\}\)$/\1...\2/;p}'
And if you want to stay in the same filesystem you do:
du --inodes -xS
Here’s some example output:
15K /usr/share/man/man3
4.0K /usr/lib
3.6K /usr/bin
2.4K /usr/share/man/man1
1.9K /usr/share/fonts/75dpi
...
519 /usr/lib/python2.7/site-packages/bzrlib
516 /usr/include/KDE
498 /usr/include/qt/QtCore
487 /usr/lib/modules/3.13.6-2-MANJARO/build/include/config
484 /usr/src/linux-3.12.14-2-MANJARO/include/config
NOW WITH LS:
Several people mentioned they do not have up-to-date coreutils and the --inodes option is not available to them. So, here’s ls:
sudo ls -AiR1U ./ |
sed -rn '/^[./]/{h;n;};G;
    s|^ *([0-9][0-9]*)[^0-9][^/]*([~./].*):|\1:\2|p' |
sort -t : -uk1.1,1n |
cut -d: -f2 | sort -V |
uniq -c |sort -rn | head -n10
This is providing me pretty much identical results to the du command:
DU:
15K /usr/share/man/man3
4.0K /usr/lib
3.6K /usr/bin
2.4K /usr/share/man/man1
1.9K /usr/share/fonts/75dpi
1.9K /usr/share/fonts/100dpi
1.9K /usr/share/doc/arch-wiki-markdown
1.6K /usr/share/fonts/TTF
1.6K /usr/share/dolphin-emu/sys/GameSettings
1.6K /usr/share/doc/efl/html
LS:
14686 /usr/share/man/man3:
4322 /usr/lib:
3653 /usr/bin:
2457 /usr/share/man/man1:
1897 /usr/share/fonts/100dpi:
1897 /usr/share/fonts/75dpi:
1890 /usr/share/doc/arch-wiki-markdown:
1613 /usr/include:
1575 /usr/share/doc/efl/html:
1556 /usr/share/dolphin-emu/sys/GameSettings:
I think the include thing just depends on which directory the program looks at first – because they’re the same files and hardlinked. Kinda like the thing above. I could be wrong about that though – and I welcome correction…
The underlying method to this is that I replace every one of ls’s filenames with its containing directory name in sed. Following on from that… Well, I’m a little fuzzy myself. I’m fairly certain it’s accurately counting the files, as you can see here:
% _ls_i ~/test
> 100 /home/mikeserv/test/realdir
> 2 /home/mikeserv/test
> 1 /home/mikeserv/test/linkdir
DU DEMO
% du --version
> du (GNU coreutils) 8.22
Make a test directory:
% mkdir ~/test ; cd ~/test
% du --inodes -S
> 1 .
Some children directories:
% mkdir ./realdir ./linkdir
% du --inodes -S
> 1 ./realdir
> 1 ./linkdir
> 1 .
Make some files:
% printf 'touch ./realdir/file%s\n' `seq 1 100` | . /dev/stdin
% du --inodes -S
> 101 ./realdir
> 1 ./linkdir
> 1 .
Some hardlinks:
% printf 'n="%s" ; ln ./realdir/file$n ./linkdir/link$n\n' `seq 1 100` |
. /dev/stdin
% du --inodes -S
> 101 ./realdir
> 1 ./linkdir
> 1 .
Look at the hardlinks:
% cd ./linkdir
% du --inodes -S
> 101
% cd ../realdir
% du --inodes -S
> 101
They’re counted alone, but go one directory up…
% cd ..
% du --inodes -S
> 101 ./realdir
> 1 ./linkdir
> 1 .
Then I ran my script from below:
> 100 /home/mikeserv/test/realdir
> 100 /home/mikeserv/test/linkdir
> 2 /home/mikeserv/test
And Graeme’s:
> 101 ./realdir
> 101 ./linkdir
> 3 ./
So I think this shows that the only way to count inodes is by inode. And because counting files means counting inodes, you cannot doubly count inodes – to count files accurately inodes cannot be counted more than once.
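To make the hardlink point concrete, here is a small sketch that counts file names and distinct inodes separately. The helper names count_names and count_inodes are hypothetical, and the parsing assumes filenames without newlines and a single filesystem (inode numbers are only unique within one filesystem).

```sh
#!/bin/sh
# count_names: number of regular-file names below a directory.
count_names() {
    find "$1" -type f | wc -l
}

# count_inodes: number of distinct inodes behind those names.
# "ls -i" prints the inode number first, so hardlinked names collapse
# into a single line under sort -u.
count_inodes() {
    find "$1" -type f -exec ls -i {} + | awk '{ print $1 }' | sort -u | wc -l
}

# Small demo in a throwaway directory: three files, each hardlinked once.
demo=$(mktemp -d) || exit 1
mkdir "$demo/realdir" "$demo/linkdir"
for n in 1 2 3; do
    : > "$demo/realdir/file$n"
    ln "$demo/realdir/file$n" "$demo/linkdir/link$n"
done
echo "names:  $(count_names "$demo")"
echo "inodes: $(count_inodes "$demo")"
rm -rf "$demo"
```

The demo has six names but only three inodes, which is the same gap that shows up between the per-directory counts and the du --inodes totals above.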
OLD:
I find this faster, and it’s portable:
# Build one "cd DIR && count its entries" snippet per directory, run them all
# in a single shell, then sort and trim the report.
sh <<-\CMD
{ echo 'here='"$PWD"
  printf 'cd "${here}/%s" 2>/dev/null && {
      set --
      for glob in ".[!.]*" "[!.]*" ; do
          set -- $glob "$@" &&
              [ -e "./$1" ] || shift
      done
      printf "%%s\\t%%s\\n" $# "$PWD"
  }\n' $( find . -depth -type d 2>/dev/null )
} | . /dev/stdin |
sort -rn |
sed -n \
    '1,50{/^.\{71\}/s/^\(.\{30\}\).*\(.\{37\}\)$/\1...\2/;p}'
CMD
It doesn’t have to -exec for every directory – it only uses the one shell process and one find. I have to get the set -- $glob right still to include .hidden files and all else, but it’s very close and very fast. You would just cd into whatever your root directory should be for the check and off you go.
Here’s a sample of my output run from /usr:
14684 /usr/share/man/man3
4322 /usr/lib
3650 /usr/bin
2454 /usr/share/man/man1
1897 /usr/share/fonts/75dpi
...
557 /usr/share/gtk-doc/html/gtk3
557 /usr/share/doc/elementary/latex
539 /usr/lib32/wine/fakedlls
534 /usr/lib/python2.7/site-packages/bzrlib
500 /usr/lib/python3.3/test
I also use sed at the bottom there to trim it to the top 50 results. head would be faster, of course, but I also trim each line if necessary:
...
159 /home/mikeserv/.config/hom...hhkdoolnlbekcfllmednbl/4.30_0/plugins
154 /home/mikeserv/.config/hom...odhpcledpamjachpmelml/1.3.11_0/js/ace
...
It’s crude, admittedly, but it was a thought. Another crude device I use is redirecting stderr to /dev/null for both find and cd. It’s just cleaner than looking at permissions errors for directories I can’t read without root access – perhaps I should specify that to find. Well, it’s a work in progress.
Ok, so I did fix the shell globs like this:
for glob in ".[!.]*" "[!.]*" ; do
set -- $glob "$@" &&
[ -e "./$1" ] || shift
done
I was actually going to ask a question on how it could be done, but as I was typing in the question title the site pointed me to a suggested related question where, lo and behold, Stephane had already weighed in. So that was convenient. Apparently [^.], while well-supported, is not portable and you have to use the !bang. I found that in Stephane’s comment there.
Anyway, just pulling in hidden files wasn’t enough though, obviously. So I have to set twice in order to avoid searching positionals for the literal $glob. Still, it doesn’t seem to affect performance at all, and it reliably adds every file in the directory.
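Pulled out of the generated script, the glob logic can be tried on a single directory. This is a minimal sketch, and the function name count_entries is hypothetical. One caveat: `.[!.]*` does not match names that begin with two dots (such as `..foo`), so those would need a third pattern like `..?*`.

```sh
#!/bin/sh
# count_entries: count everything directly inside a directory, hidden or not,
# using only shell globs. Runs in a subshell so the caller's working
# directory and positional parameters are left alone.
count_entries() (
    cd "$1" 2>/dev/null || exit 1
    set --
    for glob in ".[!.]*" "[!.]*"; do
        # Prepend this pattern's matches to what has been collected so far.
        # A pattern with no matches stays literal, so drop it again when
        # the first argument does not exist as a file.
        set -- $glob "$@" &&
            [ -e "./$1" ] || shift
    done
    printf '%s\n' "$#"
)

# Example: count_entries /usr/bin
```

Because each `set` puts the new matches in front of the earlier ones, an unmatched literal pattern is always in `$1`, which is why a single `shift` is enough to discard it.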
Method 2
Using GNU tools:
find / -xdev -type d -print0 |
while IFS= read -r -d '' dir; do
echo "$(find "$dir" -maxdepth 1 -print0 | grep -zc .) $dir"
done |
sort -rn |
head -50
This uses two find commands. The first finds directories and pipes them to a while loop that runs the second find for each directory. The second lists all the child files/directories in the first level while grep counts them. The grep allows -print0 to be used with the second find since wc does not have a -z equivalent. This stops filenames with a newline from being counted twice (although using wc and no -print0 wouldn’t make much difference).
The result of the second find is placed in the argument to echo so it and the directory name can easily be placed on the same line (the $(..) construct automatically trims the newline at the end of grep). Lines are then sorted by number and the 50 largest numbers shown with head.
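One detail worth knowing: `find "$dir" -maxdepth 1` also prints `$dir` itself, so each count is one higher than the number of children. The following is a minimal sketch that counts only the children with the same NUL-delimited approach. The name count_children is hypothetical, and it assumes GNU find and GNU grep, as the answer does.

```sh
#!/bin/sh
# count_children: count the direct children of one directory.
# -mindepth 1 leaves out the directory itself; grep -zc . counts
# NUL-terminated records, so a name containing a newline counts once.
count_children() {
    find "$1" -mindepth 1 -maxdepth 1 -print0 | grep -zc .
}

# Example: count_children /etc
```

For an empty directory grep prints 0 but exits with status 1, which matters only if the function's exit status is checked.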
Note that this will also include the top level directories of mount points. A simple way to get around this is to use a bind mount and then use the directory of the mount. To do this:
sudo mount --bind / /mnt
A more portable solution uses a separate shell instance for each directory:
find / -xdev -type d -exec sh -c '
echo "$(find "$0" | grep "^$0/[^/]*$" | wc -l) $0"' {} \; |
sort -rn |
head -50
Sample output:
9225 /var/lib/dpkg/info
6322 /usr/share/qt4/doc/html
4927 /usr/share/man/man3
2301 /usr/share/man/man1
2097 /usr/share/doc
2097 /usr/bin
1863 /usr/lib/x86_64-linux-gnu
1679 /var/cache/apt/archives
1628 /usr/share/qt4/doc/src/images
1614 /usr/share/qt4/doc/html/images
1308 /usr/share/scilab/modules/overloading/macros
1083 /usr/src/linux-headers-3.13-1-common/include/linux
1071 /usr/src/linux-headers-3.13-1-amd64/include/config
847 /usr/include/qt4/QtGui
774 /usr/include/qt4/Qt
709 /usr/share/man/man8
616 /usr/lib
611 /usr/share/icons/oxygen/32x32/actions
608 /usr/share/icons/oxygen/22x22/actions
598 /usr/share/icons/oxygen/16x16/actions
579 /usr/share/bash-completion/completions
574 /usr/share/icons/oxygen/48x48/actions
570 /usr/share/vim/vim74/syntax
546 /usr/share/scilab/modules/m2sci/macros/sci_files
531 /usr/lib/i386-linux-gnu/wine/wine
530 /usr/lib/i386-linux-gnu/wine/wine/fakedlls
496 /etc/ssl/certs
457 /usr/share/mime/application
454 /usr/share/man/man2
450 /usr/include/qt4/QtCore
443 /usr/lib/python2.7
419 /usr/src/linux-headers-3.13-1-common/include/uapi/linux
413 /usr/share/fonts/X11/misc
413 /usr/include/linux
375 /usr/share/man/man5
374 /usr/share/lintian/overrides
372 /usr/share/cmake-2.8/Modules
370 /usr/share/fonts/X11/75dpi
370 /usr/share/fonts/X11/100dpi
356 /usr/share/icons/gnome/24x24/actions
356 /usr/share/icons/gnome/22x22/actions
356 /usr/share/icons/gnome/16x16/actions
353 /usr/share/icons/gnome/48x48/actions
353 /usr/share/icons/gnome/32x32/actions
341 /usr/lib/ghc/ghc-7.6.3
326 /usr/sbin
324 /usr/share/scilab/modules/compatibility_functions/macros
324 /usr/share/scilab/modules/cacsd/macros
320 /usr/share/terminfo/a
319 /usr/share/i18n/locales
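Both variants above start a second find (or a whole shell) for every directory. As a different technique, not taken from either variant, a single find can print every path once and awk can tally each path under its parent. This is only a sketch: count_by_parent is a hypothetical name, -mindepth is not in POSIX, and paths containing newlines would be miscounted.

```sh
#!/bin/sh
# count_by_parent: tally every path under its parent directory in one pass,
# then list the parents from most to fewest entries.
count_by_parent() {
    find "$1" -mindepth 1 -xdev 2>/dev/null |
    awk '{
        sub("/[^/]*$", "")          # strip the last component, leaving the parent
        if ($0 == "") $0 = "/"      # children of the root directory
        count[$0]++
    } END {
        for (dir in count) print count[dir], dir
    }' |
    sort -rn
}

# Example: count_by_parent / | head -50
```

Unlike the two variants above, this counts only children and does not include each directory in its own total.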
Method 3
Why not use something like KDirStat?
Although it was originally written for KDE, it works fine with GNOME as well.
It gives you a good view of the number of files and directories and their respective usage in a GUI.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.