Assume there’s an image storage directory, say, ./photos/john_doe, containing multiple subdirectories that hold many files of a certain type (say, *.jpg). How can I calculate the total size of those files below the john_doe branch?
I tried du -hs ./photos/john_doe/*/*.jpg, but this shows individual files only. It also covers only the first nesting level of the john_doe directory, like john_doe/june/, and skips john_doe/june/outrageous/.
So, how can I traverse the entire branch, summing up the sizes of the matching files?
Answers:
Method 1
find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$
If more than one invocation of du is required because the file list is very long, multiple totals will be reported and need to be summed.
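A minimal sketch of that summing step, assuming a du that supports -c and -k (GNU and BSD both do). The function name jpg_total_kb is made up for illustration. It drops -h so the per-batch totals stay plain numbers in KiB, forces the C locale so the label is literally "total", and lets awk add the batches together:

```shell
# Hypothetical helper: print the disk usage (in KiB) of all *.jpg files
# below the directory given as $1.
jpg_total_kb() {
    # -k reports sizes in 1024-byte units; -c adds one "total" line per du run.
    # awk adds up every "total" line, so several du batches still give one number.
    LC_ALL=C find "$1" -type f -name '*.jpg' -exec du -ck {} + |
        awk 'NF == 2 && $2 == "total" { sum += $1 } END { print sum + 0 }'
}
```

Calling jpg_total_kb ./photos/john_doe should then print a single number, even when find had to start du more than once.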
Method 2
du -ch public_html/images/*.jpg | grep total
20M total
gives me the total usage of my .jpg files in this directory.
To deal with multiple directories you’d probably have to combine this with find somehow.
You might find du command examples useful (it also includes find)
Method 3
Primarily, you need two things:
- the -c option to du, to tell it to produce a grand total;
- either ** (activation instructions) or find (example) to traverse subdirectories.
du -ch -- **/*.jpg | tail -n 1
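As a sketch of the ** route, assuming bash 4 or later, where the feature is switched on with the globstar shell option. The function name jpg_total_globstar is invented for this example; it starts a bash with globstar enabled so that it can be called from any shell. All matches are still passed to a single du call, so the argument-list length limit applies:

```shell
# Hypothetical helper: grand total for *.jpg below the directory in $1,
# using bash's recursive ** glob.
jpg_total_globstar() {
    # globstar lets ** match across directory levels; nullglob makes an
    # unmatched pattern expand to nothing instead of staying literal.
    bash -O globstar -O nullglob -c '
        cd "$1" || exit 1
        set -- **/*.jpg
        if [ "$#" -eq 0 ]; then
            printf "0\ttotal\n"
        else
            LC_ALL=C du -ch -- "$@" | tail -n 1
        fi
    ' _ "$1"
}
```

In an interactive bash session the same effect should be available by running shopt -s globstar once and then using the du command as written.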
Method 4
The ultimate answer is:
{ find <DIR> -type f -name "*.<EXT>" -printf "%s+"; echo 0; } | bc
and an even faster version, not limited by RAM, but one that requires GNU AWK with bignum support:
find <DIR> -type f -name "*.<EXT>" -printf "%s\n" | gawk -M '{t+=$1}END{print t}'
This version has the following features:
- all capabilities of find to specify the files you’re looking for
- supports millions of files
  - other answers here are limited by the maximum length of the argument list
- spawns only 3 simple processes with a minimal pipe throughput
  - many answers here spawn C+N processes, where C is some constant and N is the number of files
- doesn’t bother with string manipulation
  - this version doesn’t do any grepping, or regexing
  - well, find does a simple wildcard matching of filenames
- optionally formats the sum into a human-readable form (eg. 5.5K, 176.7M, …)
  - to do that append | numfmt --to=si
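For systems that have GNU find but neither bc nor a gawk built with -M, a plain-awk variant may be good enough. This is a sketch rather than the command from the answer: the function name sum_bytes is invented here, and ordinary awk adds in double precision, so the total is only exact up to 2^53 bytes (8 PiB).

```shell
# Hypothetical helper: sum the apparent sizes (in bytes) of all regular files
# below $1 whose names match the pattern in $2 (for example '*.jpg').
sum_bytes() {
    # GNU find prints one size per line; awk keeps a running total and
    # prints it without an exponent.
    find "$1" -type f -name "$2" -printf '%s\n' |
        awk '{ t += $1 } END { printf "%.0f\n", t }'
}
```

A call such as sum_bytes ./photos/john_doe '*.jpg' | numfmt --to=si would then give the human-readable form.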
Method 5
The answers given so far do not take into account that the file list passed from find to du may be so long that find automatically splits the list into chunks, resulting in multiple occurrences of total.
You can either grep total (locale!) and sum up manually, or use a different command. AFAIK there are only two ways to get a grand total (in kilobytes) of all files found by find:
find . -type f -iname '*.jpg' -print0 | xargs -r0 du -a| awk '{sum+=$1} END {print sum}'
Explanation
find . -type f -iname '*.jpg' -print0: Find all files with the extension jpg regardless of case (i.e. *.jpg, *.JPG, *.Jpg…) and output them (null-terminated).
xargs -r0 du -a: Without -r, xargs would run the command even when no arguments are passed. -0 means the input strings are null-terminated (not newline-terminated).
awk '{sum+=$1} END {print sum}': Sum up the file sizes output by the previous command
And for reference, the other way would be
find . -type f -iname '*.jpg' -print0 | du -c --files0-from=-
Method 6
If the list of files is so big that it can’t be passed to a single invocation of du -c, on a GNU system, you can do:
find . -iname '*.jpg' -type f -printf '%b\t%D:%i\n' | sort -u | cut -f1 | paste -sd+ - | bc
(size expressed in number of 512 byte blocks). Like du it tries to count hard links only once. If you don’t care about hardlinks, you can simplify it to:
(printf 0; find . -iname '*.jpg' -type f -printf +%b) | bc
If you want the size instead of disk usage, replace %b with %s. The size will then be expressed in bytes.
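To turn the block count into bytes, multiply by 512. Below is a sketch of the hard-link-aware variant with that conversion folded in. It uses awk in place of paste and bc, which is a substitution made for this example, and the function name jpg_disk_bytes is hypothetical. It still assumes GNU find for -printf.

```shell
# Hypothetical helper: disk usage in bytes of all *.jpg files below $1,
# counting each hard-linked file only once.
jpg_disk_bytes() {
    # %b is the allocated size in 512-byte blocks; %D:%i (device:inode)
    # identifies the file, so sort -u drops extra links to the same data.
    find "$1" -iname '*.jpg' -type f -printf '%b\t%D:%i\n' |
        sort -u |
        awk -F '\t' '{ blocks += $1 } END { printf "%.0f\n", blocks * 512 }'
}
```

The result is allocated space, so it will usually be somewhat larger than the sum of the byte sizes reported by %s.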
Method 7
The solutions mentioned so far are inefficient (exec is expensive), require additional manual work to sum if the file list is long, or don’t work on Mac OS X. The following solution is very fast, should work on any system, and yields the total answer in GB (remove a /1024 if you want to see the total in MB):
find . -iname "*.jpg" -ls | perl -lane '$t += $F[6]; END { print $t/1024/1024/1024 . " GB" }'
Method 8
Improving SHW’s great answer to make it work with any locale, as Zbyszek already pointed out in his comment:
LC_ALL=C find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$
Method 9
du naturally traverses the directory hierarchy and awk can perform the filtering so something like this may be sufficient:
du -ak | awk 'BEGIN {sum=0} /\.jpg$/ {sum+=$1} END {print sum}'
This works without GNU.
Method 10
This is what worked for me.
find -type f -iname '*.jpg' -print0 | du -ch --files0-from=- | grep total$
Method 11
Another would be
ls -al <directory> | awk '{t+=$5}END{print t}'
Assuming you’re looking in a single directory. If you want to look at the current directory and beneath that
ls -Ral <directory> | awk '{t+=$5}END{print t}'
Method 12
Another alternative, using stat rather than du:
stat -L -c %s ** | awk '{s+=$1} END {printf "%.0f\n", s}'
See Gilles’ answer about using **.
Method 13
This is a mashup of several answers and comments that do what I need.
find . \( -iname "*.jpg" -o -iname "*.png" \) -type f -exec du -bc {} + | grep total$ | cut -f1 | awk '{ total += $1 }; END { print total }' | numfmt --to=iec
- find will get all the files recursively
- -iname is for case INsensitive
- -o and parenthesis to look for multiple patterns
- du -bc will get the files’ size, sometimes in more than one call if there are many files
- grep total will get only the total line as given by du
- cut -f1 will take only the actual integer values
- awk will sum them all
- numfmt will convert it to a human-readable format
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.