Find the total size of certain files within a directory branch

Assume there’s an image storage directory, say, ./photos/john_doe, with multiple subdirectories containing many files of a certain type (say, *.jpg). How can I calculate the total size of those files below the john_doe branch?

I tried du -hs ./photos/john_doe/*/*.jpg, but this shows individual files only. Also, it covers only the first nesting level of the john_doe directory, like john_doe/june/, but skips john_doe/june/outrageous/.

So, how could I traverse the entire branch, summing up the sizes of those files?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that not all of the answers may solve your issue immediately, so treat them as advice. If you found the post helpful (or not), leave a comment and I’ll get back to you as soon as possible.

Method 1

find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$

If more than one invocation of du is required because the file list is very long, multiple totals will be reported and need to be summed.
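When find does split the list, the separate totals can be summed in the same pipeline. A minimal sketch, assuming a du that supports -c and -k (GNU and BSD du both do); the function name jpg_total_kib is hypothetical, and the result is disk usage in KiB:

```shell
# Hypothetical helper: print the total disk usage (KiB) of *.jpg files under $1.
# Summing every "total" line keeps the result correct even if find runs du
# several times because the file list is long.
jpg_total_kib() {
  # LC_ALL=C is inherited by du, so its summary label is the literal word "total"
  LC_ALL=C find "$1" -type f -name '*.jpg' -exec du -ck {} + |
    awk 'NF == 2 && $2 == "total" { sum += $1 } END { print sum + 0 }'
}
```

Usage: jpg_total_kib ./photos/john_doe. Hard links are counted once per du run, so a file linked into two chunks could be counted twice.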

Method 2

du -ch public_html/images/*.jpg | grep total
20M total

gives me the total usage of my .jpg files in this directory.

To deal with multiple directories you’d probably have to combine this with find somehow.

You might find the “du command examples” article useful (it also covers find).

Method 3

Primarily, you need two things:

the -c option to du, to tell it to produce a grand total;
either ** (in bash this needs shopt -s globstar; zsh supports it out of the box) or find (as in Method 1) to traverse subdirectories.

du -ch -- **/*.jpg | tail -n 1
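In bash, ** only recurses once the globstar option is enabled (bash 4.0 or later). A minimal sketch, assuming bash; the function name jpg_total_glob is hypothetical. The function body runs in a subshell so the shopt and cd changes stay local:

```shell
# Hypothetical helper: total disk usage (KiB) of *.jpg files under $1, using
# bash's globstar instead of find. Requires bash >= 4.0.
jpg_total_glob() (
  shopt -s globstar nullglob   # ** recurses; unmatched globs expand to nothing
  cd -- "$1" || exit 1
  files=(**/*.jpg)
  # Without this check, "du -ck --" with no operands would measure "." instead
  if (( ${#files[@]} == 0 )); then
    echo 0
    exit 0
  fi
  # The last line of du -c is "<size><TAB>total"; keep only the size
  LC_ALL=C du -ck -- "${files[@]}" | tail -n 1 | cut -f1
)
```

Because every path ends up on one du command line, this hits the same argument-length limit as any glob-based approach on very large trees, and hidden directories are skipped unless dotglob is also set.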

Method 4

The ultimate answer is:

{ find <DIR> -type f -name "*.<EXT>" -printf "%s+"; echo 0; } | bc

and an even faster version, not limited by RAM, which requires GNU AWK with bignum support:

find <DIR> -type f -name "*.<EXT>" -printf "%s\n" | gawk -M '{t+=$1}END{print t}'

This version has the following features:

uses all capabilities of find to specify the files you’re looking for
supports millions of files
  - other answers here are limited by the maximum length of the argument list
spawns only 3 simple processes with minimal pipe throughput
  - many answers here spawn C+N processes, where C is some constant and N is the number of files
doesn’t bother with string manipulation
  - this version doesn’t do any grepping or regexing
  - well, find does simple wildcard matching of filenames
optionally formats the sum into a human-readable form (e.g. 5.5K, 176.7M, …)
  - to do that, append | numfmt --to=si
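If neither bc nor gawk's -M option is available, ordinary awk is usually enough. A minimal sketch, assuming GNU find (for -printf); the function name jpg_total_bytes is hypothetical. awk sums in double precision, which stays exact for integer totals up to 2^53 bytes (about 9 PB):

```shell
# Hypothetical helper: total apparent size, in bytes, of *.jpg files under $1.
# Assumes GNU find (-printf). find prints one size per line, so no totals need
# to be filtered out and long file lists are never split into chunks.
jpg_total_bytes() {
  find "$1" -type f -name '*.jpg' -printf '%s\n' |
    awk '{ t += $1 } END { printf "%.0f\n", t }'
}
```

Usage: jpg_total_bytes ./photos/john_doe | numfmt --to=si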

Method 5

The answers given so far do not take into account that the file list passed from find to du may be so long that find automatically splits it into chunks, resulting in multiple occurrences of total.

You can either grep for total (beware: that word depends on the locale) and sum up manually, or use a different command. AFAIK there are only two ways to get a grand total (in kilobytes) of all files found by find:
find . -type f -iname '*.jpg' -print0 | xargs -r0 du -a | awk '{sum+=$1} END {print sum}'

Explanation
find . -type f -iname '*.jpg' -print0: Find all files with the extension jpg regardless of case (e.g. *.jpg, *.JPG, *.Jpg…) and output them null-terminated.
xargs -r0 du -a:
-r: without it, xargs would still run du once when find outputs no files, and du would then report the current directory; -r prevents this. -0 means the input strings are null-terminated (not newline-terminated).
awk '{sum+=$1} END {print sum}': Sum up the file sizes output by the previous command

And for reference, the other way would be
find . -type f -iname '*.jpg' -print0 | du -c --files0-from=-
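GNU du reports 1024-byte units by default, while BSD and macOS du default to 512-byte blocks, so passing -k makes the "kilobytes" unit explicit. A minimal sketch of the xargs pipeline above wrapped in a function, assuming GNU xargs for -r; the name jpg_total_kib_ci is hypothetical:

```shell
# Hypothetical helper: case-insensitive *.jpg disk usage (KiB) under $1.
# Assumes GNU xargs (-r). Each file is reported once, so summing every du line
# is correct even if xargs runs du several times.
jpg_total_kib_ci() {
  # -k pins the unit to 1024-byte blocks on GNU and BSD du alike
  find "$1" -type f -iname '*.jpg' -print0 |
    xargs -r0 du -k |
    awk '{ sum += $1 } END { print sum + 0 }'
}
```

With no matching files, -r keeps du from running at all, so the function prints 0 instead of the size of the current directory.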

Method 6

If the list of files is so big that it can’t be passed to a single invocation of du -c, on a GNU system you can do:

find . -iname '*.jpg' -type f -printf '%b\t%D:%i\n' |
  sort -u | cut -f1 | paste -sd+ - | bc

(The size is expressed as a number of 512-byte blocks.) Like du, it tries to count hard links only once. If you don’t care about hard links, you can simplify it to:

(printf 0; find . -iname '*.jpg' -type f -printf +%b) | bc

If you want the size instead of disk usage, replace %b with %s. The size will then be expressed in bytes.
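The sort -u step can also be done inside awk, which removes the need for bc as well. A minimal sketch, assuming GNU find; it keys each file by device and inode so hard links are counted once, and sums apparent sizes in bytes (%s) rather than blocks. The function name jpg_total_bytes_unique is hypothetical:

```shell
# Hypothetical helper: bytes used by *.jpg files under $1, hard links counted once.
# Assumes GNU find (-printf with %D device, %i inode, %s size in bytes).
jpg_total_bytes_unique() {
  find "$1" -iname '*.jpg' -type f -printf '%D:%i %s\n' |
    awk '!seen[$1]++ { t += $2 } END { printf "%.0f\n", t }'   # first sighting only
}
```

Replace %s with %b to get 512-byte blocks instead, matching the original command.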

Method 7

The solutions mentioned so far are inefficient (exec is expensive), require additional manual work to sum if the file list is long, or don’t work on Mac OS X. The following solution is very fast, should work on any system, and yields the total in GB (remove one /1024 to see the total in MB):
find . -iname "*.jpg" -ls | perl -lane '$t += $F[6]; END { print $t/1024/1024/1024 . " GB" }'

Method 8

Improving SHW’s great answer to make it work with any locale, like Zbyszek already pointed out in his comment:

LC_ALL=C find ./photos/john_doe -type f -name '*.jpg' -exec du -ch {} + | grep total$

Method 9

du naturally traverses the directory hierarchy and awk can perform the filtering, so something like this may be sufficient (the result is in kilobytes):

du -ak | awk 'BEGIN {sum=0} /\.jpg$/ {sum+=$1} END {print sum}'

This works without GNU.
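Method 9 measures disk usage. For apparent sizes in bytes without GNU extensions, wc -c can be driven by find in a similar way (a swapped-in technique, not part of the answer above). Like du -c, wc prints a total line per invocation when it gets more than one file, so those lines are skipped and the per-file counts summed. A minimal sketch, assuming only POSIX find, wc and awk; the function name is hypothetical, and some wc implementations read each file to count its bytes, which can be slow:

```shell
# Hypothetical helper: apparent size in bytes of *.jpg files under $1,
# using only POSIX tools. Skips wc's "total" lines and sums per-file counts.
jpg_total_bytes_posix() {
  LC_ALL=C find "$1" -type f -name '*.jpg' -exec wc -c {} + |
    awk 'NF == 2 && $2 == "total" { next } { t += $1 } END { printf "%.0f\n", t }'
}
```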

Method 10

This is what worked for me.

find . -type f -iname '*.jpg' -print0 | du -ch --files0-from=- | grep 'total$'

Method 11

Another option:

ls -al <directory> | awk '{t+=$5} END {print t}'

This assumes you’re looking in a single directory; note that it sums every entry listed (including directories), not just *.jpg files. To include the directory and everything beneath it:

ls -Ral <directory> | awk '{t+=$5} END {print t}'

Method 12

Another alternative, using stat rather than du:

stat -L -c %s **/*.jpg | awk '{s+=$1} END {printf "%.0f\n", s}'

See Gilles’ answer about using **.
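Expanding ** puts every path on one command line, which can hit the argument-length limit mentioned in Method 4. A minimal sketch that feeds stat from find -exec ... + instead, so find splits long lists into several stat runs; since stat prints one line per file and no totals, summing all lines stays correct. It assumes GNU stat (-c; BSD/macOS stat uses -f %z instead), and the function name is hypothetical:

```shell
# Hypothetical helper: apparent size in bytes of *.jpg files under $1 (GNU stat).
jpg_total_bytes_stat() {
  find "$1" -type f -name '*.jpg' -exec stat -c %s {} + |
    awk '{ s += $1 } END { printf "%.0f\n", s }'
}
```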

Method 13

This is a mashup of several answers and comments that do what I need.

find . \( -iname "*.jpg" -o -iname "*.png" \) -type f -exec du -bc {} + | grep 'total$' | cut -f1 | awk '{ total += $1 }; END { print total }' | numfmt --to=iec

find will get all the files recursively
-iname makes the match case-insensitive
-o and the escaped parentheses combine multiple patterns
du -bc will get the files’ size, sometimes in more than one call if there are many files
grep total will get only the total line as given by du
cut -f1 will take only the actual integer values
awk will sum them all
numfmt will convert it to a human-readable format
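The same idea can be generalized to any list of extensions by building the -iname ... -o ... expression in an array. A minimal sketch, assuming bash and GNU find; total_bytes_by_ext is a hypothetical name, and it sums apparent sizes with find -printf instead of du -b, which sidesteps the multiple-total issue:

```shell
# Hypothetical helper: total_bytes_by_ext DIR EXT [EXT...]
# Prints the total apparent size, in bytes, of files under DIR whose names end
# in any of the given extensions (case-insensitive). Requires bash and GNU find.
total_bytes_by_ext() {
  local dir=$1 ext
  shift
  if (( $# == 0 )); then
    echo "usage: total_bytes_by_ext DIR EXT [EXT...]" >&2
    return 2
  fi
  local expr=()
  for ext in "$@"; do
    # Join successive -iname tests with -o (logical OR)
    if (( ${#expr[@]} > 0 )); then
      expr+=(-o)
    fi
    expr+=(-iname "*.$ext")
  done
  find "$dir" -type f \( "${expr[@]}" \) -printf '%s\n' |
    awk '{ t += $1 } END { printf "%.0f\n", t }'
}
```

Usage: total_bytes_by_ext ./photos/john_doe jpg png | numfmt --to=iec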

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
