Count files in a directory by extension

For testing purposes, I'd like to count how many image files are inside a directory, broken down by file extension, and set a flag per type (for example jpg="yes"), because another script will later run an action for each file extension. Can I use something like the following for JPEG files only?

jpg=""
count=`ls -1 *.jpg 2>/dev/null | wc -l`
if [ $count != 0 ]
then
echo jpg files found: $count ; jpg="yes"
fi

Considering the file extensions jpg, png, bmp, raw and others, should I use a while loop to do this?

Answers:

Method 1

My approach would be:

  1. List all files in the directory
  2. Extract their extension
  3. Sort the result
  4. Count the occurrences of each extension

Sort of like this (last awk call is purely for formatting):

ls -q -U | awk -F . '{print $NF}' | sort | uniq -c | awk '{print $2,$1}'

This assumes GNU ls for the -U option, which skips sorting as an optimisation. If -U is not supported, it can be removed without affecting the result.
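The four steps above can also be sketched without parsing ls output at all, which sidesteps problems with spaces in file names. This is only a rough POSIX sh sketch: count_by_ext is a hypothetical helper name, grouping dot-less files under "(none)" is my own choice, and it still assumes extensions contain no whitespace or newlines.

```shell
# count_by_ext DIR: print "extension count" for the regular files in DIR.
# Hypothetical helper, not part of the original answer.
count_by_ext() {
  for f in "${1:-.}"/*; do
    [ -f "$f" ] || continue                 # skip directories and an unmatched glob
    name=${f##*/}                           # drop the directory part
    case $name in
      *.*) printf '%s\n' "${name##*.}" ;;   # text after the last dot
      *)   printf '%s\n' "(none)" ;;        # assumption: label for "no extension"
    esac
  done | sort | uniq -c | awk '{print $2, $1}'
}
```

Call it as count_by_ext /some/folder, or with no argument for the current directory. Like a plain ls, the glob skips hidden files.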

Method 2

This recursively traverses files and counts extensions that match:

$ find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
   6 tiff
   7 bmp
  26 jpeg
  38 gif
  51 jpg
  54 png

Method 3

I'd suggest a different approach, one that avoids the possible word-splitting issues of parsing ls output:

#!/bin/bash

shopt -s nullglob

for ext in jpg png gif; do 
  files=( *."$ext" )
  printf 'number of %s files: %d\n' "$ext" "${#files[@]}"

  # now we can loop over all the files having the current extension
  for f in "${files[@]}"; do
    # anything else you like with these files
    :
  done 

done

You can loop over the files array with any other commands you want to perform on the files of each particular extension.


More portably – or for shells that don’t provide arrays explicitly – you could re-use the shell’s positional parameter array i.e.

set -- *."$ext"

and then replace ${#files[@]} and "${files[@]}" with $# and "$@".
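For what it's worth, here is a minimal sketch of that positional-parameter variant for a plain POSIX sh. The function name count_ext is made up for illustration. Since nullglob is a bash option, the sketch assumes it is not available and checks by hand whether the pattern matched anything.

```shell
# count_ext EXT: print how many names in the current directory end in .EXT
# Hypothetical helper; uses the function's own positional parameters as the array.
count_ext() {
  set -- *."$1"
  # Without nullglob an unmatched pattern stays behind as a literal string,
  # so drop it if it names nothing (the -L test keeps dangling symlinks).
  [ -e "$1" ] || [ -L "$1" ] || set --
  echo "$#"
}

# Example use:
#   for ext in jpg png gif; do
#     printf 'number of %s files: %d\n' "$ext" "$(count_ext "$ext")"
#   done
```

Inside the function, "$@" holds the matching names, so a for f in "$@" loop there can act on each file.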

Method 4

find . -type f | sed -e 's/.*\.//' | sort | uniq -c

Method 5

Maybe it can get shorter

shopt -s nullglob; exts=( *.jpg *.png *.gif ); printf 'There are %d matching files\n' "${#exts[@]}"

Method 6

Anything involving ls is likely to produce unexpected results with special chars (space and other symbols). Any bashism (like arrays) isn’t portable. Anything involving while read is usually slow.

On the other hand, find is very flexible (lots of options to filter), it has at least two syntaxes that are safe for special characters (-print0 and -exec, both used below), and it scales well to large directories.

For this example, I have used -iname to match both upper- and lower-case extensions. I have also restricted the search with -maxdepth 1 to respect your question's "in current directory". Rather than counting lines, which breaks when filenames contain newlines, -print0 prints a NUL byte at the end of each filename, so | tr -d -c "\000" | wc -c accurately counts files (one NUL byte each).

extensions="jpg png gif"
for ext in $extensions; do
  c=$(find . -maxdepth 1 -iname "*.$ext" -print0 | tr -d -c "\000" | wc -c)
  if [ "$c" -gt 0 ]; then
    echo "Found $c  *.$ext files"

    find . -maxdepth 1 -iname "*.$ext" -print0 | xargs -0 -r -n1 DOSOMETHINGHERE
    # or #  find . -maxdepth 1 -iname "*.$ext" -exec "ls" "-l" "{}" ";"
  fi
done

P.S. -print0 | tr -d -c "\000" | wc -c can be replaced with -printf "\000" | wc -c or even -printf '\n' | wc -l (-printf is a GNU find extension).
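To make the difference between counting NUL bytes and counting lines concrete, here is a small sketch wrapped in a function. The name count_iname is hypothetical, and the -type f test is my addition so that directories are not counted. It assumes a find that supports -maxdepth, -iname and -print0 (GNU and BSD find both do).

```shell
# count_iname DIR EXT: count regular files directly in DIR whose name
# ends in .EXT, ignoring case. Hypothetical helper for illustration.
count_iname() {
  # One NUL byte is printed per file; tr deletes every other byte,
  # so wc -c yields the file count even if names contain newlines.
  find "$1" -maxdepth 1 -type f -iname "*.$2" -print0 | tr -d -c '\000' | wc -c
}

# Example use:
#   count_iname . jpg
```

With a file whose name contains a newline, the same find piped into wc -l would count that file twice, while the NUL-based count stays correct.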

Method 7

You can just use ls for something this simple, IMO:

ls -l /opt/ssl/certs/*.pem | wc -l

or

count=$(ls -l /some/folder/*.jpg | wc -l)

or

ls *.{mp3,exe,mp4} 2>/dev/null | wc -l

Method 8

Usually this type of task is best solved by breaking it up into chunks (the Unix philosophy). Find the files, strip out all but their extensions, sort alphabetically (to break ties) then by number of occurrences:

find . -type f | egrep -o '\.[^/.]+$' | sort | uniq -c | sort -n

You might like additional flourishes. I removed files which are only extension (like .gitignore), combined results by case (so gif and GIF are both under gif), and stripped out the initial dot:

find . -type f | egrep -v '/\.[^/.]+$' | egrep -o '\.[^/.]+$' | tr 'A-Z' 'a-z' | sed -e 's/^\.//' | sort | uniq -c | sort -n

You might instead choose to limit to certain image types

find . -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.bmp' -o -iname '*.raw' -o -iname '*.gif' \) | egrep -o '\.[^.]+$' | sort | uniq -c | sort -n

Hopefully these are both usable on their own and show how to combine the various utilities into a nice result.
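As one possible way to combine the pieces above, here is a sketch that takes the directory and the wanted extensions as arguments instead of hard-coding them. count_exts is a hypothetical name, and building the find expression in the positional parameters is my own choice, not something from the answer. It folds case so that JPG and jpg are counted together.

```shell
# count_exts DIR EXT...: recursive, case-folded count per extension under DIR.
# Hypothetical helper; prints "count extension" lines sorted by count.
count_exts() {
  dir=$1; shift
  [ "$#" -gt 0 ] || return 1          # need at least one extension
  first=1
  for e in "$@"; do                   # the list is expanded once, before we reuse "$@"
    if [ "$first" -eq 1 ]; then set --; first=0; else set -- "$@" -o; fi
    set -- "$@" -iname "*.$e"         # builds: -iname '*.a' -o -iname '*.b' ...
  done
  # The parentheses make -type f apply to every -iname alternative.
  find "$dir" -type f \( "$@" \) |
    sed -e 's/.*\.//' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -n
}

# Example use:
#   count_exts . jpg jpeg png bmp raw gif
```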

Method 9

If you are sure of the extension, you can use find like this:

find *.jpeg | wc -l


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.