I have a 30 TB directory containing billions of files, formally all JPEG files. I am deleting each folder of files like this:
sudo rm -rf bolands-mills-mhcptz
This command just runs and gives no indication of whether it’s working.
I want to see the files as they are deleted, or the current status of the command.
Answers:
Method 1
You can use rm -v to have rm print one line per file deleted. This way you can see that rm is indeed working to delete files. But if you have billions of files then all you will see is that rm is still working. You will have no idea how many files are already deleted and how many are left.
The tool pv can help you with a progress estimation.
http://www.ivarch.com/programs/pv.shtml
Here is how you would invoke rm with pv with example output
$ rm -rv dirname | pv -l -s 1000 > logfile
562 0:00:07 [79,8 /s] [====================> ] 56% ETA 0:00:05
In this contrived example I told pv that there are 1000 files. The output from pv shows that 562 are already deleted, elapsed time is 7 seconds, and the estimation to complete is in 5 seconds.
Some explanation:
- pv -l makes pv count by newlines instead of bytes
- pv -s number tells pv what the total is so that it can give you an estimation.
- The redirect to logfile at the end is for clean output. Otherwise the status line from pv gets mixed up with the output from rm -v. Bonus: you will have a logfile of what was deleted. But beware the file will get huge. You can also redirect to /dev/null if you don’t need a log.
To get the number of files you can use this command:
$ find dirname | wc -l
This also can take a long time if there are billions of files. You can use pv here as well to see how much it has counted
$ find dirname | pv -l | wc -l
278k 0:00:04 [56,8k/s] [ <=> ]
278044
Here it says that it took 4 seconds to count 278k files. The exact count at the end (278044) is the output from wc -l.
If you don’t want to wait for the counting then you can either guess the number of files or use pv without estimation:
$ rm -rv dirname | pv -l > logfile
Like this you will have no estimation to finish but at least you will see how many files are already deleted. Redirect to /dev/null if you don’t need the logfile.
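The counting and deleting steps above can be combined into one small function. This is only a sketch: delete_with_progress is a made-up name, it assumes an rm whose -v prints one line per removed entry (directories included, as GNU rm does), and file names containing newlines would throw the count off. If pv is not installed it falls back to a plain silent delete.

```shell
# delete_with_progress DIR: count entries, then delete with a pv progress bar.
# Hypothetical helper for illustration, not a standard tool.
delete_with_progress() {
    dir=$1
    # find lists every file and directory, including DIR itself.
    # rm -v reports each of them once, so the two totals should match.
    total=$(find "$dir" | wc -l | tr -d ' ')
    if command -v pv >/dev/null 2>&1; then
        # -l counts lines, -s gives pv the expected total for the ETA.
        rm -rv -- "$dir" | pv -l -s "$total" > /dev/null
    else
        # pv is not available: delete without any progress display.
        rm -r -- "$dir"
    fi
}
```

Usage would be something like delete_with_progress bolands-mills-mhcptz. Keep in mind that the counting pass has to walk the whole tree once before anything is deleted.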
Nitpick:
- do you really need sudo?
- usually rm -r is enough to delete recursively. no need for rm -f.
Method 2
Check out lesmana’s answer (Method 1 here); it’s much better than mine, especially the last pv example, which won’t take much longer than the original silent rm if you specify /dev/null instead of logfile.
Assuming your rm supports the option (it probably does since you’re running Linux), you can run it in verbose mode with -v:
sudo rm -rfv bolands-mills-mhcptz
As has been pointed out by a number of commenters, this could be very slow because of the amount of output being generated and displayed by the terminal. You could instead redirect the output to a file:
sudo rm -rfv bolands-mills-mhcptz > rm-trace.txt
and watch the size of rm-trace.txt.
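One way to watch the trace file is to count its lines from a second terminal while rm runs. The function below is a rough sketch: watch_trace is a hypothetical name, and it simply loops until the directory no longer exists, so it would never stop if rm fails partway. Because rm buffers its output when writing to a file, the count grows in chunks and lags slightly behind the real progress.

```shell
# watch_trace DIR TRACEFILE [SECONDS]: report how many lines rm -v has logged
# so far, repeating until DIR has been removed.
# Hypothetical helper for illustration.
watch_trace() {
    dir=$1
    trace=$2
    interval=${3:-5}
    while [ -e "$dir" ]; do
        echo "$(wc -l < "$trace") lines logged"
        sleep "$interval"
    done
    # rm may still be flushing its last buffer, so treat this as approximate.
    echo "$(wc -l < "$trace") lines logged, $dir is gone"
}
```

For example, with the rm command above running in one terminal: watch_trace bolands-mills-mhcptz rm-trace.txt 10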
Method 3
Another option is to watch the number of files on the filesystem decrease. In another terminal, run:
watch df -ih pathname
The used-inodes count will decrease as rm makes progress. (Unless the files mostly had multiple links, e.g. if the tree was created with cp -al). This tracks deletion progress in terms of number-of-files (and directories). df without -i will track in terms of space used.
You could also run iostat -x 4 to see I/O operations per second (as well as kiB/s, but that’s not very relevant for pure metadata I/O).
If you get curious about what files rm is currently working on, you can attach an strace to it and watch as the unlink() (and getdents) system calls spew on your terminal. e.g. sudo strace -p $(pidof rm). You can ^c the strace to detach from rm without interrupting it.
I forget if rm -r changes directory into the tree it’s deleting; if so you could look at /proc/<PID>/cwd. Its /proc/<PID>/fd might often have a directory fd open, so you could look at that to see what your rm process is currently looking at.
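If you want a number instead of eyeballing watch, the used-inode count can be sampled in a loop to get a rough deletion rate. This is a sketch under assumptions: inodes_used and inode_rate are made-up names, and the column position (third field of df -Pi) matches GNU df on Linux but not necessarily other systems. Filesystems that do not report inode counts will not give useful numbers here, and other activity on the same filesystem will skew the result.

```shell
# inodes_used PATH: print the used-inode count of the filesystem holding PATH.
# Assumes GNU df, where the third column of `df -Pi` is IUsed.
inodes_used() {
    df -Pi -- "$1" | awk 'NR == 2 { print $3 }'
}

# inode_rate PATH [SECONDS] [SAMPLES]: print how many inodes were freed
# during each interval. Hypothetical helper for illustration.
inode_rate() {
    path=$1
    interval=${2:-5}
    samples=${3:-12}
    prev=$(inodes_used "$path")
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        now=$(inodes_used "$path")
        echo "used=$now freed_in_last_${interval}s=$((prev - now))"
        prev=$now
        i=$((i + 1))
    done
}
```

Running inode_rate pathname 10 6 in another terminal would print six samples, ten seconds apart.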
Method 4
While the above answers all use rm, rm can actually be quite slow at deleting large numbers of files, as I recently observed when extracting ~100K files from a .tar archive actually took less time than deleting them. Although this does not actually answer the question you asked, a better solution to your problem might be to use a different method to delete your files, such as one of the upvoted answers to this question.
My personal favorite method is to use rsync -a --delete. I find that this method performs fast enough that it’s worth the ease-of-use over the most upvoted answer to that question, in which the author has written a C program that you would need to compile. (Note that this will output every file being processed to stdout, much like rm -rv; this can slow down the process by a surprising amount. If you do not want this output, use rsync -aq --delete or redirect the output to a file instead.)
The author of that answer says:
The program will now (on my system) delete 1000000 files in 43 seconds. The closest program to this was rsync -a --delete which took 60 seconds (which also does deletions in-order, too but does not perform an efficient directory lookup).
I have found that this is good enough for my purposes. Also potentially important from that answer, at least if you’re using ext4:
As a forethought, one should remove the affected directory and remake it after. Directories only ever increase in size and can remain poorly performing even with a few files inside due to the size of the directory.
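For reference, the usual way to delete with rsync -a --delete is to sync an empty directory over the one you want to clear. The sketch below assumes that approach. empty_with_rsync is a made-up name, and it removes the emptied directory afterwards, following the advice quoted above. You would recreate the directory yourself if you still need it.

```shell
# empty_with_rsync TARGET: delete everything under TARGET by syncing an
# empty directory over it, then remove TARGET itself.
# Hypothetical helper for illustration.
empty_with_rsync() {
    target=$1
    empty=$(mktemp -d) || return 1
    # Trailing slashes matter: this syncs the contents of $empty into $target,
    # so --delete removes everything in $target that is not in $empty.
    rsync -a --delete "$empty/" "$target/" || { rmdir "$empty"; return 1; }
    # Both directories are empty now, so rmdir is enough.
    rmdir "$empty" "$target"
}
```

Example: empty_with_rsync bolands-mills-mhcptz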
Method 5
One thing you could do would be to start up the rm process in the background (with no output, so it won’t be slowed down) and then monitor it in the foreground with a simple(a) command:
pax> ( D=/path/to/dir ; rm -rf $D & while true ; do
...>   if [[ -d $D ]] ; then
...>     echo "$(find $D | wc -l) items left"
...>   else
...>     echo "No items left"
...>     break
...>   fi
...>   sleep 5
...> done )
27912 items left
224 items left
No items left
pax> _
The find/wc combo could be replaced with any tool able to give you the units you want.
(a) Well, relatively simple, compared to, say, nuclear physics, the Riemann hypothesis, or what to buy my wife for Xmas 🙂
Method 6
A while ago I wrote something to print the rate at which lines are printed. You can run rm -rfv | ./counter and it will print lines per sec/min. Although this is not direct progress, it gives you some feedback on the rate of progress; a sudden drop might mean, for example, that rm has wandered into a network filesystem.
Link to the code is here:
http://www.usenix.org.uk/code/counter-0.01.tar.gz
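If you cannot fetch that tool, the same idea can be roughed out in plain shell. To be clear, this is not the counter program linked above, just a stand-in sketch with a made-up name (line_rate). It reports an average rate every so many lines instead of per second or minute. A shell read loop is slow and could itself hold rm back on a very large tree.

```shell
# line_rate [BATCH]: read lines on stdin and, every BATCH lines, print the
# running total and the average lines per second to stderr.
# Stand-in sketch, not the counter program from the link.
line_rate() {
    batch=${1:-10000}
    count=0
    start=$(date +%s)
    while IFS= read -r line; do
        count=$((count + 1))
        if [ $((count % batch)) -eq 0 ]; then
            elapsed=$(( $(date +%s) - start ))
            # Avoid dividing by zero during the first second.
            [ "$elapsed" -gt 0 ] || elapsed=1
            echo "$count lines, about $((count / elapsed)) lines/s on average" >&2
        fi
    done
    echo "$count lines total"
}
```

It would be used the same way as the counter, for example: rm -rfv dirname | line_rate 100000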
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.