Does rsync verify files copied between two local drives?

I want to make a fresh new copy of a large number of files from one local drive to another.

I’ve read that rsync does a checksum comparison of files when sending them to a remote machine over a network.

  1. Will rsync make the comparison when copying the files between two local drives?
  2. If it does do a verification – is it a safe bet? Or is it better to do a byte by byte comparison?

Answers:

Method 1

rsync always uses checksums to verify that a file was transferred correctly. If the destination file already exists, rsync may skip updating the file if the modification time and size match the source file, but if rsync decides that data need to be transferred, checksums are always used on the data transferred between the sending and receiving rsync processes. This verifies that the data received are the same as the data sent with high probability, without the heavy overhead of a byte-level comparison over the network.

Once the file data are received, rsync writes the data to the file and trusts that if the kernel indicates a successful write, the data were written without corruption to disk. rsync does not reread the data and compare against the known checksum as an additional check.
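Since rsync does not reread what it wrote, a separate pass is needed if you want the destination to be read back and checked. The following is a minimal sketch of one way to do that, assuming GNU coreutils `sha256sum` is available; the temporary directories and sample files are hypothetical and exist only for the demonstration, and `cp` stands in for whatever tool did the copying.

```shell
#!/bin/sh
# Sketch: verify a copy by re-reading the destination and comparing checksums.
# All paths are hypothetical temporary directories created for this demo.
set -eu

src=$(mktemp -d)
dst=$(mktemp -d)

# Create some sample data to copy.
mkdir -p "$src/sub"
printf 'hello\n' > "$src/a.txt"
printf 'world\n' > "$src/sub/b.txt"

# Copy the tree (cp stands in for the tool that performed the copy).
cp -rp "$src/." "$dst/"

# Record checksums with paths relative to the source root...
(cd "$src" && find . -type f -exec sha256sum {} + | sort -k 2) > "$dst.sums"

# ...then re-read the destination and check every file against that list.
if (cd "$dst" && sha256sum --check --quiet "$dst.sums"); then
    verify_status=ok
else
    verify_status=mismatch
fi
echo "verification: $verify_status"

# Clean up the demo data.
rm -rf "$src" "$dst" "$dst.sums"
```

Note that a re-read immediately after the copy may be served from the operating system's cache rather than from the drive itself.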

As for the verification itself, for protocol 30 and beyond (first supported in 3.0.0), rsync uses MD5. For older protocols, the checksum used is MD4.

While long considered obsolete for secure cryptographic hashes, MD5 and MD4 remain adequate for checking file corruption.

Source: the man page and eyeballing the rsync source code to verify.

Method 2

rsync does not do a post-copy verification for local file copies. You can confirm this by using rsync to copy a large file to a slow (e.g. USB) drive, and then copying the same file with cp:

time rsync bigfile /mnt/usb/bigfile

time cp bigfile /mnt/usb/bigfile

Both commands take about the same amount of time, therefore rsync cannot possibly be doing the checksum—since that would involve re-reading the destination file off the slow disk.

The man page is unfortunately misleading about this. I also verified this with strace: after the copy is complete, rsync issues no read() calls on the destination file, so it cannot be checksumming it. Another way to verify it is with a tool like iotop: you see rsync reading and writing simultaneously (copying from source to destination), then it exits. If it were verifying integrity, there would be a read-only phase.

Method 3

rsync makes a checksum comparison before copying (in some cases), to avoid copying what’s already there. The point of the checksum comparison is not to verify that the copy was successful. That’s the job of the underlying infrastructure: the filesystem drivers, the disk drivers, the network drivers, etc. Individual applications such as rsync don’t need to bother with this madness. All rsync needs to do (and does!) is to check the return values of system calls to make sure there was no error.

Method 4

Quick and dirty answers, directly to the questions.

Q: Will rsync make the comparison when copying the files between two local drives?

A: It will compare files to figure out what to copy.

Q: If it does do a verification – is it a safe bet? Or is it better to do a byte by byte comparison?

A: It is as safe as the mathematics behind the MD5 checksum of a file. You can run a simple experiment to learn and trust the tool.

Long answer: I guess you wanted rsync to compare the files (bit by bit or by checksum) after copying them. If you are one of the few who value data integrity, you might find the following useful:

rsync -avh [source] [destination] && rsync -avhc [source] [destination]

The command above copies the files on the first run and, if that completes without error, immediately runs rsync again, this time comparing files of the same name by a hash of the entire file. Any file whose hash differs is copied again.
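A variation on the two-pass idea is to make the second pass report mismatches instead of silently re-copying them, by adding the `--dry-run` and `--itemize-changes` options. The sketch below illustrates this under stated assumptions: the temporary directories and file names are hypothetical, and the corruption is simulated by hand while keeping size and modification time unchanged, which is exactly the case the default quick check would miss.

```shell
#!/bin/sh
# Sketch: copy with rsync, then use a checksum-based dry run to list mismatches.
# Directories and file names are hypothetical, created only for this demo.
set -eu

if ! command -v rsync > /dev/null 2>&1; then
    echo "rsync not installed; nothing to demonstrate"
    exit 0
fi

src=$(mktemp -d)
dst=$(mktemp -d)
printf 'alpha\n' > "$src/a.txt"
printf 'beta\n' > "$src/b.txt"

# List regular files that a checksum comparison says would be transferred.
# Itemized lines for such files start with ">f"; nothing is modified.
report() {
    rsync --dry-run --checksum --itemize-changes --archive "$src/" "$dst/" \
        | grep '^>f' || true
}

# First pass: the actual copy.
rsync -a "$src/" "$dst/"

# Second pass on an intact copy: expected to list no files.
changes_clean=$(report)
echo "intact copy, files flagged: ${changes_clean:-none}"

# Simulate silent corruption: same size, same modification time, different data.
printf 'betA\n' > "$dst/b.txt"
touch -r "$src/b.txt" "$dst/b.txt"

# Second pass again: the checksum comparison should now flag b.txt.
changes_corrupt=$(report)
echo "corrupted copy, files flagged: ${changes_corrupt:-none}"

rm -rf "$src" "$dst"
```

Dropping `--dry-run` from the second pass turns the report back into the repair behaviour of the original one-liner.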

Method 5

Using rsync to verify the integrity of a duplicate

To guarantee that this test physically re-reads the files from the drive media, I suggest powering-down both drives and restarting them before running this test. This will clear their internal volatile caches.

If you are not also restarting Linux, you should at least drop the caches with:

sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

Then to re-read both trees and compare their checksums:

rsync --dry-run --checksum --itemize-changes --archive SRC DEST

Modern rsync checksum uses MD5, which is 128 bits. The likelihood of this failing to detect an error in an individual file is astronomically low (some discussion here), but not impossible.
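To put a rough number on "astronomically low": assuming, as an idealization that the answer itself does not state, that an accidental corruption produces a digest behaving like a uniformly random 128-bit value, the chance that a corrupted file keeps the same MD5 digest is about

```latex
P(\text{undetected}) \approx 2^{-128} = \frac{1}{2^{128}} \approx 2.9 \times 10^{-39}
```

Under the same assumption, a union bound over $N$ corrupted files gives at most $N \cdot 2^{-128}$, so even $N = 10^{9}$ files yields roughly $2.9 \times 10^{-30}$. This estimate applies to accidental corruption only, not to deliberately constructed MD5 collisions.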

Method 6

Instead of using rsync you can use cp -rp to recursively copy a directory, followed by diff -r. GNU diff accepts two directory trees, which are both read in full when compared.
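The two steps can be sketched as follows. The temporary directories and sample files are hypothetical and created only for the demonstration; the second half simulates a corrupted file by hand to show that diff -r reports it through its exit status.

```shell
#!/bin/sh
# Sketch: copy a tree with cp -rp, then compare both trees with diff -r.
# All paths are hypothetical temporary directories created for this demo.
set -eu

src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/photos"
printf 'one\n' > "$src/photos/1.txt"
printf 'two\n' > "$src/notes.txt"

# Copy the contents of src into dst, preserving modes and timestamps.
cp -rp "$src/." "$dst/"

# diff -r reads both trees in full; exit status 0 means no differences.
if diff -r "$src" "$dst" > /dev/null; then
    diff_status=identical
else
    diff_status=different
fi
echo "after copy: trees are $diff_status"

# Simulate a corrupted destination file to show that diff -r notices it.
printf 'tw0\n' > "$dst/notes.txt"
if diff -r "$src" "$dst" > /dev/null; then
    after_corruption=identical
else
    after_corruption=different
fi
echo "after simulated corruption: trees are $after_corruption"

rm -rf "$src" "$dst"
```

For large binary files, redirecting diff's output as above avoids printing anything beyond what the exit status already tells you.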

Rationale (thanks @they): both cp and diff are usually already installed and quite suitable for a one-time action like OP’s “I want to make a fresh new copy of a large number of files from one local drive to another.”

Method 7

This answer clarifies a confusion I had myself, which is (IMO) difficult to resolve from the comments on the accepted answer. The rsync man page states:

Note that rsync always verifies that each transferred file was
correctly reconstructed on the receiving side by checking a whole-file
checksum that is generated as the file is transferred

This is probably correct, but it does not mean that the file data actually written are checked after the copy. First, with local drives both sides are on one PC (so rsync might skip the check when both sides are the same machine); second, the checksum “is generated as the file is transferred” (emphasis mine), not as it is being written to the media.

P.S. I found this Q&A while investigating why files I had copied with rsync ended up with mismatched checksums. I copied them on a PC that I knew beforehand was glitchy, but from reading the man page I thought rsync would make sure the files were copied properly.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
