Tar produces different files each time

I often have large directories that I want to transfer to a local computer from a server. Instead of using recursive scp or rsync on the directory itself, I’ll often tar and gzip it first and then transfer it.

Recently, I wanted to check that this actually works, so I ran md5sum on two independently generated tar and gzip archives of the same source directory. To my surprise, the MD5 hashes were different. I did this two more times, and each run produced a new value. Why am I seeing this result? Shouldn't two tar.gz archives of the same directory, generated in exactly the same way with the same version of GNU tar, be identical?

For clarity, I have a source directory and a destination directory. In the destination directory I have dir1 and dir2. I’m running:

tar -zcvf /destination/dir1/source.tar.gz source && md5sum /destination/dir1/source.tar.gz >> md5.txt

tar -zcvf /destination/dir2/source.tar.gz source && md5sum /destination/dir2/source.tar.gz >> md5.txt

Each time I do this, I get a different result from md5sum. Tar produces no errors or warnings.

Answers:

Method 1

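From the looks of things you're probably being bitten by gzip timestamps: gzip can store a modification time in its header, so archives created at different moments may differ even when the tar contents are identical. To avoid that, run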

GZIP=-n tar -zcvf ...
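To see the timestamp field itself, here is a minimal sketch. It assumes a gzip that follows RFC 1952, where bytes 4 to 7 of the stream hold MTIME, and a POSIX od. The scratch directory and sample.txt are made up for illustration and do not come from the question. It compresses a regular file rather than a tar pipe, only to make the field easy to inspect.

```shell
#!/bin/sh
# Hypothetical scratch file, used only to have something to compress.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/sample.txt"

# Bytes 4-7 of a gzip stream are the MTIME field (RFC 1952).
# od skips 4 bytes (-j4), reads 4 bytes (-N4) and prints them as hex.
mtime_default=$(gzip -c "$workdir/sample.txt" | od -An -tx1 -j4 -N4 | tr -d ' \n')

# With -n, gzip does not save the original name or timestamp.
mtime_n=$(gzip -n -c "$workdir/sample.txt" | od -An -tx1 -j4 -N4 | tr -d ' \n')

echo "default: $mtime_default"
echo "with -n: $mtime_n"

rm -rf "$workdir"
```

If the second value is all zeros while the first is not, the header timestamp is a plausible source of the differing hashes.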

Note that to get fully reproducible tarballs, you should also impose the sort order used by tar:

GZIP=-n tar --sort=name -zcvf ...

If your version of tar doesn’t support --sort, use this instead:

find source -print0 | LC_ALL=C sort -z | GZIP=-n tar --no-recursion --null -T - -zcvf ...
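A quick way to check that a recipe like this is reproducible is to build the archive twice and compare the bytes. The sketch below is only an illustration under stated assumptions: the scratch directory, its files, and the make_archive helper are hypothetical, and it pipes tar into gzip -n instead of relying on the GZIP variable. It assumes a find, sort, and tar that support -print0, -z, and --null -T - respectively.

```shell
#!/bin/sh
# Hypothetical scratch layout, created only for this check.
workdir=$(mktemp -d)
mkdir -p "$workdir/source/sub"
printf 'a\n' > "$workdir/source/a.txt"
printf 'b\n' > "$workdir/source/sub/b.txt"

# make_archive OUTPUT: archive ./source with a fixed member order
# and no gzip name/timestamp. OUTPUT must be an absolute path.
make_archive() {
    (cd "$workdir" &&
        find source -print0 | LC_ALL=C sort -z |
        tar --no-recursion --null -T - -cf - | gzip -n > "$1")
}

make_archive "$workdir/one.tar.gz"
sleep 2   # let the clock advance so a stored timestamp would differ
make_archive "$workdir/two.tar.gz"

# cmp -s exits 0 only when the two files are byte-for-byte equal.
if cmp -s "$workdir/one.tar.gz" "$workdir/two.tar.gz"; then
    result=identical
else
    result=different
fi
echo "$result"

rm -rf "$workdir"
```

Comparing with cmp avoids depending on md5sum versus md5, which differ between Linux and macOS.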

Method 2

On macOS, Method 1 (@stephen-kitt's answer) didn't work for me. I'm not exactly sure why, but once I separated gzip from the tar command, it started producing the same hash. Here's what I ended up with:

outputpath="$(pwd)/folder_to_zip" 
find "$outputpath" -print0 | LC_ALL=C sort -z | tar -s "#$outputpath/##" --no-recursion --null -T - -cf - | gzip -n > "$outputpath.tar.gz" && md5 "$outputpath.tar.gz"
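When it is unclear which layer is responsible, one option is to compare the decompressed tar streams separately from the compressed files. The following is a rough sketch, not a diagnosis of the macOS behaviour above. All paths are hypothetical, and it deliberately compresses the same tar file twice, once with and once without -n, so that only the gzip layer can differ.

```shell
#!/bin/sh
# Hypothetical scratch layout, created only for this comparison.
workdir=$(mktemp -d)
mkdir "$workdir/source"
printf 'a\n' > "$workdir/source/a.txt"

# One tar stream, compressed two ways.
(cd "$workdir" && tar -cf plain.tar source)
gzip -c "$workdir/plain.tar" > "$workdir/one.tar.gz"
gzip -n -c "$workdir/plain.tar" > "$workdir/two.tar.gz"

# Compare the compressed files byte for byte.
if cmp -s "$workdir/one.tar.gz" "$workdir/two.tar.gz"; then
    compressed=same
else
    compressed=different
fi

# Compare the decompressed tar payloads.
gzip -dc "$workdir/one.tar.gz" > "$workdir/one.tar"
gzip -dc "$workdir/two.tar.gz" > "$workdir/two.tar"
if cmp -s "$workdir/one.tar" "$workdir/two.tar"; then
    payload=same
else
    payload=different
fi

echo "compressed: $compressed, payload: $payload"
rm -rf "$workdir"
```

If the payloads match but the compressed files do not, the difference comes from gzip metadata. If the payloads differ, tar itself is the place to look, for example member order or stored paths.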


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
