In the manual page for the tar command, an option for following hard links is listed:
-h, --dereference
follow symlinks; archive and dump the files they point to
--hard-dereference
follow hard links; archive and dump the files they refer to
How does tar know that a file is a hard link? How does it follow it?
What if I don’t choose this option? How does it not hard-dereference?
Answers:
Method 1
By default, if you tell tar to archive a file that has hard links, and more than one of those links is among the files to be archived, it stores the file's contents only once and records the second (and any additional) names as hard links to the first. This means that when you extract that archive, the hard links will be restored.
If you use the --hard-dereference option, then tar does not preserve hard links. Instead, it treats them as independent files that just happen to have the same contents and metadata. When you extract the archive, the files will be independent.
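The difference can be seen by looking at how each name is stored in the archive. The following is a minimal sketch that uses Python's tarfile module rather than GNU tar itself; its dereference=True argument is, as far as I can tell, the closest analogue to --hard-dereference (it also follows symlinks, so it is not an exact match). The function name member_types and the file names are made up for illustration.

```python
import os
import tarfile
import tempfile


def member_types(dereference):
    """Archive two hard-linked names and report how each member is stored."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        secondary = os.path.join(tmp, "secondary")
        with open(original, "w") as f:
            f.write("data\n")
        os.link(original, secondary)  # second name for the same inode

        archive = os.path.join(tmp, "demo.tar")
        # dereference=False keeps hard links; dereference=True stores copies.
        with tarfile.open(archive, "w", dereference=dereference) as tar:
            tar.add(original, arcname="original")
            tar.add(secondary, arcname="secondary")

        with tarfile.open(archive) as tar:
            return {
                m.name: "hard link" if m.islnk() else "regular file"
                for m in tar.getmembers()
            }


print(member_types(dereference=False))
print(member_types(dereference=True))
```

With links preserved, the second name should show up as a hard-link member that carries no data of its own; with dereferencing, both names should be stored as regular files.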
Note: tar recognizes hard links by first checking the link count of each file. For every file with more than one link, it records the device number and inode number, and uses that pair to detect when the same file comes up again under another name. (When you use --hard-dereference, it does not do this.)
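The bookkeeping described in the note can be sketched in a few lines. This is not tar's actual source, only an illustration of the idea under the assumption that a dictionary keyed by (device, inode) is enough; classify and the file names a, b, c are hypothetical.

```python
import os
import tempfile


def classify(paths):
    """Label each path as a plain file or as a link to an earlier name."""
    seen = {}  # (st_dev, st_ino) -> first name seen for that inode
    result = []
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if st.st_nlink > 1 and key in seen:
            # Same device and inode as an earlier file: record a link only.
            result.append((path, "link to " + seen[key]))
        else:
            # Files with a single link can never reappear, so skip them.
            if st.st_nlink > 1:
                seen[key] = path
            result.append((path, "file"))
    return result


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a")
    b = os.path.join(tmp, "b")
    c = os.path.join(tmp, "c")
    open(a, "w").close()
    os.link(a, b)          # b is a second name for a
    open(c, "w").close()   # c is an unrelated file
    for path, kind in classify([a, b, c]):
        print(os.path.basename(path), "->", kind)
```

Checking the link count first is only an optimisation: most files have a single link, so nothing needs to be remembered for them.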
Method 2
You can tell a file that has more than one hard link from a file with only a single name by looking at its “link count”. I see two ways of getting this from the command line:
% stat original
  File: ‘original’
  Size: 0          Blocks: 0          IO Block: 4096   regular empty file
Device: 804h/2052d Inode: 932815      Links: 2
Access: (0644/-rw-r--r--)  Uid: (  500/ bediger)   Gid: ( 1000/ bediger)
Access: 2012-07-13 22:13:52.317101530 -0600
Modify: 2012-07-13 22:13:52.317101530 -0600
Change: 2012-07-13 22:14:08.050894536 -0600
 Birth: -
Or
% ls -li
total 0
932815 -rw-r--r-- 2 bediger bediger 0 Jul 13 22:13 original
932815 -rw-r--r-- 2 bediger bediger 0 Jul 13 22:13 secondary
That lonely ‘2’ before “bediger” is the link count. Note that both filenames have the same inode number, 932815.
I’m certain that both of these commands get the link count from the st_nlink field of struct stat, which gets filled in by a stat() system call.
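The same field can be read from a program. Here is a small sketch in Python, whose os.stat() wraps the stat() system call and exposes st_nlink and st_ino directly; link_info is a made-up helper name, and the file names simply mirror the example above.

```python
import os
import tempfile


def link_info(path):
    """Return (inode number, link count) for path via stat()."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original")
    secondary = os.path.join(tmp, "secondary")
    open(original, "w").close()
    print(link_info(original))   # one name so far, so the link count is 1
    os.link(original, secondary)
    print(link_info(original))   # link count is now 2
    print(link_info(secondary))  # same inode number, same link count
```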
As near as I can tell, running tar with --hard-dereference means that instead of getting a single file with two distinct filenames (as in the example above), you get two files, each with a single filename. tar probably checks the link count on each file when creating the archive. By default, it stores the second filename as a hard link, and on extraction it creates a hard link for that filename pointing at the file data it has already extracted. When the archive was created with --hard-dereference, the second filename appears to be stored as an ordinary file, so extraction creates an entirely new, independent file for it.
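One way to check this behaviour is a round trip: archive two linked names, extract them somewhere else, and see whether the extracted names still share an inode. The sketch below does that with Python's tarfile module as a stand-in for GNU tar, assuming its dereference=True flag behaves like --hard-dereference for this purpose; round_trip and the directory names src and out are hypothetical.

```python
import os
import tarfile
import tempfile


def round_trip(dereference):
    """Return True if the extracted names are still hard links to one file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        out = os.path.join(tmp, "out")
        os.mkdir(src)
        os.mkdir(out)

        original = os.path.join(src, "original")
        secondary = os.path.join(src, "secondary")
        with open(original, "w") as f:
            f.write("data\n")
        os.link(original, secondary)

        archive = os.path.join(tmp, "demo.tar")
        with tarfile.open(archive, "w", dereference=dereference) as tar:
            tar.add(original, arcname="original")
            tar.add(secondary, arcname="secondary")

        # Newer Python versions want an explicit extraction filter;
        # older ones do not accept the argument at all.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive) as tar:
            tar.extractall(out, **kwargs)

        return os.path.samefile(
            os.path.join(out, "original"), os.path.join(out, "secondary")
        )


print(round_trip(dereference=False))  # links preserved
print(round_trip(dereference=True))   # independent copies
```

If the reasoning above is right, the first call should report that both names refer to the same file and the second should report that they do not.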
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.