Dereferencing hard links

The tar manual page lists an option for following hard links:

-h, --dereference
      follow symlinks; archive and dump the files they point to

--hard-dereference
      follow hard links; archive and dump the files they refer to

How does tar know that a file is a hard link? How does it follow it?

What if I don’t choose this option? What does tar do with hard links then?



Method 1

By default, if you tell tar to archive a file that has several hard links, and more than one of those links is among the files to be archived, it stores the file's data only once and records the second name (and any additional names) as hard links to the first. This means that when you extract the archive, the hard links are restored.
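The default behaviour described above can be checked with a small experiment. This is only a sketch: it assumes GNU tar and GNU coreutils (for stat -c), and the directory and archive names (src, out, links.tar) are made up for the example.

```shell
#!/bin/sh
# Sketch assuming GNU tar and GNU stat; all names here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir src out
echo "hello" > src/original
ln src/original src/secondary          # second name for the same inode

# Archive both names with default options.
tar -cf links.tar -C src original secondary
tar -tvf links.tar                     # second member is shown as "link to original"

# Extract and compare inode numbers and link count.
tar -xf links.tar -C out
ino_a=$(stat -c %i out/original)
ino_b=$(stat -c %i out/secondary)
nlink=$(stat -c %h out/original)
echo "inodes: $ino_a $ino_b, link count: $nlink"
```

If the two inode numbers match and the link count is 2, the hard link was restored on extraction.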

If you use the --hard-dereference option, then tar does not preserve hard links. Instead, it treats them as independent files that just happen to have the same contents and metadata. When you extract the archive, the files will be independent.
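A matching experiment for --hard-dereference, again only a sketch that assumes GNU tar and GNU stat, with made-up names (src, out, copies.tar):

```shell
#!/bin/sh
# Sketch assuming GNU tar and GNU stat; all names here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir src out
echo "hello" > src/original
ln src/original src/secondary          # two names, one inode

# Archive both names, asking tar not to record hard links.
tar --hard-dereference -cf copies.tar -C src original secondary
tar -tvf copies.tar                    # both members are listed as regular files

# Extract and compare inode numbers and link count.
tar -xf copies.tar -C out
ino_a=$(stat -c %i out/original)
ino_b=$(stat -c %i out/secondary)
nlink=$(stat -c %h out/original)
echo "inodes: $ino_a $ino_b, link count: $nlink"
```

Here the extracted files should have different inode numbers and a link count of 1 each, so changing one no longer affects the other.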

Note: tar recognizes hard links by first checking the link count of each file. For every file with more than one link, it records the device number and inode number, and uses that pair to detect when the same file comes up again under another name. (When you use --hard-dereference, it does not do this.)
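The bookkeeping in the note can be imitated in a few lines of shell. This is not tar's actual code, just a rough sketch of the same idea, assuming GNU stat and file names without spaces; the file names are made up.

```shell
#!/bin/sh
# Rough imitation of the device:inode bookkeeping; not tar's real implementation.
set -e
work=$(mktemp -d)
cd "$work"
echo data > original
ln original secondary                  # same inode as original
echo other > single                    # link count 1

# stat prints "device:inode linkcount name" for each file, in order.
# awk remembers device:inode only for files with link count > 1 and
# reports a repeat of the same pair as a hard link to the first name seen.
report=$(stat -c '%d:%i %h %n' original secondary single |
  awk '$2 > 1 {
         if ($1 in seen) print $3 " -> hard link to " seen[$1]
         else { seen[$1] = $3; print $3 " -> stored as file" }
         next
       }
       { print $3 " -> stored as file" }')
echo "$report"
```

Files with a link count of 1 skip the lookup entirely, which is why checking the link count first is a cheap filter.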

Method 2

You can distinguish a file that has more than one hard link from a file that has only one by its “link count”. I see two ways of getting this from the command line:

% stat original
  File: ‘original’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 804h/2052d      Inode: 932815      Links: 2
Access: (0644/-rw-r--r--)  Uid: (  500/ bediger)   Gid: ( 1000/ bediger)
Access: 2012-07-13 22:13:52.317101530 -0600
Modify: 2012-07-13 22:13:52.317101530 -0600
Change: 2012-07-13 22:14:08.050894536 -0600
 Birth: -

% ls -li
total 0
932815 -rw-r--r-- 2 bediger bediger 0 Jul 13 22:13 original
932815 -rw-r--r-- 2 bediger bediger 0 Jul 13 22:13 secondary

That lonely ‘2’ before “bediger” is the link count. Note that both filenames have the same inode number, 932815.

I’m certain that both of these commands get the link count from the st_nlink field of struct stat, which gets filled in by a stat() system call.
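For scripting, the same numbers can be read without the surrounding output. A small sketch, assuming GNU stat for the -c format option (the -links test of find is standard):

```shell
#!/bin/sh
# Sketch assuming GNU stat; file names mirror the example above.
set -e
work=$(mktemp -d)
cd "$work"
touch original
ln original secondary

links=$(stat -c %h original)           # %h: number of hard links
inode=$(stat -c %i original)           # %i: inode number
echo "original: inode $inode, $links links"

# List regular files in this directory that have more than one link.
multi=$(find . -type f -links +1 | sort)
echo "$multi"
```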

As near as I can tell, running tar with --hard-dereference means that instead of one file with two names (as in the example above), you get two separate files, each with a single name. By default, tar probably checks the link count of each file when it creates the archive, and on extraction it creates a hard link for the second name instead of writing the data again. When the archive was created with --hard-dereference, extraction appears to create an entirely new file for the second name.

All methods are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
