I’m playing with btrfs, which lets cp --reflink make copy-on-write (CoW) copies. Other programs, such as lxc-clone, may use this feature as well. My question is: how can I tell whether a file is a CoW copy of another? For hard links, I can tell from the inode number.
Answers:
Method 1
Good question. Looks like there aren’t currently any easy high-level ways to tell.
One problem is that a file may share only part of its data via copy-on-write. File data is stored in physical extents, and some or all of those physical extents may be shared between CoW files.
There is nothing analogous to an inode which, when compared between files, would tell you that the files share the same physical extents. (Edit: see my other answer.)
The low level answer is that you can ask the kernel which physical extents are used for the file using the FS_IOC_FIEMAP ioctl, which is documented in Documentation/filesystems/fiemap.txt. In principle, if all of the physical extents are the same, then the file must be sharing the same underlying storage.
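As a rough illustration of that ioctl, here is a minimal Python sketch. The constants and struct layouts are transcribed from the kernel's fiemap header as I understand it; the function names `parse_fiemap` and `fiemap` and the `batch` parameter are made up for this example, and it has not been exercised against every filesystem.

```python
import fcntl
import struct

# Values transcribed from <linux/fiemap.h> and <linux/fs.h> (assumed correct for Linux).
FS_IOC_FIEMAP = 0xC020660B            # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001             # flush the file before mapping
FIEMAP_EXTENT_LAST = 0x0001           # last extent of the file
FIEMAP_EXTENT_SHARED = 0x2000         # extent is shared with other files
HEADER = struct.Struct("=QQIIII")     # struct fiemap, 32 bytes
EXTENT = struct.Struct("=QQQQQIIII")  # struct fiemap_extent, 56 bytes


def parse_fiemap(buf):
    """Decode a filled-in fiemap buffer into (logical, physical, length, flags) tuples."""
    mapped = HEADER.unpack_from(buf, 0)[3]  # fm_mapped_extents
    extents = []
    for i in range(mapped):
        f = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((f[0], f[1], f[2], f[5]))
    return extents


def fiemap(path, batch=128):
    """Return all extents of `path`, asking the kernel for `batch` extents per call."""
    result, start = [], 0
    with open(path, "rb") as fh:
        while True:
            buf = bytearray(HEADER.size + batch * EXTENT.size)
            HEADER.pack_into(buf, 0, start, 0xFFFFFFFFFFFFFFFF - start,
                             FIEMAP_FLAG_SYNC, 0, batch, 0)
            fcntl.ioctl(fh.fileno(), FS_IOC_FIEMAP, buf)  # kernel fills buf in place
            chunk = parse_fiemap(buf)
            result.extend(chunk)
            if not chunk or chunk[-1][3] & FIEMAP_EXTENT_LAST:
                return result
            start = chunk[-1][0] + chunk[-1][2]  # continue after the last logical byte
```

Calling `fiemap("somefile")` on a filesystem that supports the ioctl should return one tuple per extent; testing `flags & FIEMAP_EXTENT_SHARED` shows which extents the kernel reports as shared.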
Few tools expose this information at a higher level. I found some Go code here. Apparently the filefrag utility is supposed to show the extents with -v. In addition, btrfs-debug-tree shows this information.
I would exercise caution, however. These tools may have seen little real-world use for this purpose, so you could hit bugs that give you wrong answers. Beware of relying on this data when deciding on operations that could cause data corruption.
Some related questions:
- How to find out if a file on btrfs is copy-on-write?
- How to find data copies of a given file in Btrfs filesystem?
Method 2
Further to my previous answer, I have just released fienode, which computes a SHA1 hash of the physical extents of a file and can be used to find some (identical) reflink copies. Beware though, there are caveats (see the documentation). BTRFS decided to change some, but not all, of the physical extents of a reflink copy I made without provocation or warning, causing the value to change.
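To make the idea of hashing physical extents concrete, here is a small Python sketch. It is not fienode's actual algorithm or output format; the function name `extent_fingerprint` and the choice to hash (physical offset, length) pairs are assumptions made purely for illustration.

```python
import hashlib
import struct


def extent_fingerprint(extents):
    """SHA-1 over a file's (physical_offset, length) pairs, taken in file order.

    Illustrative only: two files get the same value only if every extent
    matches exactly, so partially shared files will not be detected.
    """
    digest = hashlib.sha1()
    for physical, length in extents:
        digest.update(struct.pack("<QQ", physical, length))
    return digest.hexdigest()
```

Two files whose extent lists are identical get the same fingerprint. As the answer warns, the filesystem may relocate extents at any time, so a changed value does not prove that the files were never clones.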
Method 3
The easiest solution for this is to use btrfs filesystem du .
Exclusive will be 0.00B if all of the file’s data is shared with another file, i.e. it is an unmodified CoW copy.
Found here: https://unix.stackexchange.com/a/655813/525352
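If you want to script this check, something along the following lines might work. It is only a sketch: it assumes the `--raw` option prints plain byte counts and that the output has the four columns Total, Exclusive, Set shared and Filename. The helper names `parse_btrfs_du` and `btrfs_du` are invented for this example.

```python
import subprocess


def parse_btrfs_du(text):
    """Parse `btrfs filesystem du --raw` output (column layout is an assumption)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # Total, Exclusive, Set shared, Filename
        if len(parts) != 4 or not parts[0].isdigit():
            continue  # header line or anything unexpected
        total, exclusive = int(parts[0]), int(parts[1])
        rows.append({
            "name": parts[3],
            "total": total,
            "exclusive": exclusive,
            # Non-empty file with no exclusive bytes: all data is shared.
            "fully_shared": total > 0 and exclusive == 0,
        })
    return rows


def btrfs_du(path):
    """Run the real command; requires btrfs-progs and a btrfs filesystem."""
    out = subprocess.run(["btrfs", "filesystem", "du", "--raw", path],
                         check=True, capture_output=True, text=True).stdout
    return parse_btrfs_du(out)
```

Note that a file which shares only some of its extents reports a non-zero Exclusive value, so this check finds fully shared copies only.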
Method 4
This does not add much to the accepted answer but someone has summarised the problems and several methods here – https://www.ctrl.blog/entry/distinguish-file-link-clone.html
Problems:
- distinguish symbolic links and hard links from ref links
- identify partial clones (files that share some but not all data)
Solutions:
- Use filefrag
- Use stat to identify the device, as clones must reside on the same filesystem.
Quote:
“Hard links share the same inode number as their destination, whereas
clones have their own inodes. This distinction (plus a copy-on-write
file system) is what enables clones to act independently of their
originals even when modified by non-cloning aware programs.”
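The inode and device comparison described in the quote can be sketched in Python as follows. The function name `relation` and its return strings are made up for this example. One caveat to keep in mind: on btrfs each subvolume reports its own device number, so a differing `st_dev` does not by itself prove that the files live on different filesystems.

```python
import os
import stat


def relation(path_a, path_b):
    """Classify two paths using only stat data (cannot confirm a clone)."""
    a, b = os.lstat(path_a), os.lstat(path_b)  # lstat: do not follow symlinks
    if stat.S_ISLNK(a.st_mode) or stat.S_ISLNK(b.st_mode):
        return "symlink"
    if a.st_dev != b.st_dev:
        return "different device"
    if a.st_ino == b.st_ino:
        return "hard link"
    # Same device, different inodes: either an ordinary copy or a clone.
    # Telling those apart needs the extent comparison, not stat.
    return "separate inodes"
```

A result of "separate inodes" only says that the files could be clones; stat alone cannot distinguish a reflink copy from a plain copy.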
- Run the command filefrag -v file1 file2 (part of e2fsprogs). Compare the files’ physical_offset ranges within the extent rows that have the shared flag set. The two files share deduplicated/cloned data on the storage drive if they share any identical or overlapping ranges.
As to determining which is the original and which is the clone … that is almost impossible to determine without a time machine.
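The "identical or overlapping ranges" comparison can be sketched in Python like this. It assumes you have already collected each file's extents as (physical_offset, length) pairs in the same unit, for example by reading them off the filefrag -v output and keeping only rows flagged as shared. The function name `shared_physical_ranges` is hypothetical.

```python
def shared_physical_ranges(extents_a, extents_b):
    """Return the sorted [start, end) physical ranges that appear in both lists.

    Each input is an iterable of (physical_offset, length) pairs. A simple
    pairwise comparison is used, which is fine for a few thousand extents.
    """
    shared = set()
    for pa, la in extents_a:
        for pb, lb in extents_b:
            lo = max(pa, pb)
            hi = min(pa + la, pb + lb)
            if lo < hi:  # the two extents overlap on [lo, hi)
                shared.add((lo, hi))
    return sorted(shared)
```

An empty result means no physical range is common to both files. A result covering every extent of both files suggests a full clone, while anything in between points to a partial clone.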
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0