I have a sparse file, in which only some blocks are allocated:
~% du -h --apparent-size example
100K example
~% du -h example
52K example
I would like to know which blocks of the file are actually allocated. Is there a system call or kernel interface that could be used to get a list of either the allocations or the holes of a file?
Simply checking for a long enough string of zeros (the approach used by GNU cp, rsync, etc) does not work correctly:
~% cp example example1
~% du -h example1
32K example1
It detected other sequences of zeros that were actually allocated.
Answers:
Method 1
There is a similar question on SO. The currently accepted answer by @ephemient suggests using an ioctl called fiemap which is documented in linux/Documentation/filesystems/fiemap.txt. Quoting from that file:
The fiemap ioctl is an efficient method for userspace to get file
extent mappings. Instead of block-by-block mapping (such as bmap),
fiemap returns a list of extents.
Sounds like this is the kind of information you’re looking for. Support by filesystems is again optional:
File systems wishing to support fiemap must implement a ->fiemap callback on
their inode_operations structure.
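As a rough illustration of what calling the ioctl from userspace could look like, here is a minimal Python sketch. The request number and struct layouts are transcribed from my reading of linux/fs.h and linux/fiemap.h, so treat them as assumptions to verify against your kernel headers; the helper names parse_extents and file_extents are made up for this example. Holes are the gaps between the logical ranges of the returned extents.

```python
import fcntl
import struct

# Assumed values, transcribed from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B         # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1             # sync the file before mapping it
FIEMAP_EXTENT_LAST = 0x1           # set on the last extent of the file
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved
HEADER = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3]
EXTENT = struct.Struct("=QQQ2QI3I")


def parse_extents(buf):
    """Decode a buffer filled in by FS_IOC_FIEMAP.

    Returns a list of (logical, physical, length, flags) tuples.
    """
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append((fields[0], fields[1], fields[2], fields[5]))
    return extents


def file_extents(path, batch=32):
    """Collect all extents of a file, asking for `batch` extents per ioctl."""
    extents = []
    start = 0
    with open(path, "rb") as f:
        while True:
            buf = bytearray(HEADER.size + batch * EXTENT.size)
            HEADER.pack_into(buf, 0, start, FIEMAP_MAX_OFFSET,
                             FIEMAP_FLAG_SYNC, 0, batch, 0)
            # Raises OSError if the filesystem does not implement fiemap.
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
            chunk = parse_extents(buf)
            extents.extend(chunk)
            if not chunk or chunk[-1][3] & FIEMAP_EXTENT_LAST:
                return extents
            # Continue right after the last extent returned so far.
            start = chunk[-1][0] + chunk[-1][2]
```

Calling file_extents("example") should return one tuple per allocated extent on a filesystem that supports fiemap; on one that does not, the ioctl fails with an OSError.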
Support for the SEEK_DATA and SEEK_HOLE arguments to lseek you mentioned from Solaris was added in Linux 3.1 according to the man page, so you might use that as well. The fiemap ioctl appears to be older, so it might be more portable across different Linux versions for now, whereas lseek might be more portable across operating systems if Solaris has the same.
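For the lseek route, a minimal Python sketch might look like the following. It relies on os.SEEK_DATA and os.SEEK_HOLE, which Python exposes on platforms that support them; the function name sparse_layout is just something I made up for the example. On filesystems that do not track holes, the kernel is allowed to report the whole file as data.

```python
import errno
import os


def sparse_layout(path):
    """Return a list of (kind, offset, length) tuples for the file.

    kind is "DATA" or "HOLE"; the segments cover the file from 0 to its size.
    """
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                # First offset >= pos that contains data.
                data = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno != errno.ENXIO:
                    raise
                # ENXIO: no data at or after pos, the rest is one hole.
                segments.append(("HOLE", pos, size - pos))
                break
            if data > pos:
                segments.append(("HOLE", pos, data - pos))
            # First offset >= data that is a hole; the end of the file
            # counts as an implicit hole, so this finds one at the latest there.
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            segments.append(("DATA", data, hole - data))
            pos = hole
    finally:
        os.close(fd)
    return segments
```

Looping over sparse_layout("example") and printing each tuple gives a listing of the holes and data regions as the filesystem reports them, rather than as guessed from runs of zeros.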
Method 2
There is a collection of Python programs called sparseutils that use SEEK_HOLE and SEEK_DATA to determine which sections of the file are represented as holes and which are data. Usage is quite straightforward. mksparse can be used to generate a sparse file according to some given layout.
$ echo hole,data,hole | mksparse --hole-size 4096 --data-size 4096 example
$ du -sh example
4.0K example
The sparsemap program can be used to print the layout to stdout:
$ sparsemap example
HOLE 4096
DATA 4096
HOLE 4096
Method 3
It depends on the file system. I don't believe there is a call, which may be why many tools don't handle copying sparse files well. The GNU tool chain searches for large blocks of zeros, as that allows it to remove unused allocated blocks. Many copy tools will convert a sparse file into a file with all blocks allocated.
You will likely have to open the inode, and parse the result. Inode format is file system dependent. Some file systems may have part of your data in the inode itself.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.