Bulk remove a large directory on ZFS without traversing it recursively

I want to remove the contents of a subdirectory of a ZFS dataset. It’s a large amount of data. For the pool “nas”, the path is /nas/dataset/certainFolder

$ du -h -d 1 certainFolder/
1.2T    certainFolder/

Rather than having to wait for rm -rf certainFolder/, can’t I just destroy the handle to that directory so that it’s overwritable (even by the same directory name, if I chose to recreate it)?

For example, not knowing much about ZFS internals, specifically how it journals its files, I wonder whether I could access that journal/map directly and remove the right entries so that the directory would no longer be displayed. The space that the directory holds would have to be removed from some kind of audit as well.

Is there an easy way to do this, even on an ext3 filesystem? Or is that already what the recursive remove command has to do in the first place, i.e. go through and edit the journals?

I’m just hoping for something like kill thisDir that simply removes some kind of ID, and poof, the directory no longer shows up in ls -la. The data is still there on the drive, obviously, but the space will now be reused (overwritten), because ZFS is just that cool?

I mean I think zfs is really that cool, how can we do it? Ideally? rubbing hands together 🙂

My specific use case (besides my love for ZFS) is management of a backup archive. The data is pushed to ZFS via FreeFileSync (AWESOME PROG) from Windows boxes across SMB to the ZFS pool. When I run rm -rf /nas/dataset/certainFolder in a PuTTY terminal, it stalls, and the terminal is unusable for a long time. I then have to open another terminal to continue. That gets old, plus it’s no fun to monitor the rm -rf; it can take hours.

Maybe I should just run the command in the background with &, then print to stdout; that might be nice. More realistically, I could recreate the dataset in a few seconds with zfs destroy nas/dataset; zfs create -p -o compression=on nas/dataset, following the thoughts in the response from @Gilles.

Answers:


Method 1

Tracking freed blocks is unavoidable in any decent file system and ZFS is no exception. There is however a simple way under ZFS to have a nearly instantaneous directory deletion by “deferring” the underlying cleanup. It is technically very similar to Gilles’ suggestion but is inherently reliable without requiring extra code.

If you create a snapshot of your file system before removing the directory, the directory removal will be very fast because nothing will need to be explored/freed under it, all being still referenced by the snapshot. You can then destroy the snapshot in the background so the space will be gradually recovered.

d=yourPoolName/BackupRootDir/hostNameYourPc/somesubdir   # dataset name; assumed to be mounted at /$d
zfs snapshot "${d}@quickdelete" && {
    rm -rf "/${d}/certainFolder"
    zfs destroy "${d}@quickdelete" &    # reclaim the space in the background
}
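The snippet above only works when the dataset is mounted at a path equal to its name. A minimal sketch of the same idea as a reusable function follows; the function name zfs_quick_rm and the PID-based snapshot name are made up for illustration, and the mount point is looked up with zfs get rather than assumed.

```shell
# Hypothetical helper: zfs_quick_rm DATASET RELATIVE_DIR
# Snapshots DATASET, removes RELATIVE_DIR under its mount point, then
# destroys the snapshot in the background so space is reclaimed gradually.
zfs_quick_rm() {
    local ds sub mnt snap
    ds=$1
    sub=$2
    [ -n "$ds" ] && [ -n "$sub" ] || { echo "usage: zfs_quick_rm DATASET DIR" >&2; return 2; }

    # Ask ZFS where the dataset is mounted instead of assuming /$ds.
    mnt=$(zfs get -H -o value mountpoint "$ds") || return 1
    # Also fails cleanly when the mountpoint is "legacy" or "none".
    [ -d "$mnt/$sub" ] || { echo "no such directory: $mnt/$sub" >&2; return 1; }

    snap="$ds@quickdelete-$$"          # assumed naming scheme, not a ZFS convention
    zfs snapshot "$snap" || return 1
    rm -rf -- "$mnt/$sub"
    zfs destroy "$snap" &
}
```

It would be called as zfs_quick_rm nas/dataset certainFolder; the snapshot is only taken if the directory actually exists.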

Method 2

What you’re asking for is impossible. Or, more precisely, there’s a cost to pay when deleting a directory and its files; if you don’t pay it at the time of the deletion, you’ll have to pay it elsewhere.

You aren’t just removing a directory; that would be near-instantaneous. You’re removing a directory, all the files inside it and, recursively, all of its subdirectories. Removing a file means decrementing its link count, and then marking its resources (the blocks used for file contents and file metadata, and the inode if the filesystem uses an inode table) as free if the link count reaches 0 and the file isn’t open. This is an operation that has to be done for every file in the directory tree, so the time it takes is at least proportional to the number of files.
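The link-count rule can be observed directly in a scratch directory. This is only an illustration; link_count is a made-up helper that reads the second field of ls -ld.

```shell
# Hypothetical helper: print the hard-link count of a path (2nd field of ls -ld).
link_count() {
    ls -ld -- "$1" | awk '{print $2}'
}

demo=$(mktemp -d)
echo data > "$demo/a"
ln "$demo/a" "$demo/b"        # second name for the same file
link_count "$demo/a"          # link count is now 2
rm "$demo/a"                  # removes one name only, nothing is freed yet
cat "$demo/b"                 # contents are still reachable through the other name
link_count "$demo/b"          # back to 1; removing b as well would free the blocks
rm -rf -- "$demo"
```

Every file in the tree needs this bookkeeping, which is why rm -rf cannot skip the traversal.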

You could delay the cost of marking the resources as free. For example, there are garbage-collected filesystems, where you can remove a directory without removing the files it contains. A run of the garbage collector will detect the files that aren’t reachable via the directory structure and mark them as free. Doing rm -f directory; garbage-collect on a garbage collected filesystem does the same things as rm -rf on a traditional filesystem, with different triggers. There are few garbage-collected filesystems because the GC is additional complexity which is rarely needed. The GC time could come at any moment, when the filesystem needs some free blocks and doesn’t find any, so the performance of an operation would be dependent on past history, not just on the operation, which is usually undesirable. You’d need to run the garbage collector just to get the actual amount of free space.

If you want to simulate the GC behavior on a normal filesystem, you can do it:

mv directory .DELETING; rm -rf .DELETING &

(I omitted many important details such as error checking, resilience to power loss, etc.) The directory name becomes non-existent immediately; the space is reclaimed progressively.
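Some of the omitted error checking could look like the sketch below. The name defer_rm and the .DELETING.XXXXXX template are assumptions for illustration; the unique name avoids clashing with an earlier deletion that is still running, and keeping it in the same parent directory keeps the mv a plain rename.

```shell
# Hypothetical helper: defer_rm DIR
# Renames DIR to a unique hidden name in the same parent directory (so the
# rename stays on one filesystem and is cheap), then deletes it in the background.
defer_rm() {
    local dir parent trash
    dir=${1%/}
    [ -d "$dir" ] || { echo "not a directory: $1" >&2; return 1; }
    parent=$(dirname -- "$dir")
    # mktemp -d with a template creates an empty, uniquely named directory.
    trash=$(mktemp -d "$parent/.DELETING.XXXXXX") || return 1
    mv -- "$dir" "$trash/" || { rmdir "$trash"; return 1; }
    # nohup keeps the cleanup running if the terminal is closed.
    nohup rm -rf -- "$trash" >/dev/null 2>&1 &
}
```

After a crash or power loss, any leftover .DELETING.* directories can simply be removed again by hand; this sketch does not do that automatically.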

A different approach to avoid paying the cost during removal without GC would be to pay it during allocation. Mark the directory tree as deleted, and go through deleted directories when allocating blocks. That would be hard to reconcile with hard links, but on a filesystem without hard links, it can be done with O(1) cost increase in allocation. However, that would make a very common operation (creating or enlarging a file) more expensive, with the only benefit being to make a relatively rare operation (removing a large directory tree) cheaper.

You could bulk-remove a directory tree if that tree was stored as its own pool of blocks. (Note: I’m using the word “pool” in a different meaning from ZFS’s “storage pool”. I don’t know what the proper terminology is.) That could be very fast. But what do you do with the free space? If you reassign it to another pool, that has a cost, though a lot less than deleting files individually. If you leave the space as unused reserve space, you can’t reclaim it immediately. Having an individual pool for a directory tree means added costs to increase or reduce the size of that pool (either on the fly or explicitly). Making the tree its own storage pool also increases the cost of moving files into and out of the tree.

Method 3

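If it has to be quick, I generate a new temporary directory on the same filesystem, mv the directory below it and then recursively delete the temporary directory in the background:

t=$(mktemp -d ./.deleting.XXXXXX)   # created next to certainFolder, so mv is a rename rather than a copy to /tmp
mv certainFolder "$t"/
rm -rf "$t" &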

Method 4

If the folder you want to delete and re-create quickly is in its own dataset (and not just a sub-directory of another dataset), you can do:

zfs rename pool/dataset pool/dataset.old
zfs create -o ...options... pool/dataset
zfs destroy -r pool/dataset.old

The new pool/dataset can be used immediately while the old one is being destroyed.

It’s a little more complicated than that if there are any child datasets that you don’t want to delete (e.g. pool/dataset/child, which will be renamed along with its parent as pool/dataset.old/child), but no more so than if there were sub-directories that you want to keep when deleting most of a sub-directory. Just rename them back into the new pool/dataset before destroying pool/dataset.old, e.g. zfs rename pool/dataset.old/child pool/dataset/child. Similarly, you can mv a subdirectory of pool/dataset.old back into pool/dataset.
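To see which child datasets exist before destroying the renamed parent, something like the following sketch may help. Both function names are made up for illustration; the only ZFS commands used are zfs list and zfs rename.

```shell
# Hypothetical helper: zfs_children DATASET
# Prints the direct child datasets of DATASET, one per line.
zfs_children() {
    # -H: no header, -o name: names only, -d 1: descend one level.
    # The listing includes DATASET itself, so filter that line out.
    zfs list -H -o name -d 1 "$1" | awk -v parent="$1" '$0 != parent'
}

# Hypothetical helper: zfs_keep_child OLD_PARENT NEW_PARENT CHILD
# Moves one child dataset from the old parent back under the new one.
zfs_keep_child() {
    zfs rename "$1/$3" "$2/$3"
}
```

For example, zfs_children pool/dataset.old lists the candidates, and zfs_keep_child pool/dataset.old pool/dataset child rescues one of them before zfs destroy -r pool/dataset.old is run.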

If it’s just a sub-directory, you can do the same as you would on any other filesystem:

mv subdir subdir.old
mkdir subdir
chmod ... subdir ; chown ... subdir    # if and as required
rm -rf subdir.old/ &

This is pretty much the same as what Gilles said in his answer.

Again, if there are child sub-directories you want to keep, move them back into the new subdir before running the rm -rf, e.g. mv subdir.old/child subdir/.
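Put together, the sub-directory variant could be sketched as one function. The name reset_dir is hypothetical, and the sketch leaves out the chmod/chown step because the right values depend on the directory in question.

```shell
# Hypothetical helper: reset_dir DIR [CHILD...]
# Empties DIR quickly while keeping the named children.
reset_dir() {
    local dir old name
    dir=${1%/}
    [ -d "$dir" ] || { echo "not a directory: $1" >&2; return 1; }
    shift
    old="$dir.old"
    [ ! -e "$old" ] || { echo "already exists: $old" >&2; return 1; }
    mv -- "$dir" "$old" || return 1
    mkdir -- "$dir" || return 1        # chmod/chown here if and as required
    # Move the children to keep back into the fresh directory.
    for name in "$@"; do
        mv -- "$old/$name" "$dir/" || return 1
    done
    rm -rf -- "$old" &
}
```

Calling reset_dir subdir child would leave a new subdir containing only child. Drop the final rm -rf line if the .old copy should be kept around for a while.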

I’ve been doing this for decades, since the 1990s at least – the zfs rename version is just an obvious evolution of the same method. I can’t remember if directories could be renamed in MS-DOS but if they could, I was probably doing this with MS-DOS in the 1980s too.

BTW, for both datasets and sub-directories, you don’t have to delete the .old one immediately. I tend to keep them around until I’m sure I’ve retrieved everything I want to keep from them, or until I need to recover the disk space they’re using up. I like to delay the point-of-no-return for as long as possible.


BTW, it’s often a good idea to use datasets instead of sub-directories on ZFS, because you can have different settings (e.g. for compression type, quota, reservation, atime/relatime, encryption, etc.) on each dataset, and each dataset can be snapshotted and backed up with zfs send individually.

The price of that, though, is that moving a file or subdirectory tree to another dataset is a copy-and-delete operation, same as it would be when moving them to another filesystem on a different disk or partition or LV etc. A zfs dataset is effectively a different, separate filesystem with its own mount point. See How to move files from one zfs filesystem to a different zfs filesystem in the same pool? – a comment there brought me here today.
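A rough way to tell in advance whether mv will be a cheap rename or a copy-and-delete is to compare what df reports for the two paths. The helper name same_fs is made up, and the check is a heuristic rather than a guarantee.

```shell
# Hypothetical helper: same_fs PATH1 PATH2
# Succeeds when df reports the same filesystem and mount point for both paths.
# Heuristic only: mount points containing spaces are compared by their last word.
same_fs() {
    local a b
    # df -P prints one POSIX-format line per filesystem after the header.
    a=$(df -P -- "$1" | awk 'NR==2 {print $1, $NF}')
    b=$(df -P -- "$2" | awk 'NR==2 {print $1, $NF}')
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

On ZFS the first df column is the dataset name, so two directories in different datasets of the same pool compare as different, which matches the copy-and-delete behaviour described above.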

Also worth noting: a dataset’s mount-point can be anywhere in the filesystem hierarchy, it doesn’t have to be mounted directly under its parent, and the mount-point can be changed as needed. e.g. if I have a smallish SSD root pool called “rpool” and a large HDD pool for bulk data called “export”, I can move large sub-directories (and datasets) of rpool to export and still have them mounted in the same place. e.g.

zfs create export/share-doc
mv /usr/share/doc/* /export/share-doc/
zfs set mountpoint=/usr/share/doc export/share-doc

(this is a simplified example. In practice, I tend to duplicate the fs hierarchy – e.g. create datasets for export/usr, export/usr/share, export/usr/share/doc to a) keep the top level of a pool uncluttered and b) in case I need to move other subdirs or datasets there)

This distinction between dataset name & hierarchy vs mount-point is important to understand. If you recursively destroy a dataset, its children will also be destroyed no matter where they are mounted. So remember to rename any children that you want to keep.
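Before a recursive destroy, it may be worth previewing what would go. The sketch below wraps two existing commands, zfs list and the dry-run mode of zfs destroy; the function name zfs_destroy_preview is an assumption for illustration.

```shell
# Hypothetical helper: zfs_destroy_preview DATASET
# Shows what a recursive destroy of DATASET would affect, without deleting anything.
zfs_destroy_preview() {
    # List the dataset and all descendants with their mount points,
    # since children may be mounted far away from the parent.
    zfs list -r -o name,mountpoint "$1" || return 1
    # -n: dry run (nothing is deleted), -v: print what would be destroyed.
    zfs destroy -n -v -r "$1"
}
```

Any child that shows up in the listing and should survive needs a zfs rename out of the tree first.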


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
