I am in the process of salvaging data from a 1 TB failing drive (asked about it in Procedure to replace a hard disk?). I have done ddrescue from a system rescue USB with a resulting error size of 557568 B in 191 errors, probably all in /home (I assume what it calls “errors” are not bad sectors, but consecutive sequences of them).
Now, the several guides I’ve seen suggest running e2fsck on the new disk, and I expected this to somehow find that some files have been assigned “blank sectors/blocks”, so that I would at least know which files could not be saved whole. But no errors were found at all (I ran it without -y to make sure I didn’t miss anything). Now I am running it again with -c, but at 95% no errors have been found so far; I guess I have a new drive with some normal-looking files with zeroed or random pieces inside, undetectable until one day I open them with the corresponding software, or Linux Mint needs them.
Can I do anything with the old/new drives in order to obtain a list of possibly corrupted files? I don’t know how many there could be, since those 191 errors could span several files, but at least the total size is not big; I am mostly concerned about a big bunch of old family photos and videos (1+ MB each), the rest is probably irrelevant or was backed up recently.
Update: the new pass of e2fsck did give something new of which I understand nothing:
Block bitmap differences: +231216947 +(231216964--231216965) +231216970 +231217707 +231217852 +(231217870--231217871) +231218486
Fix<y>? yes
Free blocks count wrong for group #7056 (497, counted=488).
Fix<y>? yes
Free blocks count wrong (44259598, counted=44259589).
Fix<y>? yes
Answers:
Method 1
You’ll need the block numbers of all encountered bad blocks (ddrescue should have given you a list, I hope you saved it), and then you’ll need to find out which files make use of these blocks (see e.g. here). You may want to script this if there are a lot of bad blocks.
e2fsck doesn’t help; it just checks the consistency of the file system itself, so it will only act if the bad blocks contain “administrative” file system information.
The bad blocks in the files will just be empty.
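To script the first half of this, here is a minimal sketch that pulls the bad areas out of a ddrescue mapfile. It assumes the usual mapfile layout: data lines of the form pos size status in hex, with - as the status of bad-sector areas. The function name bad_regions and the file name rescue.map are made up for the illustration.

```shell
# Print "byte_offset byte_length" in decimal for every area that the
# ddrescue mapfile marks as a bad sector (status "-").
# Usage: bad_regions MAPFILE
bad_regions() (
    while read -r pos size status rest; do
        # Comment lines start with "#"; data lines start with a hex offset.
        case $pos in 0x*) ;; *) continue ;; esac
        # The current-position line has no "-" in the third field, so it is skipped.
        [ "$status" = "-" ] || continue
        # Shell arithmetic understands the 0x prefix and prints decimal.
        echo "$(($pos)) $(($size))"
    done < "$1"
)
```

Running bad_regions rescue.map prints one line per bad area. Each byte offset still has to be converted to a filesystem block before debugfs can use it.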
Edit
Ok, let’s figure out the block size thingy. Let’s make a trial filesystem with 512-byte device blocks:
$ dd if=/dev/zero of=fs bs=512 count=200
$ /sbin/mke2fs fs
$ ll fs
-rw-r--r-- 1 dirk dirk 102400 Apr 27 10:03 fs
$ /sbin/tune2fs -l fs
...
Block count:              100
...
Block size:               1024
Fragment size:            1024
Blocks per group:         8192
Fragments per group:      8192
So the filesystem block size is 1024, and we’ve 100 of those filesystem blocks (and 200 512-byte device blocks). Rescue it:
$ ddrescue -b512 fs fs.new fs.log
GNU ddrescue 1.19
Press Ctrl-C to interrupt
rescued:   102400 B,  errsize:       0 B,  current rate:    102 kB/s
   ipos:    65536 B,   errors:       0,    average rate:    102 kB/s
   opos:    65536 B, run time:       1 s,  successful read:   0 s ago
Finished
$ cat fs.log
# Rescue Logfile. Created by GNU ddrescue version 1.19
# Command line: ddrescue fs fs.new fs.log
# Start time:   2017-04-27 10:04:03
# Current time: 2017-04-27 10:04:03
# Finished
# current_pos  current_status
0x00010000     +
#      pos        size  status
0x00000000  0x00019000  +
$ printf "%i\n" 0x00019000
102400
So the hex ddrescue units are in bytes, not any blocks. Finally, let’s see what debugfs uses. First, make a file and find its contents:
$ sudo mount -o loop fs /mnt/tmp
$ sudo chmod go+rwx /mnt/tmp/
$ echo 'abcdefghijk' > /mnt/tmp/foo
$ sudo umount /mnt/tmp
$ hexdump -C fs
...
00005400  61 62 63 64 65 66 67 68  69 6a 6b 0a 00 00 00 00  |abcdefghijk.....|
00005410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
So the byte address of the data is 0x5400. Convert this to 1024-byte filesystem blocks:
$ printf "%i\n" 0x5400
21504
$ expr 21504 / 1024
21
and let’s also try the block range while we are at it:
$ /sbin/debugfs fs
debugfs 1.43.3 (04-Sep-2016)
debugfs:  testb 0
testb: Invalid block number 0
debugfs:  testb 1
Block 1 marked in use
debugfs:  testb 99
Block 99 not in use
debugfs:  testb 100
Illegal block number passed to ext2fs_test_block_bitmap #100 for block bitmap for fs
Block 100 not in use
debugfs:  testb 21
Block 21 marked in use
debugfs:  icheck 21
Block   Inode number
21      12
debugfs:  ncheck 12
Inode   Pathname
12      //foo
So that works out as expected, except block 0 is invalid, probably because the file system metadata is there. So, for your byte address 0x30F8A71000 (210330128384 in decimal) from ddrescue, assuming you worked on the whole disk and not a partition, we subtract the byte address of the partition start (start sector 7815168 times 512 bytes per sector):
210330128384 - 7815168 * 512 = 206328762368
Divide that by the tune2fs block size to get the filesystem block (note that since multiple physical, possibly damaged, blocks make up a filesystem block, numbers needn’t be exact multiples):
206328762368 / 4096 = 50373233.0
and that’s the block you should test with debugfs.
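The same arithmetic as a small shell function, in case there are many addresses to convert. This is only a sketch: byte_to_fs_block is a made-up name, and the partition start sector, sector size and filesystem block size are the values used in the calculation above, so substitute your own from fdisk -l and tune2fs -l.

```shell
# Convert a whole-disk byte offset (as found in the ddrescue mapfile)
# into a filesystem block number inside the partition.
# Usage: byte_to_fs_block BYTE_OFFSET PART_START_SECTOR SECTOR_SIZE FS_BLOCK_SIZE
byte_to_fs_block() {
    # Integer division rounds down to the block that contains the byte.
    echo $(( ($1 - $2 * $3) / $4 ))
}

byte_to_fs_block 0x30F8A71000 7815168 512 4096   # prints 50373233
```

The resulting block can then be given to debugfs non-interactively, for example debugfs -R "icheck 50373233" /dev/sdX1, followed by ncheck on the inode it reports. Here /dev/sdX1 is a placeholder for the rescued partition. A bad area longer than one filesystem block covers a range of blocks, so its last byte should be converted as well.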
Method 2
NTFS, ext3, ext4
After copying the data off your fail{ing,ed} drive with ddrescue, use ddrutility to find the affected filenames.
I successfully got it to list affected NTFS files on a 1TB partition given a ddrescue mapfile in under 20 seconds.
It writes its log file in the current directory.
The linked page mentions support for NTFS, ext3 and ext4.
btrfs, zfs
These filesystems have their own built-in scrub function.
Method 3
I would recommend an existing utility called ddrutility. It eliminates the tedious manual calculations.
You should be running it on your cloned copy (not the original) drive device like so:
ddru_findbad /dev/sdb /ddrescue-disk-copy.map
The usage of the map file (second argument) is mandatory here.
The utility is quite smart and supports different filesystems (even NTFS). It can also temporarily treat areas that ddrescue has not yet split as bad, so you can estimate whether you need the whole ddrescue procedure to finish. Also note that /dev/sdb is used as a whole disk here (not a partition like /dev/sdb1), since the whole disk was originally cloned.
The utility is available in Debian repos and can be installed with:
sudo apt install ddrutility
The project’s wiki: https://sourceforge.net/p/ddrutility/wiki/Home
Method 4
The easiest way, although not necessarily the fastest or most efficient way, would be to:
- Run ddrescue normally to rescue the whole drive, and be sure to preserve the mapfile.
- Rerun ddrescue in fill-mode to mark bad sectors with a unique pattern. They recommend something like this:
ddrescue --fill-mode=- <(printf "BAD-SECTOR ") outfile mapfile
To avoid false positives, use a pattern that would not normally occur in any file.
- Mount the rescued image/disk with its native operating system.
- Use an appropriate operating system utility, like e2fsck on Linux, to verify and possibly repair the filesystem directory structure. Any bad sectors that fall in filesystem structures need to be resolved first, before you go looking for file corruption. Repairing directory structures is an art in and of itself, which is out of this answer’s scope.
- Use an appropriate utility provided by the operating system, like grep, to scan all the files on the filesystem and list those that contain the unique pattern that fill-mode marked them with.
- If necessary, you can examine the files with the appropriate editor to locate the position of the actual data loss by searching for the unique pattern within the file(s).
This is operating system independent so I’m intentionally not giving details that vary depending on the specific filesystem type. I first had to do this on an NTFS filesystem using windows utilities, but it’s the same idea on ext3/4, etc.
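For the Linux case, the scanning step could look roughly like the sketch below. It assumes the rescued filesystem is mounted somewhere (the mount point /mnt/rescued in the usage note is hypothetical) and that the fill pattern was the BAD-SECTOR string from the example command; list_marked_files is a made-up helper name.

```shell
# List every file under DIR that contains the fill-mode marker.
# Usage: list_marked_files DIR [PATTERN]
list_marked_files() {
    # -r: recurse into directories
    # -l: print only the names of matching files
    # -F: treat the pattern as a fixed string, not a regular expression
    grep -rlF -- "${2:-BAD-SECTOR}" "$1"
}
```

For example, list_marked_files /mnt/rescued > damaged-files.txt would save the list for later review. grep exits with status 1 when no file matches, which here is the good outcome.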
Method 5
I simply used FileZilla and that fixed my problem. I used regular FileZilla to back up all the good data. I noticed that one big file was not copying correctly (the transfer kept stopping in the middle and restarting). Luckily I had a previous backup of the same file. Then, to duplicate the disk, I had to find the bad blocks on the disk using this procedure:
1st, identify the problem disk from the HD info shown by fdisk -l
2nd, if, let’s say, your disk is /dev/sdb, then you need to run the command
badblocks -v /dev/sdb, which will list all the bad blocks on the drive. With luck there will be only a few. If no bad blocks are found, then your drive’s blocks are OK and you need to figure out something else. My block size is 512, so I use that number to run dd.
3rd, each block is 512 bytes in size, so what I did was set bs=512
Each time I ran dd the way I usually do, the data after the errors came out corrupted. So I then used the parameters explained on the page https://www.gnu.org/software/coreutils/manual/html_node/dd-invocation.html (search for the “For failing disks” part).
dd if=/dev/sdb of=/dev/sda bs=512 conv=noerror,sync iflag=fullblock
It took a while. Each bad block encountered sounded like banging on the faulty drive. dd copies block by block, and the drive made the same noise at every one of my bad blocks. Each noise meant dd had found another bad block, which it also reports in an error message on the display. What ‘conv=noerror,sync’ does is pad out bad reads with NULs, while ‘iflag=fullblock’ caters for short reads and keeps your data in sync up to the end. Nothing else gets corrupted: dd just does not copy the faulty blocks and fills them with NULs instead.
After the dd copy was done, I just replaced that bad file by restoring it with FileZilla from a past backup, and everything worked OK. I hope this will be useful for others trying to back up faulty drives.
NOTE: My bad blocks were pretty close to each other. They were detected in groups of about 4 blocks at a time. If your bad blocks are all over the disk, several files could be affected. Luckily, in my case, only one big 4 GB database file was affected.
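To see the padding from conv=noerror,sync in isolation, here is a small sketch that needs no failing hardware. It does not simulate a read error. It only feeds dd an input shorter than one block, which conv=sync pads with NULs in the same way. It assumes GNU dd (for iflag=fullblock); padded_size is a made-up name.

```shell
# Feed dd a 3-byte input with an 8-byte block size and count the output bytes.
# conv=sync pads the incomplete input block with NUL bytes up to bs.
padded_size() {
    printf 'abc' |
        dd bs=8 conv=noerror,sync iflag=fullblock 2>/dev/null |
        wc -c
}

padded_size   # prints 8: three data bytes plus five NUL bytes of padding
```

Because every input block is written out at full size, data after a bad block stays at the same offset on the copy as on the original.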
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.