The idea is simple:
- Remove all duplicates, including zero length files
- Fine tuning: Hand-pick and remove files deemed unnecessary
Since the mileage for second step might vary from person to person, I’ll elaborate on the first step.
I chose jdupes for deleting the duplicates because it’s open-source and is cross platform.
For a given folder we would run the following to wipe the duplicates:
jdupes --recurse --delete --no-prompt --zero-match .
This will recursively delete all the duplicates except the source file without prompting for a confirmation. It will also consider zero length files to be duplicates.
The target OS is Windows, which has the glaring problem of drives.
There could be files unique in a given drive but are actually duplicates in the inter-drive space. There are two ways to combat this.
Rinse, move, repeat
- Run jdupes on a drive to free some space
- Move some data from other drives into the current drive to fill it up again
- Repeat
This, obviously, is a terrible idea beacause we have the overhead cost of moving the files after each run as well as the fact that we have to run jdupes exhaustively for many iterations.
Single pass with hardlinks
- Pick a random drive as the parent node
- Hardlink all the other drives to the parent
- Run jdupes
Consider the following scenario
- Parent drive: A
- Child drive: E
- Child drive: B
- etc.
To create the hardlink of E
in A
, we would run the following in powershell.
New-Item -ItemType HardLink -Path A:\Edrive -Value E:\
We repeat this for the rest of the drives like drive B
where
Edrive
becomesBdrive
E:
becomesB:
When we run jdupes from the A
drive with
cd A:
jdupes --recurse --delete --no-prompt --zero-match .
it traverses the hardlinks and removes duplicates in all the linked drives.
Finally we can remove the hardlinks
rm A:\Edrive
Note: Do not run jdupes at
SYSTEMROOT
(the root ofC:
drive for most people) as there are legitimate duplicates which, if deleted, can brick a system. I’d recommend running jdupes in individual directories like Music, Documents, etc.