Backstory
So here’s the story. I have a ton of files that I’ve accumulated over the decades, and since they’ve been disorganized from the start (as far back as 1998), it has been very difficult to keep them all together in a way that makes sense. Compounding the problem, I have so many interests that I kinda have a library of different stuff, if you put a tornado in that library. So last year, I bought two big hard drives with the intent of FINALLY getting everything organized. I took everything from my other drives and copied it to both of the new ones. Then I put one in my server and the other in my workstation. I started shuffling things around on the workstation drive, but didn’t have time to finish.
So here I am, a year later, and I have a bunch of new files on each drive that aren’t on the other, and I can’t remember which is which. Normally, you’d just use rsync in both directions, but since I have most of the same files, just in different paths, that’s not going to work: rsync matches files by their relative path, so it would copy over a duplicate of everything I had moved.
I needed a script. Something that can read the contents of both drives, compare JUST the file names, regardless of the path, and spit out the files that are on drive 1 but not drive 2, and vice versa.
The Solution
#!/bin/bash
# Compare two drives by file name only, ignoring where each file lives.
DRIVE1=/path/to/drive1
DRIVE2=/path/to/drive2

# Full path of every regular file on each drive.
find "$DRIVE1" -type f > drive1filesfullpath.txt
find "$DRIVE2" -type f > drive2filesfullpath.txt

# Strip the directory part and keep one sorted copy of each file name.
sed 's!.*/!!' drive1filesfullpath.txt | sort -u > drive1filenames.txt
sed 's!.*/!!' drive2filesfullpath.txt | sort -u > drive2filenames.txt

# comm -23 prints the lines that appear only in its first (sorted) input.
comm -23 drive1filenames.txt drive2filenames.txt > drive1missingfiles.txt
comm -23 drive2filenames.txt drive1filenames.txt > drive2missingfiles.txt

# Print the full paths whose last component exactly matches a missing name.
# awk compares literal strings, so dots and brackets in names are safe.
echo "Files on Drive 1 that aren't on Drive 2: "
echo "============================================"
awk -F/ 'NR == FNR { missing[$0]; next } $NF in missing' \
    drive1missingfiles.txt drive1filesfullpath.txt
echo
echo "Files on Drive 2 that aren't on Drive 1: "
echo "============================================"
awk -F/ 'NR == FNR { missing[$0]; next } $NF in missing' \
    drive2missingfiles.txt drive2filesfullpath.txt
There really isn’t that much going on here, and it could use some improvements, but it seems to work in this case. The next step would be to either take command line arguments for the input drives, or to write a simple GUI frontend for it.
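For the command line argument idea, here is a rough sketch of how it might look. The function name only_on_first and the usage text are made up for illustration, and it keeps its intermediate lists in a temporary directory instead of the current one. Like the script above, it matches on file name only and assumes names contain no newlines.

```shell
#!/bin/bash
# Sketch: the same name-only comparison, with the drives passed as arguments.
# only_on_first is a hypothetical helper name, not part of any existing tool.

# Print the full path of every file under $1 whose name appears nowhere under $2.
only_on_first() {
    local first="$1" second="$2" tmp
    tmp=$(mktemp -d) || return 1

    # Full paths on the first drive, then bare names for both drives.
    find "$first" -type f > "$tmp/paths"
    sed 's!.*/!!' "$tmp/paths" | sort -u > "$tmp/names1"
    find "$second" -type f | sed 's!.*/!!' | sort -u > "$tmp/names2"

    # Names present only on the first drive.
    comm -23 "$tmp/names1" "$tmp/names2" > "$tmp/missing"

    # Map those names back to full paths by exact match on the last component.
    awk -F/ 'NR == FNR { missing[$0]; next } $NF in missing' \
        "$tmp/missing" "$tmp/paths"

    rm -rf "$tmp"
}

# With two arguments, print both reports. With none, nothing runs, so the
# file can also be sourced just to reuse the function.
if [ "$#" -eq 2 ]; then
    echo "Files on $1 that aren't on $2:"
    echo "============================================"
    only_on_first "$1" "$2"
    echo
    echo "Files on $2 that aren't on $1:"
    echo "============================================"
    only_on_first "$2" "$1"
elif [ "$#" -gt 0 ]; then
    echo "usage: $0 DRIVE1 DRIVE2" >&2
    exit 1
fi
```

Saved as, say, comparedrives.sh, it would be run as ./comparedrives.sh /path/to/drive1 /path/to/drive2. Wrapping the logic in a function means the same code handles both directions, which is the part the original script repeats.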
Of course, this doesn’t move any files, it only reads out which ones are not on the other drive, but it’s hugely helpful for my case, where MOST of the files are present on both drives, but not all.
