duplicate files
You may find yourself overwhelmed by files and needing to keep the filesystem organized. If getting rid of duplicates is the goal, consider these two options:
preferred: Hardlink
From http://code.google.com/p/hardlinkpy/ : "hardlink.py is a tool to hardlink together identical files in order to save space.". The directory tree looks the same afterwards, but identical files are detected and hard-linked together, so their content is actually written only once on the hard drive.
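A minimal sketch of the idea (this is not hardlink.py itself; the function names and the full-file SHA-256 comparison are assumptions for illustration): walk a tree, hash every file, and replace each duplicate with a hard link to the first copy seen.

```python
import hashlib
import os
import sys

def file_hash(path, chunk=1 << 16):
    """Return the SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            h.update(data)
    return h.hexdigest()

def hardlink_duplicates(root):
    first_seen = {}  # digest -> path of the first file with that content
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks
            digest = file_hash(path)
            original = first_seen.get(digest)
            if original is None:
                first_seen[digest] = path
            elif not os.path.samefile(original, path):
                # Replace the duplicate with a hard link to the original,
                # so both names point at the same bytes on disk.
                os.remove(path)
                os.link(original, path)

if __name__ == '__main__':
    hardlink_duplicates(sys.argv[1])
```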
Warning
This post is certainly obsolete...
Dupinator
Dupinator tries to find duplicate files and report them, to help clean up the organization of your files.
changelog
dupinator 2: http://www.shearersoftware.com/personal/weblog/2005/01/14/dupinator-ii
dupinator 1: the latest version can be found at http://svn.red-bean.com/bbum/trunk/hacques/dupinator.py. It is a one-off that solved a problem, not an attempt to write the world's best python script (http://www.pycs.net/bbum/2004/12/29/).
It works as follows (see the sketch after this list):
launched via command line by passing a set of directories to be scanned
traverses all directories and groups all files by size
scans all sets of files of one size and checksums (md5) the first 1024 bytes
for all files that have the same checksum for the first 1024 bytes, checksums the whole file and collects together all real duplicates
deletes all duplicates of any one file, leaving the first encountered file as the one remaining copy
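A sketch of those steps in Python (this is not the original dupinator.py; the function names are illustrative, and it prints duplicates instead of deleting them so you can check the results first):

```python
import hashlib
import os
import sys
from collections import defaultdict

def md5_of(path, limit=None):
    """md5 of a file; only the first `limit` bytes if limit is given."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        if limit is not None:
            h.update(f.read(limit))
        else:
            for chunk in iter(lambda: f.read(1 << 16), b''):
                h.update(chunk)
    return h.hexdigest()

def find_duplicates(roots):
    # 1. traverse all directories and group files by size
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)

    # 2. for each set of same-size files, checksum the first 1024 bytes
    by_head = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) > 1:
            for path in paths:
                by_head[(size, md5_of(path, limit=1024))].append(path)

    # 3. for files matching on the first 1024 bytes, checksum the whole file
    by_full = defaultdict(list)
    for _key, paths in by_head.items():
        if len(paths) > 1:
            for path in paths:
                by_full[md5_of(path)].append(path)

    # real duplicates are the groups with more than one file
    return [paths for paths in by_full.values() if len(paths) > 1]

if __name__ == '__main__':
    for group in find_duplicates(sys.argv[1:]):
        keep, *dupes = group  # keep the first encountered copy
        for dupe in dupes:
            print('duplicate of %s: %s' % (keep, dupe))
            # os.remove(dupe)  # uncomment to actually delete, as dupinator does
```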