This script is designed to identify and remove duplicate images in a specified directory based on content similarity, and then rename the remaining images to a standardized format. It uses OpenCV for image processing and the scikit-image
library to compare image similarity.
- Traverse a specified directory and list all image files.
- Compare images based on both filename and pixel-wise content.
- Handle different image orientations.
- Remove the lower resolution duplicate when duplicates are found.
- Rename all images to the format image-%timestamp%.%OriginalImageExtension%.
- Perform a sanity check to ensure all images follow the naming convention.
- Log each step of the process to /var/log/dedupe.log.
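The naming convention and the sanity check from the list above can be sketched as follows. This is a minimal illustration, not the script's actual code: the timestamp layout (`%Y%m%d%H%M%S%f`), the helper names `make_name` and `follows_convention`, and the regular expression are all assumptions.

```python
import re
from datetime import datetime

# Assumed timestamp layout: YYYYMMDDHHMMSS followed by microseconds (20 digits).
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S%f"
# Assumed pattern for a conforming name, limited to the extensions main() uses.
NAME_PATTERN = re.compile(r"^image-\d{20}\.(png|jpg|jpeg|bmp)$", re.IGNORECASE)


def make_name(extension, when=None):
    """Build image-<timestamp><extension>; extension includes the leading dot."""
    when = when or datetime.now()
    return f"image-{when.strftime(TIMESTAMP_FORMAT)}{extension}"


def follows_convention(filename):
    """Return True if filename matches the assumed naming pattern."""
    return NAME_PATTERN.match(filename) is not None
```

Including microseconds in the timestamp makes collisions unlikely, but a real implementation would still need to check that the target name does not already exist before renaming.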
Requirements:
- Python 3.x
- OpenCV
- scikit-image
- Install the required libraries:
  pip install opencv-python-headless scikit-image
- Download the script: save it to a file, for example dedupe_and_rename.py.
- Place the script in the /usr/bin directory:
  sudo mv dedupe_and_rename.py /usr/bin/dedupe_and_rename
- Make the script executable:
  sudo chmod +x /usr/bin/dedupe_and_rename
Once the script is set up, you can run it from anywhere on your system by calling it by name:
  dedupe_and_rename
Running it by name assumes the script begins with a Python shebang line such as #!/usr/bin/env python3. Before running it, update the directory variable in the script with the path to your image directory.
The script logs all its operations to /var/log/dedupe.log. Check this log file for detailed information about the script's actions and any errors encountered.
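A minimal sketch of how file logging of this kind could be set up with Python's standard `logging` module. The function name `configure_logging`, its `log_path` parameter, and the log format are assumptions for illustration; the script's actual configuration may differ.

```python
import logging


def configure_logging(log_path="/var/log/dedupe.log"):
    """Route INFO-and-above records from the root logger to log_path."""
    logging.basicConfig(
        filename=log_path,
        level=logging.INFO,
        # Assumed line layout: timestamp, severity, message.
        format="%(asctime)s - %(levelname)s - %(message)s",
        force=True,  # Python 3.8+: replace any handlers configured earlier
    )
```

Writing to /var/log usually requires elevated privileges, so passing a path in a user-writable directory is a convenient way to try the script without sudo.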
The script's functions do the following:
- list_images(directory, extensions): Traverses the directory and lists all image files with the given extensions.
- Loads an image from the specified file path using OpenCV.
- Rotates an image by the specified angle (0, 90, 180, or 270 degrees).
- Computes the structural similarity index between two images to determine how similar they are.
- Compares images in all orientations to find the highest similarity.
- compare_and_remove_duplicates(images): Compares each image with every other image in the list and, when two are similar, removes the lower-resolution one.
- Generates a unique filename using the current timestamp and checks for collisions.
- rename_images(directory, extensions): Renames all images in the directory to the format image-%timestamp%.%OriginalImageExtension%.
- sanity_check(directory, extensions): Ensures all filenames follow the naming convention and renames again if necessary.
def main():
    directory = "path/to/your/image/directory"
    extensions = (".png", ".jpg", ".jpeg", ".bmp")

    logging.info(f"Starting duplicate removal in directory: {directory}")
    images = list_images(directory, extensions)
    logging.info(f"Found {len(images)} images to process")
    compare_and_remove_duplicates(images)
    logging.info("Duplicate removal process completed")

    logging.info("Starting renaming process")
    rename_images(directory, extensions)
    logging.info("Renaming process completed")

    logging.info("Starting sanity check process")
    sanity_check(directory, extensions)
    logging.info("Sanity check process completed")


if __name__ == "__main__":
    main()
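main() begins by calling list_images(directory, extensions), whose body is not shown here. A minimal sketch consistent with that call might look like the following; the recursive `os.walk` traversal, the case-insensitive extension match, and the sorted return value are assumptions rather than confirmed behavior of the script.

```python
import os


def list_images(directory, extensions):
    """Return sorted paths of files under directory ending in one of extensions."""
    extensions = tuple(ext.lower() for ext in extensions)
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # str.endswith accepts a tuple, so one call checks every extension.
            if name.lower().endswith(extensions):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Returning full paths rather than bare filenames lets later steps open, delete, or rename each file without needing to know which subdirectory it came from.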