Duplicate Image Detection in Large-Scale Image Databases

An efficient and explainable coarse-to-fine framework for detecting and visualizing duplicate images across large and diverse image databases.

Project Overview

With the proliferation of social media, people increasingly use images to communicate and share information. This trend has also multiplied the ways in which images can be duplicated, whether maliciously or innocently. Duplicate image detection therefore has a wide range of applications, including identifying copyright infringement, combating misinformation, and addressing ethical concerns.

In this study, we propose an efficient and accurate duplicate image detection framework that can be applied to image databases with varying data challenges. The proposed solution is a coarse-to-fine framework: the coarse module narrows the image dataset down to a set of candidate images, and the fine module analyzes local similarity among those candidates in detail. A Feature Contribution Explainability submodule visualizes the image regions that contribute to each duplicate detection. Together, these modules aim to filter large image databases efficiently and effectively. By combining high-level and low-level feature extraction, contextual information embedding, and attention mechanisms, the system identifies candidate images and ranks them by similarity. Experimental results demonstrate the system's potential to accurately select candidate images with a high probability of being duplicates, and the explainability submodule provides insight into the image regions that contribute to each match.
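The overview describes the coarse-to-fine pipeline only at a high level. The minimal sketch below illustrates the general idea under our own assumptions: each image is assumed to already have one global descriptor vector (standing in for high-level features) and a small set of local descriptor vectors (standing in for low-level features). The function names (`coarse_candidates`, `local_match_score`, `coarse_to_fine`), the cosine-similarity top-k filter, and the threshold-based local matching score are hypothetical choices for illustration, not the project's actual models.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_candidates(query_global: Vector, db_global: Dict[str, Vector], k: int) -> List[str]:
    """Hypothetical coarse stage: keep the k images whose global descriptor
    is most similar to the query's global descriptor."""
    ranked = sorted(db_global, key=lambda name: cosine(query_global, db_global[name]), reverse=True)
    return ranked[:k]


def local_match_score(query_local: List[Vector], cand_local: List[Vector], threshold: float) -> float:
    """Hypothetical fine stage: fraction of query local features whose best
    match among the candidate's local features reaches the threshold."""
    if not query_local or not cand_local:
        return 0.0
    matched = sum(1 for q in query_local if max(cosine(q, c) for c in cand_local) >= threshold)
    return matched / len(query_local)


def coarse_to_fine(
    query_global: Vector,
    query_local: List[Vector],
    db_global: Dict[str, Vector],
    db_local: Dict[str, List[Vector]],
    k: int = 2,
    threshold: float = 0.9,
) -> List[Tuple[str, float]]:
    """Filter the database with the cheap global comparison, then rank only
    the surviving candidates by the more expensive local comparison."""
    candidates = coarse_candidates(query_global, db_global, k)
    scored = [(name, local_match_score(query_local, db_local[name], threshold)) for name in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D descriptors, purely illustrative.
    db_global = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    db_local = {
        "a": [[1.0, 0.0], [0.0, 1.0]],
        "b": [[1.0, 0.0], [0.5, 0.5]],
        "c": [[0.0, 1.0]],
    }
    print(coarse_to_fine([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], db_global, db_local))
```

With this toy data, image "c" is discarded by the coarse stage, and the fine stage ranks "a" (both local features matched) above "b" (one of two matched). The design point the sketch tries to capture is that the pairwise local comparison runs only on the k coarse candidates rather than on the whole database, which is what keeps the fine stage tractable at scale.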

Project Team

Dr. Priyanga Dilini Talagala, Department of Computational Mathematics, University of Moratuwa, Sri Lanka

Dilumika Chandrasiri, Faculty of Information Technology, University of Moratuwa, Moratuwa, Sri Lanka

Dinithi Sandarekha, Faculty of Information Technology, University of Moratuwa, Moratuwa, Sri Lanka

Chandeepa Pathirana, Faculty of Information Technology, University of Moratuwa, Moratuwa, Sri Lanka

Outputs

Publications

  1. M. D. N. Chandrasiri and P. D. Talagala, “Cross-ViT: Cross-attention Vision Transformer for Image Duplicate Detection,” 2023 8th International Conference on Information Technology Research (ICITR), Colombo, Sri Lanka, 2023, pp. 1-6, doi: 10.1109/ICITR61062.2023.10382916.

  2. Chandeepa Pathirana and P. D. Talagala, “Hybrid Feature-Hash Module For Image Duplicate Detection,” 2025 10th International Conference on Information Technology Research (ICITR), Colombo, Sri Lanka, 2025 (under review).

Presentations

Cross-ViT: Cross-attention Vision Transformer for Image Duplicate Detection

Date: December 7th to 8th, 2023

Event: 8th International Conference on Information Technology Research (ICITR)

Location: University of Moratuwa, Sri Lanka

Awards

Best Research Award 2023: Merit award in the undergraduate category at the competition organized by the Institute of Applied Statistics Sri Lanka (IASSL) for the research project titled “Duplicate detection in large-scale image databases.”

IASSL Best Research Award 2023