A study of explainable contrastive language-image pre-training models using perturbation-based analysis: a thesis in Computer Science
Thesis   Open access

Ojonukpemi Felix Ameh
Master of Science (MS), University of Massachusetts Dartmouth
2025
DOI: https://doi.org/10.62791/20414

Abstract

Contrastive Language-Image Pre-training (CLIP) models are machine learning models that learn by contrasting correct and incorrect pairings of text and images, and that acquire broad foundations from training on large datasets. They occupy an emerging area of artificial intelligence research because they offer a way to map multiple modalities of data together, and they exhibit impressive zero-shot predictions. An important part of advancing the viability of CLIP-like models is thorough analysis that characterizes as much as possible of their inner workings and increases explainability. Pursuing explainability, particularly in multimodal models whose decision processes are quite opaque, is an important part of continued improvement. It provides easily intelligible interpretations of model outputs and decision-making processes, and it aids widespread adoption. CLIP, the widely used foundational contrastive model, serves as a benchmark for multimodal representation learning, which makes it an ideal choice for explainability-focused analysis. This thesis introduces an approach for conducting perturbation-based analysis that investigates how systematic modifications to input images affect their similarity scores and semantic alignment, and that provides structured, comprehensible visual and statistical outputs. One experiment identifies inputs that are highly susceptible to specific perturbations, while another visualizes semantic drift across a controlled set of text labels, illustrating how perturbations change the model's conceptual understanding of the input.
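
The general idea of perturbation-based similarity analysis can be illustrated with a minimal sketch. This is not the thesis's implementation: the encoder below is a hypothetical random linear projection standing in for a real CLIP image encoder, the text embeddings are random placeholder vectors, and Gaussian noise is only one assumed example of a perturbation. The sketch shows how image-text cosine similarities could be recorded across increasing perturbation strengths for a fixed set of labels.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def add_gaussian_noise(image, sigma, rng):
    """Assumed example perturbation: additive Gaussian noise, clipped to [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def make_toy_encoder(input_dim, embed_dim, seed=0):
    """Hypothetical stand-in for a CLIP image encoder: a fixed random projection."""
    weights = np.random.default_rng(seed).normal(size=(input_dim, embed_dim))

    def encode(image):
        return image.reshape(-1) @ weights

    return encode


def similarity_drift(image, text_embeddings, encode_image, sigmas, seed=0):
    """Map each noise level to the image-text similarity for every label."""
    rng = np.random.default_rng(seed)
    results = {}
    for sigma in sigmas:
        embedding = encode_image(add_gaussian_noise(image, sigma, rng))
        results[sigma] = {
            label: cosine_similarity(embedding, text_vec)
            for label, text_vec in text_embeddings.items()
        }
    return results


# Placeholder data: a random "image" and random "text embeddings".
rng = np.random.default_rng(42)
image = rng.random((8, 8, 3))
encode_image = make_toy_encoder(image.size, embed_dim=16)
text_embeddings = {label: rng.normal(size=16) for label in ("cat", "dog", "car")}

drift = similarity_drift(image, text_embeddings, encode_image, sigmas=(0.0, 0.1, 0.5))
for sigma, scores in drift.items():
    print(sigma, {label: round(score, 3) for label, score in scores.items()})
```

With a real model, `encode_image` and `text_embeddings` would come from the model's image and text encoders; the noise level 0.0 row serves as the unperturbed baseline against which drift is measured.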
PDF: Ameh O.F. COE MS Thesis 2025 (3.31 MB)
CC BY-NC-ND V4.0 Open Access
