Detecting Hard-Coded Credentials in Software Repositories via LLMs

Chidera Biringa; Gokhan Kul

doi:10.1145/3744756

Journal article

Open access

Peer reviewed

Digital threats (Print), Vol.6(3)

07/07/2025

DOI: https://doi.org/10.1145/3744756

Subjects: Collaboration in software development; Computing methodologies; Machine learning algorithms; Security and privacy; Software and its engineering; Software security engineering

Abstract

Software developers frequently hard-code credentials such as passwords, generic secrets, private keys, and generic tokens in software repositories, even though this practice is strongly advised against because of the severe threat it poses to software security. These credentials create attack surfaces that an adversary can exploit to conduct attacks such as backdoor attacks. Recent detection efforts use embedding models to vectorize textual credentials before passing them to classifiers for prediction. However, these models struggle to discriminate between credentials with contextual and complex sequences, resulting in a high rate of false positive predictions. Context-dependent Pre-trained Language Models (PLMs) or Large Language Models (LLMs) such as Generative Pre-trained Transformers (GPT) address this drawback by leveraging the self-attention capacity of the transformer neural architecture to capture contextual dependencies between words in input sequences. As a result, GPT has achieved wide success in several natural language understanding tasks. Hence, we assess the ability of LLMs to represent these observations and feed the extracted embedding vectors to a deep learning classifier to detect hard-coded credentials. Our model outperforms the current state-of-the-art by 13% in F1 measure on the benchmark dataset. We have made all source code and data publicly available to facilitate the reproduction of all results presented in this paper.
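The embed-then-classify pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a hashed character-trigram vector stands in for LLM embeddings, logistic regression stands in for the deep learning classifier, and the training strings, labels, and all function names are hypothetical. The F1 helper follows the standard definition (harmonic mean of precision and recall).

```python
import hashlib
import math

DIM = 64  # size of the toy embedding vector (arbitrary choice for this sketch)

def embed(text):
    """Stand-in for an LLM embedding: hash character trigrams into DIM buckets."""
    vec = [0.0] * DIM
    for i in range(len(text) - 2):
        digest = hashlib.md5(text[i:i + 3].encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=200, lr=0.5):
    """Fit logistic regression (a stand-in classifier) by stochastic gradient descent."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(model, text):
    """Return 1 if the line is classified as a hard-coded credential, else 0."""
    weights, bias = model
    score = sum(w * v for w, v in zip(weights, embed(text))) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: 1 = hard-coded credential, 0 = benign line.
LINES = [
    ('password = "hunter2"', 1),
    ('api_token = "9f8e7d6c5b4a"', 1),
    ('secret_key = "s3cr3t-value"', 1),
    ('timeout = 30', 0),
    ('for item in items:', 0),
    ('print("hello world")', 0),
]

texts = [t for t, _ in LINES]
labels = [y for _, y in LINES]
model = train([embed(t) for t in texts], labels)
predictions = [predict(model, t) for t in texts]
print("training-set F1:", f1_score(labels, predictions))
```

Swapping `embed` for vectors extracted from a pretrained language model, and the logistic regression for a deeper network, would move this sketch closer to the kind of pipeline the abstract describes.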

Files and links (1)

url

https://doi.org/10.1145/3744756

Published (Version of record) Open

Details

Title: Detecting Hard-Coded Credentials in Software Repositories via LLMs
Creators: Chidera Biringa - University of Massachusetts Dartmouth; Gokhan Kul - University of Massachusetts Dartmouth
Publication Details: Digital threats (Print), Vol.6(3)
Publisher: ACM; NEW YORK
Number of pages: 16
Grant note: UMass Dartmouth's Marine and Undersea Technology (MUST) Research Program - Office of Naval Research (ONR): N00014-23-1-2141
This research was supported in part by UMass Dartmouth's Marine and Undersea Technology (MUST) Research Program funded by the Office of Naval Research (ONR) under Grant No. N00014-23-1-2141. The views and conclusions expressed in this article are those of the authors and do not reflect the official policy or position of the University of Massachusetts Dartmouth, the Office of Naval Research, U.S. Navy, U.S. Department of Defense, or U.S. Government.
Academic Unit: Department of Computer and Information Science; Cybersecurity Center
Language: English
Resource Type: Journal article
DOI: https://doi.org/10.1145/3744756
Record Identifier: 9914470434901301
