Lund University Publications

Catching common vulnerabilities with code language models

Al Atiiq, Syafiq LU; Gehrmann, Christian LU; Khalil, Karim LU and Dahlén, Kevin (2025). In: 2025 IEEE Secure Development Conference (SecDev), p. 45-57
Abstract
Code Language Model (code-LM)-based vulnerability detection for C/C++ faces a substantial challenge. Previous research has shown that even though it is better than any prior machine learning approach, it still struggles to generalize well, as shown by the low F1 score. Prior works treated the problem as a binary classification: either vulnerable or non-vulnerable. Looking deeper at the various vulnerability types, we see that this oversimplifies the problem, as different vulnerabilities have different characteristics. This paper investigates the same problem but with a different question, i.e., how would the model perform if the task is to classify whether the code is vulnerable to a specific type? We use the recently released PrimeVul dataset as a basis to investigate the ability to correctly classify different types of vulnerabilities. We conduct two experiments by fine-tuning code LMs on (1) datasets specific to each of the most common types and (2) cumulative datasets incorporating an increasing number of the most common types. We show that it is challenging to correctly identify a specific class of vulnerability in a dataset containing all types of vulnerabilities. However, if the task is modified to correctly identify the most common vulnerabilities, the cumulative model outperforms the previous results using binary classification on the dataset. This result shows a promising path to make code-LM practical in assisting developers with vulnerability detection tasks in C/C++ code.
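The two labelling schemes described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the record layout (a dict with a `cwe` field that is `None` for non-vulnerable code), the function names, and the toy samples are assumptions made purely for illustration, and the actual PrimeVul schema may differ.

```python
from collections import Counter

# Hypothetical record layout: each sample carries a "cwe" field holding a
# vulnerability type string, or None when the code is non-vulnerable.


def top_k_types(samples, k):
    """Return the k most frequent vulnerability types, most common first."""
    counts = Counter(s["cwe"] for s in samples if s["cwe"] is not None)
    return [cwe for cwe, _ in counts.most_common(k)]


def per_type_labels(samples, target):
    """Experiment (1) style: positive only for the single target type."""
    return [int(s["cwe"] == target) for s in samples]


def cumulative_labels(samples, targets):
    """Experiment (2) style: positive for any of the selected common types."""
    target_set = set(targets)
    return [int(s["cwe"] in target_set) for s in samples]


# Toy data with made-up frequencies (not taken from the paper).
samples = [
    {"cwe": "CWE-787"},
    {"cwe": "CWE-787"},
    {"cwe": "CWE-787"},
    {"cwe": "CWE-125"},
    {"cwe": "CWE-125"},
    {"cwe": "CWE-416"},
    {"cwe": None},
]

top2 = top_k_types(samples, 2)
single = per_type_labels(samples, "CWE-125")
cumulative = cumulative_labels(samples, top2)
```

Under this reading, growing `k` in `top_k_types` yields the "increasing number of the most common types" used for the cumulative datasets, while `per_type_labels` treats every other type, vulnerable or not, as a negative.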
author
Al Atiiq, Syafiq; Gehrmann, Christian; Khalil, Karim and Dahlén, Kevin
type
Contribution to conference
publication status
published
pages
45 - 57
conference name
2025 IEEE Secure Development Conference (SecDev)
conference dates
2025-10-14 - 2025-10-16
DOI
10.1109/SecDev66745.2025.00016
language
English
LU publication?
yes
id
4a13fc00-b39b-4ac7-8940-4fa571d55e8d
date added to LUP
2025-12-10 12:51:32
date last changed
2025-12-11 14:56:04
@misc{4a13fc00-b39b-4ac7-8940-4fa571d55e8d,
  abstract     = {{Code Language Model (code-LM)-based vulnerability detection for C/C++ faces a substantial challenge. Previous research has shown that even though it is better than any prior machine learning approach, it still struggles to generalize well, as shown by the low F1 score. Prior works treated the problem as a binary classification: either vulnerable or non-vulnerable. Looking deeper at the various vulnerability types, we see that this oversimplifies the problem, as different vulnerabilities have different characteristics. This paper investigates the same problem but with a different question, i.e., how would the model perform if the task is to classify whether the code is vulnerable to a specific type? We use the recently released PrimeVul dataset as a basis to investigate the ability to correctly classify different types of vulnerabilities. We conduct two experiments by fine-tuning code LMs on (1) datasets specific to each of the most common types and (2) cumulative datasets incorporating an increasing number of the most common types. We show that it is challenging to correctly identify a specific class of vulnerability in a dataset containing all types of vulnerabilities. However, if the task is modified to correctly identify the most common vulnerabilities, the cumulative model outperforms the previous results using binary classification on the dataset. This result shows a promising path to make code-LM practical in assisting developers with vulnerability detection tasks in C/C++ code.}},
  author       = {{Al Atiiq, Syafiq and Gehrmann, Christian and Khalil, Karim and Dahlén, Kevin}},
  language     = {{eng}},
  month        = {{10}},
  pages        = {{45--57}},
  title        = {{Catching common vulnerabilities with code language models}},
  url          = {{http://dx.doi.org/10.1109/SecDev66745.2025.00016}},
  doi          = {{10.1109/SecDev66745.2025.00016}},
  year         = {{2025}},
}