Catching common vulnerabilities with code language models
(2025) 2025 IEEE Secure Development Conference (SecDev) p.45-57
- Abstract
- Code Language Model (code-LM)-based vulnerability detection for C/C++ faces a substantial challenge. Previous research has shown that even though it is better than any prior machine learning approach, it still struggles to generalize well, as shown by the low F1 score. Prior works treated the problem as a binary classification: either vulnerable or non-vulnerable. Looking deeper at the various vulnerability types, we see that this oversimplifies the problem, as different vulnerabilities have different characteristics. This paper investigates the same problem but with a different question, i.e., how would the model perform if the task is to classify whether the code is vulnerable to a specific type? We use the recently released PrimeVul dataset as a basis to investigate the ability to correctly classify different types of vulnerabilities. We conduct two experiments by fine-tuning code LMs on (1) datasets specific to each of the most common types and (2) cumulative datasets incorporating an increasing number of the most common types. We show that it is challenging to correctly identify a specific class of vulnerability in a dataset containing all types of vulnerabilities. However, if the task is modified to correctly identify the most common vulnerabilities, the cumulative model outperforms the previous results using binary classification on the dataset. This result shows a promising path to make code-LM practical in assisting developers with vulnerability detection tasks in C/C++ code.
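- Illustrative sketch (not from the paper): the two labeling schemes named in the abstract, type-specific and cumulative, might be set up roughly as below. The sample layout, the `code` and `cwe` field names, the use of CWE identifiers as type labels, the helper function names, and the choice to treat other vulnerability types as negatives are all assumptions made for illustration. The abstract does not specify any of them, and no fine-tuning is shown.

```python
from collections import Counter

# Hypothetical samples: "cwe" is None for non-vulnerable code.
# Field names and CWE values are illustrative assumptions only.
samples = [
    {"code": "void a() {}", "cwe": "CWE-787"},
    {"code": "void b() {}", "cwe": "CWE-787"},
    {"code": "void c() {}", "cwe": "CWE-787"},
    {"code": "void d() {}", "cwe": "CWE-125"},
    {"code": "void e() {}", "cwe": "CWE-125"},
    {"code": "void f() {}", "cwe": "CWE-476"},
    {"code": "void g() {}", "cwe": None},
    {"code": "void h() {}", "cwe": None},
]


def most_common_types(samples, k):
    """Return the k most frequent vulnerability types, most frequent first."""
    counts = Counter(s["cwe"] for s in samples if s["cwe"] is not None)
    return [cwe for cwe, _ in counts.most_common(k)]


def type_specific_labels(samples, cwe):
    """Scheme (1): positive only if the sample has exactly this type.

    Assumption: samples with other types count as negatives.
    """
    return [1 if s["cwe"] == cwe else 0 for s in samples]


def cumulative_labels(samples, top_types):
    """Scheme (2): positive if the sample has any of the given types."""
    wanted = set(top_types)
    return [1 if s["cwe"] in wanted else 0 for s in samples]


top2 = most_common_types(samples, 2)
per_type = {cwe: type_specific_labels(samples, cwe) for cwe in top2}
cumulative = cumulative_labels(samples, top2)
```

Growing `k` in `most_common_types` would give the increasing series of cumulative label sets, each of which could then be fed to a separate fine-tuning run.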
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/4a13fc00-b39b-4ac7-8940-4fa571d55e8d
- author
- Al Atiiq, Syafiq LU ; Gehrmann, Christian LU ; Khalil, Karim LU and Dahlén, Kevin
- organization
- publishing date
- 2025-10-14
- type
- Contribution to conference
- publication status
- published
- subject
- pages
- 45 - 57
- conference name
- 2025 IEEE Secure Development Conference (SecDev)
- conference dates
- 2025-10-14 - 2025-10-16
- DOI
- 10.1109/SecDev66745.2025.00016
- language
- English
- LU publication?
- yes
- id
- 4a13fc00-b39b-4ac7-8940-4fa571d55e8d
- date added to LUP
- 2025-12-10 12:51:32
- date last changed
- 2025-12-11 14:56:04
@inproceedings{4a13fc00-b39b-4ac7-8940-4fa571d55e8d,
booktitle = {{2025 IEEE Secure Development Conference (SecDev)}},
abstract = {{Code Language Model (code-LM)-based vulnerability detection for C/C++ faces a substantial challenge. Previous research has shown that even though it is better than any prior machine learning approach, it still struggles to generalize well, as shown by the low F1 score. Prior works treated the problem as a binary classification: either vulnerable or non-vulnerable. Looking deeper at the various vulnerability types, we see that this oversimplifies the problem, as different vulnerabilities have different characteristics. This paper investigates the same problem but with a different question, i.e., how would the model perform if the task is to classify whether the code is vulnerable to a specific type? We use the recently released PrimeVul dataset as a basis to investigate the ability to correctly classify different types of vulnerabilities. We conduct two experiments by fine-tuning code LMs on (1) datasets specific to each of the most common types and (2) cumulative datasets incorporating an increasing number of the most common types. We show that it is challenging to correctly identify a specific class of vulnerability in a dataset containing all types of vulnerabilities. However, if the task is modified to correctly identify the most common vulnerabilities, the cumulative model outperforms the previous results using binary classification on the dataset. This result shows a promising path to make code-LM practical in assisting developers with vulnerability detection tasks in C/C++ code.}},
author = {{Al Atiiq, Syafiq and Gehrmann, Christian and Khalil, Karim and Dahlén, Kevin}},
language = {{eng}},
month = {{10}},
pages = {{45--57}},
title = {{Catching common vulnerabilities with code language models}},
url = {{https://doi.org/10.1109/SecDev66745.2025.00016}},
doi = {{10.1109/SecDev66745.2025.00016}},
year = {{2025}},
}