Catching common vulnerabilities with code language models

Al Atiiq, Syafiq; Gehrmann, Christian; Khalil, Karim; Dahlén, Kevin

Al Atiiq, Syafiq (LU); Gehrmann, Christian (LU); Khalil, Karim (LU) and Dahlén, Kevin (2025) 2025 IEEE Secure Development Conference (SecDev), p. 45-57

Abstract: Code Language Model (code-LM)-based vulnerability detection for C/C++ faces a substantial challenge. Previous research has shown that even though it is better than any prior machine learning approach, it still struggles to generalize well, as shown by the low F1 score. Prior works treated the problem as a binary classification: either vulnerable or non-vulnerable. Looking deeper at the various vulnerability types, we see that this oversimplifies the problem, as different vulnerabilities have different characteristics. This paper investigates the same problem but with a different question, i.e., how would the model perform if the task is to classify whether the code is vulnerable to a specific type? We use the recently released PrimeVul dataset as a basis to investigate the ability to correctly classify different types of vulnerabilities. We conduct two experiments by fine-tuning code LMs on (1) datasets specific to each of the most common types and (2) cumulative datasets incorporating an increasing number of the most common types. We show that it is challenging to correctly identify a specific class of vulnerability in a dataset containing all types of vulnerabilities. However, if the task is modified to correctly identify the most common vulnerabilities, the cumulative model outperforms the previous results using binary classification on the dataset. This result shows a promising path to make code-LM practical in assisting developers with vulnerability detection tasks in C/C++ code.
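The two dataset constructions named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout (`code`, `cwe`), the toy samples, and the labeling rule (positive if the sample's type is the target type, or is among the k most common types) are all assumptions made for illustration, and the sketch does not reproduce the actual PrimeVul schema.

```python
from collections import Counter

# Toy records (hypothetical layout): "cwe" is None for non-vulnerable code.
samples = [
    {"code": "f1", "cwe": "CWE-787"},
    {"code": "f2", "cwe": "CWE-787"},
    {"code": "f3", "cwe": "CWE-787"},
    {"code": "f4", "cwe": "CWE-125"},
    {"code": "f5", "cwe": "CWE-125"},
    {"code": "f6", "cwe": "CWE-416"},
    {"code": "f7", "cwe": None},
    {"code": "f8", "cwe": None},
]


def top_types(records, k):
    """Return the k most frequent vulnerability types, most common first."""
    counts = Counter(r["cwe"] for r in records if r["cwe"] is not None)
    return [cwe for cwe, _ in counts.most_common(k)]


def type_specific_labels(records, target):
    """Assumed setup (1): positive only if the sample has the target type."""
    return [1 if r["cwe"] == target else 0 for r in records]


def cumulative_labels(records, k):
    """Assumed setup (2): positive if the type is among the k most common."""
    common = set(top_types(records, k))
    return [1 if r["cwe"] in common else 0 for r in records]


if __name__ == "__main__":
    print(top_types(samples, 2))
    print(type_specific_labels(samples, "CWE-125"))
    print(cumulative_labels(samples, 2))
```

Under this assumed labeling, increasing `k` moves more vulnerable samples into the positive class, which is one way to read "cumulative datasets incorporating an increasing number of the most common types".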

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/4a13fc00-b39b-4ac7-8940-4fa571d55e8d

author

Al Atiiq, Syafiq (LU); Gehrmann, Christian (LU); Khalil, Karim (LU) and Dahlén, Kevin

publishing date

2025-10-14

type

Contribution to conference

publication status

published

pages

45 - 57

conference name

2025 IEEE Secure Development Conference (SecDev)

conference dates

2025-10-14 - 2025-10-16

DOI

10.1109/SecDev66745.2025.00016

language

English

LU publication?

yes

id

4a13fc00-b39b-4ac7-8940-4fa571d55e8d

date added to LUP

2025-12-10 12:51:32

date last changed

2026-02-25 14:32:01

@misc{4a13fc00-b39b-4ac7-8940-4fa571d55e8d,
  abstract     = {{Code Language Model (code-LM)-based vulnerability detection for C/C++ faces a substantial challenge. Previous research has shown that even though it is better than any prior machine learning approach, it still struggles to generalize well, as shown by the low F1 score. Prior works treated the problem as a binary classification: either vulnerable or non-vulnerable. Looking deeper at the various vulnerability types, we see that this oversimplifies the problem, as different vulnerabilities have different characteristics. This paper investigates the same problem but with a different question, i.e., how would the model perform if the task is to classify whether the code is vulnerable to a specific type? We use the recently released PrimeVul dataset as a basis to investigate the ability to correctly classify different types of vulnerabilities. We conduct two experiments by fine-tuning code LMs on (1) datasets specific to each of the most common types and (2) cumulative datasets incorporating an increasing number of the most common types. We show that it is challenging to correctly identify a specific class of vulnerability in a dataset containing all types of vulnerabilities. However, if the task is modified to correctly identify the most common vulnerabilities, the cumulative model outperforms the previous results using binary classification on the dataset. This result shows a promising path to make code-LM practical in assisting developers with vulnerability detection tasks in C/C++ code.}},
  author       = {{Al Atiiq, Syafiq and Gehrmann, Christian and Khalil, Karim and Dahlén, Kevin}},
  language     = {{eng}},
  month        = {{10}},
  pages        = {{45--57}},
  title        = {{Catching common vulnerabilities with code language models}},
  url          = {{http://dx.doi.org/10.1109/SecDev66745.2025.00016}},
  doi          = {{10.1109/SecDev66745.2025.00016}},
  year         = {{2025}},
}
