Comparative Analysis of Static Application Security Testing Tools on Real-world Java Vulnerabilities

Ansgariusson, Wilmer; Ståhl, Jonathan

Ansgariusson, Wilmer (LU) and Ståhl, Jonathan (LU) (2025) EITM01 20251
Department of Electrical and Information Technology

Abstract: With the increasing complexity and scale of modern software systems, ensuring software security is more critical than ever. As projects grow, so does the likelihood of vulnerabilities being introduced. Static Application Security Testing (SAST) tools assist developers in identifying such vulnerabilities during development. In this study, five Java SAST tools (Bearer, CodeQL, Horusec, Semgrep and SonarQube) were evaluated based primarily on their vulnerability detection rate and execution time to determine their effectiveness.

The tools were tested using Java entries from the CVEfixes data set, which contains real-world code changes, including both pre- and post-fix versions of known vulnerabilities. Each tool analyzed code before and after vulnerability fixes across three scenarios, varying the context available (single files, modified files and full projects). True positives were defined as vulnerabilities detected before, but not after, a fix. This approach helps assess a tool’s ability to correctly identify actual vulnerabilities. Among the tools, CodeQL and Horusec performed the best, with CodeQL showing stronger potential if allowed to be utilized fully. Bearer underperformed, while SonarQube and Semgrep may offer better results with more permissive configurations.
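
The true-positive definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Finding` type, the two helper functions, and the placeholder identifiers `CVE-A` and `CVE-B` are all hypothetical. It also assumes that a finding can be matched across the pre- and post-fix scans by file path and rule identifier, and that the detection rate is counted per CVE. The record does not specify either point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One tool alert; matching on (file, rule_id) is an assumption of this sketch."""
    file: str
    rule_id: str


def true_positives(before: set[Finding], after: set[Finding]) -> set[Finding]:
    """Findings reported on the vulnerable version that are gone after the fix."""
    return before - after


def detection_rate(scans: dict[str, tuple[set[Finding], set[Finding]]]) -> float:
    """Share of CVEs with at least one finding that disappears after the fix."""
    if not scans:
        return 0.0
    detected = sum(
        1 for before, after in scans.values() if true_positives(before, after)
    )
    return detected / len(scans)


# Hypothetical scan results for two placeholder CVEs.
f1 = Finding("src/Login.java", "sql-injection")
f2 = Finding("src/Util.java", "weak-hash")
f3 = Finding("src/Parser.java", "xxe")

scans = {
    "CVE-A": ({f1, f2}, {f2}),  # f1 disappears after the fix: counted as detected
    "CVE-B": ({f3}, {f3}),      # f3 persists after the fix: not counted
}
print(detection_rate(scans))  # 0.5
```

Under this definition, an alert that survives the fix (such as `f2` or `f3` here) is not credited to the tool, since it cannot be tied to the vulnerability that the fix removed.
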
Popular Abstract: Application security is more critical than ever, as software becomes increasingly central to our everyday lives. Insecure applications can lead to data theft, financial loss, or system compromise. To address these risks, developers rely on various tools to catch bugs early, one common approach being Static Application Security Testing (SAST). We evaluated and compared several SAST tools to assess their effectiveness in identifying security issues during development.

A SAST tool is a type of computer program that can locate and identify security bugs in code. There are a lot of different SAST tools available on the market and it can be difficult to know which one to use. Interviews reveal that programmers choose such tools based on popularity rather than performance. Programmers also do not trust SAST tool benchmarks, stating that they contain bugs that would not occur in real-world software.

We want programmers to be able to trust benchmarks and therefore conducted our own tests on SAST tools. We ran five popular SAST tools against just under 500 real security bugs from publicly available software projects.

Our results show that the best performing SAST tools were CodeQL and Horusec. We could also conclude that it is difficult to determine an actual winner since results can vary a lot based on how the tools are configured. The tools detected only a small portion of the actual bugs in the programs and often sounded the alarm incorrectly. This indicates that while SAST tools are helpful, solely relying on them would be a mistake. As AI continues to advance, it might one day offer a smarter alternative to today’s SAST tools, but more research is needed.

Please use this URL to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9189955

author

Ansgariusson, Wilmer (LU) and Ståhl, Jonathan (LU)

supervisor

Christian Gehrmann (LU)

organization

Department of Electrical and Information Technology

course

EITM01 20251

year

2025

type

H2 - Master's Degree (Two Years)

subject

Technology and Engineering

report number

LU/LTH-EIT 2025-1052

language

English

id

9189955

date added to LUP

2025-06-02 12:54:42

date last changed

2025-06-02 12:54:42

@misc{9189955,
  abstract     = {{With the increasing complexity and scale of modern software systems, ensuring software security is more critical than ever. As projects grow, so does the likelihood of vulnerabilities being introduced. Static Application Security Testing (SAST) tools assist developers in identifying such vulnerabilities during development. In this study, five Java SAST tools (Bearer, CodeQL, Horusec, Semgrep and SonarQube) were evaluated based primarily on their vulnerability detection rate and execution time to determine their effectiveness.

The tools were tested using Java entries from the CVEfixes data set, which contains real-world code changes, including both pre- and post-fix versions of known vulnerabilities. Each tool analyzed code before and after vulnerability fixes across three scenarios, varying the context available (single files, modified files and full projects). True positives were defined as vulnerabilities detected before, but not after, a fix. This approach helps assess a tool’s ability to correctly identify actual vulnerabilities. Among the tools, CodeQL and Horusec performed the best, with CodeQL showing stronger potential if allowed to be utilized fully. Bearer underperformed, while SonarQube and Semgrep may offer better results with more permissive configurations.}},
  author       = {{Ansgariusson, Wilmer and Ståhl, Jonathan}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Comparative Analysis of Static Application Security Testing Tools on Real-world Java Vulnerabilities}},
  year         = {{2025}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
