Filtering False Positive Alarms in JavaDL and Language Experience Report

Rikås, Karl-Oskar; Weslien, Frank

Rikås, Karl-Oskar ^LU and Weslien, Frank (2021) In LU-CS-EX EDAM05 20211
Department of Computer Science

Abstract: JavaDL is a domain-specific language (DSL) for writing static program analyses in a declarative logic programming style, based on Datalog. The key feature of this DSL is the ability to pattern-match on literal source code syntax and reason non-locally through declarative programming.

Static program analyses generally suffer from producing false positive alarms. This results in developers having to deal with unnecessary alarms. A machine learning model could mitigate this problem by filtering true alarms from false ones.

We investigate if features based on JavaDL’s pattern-matching are effective. Our results show that they are not, as the knowledge learned does not transfer over to unseen projects.

Points-to analysis is another way of improving the precision of otherwise more conservative analyses, such as finding non-exhaustive switch statements in Java. As the first users of JavaDL, we attempted to write a points-to analysis for a subset of the Java language. We report on our experience and put forth possible improvements to JavaDL in a case study.
Popular Abstract: In an increasingly digitized world, we have become ever more reliant on code.
It exists everywhere.
Not only in our smartphones but also in our microwaves, cars, and airplanes.
In April of 2019, Boeing admitted that their new 737 Max jets had a fatal flaw in the software which had caused two of its planes to crash.
Bugs happen and sometimes with deadly consequences.

One area of research, static program analysis, tries to find bugs in code before it is ever run.
Unfortunately, it is not perfect and a common complaint is that it reports too many bugs that aren't real.
Developers sometimes have to sift through a hundred alerts just to find one actual bug.
Until a few years ago there was no solution in sight.
But now, with the advent of machine learning, there is a promising path forward.

These powerful algorithms don't look at the world as you and I do.
All they understand are numbers.
So to be able to teach an AI to prioritize alarms for developers we first need to transform the code into a format it can understand.
We need to transform code into numbers.

We explore and prototype an algorithm to do just that: transforming code into numbers.
It looks at all the locations where bugs were found and tries to find common patterns that repeat across the source code.
Those insights are then fed into an algorithm that tries to guess which bugs are real and which are not.

Unfortunately, you do not always get the results you hoped for.
Our prototype wasn't good enough.
Perhaps the algorithm was too simplistic, or maybe it's not the right approach.
One possible avenue worth exploring would be to use recent advancements in code embeddings.
Code embeddings let the algorithm teach itself what is important and how it should be represented, a powerful idea that has improved AI's understanding of text.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9060476

author

Rikås, Karl-Oskar ^LU and Weslien, Frank

supervisor

Christoph Reichenbach ^LU
Alexandru Dura ^LU

organization

Department of Computer Science

course

EDAM05 20211

year

2021

type

H2 - Master's Degree (Two Years)

subject

Technology and Engineering

keywords

static program analysis, alarm filtering, feature engineering

publication/series

LU-CS-EX

report number

2021-38

ISSN

1650-2884

language

English

id

9060476

date added to LUP

2021-09-01 14:32:14

date last changed

2021-09-01 14:32:14

@misc{9060476,
  abstract     = {{JavaDL is a domain-specific language (DSL) for writing static program analyses in a declarative logic programming style, based on Datalog. The key feature of this DSL is the ability to pattern-match on literal source code syntax and reason non-locally through declarative programming.

Static program analyses generally suffer from producing false positive alarms. This results in developers having to deal with unnecessary alarms. A machine learning model could mitigate this problem by filtering true alarms from false ones.

We investigate if features based on JavaDL’s pattern-matching are effective. Our results show that they are not, as the knowledge learned does not transfer over to unseen projects.

Points-to analysis is another way of improving the precision of otherwise more conservative analyses, such as finding non-exhaustive switch statements in Java. As the first users of JavaDL, we attempted to write a points-to analysis for a subset of the Java language. We report on our experience and put forth possible improvements to JavaDL in a case study.}},
  author       = {{Rikås, Karl-Oskar and Weslien, Frank}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX}},
  title        = {{Filtering False Positive Alarms in JavaDL and Language Experience Report}},
  year         = {{2021}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
