
LUP Student Papers (Lund University Libraries)

Modeling code quality using machine intelligence

Malmström, Arvid LU and Kindt, Erik LU (2022) In Master's Theses in Mathematical Sciences FMAM05 20221
Mathematics (Faculty of Engineering)
Abstract
For any company with a software development branch, one of the most important concerns is writing maintainable, understandable, high-quality source code. High-quality code requires fewer work hours to refactor when changes are needed, so continuing to work with poor-quality source code is expensive. The question is: how can the quality of source code be measured?

The authors of this thesis approached the problem using machine learning. First, two literature studies were conducted: one on usable software metrics and one on usable machine learning algorithms. Second, a large, external, labeled database was assembled and the models were trained on that data. Four final models were set up: two focusing on large projects consisting of hundreds of files and two focusing on single stand-alone files. After examining the models, the ones with the lowest RMSE on the test set were used to compare the models' predictions with the opinions of experienced developers.

The final models were built using two algorithms, artificial neural networks and random forests, which gave similar results. The file-level models were tested on both front-end and back-end files: four files of each kind were ranked according to the final score from the model, and that ranking was compared to a ranking of the same files by experienced software developers. For front-end files, the model agreed with the developers' ranking to a large extent; for back-end files, there was a wider discrepancy between the developers' opinions and the model's predictions.

The final models were also able to track the overall code quality of a large project over time, which was one intended application of the model. When a project that had been rewritten a year earlier was evaluated continuously, the measured code quality improved, which matched the expectations of the project's developers.
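As an illustration of the evaluation steps the abstract describes, the sketch below shows, in Python, (1) training a random forest regressor on a labeled metrics dataset and computing the RMSE (root-mean-square error, sqrt((1/n) * sum of (y_i - y_hat_i)^2) between predicted and labeled quality scores) on a held-out test set, and (2) comparing a model's ranking of files against a developer ranking. It assumes scikit-learn and SciPy and uses synthetic data throughout; the model settings, feature layout, and the use of Spearman rank correlation are illustrative assumptions, not the thesis's actual pipeline.

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the labeled dataset: each row is a source file
    # described by six software metrics (e.g. size and complexity measures);
    # the target is a code-quality score. Purely illustrative values.
    rng = np.random.default_rng(seed=0)
    X = rng.random((500, 6))
    y = rng.random(500)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Random forest: one of the two algorithm families named in the abstract.
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)

    # RMSE on the held-out test set: the criterion the abstract reports
    # using to select the final models.
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"test RMSE: {rmse:.3f}")

    # Comparing a model ranking of four files with a developer ranking.
    # Spearman's rank correlation is one standard agreement measure; the
    # abstract does not specify how the comparison was quantified.
    model_rank = [1, 2, 3, 4]  # hypothetical model ordering
    dev_rank = [1, 3, 2, 4]    # hypothetical developer ordering
    rho, _ = spearmanr(model_rank, dev_rank)
    print(f"rank agreement (Spearman rho): {rho:.2f}")

A lower test RMSE means predictions closer to the labeled quality scores, so comparing this value across candidate models (for example, the random forest against a neural network) is what selecting the models with the lowest RMSE on the test set amounts to.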
author: Malmström, Arvid LU and Kindt, Erik LU
course: FMAM05 20221
year: 2022
type: H2 - Master's Degree (Two Years)
keywords: Machine Learning, Artificial Intelligence, Software Development, Software Metrics
publication/series: Master's Theses in Mathematical Sciences
report number: LUTFMA-3486-2022
ISSN: 1404-6342
other publication id: 2022:E62
language: English
id: 9092905
date added to LUP: 2022-08-05 10:28:02
date last changed: 2022-08-05 10:28:02
@misc{9092905,
  abstract     = {{For any company with a software development branch, one of the most important concerns is writing maintainable, understandable, high-quality source code. High-quality code requires fewer work hours to refactor when changes are needed, so continuing to work with poor-quality source code is expensive. The question is: how can the quality of source code be measured?

The authors of this thesis approached the problem using machine learning. First, two literature studies were conducted: one on usable software metrics and one on usable machine learning algorithms. Second, a large, external, labeled database was assembled and the models were trained on that data. Four final models were set up: two focusing on large projects consisting of hundreds of files and two focusing on single stand-alone files. After examining the models, the ones with the lowest RMSE on the test set were used to compare the models' predictions with the opinions of experienced developers.

The final models were built using two algorithms, artificial neural networks and random forests, which gave similar results. The file-level models were tested on both front-end and back-end files: four files of each kind were ranked according to the final score from the model, and that ranking was compared to a ranking of the same files by experienced software developers. For front-end files, the model agreed with the developers' ranking to a large extent; for back-end files, there was a wider discrepancy between the developers' opinions and the model's predictions.

The final models were also able to track the overall code quality of a large project over time, which was one intended application of the model. When a project that had been rewritten a year earlier was evaluated continuously, the measured code quality improved, which matched the expectations of the project's developers.}},
  author       = {{Malmström, Arvid and Kindt, Erik}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Modeling code quality using machine intelligence}},
  year         = {{2022}},
}