Skip to main content

Lund University Publications

LUND UNIVERSITY LIBRARIES

Automatic Collection Selection using Machine Learning

Couderc, Noric LU orcid (2022)
Abstract
Most recent programming languages include a collection framework as part of their standard library (or runtime). Examples are Java, C#, Python and Ruby. The Java Collection Framework provides a number of collection classes, some of which implement the same abstract data type, which makes them interchangeable. Developers can therefore choose between several functionally equivalent options. Since collections have different performance characteristics, and may be allocated in thousands of programs locations, the choice of collection has an important impact on performance. Unfortunately, programmers often make sub-optimal choices when picking their collections.

In this licentiate thesis, we consider the problem of building automated... (More)
Most recent programming languages include a collection framework as part of their standard library (or runtime). Examples are Java, C#, Python and Ruby. The Java Collection Framework provides a number of collection classes, some of which implement the same abstract data type, which makes them interchangeable. Developers can therefore choose between several functionally equivalent options. Since collections have different performance characteristics, and may be allocated in thousands of programs locations, the choice of collection has an important impact on performance. Unfortunately, programmers often make sub-optimal choices when picking their collections.

In this licentiate thesis, we consider the problem of building automated tooling, which would help the developer choose among several collection implementations. We consider an existing tool called Brainy, which targets C++, and adapt it to the Java context. In doing so, we investigate how to synthesize benchmarks and analyze their behavior to create training data for automated classification. We propose one new generative model for collection benchmarks and present the challenges that porting JBrainy to Java entails. Finally, we compare JBrainy's suggestions versus greedy search, on five well known benchmarks. Our investigations show that JBrainy's suggestions were almost as effective than those of greedy search in minimizing the running time of programs. However, we also find that Brainy's benchmark synthesis methods do not apply well to the Java context, since they introduce some significant biases.
(Less)
Please use this url to cite or link to this publication:
author
supervisor
organization
publishing date
type
Thesis
publication status
published
subject
keywords
Software engineering, Machine Learning, Collection efficiency, Java
pages
60 pages
publisher
Lund University
language
English
LU publication?
yes
additional info
Licentiate thesis
id
fc550dca-03ff-4f69-813d-674a6eb7685e
date added to LUP
2022-02-07 12:16:06
date last changed
2023-02-23 15:27:19
@misc{fc550dca-03ff-4f69-813d-674a6eb7685e,
  abstract     = {{Most recent programming languages include a collection framework as part of their standard library (or runtime). Examples are Java, C#, Python and Ruby. The Java Collection Framework provides a number of collection classes, some of which implement the same abstract data type, which makes them interchangeable. Developers can therefore choose between several functionally equivalent options. Since collections have different performance characteristics, and may be allocated in thousands of programs locations, the choice of collection has an important impact on performance. Unfortunately, programmers often make sub-optimal choices when picking their collections.<br/><br/>In this licentiate thesis, we consider the problem of building automated tooling, which would help the developer choose among several collection implementations. We consider an existing tool called Brainy, which targets C++, and adapt it to the Java context. In doing so, we investigate how to synthesize benchmarks and analyze their behavior to create training data for automated classification. We propose one new generative model for collection benchmarks and present the challenges that porting JBrainy to Java entails. Finally, we compare JBrainy's suggestions versus greedy search, on five well known benchmarks. Our investigations show that JBrainy's suggestions were almost as effective than those of greedy search in minimizing the running time of programs. However, we also find that Brainy's benchmark synthesis methods do not apply well to the Java context, since they introduce some significant biases.<br/>}},
  author       = {{Couderc, Noric}},
  keywords     = {{Software engineering; Machine Learning; Collection efficiency; Java}},
  language     = {{eng}},
  month        = {{02}},
  note         = {{Licentiate Thesis}},
  publisher    = {{Lund University}},
  title        = {{Automatic Collection Selection using Machine Learning}},
  url          = {{https://lup.lub.lu.se/search/files/113792314/thesis.pdf}},
  year         = {{2022}},
}