Automatic Collection Selection using Machine Learning

Couderc, Noric

Automatic Collection Selection using Machine Learning

Mark

Couderc, Noric ^LU

(2022)

Abstract: Most recent programming languages include a collection framework as part of their standard library (or runtime). Examples are Java, C#, Python and Ruby. The Java Collection Framework provides a number of collection classes, some of which implement the same abstract data type, which makes them interchangeable. Developers can therefore choose between several functionally equivalent options. Since collections have different performance characteristics, and may be allocated in thousands of programs locations, the choice of collection has an important impact on performance. Unfortunately, programmers often make sub-optimal choices when picking their collections.

In this licentiate thesis, we consider the problem of building automated... (More); Most recent programming languages include a collection framework as part of their standard library (or runtime). Examples are Java, C#, Python and Ruby. The Java Collection Framework provides a number of collection classes, some of which implement the same abstract data type, which makes them interchangeable. Developers can therefore choose between several functionally equivalent options. Since collections have different performance characteristics, and may be allocated in thousands of programs locations, the choice of collection has an important impact on performance. Unfortunately, programmers often make sub-optimal choices when picking their collections.

In this licentiate thesis, we consider the problem of building automated tooling, which would help the developer choose among several collection implementations. We consider an existing tool called Brainy, which targets C++, and adapt it to the Java context. In doing so, we investigate how to synthesize benchmarks and analyze their behavior to create training data for automated classification. We propose one new generative model for collection benchmarks and present the challenges that porting JBrainy to Java entails. Finally, we compare JBrainy's suggestions versus greedy search, on five well known benchmarks. Our investigations show that JBrainy's suggestions were almost as effective than those of greedy search in minimizing the running time of programs. However, we also find that Brainy's benchmark synthesis methods do not apply well to the Java context, since they introduce some significant biases.
(Less)

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/fc550dca-03ff-4f69-813d-674a6eb7685e

author

Couderc, Noric ^LU

supervisor

Christoph Reichenbach ^LU
Emma Söderberg ^LU

organization

publishing date

2022-02-07

type

Thesis

publication status

published

subject

Computer Systems

keywords

Software engineering, Machine Learning, Collection efficiency, Java

pages

60 pages

publisher

Lund University

language

English

LU publication?

yes

additional info

Licentiate thesis

id

fc550dca-03ff-4f69-813d-674a6eb7685e

date added to LUP

2022-02-07 12:16:06

date last changed

2025-04-04 14:17:46

@misc{fc550dca-03ff-4f69-813d-674a6eb7685e,
  abstract     = {{Most recent programming languages include a collection framework as part of their standard library (or runtime). Examples are Java, C#, Python and Ruby. The Java Collection Framework provides a number of collection classes, some of which implement the same abstract data type, which makes them interchangeable. Developers can therefore choose between several functionally equivalent options. Since collections have different performance characteristics, and may be allocated in thousands of programs locations, the choice of collection has an important impact on performance. Unfortunately, programmers often make sub-optimal choices when picking their collections.<br/><br/>In this licentiate thesis, we consider the problem of building automated tooling, which would help the developer choose among several collection implementations. We consider an existing tool called Brainy, which targets C++, and adapt it to the Java context. In doing so, we investigate how to synthesize benchmarks and analyze their behavior to create training data for automated classification. We propose one new generative model for collection benchmarks and present the challenges that porting JBrainy to Java entails. Finally, we compare JBrainy's suggestions versus greedy search, on five well known benchmarks. Our investigations show that JBrainy's suggestions were almost as effective than those of greedy search in minimizing the running time of programs. However, we also find that Brainy's benchmark synthesis methods do not apply well to the Java context, since they introduce some significant biases.<br/>}},
  author       = {{Couderc, Noric}},
  keywords     = {{Software engineering; Machine Learning; Collection efficiency; Java}},
  language     = {{eng}},
  month        = {{02}},
  note         = {{Licentiate Thesis}},
  publisher    = {{Lund University}},
  title        = {{Automatic Collection Selection using Machine Learning}},
  url          = {{https://lup.lub.lu.se/search/files/113792314/thesis.pdf}},
  year         = {{2022}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES

Automatic Collection Selection using Machine Learning