Large-scale information retrieval in software engineering - an experience report from industrial application

Unterkalmsteiner, Michael; Gorschek, Tony; Feldt, Robert; Lavesson, Niklas

Unterkalmsteiner, Michael; Gorschek, Tony; Feldt, Robert and Lavesson, Niklas (2016). In Empirical Software Engineering 21(6), pp. 2324-2365.

Abstract: Software Engineering activities are information intensive. Research proposes Information Retrieval (IR) techniques to support engineers in their daily tasks, such as establishing and maintaining traceability links, fault identification, and software maintenance. We describe an engineering task, test case selection, and illustrate our problem analysis and solution discovery process. The objective of the study is to gain an understanding of the extent to which IR techniques (one potential solution) can be applied to test case selection and provide decision support in a large-scale, industrial setting. We analyze, in the context of the studied company, how test case selection is performed and design a series of experiments evaluating the performance of different IR techniques. Each experiment provides lessons learned from implementation, execution, and results, feeding into its successor.
The three experiments led to the following observations: 1) there is a lack of research on scalable parameter optimization of IR techniques for software engineering problems; 2) scaling IR techniques to industry data is challenging, in particular for latent semantic analysis; 3) the IR context poses constraints on the empirical evaluation of IR techniques, requiring more research on developing valid statistical approaches. We believe that our experiences in conducting a series of IR experiments with industry-grade data are valuable for peer researchers so that they can avoid the pitfalls that we have encountered. Furthermore, we identified challenges that need to be addressed in order to bridge the gap between laboratory IR experiments and real applications of IR in industry.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/7d5ac6e8-a78b-4f6b-8bd6-398984b3b9ab

author

Unterkalmsteiner, Michael; Gorschek, Tony; Feldt, Robert and Lavesson, Niklas

publishing date

2016-12-01

type

Contribution to journal

publication status

published

subject

Software Engineering

keywords

Data mining, Experiment, Information retrieval, Test case selection

in

Empirical Software Engineering

volume

21

issue

6

pages

42 pages

publisher

Springer

external identifiers

scopus:84946763269

ISSN

1382-3256

DOI

10.1007/s10664-015-9410-8

project

Embedded Applications Software Engineering

language

English

LU publication?

no

id

7d5ac6e8-a78b-4f6b-8bd6-398984b3b9ab

date added to LUP

2018-09-27 11:13:53

date last changed

2025-10-14 09:52:31

@article{7d5ac6e8-a78b-4f6b-8bd6-398984b3b9ab,
  abstract     = {{Software Engineering activities are information intensive. Research proposes Information Retrieval (IR) techniques to support engineers in their daily tasks, such as establishing and maintaining traceability links, fault identification, and software maintenance. We describe an engineering task, test case selection, and illustrate our problem analysis and solution discovery process. The objective of the study is to gain an understanding of to what extent IR techniques (one potential solution) can be applied to test case selection and provide decision support in a large-scale, industrial setting. We analyze, in the context of the studied company, how test case selection is performed and design a series of experiments evaluating the performance of different IR techniques. Each experiment provides lessons learned from implementation, execution, and results, feeding to its successor. The three experiments led to the following observations: 1) there is a lack of research on scalable parameter optimization of IR techniques for software engineering problems; 2) scaling IR techniques to industry data is challenging, in particular for latent semantic analysis; 3) the IR context poses constraints on the empirical evaluation of IR techniques, requiring more research on developing valid statistical approaches. We believe that our experiences in conducting a series of IR experiments with industry grade data are valuable for peer researchers so that they can avoid the pitfalls that we have encountered. Furthermore, we identified challenges that need to be addressed in order to bridge the gap between laboratory IR experiments and real applications of IR in the industry.}},
  author       = {{Unterkalmsteiner, Michael and Gorschek, Tony and Feldt, Robert and Lavesson, Niklas}},
  issn         = {{1382-3256}},
  keywords     = {{Data mining; Experiment; Information retrieval; Test case selection}},
  language     = {{eng}},
  month        = {{12}},
  number       = {{6}},
  pages        = {{2324--2365}},
  publisher    = {{Springer}},
  journal      = {{Empirical Software Engineering}},
  title        = {{Large-scale information retrieval in software engineering - an experience report from industrial application}},
  url          = {{http://dx.doi.org/10.1007/s10664-015-9410-8}},
  doi          = {{10.1007/s10664-015-9410-8}},
  volume       = {{21}},
  year         = {{2016}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES