Measuring Semantic Distances between Software Artifacts to Consolidate Issues from the Development and the Field

Nasser, Mahmoud

Nasser, Mahmoud ^LU (2017) In LU-CS-EX 2017-09 EDA920 20152
Department of Computer Science

Abstract: Identifying and keeping track of different structural representations of functionally overlapping issues is important in order to keep a well maintained issue management corpus, establishing efficient and organized response ability
to develop and code software patches repairing these issues and defects. This
is normally achieved by manual, time-costly reviewing-processes by special
teams put up to this task.

In this project we implement a tool using information retrieval technology,
that intends to help these teams make better and faster qualitative assessments
by providing quantitative indications in the form of similarity scores to other
artifacts within a given dataset.

This approach is inspired by a paper with a similar goal, namely detecting
duplicate issue reports. That study found that 60 % of all marked duplicates
could be found with the corresponding implementation of this approach.
Achieving similar outcomes would contribute to improved and more effective
reviewing-processes.

We use the qualitative research method of informal interviews to define the
semantic distance metric to implement. In the evaluation we mainly use a
qualitative method to assess the accuracy of it, but verify our findings with a
quantitative method. We also investigate the scalability of the tool with quantitative
methods.

As a result of the limited scope of this thesis work, the tool in its current
state will have limited use in a live development environment. However, we
conclude that this approach has a development potential and could bring fruitful
findings in the issue management and issue maintenance field if developed
further upon.

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/8916761

author

Nasser, Mahmoud ^LU

supervisor

Elizabeth Bjarnason ^LU
Markus Borg ^LU

organization

Department of Computer Science

course

EDA920 20152

year

2017

type

H3 - Professional qualifications (4 Years - )

subject

Technology and Engineering

keywords

information retrieval technology, semantic distances, issue management, issue maintenance, traceability link retrieval

publication/series

LU-CS-EX 2017-09

report number

LU-CS-EX 2017-09

ISSN

1650-2884

language

English

id

8916761

date added to LUP

2017-06-19 09:36:46

date last changed

2017-06-19 09:36:46

@misc{8916761,
  abstract     = {{Identifying and keeping track of different structural representations of functionally overlapping issues is important in order to keep a well maintained issue management corpus, establishing efficient and organized response ability
to develop and code software patches repairing these issues and defects. This
is normally achieved by manual, time-costly reviewing-processes by special
teams put up to this task.

In this project we implement a tool using information retrieval technology,
that intends to help these teams make better and faster qualitative assessments
by providing quantitative indications in the form of similarity scores to other
artifacts within a given dataset.

This approach is inspired by a paper with a similar goal, namely detecting
duplicate issue reports. That study found that 60 % of all marked duplicates
could be found with the corresponding implementation of this approach.
Achieving similar outcomes would contribute to improved and more effective
reviewing-processes.

We use the qualitative research method of informal interviews to define the
semantic distance metric to implement. In the evaluation we mainly use a
qualitative method to assess the accuracy of it, but verify our findings with a
quantitative method. We also investigate the scalability of the tool with quantitative
methods.

As a result of the limited scope of this thesis work, the tool in its current
state will have limited use in a live development environment. However, we
conclude that this approach has a development potential and could bring fruitful
findings in the issue management and issue maintenance field if developed
further upon.}},
  author       = {{Nasser, Mahmoud}},
  issn         = {{1650-2884}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LU-CS-EX 2017-09}},
  title        = {{Measuring Semantic Distances between Software Artifacts to Consolidate Issues from the Development and the Field}},
  year         = {{2017}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
