More testers - The effect of crowd size and time restriction in software testing

Mäntylä, Mika; Itkonen, Juha

Mäntylä, Mika (LU) and Itkonen, Juha (2013) In Information and Software Technology 55(6). p. 986-1003

Abstract: Context: The questions of how many individuals and how much time to use for a single testing task are critical in software verification and validation. In software review and usability evaluation contexts, positive effects of using multiple individuals for a task have been found, but software testing has not been studied from this viewpoint. Objective: We study how adding individuals and imposing time pressure affects the effectiveness and efficiency of manual testing tasks. We applied the group productivity theory from social psychology to characterize the type of software testing tasks. Method: We conducted an experiment where 130 students performed manual testing under two conditions, one with a time restriction and pressure, i.e., a 2-h fixed slot, and another where the individuals could use as much time as they needed. Results: We found evidence that manual software testing is an additive task with a ceiling effect, like software reviews and usability inspections. Our results show that a crowd of five time-restricted testers using 10 h in total detected 71% more defects than a single non-time-restricted tester using 9.9 h. Furthermore, we use F-score measure from the information retrieval domain to analyze the optimal number of testers in terms of both effectiveness and validity of testing results. We suggest that future studies on verification and validation practices use F-score to provide a more transparent view of the results. Conclusions: The results seem promising for the time-pressured crowds by indicating that multiple time-pressured individuals deliver superior defect detection effectiveness in comparison to non-time-pressured individuals. However, caution is needed, as the limitations of this study need to be addressed in future works. Finally, we suggest that the size of the crowd used in software testing tasks should be determined based on the share of duplicate and invalid reports produced by the crowd and by the effectiveness of the duplicate handling mechanisms. (C) 2012 Elsevier B.V. All rights reserved.
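
The abstract refers to the F-score from information retrieval as a way to weigh defect detection effectiveness against the validity of the reports a crowd produces. The sketch below illustrates one way such a score could be computed for a pooled set of reports. It is not taken from the paper: the function names, the input layout, and the specific definitions of precision (unique valid defects divided by all submitted reports, so duplicates and invalid reports lower it) and recall (unique valid defects divided by all known defects) are assumptions made for illustration.

```python
from typing import Hashable, Optional, Sequence, Set, Tuple


def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def crowd_f_score(
    reports: Sequence[Optional[Hashable]],
    known_defects: Set[Hashable],
    beta: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F) for the pooled reports of a crowd.

    ASSUMPTION (not from the paper): each report is a defect id, None marks
    an invalid report, and a repeated id is a duplicate report.
    """
    # Duplicates collapse in the set; invalid or unknown ids are dropped.
    unique_valid = {r for r in reports if r is not None and r in known_defects}
    precision = len(unique_valid) / len(reports) if reports else 0.0
    recall = len(unique_valid) / len(known_defects) if known_defects else 0.0
    return precision, recall, f_score(precision, recall, beta)


# Hypothetical data: four known defects, four reports with one duplicate
# and one invalid report.
known = {"D1", "D2", "D3", "D4"}
pooled = ["D1", "D2", "D1", None]
print(crowd_f_score(pooled, known))
```

Under these assumed definitions, adding testers tends to raise recall while the growing share of duplicate and invalid reports lowers precision, which is the trade-off the abstract's closing recommendation on crowd size points to.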

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/3815290

author

Mäntylä, Mika (LU) and Itkonen, Juha

organization

publishing date

2013

type

Contribution to journal

publication status

published

subject

Computer Sciences

keywords

Software testing, Group performance, Division of labor, Human factors, Crowdsourcing, Methods for SQA and V&V

in

Information and Software Technology

volume

55

issue

6

pages

986 - 1003

publisher

Elsevier

external identifiers

wos:000318584800005
scopus:84876295001

ISSN

0950-5849

DOI

10.1016/j.infsof.2012.12.004

language

English

LU publication?

yes

id

d6ba0f9d-9277-4ac1-85a5-df4cfa7c1d9c (old id 3815290)

date added to LUP

2016-04-01 14:46:40

date last changed

2025-10-14 09:58:05

@article{d6ba0f9d-9277-4ac1-85a5-df4cfa7c1d9c,
  abstract     = {{Context: The questions of how many individuals and how much time to use for a single testing task are critical in software verification and validation. In software review and usability evaluation contexts, positive effects of using multiple individuals for a task have been found, but software testing has not been studied from this viewpoint. Objective: We study how adding individuals and imposing time pressure affects the effectiveness and efficiency of manual testing tasks. We applied the group productivity theory from social psychology to characterize the type of software testing tasks. Method: We conducted an experiment where 130 students performed manual testing under two conditions, one with a time restriction and pressure, i.e., a 2-h fixed slot, and another where the individuals could use as much time as they needed. Results: We found evidence that manual software testing is an additive task with a ceiling effect, like software reviews and usability inspections. Our results show that a crowd of five time-restricted testers using 10 h in total detected 71% more defects than a single non-time-restricted tester using 9.9 h. Furthermore, we use F-score measure from the information retrieval domain to analyze the optimal number of testers in terms of both effectiveness and validity of testing results. We suggest that future studies on verification and validation practices use F-score to provide a more transparent view of the results. Conclusions: The results seem promising for the time-pressured crowds by indicating that multiple time-pressured individuals deliver superior defect detection effectiveness in comparison to non-time-pressured individuals. However, caution is needed, as the limitations of this study need to be addressed in future works. 
Finally, we suggest that the size of the crowd used in software testing tasks should be determined based on the share of duplicate and invalid reports produced by the crowd and by the effectiveness of the duplicate handling mechanisms. (C) 2012 Elsevier B.V. All rights reserved.}},
  author       = {{Mäntylä, Mika and Itkonen, Juha}},
  issn         = {{0950-5849}},
  keywords     = {{Software testing; Group performance; Division of labor; Human factors; Crowdsourcing; Methods for SQA and V&V}},
  language     = {{eng}},
  number       = {{6}},
  pages        = {{986--1003}},
  publisher    = {{Elsevier}},
  journal      = {{Information and Software Technology}},
  title        = {{More testers - The effect of crowd size and time restriction in software testing}},
  url          = {{http://dx.doi.org/10.1016/j.infsof.2012.12.004}},
  doi          = {{10.1016/j.infsof.2012.12.004}},
  volume       = {{55}},
  year         = {{2013}},
}
