
Estimating the number of LTE Cell IDs; a comparison of different species richness estimators

Jokubauskaite, Milda (2018) In LUNFMS-4042-2020 MASK01 20182
Mathematical Statistics
Abstract
Species richness estimation is a long-standing and difficult statistical problem. Most research on the subject focuses on biological data, even though many of the methods can be applied more widely. In this paper, we focus on technological data and compare several known estimators, both parametric and non-parametric, to see how well they determine the total number of LTE Cell IDs (observed and unobserved) in the Northern European region.
Popular Abstract
A vast amount of research and testing has gone into methods for determining total species richness and into how effective they are. One of the biggest questions, still unanswered, arises quite intuitively with this problem: 'How many species have we not yet seen?' This question has long been asked in various fields, starting with ecology, where one would like to know, for example, how many different species coexist in one acre of the Amazon forest. Microbiology, genetics, and various other fields raise essentially the same question.

Many different methods have been, and still are being, researched to answer this question. Most methods base their estimates on the data already observed. From there, the question becomes: 'What should I account for in my data in order to get the most accurate estimate of the unknown?' Unfortunately, this question does not have a uniform answer either. It depends, researchers say. Many different estimators were developed and tested using different types of data and different collection methods. The data we collect are as diverse and complex as the fields they come from, be it the humanities, ecology, biology, or any other, because our data are a sample, a small representation of what we are trying to investigate.

In this paper we work with technological data. We want to know: if we have measured a certain number of LTE cell IDs and assume that we did not discover all of them, how many are there in total? Since there is no certain answer to this (some will even argue that it is impossible to be certain about the unknown), we look at what different methods have to say about our data and test how well they fit it. We work towards finding the best estimator for our type of data and so get one step closer to answering this question.

Testing and analysis were done mainly on countries in the Northern European region (Netherlands, Norway, Denmark, Sweden, Finland, Estonia, Latvia, Lithuania). One of the main estimators investigated is the mixed-exponential parametric estimator, which consistently gives the best estimates if the data curve is smooth enough. Several groups of non-parametric estimators were tested as well (Chao, ACE, jackknife). The second-order jackknife was assumed to overestimate the total richness for this data. The iChao1 estimator worked best, fulfilling its function as a lower-bound richness estimator in almost all tests, and it produced the highest estimates when the data set was not sufficiently large and artefacts were not removed.
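To make the non-parametric estimators named above more concrete, here is a minimal Python sketch of the classic Chao1 estimator and the simplified, large-sample forms of the first- and second-order jackknife. It is not the code used in the thesis: the function names and the sample data are invented for illustration, and the thesis may use different variants (for example, sample-size corrections, or the ACE and iChao1 estimators, which are not shown here). All three estimators rely only on the number of IDs observed, the number seen exactly once (f1), and the number seen exactly twice (f2).

```python
from collections import Counter


def frequency_counts(observations):
    """Return a Counter mapping k -> number of distinct IDs seen exactly k times."""
    per_id = Counter(observations)      # ID -> how often it was observed
    return Counter(per_id.values())     # k  -> how many IDs were observed k times


def chao1(observations):
    """Classic Chao1 lower-bound estimate of the total number of distinct IDs."""
    f = frequency_counts(observations)
    s_obs = sum(f.values())             # number of distinct IDs observed
    f1, f2 = f.get(1, 0), f.get(2, 0)   # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used here so that f2 = 0 does not divide by zero.
    return s_obs + f1 * (f1 - 1) / 2


def jackknife1(observations):
    """First-order jackknife, simplified large-sample form: S_obs + f1."""
    f = frequency_counts(observations)
    return sum(f.values()) + f.get(1, 0)


def jackknife2(observations):
    """Second-order jackknife, simplified large-sample form: S_obs + 2*f1 - f2."""
    f = frequency_counts(observations)
    return sum(f.values()) + 2 * f.get(1, 0) - f.get(2, 0)


if __name__ == "__main__":
    # Made-up measurements: each entry stands for one sighting of a cell ID.
    sample = ["a", "a", "a", "b", "b", "c", "d", "e"]
    print(chao1(sample), jackknife1(sample), jackknife2(sample))
```

In this toy sample five distinct IDs are observed, three of them once and one twice, so the sketch gives 9.5 for Chao1, 8 for the first-order jackknife, and 10 for the second-order jackknife.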
author
Jokubauskaite, Milda
supervisor
organization
course
MASK01 20182
year
type
M2 - Bachelor Degree
subject
publication/series
LUNFMS-4042-2020
report number
2020:K4
ISSN
1654-6229
language
English
additional info
Uploaded retroactively in March 2020. The work was marked as completed in July 2019.
id
9006325
date added to LUP
2020-03-10 16:00:39
date last changed
2020-03-10 16:00:39
@misc{9006325,
  abstract     = {Species richness estimation is a long-standing and difficult statistical problem. Most research on the subject focuses on biological data, even though many of the methods can be applied more widely. In this paper, we focus on technological data and compare several known estimators, both parametric and non-parametric, to see how well they determine the total number of LTE Cell IDs (observed and unobserved) in the Northern European region.},
  author       = {Jokubauskaite, Milda},
  issn         = {1654-6229},
  language     = {eng},
  note         = {Student Paper},
  series       = {LUNFMS-4042-2020},
  title        = {Estimating the number of LTE Cell IDs; a comparison of different species richness estimators},
  year         = {2018},
}