Lund University Publications

A multi-layered performance analysis for cloud-based topic detection and tracking in Big Data applications

Wang, Meisong ; Jayaraman, Prem Prakash ; Solaiman, Ellis ; Chen, Lydia Y. ; Li, Zheng LU ; Jun, Song ; Georgakopoulos, Dimitrios and Ranjan, Rajiv (2018) In Future Generation Computer Systems 87. p.580-590
Abstract

In the era of the Internet of Things and social media, communities, governments, and corporations are increasingly eager to exploit new technological innovations to track and keep up to date with important new events. Examples of such events include the news, health-related incidents, and other major occurrences such as earthquakes and landslides. This area of research, commonly referred to as Topic Detection and Tracking (TDT), is proving to be an important component of the current generation of Internet-based applications, where early detection of and timely response to such incidents are critical. The advent of Big Data, though beneficial to TDT applications, also brings the enormous challenge of dealing with data variety, velocity, and volume (the 3Vs). A promising solution is to employ Cloud Computing, which enables users to access powerful and scalable computational and storage resources in a "pay-as-you-go" fashion. However, the efficient use of Cloud resources to boost the performance of mission-critical applications employing TDT is still an open topic that has not been fully investigated. An important prerequisite is a performance analysis capable of capturing and explaining the specific factors (for example, CPU, memory, I/O, network, Cloud platform service, and workload) that influence the performance of TDT applications in the cloud. The main contribution of this paper is a multi-layered performance analysis for Big Data TDT applications deployed in a cloud environment. Our analysis captures the factors that have an important effect on the performance of TDT applications. The novelty of our work is that it is a first-of-its-kind vertical analysis spanning the infrastructure, platform, and software layers. We identify key parameters and metrics in each cloud layer (Infrastructure, Platform, and Software), and establish the dependencies between these metrics across the layers. We demonstrate the effectiveness of the proposed analysis via experimental evaluations using real-world datasets obtained from Twitter.

author
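Wang, Meisong ; Jayaraman, Prem Prakash ; Solaiman, Ellis ; Chen, Lydia Y. ; Li, Zheng LU ; Jun, Song ; Georgakopoulos, Dimitrios and Ranjan, Rajiv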
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
Big Data, Cloud computing, Cloud-based TDT, Performance analysis
in
Future Generation Computer Systems
volume
87
pages
580 - 590
publisher
Elsevier
external identifiers
  • scopus:85043276152
ISSN
0167-739X
DOI
10.1016/j.future.2018.01.047
language
English
LU publication?
yes
id
30d70100-8a98-4a30-8b1a-d572abbdd3a7
date added to LUP
2018-03-22 15:12:37
date last changed
2022-04-25 06:26:49
@article{30d70100-8a98-4a30-8b1a-d572abbdd3a7,
  abstract     = {{<p>In the era of the Internet of Things and social media; communities, governments, and corporations are increasingly eager to exploit new technological innovations in order to track and keep up to date with important new events. Examples of such events include the news, health related incidents, and other major occurrences such as earthquakes and landslides. This area of research commonly referred to as Topic Detection and Tracking (TDT) is proving to be an important component of the current generation of Internet-based applications, where it is of critical importance to have early detection and timely response to important incidents such as those mentioned above. The advent of Big data though beneficial to TDT applications also brings about the enormous challenge of dealing with data variety, velocity and volume (3Vs). A promising solution is to employ Cloud Computing, which enables users to access powerful and scalable computational and storage resources in a "pay-as-you-go" fashion. However, the efficient use of Cloud resources to boost the performance of mission critical applications employing TDT is still an open topic that has not been fully and effectively investigated. An important prerequisite is to build a performance analysis capable of capturing and explaining specific factors (for example; CPU, Memory, I/O, Network, Cloud Platform Service, and Workload) that influence the performances of TDT applications in the cloud. Within this paper, our main contribution, is that we present a multi-layered performance analysis for big data TDT applications deployed in a cloud environment. Our analysis captures factors that have an important effect on the performance of TDT applications. The novelty of our work is that it is a first kind of vertical analysis on infrastructure, platform and software layers. 
We identify key parameters and metrics in each cloud layer (including Infrastructure, Software, and Platform layers), and establish the dependencies between these metrics across the layers. We demonstrate the effectiveness of the proposed analysis via experimental evaluations using real-world datasets obtained from Twitter.</p>}},
  author       = {{Wang, Meisong and Jayaraman, Prem Prakash and Solaiman, Ellis and Chen, Lydia Y. and Li, Zheng and Jun, Song and Georgakopoulos, Dimitrios and Ranjan, Rajiv}},
  issn         = {{0167-739X}},
  keywords     = {{Big Data; Cloud computing; Cloud-based TDT; Performance analysis}},
  language     = {{eng}},
  month        = {{03}},
  pages        = {{580--590}},
  publisher    = {{Elsevier}},
  journal      = {{Future Generation Computer Systems}},
  title        = {{A multi-layered performance analysis for cloud-based topic detection and tracking in Big Data applications}},
  url          = {{http://dx.doi.org/10.1016/j.future.2018.01.047}},
  doi          = {{10.1016/j.future.2018.01.047}},
  volume       = {{87}},
  year         = {{2018}},
}