Lund University Publications

LUND UNIVERSITY LIBRARIES

Integrating data and analysis technologies within leading environmental research infrastructures : Challenges and approaches

Huber, Robert; D'Onofrio, Claudio; Devaraju, Anusuriya; Klump, Jens; Loescher, Henry W.; Kindermann, Stephan; Guru, Siddeswara; Grant, Mark; Morris, Beryl and Wyborn, Lesley, et al. (2021) In Ecological Informatics 61.
Abstract

When researchers analyze data, it typically requires significant effort in data preparation to make the data analysis ready. This often involves cleaning, pre-processing, harmonizing, or integrating data from one or multiple sources and placing them into a computational environment in a form suitable for analysis. Research infrastructures and their data repositories host data and make them available to researchers, but rarely offer a computational environment for data analysis. Published data are often persistently identified, but such identifiers resolve onto landing pages that must be (manually) navigated to identify how data are accessed. This navigation is typically challenging or impossible for machines. This paper surveys existing approaches for improving environmental data access to facilitate more rapid data analyses in computational environments, and thus contribute to a more seamless integration of data and analysis. By analysing current state-of-the-art approaches and solutions being implemented by world‑leading environmental research infrastructures, we highlight the existing practices to interface data repositories with computational environments and the challenges moving forward. We found that while the level of standardization has improved during recent years, it still is challenging for machines to discover and access data based on persistent identifiers. This is problematic in regard to the emerging requirements for FAIR (Findable, Accessible, Interoperable, and Reusable) data, in general, and problematic for seamless integration of data and analysis, in particular. There are a number of promising approaches that would improve the state-of-the-art. A key approach presented here involves software libraries that streamline reading data and metadata into computational environments. We describe this approach in detail for two research infrastructures. 
We argue that the development and maintenance of specialized libraries for each RI and a range of programming languages used in data analysis does not scale well. Based on this observation, we propose a set of established standards and web practices that, if implemented by environmental research infrastructures, will enable the development of RI and programming language independent software libraries with much reduced effort required for library implementation and maintenance as well as considerably lower learning requirements on users. To catalyse such advancement, we propose a roadmap and key action points for technology harmonization among RIs that we argue will build the foundation for efficient and effective integration of data and analysis.

type
Contribution to journal
publication status
published
subject
keywords
Data analysis environments, Data service providers, Research infrastructures, Scientific data analysis
in
Ecological Informatics
volume
61
article number
101245
publisher
Elsevier
external identifiers
  • scopus:85101012163
ISSN
1574-9541
DOI
10.1016/j.ecoinf.2021.101245
language
English
LU publication?
yes
id
bdd1e582-141f-449d-a9a3-64c7cecfae24
date added to LUP
2021-03-01 09:37:03
date last changed
2022-04-27 00:27:36
@article{bdd1e582-141f-449d-a9a3-64c7cecfae24,
  abstract     = {{When researchers analyze data, it typically requires significant effort in data preparation to make the data analysis ready. This often involves cleaning, pre-processing, harmonizing, or integrating data from one or multiple sources and placing them into a computational environment in a form suitable for analysis. Research infrastructures and their data repositories host data and make them available to researchers, but rarely offer a computational environment for data analysis. Published data are often persistently identified, but such identifiers resolve onto landing pages that must be (manually) navigated to identify how data are accessed. This navigation is typically challenging or impossible for machines. This paper surveys existing approaches for improving environmental data access to facilitate more rapid data analyses in computational environments, and thus contribute to a more seamless integration of data and analysis. By analysing current state-of-the-art approaches and solutions being implemented by world‑leading environmental research infrastructures, we highlight the existing practices to interface data repositories with computational environments and the challenges moving forward. We found that while the level of standardization has improved during recent years, it still is challenging for machines to discover and access data based on persistent identifiers. This is problematic in regard to the emerging requirements for FAIR (Findable, Accessible, Interoperable, and Reusable) data, in general, and problematic for seamless integration of data and analysis, in particular. There are a number of promising approaches that would improve the state-of-the-art. A key approach presented here involves software libraries that streamline reading data and metadata into computational environments. We describe this approach in detail for two research infrastructures.
We argue that the development and maintenance of specialized libraries for each RI and a range of programming languages used in data analysis does not scale well. Based on this observation, we propose a set of established standards and web practices that, if implemented by environmental research infrastructures, will enable the development of RI and programming language independent software libraries with much reduced effort required for library implementation and maintenance as well as considerably lower learning requirements on users. To catalyse such advancement, we propose a roadmap and key action points for technology harmonization among RIs that we argue will build the foundation for efficient and effective integration of data and analysis.}},
  author       = {{Huber, Robert and D'Onofrio, Claudio and Devaraju, Anusuriya and Klump, Jens and Loescher, Henry W. and Kindermann, Stephan and Guru, Siddeswara and Grant, Mark and Morris, Beryl and Wyborn, Lesley and Evans, Ben and Goldfarb, Doron and Genazzio, Melissa A. and Ren, Xiaoli and Magagna, Barbara and Thiemann, Hannes and Stocker, Markus}},
  issn         = {{1574-9541}},
  keywords     = {{Data analysis environments; Data service providers; Research infrastructures; Scientific data analysis}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  series       = {{Ecological Informatics}},
  title        = {{Integrating data and analysis technologies within leading environmental research infrastructures : Challenges and approaches}},
  url          = {{http://dx.doi.org/10.1016/j.ecoinf.2021.101245}},
  doi          = {{10.1016/j.ecoinf.2021.101245}},
  volume       = {{61}},
  year         = {{2021}},
}