Scalability study of database-backed file systems for High Throughput Computing

Trinh, Andy (2017)
Computer Science and Engineering (BSc)
Abstract
The purpose of this project is to study the read performance of transparent
database-backed file systems, a meld between two technologies with seemingly
similar purposes, in relation to conventional file systems. Systems such
as the ARC middleware rely on reading several million files every day,
and as the number of files increases, performance suffers. To study the
capabilities of a database-backed file system, a candidate is chosen and put
to the test. The chosen candidate, Database File System (DBFS), uses
Oracle Database with FUSE to provide a transparent file system interface.
DBFS is tested by storing millions of small files in its datafile and
executing a scanning process of the ARC software. From the performance
data gathered in these tests, it was concluded that DBFS, while performing
well on an HDD compared to ext4 in terms of scalability and read
performance, is outperformed by XFS for both small (from 50 000 files)
and large (up to 1 600 000 files) directories.
author
Trinh, Andy
organization
year
2017
type
M2 - Bachelor Degree
subject
keywords
database-backed file system, dbfs, scalability, xfs, ext4, database, file system, fuse, arc, read performance, alternative storage, rdbms, file system interface
language
English
id
8924095
date added to LUP
2017-08-30 04:11:03
date last changed
2018-10-18 10:36:59
@misc{8924095,
  abstract     = {The purpose of this project is to study the read performance of transparent
database-backed file systems, a meld between two technologies with seemingly
similar purposes, in relation to conventional file systems. Systems such
as the ARC middleware rely on reading several million files every day,
and as the number of files increases, performance suffers. To study the
capabilities of a database-backed file system, a candidate is chosen and put
to the test. The chosen candidate, Database File System (DBFS), uses
Oracle Database with FUSE to provide a transparent file system interface.
DBFS is tested by storing millions of small files in its datafile and
executing a scanning process of the ARC software. From the performance
data gathered in these tests, it was concluded that DBFS, while performing
well on an HDD compared to ext4 in terms of scalability and read
performance, is outperformed by XFS for both small (from 50 000 files)
and large (up to 1 600 000 files) directories.},
  author       = {Trinh, Andy},
  keyword      = {database-backed file system,dbfs,scalability,xfs,ext4,database,file system,fuse,arc,read performance,alternative storage,rdbms,file system interface},
  language     = {eng},
  note         = {Student Paper},
  title        = {Scalability study of database-backed file systems for High Throughput Computing},
  year         = {2017},
}