Advanced

The Shortcut Index

Persson, Anton LU (2016) In 1650-2884 EDA920 20152
Department of Computer Science
Abstract
With a novel path index design, called the Shortcut Index, we partially solve the problem of executing traversal queries on dense neighborhoods in a graph database. We implement our design on top of the graph database Neo4j but it could be used for any graph database that uses the labeled property graph model.

By using a B+ tree, the Shortcut Index can achieve what we call neighborhood locality and range locality of paths. This means that data that belongs to the same part of the graph is located in the same space on disk. We empirically evaluate how this affects performance in terms of response time. In our benchmarks we use two different datasets, one that simulates a real world use case and a "lab environment" that makes it possible... (More)
With a novel path index design, called the Shortcut Index, we partially solve the problem of executing traversal queries on dense neighborhoods in a graph database. We implement our design on top of the graph database Neo4j but it could be used for any graph database that uses the labeled property graph model.

By using a B+ tree, the Shortcut Index can achieve what we call neighborhood locality and range locality of paths. This means that data that belongs to the same part of the graph is located in the same space on disk. We empirically evaluate how this affects performance in terms of response time. In our benchmarks we use two different datasets, one that simulates a real world use case and a "lab environment" that makes it possible to vary neighborhood density and percent of neighborhood interest to more accurately examine how it affects performance. Our experiments show that response time of the index scales very well with neighborhood density and percent of neighborhood interest when compared to Neo4j without the index.

We put focus on making the Shortcut Index useful in an OLTP (online transaction processing) environment, which enforces restrictions on update overhead which in turn restricts how long paths that can be indexed. The presented design can index arbitrary long paths but in our implementation we only index paths of length one.

We conclude that the Shortcut Index improves response time at a reasonable cost and is especially useful when indexing dense neighborhoods that are often queried with limitations on some property value. (Less)
Please use this url to cite or link to this publication:
author
Persson, Anton LU
supervisor
organization
alternative title
Path Indexing in Graph Databases
course
EDA920 20152
year
type
H3 - Professional qualifications (4 Years - )
subject
keywords
neo4j, ldbc, B+ tree, oltp, percent of neighborhood interest, range locality, neighborhood locality, graph database, labeled property graph, graph density, neighborhood density, path index
publication/series
1650-2884
report number
LU-CS-EX 2016-03
ISSN
1650-2884
language
English
id
8572658
date added to LUP
2016-02-05 09:36:08
date last changed
2016-02-05 09:36:08
@misc{8572658,
  abstract     = {With a novel path index design, called the Shortcut Index, we partially solve the problem of executing traversal queries on dense neighborhoods in a graph database. We implement our design on top of the graph database Neo4j but it could be used for any graph database that uses the labeled property graph model.

By using a B+ tree, the Shortcut Index can achieve what we call neighborhood locality and range locality of paths. This means that data that belongs to the same part of the graph is located in the same space on disk. We empirically evaluate how this affects performance in terms of response time. In our benchmarks we use two different datasets, one that simulates a real world use case and a "lab environment" that makes it possible to vary neighborhood density and percent of neighborhood interest to more accurately examine how it affects performance. Our experiments show that response time of the index scales very well with neighborhood density and percent of neighborhood interest when compared to Neo4j without the index.

We put focus on making the Shortcut Index useful in an OLTP (online transaction processing) environment, which enforces restrictions on update overhead which in turn restricts how long paths that can be indexed. The presented design can index arbitrary long paths but in our implementation we only index paths of length one.

We conclude that the Shortcut Index improves response time at a reasonable cost and is especially useful when indexing dense neighborhoods that are often queried with limitations on some property value.},
  author       = {Persson, Anton},
  issn         = {1650-2884},
  keyword      = {neo4j,ldbc,B+ tree,oltp,percent of neighborhood interest,range locality,neighborhood locality,graph database,labeled property graph,graph density,neighborhood density,path index},
  language     = {eng},
  note         = {Student Paper},
  series       = {1650-2884},
  title        = {The Shortcut Index},
  year         = {2016},
}