Studies in Applied Data Structures

Swanson, Kurt LU (1998)
Abstract
The design of efficient data structures is of primary importance in the creation of theoretical algorithms as well as their more tangible descendants, computer programs. In this dissertation we study computational aspects of data structures and their respective algorithms from a theoretical viewpoint; these aspects are nevertheless of direct importance in the implementation of solutions for real-world problems. We present results for the following problems:



In tolerancing, the Out-Of-Roundness factor determines the relative circularity of planar shapes. We show that the Minimum Radial Separation algorithm given by Le and Lee runs in Theta(n^2) time even for convex polygons. Furthermore, we present an optimal O(n) time algorithm to compute the Minimum Radial Separation of convex polygons, which represents not only a factor of n improvement over the previously best known algorithm, but also a factor of log n improvement over Le and Lee's conjectured complexity for the problem.
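To make the quantity concrete, the following minimal Python sketch evaluates the radial separation of a polygon for one fixed candidate center. It assumes the usual definition (the difference between the radii of the smallest enclosing and largest enclosed concentric circles at that center), which the abstract does not spell out. It is not the O(n) algorithm of the thesis, which also has to find the best center; the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the closed segment ab."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def radial_separation(center, polygon):
    """Radial separation of a polygon boundary for a fixed center.

    Assumed definition: (largest distance from center to the boundary)
    minus (smallest distance from center to the boundary). The largest
    distance is always attained at a vertex; the smallest on an edge.
    """
    n = len(polygon)
    outer = max(math.hypot(x - center[0], y - center[1]) for x, y in polygon)
    inner = min(
        point_segment_distance(center, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )
    return outer - inner


square = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
separation = radial_separation((0.0, 0.0), square)
```

Under this definition the Minimum Radial Separation is the smallest such value over all candidate centers; a single evaluation costs O(n) time, so the difficulty lies in locating the optimal center without trying many candidates.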



We consider the general problem of (2-dimensional) range reporting allowing arbitrary convex queries. We show that using a traditional approach, a polylogarithmic query time cannot be achieved unless more than linear space is used. Our arguments are based on a new non-trivial lower bound in a new model of computation, Layered Partitions, which can be used to describe all known algorithms for processing range queries, as well as many other data structures used to represent multi-dimensional data. We show that Omega((log n)/(log T(n))) partitions must be used to allow queries in O(T(n) + k) time, for n total and k reported elements, and for any growing function T(n).
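As a worked instance of the stated bound, substituting two common choices of T(n) shows how it behaves. The symbol P(n) for the number of partitions is introduced here only for illustration and does not come from the abstract.

```latex
\begin{align*}
  % Bound as stated: partitions needed for O(T(n) + k) query time
  P(n) &= \Omega\!\left(\frac{\log n}{\log T(n)}\right) \\[4pt]
  % Polylogarithmic query time, T(n) = \log^{c} n with constant c \ge 1
  T(n) = \log^{c} n
    &\;\Longrightarrow\; \log T(n) = c \log\log n
    \;\Longrightarrow\; P(n) = \Omega\!\left(\frac{\log n}{\log\log n}\right) \\[4pt]
  % Polynomial query time, T(n) = n^{\varepsilon} with constant 0 < \varepsilon < 1
  T(n) = n^{\varepsilon}
    &\;\Longrightarrow\; \log T(n) = \varepsilon \log n
    \;\Longrightarrow\; P(n) = \Omega\!\left(\frac{1}{\varepsilon}\right) = \Omega(1)
\end{align*}
```

So a polylogarithmic query time forces a number of partitions that grows with n, whereas a polynomial query time is compatible with a constant number. This matches the space claim if each partition accounts for all n elements, an assumption the abstract does not state explicitly.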



We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multi-character substrings or words. This word suffix tree represents only the m suffixes that start at word boundaries. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.
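The set of suffixes that such an index represents can be illustrated with a short Python sketch. It assumes words are separated by a single delimiter character and orders the word-aligned suffixes by naive comparison sorting; it is neither a suffix tree nor the O(n) expected-time, O(m)-space construction described above, and the function names are hypothetical.

```python
def word_suffix_starts(text, delimiter=" "):
    """Return the start index of every word in text.

    Assumption for this sketch: words are maximal runs of
    non-delimiter characters.
    """
    starts = []
    at_boundary = True
    for i, ch in enumerate(text):
        if ch == delimiter:
            at_boundary = True
        elif at_boundary:
            starts.append(i)
            at_boundary = False
    return starts


def naive_word_suffix_array(text, delimiter=" "):
    """Sort the m word-aligned suffixes lexicographically.

    Each comparison may scan O(n) characters, so this is far slower
    than a linear-time construction; it only shows which suffixes
    a word-level index would contain.
    """
    return sorted(word_suffix_starts(text, delimiter), key=lambda i: text[i:])


text = "to be or not to be"
starts = word_suffix_starts(text)          # m = 6 word boundaries in n = 18 characters
order = naive_word_suffix_array(text)      # word-aligned suffixes in sorted order
```

For this input only 6 of the 18 suffixes are indexed, which is the source of the space saving; a query such as "to be" can then only match at word boundaries.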



We propose a new data structure for storing sparse matrices which are too large to fit entirely within main memory. This data structure is optimized to use the computer's page size and is arranged to handle efficiently the random accesses and updates needed by a wide range of matrix operations. We also present several variations on an ancillary structure which greatly decreases the probability of unnecessary page faults when accessing the structure, even when the size of main memory is extremely limited. We assert that these data structures are easy to implement and provide very good results in practice.
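The general idea of grouping nonzero entries into page-sized units can be sketched as follows. This is a generic tiled layout written for illustration, not the data structure proposed in the thesis; the class name and the tile_size parameter (standing in for whatever fits in one memory page) are hypothetical.

```python
class TiledSparseMatrix:
    """Sparse matrix whose nonzeros are grouped into square tiles.

    Illustrative only: each tile stands in for one page, so a random
    access touches a single tile and absent tiles cost nothing.
    """

    def __init__(self, tile_size=64):
        self.tile_size = tile_size
        # Maps (tile_row, tile_col) -> {(row, col): value} for nonzeros only.
        self.tiles = {}

    def _key(self, row, col):
        return (row // self.tile_size, col // self.tile_size)

    def set(self, row, col, value):
        key = self._key(row, col)
        if value == 0:
            # Storing zero removes the entry and drops empty tiles.
            tile = self.tiles.get(key)
            if tile is not None:
                tile.pop((row, col), None)
                if not tile:
                    del self.tiles[key]
            return
        self.tiles.setdefault(key, {})[(row, col)] = value

    def get(self, row, col):
        tile = self.tiles.get(self._key(row, col))
        if tile is None:
            return 0
        return tile.get((row, col), 0)


matrix = TiledSparseMatrix(tile_size=4)
matrix.set(10, 3, 2.5)
```

In an out-of-core setting each tile would be serialized to one disk page, and the in-memory map of tile keys would let a lookup of an absent entry return without touching disk at all.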
author
  • Swanson, Kurt
opponent
  • Dr Boyar, Joan, Odense
organization
publishing date
1998
type
Thesis
publication status
published
subject
keywords
Systems engineering, Sparse Matrix, Suffix Tree, Roundness, Range searching, Computer and systems science, computer technology
pages
77 pages
publisher
Department of Computer Science, Lund University
defense location
E:1406
defense date
1998-10-08 10:15
external identifiers
  • Other:ISRN: LUNFD6/(NFCS-14)/1-77/(1998)
ISBN
91-628-3155-0
language
English
LU publication?
yes
id
766de679-a73b-4a51-ad67-ba3aced7a084 (old id 38914)
date added to LUP
2007-10-14 17:44:18
date last changed
2016-09-19 08:45:05
@phdthesis{766de679-a73b-4a51-ad67-ba3aced7a084,
  abstract     = {The design of efficient data structures is of primary importance in the creation of theoretical algorithms as well as their more tangible descendants, computer programs. In this dissertation we study computational aspects of data structures and their respective algorithms from a theoretical viewpoint; these aspects are nevertheless of direct importance in the implementation of solutions for real-world problems. We present results for the following problems:

In tolerancing, the Out-Of-Roundness factor determines the relative circularity of planar shapes. We show that the Minimum Radial Separation algorithm given by Le and Lee runs in Theta(n^2) time even for convex polygons. Furthermore, we present an optimal O(n) time algorithm to compute the Minimum Radial Separation of convex polygons, which represents not only a factor of n improvement over the previously best known algorithm, but also a factor of log n improvement over Le and Lee's conjectured complexity for the problem.

We consider the general problem of (2-dimensional) range reporting allowing arbitrary convex queries. We show that using a traditional approach, a polylogarithmic query time cannot be achieved unless more than linear space is used. Our arguments are based on a new non-trivial lower bound in a new model of computation, Layered Partitions, which can be used to describe all known algorithms for processing range queries, as well as many other data structures used to represent multi-dimensional data. We show that Omega((log n)/(log T(n))) partitions must be used to allow queries in O(T(n) + k) time, for n total and k reported elements, and for any growing function T(n).

We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multi-character substrings or words. This word suffix tree represents only the m suffixes that start at word boundaries. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.

We propose a new data structure for storing sparse matrices which are too large to fit entirely within main memory. This data structure is optimized to use the computer's page size and is arranged to handle efficiently the random accesses and updates needed by a wide range of matrix operations. We also present several variations on an ancillary structure which greatly decreases the probability of unnecessary page faults when accessing the structure, even when the size of main memory is extremely limited. We assert that these data structures are easy to implement and provide very good results in practice.},
  author       = {Swanson, Kurt},
  isbn         = {91-628-3155-0},
  keyword      = {Systems engineering,Sparse Matrix,Suffix Tree,Roundness,Range searching,Data- och systemvetenskap,computer technology},
  language     = {eng},
  pages        = {77},
  publisher    = {Department of Computer Science, Lund University},
  school       = {Lund University},
  title        = {Studies in Applied Data Structures},
  year         = {1998},
}