Studies in Applied Data Structures
(1998) Abstract
 The design of efficient data structures is of primary importance in creation of theoretical algorithms as well their more tangible descendants, computer programs. In this dissertation we study computational aspects of data structures and their respective algorithms from a theoretical viewpoint, which are however of direct importance in the implementation of solutions for realworld problems. We present results for the following problems:
In tolerancing, the OutOfRoundness factor determines the relative circularity of planar shapes. We show that the Minimum Radial Separation algorithm given by Le and Lee runs in Theta(n^2) time even for convex polygons. Furthermore, we present an optimal O(n) time algorithm to compute... (More)  The design of efficient data structures is of primary importance in creation of theoretical algorithms as well their more tangible descendants, computer programs. In this dissertation we study computational aspects of data structures and their respective algorithms from a theoretical viewpoint, which are however of direct importance in the implementation of solutions for realworld problems. We present results for the following problems:
In tolerancing, the OutOfRoundness factor determines the relative circularity of planar shapes. We show that the Minimum Radial Separation algorithm given by Le and Lee runs in Theta(n^2) time even for convex polygons. Furthermore, we present an optimal O(n) time algorithm to compute the Minimum Radial Separation of convex polygons, which represents not only a factor n improvement over the previously best known algorithm, but also a factor of log n improvement over Le and Lee's conjectured complexity for the problem.
We consider the general problem of (2dimensional) range reporting allowing arbitrarily convex queries. We show that using a traditional approach, a polylogarithmic query time can not be achieved unless more than linear space is used. Our arguments are based on a new nontrivial lower bound in a new model of computation, Layered Partitions, which can be used to describe all known algorithms for processing range queries, as well as many other data structures used to represent multidimensional data. We show that Omega((log n)/(log T(n))) partitions must be used to allow queries in O(T(n) + k) time, for n total and k reported elements, and for any growing function T(n).
We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words. This word suffix tree} represents only the m suffixes that start at word boundaries. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.
We propose a new data structure for storing sparse matrices which are too large to fit entirely within main memory. This data structure is optimized to use the computer's page size and is arranged in order to be able to efficiently handle random access and updates useful for a wide range of matrix operations. We also present several variations on an ancillary structure which greatly decreases the probability of unnecessary page faults when accessing the structure, even when the size of main memory is extremely limited. We assert that these data structures are easy to implement and provide very good results in practice. (Less)
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/record/38914
 author
 Swanson, Kurt ^{LU}
 opponent

 Dr Boyar, Joan, Odense
 organization
 publishing date
 1998
 type
 Thesis
 publication status
 published
 subject
 keywords
 Systems engineering, Sparse Matrix, Suffix Tree, Roundness, Range searching, Data och systemvetenskap, computer technology
 pages
 77 pages
 publisher
 Department of Computer Science, Lund University
 defense location
 E:1406
 defense date
 19981008 10:15
 external identifiers

 other:ISRN: LUNFD6/(NFCS14)/177/(1998)
 ISBN
 9162831550
 language
 English
 LU publication?
 yes
 id
 766de679a73b4a51ad67ba3aced7a084 (old id 38914)
 date added to LUP
 20071014 17:44:18
 date last changed
 20160919 08:45:05
@phdthesis{766de679a73b4a51ad67ba3aced7a084, abstract = {The design of efficient data structures is of primary importance in creation of theoretical algorithms as well their more tangible descendants, computer programs. In this dissertation we study computational aspects of data structures and their respective algorithms from a theoretical viewpoint, which are however of direct importance in the implementation of solutions for realworld problems. We present results for the following problems:<br/><br> <br/><br> In tolerancing, the OutOfRoundness factor determines the relative circularity of planar shapes. We show that the Minimum Radial Separation algorithm given by Le and Lee runs in Theta(n^2) time even for convex polygons. Furthermore, we present an optimal O(n) time algorithm to compute the Minimum Radial Separation of convex polygons, which represents not only a factor n improvement over the previously best known algorithm, but also a factor of log n improvement over Le and Lee's conjectured complexity for the problem.<br/><br> <br/><br> We consider the general problem of (2dimensional) range reporting allowing arbitrarily convex queries. We show that using a traditional approach, a polylogarithmic query time can not be achieved unless more than linear space is used. Our arguments are based on a new nontrivial lower bound in a new model of computation, Layered Partitions, which can be used to describe all known algorithms for processing range queries, as well as many other data structures used to represent multidimensional data. We show that Omega((log n)/(log T(n))) partitions must be used to allow queries in O(T(n) + k) time, for n total and k reported elements, and for any growing function T(n).<br/><br> <br/><br> We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words. This word suffix tree} represents only the m suffixes that start at word boundaries. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time.<br/><br> <br/><br> We propose a new data structure for storing sparse matrices which are too large to fit entirely within main memory. This data structure is optimized to use the computer's page size and is arranged in order to be able to efficiently handle random access and updates useful for a wide range of matrix operations. We also present several variations on an ancillary structure which greatly decreases the probability of unnecessary page faults when accessing the structure, even when the size of main memory is extremely limited. We assert that these data structures are easy to implement and provide very good results in practice.}, author = {Swanson, Kurt}, isbn = {9162831550}, keyword = {Systems engineering,Sparse Matrix,Suffix Tree,Roundness,Range searching,Data och systemvetenskap,computer technology}, language = {eng}, pages = {77}, publisher = {Department of Computer Science, Lund University}, school = {Lund University}, title = {Studies in Applied Data Structures}, year = {1998}, }