Advanced

Approximation algorithms for Hamming clustering problems

Gasieniec, Leszek; Jansson, Jesper LU and Lingas, Andrzej LU (2004) In Journal of Discrete Algorithms 2(2 spec. iss.). p.289-301
Abstract
We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.



We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits... (More)
We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.



We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon>0 it is NP-hard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2-time (1+var epsilon)-approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0<var epsilon<1. (Less)
Please use this url to cite or link to this publication:
author
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
Theorem proving, Polynomials, Integer programming, Algorithms, Approximation theory, Mathematical models
in
Journal of Discrete Algorithms
volume
2
issue
2 spec. iss.
pages
289 - 301
publisher
Elsevier
external identifiers
  • scopus:10644222717
ISSN
1570-8667
DOI
10.1016/S1570-8667(03)00079-0
project
VR 2002-4049
language
English
LU publication?
yes
id
7d92de4f-efc8-4112-ae13-97e5207d062b (old id 631616)
date added to LUP
2007-12-05 15:18:53
date last changed
2017-12-17 03:50:30
@article{7d92de4f-efc8-4112-ae13-97e5207d062b,
  abstract     = {We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.<br/><br>
<br/><br>
We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon&gt;0 it is NP-hard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2-time (1+var epsilon)-approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0&lt;var epsilon&lt;1.},
  author       = {Gasieniec, Leszek and Jansson, Jesper and Lingas, Andrzej},
  issn         = {1570-8667},
  keyword      = {Theorem proving,Polynomials,Integer programming,Algorithms,Approximation theory,Mathematical models},
  language     = {eng},
  number       = {2 spec. iss.},
  pages        = {289--301},
  publisher    = {Elsevier},
  series       = {Journal of Discrete Algorithms},
  title        = {Approximation algorithms for Hamming clustering problems},
  url          = {http://dx.doi.org/10.1016/S1570-8667(03)00079-0},
  volume       = {2},
  year         = {2004},
}