Skip to main content

Lund University Publications

LUND UNIVERSITY LIBRARIES

Approximation algorithms for Hamming clustering problems

Gasieniec, Leszek ; Jansson, Jesper LU and Lingas, Andrzej LU (2004) In Journal of Discrete Algorithms 2(2 spec. iss.). p.289-301
Abstract
We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.



We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits... (More)
We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.



We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon>0 it is NP-hard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2-time (1+var epsilon)-approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0<var epsilon<1. (Less)
Please use this url to cite or link to this publication:
author
; and
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
Theorem proving, Polynomials, Integer programming, Algorithms, Approximation theory, Mathematical models
in
Journal of Discrete Algorithms
volume
2
issue
2 spec. iss.
pages
289 - 301
publisher
Elsevier
external identifiers
  • scopus:10644222717
ISSN
1570-8667
DOI
10.1016/S1570-8667(03)00079-0
project
VR 2002-4049
language
English
LU publication?
yes
id
7d92de4f-efc8-4112-ae13-97e5207d062b (old id 631616)
date added to LUP
2016-04-01 15:36:41
date last changed
2022-02-12 08:49:02
@article{7d92de4f-efc8-4112-ae13-97e5207d062b,
  abstract     = {{We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.<br/><br>
<br/><br>
We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon&gt;0 it is NP-hard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2-time (1+var epsilon)-approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0&lt;var epsilon&lt;1.}},
  author       = {{Gasieniec, Leszek and Jansson, Jesper and Lingas, Andrzej}},
  issn         = {{1570-8667}},
  keywords     = {{Theorem proving; Polynomials; Integer programming; Algorithms; Approximation theory; Mathematical models}},
  language     = {{eng}},
  number       = {{2 spec. iss.}},
  pages        = {{289--301}},
  publisher    = {{Elsevier}},
  series       = {{Journal of Discrete Algorithms}},
  title        = {{Approximation algorithms for Hamming clustering problems}},
  url          = {{http://dx.doi.org/10.1016/S1570-8667(03)00079-0}},
  doi          = {{10.1016/S1570-8667(03)00079-0}},
  volume       = {{2}},
  year         = {{2004}},
}