Approximation algorithms for Hamming clustering problems
(2004) In Journal of Discrete Algorithms 2(2 spec. iss.). p.289-301- Abstract
- We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.
We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits... (More) - We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.
We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon>0 it is NP-hard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2-time (1+var epsilon)-approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0<var epsilon<1. (Less)
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/631616
- author
- Gasieniec, Leszek ; Jansson, Jesper LU and Lingas, Andrzej LU
- organization
- publishing date
- 2004
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Theorem proving, Polynomials, Integer programming, Algorithms, Approximation theory, Mathematical models
- in
- Journal of Discrete Algorithms
- volume
- 2
- issue
- 2 spec. iss.
- pages
- 289 - 301
- publisher
- Elsevier
- external identifiers
-
- scopus:10644222717
- ISSN
- 1570-8667
- DOI
- 10.1016/S1570-8667(03)00079-0
- project
- VR 2002-4049
- language
- English
- LU publication?
- yes
- id
- 7d92de4f-efc8-4112-ae13-97e5207d062b (old id 631616)
- date added to LUP
- 2016-04-01 15:36:41
- date last changed
- 2022-02-12 08:49:02
@article{7d92de4f-efc8-4112-ae13-97e5207d062b, abstract = {{We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by varrho. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.<br/><br> <br/><br> We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon>0 it is NP-hard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2-time (1+var epsilon)-approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0<var epsilon<1.}}, author = {{Gasieniec, Leszek and Jansson, Jesper and Lingas, Andrzej}}, issn = {{1570-8667}}, keywords = {{Theorem proving; Polynomials; Integer programming; Algorithms; Approximation theory; Mathematical models}}, language = {{eng}}, number = {{2 spec. iss.}}, pages = {{289--301}}, publisher = {{Elsevier}}, series = {{Journal of Discrete Algorithms}}, title = {{Approximation algorithms for Hamming clustering problems}}, url = {{http://dx.doi.org/10.1016/S1570-8667(03)00079-0}}, doi = {{10.1016/S1570-8667(03)00079-0}}, volume = {{2}}, year = {{2004}}, }