Approximation algorithms for Hamming clustering problems
(2004) In Journal of Discrete Algorithms 2(2 spec. iss.). p.289301 Abstract
 We study Hamming versions of two classical clustering problems. The Hamming radius pclustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the pradius of S and is denoted by varrho. The related Hamming diameter pclustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the pdiameter of S.
We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits... (More)  We study Hamming versions of two classical clustering problems. The Hamming radius pclustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the pradius of S and is denoted by varrho. The related Hamming diameter pclustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the pdiameter of S.
We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomialtime solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric pclustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon>0 it is NPhard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the pdiameter, and that solving HDC exactly is an NPcomplete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthestpoint clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2time (1+var epsilon)approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0<var epsilon<1. (Less)
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/record/631616
 author
 Gasieniec, Leszek; Jansson, Jesper ^{LU} and Lingas, Andrzej ^{LU}
 organization
 publishing date
 2004
 type
 Contribution to journal
 publication status
 published
 subject
 keywords
 Theorem proving, Polynomials, Integer programming, Algorithms, Approximation theory, Mathematical models
 in
 Journal of Discrete Algorithms
 volume
 2
 issue
 2 spec. iss.
 pages
 289  301
 publisher
 Elsevier
 external identifiers

 scopus:10644222717
 ISSN
 15708667
 DOI
 10.1016/S15708667(03)000790
 project
 VR 20024049
 language
 English
 LU publication?
 yes
 id
 7d92de4fefc84112ae1397e5207d062b (old id 631616)
 date added to LUP
 20071205 15:18:53
 date last changed
 20171217 03:50:30
@article{7d92de4fefc84112ae1397e5207d062b, abstract = {We study Hamming versions of two classical clustering problems. The Hamming radius pclustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the pradius of S and is denoted by varrho. The related Hamming diameter pclustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the pdiameter of S.<br/><br> <br/><br> We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomialtime solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric pclustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any var epsilon>0 it is NPhard to split S into at most pk1/7−var epsilon clusters whose Hamming diameter does not exceed the pdiameter, and that solving HDC exactly is an NPcomplete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthestpoint clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(pvarrho/var epsilon)kO(p/var epsilon)n2time (1+var epsilon)approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and varrho=O(log(k+n)). Finally, we show how to find in Image time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+var epsilon)varrho, for any constant 0<var epsilon<1.}, author = {Gasieniec, Leszek and Jansson, Jesper and Lingas, Andrzej}, issn = {15708667}, keyword = {Theorem proving,Polynomials,Integer programming,Algorithms,Approximation theory,Mathematical models}, language = {eng}, number = {2 spec. iss.}, pages = {289301}, publisher = {Elsevier}, series = {Journal of Discrete Algorithms}, title = {Approximation algorithms for Hamming clustering problems}, url = {http://dx.doi.org/10.1016/S15708667(03)000790}, volume = {2}, year = {2004}, }