Adapting k-means algorithms for outliers

07/02/2020
by   Christoph Grunau, et al.
0

This paper shows how to adapt several simple and classical sampling-based algorithms for the k-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical k-means++ algorithm to the setting with outliers. However, their algorithm needs to output O(log (k) · z) outliers, where z is the number of true outliers, to match the O(log k)-approximation guarantee of k-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed k-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output (1+ε)z outliers while achieving an O(1 / ε)-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the distributed setting, we adapt a simple algorithm of Guha et al. (IEEE Trans. Know. and Data Engineering 2003) and the popular k-means of Bahmani et al. (PVLDB 2012). A theoretical application of our techniques is an algorithm with running time Õ(nk^2/z) that achieves an O(1)-approximation to the objective function while outputting O(z) outliers, assuming k ≪ z ≪ n. This is complemented with a matching lower bound of Ω(nk^2/z) for this problem in the oracle model.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro