

Wasserstein gradient flows and applications to sampling in machine learning - lecture 1
De Anna Korba


Wasserstein gradient flows and applications to sampling in machine learning - lecture 2
De Anna Korba
De Harry Lang
Apparaît dans la collection : 2016 - T1 - WS4 - Inference problems theme
We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic space O(1)-approximation to the metric k-median and metric k-means problems in the sliding window model, answering the main open problem posed by Babcock, Datar, Motwani and O'Callaghan, which has remained unanswered for over a decade. Our algorithm uses O(k^3log^6 n) space and poly(k,log n) update time. This is an exponential improvement on the space required by the technique due to Babcock, et al. We introduce a data structure that extends smooth histograms as introduced by Braverman and Ostrovsky to operate on a broader class of functions. In particular, we show that using only polylogarithmic space we can maintain a summary of the current window from which we can construct an O(1)-approximate clustering solution. Merge-and-reduce is a generic method in computational geometry for adapting offline algorithms to the insertion-only streaming model. Several well-known coreset constructions are maintainable in the insertion-only streaming model using this method, including well-known coreset techniques for the k-median and k-means problems in both low-and high-dimensional Euclidean space. Previous work has adapted coreset techniques to the insertion-deletion model, but translating them to the sliding window model has remained a challenge. We give the first algorithm that, given an insertion-only streaming coreset of space s (maintained using merge-and-reduce method), maintains this coreset in the sliding window model using O(s^2ϵ^{−2}log n)space. For clustering problems, our results constitute the first significant step towards resolving problem number 20 from the List of Open Problems in Sublinear Algorithms.