Adaptive Sampling for Rapidly Matching Histograms

08/20/2017
by   Stephen Macke, et al.
0

In exploratory data analysis, analysts often have a need to identify histograms that possess a specific distribution, among a large class of candidate histograms, e.g., find histograms of countries whose income distribution is most similar to that of Greece. This distribution could be a new one that the user is curious about, or a known distribution from an existing histogram visualization. At present, this process of identification is brute-force, requiring the manual generation and evaluation of a large number of histograms. We present FastMatch: an end-to-end architecture for interactively retrieving the histogram visualizations that are most similar to a user-specified target, from a large collection of histograms. The primary technical contribution underlying FastMatch is a sublinear algorithm, HistSim, a theoretically sound sampling-based approach to identify the top-k closest histograms under ℓ_1 distance. While HistSim can be used independently, within FastMatch we couple HistSim with a novel system architecture that is aware of practical considerations, employing block-based sampling policies and asynchronous statistics and computation, building on lightweight sampling engines developed in recent work. In our experiments on several real-world datasets, FastMatch obtains near-perfect accuracy with up to 100× speedups over less sophisticated approaches.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro