Efficient Linear Bandits through Matrix Sketching

09/28/2018

∙

We prove that two popular linear contextual bandit algorithms, OFUL and Thompson Sampling, can be made efficient using Frequent Directions, a deterministic online sketching technique. More precisely, we show that a sketch of size m allows a O(md) update time for both algorithms, as opposed to Ω(d^2) required by their non-sketched versions (where d is the dimension of context vectors). When the selected contexts span a subspace of dimension at most m, we show that this computational speedup is accompanied by an improved regret of order m√(T) for sketched OFUL and of order m√(dT) for sketched Thompson Sampling (ignoring log factors in both cases). Vice versa, when the dimension of the span exceeds m, the regret bounds become of order (1+ε_m)^3/2d√(T) for OFUL and of order ((1+ε_m)d)^3/2√(T) for Thompson Sampling, where ε_m is bounded by the sum of the tail eigenvalues not covered by the sketch. Experiments on real-world datasets corroborate our theoretical results.

READ FULL TEXT

Efficient Linear Bandits through Matrix Sketching

Sign in with Google

Consider DeepAI Pro