Batched Lipschitz Bandits

10/19/2021

∙

In this paper, we study the batched Lipschitz bandit problem, where the expected reward is Lipschitz and the reward observations are collected in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that naturally fits into the batched feedback setting. In particular, we show that for a T-step problem with Lipschitz reward of zooming dimension d_z, our algorithm achieves theoretically optimal regret rate of 𝒪( T^d_z + 1/d_z + 2) using only 𝒪( log T/d_z) batches. For the lower bound, we show that in an environment with B-batches, for any policy π, there exists a problem instance such that the expected regret is lower bounded by Ω(R_z(T)^1/1-(1/d+2)^B), where R_z (T) is the regret lower bound for vanilla Lipschitz bandits that depends on the zooming dimension d_z, and d is the dimension of the arm space.

READ FULL TEXT

Batched Lipschitz Bandits

Sign in with Google

Consider DeepAI Pro