Batched Lipschitz Bandits

10/19/2021
by   Yasong Feng, et al.
0

In this paper, we study the batched Lipschitz bandit problem, where the expected reward is Lipschitz and the reward observations are collected in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that naturally fits into the batched feedback setting. In particular, we show that for a T-step problem with Lipschitz reward of zooming dimension d_z, our algorithm achieves theoretically optimal regret rate of 𝒪( T^d_z + 1/d_z + 2) using only 𝒪( log T/d_z) batches. For the lower bound, we show that in an environment with B-batches, for any policy π, there exists a problem instance such that the expected regret is lower bounded by Ω(R_z(T)^1/1-(1/d+2)^B), where R_z (T) is the regret lower bound for vanilla Lipschitz bandits that depends on the zooming dimension d_z, and d is the dimension of the arm space.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro