Improved Analysis of Robustness of the Tsallis-INF Algorithm to Adversarial Corruptions in Stochastic Multiarmed Bandits

03/23/2021
by   Saeed Masoudian, et al.
0

We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). In the adversarial regime with a self-bounding constraint and the stochastic regime with adversarial corruptions as its special case we improve the dependence on corruption magnitude C. In particular, for C = Θ(T/log T), where T is the time horizon, we achieve an improvement by a multiplicative factor of √(log T/loglog T) relative to the bound of Zimmert and Seldin (2021). We also improve the dependence of the regret bound on time horizon from log T to log(K-1)T/(∑_i≠ i^*1/Δ_i)^2, where K is the number of arms, Δ_i are suboptimality gaps for suboptimal arms i, and i^* is the optimal arm. Additionally, we provide a general analysis, which allows to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multiarmed bandits.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro