We study the problem of an online advertising system that wants to optimally spend an advertiser’s given budget for a campaign across multiple platforms, without knowing the value for showing an ad to the users on those platforms. We model this challenging practical application as a Stochastic Bandits with Knapsacks problem over $T$ rounds of bidding with the set of arms given by the set of distinct bidding $m$-tuples, where $m$ is the number of platforms. This paper makes three contributions: First, we give algorithms that efficiently spend the budget across platforms for both discrete and continuous bid spaces: despite the exponential number of arms that we use in the stochastic bandits modeling, the regret only grows polynomially with $m$, $T$, the size $n$ of the discrete bid space of the platforms, and the budget $B$. Namely, for discrete bid spaces we give an algorithm with regret $Oleft(OPT sqrt {frac{mn}{B} }+ sqrt{mn OPT}right)$, where $OPT$ is the performance of the optimal algorithm that knows the distributions. For continuous bid spaces the regret of our algorithm is $tilde{O}left(m^{1/3} cdot minleft{ B^{2/3}, (m T)^{2/3} right} right)$. Secondly, we show an $ Omegaleft (sqrt {m OPT} right)$ lower bound for the discrete case and an $Omegaleft( m^{1/3} B^{2/3}right)$ lower bound for the continuous setting, almost matching the upper bounds. Finally, we use a real-world data set from a large internet online advertising company with multiple ad platforms and show that our algorithms outperform common benchmarks.

# Stochastic bandits for multi-platform budget optimization in online advertising

by dejan | Mar 31, 2021 | 0 comments