Abstract:

We study a general online convex optimization problem. There is a

convex set S and an unknown sequence of concave reward functions

f_1,f_2,.... In each period, we choose a feasible point x_t in S, and

receive the reward f_t(x_t) (f_t itself remains unknown). Our goal is

to have large average reward. We give an algorithm whose average

reward is guaranteed to approach the average reward of the best single

point x in S.

As an example, consider a company that has to decide how much to spend

advertising on a number of channels each week. At the end of the

week, they only find out their total profit, which is a function of

their expenditures. At the end of the year, we can guarantee that

their expected total profit is almost as large as if they had used

the best single campaign each week, for the whole year, chosen with

the benefit of hindsight. We assume only that the functions are

concave and bounded, i.e., no statistical assumptions relating one

function to the others.

Our approach is very simple: we give an approximation of the gradient

that is computed from evaluating a function at a single (random) point.

Joint work with Abie Flaxman and Brendan McMahan