Why Multi-armed Bandit algorithms are superior to A/B testing

This is a really nice description of the benefits of Multi-armed Bandit algorithms for testing: chrisstucchio.com

In summary: a multi-armed bandit algorithm automatically presents to each user whichever option currently looks most likely to be the winner, on every single impression.

The huge advantage is lower regret: because you're always showing your current best guess at the winner, the losing variants are shown far less often than in a fixed-split A/B test.
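The "always show your best guess, but keep checking the others" behavior can be sketched with a minimal epsilon-greedy bandit. This is a hypothetical illustration, not the specific algorithm from the linked article; the class name and parameters are my own:

```python
import random

class EpsilonGreedy:
    """Minimal epsilon-greedy bandit: show the current best arm most of
    the time, and explore a random arm a small fraction of the time."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each arm was shown
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        if random.random() < self.epsilon:
            # explore: pick any arm at random
            return random.randrange(len(self.counts))
        # exploit: pick the arm with the best observed mean
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental update of the running mean reward
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With epsilon at 0.1, the presumed winner receives roughly 90% of the traffic plus its share of the exploration traffic, which is exactly where the reduced regret comes from.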

In addition, you can overload your testing queue as much as you like; the only consequence is that the algorithm will be correspondingly less certain it is doing the right thing. But it will always make its best guess.
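A Thompson-sampling variant makes this graceful degradation explicit: each arm keeps a Beta posterior over its conversion rate, and arms with little data have wide posteriors, so they still win some draws and get some traffic. A hedged sketch, assuming binary (converted / not converted) rewards:

```python
import random

class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling: each arm has a
    Beta(successes + 1, failures + 1) posterior. We draw one sample per
    arm and show the arm with the highest draw. Queue in more arms and
    each posterior just stays wider for longer -- the algorithm is less
    certain, but still always acts on its best guess."""

    def __init__(self, n_arms):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def select_arm(self):
        draws = [random.betavariate(s + 1, f + 1)
                 for s, f in zip(self.successes, self.failures)]
        return max(range(len(draws)), key=lambda a: draws[a])

    def update(self, arm, converted):
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1
```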

One drawback is that the algorithm isn't inherently built to account for traffic variation by daypart, seasonality, different promotions, and so on. You can reset the statistics anytime you like, but as presented here the algorithm isn't set up to compare Tuesday data only with other Tuesday data, for example.
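One simple workaround, not discussed in the linked article, is to key the bandit's statistics by a context bucket such as the weekday, so each day of the week effectively runs its own bandit. A hypothetical sketch:

```python
import random
from collections import defaultdict

class BucketedBandit:
    """Keeps independent epsilon-greedy statistics per context bucket
    (e.g. weekday), so Tuesday data is only compared to Tuesday data."""

    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        # bucket -> (show counts per arm, running mean reward per arm)
        self.stats = defaultdict(
            lambda: ([0] * n_arms, [0.0] * n_arms))

    def select_arm(self, bucket):
        counts, values = self.stats[bucket]
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)  # explore
        return max(range(self.n_arms), key=lambda a: values[a])

    def update(self, bucket, arm, reward):
        counts, values = self.stats[bucket]
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
```

The cost is that each bucket sees only a fraction of the traffic, so every bucket's bandit converges more slowly.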

Even so, this seems like a huge step forward from traditional A/B testing for any outfit that is aggressively optimizing. I remember talking to a multivariate vendor who touted his system's ability to let you pick only the combinations you wanted to test and then terminate early losers. That's an inferior attempt at the same things a multi-armed bandit achieves automatically.
