How Netflix does A/B testing

Datetime: 2016-08-23 03:26:55          Topic: Test Engineer

A couple of weeks ago, I attended a Designers + Geeks event at Yelp's headquarters in San Francisco. Anna Blaylock and Navin Iyengar, both Product Designers at Netflix, shared insights gleaned from their years of A/B testing on tens of millions of Netflix members. They also showed some relevant examples from the product to help attendees think about their own designs.

What follows is my recap of their presentation, along with some of my favorite takeaways.

Photo from the presentation.

Experimentation

I really liked this first slide of the presentation—I think it’s smart to use an image from Breaking Bad to explain the concept of experimentation.

The scientific method

Hypothesis

In science, a hypothesis is an idea or explanation that you then test through study and experimentation. In design, a theory or guess can also be called a hypothesis.

The basic idea of a hypothesis is that there is no pre-determined outcome. It’s something that can be tested, and those tests can be replicated.

“The general concept behind A/B testing is to create an experiment with a control group and one or more experimental groups (called ‘cells’ within Netflix) which receive alternative treatments. Each member belongs exclusively to one cell within a given experiment, with one of the cells always designated the ‘default cell.’ This cell represents the control group, which receives the same experience as all Netflix members not in the test.” (The Netflix Tech Blog)

Here’s how A/B testing is done at Netflix: as soon as a test goes live, they track specific metrics of importance, such as streaming hours and retention. Once enough participants have generated statistically meaningful results, they evaluate the efficacy of each variation and declare a winner.
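The source doesn’t describe Netflix’s actual assignment mechanism, but the “each member belongs exclusively to one cell” property from the quote above is commonly achieved with deterministic hashing. A minimal sketch, assuming hash-based bucketing (the function name and experiment names are hypothetical):

```python
import hashlib

def assign_cell(member_id: str, experiment: str, num_cells: int) -> int:
    """Deterministically assign a member to one cell of an experiment.

    Cell 0 plays the role of the control ("default cell"); cells
    1..num_cells-1 receive the alternative treatments. Hashing the
    member ID together with the experiment name keeps each member's
    assignment stable across sessions and uncorrelated across
    different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode()).hexdigest()
    return int(digest, 16) % num_cells

# A member always lands in the same cell for a given experiment.
cell = assign_cell("member-12345", "artwork-test", 4)
assert assign_cell("member-12345", "artwork-test", 4) == cell
```

Because the assignment is a pure function of the member and experiment IDs, no per-member assignment table needs to be stored.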

Experiment

Many companies, Netflix among them, run experiments to generate user data. It’s also important to invest time and effort in organizing the experiment properly, so that the type and amount of data collected are sufficient to answer the questions of interest as efficiently as possible.

You’ve probably noticed that the featured show on the Netflix homepage seems to change whenever you log in. They’re all part of Netflix’s complex experiments to get you to watch their shows.

Homepage when I logged in the first time.

Image from the presentation: The House of Cards page as seen by a signed-out user.

Homepage when I logged in the second time.

Homepage when I switch to a different user name.

Homepage when I switch the user to “kids.”

Homepage when I’m not signed in.

The idea of A/B testing is to present different content to different user groups, gather their reactions, and use the results to build strategies in the future. According to this blog post written by Netflix engineer Gopal Krishnan, “If you don’t capture a member’s attention within 90 seconds, that member will likely lose interest and move onto another activity. Such failed sessions could at times be because we did not show the right content or because we did show the right content but did not provide sufficient evidence as to why our member should watch it.”

Netflix ran an experiment back in 2013 to see whether a few artwork variants could increase the audience for a title. Here’s the result:

Image from The Netflix Tech Blog.

Krishnan adds, “It was an early signal that members are sensitive to artwork changes. It was also a signal that there were better ways they could help Netflix members find the types of stories they were looking for within the Netflix experience.”

Netflix later created a system that automatically grouped artwork variants that had different aspect ratios, crops, touch-ups, and localized title treatments but shared the same background image. They replicated the experiment on their other TV shows to track relative artwork performance. Here are some examples:

Image from The Netflix Tech Blog. The 2 marked images significantly outperformed all others.

Image from The Netflix Tech Blog. The last marked images significantly outperformed all others.
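The source doesn’t show how Netflix scores the variants, but “relative artwork performance” is typically measured as a take rate: the fraction of members who played the title after seeing a given artwork. A minimal sketch with hypothetical counts (the variant names and numbers are illustrative, not from the experiment):

```python
from typing import Dict, Tuple

def best_variant(stats: Dict[str, Tuple[int, int]]) -> str:
    """Return the variant with the highest take rate.

    stats maps variant name -> (plays, impressions); take rate is
    plays / impressions for that artwork.
    """
    return max(stats, key=lambda v: stats[v][0] / stats[v][1])

# Hypothetical counts for three artwork variants of one title.
stats = {
    "default":  (1200, 60000),  # 2.0% take rate
    "close-up": (1500, 58000),  # ~2.6% take rate
    "ensemble": (1400, 61000),  # ~2.3% take rate
}
assert best_variant(stats) == "close-up"
```

In practice the winner would only be declared after a significance check, since raw take rates on small samples can differ by chance.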

Check out these 2 blog posts to learn more about Netflix’s A/B testing:

What I learned

A/B testing is the most reliable way to learn user behaviors. As designers, we should think about our work through the lens of experimentation.

Image from the presentation: Your instinct isn’t always right.

  1. When and why are you A/B testing? Once you have a design in production, use A/B testing to tweak the design and target 2 key metrics: retention and revenue. By A/B testing changes throughout the product and tracking users over time, you can see whether your change improves retention or increases revenue. If it does, make it the default. In this way A/B testing can be used to continuously improve business metrics.
  2. Are your users finding or doing the one thing you want them to find or do? In my experience, users often cannot complete a task as quickly as you expect, and sometimes they can’t even find a certain button you put on a page. The reasons vary: the design isn’t intuitive enough; the color isn’t vibrant enough; the user isn’t tech-savvy; there are too many options on one page to make a decision; and so on.
  3. Are your intuitions correct? Sadly, when it comes to user behavior, our intuitions can be wrong—and the only way to prove it is through A/B testing. A/B testing is the best way to validate whether one UX design is more effective than another. At work, our consumer product team proved this through A/B testing on our real estate website. For example, they wanted to find out whether a design change could improve the registration rate for users who clicked on a Google Ad. They created a few experimental designs and tested them. They expected the design that only hides the property image to win, but they found that the design that hides both the property image and the price got the highest conversion rate.
  4. Explore the boundaries. The best ideas come from many idea explorations. At work, our product team works collaboratively across many different projects. With so many parties involved (from designers to product managers to developers), we get to explore the boundaries together. Some of the best ideas sometimes come from the developers or the product managers after they test out our prototypes.
  5. Observe what people do, not what they say. When talking to users, keep this in mind: they often say one thing but do another. I conducted a few user testing sessions this week and have a perfect example of why. I had one user test a contacts list view prototype and asked him whether he usually sorts or filters his contacts. He said no, because he wouldn’t need to do so. But when he discovered the new filters dropdown menu, he was amazed by how convenient it was to sort and filter multiple options at a time—and he immediately asked when it would roll out in production.
  6. Use data to estimate the size of the opportunity. It’s always about the whys. Data can help shape ideas.
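The registration-rate comparison in point 3 above is a two-proportion problem: did the variant convert at a genuinely higher rate than the control, or is the difference noise? A minimal sketch of a pooled two-proportion z-test, with hypothetical counts (the numbers are illustrative, not from the experiment described in the talk):

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Z statistic for the difference between two conversion rates.

    conv_a/n_a are conversions and visitors for the control,
    conv_b/n_b for the variant. Positive z means the variant
    converted at a higher rate than the control.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical: control vs. "hide image and price" registration counts.
z = two_proportion_z(480, 10000, 560, 10000)
# |z| > 1.96 would indicate significance at the 95% confidence level.
```

With these made-up numbers z comes out above 1.96, so the variant’s lift would be treated as significant; with smaller samples the same rate difference might not be.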

Knowing your user is the most exciting part of the design process. There is no finished design, but many chances for iteration to improve the design and give our users the best experience possible.

This post was originally published on Medium.




