Tuesday, April 18, 2017

The One With A/B Testing

In my last post, I introduced a little of what Netflix does with its data, but that was mostly surface-level data, not the big picture. In reality, Netflix uses many different kinds of testing and algorithms to decide which new features to launch, which new shows to pursue, and how Netflix is presented in general.

Netflix has a large membership, over 81.5M members (techblog.netflix.com), so sorting through the data they receive can be a little nerve-wracking. How do you figure out what users want? How do you get them to stick with a certain show? How do you track whether what you are doing is working? The answer seems to be A/B testing.
"In marketing and business intelligence, A/B testing is a term for a randomized experiment with two variants, A and B, which are the control and variation in the controlled experiment. A/B testing is a form of statistical hypothesis testing with two variants leading to the technical term, two-sample hypothesis testing, used in the field of statistics." - Wikipedia.com (A/B Testing).
So how does this work with Netflix?
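Before getting to Netflix specifically, the "two-sample hypothesis testing" in that definition can be sketched in a few lines. The numbers below are hypothetical, not actual Netflix data: a control cell A and a variant cell B, each shown to the same number of members, compared with a simple two-proportion z-test.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Z statistic comparing the conversion rates of
    control cell A and variant cell B."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same
    p = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Made-up example: 5,000 members per cell; the control artwork
# leads to 600 plays, the variant artwork to 660 plays
z = two_proportion_z(600, 5000, 660, 5000)
print(round(z, 2))  # ≈ 1.81, below 1.96, so not significant at the 5% level
```

A z value beyond about 1.96 would mean the difference between the two cells is unlikely to be chance, which is how an experimenter decides whether a variant "won."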

Netflix uses A/B testing to decide when and how to roll out new features or layouts. For example, on Netflix's techblog (techblog.netflix.com), they discuss the data they gathered by changing the cover artwork of titles in the search lists. By changing the fonts and background images for a title and releasing those versions to some viewers, while keeping the original artwork for others, they discovered something interesting.

Photo: http://techblog.netflix.com/search/label/algorithms

Viewers were more inclined to click on and stay with titles whose artwork featured expressive faces, rather than just body motions or poses.

Coming to this conclusion took more than a single experiment; it took at least three versions of the experiment to get there:
1) Experiment 1 - Single Title Test with Multiple Test Cells: This was the beginning of their journey toward more successful artwork. They tested different versions of one title's artwork while keeping a control group, and saw a strong positive response to the changed covers.
2) Experiment 2 - Multi-Cell Explore-Exploit Test: They started mixing blockbuster titles and smaller titles to see if the artwork change had the same effect as in the first experiment. This project also had more rules and criteria.
3) Experiment 3 - Single Cell Title Explore Test
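The "explore-exploit" idea behind Experiment 2 comes from bandit algorithms: spend a small share of traffic exploring alternative artwork while mostly exploiting the best performer so far. Netflix's actual algorithm isn't spelled out in their posts, so the following is just a minimal epsilon-greedy sketch, with made-up variant names and counts.

```python
import random

def epsilon_greedy(impressions, plays, epsilon=0.1):
    """Pick one artwork variant: with probability epsilon, explore a
    random variant; otherwise exploit the best observed play rate."""
    variants = list(impressions)
    if random.random() < epsilon:
        return random.choice(variants)  # explore
    # exploit: highest plays-per-impression so far
    return max(variants, key=lambda v: plays[v] / impressions[v])

# Hypothetical running totals for three artwork variants of one title
impressions = {"close-up": 1000, "wide-shot": 1000, "poster": 1000}
plays = {"close-up": 140, "wide-shot": 110, "poster": 95}
print(epsilon_greedy(impressions, plays))  # usually "close-up"
```

The appeal for a catalog the size of Netflix's is that exploration is bounded: most viewers still see the current best artwork, while a trickle of traffic keeps testing the alternatives.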

Netflix launches over 200 tests per year (business.financialpost.com) to keep updating and improving its business. One of its newest projects is replacing the 5-star rating system with a thumbs up/thumbs down rating. The reason? Their results showed star ratings were skewed toward positive outcomes: viewers left few or no ratings for shows that did not impress them.

While what Netflix is doing could probably fill an entire blog of its own, for the purposes of this one, this will do for now. Keep on doing you, Netflix, and we will be there for you.

Sources:
https://www.infoq.com/news/2017/03/netflix-big-data-analytics
http://business.financialpost.com/fp-tech-desk/personal-tech/data-is-mothers-milk-to-netflix-as-it-tweaks-algorithms-to-find-that-perfect-content-just-like-a-dating-app
http://techblog.netflix.com/search/label/algorithms
