Movie Review Scores Are Fundamentally Flawed

Rotten Tomatoes and Metacritic have become our first stop in determining how good a movie is. Until recently, I had no idea how each site arrived at its review scores. Once I found out, I realised I'd been reading them all wrong.

Where Rotten Tomatoes and Metacritic Ratings Come From

Rotten Tomatoes and Metacritic ratings are embedded in everything from movie listing apps like Flixster to Google search results. You've probably seen the rating next to a movie title. Experienced users might even know that each site actually has two scores: one for critics and one for regular viewers. What you may not realise is that each site calculates those numbers very differently.

To get the critics' ratings, Rotten Tomatoes collects critic reviews from a variety of sources, usually a couple of hundred or so, depending on how high-profile the movie is. Each review is then categorised as either Fresh (positive) or Rotten (negative). The score you see is the percentage of the total reviews that are considered "Fresh". So, for example, with the recent superhero clash-up Batman v Superman, the site collected 327 reviews, 90 of which fell into the positive category. Ninety is roughly 28 per cent of 327, so 28 becomes the movie's score.
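
If you want to see just how blunt that arithmetic is, here's a minimal sketch of it in Python. The function and the way I've built the list of labels are my own illustration, but the figures are the ones quoted above, and the only thing the calculation ever looks at is whether each review was labelled Fresh or Rotten.

    # Minimal sketch of the Tomatometer arithmetic: the score is simply the
    # share of reviews labelled Fresh. The labels below reproduce the Batman v
    # Superman figures quoted above (90 positive reviews out of 327).

    def tomatometer(labels):
        """labels: list of booleans, True for Fresh, False for Rotten."""
        return round(100 * sum(labels) / len(labels))

    labels = [True] * 90 + [False] * (327 - 90)
    print(tomatometer(labels))  # 28 -- everything else about each review is discarded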

Metacritic, on the other hand, uses a more nuanced system. The company collects reviews from around the web and assigns each one a score from 0 to 100. Where a site uses a measurable metric -- like a numerical rating or a letter grade -- Metacritic fills in the number it believes most closely represents that figure. The site then takes a weighted average of all the reviews. The company doesn't reveal how much weight it assigns to individual reviewers, but it does explain that certain reviewers are given more significance in the overall score based on their "stature". This approach lets a bit more nuance show through: in the case of Batman v Superman, Metacritic gave the movie a 44, considerably higher than the 28 per cent Rotten Tomatoes gave it.
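
Since Metacritic doesn't publish its weights, any worked example is guesswork, but the structure of the calculation looks something like the sketch below. The scores and weights are invented; only the convert-to-a-100-point-scale-then-weighted-average shape follows the description above.

    # Rough sketch of a Metascore-style weighted average. The weights are
    # invented stand-ins for reviewer "stature"; Metacritic's real weights
    # are not public.

    def metascore(reviews):
        """reviews: list of (score_out_of_100, weight) pairs."""
        total_weight = sum(weight for _, weight in reviews)
        weighted_sum = sum(score * weight for score, weight in reviews)
        return round(weighted_sum / total_weight)

    # Hypothetical reviews, already converted to a 0-100 scale.
    reviews = [(60, 1.5), (40, 1.0), (30, 1.0), (50, 2.0)]
    print(metascore(reviews))  # 47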

It's worth pointing out that Rotten Tomatoes and Metacritic -- as well as IMDb -- also have separate user scores. These work more or less consistently across all three sites. Users rate a movie on a scale from one to ten (technically Rotten Tomatoes uses a five-star rating, but half-stars are allowed, making the maths functionally identical). Each site then weights those ratings in its own way to come up with the final user score.

Rotten Tomatoes Drags Scores Towards the Extreme

The problem with Rotten Tomatoes' method is that by boiling down an entire review to "good" or "bad", it gives critical reviews the nuance of a coin flip. This dramatically sways review scores in polarising directions. While Rotten Tomatoes doesn't draw attention to it, you can find an "average rating" for every film directly below the Tomatometer score on the website. This figure averages reviewer scores after each has been assigned a value on a ten-point scale. If we look at that Batman v Superman example again, we see that its average rating is actually 4.9. On a 100-point scale that's a 49, which is higher than the 44 Metacritic gave the movie. However, since Rotten Tomatoes treats a reviewer who thought the movie was OK but had some problems the same way it treats a reviewer who thought the movie was total crap, that slightly-below-average 4.9 gets dragged down to an abysmal 28 per cent Tomatometer score.
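
You can see the drag effect with a handful of invented scores. In the sketch below I assume a review counts as Fresh once its converted score reaches 6 out of 10 (my assumption for illustration, not a published Rotten Tomatoes rule); ten middling reviews average out to 4.9 but produce a Tomatometer of just 20 per cent.

    # Invented ten-point review scores for a middling, divisive movie.
    scores = [5.5, 5.0, 4.5, 6.5, 5.0, 4.0, 6.0, 4.5, 5.0, 3.0]

    FRESH_THRESHOLD = 6.0  # assumed cut-off for a "positive" review

    average_rating = sum(scores) / len(scores)
    tomatometer = 100 * sum(1 for s in scores if s >= FRESH_THRESHOLD) / len(scores)

    print(round(average_rating, 1))  # 4.9 -- merely below average
    print(round(tomatometer))        # 20  -- looks like a disaster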

This effect isn't just negative, though. We can look at the other big winter superhero clash to see it in reverse. Captain America: Civil War pulls in a respectable average rating of 7.9 on Rotten Tomatoes right now, but the Tomatometer score is considerably higher at 92 per cent (with 126 "Fresh" reviews out of 137). Once again, Metacritic's method gives Civil War a score of 77, which is much closer to Rotten Tomatoes' average rating. Appropriately, this effect makes the Tomatometer a bit like Captain America's super soldier serum: Good becomes great. Bad becomes worse.

The same effect applies to Rotten Tomatoes user scores, though it's a bit less pronounced. Any rating of 3.5 stars (7 out of 10) or higher is considered positive, or "Fresh"; anything less is considered negative, or "Rotten". The user score represents the percentage of positive ratings. While this is still simplistic, a fixed cut-off leaves more room for a middle ground than a subjective "good" or "bad" call, and the score draws on a much bigger pool of ratings.
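
Here's the same kind of sketch for the user score, using made-up star ratings and the 3.5-star cut-off described above.

    # Invented user ratings in half-star increments (out of five stars).
    star_ratings = [4.5, 3.0, 3.5, 5.0, 2.5, 4.0, 1.5]

    positive = sum(1 for stars in star_ratings if stars >= 3.5)  # 3.5 stars = 7/10
    user_score = round(100 * positive / len(star_ratings))

    print(user_score)  # 57 -- four of the seven ratings clear the 3.5-star bar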

Metacritic Is More Nuanced, But Might Also Be More Biased

Rotten Tomatoes' biggest problem may be that it avoids nuance, but there's an understandable reason why it might want to. While Metacritic embraces nuance, it's also sometimes criticised for getting it "wrong". As we established earlier, Metacritic assigns a numeric value to each review before averaging them. However, picking those numbers can be a subjective ordeal.

For example, many review sites attach letter grades to their reviews on an A-to-F scale. In the case of an F, Metacritic would assign that review a score of 0, while a review like a B- might receive a 67. Some reviewers disagree with how those numbers are assigned, believing that an F should be closer to a 50, or a B should be closer to an 80. The lack of standardisation across letter grades notwithstanding, this highlights a key problem with Metacritic: How do you put a numerical value on an opinion?
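
To see how much that choice matters, here's a sketch comparing two defensible (and entirely invented) ways of mapping the same letter grades onto a 100-point scale. The only value the text above confirms is that Metacritic treats an F as 0; everything else here is for illustration.

    # Two invented grade-to-score mappings applied to the same set of reviews.
    strict_mapping  = {"A": 100, "B": 75, "C": 50, "D": 25, "F": 0}
    lenient_mapping = {"A": 95,  "B": 85, "C": 70, "D": 60, "F": 50}

    grades = ["B", "C", "B", "D"]  # hypothetical letter-grade reviews for one film

    strict_avg  = sum(strict_mapping[g] for g in grades) / len(grades)
    lenient_avg = sum(lenient_mapping[g] for g in grades) / len(grades)

    print(strict_avg, lenient_avg)  # 56.25 75.0 -- same opinions, very different score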

Paradoxically, Metacritic gives reviewers both more and less control over their scores. A reviewer's rankings and opinions are represented more faithfully by a numerical score than by a boolean good/bad value. On the other hand, that extra wiggle room can result in a reviewer's opinions being represented in a way they disagree with, which can be a huge problem if an industry starts relying on review scores. Of course, if Metacritic only let each reviewer choose a score of either 100 or 0 (which, mathematically speaking, is exactly what Rotten Tomatoes does), there would probably be a lot more disagreement.

What Really Matters In a Review Score

No matter how "objective" we try to get when it comes to review scores, we're still trying to convert opinions into numbers. That's a bit like trying to turn love into a fossil fuel: the conversion doesn't make sense on its face. However, review scores are still useful. There are a lot of movies out there, and most of us don't have enough time or money to watch them all for ourselves. Reviewers help us determine which films are worth spending our time on, and handy review scores make it even easier, turning the decision into a simple, two-digit number. In my experience (also an opinion!), here are the best ways to use each metric:

  • Rotten Tomatoes is a basic yes/no recommendation engine. If you want a simple answer to the question "Should I see this movie?" Rotten Tomatoes probably answers it pretty well. The score isn't necessarily reflective of how good the movie is, but it measures enthusiasm for a film pretty well. Just keep in mind that it tends to drag films to the extremes.
  • Metacritic tries to measure the value of a film, based on reviewers' opinions. Opinions are never objective, but Metacritic's score will probably more closely resemble the actual quality of a film than Rotten Tomatoes'. The flip side is that the site may also inadvertently inject opinions of its own.
  • User reviews on all sites are generally consistent representations of the public's opinion. There are minor variances between Rotten Tomatoes, Metacritic and IMDb user ratings, but since they're all open to the public, you can use any user rating to get a decent glimpse into what the average movie-going audience thinks. Just keep in mind that it's exactly that: the average movie-going audience. If your tastes differ from the mainstream, you might not agree with user ratings.

Most importantly, remember that your opinions are still your own. Reviewers, no matter how well-intentioned, come from different backgrounds than you and might enjoy things you don't. Moviegoers like to follow review scores like they're a competitive sport. While that's fun and all, it's important to keep in mind that no score will ever be truly objective as long as it's measuring opinions. Use the metrics that are most helpful to you to decide what you'll spend your time on, but don't let a number tell you what to like or dislike.


Comments

    RT is not a simple yes/no engine. It's far more nuanced. It represents the percentage of people who like it. It often surprises me how accurate this can be at judging a movie. The recent Pixar movie about Dinosaurs was shockingly bad as it aimed at the lowest common denominator and was ultra conservative. I thought RT scores might go to either extreme, but they reflected a disrespect for this cheesy movie. This is likely because critics see many movies and have a good basis to compare relative value. The more critics that like it, the higher the score.

    Conversely, IMDB has many scores of 1 for movies. These greatly drag down scores. There are very few movies that deserve a 1. But, if 1 in 10 people score a movie 1, the other 9 need to score it 10 to get 90%.

    There are some nuances of scoring that are bad. Disney's Jungle Book was very well made for my taste. However, had Christopher Walken's role of King Louie been played by an African American, this movie would have been panned by the critics. Here's one of many articles on the topic.
    http://www.theguardian.com/film/2016/apr/03/jungle-book-disney-remake-racism-worries

    The problem is that conservative thinking is usually rewarded and some prejudices are accepted. It also means that good movies can be panned because of something innocent deemed sensitive if it aligns with Politically Correct thinking. For example, feminists in the media were outraged when Black Widow was saved by a man (Google it). Yet, Black Widow (with no super powers) had saved many men, no questions asked. This sort of rocking the Politically Correct boat can sink a film and drives some shallow thinking in many movies today, for fear of losing ticket sales. If Janet van Dyne was punched in the face in an unprovoked manner by Ant Man (i.e. the opposite of what happened in Ant Man), you would have seen outrage and a likely box office bomb. Notwithstanding, I highly disrespect that media is willing to tolerate violence and sexism from some people (hint: one gender). The lack of condemnation for the punch in Ant Man is one of many examples.

    You'll never find a perfect system. I tend to prefer RT because it measures critics' consensus. I also can easily target 1 positive and 1 negative review to get a better balance. I rejected Zootopia on this basis despite a high score of 98%.
