Movie Review Scores Are Fundamentally Flawed

May 5, 2016 at 4:00 am

Rotten Tomatoes and Metacritic have become our first stop in determining how good a movie is. Until recently, I had no idea how each site arrived at its review scores. Once I found out, I realised I’d been reading them all wrong.

Where Rotten Tomatoes and Metacritic Ratings Come From

Rotten Tomatoes and Metacritic ratings are embedded in everything from movie listing apps like Flixster to Google search results. You’ve probably seen the rating next to a movie title. Experienced users might even know that each site actually has two scores: one for critics and one for regular viewers. What you may not realise is that each site calculates those numbers very differently.

To get the critics’ ratings, Rotten Tomatoes collects critic reviews from a variety of sources, usually a couple of hundred or so, depending on how high profile the movie is. Each review is then categorized as either Fresh (positive) or Rotten (negative). The score you see is the percentage of the total reviews that are considered “Fresh”. So, for example, with the recent superhero clash-up Batman v Superman, the site collected 327 reviews, 90 of which fell into the positive category. Ninety is 28 per cent of 327, so that becomes the movie’s score.
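
The arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `tomatometer` is made up for this example, and rounding to the nearest whole number is my assumption, since the site doesn't spell out how it rounds.

```python
def tomatometer(fresh: int, total: int) -> int:
    """Share of reviews classified Fresh, as a whole-number percentage.

    Rounding to the nearest integer is an assumption for this sketch.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of reviews")
    return round(100 * fresh / total)


# Batman v Superman: 90 Fresh reviews out of 327 collected.
print(tomatometer(90, 327))
```

Notice that the function never looks at how positive or negative any single review was. Only the count of Fresh reviews matters.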

Metacritic, on the other hand, builds a bit more nuance into its system. The company collects reviews from around the web and assigns each one a score ranging from 0 to 100. Where a publication uses its own measurable scale, like a numerical rating or a letter grade, Metacritic converts it to the number it believes most closely represents that rating. The site then takes a weighted average of all the reviews. The company doesn’t reveal how much weight it assigns to individual reviewers, but it does explain that certain reviewers are given more significance in the overall score based on their “stature”. In the case of Batman v Superman, Metacritic gave the movie a 44, which is considerably higher than the 28 per cent Rotten Tomatoes gave it.
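
A weighted average is easy to sketch, with one big caveat: Metacritic doesn't publish its weights, so every number below is invented purely for illustration. The function name `weighted_score` and the sample reviews are hypothetical too.

```python
def weighted_score(reviews):
    """Weighted average of (score, weight) pairs, with scores on a 0-100 scale."""
    total_weight = sum(weight for _, weight in reviews)
    if total_weight <= 0:
        raise ValueError("weights must add up to a positive number")
    return sum(score * weight for score, weight in reviews) / total_weight


# Hypothetical reviews: (score out of 100, made-up "stature" weight).
sample = [(80, 1.5), (40, 1.0), (30, 1.0)]

print(round(weighted_score(sample), 1))                      # weighted result
print(round(weighted_score([(s, 1) for s, _ in sample]), 1)) # plain average
```

With these made-up weights, the one heavily weighted positive review pulls the result above the plain average, which is exactly why the choice of weights matters.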

It’s worth pointing out that Rotten Tomatoes and Metacritic, as well as IMDb, also have separate user scores. These work in broadly the same way across all three sites. Users rate a movie on a scale from one to ten (technically Rotten Tomatoes uses a five-star rating, but you can use half-stars, making the maths functionally identical). Each site then weights those ratings in its own way to come up with the final user rating.

Rotten Tomatoes Drags Scores Towards the Extreme

The problem with Rotten Tomatoes’ method is that by boiling down an entire review to “good” or “bad”, it gives critical reviews the nuance of a coin flip. This dramatically sways review scores in polarising directions. While Rotten Tomatoes doesn’t draw attention to it, you can find an “average rating” for every film directly below the Tomatometer score on the website. This figure averages the reviewers’ scores after each has been converted to a ten-point scale. If we look at that Batman v Superman example again, we see that its average rating is actually 4.9. On a 100-point scale that works out to 49, which is even higher than the 44 Metacritic gave the movie. However, since Rotten Tomatoes treats a reviewer who thought the movie was OK but had some problems the same way it treats a reviewer who thought the movie was total crap, that slightly-below-average 4.9 score gets dragged down to an abysmal 28 per cent score.

This effect isn’t just negative, though. We can look at the other big winter superhero clash to see the effect in reverse. Captain America: Civil War pulls in a respectable average rating of 7.9 on Rotten Tomatoes right now, but the Tomatometer score is considerably higher at 92 per cent (with 126 “Fresh” reviews out of 137). Once again, Metacritic’s method gives Civil War a score of 77, which is much closer to Rotten Tomatoes’ average rating. Appropriately, this effect makes the Tomatometer a bit like Captain America’s super soldier serum: Good becomes great. Bad becomes worse.
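
To see the polarising effect in miniature, here is a rough sketch that runs the same reviews through both methods. Everything in it is hypothetical: the review scores are invented (chosen so the averages land on 4.9 and 7.9), and the numeric cutoff of 6 out of 10 for “Fresh” is my assumption. In reality the Fresh or Rotten call is made per review, not by a fixed number.

```python
FRESH_CUTOFF = 6  # assumed cutoff on a ten-point scale, for illustration only


def average_rating(scores):
    """Plain mean of review scores on a ten-point scale."""
    return sum(scores) / len(scores)


def fresh_percentage(scores, cutoff=FRESH_CUTOFF):
    """Percentage of reviews at or above the cutoff, rounded to a whole number."""
    fresh = sum(1 for score in scores if score >= cutoff)
    return round(100 * fresh / len(scores))


# Invented review scores: one lukewarm film, one well-liked film.
lukewarm = [5, 5, 5, 5, 4, 5, 6, 5, 5, 4]
well_liked = [8, 8, 8, 7, 8, 8, 9, 8, 8, 7]

for name, scores in [("lukewarm", lukewarm), ("well liked", well_liked)]:
    print(name, average_rating(scores), fresh_percentage(scores))
```

The two invented films sit only three points apart on the ten-point average, but the yes/no count puts them at opposite ends of the scale.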

The same effect applies to Rotten Tomatoes user scores, though it’s a bit less pronounced. Any score of 3.5 stars or higher (7 out of 10) is considered positive, or “Fresh”. Anything less than that is considered negative, or “Rotten”. The user score represents the percentage of positive ratings. While this is still simplistic, the source data has more room for a middle ground than a subjective “good” or “bad”, and it has a much bigger data set to pull from.
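
The user score can be sketched the same way. The 3.5-star threshold comes from the description above, while the function name `audience_score`, the sample ratings and the rounding are assumptions made for this example.

```python
FRESH_STARS = 3.5  # threshold described above: 3.5 stars, or 7 out of 10


def audience_score(star_ratings):
    """Percentage of user ratings at or above 3.5 stars (rounding is assumed)."""
    if not star_ratings:
        raise ValueError("need at least one rating")
    positive = sum(1 for stars in star_ratings if stars >= FRESH_STARS)
    return round(100 * positive / len(star_ratings))


# Invented half-star ratings from eight hypothetical users.
ratings = [5.0, 4.0, 3.5, 3.0, 3.0, 2.5, 4.5, 1.0]
print(audience_score(ratings))
```

A 3-star rating and a 1-star rating count equally against the film here, which is the same flattening the critics’ score suffers from.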

Metacritic is More Nuanced, But Also Might Be More Biased

Rotten Tomatoes’ biggest problem may be that it avoids nuance, but there’s an understandable reason why it might want to. While Metacritic embraces nuance, it’s also sometimes criticised for getting it “wrong”. As we established earlier, Metacritic assigns a numeric value to reviews before averaging them. However, picking those numbers can be a subjective ordeal.

For example, many review sites will offer letter grades attached to their reviews on an A through F scale. In the case of an F, Metacritic would assign that review a score of 0, while a review like a B- might receive a 67. Some reviewers disagree with how this metric is assigned, believing that an F should be closer to a 50, or a B should be closer to an 80. The lack of standardisation across letter grades notwithstanding, this highlights a key problem with Metacritic: How do you put a numerical value on an opinion?

Paradoxically, Metacritic gives reviewers both more and less control over their scores. A reviewer’s rankings and opinions are represented more faithfully with a numerical score than a boolean good/bad value. On the other hand, it also has more wiggle room that might result in a reviewer’s opinions being represented in a way they disagree with. This can be a huge problem if an industry starts relying on review scores. Of course, if Metacritic only allowed each reviewer to choose a score of either 100 or 0, there would probably be a lot more disagreement (which, mathematically speaking, is exactly what Rotten Tomatoes does).

What Really Matters In a Review Score

No matter how “objective” we try to get when it comes to review scores, we’re still trying to convert opinions into numbers. That’s a bit like trying to turn love into a fossil fuel. The conversion doesn’t make sense on its face. However, review scores are still useful. There are a lot of movies out there and most of us don’t have enough time or money to watch them all for ourselves. Reviewers help us determine which films are worth spending our time on. Handy review scores make it even easier, turning the decision into a simple, two-digit number. In my experience (also an opinion!) here are the best ways to use each metric:

Rotten Tomatoes is a basic yes/no recommendation engine. If you want a simple answer to the question “Should I see this movie?” Rotten Tomatoes probably answers it pretty well. The score isn’t necessarily reflective of how good the movie is, but it measures enthusiasm for a film pretty well. Just keep in mind that it tends to drag films to the extremes.
Metacritic tries to measure the value of a film, based on reviewers’ opinions. Opinions are never objective, but Metacritic will probably more closely resemble the actual quality of a film than Rotten Tomatoes. The flip side is that the site may also inadvertently inject opinions of its own.
User reviews on all sites are generally consistent representations of the public’s opinion. There are minor variances between Rotten Tomatoes, Metacritic and IMDb user ratings, but since they’re all open to the public, you can use any user rating to get a decent glimpse into what the average movie-going audience thinks. Just keep in mind, it’s exactly that. The average movie-going audience. If your tastes differ from the mainstream, you might not agree with user ratings.

Most importantly, remember that your opinions are still your own. Reviewers, no matter how well-intentioned, come from different backgrounds than you and might enjoy some things you don’t. Moviegoers like to follow review scores like they’re a competitive sport. While that’s fun and all, it’s important to keep in mind no score will ever be truly objective as long as it’s measuring opinions. Use the metrics that are most helpful to you to decide what you’ll spend your time on, but don’t let a number tell you what to like or dislike.
