Machine learning holds great promise for helping us manage vast swathes of data and complementing humans as we try to solve more complex problems in our world. Everything from finding the best route between home and the airport through to finding cures for diseases can be helped by machine learning. But when machine learning is used to make decisions that directly affect people, we need to be able to ask how the models work. Emily Pries, from Lyft, looked at the question of machine learning fairness at the recent Twilio Signal conference.
Pries’ presentation was made through the lens of how machine learning is used in social justice campaigns. The value of a machine learning model, she said, lies in keeping the difference between the actual outcome of a scenario and the predicted outcome as small as possible. So, if a machine learning model’s predictions match the actual outcomes, it can be considered a reasonable model.
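That idea can be sketched in a few lines of code. This is a minimal illustration, not anything Pries presented: it measures the average gap between actual and predicted outcomes, with invented numbers.

```python
# A minimal sketch: a model is "reasonable" when its predictions stay
# close to actual outcomes. All numbers here are illustrative.

def mean_absolute_error(actual, predicted):
    """Average gap between what happened and what the model predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [10.0, 12.0, 9.0, 14.0]     # observed outcomes
predicted = [11.0, 12.5, 8.0, 13.0]  # the model's predictions

print(mean_absolute_error(actual, predicted))  # 0.875
```

The smaller this number, the closer the model tracks reality; a score of zero would mean perfect prediction.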
“Fair does not mean equal,” said Pries.
As a former teacher, Pries illustrated the distinction with a school example. In an equal system, all the students at a school would have the same snacks during break time; if one student has something different, the distribution of food is no longer equal. However, if that student has an allergy, then receiving the alternative food is seen as fair, even though it is not equal.
Fairness comes from understanding how decisions or predictions are made.
The problem is that the models in use are not always well understood, and where that’s the case it is easy to feel that a system is unfair. For example, Pries noted that the models used to allocate school resources in the USA are created by private companies and the algorithms they use are proprietary. That means teachers can’t change their performance scores, which dictate resource allocations, as they aren’t able to understand what behaviours result in superior ratings.
“If we cannot learn where the prediction comes from, how do we know that it’s fair?” asked Pries.
In order to show part of the complexity in understanding how fairness works, Pries walked the audience through Simpson’s paradox. This is a statistical effect that is counterintuitive when we look at data and use it to make predictions.
Using the example of applicants to a university, Pries said that the number of male applicants was substantially greater than the number of female applicants, and more men than women were admitted overall. But when the data was broken down by department, women were admitted at a higher rate than men.

So, although more men applied and more men were admitted in aggregate, women fared better at the department level.
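The effect is easier to see with concrete numbers. The figures below are invented for illustration (not Pries’ actual data), but they reproduce the paradox: women have the higher admission rate in every department, yet men have the higher rate overall, because men mostly applied to the easier department.

```python
# Illustrative admissions data: {department: {group: (admitted, applied)}}
admissions = {
    "A": {"men": (80, 100), "women": (18, 20)},   # easy dept, mostly men apply
    "B": {"men": (2, 20),   "women": (20, 100)},  # hard dept, mostly women apply
}

def rate(admitted, applied):
    return admitted / applied

# Per-department rates: women beat men in both A and B.
for dept, groups in admissions.items():
    for group, (adm, app) in groups.items():
        print(dept, group, f"{rate(adm, app):.0%}")

# Aggregate across departments: men now beat women overall.
totals = {"men": [0, 0], "women": [0, 0]}
for groups in admissions.values():
    for group, (adm, app) in groups.items():
        totals[group][0] += adm
        totals[group][1] += app

for group, (adm, app) in totals.items():
    print("overall", group, f"{rate(adm, app):.0%}")
# overall men 68%, overall women 32% -- Simpson's paradox
```

Aggregated data and department-level data tell opposite stories from the same underlying numbers, which is exactly why a prediction built on one view can feel unfair from the other.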
The question of fairness in machine learning models used can apply in many contexts. And it’s important to understand what variables we are prepared to accept in different models.
“If you’re applying for a home loan, your interest rate shouldn’t be determined by your gender or ethnicity – these things are irrelevant. But there are some situations where we break this. If we’re talking about car insurance rates, we are happy for young people to pay a lot more because they end up in accidents a lot more than older people do.”
In other words, for a model to be trusted we need to know what criteria are used, and the maths being applied needs to be verifiable.
Pries looked at how these models can break down in the area of semantic analysis. In assigning numeric values to words, machine learning can be used to help understand text. But a recent example, cited by Pries, highlighted how the models can break down.
Google’s text analysis was able to connect the words man – king – programmer. When the gender was changed, the progression became woman – queen – receptionist. Clearly, the model was flawed.
A machine learning tool in the USA, called COMPAS, was created to assist with sentencing people convicted of a crime, in order to remove racial bias in sentencing. The model aimed to predict whether a person convicted of a crime would be likely to reoffend. But the algorithm was proprietary and it was not possible to understand how the predictions about reoffending were made.
But the model was only around 60% accurate. Among the offenders the model got wrong, those rated low-risk reoffended at a higher rate, while high-risk offenders, as identified by the model, didn’t reoffend. And as the model couldn’t be examined, there was no way of understanding what factors were being used.
For machine learning models to satisfy everyone, Pries says, they need to be accurate across groups and avoid incorrect predictions in both the positive and the negative direction. But it’s not possible to have all three of these conditions covered all the time.
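Those three conditions correspond to accuracy, false positive rate and false negative rate, computed per group. The sketch below (with invented labels, not COMPAS data) shows how two groups can have identical accuracy while their error rates still diverge – a small instance of the tension Pries described.

```python
# Compute accuracy, false positive rate and false negative rate for one group.
def rates(actual, predicted):
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # predicted yes, was yes
    tn = sum(1 for a, p in pairs if not a and not p)  # predicted no, was no
    fp = sum(1 for a, p in pairs if not a and p)      # predicted yes, was no
    fn = sum(1 for a, p in pairs if a and not p)      # predicted no, was yes
    return {
        "accuracy": (tp + tn) / len(pairs),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Two groups with different base rates of the true outcome (illustrative).
group_a = rates(actual=[1, 1, 0, 0, 0], predicted=[1, 0, 0, 0, 1])
group_b = rates(actual=[1, 1, 1, 0, 0], predicted=[1, 1, 0, 0, 1])

print(group_a)  # accuracy 0.6, FPR ~0.33, FNR 0.5
print(group_b)  # accuracy 0.6, FPR 0.5,   FNR ~0.33
```

Both groups score 60% accuracy, yet one group suffers more false positives and the other more false negatives – so equalising one criterion can still leave the others unequal.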
Which brought Pries to an interesting point. If a machine learning model offers perfect predictions for a situation, then that must mean we understand that environment and, therefore, don’t need machine learning.
However, what we do need is explainability of the machine learning model. That allows the model to be understood, questioned when it doesn’t deliver the desired impact, and then refined.
But without explainability, it’s not possible to accept the fairness of a machine learning system.
Anthony Caruana attended Twilio Signal in San Francisco as a guest of Twilio.