Back in 2015, Google was called out for its photo app that mistakenly labeled pictures of people with darker skin as gorillas. As you can imagine, it was a PR disaster. Of course, the company publicly apologized, said that such a result is unacceptable, and promised to fix the mistake.
But apparently, as Wired uncovered two and a half years later, Google never got around to fixing the underlying issue. Instead, it implemented a workaround, blocking its AI from identifying gorillas (and other primates) altogether to prevent another miscategorization. A Google spokesperson confirmed to Wired that certain image categories were blocked after the incident in 2015 and remained blocked, adding that, “Image labeling technology is still early and unfortunately it’s nowhere near perfect.”
The fact that Google, one of the largest tech companies employing some of the best and brightest AI talent, could not address this issue demonstrates that it’s extremely difficult to mitigate existing bias in machine learning models.
Here, we take a look at how models become biased in the first place, and explore ways to prevent it.
Reason #1: Insufficient Training Data
A major contributor to bias in AI is insufficient training data or, more precisely, a lack of good training data for certain demographic groups. Algorithms can only pick up patterns if they have seen plenty of examples. The consequences of insufficiently diverse data can easily be observed in facial recognition technology. One study showed that models performed significantly better on pictures of white men (99% accuracy) than on pictures of Black women (65%), because the majority of images used in model training showed white men.
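One way to surface gaps like this is to report accuracy per demographic group rather than as a single overall number. The following is a minimal sketch in plain Python; the function name `accuracy_by_group` and the toy labels are made up for illustration and are not taken from the study mentioned above.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return overall accuracy and a dict with the accuracy for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group


# Toy, made-up data: group "A" has eight examples, group "B" only two.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
groups = ["A"] * 8 + ["B"] * 2

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print("overall:", overall)
print("per group:", per_group)
```

In this toy data the overall accuracy looks acceptable even though every prediction for the smaller group is wrong, which is exactly the kind of gap a single aggregate number hides.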
Reason #2: Humans Are Biased – And Therefore, So Is The Data That AI Is Trained On
Whether we like it or not, as humans we all carry (un)conscious biases that are reflected in the data we collect about our world. Last year, women accounted for only 6% of S&P 500 CEOs. Women also held significantly fewer senior management positions than men. This is at least partially attributable to decades of bias about women in the workplace. Yet this is the raw employment data that would be used to train a hiring algorithm. The result? The algorithm might determine that being female correlates poorly with being a CEO. Hiring managers using that algorithm to fill open senior management positions may be presented with resumes from primarily male candidates.
Another related common problem with human bias occurs in the context of supervised machine learning, where humans oftentimes label the data that is used to train a model. Even if they are well-intentioned and do not mean any harm, their unconscious biases could sneak into the training sample.
Search engines also perpetuate some types of bias. Google, for instance, shows predominantly images of black women when one searches for “unprofessional hairstyles.” Since top search results are clicked most often, the skewed results are cemented into the search algorithm, and more and more people are exposed to images of black women that are incorrectly labeled as unprofessional.
If bias in AI is not successfully addressed, it will perpetuate and potentially even amplify biases in our society.
Reason #3: De-Biasing Data Is Exceptionally Hard To Do
If you want a fair algorithm but the historical data is biased, can you clean the data to make it fair? One approach that has been tried is removing sensitive attributes, such as a person’s race. Unfortunately, research has shown that this does not prevent models from becoming biased, because correlated attributes can serve as proxies. Think of a neighborhood that is known to be home to predominantly Black people. Even if race were excluded from the training data, the neighborhood’s ZIP code could act as a proxy for a person’s race, so systematic discrimination against minorities remains possible even after the sensitive columns are removed. To counteract this, some researchers advise keeping the sensitive columns in the dataset, as they offer a more direct lever for mitigating bias. For example, if you aim for a model that treats men and women equally, you can use the gender column to directly monitor and correct for violations of your chosen equality criterion during model training. You can also experiment with fair synthetic data (see below).
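The proxy effect can be checked before any model is trained. The sketch below is one simple heuristic among many, not an established test: it measures how often the sensitive attribute can be guessed from a remaining column alone and compares that with always guessing the most common value. The function names, the placeholder ZIP labels, and the group labels are all hypothetical.

```python
from collections import Counter, defaultdict


def proxy_strength(proxy_values, sensitive_values):
    """Share of records whose sensitive value is guessed correctly when we
    predict the most common sensitive value seen for each proxy value."""
    by_proxy = defaultdict(Counter)
    for proxy, sensitive in zip(proxy_values, sensitive_values):
        by_proxy[proxy][sensitive] += 1
    hits = sum(counts.most_common(1)[0][1] for counts in by_proxy.values())
    return hits / len(proxy_values)


def baseline(sensitive_values):
    """Share of correct guesses when always predicting the most common value."""
    return Counter(sensitive_values).most_common(1)[0][1] / len(sensitive_values)


# Made-up records: the race column has been dropped from the training data,
# but it is kept here so the check itself can be run.
zip_codes = ["ZIP_A"] * 4 + ["ZIP_B"] * 4
race = ["x", "x", "x", "y", "y", "y", "y", "x"]

strength = proxy_strength(zip_codes, race)
base = baseline(race)
print(f"guess from ZIP code: {strength:.2f}, baseline: {base:.2f}")
```

The further the first number sits above the baseline, the more information about the removed attribute is still available to a model through the remaining column.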
Reason #4: Diversity Amongst AI Professionals Is Not As High As It Should Be
Lack of diversity among AI pros is another contributing factor to bias in AI, because the more diverse the team, the more perspectives it can cover. Consider that at Facebook and Google fewer than 2% of technical roles are held by employees with darker skin color – and women account for only 22% of AI professionals globally. A famous example of why diversity helps to mitigate bias comes from Joy Buolamwini, founder of the Algorithmic Justice League and graduate researcher at the MIT Media Lab. When the Ghanaian-American computer scientist joined her research group, she discovered that facial recognition tools performed poorly on her darker skin tone – and sometimes only worked if she wore a white mask. An all-white research team may not have noticed (or thought to look for) the discrepancy.
Reason #5: External Audits Are Challenging Due to Privacy Regulations
Especially where AI applications are used in high-stakes environments, many believe that external audits should systematically vet algorithms for potential biases. This may be an excellent idea, but privacy is often an issue. To thoroughly evaluate an algorithm, one needs access not only to the model but also to the training data. Yet companies cannot share the customer data they use to develop models, because they must comply with GDPR, CCPA, and other privacy regulations.
Reason #6: Fairness Is Hard To Define
In the 1970s, only 5% of musicians in the top five orchestras were women. Blind auditions increased that share to 30%, which is certainly an improvement, but many people would agree that this is still not fair. Should 50% of an orchestra’s musicians be women, because roughly half of the world’s population is female? Or would it be fairer if the same percentage of female and male applicants were accepted? Fairness needs to be defined, and with more than 30 different mathematical fairness definitions to choose from, stakeholders first need to agree on which one to use before technologists and data scientists can implement it.
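The two orchestra questions correspond to two different measurements, and they can disagree on the same data. The sketch below uses made-up applicant and acceptance counts (they are not the real audition figures) and hypothetical function names to compute both: the share of women among those accepted, and the acceptance rate within each group.

```python
def share_of_group(accepted, group):
    """Share of all accepted candidates who belong to the given group."""
    return accepted[group] / sum(accepted.values())


def acceptance_rates(applicants, accepted):
    """Acceptance rate within each group of applicants."""
    return {g: accepted[g] / applicants[g] for g in applicants}


# Made-up counts for illustration only.
applicants = {"women": 100, "men": 200}
accepted = {"women": 6, "men": 14}

women_share = share_of_group(accepted, "women")
rates = acceptance_rates(applicants, accepted)
print(f"share of women among accepted: {women_share:.0%}")
print({g: f"{r:.0%}" for g, r in rates.items()})
```

With these counts, women make up 30% of those accepted, far from a 50% target, while the acceptance rates of 6% and 7% are nearly equal. Which of the two numbers should be brought into line is a question for the stakeholders, not something the code can decide.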
Reason #7: Model Drift
Do you remember Microsoft’s “Tay”? The innocent AI chatbot started as a harmless experiment and was intended to learn from conversations with Twitter users – which it did (but probably not as imagined). In less than a day, Tay became misogynistic and racist, tweeting about its hate for feminists and Jews and its love for Hitler.
Of course, Microsoft immediately shut down Tay. What remains is a cautionary example: even if you take measures to mitigate bias during the initial training phase, many algorithms are designed to learn continuously and are therefore especially vulnerable to becoming biased over time.
New approaches to de-biasing data
How to achieve fairness in AI? For the research community it is clear that there is no silver bullet to bias mitigation, but that different steps need to be taken and tools need to be applied together, like pieces of a puzzle, to make fairness work. Arguably one of the biggest problems lies in unfair training data, and synthetic data – a new approach to big data anonymization – helps to address it. Traditionally, synthetic data is used to generate a fully anonymous, yet completely realistic and representative, version of an existing dataset that can be used to train machine learning models in a privacy-safe manner.
But the technology can be taken one step further to create fair synthetic data – artificial data sets that reflect the world not as is but as society would like it to be. By tweaking the generation process it becomes possible to create additional data points for groups that are underrepresented or don’t exist in the real data (for instance, more high-earning women, or people of color). This enables an AI algorithm to pick up on these patterns, which is the prerequisite to properly perform for and fairly treat everyone.
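To make the rebalancing idea concrete, here is a deliberately simplified stand-in. It does not generate synthetic data at all: it only duplicates existing rows of the underrepresented group at random until all groups are the same size, whereas a synthetic data generator would create new, artificial records. The function name, the `group` key, and the toy rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(rows, group_key, seed=0):
    """Duplicate rows of smaller groups at random until every group is as
    large as the largest one. Plain resampling, not synthetic generation."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Draw the missing rows with replacement from the same group.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Made-up rows: six records from one group, two from the other.
rows = [{"group": "majority", "income": 50 + i} for i in range(6)]
rows += [{"group": "minority", "income": 80 + i} for i in range(2)]

balanced = oversample_to_balance(rows, "group")
print(Counter(row["group"] for row in balanced))
```

Because the duplicated rows are exact copies, this approach adds no new variety and carries no privacy protection, which are the two gaps that generated synthetic records are meant to close.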
Another area where synthetic data has the potential to contribute to algorithmic fairness is the auditing of AI systems. To audit an algorithm, it is not sufficient to just look at the code. An auditor needs access to the model’s training data to understand its behavior. While privacy laws wouldn’t allow a company to share its real AI training data with an external auditor, a synthetic version could be shared in compliance with regulations. Plus, if the auditor identifies gaps in the data, they could even create synthetic examples of minority groups the algorithm has never seen during training (for instance, transgender individuals or college students over 80 years old) to thoroughly test how the machine would treat those individuals in the real world.
Fairness is not only the right thing to do – it also pays off
Many companies worry that investing in model quality for fairness will be too costly – that fairness comes at the expense of profits. The truth is that biased algorithms are more costly, not only leading to bad decisions, but also increasing risk in many areas, including public perception. In addition to being the right thing to do morally, investing in fairness and responsible AI is good for business.
Public pressure can play a role in persuading companies to make AI fairness a priority. It’s crucial that society demands fairness in AI and puts it on the agenda of regulators (which in turn will improve the chance that de-biased AI also makes it onto the priority list of conscientious companies). The EU’s AI Act (still in draft form) will help here, as it requires organizations to use fair training data and ensure that their AI algorithms don’t discriminate. The proposed fines of up to 6% of global annual turnover, even higher than GDPR’s 4%, have definitely helped put AI fairness on C-level executives’ radar, an important step towards more responsible use of AI.
The power of AI is that it can scale processes so effortlessly that it amplifies both the good and the bad far beyond human scale. A racist judge can make only so many decisions, each of which is one too many, of course. But imagine a racist AI judge, dishing out systematically bad decisions by the thousands per minute. It’s imperative that AI systems, both the algorithm and the data, are carefully audited, assessed and continually reassessed. A biased AI is a bad apple that can corrupt everything else in your basket.
About the Author
Alexandra Ebert is Chief Trust Officer at MOSTLY AI. Alexandra joined MOSTLY AI in 2019 and took on the role of Chief Trust Officer in 2020 to further strengthen the public trust in synthetic data. She engages with the privacy community, regulators, the media as well as customers. Alexandra regularly speaks at international conferences and is engaged in public policy issues in the emerging field of synthetic data and Ethical AI. In addition, she is the host of MOSTLY AI’s Data Democratization Podcast. Before joining the company, she researched the impact of GDPR on the deployment of AI in Europe and completed a Master’s degree in digital marketing.