Two prominent research-image collections—including one supported by Microsoft and Facebook—display a predictable gender bias in their depiction of activities such as cooking and sports. Images of shopping and washing are linked to women, for example, while coaching and shooting are tied to men.
Machine-learning software trained on the datasets didn’t just mirror those biases; it amplified them. If a photo set generally associated women with cooking, software trained by studying those photos and their labels created an even stronger association.
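One way to make “amplified” concrete is to compare two ratios for an activity: the share of training labels that pair it with women, and the share of the model’s predictions that do. The sketch below is an illustrative assumption about how such a measurement could work, not necessarily the researchers’ exact metric. The function names and the counts are hypothetical and invented for the example; a positive difference means the predictions lean further toward women than the data did.

```python
from collections import Counter


def bias_toward_women(pairs, activity):
    """Share of (activity, gender) pairs for `activity` that are labeled 'woman'."""
    counts = Counter(g for a, g in pairs if a == activity)
    total = counts["woman"] + counts["man"]
    return counts["woman"] / total if total else 0.0


def amplification(train_pairs, predicted_pairs, activity):
    """Positive when predictions skew further toward women than the training data."""
    return bias_toward_women(predicted_pairs, activity) - bias_toward_women(
        train_pairs, activity
    )


# Hypothetical counts, invented purely for illustration (not from any real dataset).
train = [("cooking", "woman")] * 60 + [("cooking", "man")] * 40
predicted = [("cooking", "woman")] * 70 + [("cooking", "man")] * 30

print(round(bias_toward_women(train, "cooking"), 2))          # 0.6
print(round(bias_toward_women(predicted, "cooking"), 2))      # 0.7
print(round(amplification(train, predicted, "cooking"), 2))   # 0.1
```

In this made-up case the data already links cooking with women 60 percent of the time, and the model’s predictions push that to 70 percent: the model mirrors the skew and then adds to it.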
Mark Yatskar, a researcher at the Allen Institute for Artificial Intelligence, says that this phenomenon could also amplify other biases in data, such as those related to race. “This could work to not only reinforce existing social biases but actually make them worse,” says Yatskar, who worked with Ordóñez and others on the project while at the University of Washington.
“A system that takes action that can be clearly attributed to gender bias cannot effectively function with people,” he says.
When image-recognition software is “trained” by examining these datasets, the bias is amplified. A system trained on the COCO dataset associated men with keyboards and computer mice even more strongly than the dataset itself did.
The researchers devised a way to neutralize this amplification phenomenon—effectively forcing learning software to reflect its training data. But it requires a researcher to be looking for bias in the first place, and to specify what he or she wants to correct. And the corrected software still reflects the gender biases baked into the original data.
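The two caveats, that someone must decide what to correct and that the corrected output still tracks the original data, show up even in a much simpler fix than the researchers’ own. The sketch below swaps in a basic quota-based relabeling, which is not their technique. It assumes hypothetical per-image scores (the model’s probability that the person pictured is a woman) for images the model tagged with one activity. It also assumes a target share that a person must supply, taken from the training data. The function name `rebalance` is hypothetical.

```python
def rebalance(scores, target_share):
    """Relabel predictions so the share labeled 'woman' matches target_share.

    scores: hypothetical per-image probabilities that the person is a woman,
            for images the model tagged with a single activity (e.g. cooking).
    target_share: share of women for that activity in the training data,
            which a researcher has to choose to look up and supply.
    """
    # Number of images allowed to carry the 'woman' label (nearest integer).
    quota = round(target_share * len(scores))
    # Rank images by score so the model's relative confidence is preserved;
    # only the cut-off between 'woman' and 'man' moves.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = ["man"] * len(scores)
    for i in order[:quota]:
        labels[i] = "woman"
    return labels


scores = [0.9, 0.8, 0.7, 0.65, 0.55, 0.3, 0.2, 0.6, 0.58, 0.1]
plain = ["woman" if s > 0.5 else "man" for s in scores]
fixed = rebalance(scores, 0.6)
print(plain.count("woman"), fixed.count("woman"))  # 7 6
```

A plain 0.5 threshold labels 7 of the 10 images as women, overshooting a 60 percent training share. The quota pulls the count back to 6. But the correction only happens because a target was specified, and that target is the training data’s own 60 percent skew, so the output still carries the original bias.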
One point of agreement in the field is that using machine learning to solve problems is more complicated than many people previously thought.
“Work like this is correcting the illusion that algorithms can be blindly applied to solve problems,” says Suresh Venkatasubramanian, a professor at the University of Utah.