Bias in AI Happens When We Optimize the Wrong Thing

29 Jul 2018 Gregory J. Stein

Finding examples of “problematic” AI is relatively easy these days. Microsoft inadvertently gave rise to an unhinged, neo-Nazi Twitter bot, while an AI beauty-contest judge turned out to strongly favor white women. Despite the sensational nature of these examples, they reflect a pervasive problem plaguing many modern AI systems.

Machine learning is designed to discover and exploit patterns in data so as to optimize some notion of performance. Most measures of good performance amount to maximizing accuracy, yet accuracy alone is an adequate objective only when near-perfect performance is achievable. When a task is difficult enough that the system is prone to errors, an AI agent may fail in ways that we, as humans, consider unfair, or it may take advantage of undesirable patterns in the data. Here, I discuss the issue of bias in AI and argue that great care must be taken when training a machine learning system so that it avoids systematic bias.

The notion of “perfect accuracy” is also simplistic in general. If an AI system is being used to screen job candidates, deciding how to define accuracy is already a value judgment.

In short, if you are a business professional looking to use some form of machine learning, you need to be aware of how bias can manifest itself in practice.

I define bias more broadly than many colloquial definitions so that it captures the more subtle effects that can appear in machine learning contexts: bias is when a machine learning system treats some distinguishable subgroup of the data in an undesirable way. Among the more well-known examples are machine learning tools that exhibit social biases present in the data used to train them. For instance, one popular machine learning technique designed to work with natural language infamously associated the word “woman” with gender-stereotypical vocations like “receptionist” and “homemaker”.
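To see how such associations surface, here is a minimal sketch of the kind of analogy query that exposed them. It uses the gensim library and assumes a pretrained word2vec-format embedding file is available locally; the file name below is a placeholder, not a specific download.

```python
# Sketch of an analogy query against pretrained word embeddings.
# Assumes a word2vec-format file is available; the path is a placeholder.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "pretrained-embeddings.bin", binary=True)

# "man is to computer_programmer as woman is to ???"
# Embeddings trained on web text have returned gender-stereotyped
# words like "homemaker" near the top of this list.
for word, similarity in vectors.most_similar(
        positive=["woman", "computer_programmer"], negative=["man"], topn=5):
    print(f"{word}: {similarity:.3f}")
```

Queries like this are how researchers first quantified the stereotypes that word embeddings absorb from their training text.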

For a more detailed perspective on how social biases manifest themselves in AI systems, I recommend reading this article from Kevin Petrasic and Benjamin Saul of the law firm White & Case.

Yet machine learning bias is not exclusively a byproduct of “prejudiced” data. For example, consider a training dataset with 1,000 images: 990 contain cats and the remaining 10 contain dogs. A typical machine learning task on such a dataset is classification: automatically determining whether a new image contains a cat or a dog. Most standard machine learning classifiers aim to maximize performance on the training dataset, yet problems arise because of the large imbalance between the number of cat and dog images fed to the algorithm. The machine learning algorithm can achieve 99% accuracy on the training dataset simply by predicting that every image contains a cat. Since the resulting system systematically misclassifies all images of dogs, we may reasonably say that it is unfair or biased.
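To make the arithmetic concrete, here is a small Python sketch, using made-up labels rather than real images, of how a classifier that always answers “cat” earns 99% accuracy overall while getting every dog wrong:

```python
import numpy as np

# Toy version of the imbalanced dataset described above: 990 cats, 10 dogs.
labels = np.array(["cat"] * 990 + ["dog"] * 10)

# A degenerate "classifier" that ignores its input and always answers "cat".
predictions = np.full(labels.shape, "cat")

overall_accuracy = np.mean(predictions == labels)
dog_accuracy = np.mean(predictions[labels == "dog"] == "dog")

print(f"overall accuracy: {overall_accuracy:.0%}")   # 99%
print(f"accuracy on dogs: {dog_accuracy:.0%}")       # 0%
```

An objective that only sees the first number has no reason to care about the second.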

Though the cat/dog classifier is rather innocuous, imagine instead a system that is supposed to locate people in an image as part of a self-driving car AI. Perhaps the training data contains more images of men than of women, so the system detects women less reliably, and suddenly the autonomous vehicle is more likely to collide with female pedestrians. Similarly, an automated resume-screening system may strongly prefer white men over otherwise equal candidates. These examples are almost trivially clear-cut: it is obvious that these systems are biased in a way we consider problematic. Unfortunately, discovering hidden biases can be difficult. This is especially true of machine learning systems whose decision making is opaque, so that the correlations they discover between different features of the data are almost impossible to probe.

Only by discouraging a machine learning system from exploiting a particular bias can we expect it to avoid doing so. Relatedly, most of the objectives used in modern machine learning take no account of human judgment, resulting in AI systems that fail to capture notions of human values like fairness. How a machine should weigh the relative importance of accuracy and fairness is an area of active research. Yet a general function capable of evaluating the fairness of a machine learning system across arbitrary application domains remains an elusive goal, and so algorithmic implementations of fairness are uncommon in off-the-shelf machine learning tools.
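One way to make that trade-off explicit, at least during evaluation, is to score a model on both its overall accuracy and the gap in accuracy between subgroups. The helper below is purely hypothetical (the name, arguments, and weighting scheme are mine, not a standard API), and real fairness criteria such as demographic parity or equalized odds are defined more carefully, but it illustrates how a single tunable weight encodes the value judgment:

```python
import numpy as np

def group_aware_score(y_true, y_pred, groups, fairness_weight=1.0):
    """Toy score that trades overall accuracy off against the largest
    accuracy gap between subgroups. Higher is better."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    correct = y_pred == y_true
    overall_accuracy = correct.mean()
    per_group_accuracy = [correct[groups == g].mean()
                          for g in np.unique(groups)]
    accuracy_gap = max(per_group_accuracy) - min(per_group_accuracy)
    # fairness_weight is the explicit value judgment: how much overall
    # accuracy are we willing to give up to shrink the gap between groups?
    return overall_accuracy - fairness_weight * accuracy_gap
```

Choosing fairness_weight is exactly the kind of value judgment that no off-the-shelf tool can make for you.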

Though machine learning tools are not yet “bias aware” by default, it is worth mentioning that many newer companies looking to provide machine-learning-based services are actively including mechanisms to fight bias.

With the ever-lengthening list of tasks that modern machine learning has been used to automate, it is often easy to overlook the potential problems such systems can introduce. Understanding how these systems can exacerbate problems in the datasets they are trained on is the first step towards addressing the issue. As machine learning tools are put into production, it is ultimately the responsibility of their designers to ensure that such systems do not exploit unintended bias.