Bias in AI Algorithms and Platforms Can Affect eLearning

Artificial intelligence, machine learning, and other algorithm-driven technologies have the potential to revolutionize eLearning through their ability to parse data, identify patterns, categorize information, and predict outcomes. Algorithms enhance the efficiency and personalization of many types of online training, performance support, and problem-solving tools. However, any machine learning or data-crunching algorithm is only as reliable as its coding and the data used to “train” it, and built-in bias in algorithms is becoming a significant concern in technologies used for tasks as disparate as making parole decisions, determining the terms of consumer loans, deciding which job applicants to interview, guiding students on college and career paths, or predicting which entry-level employees might succeed as managerial candidates.

Learning Solutions offers this overview of bias in algorithms as part of a deeper exploration of the potential of artificial intelligence to transform eLearning design. A companion article will examine proposals for evaluating algorithms or platforms to detect or mitigate bias.

Encoded bias

Some algorithms are built to be biased: In a paper on auditing algorithms, Christian Sandvig and three co-authors describe SABRE, an early algorithm for booking airline flights that American Airlines created in the 1950s. Executives were open about, and even proud of, their algorithm, with the company president declaring in testimony to Congress that “biasing SABRE’s search results to the advantage of his own company was in fact his primary aim.” Sandvig notes that today, what he calls “algorithm transparency” remains a pressing problem, and such bias can be much harder to detect. Algorithms might still be programmed to favor the products or services of their creator; Amazon and Google, among others, have been accused of this.

Some bias is unintentional. Algorithms might use a combination of demographic characteristics to determine what products, services, or opportunities to offer to individuals. While the characteristics used seem superficially unbiased, the result could be discriminatory. For example, consumers might be offered different investment opportunities or credit cards based on a combination of where they live, whether they rent or own their homes, and their income. Those characteristics can offer strong hints as to an individual’s race or ethnicity, and patterns or decisions based on them could result in minorities being disproportionately steered to less favorable products.
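
One common way to check for this kind of proxy effect is to test whether the supposedly “neutral” characteristics can predict the protected attribute on their own. The sketch below illustrates the idea; the file, column names, and data are hypothetical, and the approach is a generic audit technique rather than anything prescribed by a particular platform.

```python
# Hypothetical proxy check: can "neutral" features predict a protected
# attribute? If a simple model can, any algorithm built on those features
# can effectively discriminate on that attribute without ever seeing it.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Invented data file; zip codes read as strings so they can be one-hot encoded
applicants = pd.read_csv("applicants.csv", dtype={"zip_code": str})

# Seemingly neutral features used to decide which offers people see
X = pd.get_dummies(applicants[["zip_code", "owns_home", "income"]])
# Protected attribute the algorithm is not supposed to use
y = applicants["ethnicity"]

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("Proxy-check accuracy:    %.2f" % scores.mean())
print("Majority-class baseline: %.2f" % y.value_counts(normalize=True).max())
# Accuracy well above the baseline suggests the "neutral" features
# encode the protected attribute.
```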

In an eLearning context, algorithms might limit the course selection shown to some learners or exclude valuable content from a curated site, based on coded criteria that the L&D team is unaware of.

Data sets used to “train” machine learning algorithms

Machine learning is at the heart of much personalized and adaptive eLearning. Algorithm-powered, machine learning-based technologies are “trained” using a data set, and bias can taint the algorithm when that data set is not sufficiently diverse or representative.

Facial recognition algorithms, for instance, learn to recognize faces by being taught to identify faces from thousands or millions of images. Eventually, the algorithm “learns” what elements to look for in an image to decide whether the image is a human face, a dog, or a cat. The machine learning detects as its pattern that a human face has specific features (nose, eyes, lips) and that these features follow a pattern of shapes, sizes, etc. If the training data set is not diverse, the pattern will tend to match only a limited subset of the shapes and features that human faces include. In well-publicized examples, facial recognition programs have had trouble recognizing Asian and African or African American faces because the vast majority of the images in the training sets were of Caucasian faces.
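
Problems like these tend to surface only when results are evaluated group by group rather than as a single overall score. Here is a minimal sketch of that kind of disaggregated evaluation; the file and column names are invented for illustration.

```python
# Hypothetical disaggregated evaluation of a face-recognition model's output:
# compute accuracy separately for each demographic group in a labeled test set.
import pandas as pd

results = pd.read_csv("face_model_test_results.csv")  # invented test results
results["correct"] = results["predicted_label"] == results["true_label"]

# Overall accuracy can look acceptable while some groups fare much worse
print("Overall accuracy:", results["correct"].mean())
print(results.groupby("demographic_group")["correct"].mean())
```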

A similar pattern-matching algorithm could learn other discriminatory patterns. Imagine, for example, a career-counseling or coaching tool, or an algorithm intended to predict which new hires were the most promising candidates for management training. The training data set might logically use historical employment records, an accurate set of data representing employees who had become successful managers, to learn the pattern of what characteristics those managers shared. Based on that pattern, the algorithm could predict which applicants or new hires might succeed as managerial trainees. However, if the data set reflects the current and historical makeup of management in American corporations, the vast majority of examples of successful managers would be white men. Thus the tool would learn an unintended and extremely undesirable pattern: The best management candidates are white men. Similar biases have been found in algorithms that, for example, present men with ads for high-paying and executive-level positions far more frequently than they present those job ads to equally qualified women, or that guide women and minorities to different academic or career paths than white males with similar grades, course histories, and other characteristics.
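
To make the mechanism concrete, here is a minimal, hypothetical sketch of such a tool: a classifier trained on invented historical promotion records in which most past managers were men. Nothing here reflects a real product or data set; the point is simply that the model reproduces whatever skew the historical records contain.

```python
# Hypothetical sketch: a model trained on skewed historical promotion records
# reproduces that skew when scoring new hires. Data and columns are invented.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

history = pd.read_csv("employee_history.csv")  # invented historical records

features = ["tenure_years", "performance_score", "is_male"]
model = RandomForestClassifier(random_state=0)
model.fit(history[features], history["became_manager"])

# Two new hires who are identical except for gender
new_hires = pd.DataFrame([
    {"tenure_years": 2, "performance_score": 4.5, "is_male": 1},
    {"tenure_years": 2, "performance_score": 4.5, "is_male": 0},
])
# If the historical data skews male, the first score will often be higher.
print(model.predict_proba(new_hires)[:, 1])
```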

An additional area where bias in data used to teach algorithms could affect corporate eLearning is in the algorithms and technologies that use natural language recognition and processing. This technology, which powers machine translation, searches and interactions in digital assistants, and much more, uses types of AI that could be widely implemented in eLearning and performance support tools.

Natural language use in AI is built upon systems that “teach” programs to associate word pairs. These associations, called “embeddings,” are learned by the machine algorithms from “thousands of articles the algorithms had automatically scavenged and analyzed from online sources such as Google News and Wikipedia,” according to Ben Dickson of TechTalks. The machines scour this content to learn how people use language, with the goal of creating sentences, responses, and examples that will sound natural to users of the program.

But the articles reflect human and social bias. Articles about engineers and high-tech workers, for example, are more likely to include male names than female names, since these fields are male-dominated. But while a human might notice this imbalance and see a social problem, all the algorithm sees is a pattern, a pattern that it builds into its language processing rules. Thus, when a team of Boston University and Microsoft researchers studied Word2vec, a widely used algorithm, they found extreme gender bias in many of the associations.
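
These associations are easy to probe. The brief sketch below uses the gensim library’s downloadable copy of the pretrained Google News Word2vec vectors (a large download, roughly 1.6 GB); the exact analogies returned depend on that model, but occupational terms frequently show the kinds of gendered pairings the researchers reported.

```python
# Probing a pretrained Word2vec model for gendered word associations.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # pretrained Google News vectors

# "man is to computer_programmer as woman is to ...?"
print(wv.most_similar(positive=["woman", "computer_programmer"],
                      negative=["man"], topn=5))

# Direct similarity comparisons can also reveal skewed associations
print(wv.similarity("man", "engineer"), wv.similarity("woman", "engineer"))
```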

As some of these examples show, bias in algorithms or their results can stem from perfectly reasonable choices as to which parameters are used, unintentional selection of data sets that are insufficiently diverse, or use of data and tools that reflect historical and current realities, with all of their inequalities.

Whether bias is embedded in the basic code at the foundation of an AI- or algorithm-based technology or creeps in as a result of training data, it has the potential to become widespread. The increasing use of off-the-shelf code or code libraries disseminates biased code globally and to an enormous variety of applications.

“Given the convenience and reusability of libraries, over time many programmers use popular code libraries to do common tasks. If the code being reused has bias, this bias will be reproduced by whoever uses the code for a new project or research endeavor,” Joy Buolamwini, the founder of the Algorithmic Justice League, wrote in Medium. “Unfortunately, reused code at times reflects the lack of inclusion in the tech space in non-obvious but important ways.”

The bias in AI algorithms will also be present in any eLearning built upon these flawed technologies or platforms. Awareness is the first step toward avoiding bias; in a companion article, Learning Solutions will present suggestions for auditing algorithms and platforms to detect bias. And The eLearning Guild is presenting a Data & Analytics Summit August 22–23, 2018. Register now for the opportunity to learn more about the relationships between data and content.
