Garbage In, Garbage Out

In the past five years or so, our devices have gotten really smart. We all have phones in our pockets capable of responding to our commands (at least in theory), and a wide selection of tubes we can put into our homes to translate our spoken words into action. The intelligence these devices display is a product of machine learning, a collection of technologies enabling computers to recognize patterns and create models based on them.

The way these technologies are implemented is complicated, but in essence they work by training a program on a data set. The program iterates through models, moving from ones that recognize patterns in the data set poorly to ones that match them more closely. This technique has enabled a lot of cool technology, from improvements to Google Translate to Apple’s new facial recognition.
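To make the idea of "iterating through models" a little more concrete, here is a minimal sketch in Python. It is not how any particular product is built; the data points, the straight-line model, and the function names are all invented for illustration. The program starts with a poor guess and nudges it, step by step, toward a line that fits the data better.

```python
# Toy data set, invented for illustration: points that roughly follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]


def mean_squared_error(w, b):
    """Measure how badly the line y = w*x + b matches the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=2000, learning_rate=0.01):
    """Start from a poor model (w=0, b=0) and repeatedly adjust it to reduce the error."""
    w, b = 0.0, 0.0
    history = [mean_squared_error(w, b)]
    for _ in range(steps):
        # Gradients of the error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
        # Move each parameter a small step in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mean_squared_error(w, b))
    return w, b, history


w, b, history = train()
print(f"learned line: y = {w:.2f}x + {b:.2f}")
print(f"error before training: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Notice that the program only ever sees the data it is given. If those five points were wrong, it would fit the wrong line just as confidently.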

The problem with machine learning, however, is that it is subject to a classic problem in computer science: garbage in means garbage out. In other words, if the data the program is learning from is incorrect or biased, its conclusions will be too. What makes this issue especially troubling is that we often do not know why the program makes the decisions it does; we only know it treats them as correct. On top of this, users of these programs tend to accept their conclusions uncritically, leaving faulty programs with little check on their decisions.

While in some cases this is merely inconvenient, like when your phone stubbornly refuses to understand what you mean, it can be catastrophic when machine learning is applied to more important problems. In a disturbing finding, ProPublica reported that an algorithm Florida used to predict defendants’ potential for recidivism was biased against black people. Specifically, they determined “blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend.” Critically, though, the algorithm itself did not know the racial background of those it was evaluating. Instead, it developed inaccurate heuristics from existing data, which led to biased and inaccurate conclusions about the new data it was presented with.
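The kind of disparity described above can be measured with a simple calculation: among people who did not re-offend, what fraction in each group was labeled high risk anyway? The sketch below shows that calculation in Python. The records are entirely made up for illustration and are not ProPublica’s data; the group names "A" and "B" and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only:
# (group, labeled_high_risk, actually_reoffended)
records = [
    ("A", True, False), ("A", True, False), ("A", False, False),
    ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", False, False), ("B", True, True),
]


def false_positive_rates(records):
    """For each group, the share of non-re-offenders who were labeled high risk."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, labeled_high_risk, reoffended in records:
        if reoffended:
            continue  # only people who did NOT re-offend can be false positives
        total[group] += 1
        if labeled_high_risk:
            flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


rates = false_positive_rates(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: {rate:.0%} of non-re-offenders labeled high risk")
```

In this invented example, group A is wrongly flagged at twice the rate of group B, even though nothing in the calculation requires the algorithm to have been told which group anyone belongs to.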

This is a problem some in the tech industry are aware of. John Giannandrea, the head of Google’s AI division, recently identified it as a much bigger problem than malicious superintelligence. However, this internal worry too often does not translate into any specific action. Tech companies and regulators are indifferent to the risks these biases pose, and the public is largely unaware the problem exists.

This is a critical oversight, especially as these technologies work their way into every aspect of our lives. In the future, mortgage companies could use machine learning to analyze the creditworthiness of potential applicants, a practice with the potential to further entrench existing bias. There is also substantial interest in using machine learning to inform hiring decisions, analyzing applications based on unconventional indicators of good job performance. While this does have the potential to reduce rates of unconscious bias, the programs already in use have not been adequately scrutinized and could very well cause as many problems as they solve.

While machine learning does have the potential to make smarter technology that is better able to respond to human needs, the problems it poses make public scrutiny and regulation essential. Machine intelligence does not arise from the ether. We create it. We must make sure we do so responsibly and collaboratively, and that we pass on as few of our biases as possible. If we are to create a future where technology makes our lives better, we must first make sure it does not replicate the inequalities that plague us today.

Published by Walter Hanley