Considering how powerful AI systems are and the roles they increasingly play in helping make high-stakes decisions about our lives, homes and societies, they receive surprisingly little formal scrutiny.
That is starting to change, thanks to the burgeoning field of AI audits. When they work well, these audits allow us to reliably verify how well a system is working and figure out how to mitigate any potential bias or harm.
Famously, a 2018 audit of commercial facial recognition systems by AI researchers Joy Buolamwini and Timnit Gebru found that the systems did not recognize darker-skinned people as reliably as lighter-skinned people. For darker-skinned women, the error rate was up to 34%. As Abeba Birhane, an AI researcher, points out in a new essay in Nature, the audit “fueled a critical body of work that has exposed bias, discrimination, and the oppressive nature of facial analysis algorithms.” The hope is that by conducting these kinds of audits on different AI systems, we can better root out problems and have a larger conversation about how AI systems are affecting our lives.
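To make concrete the kind of measurement such an audit reports, here is a minimal, illustrative sketch of comparing a model’s error rates across demographic groups. It is not the methodology of the 2018 audit itself, and the predictions, labels, and group assignments are hypothetical.

```python
# Illustrative sketch: per-group error rates, the kind of disparity a bias
# audit of a classifier might report. All data below is hypothetical.
from collections import defaultdict

def error_rates_by_group(y_true, y_pred, groups):
    """Return the misclassification rate for each demographic group."""
    totals, errors = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Hypothetical audit sample: ground-truth labels, the model's predictions,
# and which demographic group each example belongs to.
y_true = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rates = error_rates_by_group(y_true, y_pred, groups)
gap = max(rates.values()) - min(rates.values())
print(rates)              # {'A': 0.2, 'B': 0.6}
print(f"gap: {gap:.2f}")  # 0.40 — disparity between best- and worst-served groups
```

A real audit would use far larger samples and more refined metrics (false positive and false negative rates, intersectional groups), but the core idea is the same: measure how performance differs across the people the system is used on.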
Regulators are catching up, and that is partly driving demand for audits. A new law in New York City will require all AI-powered hiring tools to be audited for bias starting in January 2024. In the European Union, big tech companies will have to conduct annual audits of their AI systems from 2024, and the upcoming AI Act will require audits of “high risk” AI systems.
It’s a grand ambition, but there are some huge hurdles. There is no common understanding of what an AI audit should look like, and there are not enough people with the right skills to conduct them. The few audits that do happen today are mostly ad hoc and vary widely in quality, Alex Engler, who studies AI governance at the Brookings Institution, told me. One example he gave is the AI hiring firm HireVue, which implied in a press release that an external audit had found its algorithms to be free of bias. That turned out to be meaningless: the audit had not actually examined the company’s models and was bound by a confidentiality agreement, which meant there was no way to verify what it found. It was essentially nothing more than a public relations stunt.
One way the AI community is trying to address the shortage of auditors is through bias bounty competitions, which work similarly to cybersecurity bug bounties: they call on people to create tools to identify and mitigate algorithmic biases in AI models. One such competition launched just last week, organized by a group of volunteers including Twitter’s AI ethics lead, Rumman Chowdhury. The team behind it hopes it will be the first of many.
Competitions like this create incentives for people to learn the skills needed to conduct audits, and they also help establish standards for what audits should look like by showing which methods work best. You can read more about this here.
The growth of these audits suggests that we could one day see cigarette-package-style warnings that AI systems could harm your health and safety. Other sectors, such as chemicals and food, have regular audits to ensure their products are safe to use. Could something like this become the norm in AI?