The problem
Bias is difficult to detect when developing an AI system because the software you write is not the AI. You write software and feed it a large set of training data that is tagged with values, and the software produces an AI that can attribute those values to new data. For example, if you give it a set of pictures in which someone has tagged the interesting ones as A and the boring ones as B, it will produce an AI that can look at new pictures and tag them as A or B.
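To make the split between "the software you write" and "the AI" concrete, here is a minimal sketch in Python. It assumes each picture has already been reduced to two made-up numbers (colour variety and share of curved lines), and it uses a nearest-centroid classifier, which is far simpler than real systems. The function names train and predict and all the values are hypothetical, chosen only for illustration.

```python
def train(examples):
    """The software you write: turns tagged examples into a model.

    `examples` is a list of (features, tag) pairs. The returned model is
    simply the average feature vector (centroid) for each tag.
    """
    sums, counts = {}, {}
    for features, tag in examples:
        if tag not in sums:
            sums[tag] = [0.0] * len(features)
            counts[tag] = 0
        sums[tag] = [s + f for s, f in zip(sums[tag], features)]
        counts[tag] += 1
    return {tag: [s / counts[tag] for s in sums[tag]] for tag in sums}


def predict(model, features):
    """The generated AI in use: pick the tag whose centroid is closest."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda tag: squared_distance(model[tag]))


# Made-up training data: (colour variety, share of curved lines), tag.
training_data = [
    ([0.9, 0.8], "A"),  # tagged "interesting"
    ([0.8, 0.9], "A"),
    ([0.1, 0.2], "B"),  # tagged "boring"
    ([0.2, 0.1], "B"),
]

model = train(training_data)       # the numbers in `model` are the "AI"
print(predict(model, [0.7, 0.6]))  # prints A
print(predict(model, [0.3, 0.2]))  # prints B
```

The point of the sketch is that nothing in train or predict mentions what makes a picture interesting. That knowledge lives only in the numbers derived from the tagged data, so reading the source code alone does not reveal what the model has learned.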
The problem is that the AI is not created in a form that lets humans understand which rules or patterns it uses to connect the data to the values. A human would write a handful of rules such as "curved lines" or "variety of colours", but the generated AI can have millions of tiny rules. Reviewing those rules to find bias is therefore often impractical, so bias is usually detected by looking at the output, and sometimes it only becomes visible after a lot of use.
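Detecting bias by looking at the output can be sketched as a simple audit: record the decisions the system makes, group them, and compare how often each group receives the favourable outcome. The snippet below is a minimal illustration, assuming a log of (group, accepted) pairs. The group names, the numbers and the helper names selection_rates and rate_ratio are all invented for this example, and a real audit would need more careful statistics.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the share of favourable decisions for each group.

    `decisions` is an iterable of (group, accepted) pairs, where
    `accepted` is True when the system gave the favourable outcome.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, accepted in decisions:
        totals[group] += 1
        if accepted:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means equal)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was accepted, so no group was favoured
    return min(rates.values()) / highest


# Made-up audit log: 100 decisions per group.
audit_log = (
    [("group_a", True)] * 40 + [("group_a", False)] * 60
    + [("group_b", True)] * 20 + [("group_b", False)] * 80
)

rates = selection_rates(audit_log)
print(rates)              # group_a is accepted twice as often as group_b
print(rate_ratio(rates))  # 0.5
```

A check like this needs no access to the model's internal rules, which is why it is usually the practical route. It also shows why bias may only surface after a lot of use: the comparison is only meaningful once enough decisions have been logged for each group.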
The bias could originate in the software or in the data, and sometimes it comes from the data but has to be fixed in the software. The data might accurately show that visibly pregnant women are more likely to take maternity leave than women who are not visibly pregnant, or than men. There is no error in the data, but as a society we have decided that this criterion cannot be used in, for example, hiring decisions. That is something that usually has to be addressed in the software, but it still may not be easy. It is almost impossible to give an AI an instruction such as "ignore pregnancy status", because the AI is not applying a simple rule about body shape but rather millions of tiny rules, and it is hard to decide which of these rules are directly, partially or indirectly linked to pregnancy.
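One way to see why an instruction such as "ignore pregnancy status" is hard to enforce is that other inputs can stand in for the forbidden one. The sketch below is a rough illustration, not an established fairness metric: it measures how well a protected attribute can be guessed from a single remaining feature. The data, the column names and the helper proxy_strength are all hypothetical.

```python
def proxy_strength(rows, protected, feature):
    """Fraction of rows whose protected value can be guessed from `feature`.

    For each value of `feature`, the guess is the most common protected
    value seen with it. A score near the majority-class share means the
    feature reveals little; a score near 1.0 means it is a strong proxy.
    """
    counts = {}
    for row in rows:
        by_value = counts.setdefault(row[feature], {})
        by_value[row[protected]] = by_value.get(row[protected], 0) + 1
    correct = sum(max(by_value.values()) for by_value in counts.values())
    return correct / len(rows)


# Made-up records. "protected" is the attribute that must not be used.
rows = [
    {"protected": True, "frequent_appointments": True, "shift": "day"},
    {"protected": True, "frequent_appointments": True, "shift": "night"},
    {"protected": True, "frequent_appointments": True, "shift": "day"},
    {"protected": True, "frequent_appointments": False, "shift": "night"},
    {"protected": False, "frequent_appointments": False, "shift": "day"},
    {"protected": False, "frequent_appointments": False, "shift": "night"},
    {"protected": False, "frequent_appointments": False, "shift": "day"},
    {"protected": False, "frequent_appointments": False, "shift": "night"},
]

print(proxy_strength(rows, "protected", "frequent_appointments"))  # 0.875
print(proxy_strength(rows, "protected", "shift"))                  # 0.5
```

In this toy data, deleting the "protected" column would not remove the signal, because "frequent_appointments" still recovers it for seven rows out of eight. A generated AI with millions of tiny rules can combine many weaker proxies in the same way, which is why the fix usually takes deliberate work in the software rather than a single instruction.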
How access to source code helps
None of this means that access to the source code is pointless. It is crucial for four reasons:
- Bias can be due to something in the software generating the AI
- The software may need explicit counter-bias to be programmed in
- Access to change the source code means the service provider can address the issue when bias is detected
- Public access to the source code allows an ecosystem of experts to emerge, beyond those who work for companies selling AI
Extra care for public administrations
Public administrations may be held to an even higher standard than the private sector, for two reasons:
- Public administrations have a monopoly on providing certain services
- In some cases, the citizens have an obligation to use the service
Trivia
Free software, as a defined concept, was born in an AI lab, before the additional term "open source" was even invented. Richard Stallman was working at the MIT AI lab before he resigned to work on the GNU Project. (More at Wikipedia's History of free and open-source software.)
Further reading
The above is a simple explanation of the main issues, but there are also ongoing discussions about open data, verifiability, the copyright of the generated AI, the copyright of data that an AI generates, liability for actions taken based on AI, and many more topics.
The following links may be of interest:
- Open source data science: How to reduce bias in AI (WeForum, 14 Oct 2022)
- Defining an open source AI for the greater good (opensource.com, 10 October 2022)
- Solving for AI’s black box problem (opensource.com, 12 July 2022)
- Unjust Algorithms (FSF, 6 July 2022)
- Publication of the FSF-funded white papers on questions around Copilot, scroll down for links to the five papers (FSF, 24 Feb 2022)
- Five ways that Open Source Software shapes AI policy (Brookings, 18 Aug 2021)