Auditing Machine Learning Algorithms: A White Paper for Public Auditors

by Jan Roar Beckstrom, Chief Data Scientist—The Innovation Lab, Office of the Auditor General of Norway

Public authorities and government entities have already started developing and implementing Artificial Intelligence (AI) and Machine Learning (ML) algorithms to improve public services and reduce costs.

While prospective gains are immense, this technology also presents new challenges and risks, such as data security, the possibility of automated and institutionalized unequal treatment, and mass production of incorrect or discriminatory decisions.

As AI becomes more prevalent, Supreme Audit Institutions (SAIs) will increasingly need to audit applications based on AI and ML algorithms, usually as special cases of performance or compliance audits. In addition, AI models tend to be embedded in broader Information Technology (IT) infrastructures, so these audits will often need to incorporate IT audit elements.

Currently, limited guidance exists for public auditors on how to audit AI and ML algorithms. To bridge this gap, the Office of the Auditor General of Norway—together with data science colleagues from the SAIs of Finland, Germany, the Netherlands and the United Kingdom—developed “Auditing Machine Learning Algorithms: A White Paper for Public Auditors.”

The paper, available online, summarizes key risks connected to using AI and ML in public services. Drawing on the authors' cumulative experience with AI audits and audits of other software development projects, the white paper also suggests an audit catalogue that includes methodological approaches for auditing AI applications.

This article briefly touches on some of the key points.

Project Management & Governance of AI Systems
Is highly specialized technical knowledge of AI models required to audit algorithms? Not necessarily.

Auditing an AI system’s development has much in common with any project management audit. If a government agency has introduced AI in a specific setting, a simple and very good question may be, “Is there a clear goal for what the system should achieve?” Further, if external consultants implemented the AI system, “Is there a sustainable structure to maintain the model once the consultants leave?”

To alleviate the need for specialized skills, it is essential the agency have ample documentation of model development and personnel in place who understand the model.

Data Considerations
Data quality is always important, but in AI modeling it is crucial. Put simply, biased data can lead to unintentionally flawed results.

An example: if the same data is used both to build the model (during the training phase) and to verify its performance (during testing or validation), the performance metrics will most likely be inflated. The inflated metrics hide “overfitting”: the model has fitted the particulars of its training data and loses performance when used on new, unseen production data.
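A minimal sketch of this effect, using only the Python standard library. The synthetic data, the 20% label noise, and the helper names (`make_data`, `predict_1nn`, `accuracy`) are illustrative assumptions and do not come from the white paper. A one-nearest-neighbour model memorizes its training set, so scoring it on that same data reports perfect accuracy, while a held-out split gives a more honest figure.

```python
import random

def make_data(n, rng, noise=0.2):
    """Synthetic points in the unit square; the true label is 1 when x + y > 1.
    A fraction `noise` of the labels is flipped to imitate imperfect real data."""
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = int(x + y > 1.0)
        if rng.random() < noise:
            label = 1 - label
        rows.append(((x, y), label))
    return rows

def predict_1nn(train, point):
    """Return the label of the training point closest to `point`."""
    nearest = min(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )
    return nearest[1]

def accuracy(train, rows):
    """Share of `rows` whose label the 1-NN model (built on `train`) predicts correctly."""
    hits = sum(predict_1nn(train, point) == label for point, label in rows)
    return hits / len(rows)

rng = random.Random(0)  # fixed seed so the run can be repeated
data = make_data(400, rng)
train, held_out = data[:300], data[300:]

# Scoring on the training data itself: every point is its own nearest neighbour.
train_accuracy = accuracy(train, train)
# Scoring on data the model has never seen.
held_out_accuracy = accuracy(train, held_out)

print(f"accuracy on training data: {train_accuracy:.2f}")
print(f"accuracy on held-out data: {held_out_accuracy:.2f}")
```

For an auditor, the practical question is whether the reported metrics were computed on data that was kept strictly separate from the data used to build the model.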

Another important data consideration relates to privacy and the use of personal data. The European Union instituted the General Data Protection Regulation (GDPR), which maintains data minimization (limiting the amount of personal information used to what is necessary to reach the relevant goal) as a central principle. In an AI setting, this equates to restricting the broad use of personal information when training or testing models. Though countries in other parts of the world will have varying regulations, minimizing the use of personal data to what is strictly essential is a good rule of thumb.
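As a rough illustration of data minimization before model training, the sketch below keeps only an explicit allow-list of fields. All records, field names, and the choice of which fields count as necessary are invented for this example. In a real audit the allow-list would follow from the documented purpose of the model.

```python
# Hypothetical case records; every field name and value here is invented.
records = [
    {"person_id": "ID-001", "name": "Person One", "postcode": "0001",
     "income": 410000, "months_unemployed": 3},
    {"person_id": "ID-002", "name": "Person Two", "postcode": "0002",
     "income": 285000, "months_unemployed": 11},
]

# Fields assumed to be necessary for the stated modelling goal.
REQUIRED_FIELDS = ("income", "months_unemployed")

def minimize(record, allowed=REQUIRED_FIELDS):
    """Keep only the allow-listed fields; everything else is dropped."""
    return {field: record[field] for field in allowed}

training_rows = [minimize(record) for record in records]
print(training_rows)
```

An allow-list is used here rather than a block-list so that any field added to the source data later is excluded by default until someone justifies its use.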

Model Development
Transparent, well-documented model development facilitates reproducibility, which an auditor with sufficient AI and ML knowledge can readily test through a documentation review.

Preferably, the documentation will include a well-structured and well-commented codebase (according to the coding language’s standards), extensive records of hardware and software used, and explanations as to how the model will be maintained once put into production.
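One way such records might look in practice is sketched below, using only the Python standard library. The function names (`fingerprint`, `run_record`, `shuffled_split`), the recorded fields, and the toy data are assumptions made for illustration. The white paper does not prescribe this format.

```python
import hashlib
import json
import platform
import random
import sys

def fingerprint(rows):
    """SHA-256 of the data in a canonical JSON form, to detect changes to the data."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def run_record(rows, seed):
    """Collect basic facts needed to repeat a training run."""
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "random_seed": seed,
        "data_sha256": fingerprint(rows),
    }

def shuffled_split(rows, seed, train_fraction=0.8):
    """Seeded shuffle, so the same seed always yields the same train/test split."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy data standing in for the agency's training data.
rows = [{"case": i, "value": (i * 3) % 7} for i in range(10)]

record = run_record(rows, seed=42)
print(json.dumps(record, indent=2))

# Repeating the split with the recorded seed reproduces it exactly.
first = shuffled_split(rows, seed=42)
second = shuffled_split(rows, seed=42)
print("split reproduced:", first == second)
```

With a record like this on file, an auditor can check that the data hash still matches and that rerunning the documented steps with the recorded seed gives the same split.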

It is equally important that the choice of AI or ML algorithm be clearly explained and justified, particularly if a hard-to-explain model is used. Training and testing the chosen model against other models can help auditors verify that the choice was sound.
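A simple form of such a comparison is to score the chosen model against a trivial baseline on the same held-out data. The sketch below uses a majority-class baseline. The labels and the chosen model's predictions are invented numbers. In an audit they would come from the agency's own data and model.

```python
from collections import Counter

def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Invented labels for illustration only.
train_labels = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
held_out_labels = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

# Invented predictions standing in for the output of the agency's chosen model.
chosen_model_predictions = [0, 1, 0, 0, 0, 0, 1, 1, 0, 0]

# Baseline: always predict the most common class seen in the training labels.
majority_class = Counter(train_labels).most_common(1)[0][0]
baseline_predictions = [majority_class] * len(held_out_labels)

chosen = accuracy(chosen_model_predictions, held_out_labels)
baseline = accuracy(baseline_predictions, held_out_labels)

print(f"chosen model accuracy: {chosen:.2f}")
print(f"majority baseline accuracy: {baseline:.2f}")
```

If a complex, hard-to-explain model only narrowly beats a baseline like this, an auditor may reasonably ask whether the added complexity is justified.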

Fairness and equal treatment remain at the forefront of model development, as algorithmic bias can potentially lead to institutionalized discrimination.

If the data used to build a model is even slightly biased, a carelessly developed model may amplify that bias. Group-based fairness requires ML models to treat different groups in a similar manner. Equity can be more complex. For example, if the data used to train an AI model contains group-level demographic disparities, the model will learn these disparities, which can result in misleading predictions.
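One common way to quantify group-based fairness is to compare the share of positive decisions each group receives, often called the demographic parity (or statistical parity) difference. The sketch below is one illustrative metric among several and is not necessarily the one used in the white paper. The groups "A" and "B" and all decisions are invented.

```python
from collections import defaultdict

def selection_rates(decisions):
    """Share of positive decisions per group.
    `decisions` is a list of (group, decision) pairs, with decision 0 or 1."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in decisions:
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}

def parity_gap(rates):
    """Largest difference in selection rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Invented model outputs for two hypothetical groups.
decisions = (
    [("A", 1)] * 6 + [("A", 0)] * 4
    + [("B", 1)] * 3 + [("B", 0)] * 7
)

rates = selection_rates(decisions)
gap = parity_gap(rates)

print(rates)
print(f"parity gap: {gap:.2f}")
```

A gap of zero means every group receives positive decisions at the same rate. A large gap does not prove discrimination on its own, but it flags where an auditor should ask for an explanation.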

Constructing an AI model on biased data can lead to distorted results, which, in turn, become the basis for automated decisions that may produce even more prejudiced conclusions.

Using AI and ML in the public sector can provide enormous rewards. At the same time, there is real danger that failed deployment can damage democracy and the social fabric by potentially promoting discrimination and unequal treatment on a vast scale.

As AI and ML deployment intensifies, it will be imperative for public auditors to address the challenges posed by this progressively invasive technology.

“Auditing Machine Learning Algorithms: A White Paper for Public Auditors” aims to assist SAIs in learning more about auditing AI and ML algorithms and to help auditors become better equipped to face these challenges.
