4 August 2020
Salsa Digital

Machine learning to anonymise court records

In many countries around the world, court documents must be anonymised and then made publicly available. This makes the justice system more transparent and helps contribute to a more open government (justice is one of the policy areas in the Open Government Partnership). However, anonymising court documents takes a lot of time and personnel. This is where machine learning comes in.

Austria’s machine learning approach

Austria decided to use machine learning to anonymise its court documents. In an article for Apolitical, Martin Hackl, Chief Digital Officer of the Austrian Federal Ministry of Justice, explained the process. He identified three key steps:

  1. Collecting existing data and analysing the data’s quality
  2. Choosing a machine learning model
  3. Evaluating and iterating

The data

In Austria’s case, the data came from 66,000 Supreme Court verdicts. For each verdict, the team had both the original version and the anonymised version, which had been manually redacted with a black marker. They also had a database of digital verdicts from other courts, and a database of information about the parties involved in the proceedings.

These three databases were used to create and train the algorithm.

The machine learning model

The next step Hackl highlights is choosing a machine learning model. In Austria’s case, they started with open source natural language processing (NLP) libraries and built on them. The final solution uses:

  • Out-of-the-box NLP
  • Customised NLP
  • Machine learning algorithms
  • Fuzzy search
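To give a feel for what the fuzzy search component might do, here is a minimal sketch in Python. It is not Austria’s implementation: the function name, the 0.8 similarity threshold and the mask text are all assumptions made for illustration. The idea is that a database of party names can be matched against a verdict even when a name is misspelt or written without its umlaut.

```python
import re
from difflib import SequenceMatcher


def redact_known_names(text, party_names, threshold=0.8, mask="[REDACTED]"):
    """Mask every word that closely resembles a known party name.

    Hypothetical helper: the threshold and mask are illustrative choices,
    not values taken from the Austrian system.
    """
    lowered_names = [name.lower() for name in party_names]

    def replace(match):
        word = match.group(0)
        for name in lowered_names:
            # ratio() is 1.0 for identical strings and falls towards 0.0
            # as the two strings share fewer characters in order.
            if SequenceMatcher(None, word.lower(), name).ratio() >= threshold:
                return mask
        return word

    # \w+ matches runs of word characters, including letters such as "ü".
    return re.sub(r"\w+", replace, text)


sample = "Herr Muller and Frau Huber met."
print(redact_known_names(sample, ["Müller", "Huber"]))
# prints: Herr [REDACTED] and Frau [REDACTED] met.
```

Lowering the threshold catches more spelling variants but also masks more innocent words, which is one way a system like this can end up anonymising too much.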

Testing and improving

Austria has now started refining the machine learning process. First, the team ran short sprints in which the system anonymised sample court decisions, and the results were then manually checked. This fed into a round of improvements. Currently, the system is anonymising too much, and the team is working on the next round of improvements before going live. They’re also looking at classifying complexity into three levels (green, yellow and red), where red represents the court proceedings most likely to need a human review.
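One common way to put a number on “anonymising too much” is to compare what the system redacted with what a human reviewer says should have been redacted. The sketch below assumes (our assumption, not something described in the article) that both sets of redactions are available as character offsets, and it only counts exact matches, which is a simplification.

```python
def redaction_scores(system_spans, reviewer_spans):
    """Return (precision, recall) for a set of system redactions.

    Both arguments are collections of (start, end) character offsets.
    Hypothetical helper for illustration; only exact span matches count.
    """
    system_spans = set(system_spans)
    reviewer_spans = set(reviewer_spans)
    correct = system_spans & reviewer_spans

    # Precision: share of the system's redactions that were actually needed.
    precision = len(correct) / len(system_spans) if system_spans else 1.0
    # Recall: share of the needed redactions that the system found.
    recall = len(correct) / len(reviewer_spans) if reviewer_spans else 1.0
    return precision, recall


system = {(5, 11), (21, 26), (40, 46)}   # what the system masked
reviewer = {(5, 11), (21, 26)}           # what the reviewer says was needed
precision, recall = redaction_scores(system, reviewer)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.67 recall=1.00
```

High recall with low precision, as in this example, is the pattern you would expect from a system that misses nothing but masks more than it should.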

Salsa Digital’s take

Machine learning in government has the potential to significantly impact both GovTech and CivicTech. In this Austrian example, it’s being used to help drive transparency and a more open government. These are areas of great importance in the Australian setting, and areas that we’re passionate about at Salsa. As advocates for open source in government, we were also very encouraged to find out that Austria based its solution on open source NLP. This case study shows how the convergence of emerging tech (AI and machine learning) and open source can contribute to an open government, create value for citizens and help free up public sector resources. As far as we know, this particular use case for machine learning is not yet being investigated or used in Australia. However, it would be a good area to target in the future.
