Developing Industrially-Applicable AI-based Solutions for GDPR Compliance
A Conversation with Sallam Abualhaija, Research Scientist at the University of Luxembourg
You may have recently received an e-mail from ViLawPortal stating that your personal information and application documents will be deleted annually in November, forcing you to create a new account every year. This protective measure was implemented to comply with the EU General Data Protection Regulation (GDPR), and ViLawPortal is not the only platform affected. Since 2018, practically any business that processes the personal data of EU citizens, whether at home or abroad, has been forced to adapt its policies to comply with the legislation. Given the scope of the regulation and the significant penalties that it entails, GDPR compliance is a universal requirement for businesses across the world. However, drafting and implementing GDPR-compliant documents is a tedious process that can prove difficult in practice.
In August, I started working on a project with researchers at the University of Luxembourg performing legal annotation on Data Processing Agreements (DPAs). DPAs are legally binding documents between organizations and third-party data processors that regulate the scope and purpose of the data processing and the obligations of all parties involved. On behalf of the Ottawa Legal Innovation Hub (OLIH), I recently sat down with Sallam to gain insight into how the project works and her thoughts about AI in the legal sphere. Sallam Abualhaija is a research scientist at the University of Luxembourg working on the ARTAGO project with Orlando Amaral and Muhammad Ilyas Azeem, in collaboration with Mehrdad Sabetzadeh and Lionel Briand from the University of Ottawa. The project develops industrially applicable, AI-based solutions for the computer-assisted assessment of GDPR compliance.
Could you give an overview of the project and the development process?
Linklaters, a global law firm with a large base in Luxembourg, provides the researchers with the legal documents which are first manually analyzed for compliance. These analyzed documents then serve as training examples for the development of the AI-based automation tool.
This AI-based automation differs from basic methods such as keyword search, where scanning for a fixed set of keywords limits the scope of the technology: too many keywords will not generalize well, while too few will not produce enough accurate results. Natural language processing (a subfield of AI) enables the textual content of a legal document to be represented numerically, in the form of mathematical vectors that capture the semantics and syntax of the text. Machine learning (another important subfield of AI) can then be used to train classifiers over numerous manually annotated examples. These classifiers can predict unseen instances of GDPR-compliant expressions, even in the absence of identifiable keywords. This means the algorithm can accurately assess new documents that contain erratic wording or unusual structure, and do so much faster than a human.
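To make the vectorization idea concrete, here is a minimal sketch of how text can become a numeric vector and feed a trivial classifier. This is purely illustrative: the clauses, labels, and nearest-neighbour rule below are invented for this example, and the actual project uses far richer semantic representations than simple word counts.

```python
from collections import Counter

def vectorize(text, vocabulary):
    """Represent a sentence as a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Invented toy examples: clauses labeled by a hypothetical human annotator.
training = [
    ("the processor shall notify the controller of any breach", "compliant"),
    ("data may be retained indefinitely without review", "non-compliant"),
]

vocabulary = sorted({w for text, _ in training for w in text.lower().split()})

def classify(text):
    """A trivial 'classifier': return the label of the nearest training vector."""
    v = vectorize(text, vocabulary)
    def distance(u):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    best = min(training, key=lambda pair: distance(vectorize(pair[0], vocabulary)))
    return best[1]
```

Even this toy version shows the key property Sallam describes: a new clause such as "the processor shall notify the controller immediately" is classified by its position in vector space relative to the training examples, not by matching an exact keyword list.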
What are the constraints of this kind of AI? What are the risks of the development of biases or errors?
The risk of an AI developing biases depends on the subject matter and would be more relevant where human data is involved. The current project pertains to documents primarily drafted between financial institutions, which do not contain as much personal information as other privacy agreements.
However, disagreements between human annotators are to be expected. To ensure the accuracy of the AI-based tool, the annotators typically discuss their annotations afterwards to resolve such disagreements. Any fundamental disagreements that remain can be removed from the dataset, and researchers can compute the level of inter-annotator agreement to confirm it is acceptable before training the algorithm.
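Inter-annotator agreement of this kind is commonly measured with a chance-corrected statistic such as Cohen's kappa. A minimal sketch follows; the two annotation lists are invented for illustration and are not from the project's dataset.

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Proportion of items on which the annotators actually agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Two hypothetical annotators labeling the same eight clauses.
annotator_a = ["compliant", "compliant", "non-compliant", "compliant",
               "non-compliant", "compliant", "compliant", "non-compliant"]
annotator_b = ["compliant", "compliant", "non-compliant", "non-compliant",
               "non-compliant", "compliant", "compliant", "non-compliant"]
```

Here the annotators agree on seven of eight clauses, but kappa discounts the agreement that their label frequencies would produce by chance alone, giving a more honest measure of reliability.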
While unable to guarantee 100% accuracy, the technology will be subject to rigorous analysis by researchers. Because the only two possible outcomes are “compliant” and “non-compliant”, errors can easily be quantified and the success rate of the technology can be objectively measured. In Sallam’s opinion, the possibility of error is marginal in comparison to the practical benefits of the technology.
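Because every prediction is one of two labels, the tool's error rate can be summarized with standard classification metrics. A minimal sketch, using invented predictions and expert labels (treating "non-compliant" as the class of interest, an assumption made for this example):

```python
def evaluate(predictions, gold):
    """Compare binary tool predictions against expert ('gold') labels."""
    # True positives: tool and expert both flag the clause as non-compliant.
    tp = sum(p == g == "non-compliant" for p, g in zip(predictions, gold))
    # False positives: tool flags a clause the expert considers compliant.
    fp = sum(p == "non-compliant" and g == "compliant" for p, g in zip(predictions, gold))
    # False negatives: tool misses a clause the expert flagged.
    fn = sum(p == "compliant" and g == "non-compliant" for p, g in zip(predictions, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# Invented example: four clauses, one missed non-compliance.
gold = ["compliant", "non-compliant", "non-compliant", "compliant"]
predictions = ["compliant", "non-compliant", "compliant", "compliant"]
metrics = evaluate(predictions, gold)
```

This is exactly the kind of quantification the interview alludes to: with a labeled test set, researchers can report precisely how often the tool misses a problematic clause versus how often it raises a false alarm.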
How will the technology be applied in practice?
The AI will be used to expedite the repetitive and monotonous review process of legal documents. Unlike a hyper-efficient black box technology that lacks explainability, the entire decision-making process of this tool will be visible to the user. The classification of every line of the document will be justified and will explicitly describe which legal requirement was not satisfied.
However, Sallam is adamant that the AI will serve only to assist legal experts, not to replace them. Because humans perform manual compliance checking at a much slower rate than an AI-based tool, automated solutions reduce the human's role in the process to reviewing the automatically generated results.
What are your final thoughts on the future of artificial intelligence in the legal sector?
Sallam strongly believes that automated solutions will be ubiquitous in the future of the legal sector. While AI is already in place in many industries, Sallam envisions a future where it is integrated into every laborious task. In 10 to 20 years, today's unfamiliarity with these tools will look much like the previous generation's unfamiliarity with contemporary technology such as the smartphone. While the future lawyer will not need to specialize in computer science, they will have to know how to use technology to succeed.
Finally, Sallam identifies a gap in understanding of artificial intelligence in the legal sphere. While it is capable of many things, it is important to be cognizant of the limitations of such technology. The notion of AI may conjure up ideas of a hyper-intelligent robot that can replicate, and perhaps replace, the work of a human. However, AI-assisted tools are meant to help us, and should never replace the rational knowledge of a human being. While a machine can identify patterns in datasets far faster and more efficiently, individuals have the unique ability to make sense of data. AI should never assume important decision-making roles, for example as judges in courtrooms. When an AI encounters something beyond the constraints of its datasets, it lacks the cognitive ability to make sense of the new situation. Rational thought still prevails over artificial intelligence; this is why we still have AI analyzing legislation drafted by humans and not the other way around.
The Ottawa Legal Innovation Hub (OLIH) is a student-run organization within the Faculty of Law at the University of Ottawa. At OLIH, we believe the legal profession should evolve to reflect the realities of the 21st century and meet the needs of the future. OLIH is committed to transforming how the law is practiced by inspiring the next generation of lawyers to think outside the box.