Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding

Predictive Coding

 

Information management has become a significant business challenge, with the global volume of electronically stored information (ESI) growing at a rapid pace (doubling roughly three times since 2010). Companies regularly spend millions of dollars producing responsive electronically stored documents for litigation matters, and the review process needed to find these responsive documents generates the bulk of e-discovery costs. Given the exponential growth of ESI falling within the scope of legal cases, the traditional manual review approach is neither economically feasible nor comprehensive enough to meet courts’ or regulators’ requirements. To confront these challenges, legal practitioners are increasingly embracing text classification, called predictive coding in the legal community, to cull through massive volumes of documents for relevant information.

 

Our group of collaborators includes both researchers and practitioners who apply machine learning and artificial intelligence in the legal industry, and we have extensively studied applications of text classification to legal matters. We have published our research across a wide variety of academic conferences and legal journals. Some examples of our research include:

Our most recent paper, Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding, examines how Explainable AI can help overcome the legal industry’s perception that text classification models are “black boxes”. 

 

Explainable Predictive Coding

 

In this paper, we propose the novel concept of explainable predictive coding and empirically demonstrate simple but effective approaches to locating responsive text snippets within a responsive document that help explain why the model classified the document as responsive. A responsive text snippet is also known as the “rationale” or the “explanation” behind the classification decision. To achieve this “explainability”, we apply text classification models to identify the responsive text snippets within responsive documents. In our experiments, we tested both a Document Model, trained on the text of entire training documents, and a Rationale Model, trained only on responsive and non-responsive text snippets from documents.
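The post does not spell out how candidate snippets are generated or scored. As a minimal sketch, assuming fixed-size word windows and treating the classifier as any function that maps text to a responsiveness score, the snippet-ranking idea could look like the code below. The names `make_snippets`, `toy_linear_scorer`, and `top_snippets`, along with the word weights, are hypothetical; in practice the scorer would be a trained Document Model or Rationale Model rather than a hand-weighted word list.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (start_word, end_word, text) for one candidate snippet; end is exclusive.
Snippet = Tuple[int, int, str]


def make_snippets(text: str, size: int, stride: Optional[int] = None) -> List[Snippet]:
    """Split a document into candidate snippets of up to `size` words.

    `stride` defaults to `size` (non-overlapping windows); a smaller stride
    yields overlapping windows. The last window may be shorter than `size`.
    """
    words = text.split()
    stride = stride or size
    snippets: List[Snippet] = []
    for start in range(0, len(words), stride):
        end = min(start + size, len(words))
        snippets.append((start, end, " ".join(words[start:end])))
        if end == len(words):  # the tail of the document is now covered
            break
    return snippets


def toy_linear_scorer(weights: Dict[str, float], bias: float = 0.0) -> Callable[[str], float]:
    """Hypothetical stand-in for a trained model: bias + sum of per-word weights."""
    def score(snippet: str) -> float:
        return bias + sum(weights.get(w.lower(), 0.0) for w in snippet.split())
    return score


def top_snippets(text: str, scorer: Callable[[str], float],
                 size: int, k: int) -> List[Tuple[int, int, str, float]]:
    """Return the k highest-scoring snippets, best first (ties broken by position)."""
    scored = [(s, e, t, scorer(t)) for s, e, t in make_snippets(text, size)]
    scored.sort(key=lambda item: (-item[3], item[0]))
    return scored[:k]


# Usage with a made-up document and hypothetical word weights.
doc = "the merger agreement was signed pricing terms were discussed lunch plans for friday"
model = toy_linear_scorer({"merger": 2.0, "pricing": 1.5, "agreement": 1.0})
for start, end, snippet, score in top_snippets(doc, model, size=4, k=2):
    print(start, end, score, snippet)
# prints:
# 0 4 3.0 the merger agreement was
# 4 8 1.5 signed pricing terms were
```

Because ranking only needs a score per snippet, the same routine could sit on top of either a Document Model or a Rationale Model; `size` and `k` play the roles of the snippet size and the number of top-scoring snippets varied in the experiments.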

 

Our experiments were conducted on a large data set from a real legal document review project. In this data set, each responsive document has text snippets annotated by review attorneys to justify the responsiveness decision. These annotated text snippets serve as the ground truth for evaluating our proposed explainable predictive coding methods. We evaluated the impact of both the candidate snippet size (i.e., the number of words in a snippet) and the number of top-scoring snippets on the performance of the Rationale and Document Models. Across all snippet sizes, the single top-scoring snippet from the Rationale Models identified a responsive snippet for close to 50% of the responsive documents. The recall rate was even higher when the top three to five scoring snippets were examined. Additionally, Rationale Models performed better than Document Models for 50-word snippets but worse for longer snippets; we attribute this observation to the amount of training text used for the Document Models. Because the Document Models were trained on the entire text of each training document, including words both within the annotated text snippets and in the rest of the document, they are more tolerant of noise.
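The post reports the share of responsive documents for which a top-scoring snippet identified a responsive snippet, but it does not state the exact matching rule. A minimal sketch of a top-k hit rate, assuming (hypothetically) that a predicted snippet counts as a hit when its word span overlaps an attorney-annotated rationale by at least one word, could look like this. The functions `overlaps` and `top_k_hit_rate` and the example spans are illustrative, not data from the study.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # [start_word, end_word) offsets within a document


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open word spans share at least one word."""
    return a[0] < b[1] and b[0] < a[1]


def top_k_hit_rate(predicted: Dict[str, List[Span]],
                   rationales: Dict[str, List[Span]],
                   k: int) -> float:
    """Fraction of responsive documents whose top-k predicted snippets
    overlap at least one annotated rationale (assumed match criterion).

    `predicted[doc_id]` must be sorted best-first; documents missing from
    `predicted` count as misses.
    """
    if not rationales:
        return 0.0
    hits = 0
    for doc_id, gold_spans in rationales.items():
        top = predicted.get(doc_id, [])[:k]
        if any(overlaps(p, g) for p in top for g in gold_spans):
            hits += 1
    return hits / len(rationales)


# Hypothetical annotations and model output for three responsive documents.
gold = {"d1": [(10, 20)], "d2": [(0, 5)], "d3": [(40, 45)]}
pred = {"d1": [(12, 17), (50, 55)], "d2": [(30, 35), (3, 8)], "d3": [(0, 5), (60, 65)]}
print(top_k_hit_rate(pred, gold, k=1))  # 1 of 3 documents hit
print(top_k_hit_rate(pred, gold, k=2))  # 2 of 3 documents hit
```

Since the top-(k+1) list contains the top-k list, this rate can only stay the same or rise as k grows, which is consistent with the higher recall observed when examining the top three to five snippets.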

 

Our results demonstrate that it is possible to build text classification models that identify responsive text snippets (rationales) automatically, with or without the use of annotated text snippets for training. In practical terms, this means that legal teams can evaluate the responsive text snippets generated by a Rationale Model or Document Model to substantially reduce the number of words an attorney must review to evaluate the responsiveness of a document and, more importantly, to help the attorney understand the model’s results. This has the potential to significantly advance the application of text classification in legal document review.

 

Lastly, again this year, we have organized The Third Annual Workshop on Applications of Artificial Intelligence in the Legal Industry at the 2019 IEEE International Conference on Big Data. We invite anyone interested in our work and in applications of AI to the legal industry to submit research and/or participate in the workshop with us.

 

Rishi Chhatwal, AT&T Services, Inc., Washington DC

Peter Gronvall, Ankura Consulting Group, LLC, Washington DC

Nathaniel Huber-Fliflet, Ankura Consulting Group, LLC, Washington DC

Robert Keeling, Sidley Austin LLP, Washington DC

Jianping Zhang, Ankura Consulting Group, LLC, Washington DC

Haozhen Zhao, Ankura Consulting Group, LLC, Washington DC

The Chinese University of Hong Kong

Copyright © 2018 All Rights Reserved. Faculty of Law, The Chinese University of Hong Kong