One of the most promising ways to lower the cost of review in large e-discovery projects is the use of advanced search technologies, primarily active machine learning, known as predictive coding, TAR (Technology Assisted Review), or CAR (Computer Assisted Review), to cull and code documents. Using this kind of artificial intelligence to augment legal review is catching on quickly. This chart illustrates the workflow I have developed for using predictive coding in our firm's large review projects.

Done properly, predictive coding can not only realize dramatic savings but also find more relevant documents in a large collection than any other method. See, e.g., Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII RICH. J.L. & TECH. 11 (2011).

A recent RAND Corporation report estimated that savings of 70% in the cost of review are possible using such advanced analytics. RAND Corporation, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (2012). We have found similar savings in our projects.

The use of machine learning, a type of artificial intelligence, to train a computer to identify relevant documents in a large collection has been accepted by many courts. This is not new technology, and most of us have seen it in action without realizing it: it is the basis for spam filters, Amazon book recommendations, Pandora song selections, photo recognition, and the like. We train the computer on what we like or dislike, or on what we do and do not consider spam. Most software will have smart features like that; legal review software already does.
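To make the idea concrete, here is a deliberately simplified sketch of one round of that training loop, in Python. It is not the algorithm inside any particular review tool; real predictive coding software uses far more sophisticated models. The scoring rule, the sample documents, and the function names are all illustrative assumptions, but the shape of the loop is the point: the reviewer codes a few documents, the computer learns from those decisions and ranks the rest, and the top-ranked documents go back to the reviewer for the next round.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def score(doc, relevant_terms, irrelevant_terms):
    # A toy relevance score: terms seen in reviewer-coded relevant
    # documents count for the document, terms seen in irrelevant
    # documents count against it.
    words = tokenize(doc)
    return (sum(relevant_terms[w] for w in words)
            - sum(irrelevant_terms[w] for w in words))

def active_learning_round(unreviewed, coded):
    # "Train" on the reviewer's coding decisions so far.
    relevant_terms, irrelevant_terms = Counter(), Counter()
    for doc, is_relevant in coded:
        (relevant_terms if is_relevant else irrelevant_terms).update(tokenize(doc))
    # Rank the unreviewed collection; the highest-ranked documents
    # would go to the human reviewer next, and their coding would
    # refine the model in the following round.
    return sorted(unreviewed,
                  key=lambda d: score(d, relevant_terms, irrelevant_terms),
                  reverse=True)

# Hypothetical coding decisions from a first review pass.
coded = [
    ("merger agreement draft attached", True),
    ("fantasy football picks this week", False),
]
unreviewed = [
    "lunch plans for friday",
    "revised merger agreement for review",
]
ranked = active_learning_round(unreviewed, coded)
print(ranked[0])  # the merger-related document ranks first
```

Each iteration of this loop is one "round" of training; production tools repeat it until the ranking stabilizes and the remaining low-ranked documents can be defensibly set aside.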

My study of different predictive coding review methods, quality control procedures, and legal review software led me from the white papers of e-discovery vendors to the sometimes arcane, but important, publications of scientists and academics with an interest in the field.

There is a growing body of good literature on predictive coding and how it works. For example, a good explanation of the so-called black-box magic behind predictive coding software can be found in Jason R. Baron & Jessie B. Freeman, Cooperation, Transparency, and the Rise of Support Vector Machines in E-Discovery: Issues Raised by the Need to Classify Documents as Either Responsive or Nonresponsive.
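The support vector machines discussed in that paper are, at bottom, linear classifiers: each document becomes a vector of term counts, and the software learns a weight for each term so that responsive documents fall on one side of a boundary and nonresponsive documents on the other. The sketch below uses a simple perceptron rather than a true SVM (an SVM additionally chooses the boundary with the maximum margin between the classes), and the sample documents are invented, but it shows the same per-term weighting idea in miniature.

```python
from collections import defaultdict

def features(text):
    # Bag-of-words vector: one dimension per term.
    feats = defaultdict(float)
    for w in text.lower().split():
        feats[w] += 1.0
    return feats

def train_perceptron(labeled, epochs=10):
    # Learn one weight per term plus a bias. An SVM optimizes the
    # same kind of linear boundary, but picks the maximum-margin one.
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, responsive in labeled:
            x = features(text)
            score = bias + sum(weights[w] * v for w, v in x.items())
            y = 1 if responsive else -1
            if y * score <= 0:  # misclassified: nudge the boundary
                for w, v in x.items():
                    weights[w] += y * v
                bias += y
    return weights, bias

def classify(weights, bias, text):
    # True = responsive, False = nonresponsive.
    x = features(text)
    return bias + sum(weights[w] * v for w, v in x.items()) > 0

# Hypothetical reviewer-coded training documents.
labeled = [
    ("price fixing discussion with competitor", True),
    ("quarterly pricing strategy memo", True),
    ("holiday party rsvp", False),
    ("parking pass renewal", False),
]
w, b = train_perceptron(labeled)
print(classify(w, b, "memo on pricing discussion"))  # True
```

The binary responsive/nonresponsive output shown here is exactly the classification framing Baron and Freeman examine; it is also why the training set's quality matters so much, a theme the quality control literature returns to again and again.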

Jason Baron, Director of Litigation at the U.S. National Archives and Records Administration, is well established as the government's leading attorney expert on advanced search. He is a co-founder of the TREC Legal Track, a government-sponsored research program that has led many information scientists to do objective scientific research on the application of machine learning in e-discovery. Jason and I have done CLE presentations together, and he got me hooked on the scientific aspects. For my own recent contribution to the science of search, see: Losey, R., A Modest Contribution to the Science of Search: Report and Analysis of Inconsistent Classifications in Two Predictive Coding Reviews of 699,082 Enron Documents (2013).

Scientific research on legal search is still in its early stages, but we have already learned many valuable lessons. We now have a good idea of which methods and software work best.