Predictive Coding: Dozens of Names, No Definition, Lots of Controversy

13 Jun 2012; Sharon D. Nelson, Esq. and John W. Simek

Introduction by Christy Burke, Host of Legal Technology Observer

Predictive coding is one of those legal technology terms that seems to mean everything and nothing all at once - and it brings up the term “algorithm,” which is enough to send math phobics running for the proverbial hills. Here to make sense of predictive coding for LTO readers are eDiscovery experts Sharon Nelson and John Simek. In this guest post, Sharon and John explain the challenges of defining predictive coding and point to the recent incident with Judge Peck and the Da Silva Moore case, which shows that today’s courts are grappling with this topic, too.

Is there general agreement about what predictive coding is? No.

Is there general agreement about what to call it? No.

Is it the biggest and most talked about development in e-discovery today? Yes.

E-discovery expert Craig Ball has defined it as the use of more sophisticated algorithms – math – and advanced analytics to replace or supplement the individualized judgment of lawyers respecting the responsiveness, non-responsiveness and privilege of documents and data sets. We would add the critical human element - humans help machines to understand which documents are relevant to a case. With enough iterations of the sampling process, the computers can learn enough to judge the responsiveness of a document fairly reliably.

It is a “wash, rinse, repeat” process until the machines get it right.
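To make the “wash, rinse, repeat” idea concrete, here is a minimal sketch of such a loop in Python. It is a toy illustration under stated assumptions, not any vendor's method: the word-counting model, the `review_loop` function, the sample documents and the `reviewer` stand-in for a senior lawyer are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    # Toy model (an assumption, not a real product's algorithm): a word gains
    # a point for each relevant document it appears in and loses a point for
    # each non-relevant one.
    weights = Counter()
    for text, relevant in labeled:
        for word in set(tokenize(text)):
            weights[word] += 1 if relevant else -1
    return weights


def score(weights, text):
    # Higher score = the machine's guess that the document is more relevant.
    return sum(weights[word] for word in set(tokenize(text)))


def review_loop(documents, reviewer, seed, batch_size=2, rounds=3):
    # "Wash": humans code a small seed set of documents.
    labeled = [(documents[i], reviewer(documents[i])) for i in seed]
    unlabeled = [d for i, d in enumerate(documents) if i not in seed]
    for _ in range(rounds):
        if not unlabeled:
            break
        # "Rinse": retrain on everything the humans have coded so far.
        weights = train(labeled)
        # "Repeat": rank the rest and hand the top of the list to the humans.
        unlabeled.sort(key=lambda d: score(weights, d), reverse=True)
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled.extend((d, reviewer(d)) for d in batch)
    return train(labeled), labeled, unlabeled


# Hypothetical mini-collection for an employment discrimination matter.
documents = [
    "bonus pay dispute for female staff",
    "lunch menu for friday",
    "promotion and pay review complaint",
    "office parking lot closed",
    "pay gap complaint from staff",
    "friday lunch moved to monday",
]


def reviewer(text):
    # Stand-in for the senior lawyer's relevance call (an assumption).
    return "pay" in tokenize(text)


weights, labeled, unlabeled = review_loop(documents, reviewer, seed=[0, 1])
print(len(labeled), "reviewed;", len(unlabeled), "left")
```

Even in this toy version, the documents the model scores highest are put in front of the human reviewer first, which is the behavior that separates ranking tools from tools that only estimate how many relevant documents exist.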

We’ve seen the technology called by a wide variety of names. As time has gone on, it seems to us that most folks are slowly settling on technology-assisted review, with a substantial minority using predictive coding as their term of choice.

While we tried to give a definition of this new technology above, vendors have their own definitions, so buyer beware. Some vendors are just using automated review or perhaps just predicting the percentage of relevant documents without identifying which ones are relevant. In true predictive coding, the machines will tell you which documents you should be looking at next, identifying which ones are the most relevant and important.

Craig is reminded of the ECA (early case assessment) hoopla and says that predictive coding is being oversold and overheated by marketers selling something they can dress up to look like something more than a keyword search. As he notes, many of these technologies don’t assess the meaning of a document like a human would. Some only look at the frequency and geometric juxtaposition of words in a way that might be described as “I don’t know what it says, but it uses the same sorts of words with about the same incidence and arrangement, so it’s likely to be saying much the same thing.”
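The “same sorts of words with about the same incidence” approach can be sketched as a word-frequency comparison. The Python below is a minimal illustration using cosine similarity over raw word counts; it is an assumption about what such a tool might do, it covers only frequency (not the arrangement of words), and the function names are hypothetical.

```python
import math
from collections import Counter


def word_counts(text):
    # Bag of words: how often each word occurs, ignoring order entirely.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # 1.0 means the same words in the same proportions; 0.0 means no
    # words in common. Meaning is never examined.
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[word] * cb[word] for word in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


# Opposite meanings, identical word counts: a frequency-only measure
# cannot tell these two sentences apart.
print(cosine_similarity("the dog bit the man", "the man bit the dog"))
print(cosine_similarity("pay dispute", "lunch menu"))
```

The first pair scores as essentially identical even though the two sentences say opposite things, which is the limitation Craig describes.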

Senior lawyers are critical to the process of teaching the machines, but the technology certainly means fewer human hours overall, something which contract attorneys who do document review have noted glumly.

How much money can it save? According to a survey by the Electronic Discovery Institute, it can save an average of 45%, with some respondents reporting savings of up to 70%. How much does it cost? Good luck questioning vendors about specific costs. Everyone seems to agree that it is an appropriate solution for large-volume cases, which means we know it is expensive. There is great disagreement about whether it is appropriate for smaller cases – our own suspicion is that it is not appropriate for garden-variety cases – but once again, it depends on the definition being used by vendors.

Recently, the e-discovery world was rocked by an order issued by Magistrate Judge Andrew J. Peck in the Da Silva Moore et al v. Publicis Groupe & MSL Groupe employment discrimination case in the Southern District of N.Y. Judge Peck has been a vocal advocate of predictive coding. One line in the opinion certainly caught everyone’s eye: “Computer-assisted review now can be considered judicially-approved for use in appropriate cases.”

As we write this, there is still a motion for Judge Peck’s recusal outstanding, based on a perceived appearance of impropriety as documented by the plaintiff. Further information about that and the controversial role of the certification body ACEDS in investigating the judge is documented in author Nelson’s blog Ride the Lightning (http://ridethelightning.senseient.com/). Just search for “Da Silva Moore” to get the related posts.

If you want to stay fully informed on the case, a comprehensive assembly of all documents in the case has been prepared by our friend Rob Robinson and is available here: http://www.complexdiscovery.com/info/2012/03/02/peck-parties-and-predictive-coding/.

News stories about the case may be found here: http://www.orangelt.us/info/2012/04/30/technology-assisted-review-backgrounder/.

Vendors offering technology-assisted review may be found here: http://www.complexdiscovery.com/info/2012/04/01/20-predictive-coding-technology-providers/.

No doubt this technology is here to stay, but questions about whether some variants of this technology can survive a Daubert challenge remain – and it appears that some vendor claims may have been over-hyped and without scientific validation.

The dust will ultimately settle and the landscape will become clearer – but not quite yet.

By Sharon D. Nelson, Esq. and John W. Simek
© 2012 Sensei Enterprises, Inc.

The authors are the President and Vice President of Sensei Enterprises, Inc., a legal technology, information security and digital forensics firm based in Fairfax, VA. 703-359-0700 (phone) www.senseient.com
