Introduction by Christy Burke, Host of Legal Technology Observer
Predictive coding is one of those legal technology terms that seems to mean everything and nothing all at once - and it brings up the term “algorithm,” which is enough to send math phobics running for the proverbial hills. Here to make sense of predictive coding for LTO readers are eDiscovery experts Sharon Nelson and John Simek. In this guest post, Sharon and John explain the challenges of defining predictive coding and point to the recent incident with Judge Peck and the Da Silva Moore case, which shows that today’s courts are grappling with this topic, too.
Is there general agreement about what predictive coding is? No.
Is there general agreement about what to call it? No.
Is it the biggest and most talked about development in e-discovery today? Yes.
E-discovery expert Craig Ball has defined it as the use of more sophisticated algorithms – math – and advanced analytics to replace or supplement the individualized judgment of lawyers respecting the responsiveness, non-responsiveness and privilege of documents and data sets. We would add the critical human element - humans help machines to understand which documents are relevant to a case. With enough iterations of the sampling process, the computers can learn enough to judge the responsiveness of a document fairly reliably.
It is a “wash, rinse, repeat” process until the machines get it right.
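To make the “wash, rinse, repeat” idea concrete, here is a minimal sketch in Python. It is a toy illustration under our own assumptions, not how any vendor's product works: the word-counting model, the names review_loop, train and predict, the sample documents and the stopping rule (stop once the machine agrees with the reviewer on a whole batch) are all hypothetical.

```python
from collections import Counter

def tokenize(text):
    # Distinct lowercase words in a document.
    return set(text.lower().split())

def train(labeled):
    # Toy model: a word gains weight each time it appears in a document the
    # reviewer marked responsive and loses weight for a non-responsive one.
    weights = Counter()
    for text, responsive in labeled:
        for word in tokenize(text):
            weights[word] += 1 if responsive else -1
    return weights

def score(weights, text):
    # Unknown words contribute 0 (Counter returns 0 for missing keys).
    return sum(weights[word] for word in tokenize(text))

def predict(weights, text):
    return score(weights, text) > 0

def review_loop(documents, reviewer, seed_size=2, batch_size=2, max_rounds=10):
    # Seed set: a human reviewer codes the first few documents.
    labeled = [(d, reviewer(d)) for d in documents[:seed_size]]
    unlabeled = list(documents[seed_size:])
    weights = train(labeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # The machine ranks what is left and proposes what to look at next.
        unlabeled.sort(key=lambda d: score(weights, d), reverse=True)
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        # The human codes the batch; compare with the machine's guesses.
        verdicts = [(d, reviewer(d)) for d in batch]
        agreed = all(predict(weights, d) == label for d, label in verdicts)
        # Fold the human's answers back in and retrain.
        labeled.extend(verdicts)
        weights = train(labeled)
        if agreed:  # assumed stopping rule, far cruder than real validation
            break
    return weights, labeled, unlabeled

# Hypothetical documents; the "reviewer" function stands in for a senior lawyer.
documents = [
    "salary and promotion decision for female staff",
    "lunch menu for the cafeteria",
    "promotion criteria and salary bands",
    "parking garage closed friday",
    "salary review for female managers",
    "cafeteria lunch schedule",
]
reviewer = lambda d: "salary" in d or "promotion" in d
weights, labeled, unlabeled = review_loop(documents, reviewer)
```

Real tools use far richer models and statistically valid sampling to decide when to stop; the point here is only the shape of the loop: the human codes a sample, the machine learns and ranks, and the cycle repeats.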
We’ve seen the technology called a wide variety of names. As time has gone on, it seems to us that most folks are slowly settling on technology-assisted review, with a substantial minority using predictive coding as their term of choice.
While we tried to give a definition of this new technology above, vendors have their own definitions, so buyer beware. Some vendors are just using automated review or perhaps just predicting the percentage of relevant documents without identifying which ones are relevant. In true predictive coding, the machines will tell you what documents you should be looking at next, identifying which ones are the most relevant and important.
Craig is reminded of the ECA (early case assessment) hoopla and says that predictive coding is being oversold and overheated by marketers selling something they can dress up to look like something more than a keyword search. As he notes, many of these technologies don’t assess the meaning of a document like a human would. Some only look at the frequency and geometric juxtaposition of words in a way that might be described as “I don’t know what it says, but it uses the same sorts of words with about the same incidence and arrangement, so it’s likely to be saying much the same thing.”
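The “same sorts of words with about the same incidence” idea can be sketched in a few lines of Python. This is only an illustration under our own assumptions: it compares raw word counts with cosine similarity, ignores word arrangement entirely, and the function names word_counts and cosine_similarity are ours rather than any product's.

```python
import math
from collections import Counter

def word_counts(text):
    # How often each lowercase word occurs in the text.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # 1.0 means identical word frequencies, 0.0 means no words in common.
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[word] * cb[word] for word in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

# Same words, opposite meaning: a frequency-only measure cannot tell them apart.
similarity = cosine_similarity("the dog bit the man", "the man bit the dog")
```

The two sentences in the usage line score as essentially identical even though they say opposite things, which is exactly the limitation Craig describes.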
Senior lawyers are critical to the process of teaching the machines, but it certainly means fewer human hours overall, something which contract attorneys who do document review have noted glumly.
How much money can it save? According to a survey by the Electronic Discovery Institute, it can save an average of 45% with some respondents reporting savings of up to 70%. How much does it cost? Good luck questioning vendors about specific costs. Everyone seems to agree that it is an appropriate solution for large-volume cases, which means we know it is expensive. There is great disagreement about whether it is appropriate for smaller cases – our own suspicion is that it is not appropriate for garden-variety cases – but once again, it depends on the definition being used by vendors.
Recently, the e-discovery world was rocked by an order issued by Magistrate Judge Andrew J. Peck in the Da Silva Moore et al v. Publicis Groupe & MSL Groupe employment discrimination case in the Southern District of N.Y. Judge Peck has been a vocal advocate of predictive coding. One line in the opinion certainly caught everyone’s eye: “Computer-assisted review now can be considered judicially-approved for use in appropriate cases.”
As we write this, there is still a motion for Judge Peck’s recusal outstanding, based on a perceived appearance of impropriety as documented by the plaintiff. Further information about that and the controversial role of the certification body ACEDS in investigating the judge is documented in author Nelson’s blog Ride the Lightning (http://ridethelightning.senseient.com/). Just search on “Da Silva Moore” to get the related posts.
If you want to stay fully informed on the case, a comprehensive assembly of all documents in the case has been prepared by our friend Rob Robinson and is available here: http://www.complexdiscovery.com/info/2012/03/02/peck-parties-and-predictive-coding/.
News stories about the case may be found here: http://www.orangelt.us/info/2012/04/30/technology-assisted-review-backgrounder/.
Vendors offering technology-assisted review may be found here: http://www.complexdiscovery.com/info/2012/04/01/20-predictive-coding-technology-providers/.
No doubt this technology is here to stay, but questions about whether some variants of this technology can survive a Daubert challenge remain – and it appears that some vendor claims may have been over-hyped and without scientific validation.
The dust will ultimately settle and the landscape will become clearer – but not quite yet.
Copyright © 2023 Legal IT Professionals. All Rights Reserved.