Predictive coding has been receiving a great deal of press lately (for good reason), especially with the ongoing case; Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (AJP), 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012). On May 21, the plaintiffs filed Rule 72(a) objections to Magistrate Judge Peck’s May 7, 2012 discovery rulings related to the relevance of certain documents that comprise the seed set of the parties’ ESI protocol.
This Rule 72(a) objection highlights an important point in the adoption of predictive coding technologies; the technology is only as good as the people AND processes supporting it.
To review, predictive coding is a process where a computer (with the requisite software), does the vast majority of the work of deciding whether data is relevant, responsive or privileged to a given case.
Beyond simply searching for keyword matching (byte for byte), predictive coding adopts a computer self-learning approach. To accomplish this, attorneys and other legal professionals provide example responsive documents/data in a statistically sufficient quantity which in turn “trains”the computer as to what relevant documents/content should be flagged and set aside for discovery. This is done in an iterative process where legally trained professionals fine-tune the seed set over a period of time to a point where the seed set represents a statistically relevant sample which includes examples of all possible relevant content as well as formats. This capability can also be used to find and secure privileged documents. Instead of legally trained people reading every document to determine if a document is relevant to a case, the computer can perform a first pass of this task in a fraction of the time with much more repeatable results. This technology is exciting in that it can dramatically reduce the cost of the discovery/review process by as much as 80% according to the RAND Institute of Civil Justice.
By now you may be asking yourself what this has to do with Information Governance?…
For predictive coding to become fully adopted across the legal spectrum, all sides have to agree 1. the technology works as advertised, and 2. the legal professionals are providing the system with the proper seed sets for it to learn from. To accomplish the second point above, the seed set must include content from all possible sources of information. If the seed set trainers don’t have access to all potentially responsive content to draw from, then the seed set is in question.
Knowing where all the information resides and having the ability to retrieve it quickly is imperative to an effective discovery process. Records/Information Management professionals should view this new technology as an opportunity to become an even more essential partner to the legal department and entire organization by not just focusing on “records” but on information across the entire enterprise. With full fledged information management programs in place, the legal department will be able to fully embrace this technology to drastically reduce their cost of discovery.
2 thoughts on “Successful Predictive Coding Adoption is Dependent on Effective Information Governance”
I would add that the technology and statistical methodology itself is not the issue. The courts are most keen on how the organization documented the entire process or put another way how the humans made their decisions along the way.
[…] May 31, 2012 Bill Tolson Leave a comment Go to comments In my last blog entry titled “Successful Predictive Coding Adoption is Dependent on Effective Information Governance”, a question was posted which I thought deserved a wider sharing with the group; “What is the […]