Successful Predictive Coding Adoption is Dependent on Effective Information Governance


Predictive coding has been receiving a great deal of press lately (for good reason), especially with the ongoing case Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (AJP), 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012). On May 21, the plaintiffs filed Rule 72(a) objections to Magistrate Judge Peck’s May 7, 2012 discovery rulings related to the relevance of certain documents that comprise the seed set of the parties’ ESI protocol.

This Rule 72(a) objection highlights an important point in the adoption of predictive coding technologies: the technology is only as good as the people AND processes supporting it.

To review, predictive coding is a process in which a computer (with the requisite software) does the vast majority of the work of deciding whether data is relevant or responsive to a given case, or privileged.

Beyond simply matching keywords (byte for byte), predictive coding takes a computer self-learning approach. To accomplish this, attorneys and other legal professionals provide example responsive documents/data in a statistically sufficient quantity, which in turn “trains” the computer as to what relevant documents/content should be flagged and set aside for discovery. This is done in an iterative process in which legally trained professionals fine-tune the seed set over time until it represents a statistically relevant sample that includes examples of all possible relevant content and formats. The same capability can also be used to find and secure privileged documents. Instead of legally trained people reading every document to determine whether it is relevant to a case, the computer can perform a first pass of this task in a fraction of the time with much more repeatable results. This technology is exciting in that it can dramatically reduce the cost of the discovery/review process by as much as 80%, according to the RAND Institute for Civil Justice.
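To make the training loop more concrete, here is a minimal, hypothetical sketch in Python (using scikit-learn) of how an attorney-coded seed set could train a simple relevance classifier that then performs the first-pass culling. The document sets, labels, and relevance threshold are illustrative assumptions only; actual predictive coding platforms use their own proprietary algorithms and workflows.

```python
# Minimal sketch of the predictive coding training loop (illustrative only).
# Assumes seed_docs/seed_labels come from attorney review; corpus_docs is the full ESI set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_seed_model(seed_docs, seed_labels):
    """Fit a simple relevance classifier on the attorney-coded seed set."""
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(seed_docs)
    model = LogisticRegression(max_iter=1000)
    model.fit(features, seed_labels)  # labels: 1 = relevant, 0 = not relevant
    return vectorizer, model

def score_corpus(vectorizer, model, corpus_docs, threshold=0.5):
    """First-pass culling: flag documents whose predicted relevance exceeds the threshold."""
    scores = model.predict_proba(vectorizer.transform(corpus_docs))[:, 1]
    return [(doc, score) for doc, score in zip(corpus_docs, scores) if score >= threshold]

# Iterative refinement: attorneys review a sample of the flagged documents,
# correct any mis-coded examples, add them to the seed set, and retrain
# until the sample results are statistically acceptable to both parties.
```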

By now you may be asking yourself: what does this have to do with Information Governance?

For predictive coding to become fully adopted across the legal spectrum, all sides have to agree that (1) the technology works as advertised, and (2) the legal professionals are providing the system with the proper seed sets for it to learn from. To accomplish the second point, the seed set must include content from all possible sources of information. If the seed set trainers don’t have access to all potentially responsive content to draw from, then the validity of the seed set is in question.

Knowing where all the information resides and having the ability to retrieve it quickly is imperative to an effective discovery process. Records/Information Management professionals should view this new technology as an opportunity to become an even more essential partner to the legal department and the entire organization by focusing not just on “records” but on information across the entire enterprise. With full-fledged information management programs in place, the legal department will be able to fully embrace this technology and drastically reduce its cost of discovery.

Automatic Deletion…A Good Idea?


In my last blog, I discussed the concept of Defensible Disposal: getting rid of data which has no value, to lower the cost and risk of eDiscovery as well as overall storage costs (IBM has been a leader in Defensible Disposal for several years). Custodians keep data because they might need to reuse some of the content later or they might have to produce it later for CYA reasons. I have been guilty of this over the years, and because of that I have a huge amount of old data on external disks that I will probably never, ever look at again. For example, I have over 500 GB of saved data, spreadsheets, presentations, PDFs, .wav files, MP3s, Word docs, URLs, etc. that I have saved for whatever reason over the years. Have I ever really reused any of that data? Maybe a couple of times, but in reality the files just sit there. This brings up the subject of the data lifecycle. Fred Moore, founder of Horison Information Strategies, wrote about this concept years ago, referring to the lifecycle of data and the probability that saved data will ever be reused or even looked at again. Fred created a graphic showing this lifecycle of data.

Figure 1: The Lifecycle of Data – Horison Information Strategies

The chart shows that as data ages, the probability of reuse drops very quickly even as the amount of saved data rises. Once data has aged 90 days, its probability of reuse approaches 1%, and after one year it is well under 1%.

You’re probably asking yourself: so what? Storage is cheap, what’s the big deal? And storage is cheap. I have 500 GB of storage available to me on my new company-supplied laptop. I have share drives available to me. And I have 1 TB of storage in my home office. I can buy 1 TB of external disk for approximately $100, so why not keep everything forever?

For organizations, it’s partly a question of storage, but more importantly it’s a question of legal risk and the cost of eDiscovery. Any existing data could become the subject of litigation and therefore reviewable. You may recall in my last blog I mentioned a recent report from the RAND Institute for Civil Justice which discussed the costs of eDiscovery, including the estimate that the cost of reviewing records/files accounts for approximately 73% of every eDiscovery dollar spent. Saving everything because you might someday need to reuse or reference it drives the cost of eDiscovery way up.

The key question to ask is: how do you get employees to delete stuff instead of keeping everything? In most organizations the culture has always been one of “save whatever you want until your hard disk and share drive are full.” This culture is extremely difficult to change quickly. One way is to force new behavior with technology. I know of a couple of companies which only allow files to be saved to a specific folder on the user’s desktop. For higher-level laptop users, as the user syncs to the organization’s infrastructure, all files saved to that specific folder are copied to the user’s share drive, where an information management application applies retention policies to the data on the share drive as well as the laptop’s data folder.

In my opinion this extreme process would not work in most organizations due to cultural expectations. So again we’re left with the question: how do you get employees to delete stuff?

Organizational cultures about data handling and retention have to be changed over time. This includes specific guidance during new employee orientation, employee training, and slow technology changes. An example could be reducing the amount of storage available to an employee on the share or home drive.

Another example could be some process changes to an employee’s workstation or laptop. Force the default storage target to be the “My Documents” folder. Phase 1 could require all files to be saved to the “My Documents” folder initially, though they could then be moved anywhere after that.

Phase 2 could include a 90-day time limit on the “My Documents” folder so that anything older than 90 days is automatically deleted (with litigation hold safeguards in place). Files not deemed important enough to move elsewhere would be treated as having little value and therefore “disposable.” Phase 3 could include the inability to move files out of the “My Documents” folder (but with the ability for users to create subfolders with no time limit), thereby ensuring a single place of discoverable data.
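As a rough illustration of what the Phase 2 automation could look like, here is a minimal sketch in Python that deletes files older than 90 days from a managed folder while skipping anything on a litigation hold list. The folder path, hold-list file, and overall approach are assumptions for illustration, not a description of any particular information management product.

```python
# Minimal sketch of a Phase 2 style cleanup job (illustrative assumptions only):
# deletes files in a managed folder that are older than 90 days, unless the
# file is protected by a litigation hold list maintained by the legal team.
import os
import time

RETENTION_DAYS = 90
MANAGED_FOLDER = os.path.expanduser("~/Documents")   # assumed "My Documents" target
HOLD_LIST_FILE = "litigation_hold_paths.txt"          # hypothetical hold inventory

def load_hold_list(path):
    """Read the set of file paths currently under litigation hold."""
    try:
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()

def purge_expired_files(folder, retention_days, hold_paths):
    """Remove files whose last modification is older than the retention window."""
    cutoff = time.time() - retention_days * 86400
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full_path = os.path.join(root, name)
            if full_path in hold_paths:
                continue                               # litigation hold safeguard
            if os.path.getmtime(full_path) < cutoff:
                os.remove(full_path)                   # file aged out of retention

if __name__ == "__main__":
    purge_expired_files(MANAGED_FOLDER, RETENTION_DAYS, load_hold_list(HOLD_LIST_FILE))
```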

Again, this strategy needs to be a slow progression to minimize the perceived changes for the user population.

The point is that this is an end-user problem, not necessarily an IT problem. End users have to be trained, gently pushed, and eventually forced to get rid of useless data.

Defensible Disposal and Predictive Coding Reduce (?) eDiscovery Costs by 65%


Following Judge Peck’s decision on predictive coding in February of 2012, yet another Judge has gone in the same direction. In Global Aerospace Inc., et al, v. Landow Aviation, L.P. dba Dulles Jet Center, et al (April 23, 2012), Judge Chamblin, a state judge in the 20th Judicial Circuit of Virginia’s Loudoun Circuit Court, wrote:

“Having heard argument with regard to the Motion of Landow Aviation Limited Partnership, Landow Aviation I, Inc., and Landow & Company Builders, Inc., pursuant to Virginia Rules of Supreme Court 4:1 (b) and (c) and 4:15, it is hereby ordered Defendants shall be allowed to proceed with the use of predictive coding for the purposes of the processing and production of electronically stored information.”

This decision came despite the plaintiffs’ objections that the technology is not as effective as purely human review.

This decision comes on top of a new RAND Institute for Civil Justice report which highlights a couple of important points. First, the report estimated that $0.73 of every dollar spent on eDiscovery can be attributed to the “Review” task. RAND also called out a study showing an 80% time savings in attorney review hours when predictive coding was utilized.

This suggests that the use of predictive coding could, optimistically, reduce an organization’s eDiscovery costs by 58.4% (0.73 × 0.80 = 0.584).

The barriers to the adoption of predictive coding technology are (still):

  • Outside counsel may be slow to adopt this due to the possibility of losing a large revenue stream
  • Outside and internal counsel will be hesitant to rely on new technology without a track record of success
  • The need for additional guidance from judges

These barriers will be overcome relatively quickly.

Let’s take this cost savings projection further. In my last blog I talked about “Defensible Disposal”, or in other words getting rid of old data not needed by the business. It is estimated that the cost of review can be reduced by 50% simply by utilizing an effective Information Governance program. Utilizing the Defensible Disposal strategy brings the $0.73 of every eDiscovery review dollar down to $0.365.

Now, if predictive coding can reduce the remaining 50% of the cost of eDiscovery review by 80%, as was suggested in the RAND report, then between the two strategies a total eDiscovery savings of approximately 65.7% could be achieved. To review, let’s look at the math.

Start with $0.73 of every eDiscovery dollar attributed to the review process.

Applying a 50% savings from Defensible Disposal brings the cost of review down to $0.365.

Applying the additional 80% review savings from predictive coding we get:

$0.365 × 0.2 (i.e., 1 − 0.8) = $0.073 (total cost of review after savings from both strategies)

To finish the calculation we need to add back in the costs not related to review (processing and collection), which are $0.27.

Total cost of eDiscovery = $0.073 + $0.27 = $0.343, for a savings of $1.00 − $0.343 = $0.657, or 65.7%.
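For anyone who wants to check the arithmetic, the same calculation can be expressed as a few lines of Python using the figures quoted above.

```python
# Combined savings from Defensible Disposal + predictive coding (figures from the text).
review_share = 0.73                 # portion of each eDiscovery dollar spent on review (RAND)
other_share = 1.0 - review_share    # processing and collection ($0.27)

after_disposal = review_share * 0.5           # 50% review savings from Defensible Disposal
after_predictive = after_disposal * (1 - 0.8) # further 80% review savings from predictive coding

total_cost = after_predictive + other_share
print(f"Review cost per dollar: ${after_predictive:.3f}")      # $0.073
print(f"Total eDiscovery cost per dollar: ${total_cost:.3f}")  # $0.343
print(f"Savings: {1 - total_cost:.1%}")                        # 65.7%
```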

As with any estimate, your mileage may vary, but this exercise points out the potential cost savings from utilizing just two strategies: Defensible Disposal and Predictive Coding.

Information Management Cost Reduction Strategies for Litigation


In these still-questionable economic times, most legal departments are looking for ways to reduce, or at least stop the growth of, their legal budgets. One of the most obvious targets for cost reduction in any legal department is the cost of responding to eDiscovery, including the cost of finding all potentially responsive ESI, culling it down, and then having in-house or external attorneys review it for relevance and privilege. Per a CGOC survey, the average GC spends approximately $3 million per discovery to gather and prepare information for opposing counsel in litigation.

Most organizations are looking for ways to reduce these growing costs of eDiscovery. The top four cost reduction strategies legal departments are considering are:

  • Bring more evidence analysis and ESI processing in-house
  • Keep more of the review of ESI in-house rather than utilizing outside law firms
  • Look at offshore review
  • Pressure external law firms for lower rates

I don’t believe these strategies address the real problem: the huge and growing amount of ESI.

Several eDiscovery experts have told me that the average eDiscovery matter can include between 2 and 3 GB of potentially responsive ESI per employee. To put that in context, 1 GB of data can contain between 10,000 and 75,000 pages of content. Multiply that by 3 and you are potentially looking at between 30,000 and 225,000 pages of content that should be reviewed for relevance and privilege per employee. Now consider that litigation and eDiscovery usually involve more than one employee, ranging from two to hundreds.

It seems to me the most straightforward and common-sense way to reduce eDiscovery costs is to proactively better manage the information that could be pulled into an eDiscovery matter.

To illustrate this proactive information management strategy for eDiscovery, we can look at the overused but still appropriate DuPont case study from several years ago.

DuPont re-examined nine cases. They determined that they had reviewed a total of 75,450,000 pages of content in those nine cases, of which 11,040,000 pages turned out to be responsive. DuPont also looked at where those 75 million pages stood in their records management process. They found that approximately 50% of those 75 million pages were beyond their documented retention period and should have been destroyed, never reviewed for any of the nine cases. They also calculated that they spent $11,961,000 reviewing this content. In other words, they spent $11.9 million reviewing documents that should not have existed if their records retention schedule and policy had been followed.
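As a back-of-the-envelope check on those figures (my own arithmetic applied to the numbers above, not a figure published by DuPont), they also imply a rough review cost per expired page:

```python
# Back-of-the-envelope check on the DuPont figures quoted above (my arithmetic, not DuPont's).
total_pages = 75_450_000
expired_share = 0.50                      # pages past their documented retention period
expired_pages = total_pages * expired_share
wasted_review_spend = 11_961_000          # dollars spent reviewing expired content

print(f"Pages that should not have existed: {expired_pages:,.0f}")  # ~37.7 million
print(f"Implied review cost per expired page: ${wasted_review_spend / expired_pages:.2f}")  # ~$0.32
```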

An information management program, besides capturing and making ESI available for use, includes the defensible deletion of ESI that has reached the end of its retention period and therefore is valueless to the organization.

Corporate counsel should be the biggest proponents of information governance in their organizations simply due to the fact that it affects their budgets directly.

Information Governance and Predictive Coding


Predictive coding, also known as computer-assisted coding or technology-assisted review, refers to the use of computers and software applications that use machine learning algorithms to enable a computer to learn, from records presented to it (usually by human attorneys), what types of content are potentially relevant to a given legal matter. After a sufficient number of examples are provided by the attorneys, the technology is given access to the entire potential corpus (records/data) to sort through and find records that, based on its “learning”, are potentially relevant to the case.

This automation can dramatically reduce costs because computers, instead of attorneys, conduct the first-pass culling of potentially millions of records.

Predictive coding has several very predictable dependencies that need to be addressed for it to be accepted as a useful and dependable tool in the eDiscovery process. First, which documents/records are used to “train the system”, and who chooses them? This training selection will almost always be conducted by attorneys involved with the case.

The second dependency revolves around the number of documents used for the training. How many training documents are needed to provide the needed sample size to enable a dependable process?

And most importantly, do the parties have access to all potentially relevant documents in the case to draw the training documents from? Remember, potentially relevant documents can be stored anywhere. For predictive coding, or any other eDiscovery process, to be legally defensible, all existing case-related documents need to be available. This requirement highlights the need for effective information management across a given organization.

As the courts adopt, or at least experiment with, predictive coding, as Judge Peck did in Monique Da Silva Moore, et al., v. Publicis Groupe & MSL Group, Civ. No. 11-1279 (ALC)(AJP) (S.D.N.Y. February 24, 2012), an effective information management program will become key to the courts adopting this new technology.

The Facebook Timeline Button That Hides Your Past From Strangers’ Prying Eyes


Over the next few weeks, Facebook Timeline becomes mandatory for everyone. Yes, that means you who have been holding onto your old profile while standing very still in a dark digital corner, hoping that Facebook might not notice you among its 800 million users. This will require a little privacy housekeeping, thanks to Facebook’s surfacing of old posts and photos you had forgotten about. Whether you’ve already switched to Timeline or will be ushered in against your will soon, there’s one very important button for those who want to optimize their privacy after the switch. The entire Forbes article can be viewed here.

Your organization’s social media problem can’t be cured with antibiotics


You can’t control what employees do away from work, on their own time and using their own equipment, but companies do have a right to control their brand, and that includes how they are represented by their employees on social media sites. For that reason, every organization should develop, implement, and enforce a corporate-wide social media policy for all employees (because if you don’t enforce it, do you really have a policy?).

Gary MacFadden was kind enough to pose a great question in response to my last blog posting, titled “Did you hear the one about the Attorney who thought social media was a dating website for singles over 40?”. Gary pointed out that it would be helpful if I could give examples of what a corporate social media policy involves and what the employee education process would be to make employees aware of the policy. With that in mind, here are some aspects of a corporate social media policy:

  1. A policy author with contact information in case employees have questions
  2. An effective date
  3. A definition of what social media is
  4. A description of why this policy is being developed (for legal defense, brand protection, etc.)
  5. A description of what social media sites the company officially participates in
  6. A listing of those employees approved to participate on those sites
    1. The fact that any and all approved social media participations will be done only from corporate infrastructure (this is to protect approved employees from discovery of their personal computers)
    2. A description of topics approved to be used
    3. A description of those topics not approved to be used
    4. A description of any approval authority process
    5. A description of what will happen to the employee if they don’t follow the approved process
  7. A direct statement that unapproved employees who make derogatory remarks about the organization, publish identifying information about clients, employees, or organization financials, talk about organization business or strategy, etc., in any social media venue will be punished in the following manner…
  8. A description of how these policies will be audited and enforced

Once the policy is developed, it needs to be communicated to all employees and updated by a legal representative on an annual basis. This education process could include steps like:

  1. A regularly updated company intranet site explaining the policy.
  2. A description and discussion of the policy in new employee orientation activities.
  3. A printed description of the policy which the employee signs and returns to the organization.
  4. An annual revisiting of the policy in department meetings.
  5. The publishing of an organization “hot line” to your corporate legal department for real-time questions.

On a related topic, for legal reasons you should be archiving all approved social media participations much like many companies now archive their email and instant message content.

This practice will seem rather draconian to many employees but in reality the organization needs to protect the brand and always have a proactive strategy for potential litigation.

A sampling of various organizations’ social media policies can be found here. I was particularly impressed with Dell’s.