Coming to Terms with Defensible Disposal, Part 1


Last week at LegalTech New York 2013 I had the opportunity to moderate a panel titled “Defensible Disposal: If it doesn’t exist, I don’t have to review it…right?” with an impressive roster of panelists: Bennett Borden, Partner and Chair of the eDiscovery & Information Governance Section at Williams Mullen; Clifton C. Dutton, Senior Vice President and Director of Strategy and eDiscovery at American International Group; John Rosenthal, Chair of the eDiscovery and Information Management Practice at Winston & Strawn; and Dean Gonsowski, Associate General Counsel at Recommind Inc.

During the panel session it was agreed that organizations have been over-retaining ESI (which accounts for at least 95% of all data in organizations) even when it’s no longer needed for business or legal reasons. A major factor driving this over-retention of ESI is the fear of inadvertently deleting evidence, otherwise known as spoliation. In fact, an ESG survey published in December 2012 showed that the “fear of the inability to furnish data requested as part of a legal or regulatory matter” was the highest-ranked reason organizations chose not to dispose of ESI.

Other reasons cited included not having defined policies for managing and disposing of electronic information and, conversely, organizations having defined retention policies that actually keep all data indefinitely (usually because of the fear of spoliation).

One of the principal information governance gaps most organizations haven’t yet addressed is the difference between “records” and “information.” Many organizations have records retention/disposition policies to manage the official company records required to be retained under regulatory or legal requirements. But the documents and files that fall under legal hold and regulatory requirements amount to only approximately 6% of an organization’s retained electronic data (1% legal hold and 5% regulatory).

Another interesting survey, published by Kahn Consulting in 2012, measured employees’ understanding of their information governance-related responsibilities. Only 21% of respondents had a good idea of what information needed to be retained or deleted, and only 19% knew how information should be retained or disposed of. In the same survey, only 15% of respondents had even a general idea of their legal hold and eDiscovery responsibilities.

These surveys highlight the fact that organizations aren’t disposing of information through a systematic process, mainly because they aren’t managing their information, especially their electronic information, and therefore don’t know what to keep and what to dispose of.

An effective defensible disposal process is dependent on an effective information governance process. To know what can be deleted and when, an organization has to know what information needs to be kept and for how long based on regulatory, legal and business value reasons.

Over the coming weeks, I will address those defensible disposal questions and responses the LegalTech panel discussed. Stay tuned…


The Dangers of Infobesity at LegalTech


LegalTech just concluded in New York, and one of the hot buttons many vendors were talking about was the idea that too much corporate information, especially valueless, ungoverned, unstructured information, is both risky and costly to organizations… I agree. The answer to this “infobesity” (the unrestricted saving of ESI because storage is supposedly cheap and saving everything is easier than checking with others to see if it’s OK to delete) is a defensible process to systematically dispose of information that’s not subject to regulatory retention requirements or litigation holds and no longer has business value. A 2012 CGOC (Compliance, Governance and Oversight Council) Summit survey found that on average 1% of an organization’s data is subject to legal hold, 5% falls under regulatory retention requirements, and 25% has business value. This means that 69% of an organization’s ESI can be disposed of.

Several vendors at LegalTech were highlighting Defensible Disposal solutions, also known as defensible disposition or defensible deletion, as the answer to the problem of infobesity. Defensible Disposal is defined by many as a process (manual, automated, or both) of identifying and permanently disposing of unneeded or valueless data in a way that will stand up in court as reasonable and consistent. The key to this process is being able to identify valueless information (not subject to regulatory retention or legal hold) with enough certainty to actually follow through and delete the data. This may sound easy… it’s not. Many organizations are sitting on huge amounts of data because their legal department doesn’t want to be accused of spoliation and so has standing orders to “keep everything forever.” Corporate legal has to be convinced that the defensible disposal processes and solutions billed as the answer to infogluttony can actually tell the difference, accurately and consistently, between information that should be kept and information that’s truly valueless.

To automate this defensible disposal process, the solution needs to be able to understand and differentiate content conceptually; to know that an apple is a fruit as well as a huge high-tech company. Automated classification/categorization cannot accurately or consistently differentiate the meaning in unstructured content by relying on keywords or simple rules alone.

An even less consistent approach to categorization is to base it on simple rules such as “delete everything from/to Bill immediately” or “keep everything to/from any accounting employee for 3 years.” This kind of rules-based retention/disposition process will quickly have your GC explaining to a Judge why data that should have been retained was “inadvertently” deleted.

To truly automate the disposal of valueless information in a consistently defensible manner, categorization applications must do two things. First, they must conceptually understand the meaning in unstructured content, so that content is classified as “of value” to the organization, regardless of language, not because it shares a keyword with other records but because it truly meets your definition of content that needs to be kept. Second, because unstructured data is by definition “free-flowing” (not organized into specific rows and columns), extremely high categorization accuracy and defensibility can only be achieved with solutions that incorporate an iterative training process, including “train by example” in a human-supervised workflow.
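To make the “train by example” workflow concrete, here is a deliberately minimal Python sketch. The bag-of-words features and generic classifier are stand-ins chosen for brevity (a real conceptual engine would use richer semantic representations, as argued above), and the documents, labels, and retraining loop are illustrative assumptions, not any vendor’s actual method.

```python
# A toy sketch of an iterative "train by example" disposition workflow.
# Assumptions: documents are plain strings; a human reviewer supplies labels.
# A real defensible-disposal engine would use conceptual/semantic features,
# not the simple bag-of-words features used here for brevity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_classifier(texts, labels):
    """Fit a keep/dispose classifier from human-labeled examples."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)
    model = LogisticRegression(max_iter=1000).fit(X, labels)
    return vectorizer, model

# Round 1: seed examples chosen by the legal team (1 = keep, 0 = dispose).
seed_texts = ["quarterly revenue forecast", "holiday party menu",
              "signed vendor contract", "cafeteria hours update"]
seed_labels = [1, 0, 1, 0]
vec, model = train_classifier(seed_texts, seed_labels)

# Iterative cycle: reviewers sample the machine's calls, correct mistakes,
# and the corrections are fed back in as new training examples.
corpus = ["draft merger agreement", "parking lot repaving notice"]
scores = model.predict_proba(vec.transform(corpus))[:, 1]
for doc, p in zip(corpus, scores):
    print(f"{p:.2f} keep-probability: {doc}")
# A reviewer would confirm or overrule each call; overruled documents are
# appended to seed_texts/seed_labels and the model is retrained, repeating
# until sampled accuracy meets the defensibility threshold the team has set.
```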

Do organizations really have formal information disposal processes…I think NOT!



Do organizations regularly dispose of information in a systematic, documented manner? If the answer is “sure we do”, do they do it via a standardized and documented process or “just leave it to the employees”?

If they don’t…who cares – storage is cheap!

When I ask customers if they have a formal information disposal process, 70 to 80 percent of the time they answer “yes,” but when pressed on their actual process, I almost always hear one of the following:

1. We have mailbox limits, so employees have to delete emails when they reach their mailbox limit.
2. We tell our employees to delete content after 1, 2, or 3 years.
3. We store our records (almost always paper) at Iron Mountain and regularly send deletion requests.

None of these answers rises to the level of an information governance and disposal process. Mailbox limits only force employees into stealth archiving, i.e., moving content out of the organization’s direct control. Instructing employees to delete information without enforcement and auditing is as good as not telling them to do anything at all. And storing paper records at Iron Mountain does not address the 95%+ of data that is electronic and resides within the organization.

Data center storage is not cheap. Sure, I can purchase 1 TB of external disk at a local electronics store for $150, but that 1 TB is not equal to 1 TB of storage in a corporate data center. The retail price also doesn’t include annual support agreements, the cost of allocated floor space, the cost of power and cooling, or IT resource overhead such as nightly backups. Besides, the cost of storage is not the biggest cost facing organizations that don’t actively manage their information.

The astronomical costs arise with litigation and eDiscovery. A recent RAND survey highlighted the fact that it can cost $18,000 to review 1 GB of information for eDiscovery. And considering many legal cases include the collection and review of terabytes of information, the average cost per case can easily run into the millions of dollars.
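As a sanity check on those orders of magnitude, here is a short Python sketch using the RAND-cited $18,000-per-GB figure; the collection sizes are illustrative.

```python
# Quick check on the "millions of dollars" claim, using the RAND-cited
# figure of $18,000 to review 1 GB. The collection sizes are illustrative.
COST_PER_GB = 18_000  # USD
for gb in (100, 500, 1_000):  # 1,000 GB is roughly 1 TB
    print(f"{gb:>5} GB -> ${gb * COST_PER_GB:,} in review costs")
# ->   100 GB -> $1,800,000 in review costs
# ->   500 GB -> $9,000,000 in review costs
# -> 1,000 GB -> $18,000,000 in review costs
```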

So what’s the answer? First, don’t assume information is cheap to keep. Data center storage and IT resources are not inexpensive; they take human resources to keep up and running and consume floor space. Second, information carries legal risk and cost. The collection and review of information for responsiveness is time consuming and expensive, and the legal risks associated with unmanaged information can be even more costly. Imagine your organization is sued. One of the first steps in responding to the suit is to find and secure all potentially responsive data. What would happen if you didn’t find all relevant data, and it was later discovered you didn’t turn over information that could have helped the other side’s case? The Judge can overturn an already-decided case, issue an adverse inference, assign penalties, etc. The withholding or destruction of evidence is never good and always costs the losing side a lot more.

The best strategy is to put policies, processes and automation in place to manage all electronic data as it is created and to dispose of data deemed no longer required. One solution is to put categorization software in place to index, understand and categorize content in real time by its conceptual meaning. Sophisticated categorization can also find, tag and automatically dispose of information that no longer needs to be kept. Given the amount of information created daily, automating the process is the only definitive way to answer “yes, we have a formal information disposal process.”

A Fox, a Henhouse, and Custodial Self-Collection


Judge Scheindlin just issued an opinion in the Freedom of Information Act (FOIA) case National Day Laborer Organizing Network et al. v. United States Immigration and Customs Enforcement Agency, et al., 2012 U.S. Dist. LEXIS 97863 (S.D.N.Y. July 13, 2012). The dispute centers on the plaintiffs’ attempts to obtain information from several US government agencies, including the Federal Bureau of Investigation, the Immigration and Customs Enforcement Agency, and the Department of Homeland Security. Specifically, the plaintiffs have sought information regarding “Secure Communities,” a federal immigration enforcement program launched in 2008.

In December 2010, after the defendants failed to comply with their obligations under the agreement, Judge Scheindlin ordered them to produce the records by a new “drop dead date.” With the new date in mind, the defendants searched the files of hundreds of employees, expending thousands of hours and producing tens of thousands of responsive records.

The plaintiffs argued the searches had been insufficient, i.e., that the agencies failed to conduct any searches of the files of certain custodians who were likely to possess responsive records. They also complained that the defendants had not established that the searches they did conduct were adequate.

On the issue of relying on custodians to “self-collect,” i.e., to conduct appropriate and legally defensible searches themselves, she writes:

“There are two answers to defendants’ question. First, custodians cannot ‘be trusted to run effective searches,’ without providing a detailed description of those searches, because FOIA places a burden on defendants to establish that they have conducted adequate searches; FOIA permits agencies to do so by submitting affidavits that ‘contain reasonable specificity of detail rather than merely conclusory statements.’ Defendants’ counsel recognize that, for over twenty years, courts have required that these affidavits ‘set [ ] forth the search terms and the type of search performed.’ But, somehow, DHS, ICE, and the FBI have not gotten the message. So it bears repetition: the government will not be able to establish the adequacy of its FOIA searches if it does not record and report the search terms that it used, how it combined them, and whether it searched the full text of documents.”

“The second answer to defendants’ question has emerged from scholarship and case law only in recent years: most custodians cannot be ‘trusted’ to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities. Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.”

“Simple keyword searching is often not enough: ‘Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.’ There is increasingly strong evidence that ‘[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.’ As Judge Andrew Peck — one of this Court’s experts in e-discovery — recently put it: ‘In too many cases, however, the way lawyers choose keywords is the equivalent of the child’s game of ‘Go Fish’ … keyword searches usually are not very effective.’”

Custodial self-collection has been falling out of favor with some Judges for several reasons. First, the responding attorney should be overseeing the discovery process to ensure correctness and completeness. In many courts, the attorney has to certify that the discovery process was done correctly… and what attorney wants to do that if they didn’t really manage it?

In a recent Law.com article, Ralph Losey pointed out that custodial self-collection is “equivalent to the fox guarding the hen house.”

Inadequate Information Management Policy Leads to Third Party eDiscovery


Many organizations have adopted an information governance policy of “not having a policy,” for reasons ranging from saving on the costs associated with managing information to frustrating eDiscovery requests, i.e., if I can’t find it, then I can’t produce it.

The policy of purposely deleting (not retaining) business records is not necessarily illegal, unless you put that policy in place to thwart eDiscovery or you are subject to federal or state retention requirements. But the “no information governance” policy can have unforeseen consequences. Case in point: Peter Kiewit Sons’, Inc. v. Wall Street Equity Group, Inc., No. 8:10CV365, 2012 WL 1852048 (D. Neb. May 18, 2012).

The case involves claims by the Plaintiff, Peter Kiewit Sons’, Inc., against Defendants Wall Street Equity Group, Inc., Wall Street Group of Companies, Inc., Shepherd Friedman, and Steven West for the alleged violation of various aspects of federal and state trademark law, unfair competition, and commercial misrepresentation, including the misuse of the “Kiewit” brand.

The Defendants assist business owners in marketing their businesses to prospective buyers. One strategy the Defendants allegedly use is to suggest to some of their potential clients that Kiewit may be a potential buyer of the client’s business.

During the course of discovery, the defendants objected to a number of the plaintiff’s interrogatories and requests for production on the grounds that the requests were “not reasonably calculated to lead to admissible evidence” – that is, the information requested is not relevant. The Defendants also argued that many of the requests were meant solely to harass them (the court found no merit to that claim). This case has many interesting aspects, but the piece that interested me most was where the defendants’ record retention policy and practices were called out.

In a section of the court’s memorandum titled “Defendant’s Inability to Serve as a Reliable Source of Discovery,” the Judge remarks:

 “The corporate defendants claim that, consistent with their standard practice, they do not retain any correspondence unless it results in a completed sale. Thus, the defendants claim they would have no documents showing how often and to what entities they sent proposals (to include any proposals using the Kiewit mark for marketing). The record itself includes evidence of this practice; specifically, defendant West acknowledges that he submitted a proposed merger to the plaintiff concerning a company called Agra, but the defendants have produced no documents regarding that proposal, presumably because it did not result in a sale.

 Since the defendants have a document retention practice of destroying all marketing records unless a closing occurs, to obtain a complete picture of the extent to which the Kiewit mark may have been used by the defendants, records identifying those who received the defendants’ marketing must be obtained. As a result of the defendants’ own document destruction practices, the only remaining sources for information regarding the content of defendants’ marketing materials are the recipient third parties. Information regarding the identity of these third party business contacts, whether obtained by a third party subpoena or in response to written discovery served on the defendants, is relevant to determining the extent of defendants’ use of the Kiewit service mark.

 There is nothing necessarily improper about a company’s reasonable pre-litigation document retention policy whereby documents are disposed of in periodic intervals. Generally speaking, spoliation arguments are unsuccessful if relevant documents were destroyed in accordance with the business’ reasonable document retention policy and/or practices.

 However, even a reasonable practice of destroying documents may have unintended consequences. By failing to retain any documentation, a defendant may lose its ability to credibly defend claims asserted against it, and it may open avenues of third party discovery which would have been closed had the defendant retained documents consistent with standard business practices, and thereby been considered a reliable and complete source of the relevant discovery.”

As the Judge stated, there is nothing necessarily improper about deleting records on an ongoing basis, but those who choose this policy should be aware of the third-party consequences. Do you really want opposing attorneys forcing your customers and suppliers to respond to your eDiscovery?

In this case the Judge granted the plaintiff’s request to identify the defendants’ contacts and client lists and to proceed with third-party discovery on some or all of those clients. The Judge also ordered the defendants to turn over financial information, such as their accounts receivable and records of their completed business transactions, to further help in identifying potential targets of third-party discovery.

Conceptual Search versus Predictive Coding


In response to my last blog entry, titled “Successful Predictive Coding Adoption is Dependent on Effective Information Governance,” a question was posted which I thought deserved wider sharing: “What is the difference between predictive coding and conceptual search?” As an individual not directly associated with either technology but with some relevant background, I will attempt to explain the differences, at least as they pertain to discovery processes.

Conceptual search technologies allow a user to search on concepts… (pretty valuable insight, right?) instead of searching on a keyword such as “dog.” In a keyword search on “dog,” the user gets a results set of every document/file/record with the three letters D-O-G present in that specific sequence. The results could include references to “dogs,” the four-legged animals; references to “frankfurters”; references to movies (Dog Day Afternoon); etc., in no particular priority.

True conceptual search capability understands (based on the search criteria) that the user is looking for information on the four-legged animal, so it returns not just references to “dogs” but also references to “Golden Retrievers,” “animal shelters,” “pet adoption,” etc. Some conceptual search solutions will also cluster concepts to give the user the ability to quickly fine-tune their search; for example, creating a cluster of all dog (animal) references, a cluster of all food-related references, and so on. Many eDiscovery analytic solutions include this clustering capability.
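To illustrate the behavioral difference, here is a toy Python sketch. The hand-built concept map is only a stand-in for the statistical and semantic models real conceptual search engines use; the documents and terms are invented for the example.

```python
# A deliberately tiny illustration of keyword vs. concept matching.
# Real conceptual-search engines learn term relationships statistically
# (e.g., from co-occurrence or embeddings) and disambiguate word senses
# from context; the hand-built concept map below is only a stand-in.
docs = [
    "Golden Retriever available for pet adoption at the animal shelter",
    "Dog Day Afternoon is a classic 1975 film",
    "Hot dogs and frankfurters on sale this weekend",
]

def keyword_search(query, docs):
    """Literal match: returns every doc containing the letters D-O-G."""
    return [d for d in docs if query.lower() in d.lower()]

# Toy concept map standing in for learned semantic relationships.
CONCEPTS = {"dog (animal)": {"golden retriever", "animal shelter",
                             "pet adoption", "puppy", "canine"}}

def concept_search(concept, docs):
    """Concept match: returns docs related to the *idea*, not the string."""
    terms = CONCEPTS[concept]
    return [d for d in docs if any(t in d.lower() for t in terms)]

print(keyword_search("dog", docs))       # hits the film and the frankfurters,
                                         # misses the Golden Retriever entirely
print(concept_search("dog (animal)", docs))  # finds the animal-related document
```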

Predictive coding is a process that combines automation and human interaction to produce a results set of potentially responsive documents that trained human reviewers can then check.

Predictive coding takes the conceptual search and clustering idea much further. A predictive coding solution is “trained” in a very specific manner for each case. The legal team, with additional subject matter expertise, manually chooses documents/records/files they deem responsive to the particular case and inputs them to the predictive coding system as examples of the content and formats that should be found and coded as responsive.

Most predictive coding processes include several iterative cycles to fine-tune the training examples. In each cycle, legal professionals sample the records coded as responsive by the solution and determine whether they are truly responsive in the opinion of the human reviewer. If the reviewers find documents that are not responsive, those documents are in turn used to train the solution not to code similar content as responsive. This cycle can be repeated several times until the human professionals agree the system has reached the desired level of capability.

The same iterative process can also be used to sample the documents deemed non-responsive, to determine whether the solution is missing potentially responsive content. This check is called “elusion”: it measures the proportion of documents marked non-responsive that are in fact responsive, i.e., the system’s misses. The elusion results can themselves be fed back into the iterative cycle to further train the system.
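Here is a minimal sketch of what an elusion check might look like in code, assuming reviewers sample the machine-coded non-responsive pile and record which sampled documents turn out to be responsive. The sample size, document IDs, and miss rate are all illustrative assumptions.

```python
# A minimal sketch of an elusion check. Assumptions: the discard pile is a
# list of document IDs the system coded non-responsive, and human review is
# simulated by a precomputed set of known misses. Sample sizes and
# acceptance thresholds are illustrative, not prescriptive.
import random

def elusion(discard_pile, reviewer_is_responsive, sample_size=400, seed=42):
    """Estimate the fraction of truly responsive docs hiding in the discard pile."""
    rng = random.Random(seed)
    sample = rng.sample(discard_pile, min(sample_size, len(discard_pile)))
    misses = sum(1 for doc in sample if reviewer_is_responsive(doc))
    return misses / len(sample)

discard_pile = list(range(10_000))
known_misses = set(range(0, 10_000, 250))  # pretend ~0.4% slipped through
rate = elusion(discard_pile, lambda d: d in known_misses)
print(f"Estimated elusion: {rate:.1%}")
# If the estimate exceeds the agreed threshold, the sampled misses become
# new training examples and the system is retrained (another iteration).
```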

The obvious benefit of a predictive coding solution in the eDiscovery process is to dramatically reduce the time legal professionals spend reading each and every document to determine its responsiveness. A 2012 RAND Institute for Civil Justice report estimated a savings of 80% for the eDiscovery review process (which accounts for 73% of the total cost of eDiscovery) when using a predictive coding solution.

So, to answer the question, conceptual search is an automated information retrieval method which is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Predictive coding is a process (which can include conceptual search) that uses machine learning technologies to categorize (or code) an entire corpus of documents as responsive, non-responsive, or privileged, based on human-chosen examples used to train the system in an iterative process. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc.¹

¹ Partial definition from the eDiscovery Daily Blog: http://www.ediscoverydaily.com/2010/12/ediscovery-trends-what-the-heck-is-predictive-coding.html

Successful Predictive Coding Adoption is Dependent on Effective Information Governance


Predictive coding has been receiving a great deal of press lately (for good reason), especially with the ongoing case Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC) (AJP), 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012). On May 21, the plaintiffs filed Rule 72(a) objections to Magistrate Judge Peck’s May 7, 2012 discovery rulings related to the relevance of certain documents that comprise the seed set of the parties’ ESI protocol.

This Rule 72(a) objection highlights an important point about the adoption of predictive coding technologies: the technology is only as good as the people AND processes supporting it.

To review, predictive coding is a process in which a computer (with the requisite software) does the vast majority of the work of deciding whether data is relevant, responsive, or privileged in a given case.

Instead of simply matching keywords (byte for byte), predictive coding adopts a computer self-learning approach. To accomplish this, attorneys and other legal professionals provide example responsive documents/data in a statistically sufficient quantity, which in turn “trains” the computer as to what relevant documents/content should be flagged and set aside for discovery. This is done in an iterative process in which legally trained professionals fine-tune the seed set over time until it represents a statistically relevant sample that includes examples of all possible relevant content and formats. The same capability can also be used to find and secure privileged documents. Instead of legally trained people reading every document to determine whether it is relevant to a case, the computer can perform a first pass in a fraction of the time, with much more repeatable results. This technology is exciting in that it can reduce the cost of the discovery/review process by as much as 80%, according to the RAND Institute for Civil Justice.
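As a rough illustration of what “statistically sufficient” can mean, here is the standard sample-size formula for estimating a proportion, sketched in Python. The confidence level and margin of error shown are common choices for control sets, not values taken from the Da Silva Moore protocol.

```python
# A back-of-the-envelope sample-size calculation of the kind used to argue
# a seed/control set is "statistically sufficient." This is the standard
# formula for estimating a proportion: n = z^2 * p(1-p) / e^2.
import math

def sample_size(confidence_z=1.96, margin=0.02, p=0.5):
    """Docs to sample for a given confidence (z) and margin of error (e)."""
    return math.ceil(confidence_z**2 * p * (1 - p) / margin**2)

# 95% confidence (z = 1.96), +/-2% margin, worst-case prevalence p = 0.5:
print(sample_size())              # -> 2401 documents
# Tightening the margin to +/-1% roughly quadruples the sample:
print(sample_size(margin=0.01))   # -> 9604 documents
```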

By now you may be asking yourself: what does this have to do with Information Governance?…

For predictive coding to become fully adopted across the legal spectrum, all sides have to agree that (1) the technology works as advertised, and (2) the legal professionals are providing the system with the proper seed sets to learn from. To accomplish the second point, the seed set must include content from all possible sources of information. If the seed set trainers don’t have access to all potentially responsive content to draw from, then the seed set is in question.

Knowing where all the information resides and having the ability to retrieve it quickly is imperative to an effective discovery process. Records/information management professionals should view this new technology as an opportunity to become an even more essential partner to the legal department and the entire organization by focusing not just on “records” but on information across the entire enterprise. With full-fledged information management programs in place, the legal department will be able to fully embrace this technology and drastically reduce its cost of discovery.

Automatic Deletion…A Good Idea?


In my last blog, I discussed the concept of Defensible Disposal: getting rid of data that has no value, to lower the cost and risk of eDiscovery as well as overall storage costs (IBM has been a leader in Defensible Disposal for several years). Custodians keep data because they might need to reuse some of the content later or might have to produce it later for CYA reasons. I have been guilty of this over the years, and because of it I have a huge amount of old data on external disks that I will probably never, ever look at again. For example, I have over 500 GB of saved spreadsheets, presentations, PDFs, .wav files, MP3s, Word docs, URLs, etc. that I have saved for whatever reason over the years. Have I ever really reused any of that data? Maybe a couple of times, but in reality the files just sit there. This brings up the subject of the data lifecycle. Fred Moore, Founder of Horison Information Strategies, wrote about this concept years ago, referring to the lifecycle of data and the probability that saved data will ever be reused or even looked at again. Fred created a graphic showing this lifecycle of data.

Figure 1: The Lifecycle of Data – Horison Information Strategies

The chart shows that as data ages, the probability of reuse drops very quickly even as the amount of saved data rises. Once data has aged 90 days, its probability of reuse approaches 1%, and after one year it is well under 1%.

You’re probably asking yourself: so what? Storage is cheap. I have 500 GB of storage available on my new company-supplied laptop. I have share drives available to me. And I have 1 TB of storage in my home office. I can buy 1 TB of external disk for approximately $100, so why not keep everything forever?

For organizations, it’s partly a question of storage but, more importantly, a question of legal risk and the cost of eDiscovery. Any existing data could become the subject of litigation and therefore reviewable. You may recall from my last blog the recent RAND Institute for Civil Justice report on the costs of eDiscovery, including the estimate that reviewing records/files accounts for approximately 73% of every eDiscovery dollar spent. Saving everything because you might someday need to reuse or reference it drives the cost of eDiscovery way up.

The key question is: how do you get employees to delete stuff instead of keeping everything? In most organizations the culture has always been “save whatever you want until your hard disk and share drive are full.” This culture is extremely difficult to change quickly. One way is to force new behavior with technology. I know of a couple of companies that only allow files to be saved to a specific folder on the user’s desktop. When a laptop user syncs to the organization’s infrastructure, all files saved to that specific folder are copied to the user’s share drive, where an information management application applies retention policies to the data on the share drive as well as the laptop’s data folder.

In my opinion this extreme process would not work in most organizations due to cultural expectations. So again we’re left with the question: how do you get employees to delete stuff?

Organizational cultures around data handling and retention have to be changed over time. This includes specific guidance during new-employee orientation, employee training, and gradual technology changes. An example could be reducing the amount of storage available to an employee on the share or home drive.

Another example could be process changes to an employee’s workstation or laptop. Force the default storage target to be the “My Documents” folder. In Phase 1, all files must be saved to the “My Documents” folder but can then be moved anywhere after that.

Phase 2 could add a 90-day time limit on the “My Documents” folder, so that anything older than 90 days is automatically deleted (with litigation hold safeguards in place). Files not deemed important enough to move would thereby be treated as being of little value and “disposable.” Phase 3 could remove the ability to move files out of the “My Documents” folder (while letting users create subfolders with no time limit), thereby ensuring a single place of discoverable data.
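To show how simple the Phase 2 mechanics could be, here is a minimal Python sketch of the 90-day sweep. The folder path, the hold-list format, and the dry-run default are assumptions for illustration; a production tool would also write an audit log of every action to support defensibility.

```python
# A minimal sketch of the Phase 2 sweep described above: delete anything in
# a target folder older than 90 days unless it is under litigation hold.
# The folder path, hold-list format, and dry_run default are assumptions.
import time
from pathlib import Path

HOLD_LIST = {"merger_draft.docx", "kiewit_proposal.pdf"}  # hypothetical holds
MAX_AGE_SECONDS = 90 * 24 * 3600

def sweep(folder, dry_run=True):
    now = time.time()
    for path in Path(folder).rglob("*"):
        if not path.is_file() or path.name in HOLD_LIST:
            continue  # skip folders and anything under legal hold
        if now - path.stat().st_mtime > MAX_AGE_SECONDS:
            print(f"{'Would delete' if dry_run else 'Deleting'}: {path}")
            if not dry_run:
                path.unlink()

sweep(Path.home() / "Documents")  # dry run first; audit before enforcing
```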

Again, this strategy needs to be a slow progression to minimize the perceived changes to the user population.

The point is that this is an end-user problem, not necessarily an IT problem. End users have to be trained, gently pushed, and eventually forced to get rid of useless data…

Defensible Disposal and Predictive Coding Reduce (?) eDiscovery Costs by 65%


Following Judge Peck’s February 2012 decision on predictive coding, another Judge has gone in the same direction. In Global Aerospace Inc., et al. v. Landow Aviation, L.P. dba Dulles Jet Center, et al. (April 23, 2012), Judge Chamblin, a state judge in Virginia’s 20th Judicial Circuit (Loudoun Circuit Court), wrote:

“Having heard argument with regard to the Motion of Landow Aviation Limited Partnership, Landow Aviation I, Inc., and Landow & Company Builders, Inc., pursuant to Virginia Rules of Supreme Court 4:1 (b) and (c) and 4:15, it is hereby ordered Defendants shall be allowed to proceed with the use of predictive coding for the purposes of the processing and production of electronically stored information.”

This decision came despite the plaintiffs’ objection that the technology is not as effective as purely human review.

It also comes on top of a new RAND Institute for Civil Justice report that highlights a couple of important points. First, the report estimated that $0.73 of every dollar spent on eDiscovery can be attributed to the “review” task. RAND also called out a study showing an 80% time savings in attorney review hours when predictive coding was utilized.

Together, these figures suggest that the use of predictive coding could, optimistically, reduce an organization’s eDiscovery costs by 58.4% (80% of the 73% review share).

The barriers to the adoption of predictive coding technology are (still):

  • Outside counsel may be slow to adopt the technology due to the possibility of losing a large revenue stream
  • Outside and internal counsel will be hesitant to rely on new technology without a track record of success
  • The need for additional guidance from Judges

These barriers will be overcome relatively quickly.

Let’s take this cost-saving projection further. In my last blog I talked about “Defensible Disposal,” in other words, getting rid of old data not needed by the business. It is estimated that the cost of review can be reduced by 50% simply by utilizing an effective information governance program. Applying the Defensible Disposal strategy brings the $0.73 of every eDiscovery dollar spent on review down to $0.365.

Now, if predictive coding can reduce the remaining review cost by 80%, as suggested in the RAND report, then between the two strategies a total eDiscovery savings of approximately 65.7% could be achieved. To review, let’s look at the math.

Starting point: $0.73 of every eDiscovery dollar is attributed to the review process.

Applying the 50% saving from Defensible Disposal brings the cost of review down to $0.365.

Applying the additional 80% review savings from predictive coding, we get:

$0.365 * 0.2 (1 - 0.8) = $0.073 (the total cost of review after savings from both strategies)

To finish the calculation, we add back the costs not related to review (processing and collection), which amount to $0.27:

Total cost of eDiscovery = $0.073 + $0.27 = $0.343, for a savings of $1.00 - $0.343 = 0.657, or 65.7%.
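For those who prefer the arithmetic in runnable form, here is the same calculation as a short Python sketch, using the figures cited above.

```python
# The arithmetic above, as a sketch. The 0.73 review share and the 80%
# predictive-coding saving come from the RAND figures cited; the 50%
# defensible-disposal saving is the estimate quoted earlier in this post.
review_share = 0.73                        # review's share of each eDiscovery dollar
other_share = 1 - review_share             # collection + processing = 0.27

after_disposal = review_share * 0.5        # 50% cut from Defensible Disposal -> 0.365
after_coding = after_disposal * (1 - 0.8)  # 80% cut from predictive coding -> 0.073

total_cost = after_coding + other_share    # 0.073 + 0.27 = 0.343
print(f"Savings: {1 - total_cost:.1%}")    # -> 65.7%
```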

As with any estimate, your mileage may vary, but this exercise points out the potential cost savings from utilizing just two strategies: Defensible Disposal and predictive coding.

Information Management Cost Reduction Strategies for Litigation


In these still-uncertain economic times, most legal departments are looking for ways to reduce, or at least stop the growth of, their legal budgets. One of the most obvious targets for cost reduction in any legal department is the cost of responding to eDiscovery, including the cost of finding all potentially responsive ESI, culling it down, and then having in-house or external attorneys review it for relevance and privilege. Per a CGOC survey, the average GC spends approximately $3 million per discovery to gather and prepare information for opposing counsel in litigation.

Most organizations are looking for ways to reduce these growing costs of eDiscovery. The top four cost reduction strategies legal departments are considering are:

  • Bring more evidence analysis and ESI processing in house
  • Keep more of the review of ESI in house rather than utilizing outside law firms
  • Look at offshore review
  • Pressure external law firms for lower rates

I don’t believe these strategies address the real problem, the huge and growing amount of ESI.

Several eDiscovery experts have told me that the average eDiscovery matter can include between 2 and 3 GB of potentially responsive ESI per employee. To put that in context, 1 GB of data can contain between 10,000 and 75,000 pages of content. Multiply that by 3 and you are potentially looking at between 30,000 and 225,000 pages of content that should be reviewed for relevancy and privilege, per employee. Now consider that litigation and eDiscovery usually involve more than one employee, ranging from two to hundreds.
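To put rough numbers on that, here is a quick Python sketch combining the per-custodian figures above with the $18,000-per-GB review estimate cited in an earlier post; the custodian count is purely hypothetical.

```python
# A quick sketch of the per-custodian numbers above, combined with the
# RAND-cited $18,000-per-GB review figure from the earlier post. The
# custodian count is purely illustrative.
GB_PER_CUSTODIAN = 3
PAGES_PER_GB = (10_000, 75_000)
COST_PER_GB = 18_000  # USD, from the RAND review-cost estimate cited earlier

custodians = 25  # hypothetical mid-sized matter
low, high = (p * GB_PER_CUSTODIAN * custodians for p in PAGES_PER_GB)
print(f"Pages to review: {low:,} to {high:,}")
print(f"Review cost: ${GB_PER_CUSTODIAN * custodians * COST_PER_GB:,}")
# -> Pages to review: 750,000 to 5,625,000
# -> Review cost: $1,350,000
```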

It seems to me the most straightforward, common-sense way to reduce eDiscovery costs is to proactively manage the information that could be pulled into an eDiscovery matter.

To illustrate this proactive information management strategy for eDiscovery, we can look at the overused but still appropriate DuPont case study from several years ago.

DuPont re-examined nine cases and determined that it had reviewed a total of 75,450,000 pages of content across them, of which 11,040,000 pages turned out to be responsive. DuPont also looked at where those 75 million pages stood in its records management process. It found that approximately 50% of the pages were beyond their documented retention period and should have been destroyed, never to be reviewed for any of the nine cases. DuPont calculated it had spent $11,961,000 reviewing this content. In other words, it spent $11.9 million reviewing documents that should not have existed if its records retention schedule and policy had been followed.
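For readers who like the numbers restated, here is a short Python sketch of the DuPont arithmetic; the per-page cost is a derived figure, not one quoted in the case study.

```python
# The DuPont figures above, restated. The per-page cost is derived from the
# numbers in the case study; everything else is straight from the text.
total_pages = 75_450_000
responsive = 11_040_000
expired_share = 0.50
wasted_review_cost = 11_961_000  # USD spent reviewing expired content

expired_pages = int(total_pages * expired_share)
print(f"Responsiveness rate: {responsive / total_pages:.1%}")               # ~14.6%
print(f"Pages past retention: {expired_pages:,}")                           # 37,725,000
print(f"Cost per expired page: ${wasted_review_cost / expired_pages:.3f}")  # ~$0.317
```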

An information management program, besides capturing and making ESI available for use, includes the defensible deletion of ESI that has reached the end of its retention period and is therefore valueless to the organization.

Corporate counsel should be the biggest proponents of information governance in their organizations simply due to the fact that it affects their budgets directly.