Dark Data Archiving…Say What?



In a recent blog titled “Bring your dark data out of the shadows”, I described what dark data is and why it’s important to manage it. To review, the reasons to manage it were:

  1. It consumes costly storage space
  2. It consumes IT resources
  3. It masks security risks
  4. And it drives up eDiscovery costs

For the clean-up of dark data (remediation), many people, including myself, have suggested that the process should include determining what you really have, determining what can be disposed of immediately (obvious items such as duplicates and expired content), categorizing the rest, and moving the remaining categorized content into information governance systems.

But many “conservative” minded people (like many General Counsel) hesitate at the actual deletion of data, even after they have spent the resources and dollars to identify potentially disposable content. The reasoning usually centers on the fear of destroying information that could be potentially relevant in litigation. A prime example is seen in the Arthur Andersen case where a Partner famously sent an email message to employees working on the Enron account, reminding them to “comply with the firm’s documentation and retention policy”, or in other words – get rid of stuff. Many GCs don’t want to be put in the position of rightfully disposing of information per policy and having to explain later in court why potentially relevant information was disposed of…

For those that don’t want to take the final step of disposing of data, the question becomes “so what do we do with it?” This reminds me of a customer I was dealing with years ago. The GC for this 11,000 person company, a very distinguished looking man, was asked during a meeting that included the company’s senior staff, what the company’s information retention policy was. He quickly responded that he had decided that all information (electronic and hardcopy) from their North American operations would be kept for 34 years. Quickly calculating the company’s storage requirements over 34 years with 11,000 employees, I asked him if he had any idea what his storage requirements would be at the end of 34 years. He replied no and asked what the storage requirements would be. I replied it would be in the petabytes range and asked him if he understood what the cost of storing that amount of data would be and how difficult it would be to find anything in it.

He smiled and replied, “I’m retiring in two years, I don’t care.”

The moral of that actual example is that if you have decided to keep large amounts of electronic data for long periods of time, you have to consider the cost of storage as well as how you will search it for specific content when you actually have to.
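To make the storage question concrete, here is a minimal back-of-the-envelope sketch. The employee count and retention period come from the story above; the per-employee data volume (5 GB per year) is purely an illustrative assumption of mine, not a figure from that meeting, so substitute your own number.

```python
# Rough storage projection for a "keep everything" retention policy.

EMPLOYEES = 11_000             # from the example above
RETENTION_YEARS = 34           # from the example above
GB_PER_EMPLOYEE_PER_YEAR = 5   # ASSUMPTION for illustration only


def retained_storage_gb(employees, years, gb_per_employee_per_year):
    """Total data held at the end of the period if nothing is ever deleted."""
    return employees * years * gb_per_employee_per_year


total_gb = retained_storage_gb(EMPLOYEES, RETENTION_YEARS, GB_PER_EMPLOYEE_PER_YEAR)
total_pb = total_gb / 1_000_000  # decimal units: 1 PB = 1,000,000 GB

print(f"{total_gb:,} GB retained, about {total_pb:.2f} PB")
```

Even with a modest assumed growth rate and no increase in headcount, the total lands in the petabyte range, and every one of those gigabytes also has to be searchable.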

In the example above, the GC was planning on storing it on spinning disk, which is costly. Others I have spoken to have decided that the most cost-effective way to store large amounts of data for long periods of time is to keep backup tapes. It’s true that backup tapes are relatively cheap (compared to spinning disk), but it is difficult to get anything off of them, they have a relatively high failure rate (again compared to spinning disk), and they have to be rewritten every so many years because backup tapes slowly lose their data over time.

A potential solution is moving your dark data to long term hosted archives. These hosted solutions can securely hold your electronically stored information (ESI) at extremely low costs per gigabyte. When needed, you can access your archive remotely and search and move/copy data back to your site.

An important factor to look for (for eDiscovery) is that moving, storing, indexing and recovering data from the hosted archive must not alter the metadata in any way. This is especially important when responding to a discovery request.
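One way to spot-check that requirement is to fingerprint a file before it goes into the archive and again after it comes back. The sketch below is only an illustration using the Python standard library; the function names are hypothetical, and a real archive service may expose its own verification features that should be preferred.

```python
import hashlib
import os


def fingerprint(path):
    """Return (SHA-256 of content, size in bytes, last-modified time in whole seconds)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MB chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    info = os.stat(path)
    return digest.hexdigest(), info.st_size, int(info.st_mtime)


def unchanged(before, after):
    """True when the content hash, size and modified time all match."""
    return before == after
```

Record `fingerprint(path)` before the move, run it again on the recovered copy, and compare the two with `unchanged`. This only covers content and basic file system metadata; application-level metadata (email headers, document properties) would need its own checks.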

For those of you considering starting a dark data remediation project, consider long term hosted archives as a staging target for that data your GC just won’t allow to be disposed of.


Discoverable versus Admissible; aren’t they the same?


This question comes up a lot, especially from non-attorneys. The thought is that if something is discoverable, then it must be admissible; the assumption being that a Judge will not allow something to be discovered if it can’t be used in court. The other thought is that everything is discoverable if it pertains to the case and therefore everything is admissible.

Let’s first address what’s discoverable. For good cause, the court may order discovery of any matter (content) that is not privileged and is relevant to the subject matter involved in the action. In layman’s terms, if it is potentially relevant to the case, you may have to produce it in discovery; in other words, anything and everything is potentially discoverable. All discovery is subject to the limitations imposed by FRCP Rule 26(b)(2)(C).

With that in mind, let’s look at the subject of admissibility.

In Lorraine v. Markel Am. Ins. Co., 241 F.R.D. 534, 538 (D. Md. 2007), the court started with the premise that the admissibility of ESI is determined by a collection of evidence rules “that present themselves like a series of hurdles to be cleared by the proponent of the evidence”. “Failure to clear any of these evidentiary hurdles means that the evidence will not be admissible”. Whenever ESI is offered as evidence, five evidentiary rules need to be considered. To be admitted, the ESI must be shown to be evidence that:

  • is relevant to the case
  • is authentic
  • is not hearsay pursuant to Federal Rule of Evidence 801
  • is an original or duplicate under the original writing rule
  • has probative value that is not substantially outweighed by the danger of unfair prejudice or one of the other factors identified by Federal Rule of Evidence 403; if it is so outweighed, it should be excluded despite its relevance.

Hearsay is defined as a statement made out of court that is offered in court as evidence to prove the truth of the matter asserted. Hearsay comes in many forms including written or oral statements or even gestures.

It is the Judge’s job to determine if evidence is hearsay or credible. There are three evidentiary rules that help the Judge make this determination:

  1. Before being allowed to testify, a witness generally must swear or affirm that his or her testimony will be truthful.
  2. The witness must be personally present at the trial or proceeding in order to allow the judge or jury to observe the testimony firsthand.
  3. The witness is subject to cross-examination at the option of any party who did not call the witness to testify.

The Federal Rules of Evidence Hearsay Rule prohibits most statements made outside of court from being used as evidence in court. Looking at the three evidentiary rules mentioned above – usually a statement made outside of the courtroom is not made under oath, the person making the statement outside of court is not present to be observed by the Judge, and the opposing party is not able to cross examine the statement maker. This is not to say all statements made outside of court are inadmissible. The Federal Rule of Evidence 801 does provide for several exclusions to the Hearsay rule.

All content is discoverable if it potentially is relevant to the case and not deemed privileged, but discovered content may be ruled inadmissible if it is deemed privileged (doctor/patient communications), unreliable or hearsay. You may be wondering how an electronic document can be considered hearsay. The hearsay rule refers to “statements”, which can be either written or oral. So, as with paper documents, in order to determine whether the content of an electronic document is hearsay or fact, the author of the document must testify under oath and submit to cross-examination in order to determine whether the content is fact and can stand as evidence.

This legal argument between fact and hearsay does not relieve the discoveree from finding, collecting and producing all content that could be relevant to the case.

Next Generation Technologies Reduce FOIA Bottlenecks


Federal agencies are under more scrutiny to resolve issues with responding to Freedom of Information Act (FOIA) requests.

The Freedom of Information Act provides for the full disclosure of agency records and information to the public unless that information is exempted under clearly delineated statutory language. In conjunction with FOIA, the Privacy Act serves to safeguard public interest in informational privacy by delineating the duties and responsibilities of federal agencies that collect, store, and disseminate personal information about individuals. The procedures established ensure that the Department of Homeland Security fully satisfies its responsibility to the public to disclose departmental information while simultaneously safeguarding individual privacy.

In February of this year, the House Oversight and Government Reform Committee opened a congressional review of executive branch compliance with the Freedom of Information Act.

The committee sent a six-page letter to the Director of Information Policy at the Department of Justice (DOJ), Melanie Ann Pustay. In the letter, the committee questions why, based on a December 2012 survey, 62 of 99 government agencies have not updated their FOIA regulations and processes, as required by Attorney General Eric Holder in a 2009 memorandum. In fact, the Attorney General’s own agency has not updated its regulations and processes since 2003.

The committee also pointed out that there were 83,000 FOIA requests still outstanding as of the writing of the letter.

In fairness to the federal agencies, responding to a FOIA request can be time-consuming and expensive if technology and processes are not keeping up with increasing demands. Electronic content can be anywhere, including email systems, SharePoint servers, file systems, and individual workstations. Because content is spread around and not usually centrally indexed, enterprise-wide searches do not turn up all potentially responsive content. As a result, agencies fall back on a much more manual, time-consuming process to find relevant content.

There must be a better way…

New technology can address the collection problem of searching for relevant content across the many storage locations where electronically stored information (ESI) can reside. For example, an enterprise-wide search capability with “connectors” into every data repository (email, SharePoint, file systems, ECM systems, records management systems) allows all content to be centrally indexed, so that an enterprise-wide keyword search will find all instances of content with those keywords present. A more powerful capability to look for is the ability to search on concepts, a far more accurate way to search for specific content. Searching for conceptually comparable content can speed up the collection process and drastically reduce the number of false positives in the results set, while finding many more of the keyword-deficient but conceptually responsive records. In conjunction with concept search, automated classification/categorization of data can reduce search time and raise accuracy.
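As a rough illustration of what “centrally indexed” means, the sketch below builds a tiny keyword index over content pulled from several repositories. It is a toy under stated assumptions: the repository-style document IDs and sample text are invented, and real enterprise search products handle connectors, file formats, security and concept search in ways not shown here.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical content gathered by connectors into three repositories.
documents = {
    "email:1001": "Budget request for the records retention project",
    "sharepoint:42": "Retention schedule draft and budget notes",
    "fileshare:7": "Holiday party planning",
}
index = build_index(documents)
print(sorted(search(index, "retention budget")))
```

A single query against the one index reaches the email and SharePoint items at once, which is the point of indexing centrally. Note that a plain keyword index like this one would miss a responsive record that never uses the query words, which is the gap concept search is meant to close.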

The largest cost in responding to a FOIA request is the review of all potentially relevant ESI found during collection. Predictive Coding, a technology attorneys currently use for eDiscovery, can drastically reduce the burden of reviewing thousands, hundreds of thousands or millions of documents for relevancy and privacy.

Predictive Coding is the process of applying machine learning and iterative supervised learning technology to automate document coding and prioritize review. This functionality dramatically expedites the actual review process while dramatically improving accuracy and reducing the risk of missing key documents. According to a RAND Institute for Civil Justice report published in 2012, document review cost savings of 80% can be expected using Predictive Coding technology.

With the increasing number of FOIA requests swamping agencies, agencies are hard pressed to catch up to their backlogs. The next generation technologies mentioned above can help agencies reduce their FOIA related costs while decreasing their response time.

Coming to Terms with Defensible Disposal; Part 1


Last week at LegalTech New York 2013 I had the opportunity to moderate a panel titled “Defensible Disposal: If it doesn’t exist, I don’t have to review it…right?” with an impressive roster of panelists. They included Bennett Borden, Partner, Chair eDiscovery & Information Governance Section, Williams Mullen; Clifton C. Dutton, Senior Vice President, Director of Strategy and eDiscovery, American International Group; John Rosenthal, Chair, eDiscovery and Information Management Practice, Winston & Strawn; and Dean Gonsowski, Associate General Counsel, Recommind Inc.

During the panel session it was agreed that organizations have been over-retaining ESI (which accounts for at least 95% of all data in organizations) even when it’s no longer needed for business or legal reasons. Another factor driving this over-retention of ESI was the fear of inadvertently deleting evidence, otherwise called spoliation. In fact, an ESG survey published in December of 2012 showed that the “fear of the inability to furnish data requested as part of a legal or regulatory matter” was the highest ranked reason organizations chose not to dispose of ESI.

Other reasons cited included not having defined policies for managing and disposing of electronic information and, conversely, having defined retention policies that actually call for keeping all data indefinitely (usually because of the fear of spoliation).

One of the principal information governance gaps most organizations haven’t yet addressed is the difference between “records” and “information”. Many organizations have “records” retention/disposition policies to manage those official company records required to be retained under regulatory or legal requirements. But those documents and files that fall under legal hold and regulatory requirements amount to approximately 6% of an organization’s retained electronic data (1% legal hold and 5% regulatory).

Another interesting survey published by Kahn Consulting in 2012 showed levels of employee understanding of their information governance-related responsibilities. In this survey only 21% of respondents had a good idea of what information needed to be retained/deleted and only 19% knew how information should be retained or disposed of. In that same survey, only 15% of respondents had a general idea of their legal hold and eDiscovery responsibilities.

The above surveys highlight the fact that organizations aren’t disposing of information in a systematic process mainly because they aren’t managing their information, especially their electronic information and therefore don’t know what information to keep and what to dispose of.

An effective defensible disposal process is dependent on an effective information governance process. To know what can be deleted and when, an organization has to know what information needs to be kept and for how long based on regulatory, legal and business value reasons.

Over the coming weeks, I will address those defensible disposal questions and responses the LegalTech panel discussed. Stay tuned…

Information Management Cost Reduction Strategies for Litigation


In these still questionable economic times, most legal departments are still looking for ways to reduce, or at least stop the growth of, their legal budgets. One of the most obvious targets for cost reduction in any legal department is the cost of responding to eDiscovery, including the cost of finding all potentially responsive ESI, culling it down and then having in-house or external attorneys review it for relevance and privilege. Per a CGOC survey, the average GC spends approximately $3 million per discovery to gather and prepare information for opposing counsel in litigation.

Most organizations are looking for ways to reduce these growing costs of eDiscovery. The top four cost reduction strategies legal departments are considering are:

  • Bring more evidence analysis and ESI processing in house
  • Keep more of the review of ESI in house rather than using outside law firms
  • Look at off-shore review
  • Pressure external law firms for lower rates

I don’t believe these strategies address the real problem, the huge and growing amount of ESI.

Several eDiscovery experts have told me that the average eDiscovery matter can include between 2 and 3 GB of potentially responsive ESI per employee. Now, to put that in context, 1 GB of data can contain between 10,000 and 75,000 pages of content. Multiply that by 3 and you are potentially looking at between 30,000 and 225,000 pages of content that should be reviewed for relevancy and privilege per employee. Now consider that litigation and eDiscovery usually includes more than one employee…ranging from two to hundreds.
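The arithmetic above is easy to rerun for any matter size. This small sketch uses the pages-per-gigabyte range quoted above; the ten-custodian matter in the second call is a hypothetical example of mine.

```python
# Page-count range for review, using the 10,000 to 75,000 pages-per-GB range above.
PAGES_PER_GB_LOW = 10_000
PAGES_PER_GB_HIGH = 75_000


def review_pages(custodians, gb_per_custodian):
    """Return the (low, high) estimate of pages to review for a matter."""
    total_gb = custodians * gb_per_custodian
    return total_gb * PAGES_PER_GB_LOW, total_gb * PAGES_PER_GB_HIGH


print(review_pages(1, 3))    # one employee with 3 GB: (30000, 225000)
print(review_pages(10, 3))   # hypothetical ten-custodian matter: (300000, 2250000)
```

The estimate scales linearly with both the number of custodians and the data held per custodian, which is why reducing what is retained in the first place has such a direct effect on review volume.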

It seems to me the most straightforward, common-sense way to reduce eDiscovery costs is to proactively manage the information that could be pulled into an eDiscovery matter.

To illustrate this proactive information management strategy for eDiscovery, we can look at the overused but still appropriate DuPont case study from several years ago.

DuPont re-looked at nine cases. They determined that they had reviewed a total of 75,450,000 pages of content in those nine cases. A total of 11,040,000 pages turned out to be responsive to the cases. DuPont also looked at where these 75 million pages of content stood in their records management process. They found that approximately 50% of those 75 million pages were beyond their documented retention period and should have been destroyed and never reviewed for any of the nine cases. They also calculated that they spent $11,961,000 reviewing this content. In other words, they spent $11.9 million reviewing documents that should not have existed if their records retention schedule and policy had been followed.
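The ratios implied by those figures can be worked out directly. In the sketch below the four inputs are the numbers quoted above; the responsive rate and the per-page cost are my own derived values, not figures reported in the case study, and the 50% share is only approximate.

```python
# Figures quoted in the DuPont case study above.
pages_reviewed = 75_450_000
pages_responsive = 11_040_000
share_past_retention = 0.50            # "approximately 50%"
cost_reviewing_expired = 11_961_000    # dollars spent on past-retention content

# Derived values (not stated in the case study).
responsive_rate = pages_responsive / pages_reviewed
expired_pages = pages_reviewed * share_past_retention
cost_per_expired_page = cost_reviewing_expired / expired_pages

print(f"Responsive share of reviewed pages: {responsive_rate:.1%}")
print(f"Pages past retention: {expired_pages:,.0f}")
print(f"Implied review cost per expired page: ${cost_per_expired_page:.2f}")
```

Put differently, under these figures well under a fifth of what was reviewed turned out to be responsive, while roughly half of it should not have existed at all.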

An information management program, besides capturing and making ESI available for use, includes the defensible deletion of ESI that has reached the end of its retention period and therefore is valueless to the organization.

Corporate counsel should be the biggest proponents of information governance in their organizations simply due to the fact that it affects their budgets directly.

Can you wipe your twitter ramblings, and should you?


In December of 2011, the Library of Congress and Twitter signed an agreement that will eventually make available every public Tweet ever sent as an archive to the Library of Congress.

While writing a blog post last week, I began to wonder how long all my twitter postings would be available and who could look at them. For the fun of it, I went back through approximately 6 months of my old twitter postings, re-tweets and replies (yes you can do it, it’s relatively easy and you can look at anyone’s).

I’ve been pretty good about keeping my twitter posts “business-like” and have steered away from personal stuff like “I just checked in to the Ramada Inn on route 11…can’t wait for the evening to begin!”, or “does anyone know how to setup an off-shore bank account?” or “those jerks over at Company ABC are a bunch of losers”. But many tweeters aren’t so disciplined and have posted stuff that could come back to haunt them later. I could imagine a prospective employer reviewing a candidate’s twitter history or, even worse, an attorney conducting research for a case using the public twitter archives to create a timeline.

With that in mind, could you delete your twitter postings and should you? Twitter does allow you to delete specific tweets one at a time but as far as I can determine, Twitter does not give you the ability to delete your entire twitter history short of deactivating your account. From the Twitter website:

How To Delete a Tweet

If you’ve posted something that you’d rather take back, you can remove it easily. When you hover over your Tweet while viewing your home or profile page, you’ll see a few options appear below the message.

To delete one of your Twitter updates:

  1. Log in to Twitter.com
  2. Visit your Profile page
  3. Locate the Tweet you want to delete
  4. Hover your mouse over the message (as shown below), and click the “Delete” option that appears

Voila! Gone forever… almost. Deleted updates sometimes hang out in Twitter search. They will clear with time.

We do not provide a way to bulk delete Tweets. If you’re looking to get a “fresh start” on your Twitter account without losing your username, the best way to do this is to create a temporary account with a temporary username, and then switch the username between your current account and the temporary account. Please see our article on How to Change Your Username for more info. 

On December 30, 2011, CNET published a story titled “How to delete all your tweets” which highlighted a product called TwitWipe. TwitWipe is a free tool that allows you to delete ALL your past tweets in one fell swoop. This may be handy because you can clean out your twitter account and start fresh without changing your username and dumping all your hard won followers.

This is an interesting capability, but I think the more important question is why you would take so drastic a step. The four most obvious reasons one would want to delete all their twitter postings and start fresh would be:

  1. You went through an unfortunate period in your life that you would rather forget
  2. You were regularly conducting criminal activities through your Twitter account
  3. You are considering a run for the presidency
  4. For whatever reason, you don’t want your twitter postings archived and available at the Library of Congress

The ability to delete ESI can be dangerous if used at the wrong time, especially if civil litigation is anticipated. Deleting a single tweet or every tweet you have ever posted can be construed as destruction of evidence if those tweets could have been relevant in litigation. ESI, no matter its format or where it’s stored, is potentially evidence and should at least be considered when protecting ESI under a litigation hold. Attorneys on both sides need to include social media content like twitter postings in their eDiscovery plans and be sure to warn all custodians against deleting or editing social media content once litigation is anticipated.