Do organizations really have formal information disposal processes…I think NOT!

Do organizations regularly dispose of information in a systematic, documented manner? If the answer is “sure we do”, do they do it via a standardized and documented process or “just leave it to the employees”?

If they don’t…who cares – storage is cheap!

When I ask customers if they have a formal information disposal process, 70 to 80 percent of the time the customer will answer “yes”, but when I press them on their actual process, I almost always hear one of the following:

1.    We have mailbox limits, so employees have to delete emails when they reach their mailbox limit
2.    We tell our employees to delete content after 1, 2, or 3 years
3.    We store our records (almost always paper) at Iron Mountain and regularly send deletion requests

None of these answers rises to the level of an information governance and disposal process. Mailbox limits only force employees into stealth archiving, i.e., moving content out of the organization’s direct control. Instructing employees to delete information without enforcement and auditing is as good as not telling them to do anything at all. And storing paper records at Iron Mountain does not address the electronic data that makes up 95%+ of the information residing in organizations.

Data center storage is not cheap. Sure, I can purchase 1 TB of external disk at a local electronics store for $150, but that 1 TB is not equivalent to 1 TB of storage in a corporate data center. The $150 also doesn’t include annual support agreements, the cost of allocated floor space, the cost of power and cooling, or IT resource overhead, including nightly backups. Besides, storage is not the biggest cost facing organizations that don’t actively manage their information.

The astronomical costs arise when considering litigation and eDiscovery. A recent RAND survey found that it can cost $18,000 to review 1 GB of information for eDiscovery. Considering that many legal cases involve the collection and review of terabytes of information, the cost per case can easily run into the millions of dollars.
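To put that per-gigabyte figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the $18,000 per GB review cost applies uniformly, that 1 TB is 1,000 GB, and that nothing is culled before review. The function name and sample volumes are illustrative, not figures from the survey.

```python
# Rough estimate only: assumes a flat review cost per GB and no culling.
REVIEW_COST_PER_GB = 18_000  # dollars, the figure cited above
GB_PER_TB = 1_000            # decimal terabyte (an assumption)


def review_cost(terabytes: float) -> float:
    """Return the estimated cost in dollars to review the given volume."""
    return terabytes * GB_PER_TB * REVIEW_COST_PER_GB


# Sample volumes are hypothetical.
for tb in (0.5, 1, 2):
    print(f"{tb} TB -> ${review_cost(tb):,.0f}")
```

Under these assumptions even a single terabyte works out to $18 million, which is why reducing the volume of data before a lawsuit ever arrives matters so much.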

So what’s the answer? First, don’t assume information is cheap to keep. Data center storage and IT resources are not inexpensive, take human resources to keep up and running, and consume floor space. Second, information has legal risk and cost associated with it. The collection and review of information for responsiveness is time consuming and expensive. The legal risks associated with unmanaged information can be even more costly. Imagine your organization is sued. One of the first steps in responding to the suit is to find and secure all potentially responsive data. What would happen if you didn’t find all relevant data and it was later discovered you didn’t turn over some information that could have helped the other side’s case? The judge can overturn an already decided case, issue an adverse inference, assign penalties, etc. The withholding or destruction of evidence is never good and always costs the losing side a lot more.

The best strategy is to put policies, processes and automation in place to manage all electronic data as it is created and to dispose of data that is no longer required. One solution is to put categorization software in place to index, understand and categorize content in real time by the conceptual meaning of the content. Sophisticated categorization can also find, tag and automatically dispose of information that doesn’t need to be kept anymore. Given the amount of information created daily, automating the process is the only definitive way to answer “yes, we have a formal information disposal process”.


Defensible Disposal means never being accused of spoliation for hosting “Shred Days”

U.S. District Judge Ronald Whyte in San Jose reversed his own prior ruling from a 2009 case where he issued a judgment against SK Hynix, awarding Rambus Inc. $397 million in a patent infringement case. In his reversal this month, Judge Whyte ruled that Rambus Inc. had spoliated documents in bad faith when it hosted company-wide “shred days” in 1998, 1999, and 2000. Judge Whyte found that Rambus could have reasonably foreseen litigation against Hynix as early as 1998, and that Rambus therefore engaged in willful spoliation during the three “shred days” (note that a finding of spoliation can also be based on inadvertent destruction of evidence). Because of this recent spoliation ruling, the Judge reduced the prior Rambus award from $397 million to $215 million, a cost to Rambus of $182 million.

Two questions come to mind in this case: 1) why did Rambus see the need to hold “shred days”, and 2) did they have an information governance policy and defensible disposal process? As a matter of definition, defensible disposal is the process (manual or automated) of disposing of unneeded or valueless data in a way that will stand up in court as reasonable and consistent.

The obvious answer to the second question is probably not, or if yes, it wasn’t being followed; otherwise, why the need for the shred days? Even assuming that Rambus was not knowingly destroying evidence, the term “shred days” still has a somewhat negative connotation. I would think corporate attorneys would instruct all custodians within their companies that the term “shred” should be used sparingly or not at all in communications because of the questionable implications.

The term “Shred days” reminds many of the Arthur Andersen partner who so famously sent an email message to employees working on the Enron account, reminding them to “comply with the firm’s documentation and retention policy”. The Andersen partner never ordered the destruction or shredding of evidence but because anticipation of future litigation was potentially obvious, the implication in her email was “get rid of suspect stuff”. The timing of the email message was also suspect in that just 21 minutes separated Ms. Temple’s e-mail message to Andersen employees on the Enron account about the importance of complying with the firm’s document retention policy from an entry in a record of her current projects in which she wrote that she was working on a case involving potential violations of federal securities laws.

The Rambus case highlights the need for a true information governance process including a truly defensible disposal strategy. An information governance process would have been capturing, indexing, applying retention policies, protecting content on litigation hold and disposing of content beyond the retention schedule and not on legal hold… automatically, based on documented and approved legally defensible policies. A documented and approved process that is religiously followed and has proper safeguards goes a long way with the courts toward showing a good faith intent to manage content and protect any content subject to anticipated litigation.
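The disposal rule described above (dispose only of content that is past its retention period and not on legal hold, and document each disposal) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular product works: the record fields, the category names and the retention periods are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    category: str
    created: date
    on_legal_hold: bool = False


# Hypothetical retention schedule: category -> retention period in days.
RETENTION_DAYS = {"invoice": 7 * 365, "marketing": 2 * 365, "transitory": 90}


def is_disposable(record: Record, today: date) -> bool:
    """True only if the record is past retention and not on legal hold."""
    if record.on_legal_hold:
        return False  # a legal hold always overrides the schedule
    days = RETENTION_DAYS.get(record.category)
    if days is None:
        return False  # unknown category: keep until it is classified
    return today > record.created + timedelta(days=days)


def run_disposal(records, today):
    """Return (kept, audit_log); audit_log documents each disposal."""
    kept, audit_log = [], []
    for r in records:
        if is_disposable(r, today):
            audit_log.append(
                f"{today.isoformat()} disposed {r.record_id} ({r.category})"
            )
        else:
            kept.append(r)
    return kept, audit_log
```

The audit log is the point of the exercise: it is the evidence that disposal followed the documented policy rather than an ad hoc “shred day”.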

Knowledge Management is Dependent on Effective Information Governance

Last week I presented at the Janders Dean Legal Knowledge & Innovation Conference in Sydney, Australia. This conference is one of the leading knowledge management and technology forums for the legal industry in the world. The forum was extremely interesting with a great venue and agenda.

Much of the content was directed at knowledge management within law firms and corporate legal departments, i.e., how knowledge is created, collected, and shared within these organizations for maximum benefit and ROI.

The whole event was somewhat hair-raising for me in that I found out I was to travel to Sydney to speak at this forum the Thursday before the Monday I was to leave. It occurred to me on the Saturday before that I was to present at this forum and I had no idea what I was to speak on, much less the time to create an effective presentation. After looking at the agenda online I determined that 1) it was for the legal industry and 2) knowledge management was somehow involved.

That Saturday and Sunday I put together a presentation addressing what I thought would add to the discussion, which included eDiscovery, Information Governance (because it’s the same as knowledge management, right?) and some local Australian precedents. As I landed in San Francisco on Monday to catch my flight to Sydney I noticed an email from the Janders Dean organizer asking me for my presentation so the forum laptop could be loaded and ready to go with all presentations. Thinking that for once I was ahead of the curve, I happily replied to the email with my presentation.

Dreading the 15-hour flight in “Economy”, I noticed the departure board at the airport was now saying my 10:30 pm flight to Sydney was delayed for 11 hours due to weather and would take off at 9 am Tuesday morning (by the way, as I boarded the next day, the crew admitted it was not weather, but an equipment problem in Chicago). As I was furiously burning up my laptop keyboard looking for a room for the night I got a very nice email from Janders Dean telling me the presentation I had sent off really didn’t hit the mark and was much too eDiscovery heavy…the audience is knowledge management professionals, not attorneys.

After getting the last available room in San Francisco (my 747 flight crew slept on the floor in the airport that night) I tried to put together something more “knowledge management (?)” focused and send it off before I got on my flight the next morning. Turns out the Janders Dean organizer (Justin North) was completely right in very politely telling me my first presentation attempt was not a fit. The forum was heavily weighted towards non-attorneys specializing in knowledge management.

The above description was a long-winded opening to allow me to get to my main point (and complain about my travel experiences), which is this: really effective knowledge management is dependent on effective information governance. The creation and dissemination of knowledge within an organization is impossible without the ability to create, store, and share useful information while disposing of useless information.

Content auto-categorization and indexing techniques are the first step in getting control of an organization’s information. If a system can conceptually understand and auto-categorize content as it is created so that all content in the enterprise is searchable and managed via the correct retention periods, including immediate deletion of useless information, then information is much more available to be turned into real knowledge within the organization.

Information Security in the Cloud

Information Governance managers as well as individuals need to be aware of possible risks when utilizing external cloud storage providers.

CNN has reported that Dropbox, the popular cloud-storage service, is investigating whether a security breach is to blame for a recent wave of spam e-mail sent to Dropbox users. Dropbox has stated that they haven’t had any reports of unauthorized activity within Dropbox accounts; the suspicion is that email addresses were taken to use for spamming purposes. Dropbox has roughly 50 million users who, according to the site, upload a billion files to the service every 48 hours. So far several users in Europe have reported spam from gambling sites sent to email addresses they created specifically for setting up Dropbox accounts.

This possible security breach brings up the question of how secure these cloud storage sites are. I for one use Dropbox and consider it a fantastic service, especially the desktop icon use model. Individuals and companies need to take the lead in ensuring their data is secure either by not utilizing these services or by securing their data before they upload it.

I always encrypt data before I upload it to any cloud storage service. I use two free encryption utilities: Kryptelite and Iron Key, both from Invsoftworks. Kryptelite allows you to encrypt files by simply dragging and dropping files onto the Kryptelite desktop icon. To decrypt the files once they’re encrypted, you must drag the encrypted file back onto the Kryptelite desktop icon and type in the file password. This means you cannot decrypt a file unless you have a running version of Kryptelite on the PC you are using at the time.

Iron Key allows you to create self-decrypting files which are completely standalone and can be decrypted anywhere by simply double-clicking on them and typing in the password.

Incorporating this encryption step into your use of cloud storage will add a layer of security beyond what the cloud storage providers are already doing.

A Fox, a Henhouse, and Custodial Self-Collection

Judge Scheindlin just issued an opinion in the Freedom of Information Act (FOIA) case National Day Laborer Organizing Network et al. v. United States Immigration and Customs Enforcement Agency, et al. 2012 U.S. Dist. Lexis 97863 (SDNY, July 13, 2012). This dispute focuses on plaintiffs’ attempts to obtain information from several US government agencies including the Federal Bureau of Investigation, the Immigration and Customs Enforcement Agency, and the Department of Homeland Security. Specifically, the plaintiffs have sought information regarding “Secure Communities”, a federal immigration enforcement program launched in 2008.

In December 2010, after the defendants failed to comply with their obligations under the agreement, Judge Scheindlin ordered them to produce the records by a new “drop dead date”. With the new date in mind, the defendants searched the files of hundreds of employees, expending thousands of hours and producing tens of thousands of responsive records.

The plaintiffs argued the searches had been insufficient, i.e., that the agencies failed to conduct any searches of the files of certain custodians who were likely to possess responsive records. Another complaint was that the defendants had not established that the searches they did conduct were adequate.

On the issue of relying on custodians to “self-collect” i.e., conduct appropriate and legally defensible searches themselves, she writes:

“There are two answers to defendants’ question. First, custodians cannot ‘be trusted to run effective searches,’ without providing a detailed description of those searches, because FOIA places a burden on defendants to establish that they have conducted adequate searches; FOIA permits agencies to do so by submitting affidavits that ‘contain reasonable specificity of detail rather than merely conclusory statements.’ Defendants’ counsel recognize that, for over twenty years, courts have required that these affidavits ‘set [ ] forth the search terms and the type of search performed.’ But, somehow, DHS, ICE, and the FBI have not gotten the message. So it bears repetition: the government will not be able to establish the adequacy of its FOIA searches if it does not record and report the search terms that it used, how it combined them, and whether it searched the full text of documents.”

“The second answer to defendants’ question has emerged from scholarship and case law only in recent years: most custodians cannot be ‘trusted’ to run effective searches because designing legally sufficient electronic searches in the discovery or FOIA contexts is not part of their daily responsibilities. Searching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.”

“Simple keyword searching is often not enough: ‘Even in the simplest case requiring a search of on-line e-mail, there is no guarantee that using keywords will always prove sufficient.’ There is increasingly strong evidence that ‘[k]eyword search[ing] is not nearly as effective at identifying relevant information as many lawyers would like to believe.’ As Judge Andrew Peck — one of this Court’s experts in e-discovery — recently put it: ‘In too many cases, however, the way lawyers choose keywords is the equivalent of the child’s game of ‘Go Fish’ … keyword searches usually are not very effective.’”
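The first point in the opinion, that the search terms, how they were combined, and whether the full text was searched must all be recorded and reported, amounts to keeping a structured log of every search that is run. A minimal sketch of such a log entry follows. The field names and sample values are hypothetical and are not drawn from the opinion or from any agency’s actual practice.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class SearchLogEntry:
    custodian: str      # whose files were searched
    data_source: str    # e.g. mailbox, file share
    terms: list         # the search terms used
    combination: str    # how the terms were combined, e.g. "AND" / "OR"
    full_text: bool     # whether the full text of documents was searched
    run_on: str         # date the search was run (ISO format)
    hits: int           # number of documents returned

    def query_string(self) -> str:
        """Render the terms and their combination in a reportable form."""
        joiner = f" {self.combination} "
        return joiner.join(f'"{t}"' for t in self.terms)


# All values below are made up for illustration.
entry = SearchLogEntry(
    custodian="custodian-001",
    data_source="email archive",
    terms=["secure communities", "opt out"],
    combination="AND",
    full_text=True,
    run_on="2011-01-10",
    hits=42,
)
print(entry.query_string())
print(json.dumps(asdict(entry), indent=2))
```

A log like this is what lets an attorney later describe the searches with “reasonable specificity of detail” instead of relying on a custodian’s memory.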

Custodial self-discovery has been falling out of favor with some judges for several reasons. First, the defense attorney should be overseeing the discovery process to ensure correctness and completeness. In many courts, the attorney has to certify that the discovery process was done correctly… and what attorney wants to do that if they didn’t really manage it?

In a recent article, Ralph Losey pointed out that custodial self-discovery was “equivalent to the fox guarding the hen house”.

Inadequate Information Management Policy Leads to Third Party eDiscovery

Many organizations have adopted an Information Governance policy of “not having a policy” for many reasons such as to save on costs associated with managing information or… to frustrate eDiscovery requests, i.e. if I can’t find it, then I can’t produce it.

The policy of purposely deleting (not retaining) business records is not necessarily illegal, unless you put that policy in place to thwart eDiscovery or you have federal or state retention requirements. The “no information governance” policy can have unforeseen consequences. Case in point: Peter Kiewit Sons’, Inc. v. Wall Street Equity Group, Inc., No. 8:10CV365, 2012 WL 1852048 (D. Neb. May 18, 2012).

 The case involves claims by the Plaintiff Peter Kiewit Sons’, Inc. against Defendants Wall Street Equity Group, Inc., Wall Street Group of Companies, Inc., Shepherd Friedman, and Steven West for the alleged violation of various aspects of federal and state trademark law, unfair competition, and commercial misrepresentation including the misuse of the “Kiewit” brand.

The Defendants assist business owners in marketing their businesses to prospective buyers. One strategy the Defendants allegedly use is to suggest to some of their potential clients that Kiewit may be a potential buyer of the client’s business.

During the course of discovery, the defendants objected to a number of the plaintiff’s interrogatories and requests for production on the grounds that the requests were “not reasonably calculated to lead to admissible evidence”, that is, the information requested is not relevant. The defendants also argued that many of the requests were meant solely to harass the defendants (the court found no merit to that claim). This case has many interesting aspects, but the one piece that interested me was the part where the defendants’ record retention policy and practices were called out.

In a section of the court’s memorandum titled “Defendant’s Inability to Serve as a Reliable Source of Discovery”, the Judge remarks:

 “The corporate defendants claim that, consistent with their standard practice, they do not retain any correspondence unless it results in a completed sale. Thus, the defendants claim they would have no documents showing how often and to what entities they sent proposals (to include any proposals using the Kiewit mark for marketing). The record itself includes evidence of this practice; specifically, defendant West acknowledges that he submitted a proposed merger to the plaintiff concerning a company called Agra, but the defendants have produced no documents regarding that proposal, presumably because it did not result in a sale.

 Since the defendants have a document retention practice of destroying all marketing records unless a closing occurs, to obtain a complete picture of the extent to which the Kiewit mark may have been used by the defendants, records identifying those who received the defendants’ marketing must be obtained. As a result of the defendants’ own document destruction practices, the only remaining sources for information regarding the content of defendants’ marketing materials are the recipient third parties. Information regarding the identity of these third party business contacts, whether obtained by a third party subpoena or in response to written discovery served on the defendants, is relevant to determining the extent of defendants’ use of the Kiewit service mark.

 There is nothing necessarily improper about a company’s reasonable pre-litigation document retention policy whereby documents are disposed of in periodic intervals. Generally speaking, spoliation arguments are unsuccessful if relevant documents were destroyed in accordance with the business’ reasonable document retention policy and/or practices.

 However, even a reasonable practice of destroying documents may have unintended consequences. By failing to retain any documentation, a defendant may lose its ability to credibly defend claims asserted against it, and it may open avenues of third party discovery which would have been closed had the defendant retained documents consistent with standard business practices, and thereby been considered a reliable and complete source of the relevant discovery.”

As the Judge stated, there is nothing necessarily improper about deleting records on an ongoing basis, but those who choose this policy should be aware of the third-party consequences. Do you really want opposing attorneys causing your customers and suppliers to have to respond to your eDiscovery?

In this case the Judge granted the plaintiff’s request to identify the defendants’ contacts and client lists and to proceed with third party discovery on some or all of these clients. The Judge also ordered the defendants to turn over financial information, such as the defendants’ accounts receivable and records of their various completed business transactions, to further help in identifying potential targets of third party discovery.

Conceptual Search versus Predictive Coding

In my last blog entry titled “Successful Predictive Coding Adoption is Dependent on Effective Information Governance”, a question was posted which I thought deserved a wider sharing with the group: “What is the difference between predictive coding and conceptual search?” Being an individual not directly associated with either technology but with some interesting background, I believe I can attempt to explain the differences, at least as they pertain to discovery processes.

Conceptual search technologies allow a user to search on concepts…(pretty valuable insight, right?) instead of searching on a keyword such as “dog”. In the case of a keyword search on “dog”, the user would generate a results set of every document/file/record with the three letters D-O-G present in that specific sequence. The results could include returns on “dogs”, the 4-legged animals, references to “frankfurters”, references to movies (Dog Day Afternoon), etc., in no particular priority.

True conceptual search capability understands (based on search criteria) that the user was looking for information on the 4-legged animals, so it would return references to not just “dogs” but also to “Golden Retrievers”, “Animal Shelters”, “Pet Adoption”, etc. Some conceptual search solutions will also cluster concepts to give the user the ability to quickly fine-tune their search; for example, create a cluster of all dog (animal) references, a cluster for all food related references and so on. Many eDiscovery analytic solutions include this clustering capability.
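The difference is easy to see on a toy example. The sketch below is only an illustration: the documents are made up, and the hand-built concept table is a stand-in for what a real conceptual search engine would derive on its own from statistical or semantic analysis of the content.

```python
# Toy corpus; the documents are made up for illustration.
DOCS = {
    1: "Our dog needs a walk every morning.",
    2: "Golden Retrievers are popular family pets.",
    3: "He ordered a hot dog with mustard.",
    4: "Dog Day Afternoon is a 1975 film.",
    5: "The animal shelter hosts a pet adoption event.",
}


def keyword_search(term):
    """Return ids of documents containing the literal character sequence."""
    return [i for i, text in DOCS.items() if term.lower() in text.lower()]


# Hand-built stand-in for the associations a conceptual engine would learn.
CONCEPTS = {
    "dog (animal)": {
        "include": ["dog", "golden retriever", "animal shelter", "pet adoption"],
        "exclude": ["hot dog", "dog day afternoon"],
    },
}


def concept_search(concept):
    """Return ids of documents that relate to the concept, not just the word."""
    spec = CONCEPTS[concept]
    hits = []
    for i, text in DOCS.items():
        lowered = text.lower()
        if any(phrase in lowered for phrase in spec["exclude"]):
            continue  # same letters, different concept
        if any(phrase in lowered for phrase in spec["include"]):
            hits.append(i)
    return hits


print(keyword_search("dog"))           # [1, 3, 4]
print(concept_search("dog (animal)"))  # [1, 2, 5]
```

The keyword search returns the frankfurter and the movie but misses the retrievers and the animal shelter; the concept search does the opposite.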

Predictive coding is a process which includes both automation and human interaction to best produce a results set of potentially responsive documents that trained human reviewers can check.

Predictive coding takes the conceptual search and clustering idea much further than just understanding concepts. A predictive coding solution is “trained” in a very specific manner for each case. For example, the legal team, drawing on additional subject matter expertise, manually chooses documents/records/files that it deems responsive to the particular case and inputs them to the predictive coding system as examples of the content/format that should be found and coded as responsive.

Most predictive coding processes include several iterative cycles to fine-tune the training examples. In an iterative cycle, legal professionals sample and review the records the solution coded as responsive and determine whether, in the opinion of the human reviewer, they are truly responsive. Any documents the reviewers deem not responsive are then fed back to train the solution not to code similar content as responsive. This cycle can be repeated several times until the human professionals agree the system has reached the desired level of capability.

By the way, the same iterative process can be, and is, used to sample the set of documents deemed non-responsive to determine whether the solution is missing potentially responsive content. This measurement is called “elusion”: the proportion of the documents the solution coded as non-responsive that are in fact responsive, i.e., the proportion of misses. The elusion sample can also be used in the iterative cycle to further train the system.
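As a rough illustration of the elusion measurement, the sketch below draws a random sample from the set of documents the system coded as non-responsive and counts how many a reviewer finds responsive. It assumes a simple random sample; the function name, the reviewer callback and the toy data are hypothetical, and a real protocol would also report a confidence interval around the estimate.

```python
import random


def estimate_elusion(discard_pile, is_responsive, sample_size, seed=0):
    """Estimate elusion from a random sample of the discard pile.

    discard_pile: documents the system coded as non-responsive.
    is_responsive: callable standing in for a human reviewer's judgment.
    Returns the fraction of sampled documents that were in fact responsive.
    """
    if not discard_pile:
        raise ValueError("discard pile is empty")
    rng = random.Random(seed)
    sample = rng.sample(discard_pile, min(sample_size, len(discard_pile)))
    misses = sum(1 for doc in sample if is_responsive(doc))
    return misses / len(sample)


# Toy data: 1,000 discarded documents, 20 of which are truly responsive.
pile = [(i, i < 20) for i in range(1000)]
elusion = estimate_elusion(pile, lambda doc: doc[1], sample_size=200)
print(f"Estimated elusion: {elusion:.1%}")
```

A low elusion estimate suggests the system is leaving few responsive documents behind; a high one is a signal to run another training cycle.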

The obvious benefit of a predictive coding solution in the eDiscovery process is a dramatic reduction in the time legal professionals spend reading each and every document to determine its responsiveness. A 2012 RAND Institute for Civil Justice report estimated a savings of 80% for the eDiscovery review process (which accounts for 73% of the total cost of eDiscovery) when using a predictive coding solution.
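Combining the two percentages gives a feel for the overall effect. The small calculation below assumes that only review costs change and that all other eDiscovery costs stay the same, which is a simplification of mine rather than a figure taken from the report.

```python
REVIEW_SHARE_OF_TOTAL = 0.73  # review as a share of total eDiscovery cost
REVIEW_SAVINGS = 0.80         # estimated reduction in review cost


def total_savings(review_share, review_savings):
    """Savings as a share of total cost, assuming other costs are unchanged."""
    return review_share * review_savings


print(f"{total_savings(REVIEW_SHARE_OF_TOTAL, REVIEW_SAVINGS):.1%}")  # 58.4%
```

In other words, under that assumption an 80% cut in review cost trims the total eDiscovery bill by a little under 60%.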

So, to answer the question, conceptual search is an automated information retrieval method which is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Predictive coding is a process (which can include conceptual search) that uses machine learning technologies to categorize (or code) an entire corpus of documents as responsive, non-responsive, or privileged based on human-chosen examples used to train the system in an iterative process. These technologies typically rank the documents from most to least likely to be responsive to a specific information request. This ranking can then be used to “cut” or partition the documents into one or more categories, such as potentially responsive or not, in need of further review or not, etc. [1]
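The rank-and-cut step can be sketched as follows. The scores, document ids and cutoff value are hypothetical; in practice the scores would come from the trained model and the cutoff would be chosen by the legal team.

```python
def rank_and_cut(scores, cutoff):
    """Rank documents by score and partition them at the cutoff.

    scores: dict mapping doc id -> estimated likelihood of responsiveness.
    Returns (for_review, set_aside), each ordered from most to least likely.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for_review = [doc for doc, score in ranked if score >= cutoff]
    set_aside = [doc for doc, score in ranked if score < cutoff]
    return for_review, set_aside


# Made-up scores for illustration.
scores = {"doc-a": 0.91, "doc-b": 0.15, "doc-c": 0.67, "doc-d": 0.40}
for_review, set_aside = rank_and_cut(scores, cutoff=0.5)
print(for_review)  # ['doc-a', 'doc-c']
print(set_aside)   # ['doc-d', 'doc-b']
```

The documents that fall below the cutoff are the ones an elusion sample would later check for misses.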

1 Partial definition from the eDiscovery Daily Blog: