The Dangers of Infobesity at LegalTech


LegalTech just concluded in New York, and one of the hot buttons many vendors were talking about was the idea that too much corporate information, especially valueless, ungoverned, unstructured information, is both risky and costly to organizations. I agree. The answer to this “infobesity” (the unrestricted saving of ESI because storage is supposedly cheap and saving everything is easier than checking with others to see if it’s OK to delete) is a defensible process to systematically dispose of information that is not subject to regulatory requirements or litigation hold requirements and no longer has business value. A 2012 CGOC (Compliance, Governance and Oversight Council) Summit survey found that, on average, 1% of an organization’s data is subject to legal hold, 5% falls under regulatory retention requirements and 25% has business value. This means that 69% of an organization’s ESI can be disposed of.
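The arithmetic behind that 69% figure can be sketched in a few lines. This is only an illustration: it assumes the three survey categories do not overlap, and the 500 TB total is a made-up number, not something from the survey.

```python
# Survey averages quoted above, as percentages of an organization's data.
# Assumption: the three categories do not overlap.
retained_pct = {"legal_hold": 1, "regulatory_retention": 5, "business_value": 25}


def disposable_pct(retained):
    """Return the percentage of data that falls outside every retention category."""
    return 100 - sum(retained.values())


def disposable_terabytes(total_tb, retained):
    """Estimate how many terabytes could be disposed of for a given total."""
    return total_tb * disposable_pct(retained) / 100


print(disposable_pct(retained_pct))             # 69
print(disposable_terabytes(500, retained_pct))  # 345.0 (hypothetical 500 TB estate)
```

If the categories do overlap in a given organization, the disposable share would be somewhat higher than this simple subtraction suggests.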

Several vendors at LegalTech were highlighting Defensible Disposal solutions, also known as defensible disposition and defensible deletion, as the answer to the problem of infobesity. Defensible Disposal is defined by many as a process (manual, automated or both) of identifying and permanently disposing of unneeded or valueless data in a way that will stand up in court as reasonable and consistent. The key to this process is being able to identify valueless information (not subject to regulatory retention or legal hold) with enough certainty to actually follow through and delete the data. This may sound easy… it’s not. Many organizations are sitting on huge amounts of data because their legal department doesn’t want to be accused of spoliation, so it has standing orders to “keep everything forever”. Corporate legal has to be convinced that the defensible disposal processes and solutions billed as the answer to infogluttony can actually tell the difference, accurately and consistently, between information that should be kept and information that’s truly valueless.

To automate this defensible disposal process, the solution needs to be able to understand and differentiate content conceptually, for example that “apple” can refer to a fruit as well as to a huge high-tech company. Automated classification/categorization cannot accurately or consistently differentiate meaning in unstructured content by relying on keywords or simple rules alone.
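A small sketch of the keyword problem described above. The two sample documents and the `keyword_match` helper are invented for illustration; the point is only that a keyword hit says nothing about which meaning is intended.

```python
def keyword_match(text, keyword):
    """Flag a document if the keyword appears as a word, ignoring case."""
    return keyword.lower() in text.lower().split()


# Hypothetical documents: one is office chatter, one is a business record.
docs = [
    "Grandma's apple pie recipe for the office potluck",
    "Apple supplier agreement renewal terms for fiscal 2013",
]

flagged = [d for d in docs if keyword_match(d, "apple")]
print(len(flagged))  # 2: the keyword cannot tell the recipe from the contract
```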

An even less consistent approach to categorization is to base it on simple rules such as “delete everything from/to Bill immediately” or “keep everything to/from any accounting employee for 3 years”. This kind of rules-based retention/disposition process will quickly have your GC explaining to a judge why data that should have been retained was “inadvertently” deleted.
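To make that failure mode concrete, here is a minimal sketch of the “delete everything from Bill” rule. The `Message` type and the `on_legal_hold` flag are hypothetical; a real system would get hold status from the legal department’s hold notices.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    subject: str
    on_legal_hold: bool  # hypothetical flag maintained by the legal department


def naive_rule(msg):
    """Sender-only rule: 'delete everything from Bill immediately'."""
    return "delete" if msg.sender == "bill" else "keep"


def hold_aware_rule(msg):
    """Same rule, except that a legal hold always overrides disposal."""
    if msg.on_legal_hold:
        return "keep"
    return naive_rule(msg)


lunch = Message("bill", "Lunch on Friday?", on_legal_hold=False)
contract = Message("bill", "Signed supplier contract", on_legal_hold=True)

print(naive_rule(contract))       # delete  (the record under hold is lost)
print(hold_aware_rule(contract))  # keep
print(hold_aware_rule(lunch))     # delete
```

Checking for holds removes only the most obvious risk; the rule still knows nothing about what the message actually says.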

To truly automate disposal of valueless information in a consistently defensible manner, categorization applications must have two abilities. First, they must conceptually understand the meaning in unstructured content, so that content, regardless of language, is classified as “of value” to the organization not because it shares a keyword with other records but because it truly meets your definition of content that needs to be kept. Second, because unstructured data is by definition “free-flowing” (not structured into specific rows and columns), extremely high categorization accuracy and defensibility can only be achieved with defensible disposal solutions that incorporate an iterative training process, including “train by example”, in a human-supervised workflow.
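The “train by example” loop with human supervision can be sketched roughly as follows. This is a toy: the word-overlap scoring is a deliberately simple stand-in for a real conceptual engine, and the class name, the labels and the 0.75 threshold are all assumptions made for illustration.

```python
from collections import Counter


def tokens(text):
    return text.lower().split()


class ExampleClassifier:
    """Toy word-overlap classifier; a stand-in for a real conceptual engine."""

    def __init__(self):
        self.word_counts = {}  # label -> Counter of words seen in examples

    def train(self, text, label):
        self.word_counts.setdefault(label, Counter()).update(tokens(text))

    def predict(self, text):
        """Return (label, confidence); confidence is the winner's share of overlap."""
        words = tokens(text)
        scores = {label: sum(counts[w] for w in words)
                  for label, counts in self.word_counts.items()}
        total = sum(scores.values())
        if total == 0:
            return None, 0.0
        best = max(scores, key=scores.get)
        return best, scores[best] / total


def classify_with_review(clf, text, ask_human, threshold=0.75):
    """Auto-classify confident cases; otherwise ask a reviewer and retrain."""
    label, confidence = clf.predict(text)
    if confidence < threshold:
        label = ask_human(text)
        clf.train(text, label)  # the reviewer's decision becomes a new example
    return label


clf = ExampleClassifier()
clf.train("signed supplier contract renewal", "keep")
clf.train("lunch menu friday pizza", "dispose")

reviewed = []  # items that were routed to the human reviewer


def reviewer(text):
    reviewed.append(text)
    return "keep"


first = classify_with_review(clf, "pizza lunch friday", reviewer)
second = classify_with_review(clf, "contract lunch meeting", reviewer)
third = classify_with_review(clf, "contract lunch meeting", reviewer)
print(first, second, third, len(reviewed))
```

In this run the ambiguous item goes to the reviewer once; after the reviewer’s decision is added as an example, the same item is classified without help. That is the iterative part: each supervised correction narrows what has to be reviewed next time.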

Knowledge Management is Dependent on Effective Information Governance


Last week I presented at the Janders Dean Legal Knowledge & Innovation Conference in Sydney, Australia. This conference is one of the world’s leading knowledge management and technology forums for the legal industry. The forum was extremely interesting, with a great venue and agenda.

Much of the content was directed at knowledge management within law firms and corporate legal departments, i.e., how knowledge is created, collected, and shared within these organizations to maximize benefit and ROI.

The whole event was somewhat hair-raising for me in that I found out I was to travel to Sydney to speak at this forum the Thursday before the Monday I was to leave. It occurred to me on the Saturday before that I was to present at this forum and had no idea what I was to speak on, much less the time to create an effective presentation. After looking at the agenda online I determined that 1) it was for the legal industry and 2) knowledge management was somehow involved.

That Saturday and Sunday I put together a presentation addressing what I thought would add to the discussion, which included eDiscovery, Information Governance (because it’s the same as knowledge management, right?) and some local Australian precedents. As I landed in San Francisco on Monday to catch my flight to Sydney I noticed an email from the Janders Dean organizer asking me for my presentation so the forum laptop could be loaded and ready to go with all presentations. Thinking that for once I was ahead of the curve, I happily replied to the email with my presentation.

Dreading the 15-hour flight in “Economy”, I noticed the departure board at the airport was now saying my 10:30 pm flight to Sydney was delayed for 11 hours due to weather and would take off at 9 am Tuesday morning (by the way, as I boarded the next day, the crew admitted it was not weather, but an equipment problem in Chicago). As I was furiously burning up my laptop keyboard looking for a room for the night, I got a very nice email from Janders Dean telling me the presentation I had sent off really didn’t hit the mark and was much too eDiscovery heavy… the audience is knowledge management professionals, not attorneys.

After getting the last available room in San Francisco (my 747 flight crew slept on the floor in the airport that night) I tried to put together something more “knowledge management (?)” focused and send it off before I got on my flight the next morning. Turns out the Janders Dean organizer (Justin North) was completely right in very politely telling me my first presentation attempt was not a fit. The forum was heavily weighted towards non-attorneys specializing in knowledge management.

The above description was a long-winded opening to allow me to get to my main point (and complain about my travel experiences), which is this: really effective knowledge management is dependent on effective information governance. The creation and dissemination of knowledge within an organization is impossible without the ability to create, store, and share useful information while disposing of useless information.

Content auto-categorization and indexing techniques are the first step in getting control of an organization’s information. If a system can conceptually understand and auto-categorize content as it is created, so that all content in the enterprise is searchable and managed under the correct retention periods, including immediate deletion of useless information, then information is much more available to be turned into real knowledge within the organization.
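Once content has a category, applying the retention period is the easy part. A minimal sketch, assuming a hypothetical retention schedule (the categories and periods below are invented for illustration, not recommendations):

```python
from datetime import date, timedelta

# Hypothetical retention schedule: category -> days to keep (0 = delete now).
RETENTION_DAYS = {"contract": 7 * 365, "accounting": 3 * 365, "transitory": 0}


def disposal_date(category, created):
    """Return the earliest date the item may be disposed of."""
    return created + timedelta(days=RETENTION_DAYS[category])


def is_disposable(category, created, today, on_legal_hold=False):
    """An item is disposable once retention expires, unless it is on legal hold."""
    if on_legal_hold:
        return False
    return today >= disposal_date(category, created)


created = date(2013, 1, 1)
print(is_disposable("transitory", created, today=date(2013, 1, 1)))  # True
print(is_disposable("accounting", created, today=date(2014, 1, 1)))  # False
print(is_disposable("transitory", created, today=date(2013, 1, 1),
                    on_legal_hold=True))                             # False
```

The hard part remains the step before this one: assigning the right category in the first place.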