Information Governance and Predictive Coding


Predictive coding, also known as computer-assisted coding or technology-assisted review, is the use of software built on machine learning algorithms to learn, from records presented to it (usually by human attorneys), what types of content are potentially relevant to a given legal matter. After the attorneys have provided a sufficient number of examples, the technology is given access to the entire potential corpus (records/data) to sort through and find records that, based on its “learning”, are potentially relevant to the case.
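To make the train-then-rank workflow concrete, here is a minimal sketch in Python. It is an illustration only, not how any commercial predictive coding product works: it assumes a simple naive Bayes word-count model, and the function names (tokenize, train, relevance_score) and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_docs):
    """Count words per label from (text, is_relevant) pairs coded by reviewers.

    Assumes the training set holds at least one relevant and one
    non-relevant example.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in labeled_docs:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    return word_counts, doc_counts

def relevance_score(text, model):
    """Return the log-odds that a document is relevant (above 0 leans relevant)."""
    word_counts, doc_counts = model
    vocab = set(word_counts[True]) | set(word_counts[False])
    score = math.log(doc_counts[True] / doc_counts[False])
    for label, sign in ((True, 1), (False, -1)):
        total = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so a word unseen in training does not zero out the score.
            p = (word_counts[label][word] + 1) / (total + len(vocab))
            score += sign * math.log(p)
    return score

# Hypothetical examples coded by the reviewing attorneys (True = relevant).
training_set = [
    ("pricing agreement with distributor signed", True),
    ("distributor requested revised pricing terms", True),
    ("cafeteria menu for next week", False),
    ("parking garage closed for maintenance", False),
]
model = train(training_set)

# The rest of the corpus, ranked so the likeliest relevant records come first.
corpus = [
    "reminder: garage maintenance on friday",
    "distributor disputes the new pricing terms",
]
ranked = sorted(corpus, key=lambda doc: relevance_score(doc, model), reverse=True)
for doc in ranked:
    print(round(relevance_score(doc, model), 2), doc)
```

Even in this toy version, the ranking depends entirely on which examples the attorneys supply and how many of them there are, which is where the dependencies discussed below come from.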

This automation can dramatically reduce costs because computers, instead of attorneys, conduct the first-pass culling of potentially millions of records.

Predictive coding has several very predictable dependencies that need to be addressed before it can be accepted as a useful and dependable tool in the eDiscovery process. First, which documents/records are used to “train the system”, and who chooses them? This training selection will almost always be conducted by attorneys involved with the case.

The second dependency revolves around the number of documents used for the training. How many training documents are needed to provide a sample large enough for a dependable process?

And most importantly, do the parties have access to all potentially relevant documents in the case to draw the training documents from? Remember, potentially relevant documents can be stored anywhere. For predictive coding, or any other eDiscovery process, to be legally defensible, all existing case-related documents need to be available. This requirement highlights the need for effective information management by everyone in a given organization.

As the courts adopt, or at least experiment with, predictive coding, as Judge Peck did in Monique Da Silva Moore, et al., v. Publicis Groupe & MSL Group, Civ. No. 11-1279 (ALC)(AJP) (S.D.N.Y. February 24, 2012), an effective information management program will become key to the courts adopting this new technology.

The ROI of Information Management


Information, data, electronically stored information (ESI), records, documents, hard copy files, email, stuff: no matter what you call it, it’s all intellectual property that your organization pays individuals to produce, interpret, use and export to others. After people, it’s a company’s most valuable asset, and it has many CIOs, GCs and others responsible for it asking: What’s in that information, who controls it, and where is it stored?

In simplest terms, I believe that businesses exist to generate and use information to produce revenue and profit. If you’re willing to go along with me and think of information this way, as a commodity, we must also ask: How much does it cost to generate all that information? And what’s the return on investment (ROI) for all that information?

The vast majority of information in an organization is not managed, not indexed, not backed up and, as you probably know or could guess, is rarely, if ever, accessed. Consider for a minute all the data in your company that is not centrally managed and not easily available. This data includes backup tapes, share drives, employee hard disks, external disks, USB drives, CDs, DVDs, email attachments sent outside the organization and hardcopy documents hidden away in filing cabinets.

Here’s the bottom line: If your company can’t find information or doesn’t know what it contains, it is of little value. In fact, it’s valueless.

Now consider the amount of money the average company spends on an annual basis for the production, use and storage of information. These expenditures span:

  • Employee salaries. Most employees are in one way or another hired to produce, digest and act on information.
  • Employee training and day-to-day help-desk support.
  • Computers for each employee
  • Software
  • Email boxes
  • Share drives, storage
  • Backup systems
  • IT employees for data infrastructure support

In one way or another, companies exist to create and utilize information. So… do you know where all your information is and what’s in it? What’s your organization’s true ROI on the production and consumption of information across the entire organization? How much higher could it be if you had complete control of it?

As an example, I have approximately 14.5 GB of Word documents, PDFs, PowerPoint files, spreadsheets, and other types of files in different formats that I’ve either created or received from others. Until recently, I had 3.65 GB of email in my mailbox, both on the Exchange server and mirrored locally on my hard disk. Now that I have a 480 MB mailbox limit imposed on me, 3.45 GB of that email is on my local hard disk only.

How much real, valuable information is contained in the collective 18 GB on my laptop? The average number of pages of information contained in 1 GB is conservatively 10,000. So 18 GB of files equals approximately 180,000 pages of information for a single employee, none of it easily accessible or searchable by my organization. Now also consider the millions of pages of hardcopy records existing in file cabinets, microfiche and long-term storage all around the company.
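The arithmetic behind that estimate is easy to rerun with your own numbers. The short Python sketch below uses the figures quoted above; the 10,000 pages-per-GB rate is the conservative estimate used in this post, and the function and variable names are made up for illustration.

```python
# Conservative pages-per-gigabyte estimate used in this post.
PAGES_PER_GB = 10_000

def estimated_pages(gigabytes, pages_per_gb=PAGES_PER_GB):
    """Estimate how many pages of information a given volume of data holds."""
    return round(gigabytes * pages_per_gb)

files_gb = 14.5   # Word documents, PDFs, PowerPoint files, spreadsheets, etc.
email_gb = 3.65   # email, on the server and on the local disk
total_gb = files_gb + email_gb

print(f"Total data on one laptop: {total_gb:.2f} GB")
print(f"Estimated pages, rounding to 18 GB: {estimated_pages(18):,}")
print(f"Estimated pages, unrounded total: {estimated_pages(total_gb):,}")
```

Multiply the per-employee figure by your headcount for a rough, order-of-magnitude picture of how much unmanaged information sits on laptops alone.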

The main question is this: What could my organization do with quick and intelligent access to all of its employees’ information?

The more efficient your organization is in managing and using information, the higher the revenue and, hopefully, the profit per employee will be.

Organizations need to be able to “walk the fence” between not impeding the free flow of information generation and sharing, and having a way for the organization as a whole to find and use that information. Intelligent access to all information generated by an organization is key to effective information management.

Organizations spend huge sums of money to generate information, so why not get your money’s worth? This capability is the essence of true information management and the route to a much higher ROI for your organization.