Infobesity in the Healthcare Industry: A Well-Balanced Diet of Predictive Governance Is Needed


With the rapid advances in healthcare technology, the movement to electronic health records, and the relentless accumulation of regulatory requirements, the shift from records management to information governance is increasingly becoming a necessary reality.

A 2012 CGOC (Compliance, Governance and Oversight Council) Summit survey found that, on average, 1% of an organization’s data is subject to legal hold, 5% falls under regulatory retention requirements and 25% has business value. This means that the remaining 69% of an organization’s ESI is not needed and could be disposed of without impact to the organization. I would argue that for the healthcare industry, especially for covered entities with medical record stewardship, those retention percentages are somewhat higher, particularly for regulatory retention.

According to an April 9, 2013 article on ZDNet.com, by 2015, 80% of new healthcare information will be composed of unstructured information: information that’s much harder to classify and manage because it doesn’t conform to the “rows & columns” format used in the past. Examples of unstructured information include clinical notes, emails & attachments, scanned lab reports, office work documents, radiology images, SMS, and instant messages. Despite a push for more organization and process in managing unstructured data, healthcare organizations continue to binge on unstructured data with little regard for the overall health of their enterprises.

So how does this info-gluttony (the unrestricted saving of unstructured data because data storage is cheap and saving everything is just easier) affect the health of the organization? Obviously you’ll look terrible in horizontal stripes, but also finding specific information quickly (or at all) becomes impossible, you’ll spend more on storage, data breaches could occur more often, litigation/eDiscovery expenses will rise, and you won’t want to go to your 15th high school reunion…

To combat this unstructured info-gain, we need an intelligent information governance solution – STAT! And that solution must include a defensible process to systematically dispose of information that is no longer subject to regulatory or litigation hold requirements and no longer has business value.

To enable this information governance/defensible disposal cure for Infobesity, healthcare information governance solutions must be able to extract meaning from all of this unstructured content; in other words, to understand and differentiate content conceptually. Automated classification/categorization of unstructured content cannot accurately or consistently differentiate meaning by relying on simple rules or keywords alone. To accurately automate the categorization and management of unstructured content, a machine-learning capability to “train by example” is a precondition. This ability to systematically derive meaning from unstructured content, combined with machine learning to accurately automate information governance, is something we call “Predictive Governance”.
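To make the “train by example” idea concrete, here is a minimal, self-contained sketch in plain Python (a toy centroid classifier, not any vendor’s actual algorithm; the categories and sample text are hypothetical): each category is trained from a handful of labeled example documents, and new content is assigned to the category whose examples it most resembles.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words vector for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TrainByExample:
    """Toy 'train by example' classifier: each retention category is
    represented by the combined word counts of its example documents."""
    def __init__(self):
        self.centroids = {}

    def train(self, label, examples):
        centroid = Counter()
        for doc in examples:
            centroid.update(vectorize(doc))
        self.centroids[label] = centroid

    def classify(self, doc):
        v = vectorize(doc)
        return max(self.centroids, key=lambda lbl: cosine(v, self.centroids[lbl]))
```

A production system would use far richer representations (concept models, latent semantics) and an iterative human-review loop, but the workflow is the same: label examples, train, classify, review, retrain.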

A side benefit of Predictive Governance (you’ll actually look taller) is that previously lost organizational knowledge and business intelligence can be automatically compiled and made available throughout the organization.

The ROI of Conceptual Search


After people, information is a company’s most valuable asset. But many are asking: “What’s in that information?”, “Who controls it?”, “Can others access it?”, “Is it a risk to keep?”, “For how long?”. The vast majority of information in any organization is not managed, not indexed, and is rarely, if ever, accessed.

Companies exist to create and utilize information. Do you know where all your organization’s information is, what’s in it, and, most importantly, whether those who need it can find and access it? If your employees can’t find information when they need it, then the return on investment (ROI) for that information is zero. How much higher could the ROI be if your employees could actually find and share data effortlessly?

Enterprise search – The mindless regurgitation of keyword matches

Enterprise search is the organized query/retrieval of information from across an organization’s enterprise data systems. Data sources include e-mail servers, application databases, content management systems, file systems, intranet sites and many others. Legacy enterprise search systems let users query organizational data repositories with keyword-based queries that return huge result sets, which users must then filter manually until they find what they are looking for (if they find it at all).

A sizeable drawback of keyword-based search is that it returns all keyword matches, even those that are conceptually different – false positives.

What is conceptual search?

A conceptual search retrieves electronically stored information that conceptually matches, or is similar to, the ideas represented in the search query, as opposed to a keyword search, which returns only documents containing exact keyword matches. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query, regardless of shared terms or language.
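The difference can be illustrated with a toy sketch (hypothetical documents and a hand-built, one-entry concept table; real conceptual engines learn these relationships statistically rather than from a lookup table):

```python
DOCS = {
    "d1": "physician notes on patient hypertension treatment",
    "d2": "the doctor recorded elevated blood pressure in the chart",
    "d3": "quarterly budget figures for the radiology department",
}

# Hypothetical concept map: each term expands to related terms.
SYNONYMS = {"physician": {"physician", "doctor"}}

def keyword_search(query, docs=DOCS):
    """Return ids of docs containing every query term verbatim."""
    terms = query.lower().split()
    return sorted(d for d, text in docs.items()
                  if all(t in text.lower().split() for t in terms))

def concept_search(query, docs=DOCS, synonyms=SYNONYMS):
    """Expand each query term to its concept set, then match any related term."""
    expanded = set()
    for t in query.lower().split():
        expanded |= synonyms.get(t, {t})
    return sorted(d for d, text in docs.items()
                  if expanded & set(text.lower().split()))
```

A keyword query for “physician” misses the document that says “doctor”; the concept-expanded query finds both while still skipping the unrelated budget document.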

Cost savings – Concept versus keyword search

Employees are rarely able to construct keyword and Boolean searches that immediately return the data they are looking for, so time is wasted finding it. IDC has estimated that a higher-quality enterprise search capability can save up to 53.4% of the time spent searching for data. Many argue conceptual search can save even more time, because conceptual search more closely models how humans think and will therefore return more meaningful results sooner.

Alan Greenspan, a former Chairman of the Federal Reserve, once stated, “You’re entitled to your own opinions, but not to your own facts”. Return on investment calculations are only as good as the reliability of the variables used to calculate them. To calculate ROI, the benefit (return) of an investment is divided by the cost of the investment; the result is expressed as a percentage.

Enterprise Search ROI calculations require the following data points:

• The total cost of the current enterprise search process

• The total cost of the new enterprise search process after the investment is in place

• The total cost of the new enterprise search investment

The actual ROI formula looks like this:

ROI (%) = (gain from investment – cost of investment) ÷ cost of investment × 100
Return on investment is an often-requested but little-understood financial measure. Many equate cost savings with ROI, but cost savings are only part of the equation; ROI also accounts for the cost of the solution that produced the savings. ROI lets you compare returns from various investment opportunities to make the best investment decision for your available dollars.
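As a sketch of the calculation using the three data points listed above (the dollar figures below are purely illustrative):

```python
def enterprise_search_roi(current_cost, new_cost, investment):
    """ROI (%): net benefit of the investment divided by its cost.
    Net benefit = search-process savings minus the investment itself."""
    savings = current_cost - new_cost                  # data points 1 and 2
    return (savings - investment) / investment * 100   # data point 3

# Illustrative numbers: $500k current process cost, $250k after the
# investment, $100k invested -> $150k net benefit on $100k, i.e. 150% ROI.
roi = enterprise_search_roi(500_000, 250_000, 100_000)
```

Note that halving the search cost is not by itself a 100% ROI; the return depends on what the improvement cost to put in place.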

Organizations run on information. If information is easier to find and use, the organization profits from it.

Ask the Magic 8-Ball; “Is Predictive Defensible Disposal Possible?”


The Good Ole Days of Paper Shredding

In my early career, shred days – the scheduled annual activity where the company directed all employees to comb through their paper records to determine what should be disposed of – were commonplace. At the government contractor I worked for, we actually wheeled our boxes out to the parking lot to a very large truck that had huge industrial shredders in the back. Once the boxes of documents were shredded, we were told to walk the remains over to a second truck, a burn truck, where we, as the records custodians, would verify that all of our records were destroyed. These shred days were a way to collect, verify and, yes, physically shred all the paper records that had gone beyond their retention period over the preceding year.

The Magic 8-Ball says Shred Days aren’t Defensible

Nowadays, this type of activity carries negative connotations and is much riskier. Take, for example, the recent case of Rambus v. SK Hynix. In this case, U.S. District Judge Ronald Whyte in San Jose reversed his own prior ruling from a 2009 case in which he had issued a judgment against SK Hynix, awarding Rambus Inc. $397 million in a patent infringement case. In his reversal this year, Judge Whyte ruled that Rambus had spoliated documents in bad faith when it hosted company-wide “shred days” in 1998, 1999, and 2000. Judge Whyte found that Rambus could have reasonably foreseen litigation against Hynix as early as 1998, and that it therefore engaged in willful spoliation during the three “shred days” (a finding of spoliation can also be based on inadvertent destruction of evidence). Because of this spoliation ruling, the judge reduced the prior award from $397 million to $215 million, a cost to Rambus of $182 million.

Another well-known example of sudden retention/disposition policy activity causing unintended consequences is the Arthur Andersen/Enron case. During the Enron case, Enron’s accounting firm sent the following email to some of its employees:

[Image: Arthur Andersen internal email instructing employees to comply with the firm’s document retention policy]
This email was a key reason why Arthur Andersen ceased to exist shortly after the case concluded. Arthur Andersen was charged with and found guilty of obstruction of justice for shredding thousands of documents and deleting emails and company files that tied the firm to its audit of Enron. Less than a year after that email was sent, Arthur Andersen surrendered its CPA license on August 31, 2002, and 85,000 employees lost their jobs.

Learning from the Past – Defensible Disposal

These cases highlight the need for a true information governance process that includes a genuinely defensible disposal capability. In these instances, an information governance process would have been capturing, indexing, applying retention policies to, and protecting content on litigation hold, while disposing of content beyond its retention schedule and not on legal hold – automatically, based on documented and approved, legally defensible policies. A documented and approved process that is consistently followed and has proper safeguards goes a long way with the courts in showing good-faith intent to manage content and to protect content subject to anticipated litigation.

To successfully automate the disposal of unneeded information in a consistently defensible manner, auto-categorization applications must be able to conceptually understand the meaning in unstructured content, so that only content meeting your retention policies, regardless of language, is classified as subject to retention.
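The disposal decision itself follows a strict precedence: legal hold trumps everything, then regulatory retention, then business value. A minimal sketch of that decision logic (the record fields below are hypothetical, not any product’s schema):

```python
from datetime import date

def disposition(record, today=None):
    """Decide 'retain' or 'dispose' for one content record.
    Order matters: legal hold first, then regulatory retention,
    then business value; only content clearing all three is disposed."""
    today = today or date.today()
    if record["legal_hold"]:
        return "retain"
    if today < record["retention_until"]:
        return "retain"
    if record["business_value"]:
        return "retain"
    return "dispose"
```

Evaluating legal hold before anything else is what keeps the process defensible: content subject to anticipated litigation can never fall through to disposal, no matter how expired its retention period.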

Taking Defensible Disposal to the Next Level – Predictive Disposition

A defensible disposal solution that can conceptually understand content meaning, and that incorporates an iterative “train by example” training process within a human-supervised workflow, provides accurate, predictive retention and disposition automation.

Moving from manual, employee-based information governance to automated, meaning-based retention and disposition that is truly accurate (95 to 99%) and consistent provides the defensibility organizations require today to keep their information repositories up to date.

Predicting the Future of Information Governance


Information Anarchy

Information growth is out of control. The compound annual growth rate for digital information is estimated at 61.7%. According to a 2011 IDC study, 90% of all data created in the next decade will be of the unstructured variety. These facts make it almost impossible for organizations to capture, manage, store, share and dispose of this data in any way that meaningfully benefits the organization.
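To see what a 61.7% compound annual growth rate implies, a quick sketch:

```python
def projected_volume(initial, years, cagr=0.617):
    """Project a data store's size after `years` of compound growth
    at the 61.7% annual rate cited above (units follow `initial`)."""
    return initial * (1 + cagr) ** years
```

At that rate a data store roughly doubles every year and a half and grows more than tenfold in five years, which is why manual, employee-by-employee management cannot keep pace.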

Successful organizations run on and depend on information. But information is valuable to an organization only if you know where it is, what’s in it, and what is shareable – in other words, only if it is managed. In the past, organizations have relied on end-users to decide what should be kept, where, and for how long. In fact, 75% of data today is generated and controlled by individuals. In most cases this practice is ineffective and causes what many refer to as “covert” or “underground” archiving: individuals keeping everything in their own unmanaged local archives. These underground archives effectively lock most of the organization’s information away, hidden from everyone else in the organization.

This growing mass of information has brought us to an inflection point; get control of your information to enable innovation, profit and growth, or continue down your current path of information anarchy and choke on your competitor’s dust.

 


 

Choosing the Right Path

How does an organization ensure this inflection point is navigated correctly? Information governance. You must get control of all your information by employing proven processes and technologies that allow you to create, store, find, share and dispose of information in an automated and intelligent manner.

An effective information governance process optimizes overall information value by ensuring the right information is retained and quickly available for business, regulatory, and legal requirements. This process reduces regulatory and legal risk, ensures needed data can be found quickly and is secured for litigation, reduces overall eDiscovery costs, and provides structure to unstructured information so that employees can be more productive.

Predicting the Future of Information Governance

Predictive Governance is the bridge across the inflection point. It combines machine-learning technology with human expertise and direction to automate your information governance tasks. Using this proven human-machine iterative training capability, Predictive Governance is able to accurately automate the concept-based categorization, data enrichment and management of all your enterprise data to reduce costs, reduce risks, enable information sharing and mitigate the strain of information overload.

Automating information governance so that all enterprise data is captured, granularly evaluated for legal requirements, regulatory compliance, or business value, and stored or disposed of in a defensible manner is the only way for organizations to move to the next level of information governance.

Finding the Cure for the Healthcare Unstructured Data Problem


Healthcare information and records continue to grow with the introduction of new devices and expanding regulatory requirements such as the Affordable Care Act, the Health Insurance Portability and Accountability Act (HIPAA), and the Health Information Technology for Economic and Clinical Health (HITECH) Act. In the past, healthcare records were made up mostly of paper forms or structured billing data, which were relatively easy to categorize, store, and manage. That trend has been changing as new technologies enable faster and more convenient ways to share and consume medical data.

According to an April 9, 2013 article on ZDNet.com, by 2015, 80% of new healthcare information will be composed of unstructured information: information that’s much harder to classify and manage because it doesn’t conform to the “rows & columns” format used in the past. Examples of unstructured information include clinical notes, emails & attachments, scanned lab reports, office work documents, radiology images, SMS, and instant messages.

Who or what is going to actually manage this growing mountain of unstructured information?

To ensure regulatory compliance and the confidentiality and security of this unstructured information, the healthcare industry will have to either 1) hire many more professionals to manually categorize and manage it, or 2) acquire technology to do it automatically.

Looking at the first option: the cost of having people manually categorize and manage unstructured information would be prohibitively expensive, not to mention slow. It would also expose private patient data to even more individuals. That leaves the second option: information governance technology. Because of the nature of unstructured information, a technology solution would have to:

  1. Recognize and work with hundreds of data formats
  2. Communicate with the most popular healthcare applications and data repositories
  3. Draw conceptual understanding from “free-form” content so that categorization can be accomplished at an extremely high accuracy rate
  4. Enable proper access security levels based on content
  5. Accurately retain information based on regulatory requirements
  6. Securely and permanently dispose of information when required

An exciting emerging information governance technology that can actually address the above requirements uses the same next-generation approach the legal industry has adopted: proactive information governance technology based on conceptual understanding of content, machine learning, and iterative “train by example” capabilities.
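As a rough sketch of how such a pipeline might hang together (every format mapping, category name, and retention period below is a made-up illustration, and the keyword classifier is merely a stand-in for the conceptual, train-by-example models discussed above):

```python
# Hypothetical policy table: retention in years per content category.
RETENTION_YEARS = {
    "clinical_note": 10,
    "lab_report": 7,
    "routine_email": 2,
}

def detect_format(filename):
    """Requirement 1 stand-in: recognize the data format (here, by extension)."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return {"txt": "text", "pdf": "scanned", "eml": "email"}.get(ext, "unknown")

def classify(text):
    """Requirement 3 stand-in: a keyword heuristic where a real system
    would apply a trained conceptual model."""
    lowered = text.lower()
    if "lab" in lowered or "specimen" in lowered:
        return "lab_report"
    if "patient" in lowered or "diagnosis" in lowered:
        return "clinical_note"
    return "routine_email"

def govern(filename, text):
    """Tie format detection, classification, access control (requirement 4)
    and retention assignment (requirement 5) into one record."""
    category = classify(text)
    return {
        "format": detect_format(filename),
        "category": category,
        "retention_years": RETENTION_YEARS[category],
        "restricted": category != "routine_email",
    }
```

The point of the sketch is the shape of the pipeline, not the rules inside it: each of the six requirements becomes a pluggable stage, and the classification stage is where conceptual understanding replaces keyword heuristics.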

 

The lifecycle of information


Organizations habitually over-retain information, especially unstructured electronic information, for all kinds of reasons. Many organizations simply have not addressed what to do with it, so they fall back on relying on individual employees to decide what should be kept, for how long, and what should be disposed of. At the opposite end of the spectrum, a minority of organizations have tried centralized enterprise content management systems and found them difficult to use, so employees work around them, keeping huge amounts of data locally on their workstations, on removable media, in cloud accounts, or on rogue SharePoint sites used as “data dumps” with little or no records management or IT supervision. Much of this information is transitory, expired, or of questionable business value. Because of this lack of management, information continues to accumulate. This build-up raises the cost of storage as well as the risk associated with eDiscovery.

In reality, as information ages, its probability of re-use, and therefore its value, shrinks quickly. Fred Moore, founder of Horison Information Strategies, wrote about this concept years ago.

Figure 1 below shows that as data ages, its probability of reuse drops very quickly, even as the amount of saved data rises. Once data has aged 10 to 15 days, the probability of its ever being looked at again approaches 1%, and as it continues to age it approaches, but never quite reaches, zero (figure 1, red shading).

Contrast that with the likelihood that a large part of any organizational data store has little or no business, legal or regulatory value. In fact, a 2012 survey by the Compliance, Governance and Oversight Council (CGOC) showed that, on average, 1% of organizational data is subject to litigation hold, 5% is subject to regulatory retention and 25% has some business value (figure 1, green shading). This means that approximately 69% of an organization’s data store has no legal, regulatory or business value and could be disposed of without legal, regulatory or business consequences.

The average employee conservatively creates, sends, receives and stores 20 MB of data per business day. At that rate, an employee accumulates roughly 220 MB of new data in 15 calendar days (about 11 business days), 1.26 GB in 90 days, and 15.12 GB over three years. So how much of this accumulated data needs to be retained? Referring again to figure 1 below, the blue shaded area represents the information that probably has no legal, regulatory or business value according to the 2012 CGOC survey. At the end of three years, the retained data from a single employee that could be disposed of without adverse effect on the organization amounts to 10.43 GB. Now multiply that by the total number of employees and you are looking at some very large data stores.
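The arithmetic behind those figures can be checked in a few lines (assuming 20 MB per business day, roughly 252 business days per year, and the 69% disposable fraction from the CGOC survey):

```python
MB_PER_BUSINESS_DAY = 20
DISPOSABLE_FRACTION = 0.69      # from the 2012 CGOC survey figures above

def accumulated_gb(business_days, mb_per_day=MB_PER_BUSINESS_DAY):
    """Data accumulated over a number of business days, in GB (1 GB = 1000 MB)."""
    return business_days * mb_per_day / 1000

ninety_days = accumulated_gb(63)        # ~63 business days in 90 calendar days
three_years = accumulated_gb(3 * 252)   # ~252 business days per year
disposable = three_years * DISPOSABLE_FRACTION
```

That yields 1.26 GB at 90 days, 15.12 GB at three years, and about 10.43 GB of disposable data per employee, matching the figures above.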

Figure 1: The Lifecycle of data

The above lifecycle of data shows that employees really don’t need all of the data they squirrel away (its probability of re-use drops to 1% at around 15 days), and, based on the CGOC survey, approximately 69% of organizational data is not subject to legal or regulatory retention and has no business value. The difficult part is how an organization can efficiently determine which data is not needed and dispose of it automatically…

As unstructured data volumes continue to grow, automatic categorization of data is quickly becoming the only way to get ahead of the data flood. Without accurate automated categorization, the ability to quickly find the data you need will never be realized. Better still, if categorization can be based on the meaning of the content, not just a simple rule or keyword match, highly accurate categorization, and therefore information governance, is achievable.

Discoverable versus Admissible; aren’t they the same?


This question comes up a lot, especially from non-attorneys. The thought is that if something is discoverable, then it must be admissible, the assumption being that a judge will not allow something to be discovered if it can’t be used in court. The other thought is that everything is discoverable if it pertains to the case, and therefore everything is admissible.

Let’s first address what’s discoverable. For good cause, the court may order discovery of any matter (content) that is not privileged and is relevant to the subject matter involved in the action. In layman’s terms: if it is potentially relevant to the case, you may have to produce it in discovery; in other words, anything and everything is potentially discoverable. All discovery is subject to the limitations imposed by FRCP Rule 26(b)(2)(C).

With that in mind, let’s look at the subject of admissibility.

In Lorraine v. Markel Am. Ins. Co., 241 F.R.D. 534, 538 (D. Md. 2007), the court started with the premise that the admissibility of ESI is determined by a collection of evidence rules “that present themselves like a series of hurdles to be cleared by the proponent of the evidence”. “Failure to clear any of these evidentiary hurdles means that the evidence will not be admissible”. Whenever ESI is offered as evidence, five evidentiary rules need to be considered: whether the evidence

  • is relevant to the case
  • is authentic
  • is not hearsay pursuant to Federal Rule of Evidence 801
  • is an original or duplicate under the original writing rule
  • has probative value that is substantially outweighed by the danger of unfair prejudice or one of the other factors identified by Federal Rule of Evidence 403, such that it should be excluded despite its relevance.

Hearsay is defined as a statement made out of court that is offered in court as evidence to prove the truth of the matter asserted. Hearsay comes in many forms including written or oral statements or even gestures.

It is the Judge’s job to determine if evidence is hearsay or credible. There are three evidentiary rules that help the Judge make this determination:

  1. Before being allowed to testify, a witness generally must swear or affirm that his or her testimony will be truthful.
  2. The witness must be personally present at the trial or proceeding in order to allow the judge or jury to observe the testimony firsthand.
  3. The witness is subject to cross-examination at the option of any party who did not call the witness to testify.

The Federal Rules of Evidence hearsay rule prohibits most statements made outside of court from being used as evidence in court. Looking at the three evidentiary rules mentioned above: usually a statement made outside of the courtroom is not made under oath, the person making the statement is not present to be observed by the judge, and the opposing party is not able to cross-examine the statement maker. This is not to say all statements made outside of court are inadmissible; Federal Rule of Evidence 801 provides several exclusions to the hearsay rule.

All content is discoverable if it is potentially relevant to the case and not deemed privileged, but discovered content may be ruled inadmissible if it is deemed privileged (e.g., doctor/patient communications), unreliable, or hearsay. You may be wondering how an electronic document can be considered hearsay. The hearsay rule refers to “statements”, which can be either written or oral. So, as with paper documents, in order to determine whether the content of an electronic document is hearsay or fact, the author of the document must testify under oath and submit to cross-examination; only then can the content stand as evidence of fact.

This legal argument between fact and hearsay does not relieve the discoveree from finding, collecting and producing all content that could be relevant to the case.