Infobesity in the Healthcare Industry: A Well-Balanced Diet of Predictive Governance is needed

With the rapid advances in healthcare technology, the movement to electronic health records, and the relentless accumulation of regulatory requirements, the shift from records management to information governance is increasingly becoming a necessary reality.

In a 2012 CGOC (Compliance, Governance and Oversight Council) Summit survey, it was found that, on average, 1% of an organization’s data is subject to legal hold, 5% falls under regulatory retention requirements and 25% has business value. This means that 69% of an organization’s electronically stored information (ESI) is not needed and could be disposed of without impact to the organization. I would argue that for the healthcare industry, especially for covered entities with medical record stewardship, those retention percentages are somewhat higher, especially the regulatory retention requirements.

According to an April 9, 2013 article, by 2015, 80% of new healthcare information will be composed of unstructured information: information that’s much harder to classify and manage because it doesn’t conform to the “rows & columns” format used in the past. Examples of unstructured information include clinical notes, emails & attachments, scanned lab reports, office work documents, radiology images, SMS, and instant messages. Despite a push for more organization and process in managing unstructured data, healthcare organizations continue to binge on unstructured data with little regard to the overall health of their enterprises.

So how does this info-gluttony (the unrestricted saving of unstructured data because data storage is cheap and saving everything is just easier) affect the health of the organization? Obviously you’ll look terrible in horizontal stripes, but also finding specific information quickly (or at all) is impossible, you’ll spend more on storage, data breaches could occur more often, litigation/eDiscovery expenses will rise, and you won’t want to go to your 15th high school reunion…

To combat this unstructured info-gain, we need an intelligent information governance solution – STAT! And that solution must include a defensible process to systematically dispose of information that is no longer subject to regulatory requirements or litigation hold requirements and no longer has business value.
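The disposal test described above reduces to three checks. The following is a minimal sketch, assuming each item carries a legal-hold flag, an optional retention expiry date and a business-value flag; the `Record` fields and the `is_disposable` helper are hypothetical names chosen for illustration, not part of any product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    name: str
    on_legal_hold: bool
    retention_expires: Optional[date]  # None means no regulatory retention applies
    has_business_value: bool


def is_disposable(record: Record, today: date) -> bool:
    """True only when no hold, no unexpired retention period and no business value remain."""
    retention_over = record.retention_expires is None or record.retention_expires <= today
    return not record.on_legal_hold and retention_over and not record.has_business_value


# Hypothetical sample data, for illustration only.
today = date(2013, 6, 1)
records = [
    Record("old cafeteria menu", False, None, False),
    Record("billing record", False, date(2019, 1, 1), False),
    Record("email under litigation hold", True, None, False),
]
disposable = [r.name for r in records if is_disposable(r, today)]
print(disposable)
```

In practice, a record of which check each item passed would likely be kept as well, since the process has to be defensible.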

To enable this information governance/defensible disposal Infobesity cure, healthcare information governance solutions must be able to extract meaning from all of this unstructured content, or in other words, understand and differentiate content conceptually. Automated classification/categorization cannot accurately or consistently differentiate the meaning of electronic content by relying on simple rules or keywords alone. To accurately automate the categorization and management of unstructured content, a machine learning capability to “train by example” is a precondition. This combination of systematically deriving meaning from unstructured content and using machine learning to accurately automate information governance is something we call “Predictive Governance”.
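To make “train by example” concrete, here is a toy sketch of a classifier that learns categories from a handful of labeled documents. It uses a simple Naive Bayes model over word counts, which is only a stand-in for the conceptual analysis a real governance product would perform; the category names and sample texts are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing, trained from labeled examples."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of training examples
        self.vocabulary = set()

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category in self.doc_counts:
            # Start from the log prior, then add the log likelihood of each word.
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in tokenize(text):
                count = self.word_counts[category][word] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training examples supplied by a human reviewer.
classifier = NaiveBayesClassifier()
classifier.train("patient discharge summary with diagnosis and medication list", "clinical_record")
classifier.train("lab results and radiology report for patient", "clinical_record")
classifier.train("lunch menu for the cafeteria this week", "transitory")
classifier.train("reminder parking garage closed friday", "transitory")

print(classifier.classify("radiology report and diagnosis for the patient"))
print(classifier.classify("cafeteria closed this friday"))
```

The point of the sketch is the workflow rather than the model: a person supplies examples, the system generalizes from them, and misclassified items can be fed back in as new training examples.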

A side benefit of Predictive Governance (besides actually looking taller) is that previously lost organizational knowledge and business intelligence can be automatically compiled and made available throughout the organization.


The ROI of Conceptual Search

After people, information is a company’s most valuable asset. But many are asking: “what’s in that information?”, “who controls it?”, “can others access it?”, “is it a risk to keep?”, and “for how long?”. The vast majority of information in any organization is not managed, not indexed, and is rarely–if ever–accessed.

Companies exist to create and utilize information. Do you know where all your organization’s information is, what’s in it, and most importantly, can those who need it find and access it? If your employees can’t find information when they need it, then the return on investment (ROI) for that information is zero. How much higher could the ROI be if your employees could actually find and share data effortlessly?

Enterprise search – The mindless regurgitation of keyword matches

Enterprise search is the organized query/retrieval of information from across an organization’s enterprise data systems. Data sources include e-mail servers, application databases, content management systems, file systems, intranet sites and many others. Legacy enterprise search systems let users query organizational data repositories with keyword-based inquiries that return huge result sets, which users then have to filter manually until they find what they were looking for (if they actually find it).

A sizeable drawback to a keyword-based search is that it will return all keyword matches even though they may be conceptually different – false positives.

What is conceptual search?

A conceptual search is used to search electronically stored information for information that is conceptually a match or similar to the information represented in a search query as opposed to a keyword search where only documents with exact keyword matches are returned. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query regardless of shared terms or language.

Cost savings – Concept versus keyword search

Employees are rarely capable of constructing keyword and Boolean searches that immediately return the data they are looking for. As a result, time is wasted finding it. IDC has estimated that using a higher quality enterprise search capability can save up to 53.4% of time spent searching for data. Many have argued conceptual search can save even more time because conceptual search more closely models how humans think and therefore will return more meaningful results more quickly.

Alan Greenspan, a past Chairman of the Federal Reserve, once stated “You’re entitled to your own opinions, but not to your own facts”. Return on Investment calculations are only as good as the reliability of the variables used to calculate them. To calculate ROI, the benefit (return) of an investment is divided by the cost of the investment; the result is expressed as a percentage.

Enterprise Search ROI calculations require the following data points:

  • The total cost of the current enterprise search process used
  • The total cost of the new enterprise search process after the investment is in place
  • The total cost of the new enterprise search investment

The actual ROI formula looks like this:
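One common way to write it, assumed here rather than taken from the original figure, treats the return as the savings net of the investment: ROI % = (current cost − new cost − investment cost) ÷ investment cost × 100. The short sketch below uses that convention; the function name and the dollar figures are hypothetical and chosen only to show the arithmetic.

```python
def roi_percent(current_cost: float, new_cost: float, investment_cost: float) -> float:
    """Net ROI: (savings - investment) / investment, expressed as a percentage."""
    if investment_cost <= 0:
        raise ValueError("investment_cost must be positive")
    savings = current_cost - new_cost
    return (savings - investment_cost) / investment_cost * 100


# Hypothetical annual figures, for illustration only.
example = roi_percent(current_cost=500_000, new_cost=300_000, investment_cost=80_000)
print(f"{example:.0f}%")  # prints 150%
```

All three inputs need to cover the same time period; comparing a one-year saving against a multi-year investment would distort the result.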

Return on investment is an often asked for but little understood financial measure. Many equate cost savings to ROI but cost savings is only a part of the equation. ROI also includes looking at the cost of the solution that produced the savings. ROI lets you compare returns from various investment opportunities to make the best investment decision for your available dollars.

Organizations run on information. If information is easier to find and use, the organization profits from it.

Predicting the Future of Information Governance

Information Anarchy

Information growth is out of control. The compound annual growth rate for digital information is estimated to be 61.7%. According to a 2011 IDC study, 90% of all data created in the next decade will be of the unstructured variety. These facts are making it almost impossible for organizations to actually capture, manage, store, share and dispose of this data in any meaningful way that will benefit the organization.

Successful organizations run on and are dependent on information. But information is valuable to an organization only if you know where it is, what’s in it, and what is shareable, or in other words… managed. In the past, organizations have relied on end-users to decide what should be kept, where and for how long. In fact, 75% of data today is generated and controlled by individuals. In most cases this practice is ineffective and causes what many refer to as “covert or underground archiving”, the act of individuals keeping everything in their own unmanaged local archives. These underground archives effectively lock most of the organization’s information away, hidden from everyone else in the organization.

This growing mass of information has brought us to an inflection point; get control of your information to enable innovation, profit and growth, or continue down your current path of information anarchy and choke on your competitor’s dust.




Choosing the Right Path

How does an organization ensure this inflection point is navigated correctly? Information Governance. You must get control of all your information by employing proven processes and technologies that allow you to create, store, find, share and dispose of information in an automated and intelligent manner.

An effective information governance process optimizes overall information value by ensuring the right information is retained and quickly available for business, regulatory, and legal requirements. This process reduces regulatory and legal risk, ensures needed data can be found quickly and is secured for litigation, reduces overall eDiscovery costs, and provides structure to unstructured information so that employees can be more productive.

Predicting the Future of Information Governance

Predictive Governance is the bridge across the inflection point. It combines machine-learning technology with human expertise and direction to automate your information governance tasks. Using this proven human-machine iterative training capability, Predictive Governance is able to accurately automate the concept-based categorization, data enrichment and management of all your enterprise data to reduce costs, reduce risks, enable information sharing and mitigate the strain of information overload.

Automating information governance so that all enterprise data is captured, granularly evaluated for legal requirements, regulatory compliance, or business value, and stored or disposed of in a defensible manner is the only way for organizations to move to the next level of information governance.

The lifecycle of information

Organizations habitually over-retain information, especially unstructured electronic information, for all kinds of reasons. Many organizations simply have not addressed what to do with it, so they fall back on relying on individual employees to decide what should be kept, for how long, and what should be disposed of. On the opposite end of the spectrum, a minority of organizations have tried centralized enterprise content management systems and have found them difficult to use, so employees find ways around them and end up keeping huge amounts of data locally on their workstations, on removable media, in cloud accounts or on rogue SharePoint sites that are used as “data dumps” with little or no records management or IT supervision. Much of this information is transitory, expired, or of questionable business value. Because of this lack of management, information continues to accumulate. This information build-up raises the cost of storage as well as the risk associated with eDiscovery.

In reality, as information ages, its probability of re-use, and therefore its value, shrinks quickly. Fred Moore, Founder of Horison Information Strategies, wrote about this concept years ago.

Figure 1 below shows that as data ages, the probability of reuse goes down very quickly, even as the amount of saved data rises. Once data has aged 10 to 15 days, its probability of ever being looked at again approaches 1%, and as it continues to age, that probability approaches but never quite reaches zero (figure 1 – red shading).

Contrast that with the possibility that a large part of any organizational data store has little or no business, legal or regulatory value. In fact, the Compliance, Governance and Oversight Council (CGOC) conducted a survey in 2012 that showed that, on average, 1% of organizational data is subject to litigation hold, 5% is subject to regulatory retention and 25% has some business value (figure 1 – green shading). This means that approximately 69% of an organization’s data store has no business value and could be disposed of without legal, regulatory or business consequences.

The average employee creates, sends, receives and stores, conservatively, 20 MB of data per business day. This means that at the end of 15 calendar days (about 11 business days), they have accumulated 220 MB of new data; at the end of 90 days, 1.26 GB; and at the end of three years, 15.12 GB. So how much of this accumulated data needs to be retained? Again referring to figure 1 below, the blue shaded area represents the information that probably has no legal, regulatory or business value according to the 2012 CGOC survey. At the end of three years, the amount of retained data from a single employee that could be disposed of without adverse effects to the organization is 10.43 GB. Now multiply that by the total number of employees and you are looking at some very large data stores.
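The arithmetic behind these figures can be checked with a short sketch. It assumes the day counts are converted to business days (roughly 11, 63 and 756 business days for 15 days, 90 days and three years), which is the conversion that reproduces the stated totals, and it applies the 69% share from the CGOC survey. The constant and function names are illustrative.

```python
MB_PER_BUSINESS_DAY = 20   # conservative per-employee estimate from the text
DISPOSABLE_SHARE = 0.69    # 100% - (1% legal hold + 5% regulatory + 25% business value)

# Assumed business-day equivalents of the calendar periods in the text.
BUSINESS_DAYS = {"15 days": 11, "90 days": 63, "3 years": 756}


def accumulated_mb(business_days: int) -> int:
    """Data accumulated by one employee over the given number of business days."""
    return business_days * MB_PER_BUSINESS_DAY


for period, days in BUSINESS_DAYS.items():
    print(f"{period}: {accumulated_mb(days)} MB")

three_year_gb = accumulated_mb(BUSINESS_DAYS["3 years"]) / 1000
disposable_gb = round(three_year_gb * DISPOSABLE_SHARE, 2)
print(f"Disposable after three years: {disposable_gb} GB per employee")
```

Multiplying `disposable_gb` by headcount gives a rough organization-wide figure; for a hypothetical 1,000-employee organization that would be on the order of 10 TB.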

Figure 1: The Lifecycle of data

The above lifecycle of data shows us that employees really don’t need all of the data they squirrel away (because its probability of re-use drops to 1% at around 15 days) and, based on the CGOC survey, approximately 69% of organizational data is not required for legal or regulatory retention and has no business value. The difficult piece of this whole process is how an organization can efficiently determine what data is not needed and dispose of it automatically…

As unstructured data volumes continue to grow, automatic categorization of data is quickly becoming the only way to get ahead of the data flood. Without accurate automated categorization, the ability to find the data you need, quickly, will never be realized. Even better, if data categorization can be based on the meaning of the content, not just a simple rule or keyword match, highly accurate categorization and therefore information governance is achievable.

Next Generation Technologies Reduce FOIA Bottlenecks

Federal agencies are under more scrutiny to resolve issues with responding to Freedom of Information Act (FOIA) requests.

The Freedom of Information Act provides for the full disclosure of agency records and information to the public unless that information is exempted under clearly delineated statutory language. In conjunction with FOIA, the Privacy Act serves to safeguard public interest in informational privacy by delineating the duties and responsibilities of federal agencies that collect, store, and disseminate personal information about individuals. The procedures established ensure that the Department of Homeland Security fully satisfies its responsibility to the public to disclose departmental information while simultaneously safeguarding individual privacy.

In February of this year, the House Oversight and Government Reform Committee opened a congressional review of executive branch compliance with the Freedom of Information Act.

The committee sent a six-page letter to the Director of Information Policy at the Department of Justice (DOJ), Melanie Ann Pustay. In the letter, the committee questions why, based on a December 2012 survey, 62 of 99 government agencies have not updated their FOIA regulations and processes, as required by Attorney General Eric Holder in a 2009 memorandum. In fact, the Attorney General’s own agency has not updated its regulations and processes since 2003.

The committee also pointed out that there were 83,000 FOIA requests still outstanding as of the writing of the letter.

In fairness to the federal agencies, responding to a FOIA request can be time-consuming and expensive if technology and processes are not keeping up with increasing demands. Electronic content can be anywhere, including email systems, SharePoint servers, file systems, and individual workstations. Because content is spread around and not usually centrally indexed, enterprise-wide searches for content do not turn up all potentially responsive content. This means a much more manual, time-consuming process is used to find relevant content.

There must be a better way…

New technology can address the collection problem of searching for relevant content across the many storage locations where electronically stored information (ESI) can reside. For example, an enterprise-wide search capability with “connectors” into every data repository (email, SharePoint, file systems, ECM systems, records management systems) allows all content to be centrally indexed so that an enterprise-wide keyword search will find all instances of content with those keywords present. A more powerful capability to look for is the ability to search on concepts, a far more accurate way to search for specific content. Searching for conceptually comparable content can speed up the collection process and drastically reduce the number of false positives in the results set while finding many more of the keyword-deficient but conceptually responsive records. In conjunction with concept search, automated classification/categorization of data can reduce search time and raise accuracy.
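The central-index idea can be illustrated with a small sketch. It assumes each connector has already delivered its documents as plain text, keyed by repository and document id; the repository names and sample documents are made up, and the sketch covers only the keyword case, not concept search.

```python
from collections import defaultdict


def build_index(repositories):
    """Map each lowercase word to the set of (repository, document id) pairs containing it."""
    index = defaultdict(set)
    for repo_name, documents in repositories.items():
        for doc_id, text in documents.items():
            for word in text.lower().split():
                index[word].add((repo_name, doc_id))
    return index


def search(index, query):
    """Return the locations of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical content pulled in by three connectors.
repositories = {
    "email": {"msg-1": "FOIA request for border statistics"},
    "sharepoint": {"doc-7": "Border statistics report 2012"},
    "file_share": {"memo.txt": "Cafeteria schedule"},
}
index = build_index(repositories)
hits = search(index, "border statistics")
print(sorted(hits))
```

Because every repository feeds the same index, one query reaches all of them, which is the property that a repository-by-repository manual search lacks.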

The largest cost in responding to a FOIA request is in the review of all potentially relevant ESI found during collection. Predictive Coding, a technology attorneys currently use for eDiscovery, can drastically reduce the burden of reviewing thousands, hundreds of thousands or millions of documents for relevancy and privacy.

Predictive Coding is the process of applying machine learning and iterative supervised learning technology to automate document coding and prioritize review. This functionality dramatically expedites the actual review process while improving accuracy and reducing the risk of missing key documents. According to a RAND Institute for Civil Justice report published in 2012, document review cost savings of 80% can be expected using Predictive Coding technology.

With the increasing number of FOIA requests swamping agencies, agencies are hard pressed to catch up to their backlogs. The next generation technologies mentioned above can help agencies reduce their FOIA related costs while decreasing their response time.

Healthcare Information Governance Requires a New Urgency

From safeguarding the privacy of patient medical records to ensuring every staff member can rapidly locate emergency procedures, healthcare organizations have an ethical, legal, and commercial responsibility to protect and manage the information in their care. Inadequate information management processes can result in:

  • A breach of protected health information (PHI) costing millions of dollars and ruined reputations.
  • A situation where accreditation is jeopardized due to a team-member’s inability to demonstrate the location of a critical policy.
  • A premature release of information about a planned merger causing the deal to fail or incurring additional liability.

The benefits of effectively protecting and managing healthcare information are widely recognized but many organizations have struggled to implement effective information governance solutions. Complex technical, organizational, regulatory and cultural challenges have increased implementation risks and costs and have led to relatively high failure rates.  Ultimately, many of these challenges are related to information governance.

In January 2013, the U.S. Department of Health and Human Services published a set of modifications to the HIPAA privacy, security, enforcement and breach notification rules. These included:

  • Making business associates directly liable for data breaches
  • Clarifying and increasing the breach notification process and penalties
  • Strengthening limitations on data usage for marketing
  • Expanding patients’ rights to restrict the disclosure of data when they pay cash for care

Effective Healthcare Information Governance steps

Inadvertent or just plain sloppy non-compliance with regulatory requirements can cost your healthcare organization millions of dollars in regulatory fines and legal penalties. For those new to the healthcare information governance topic, below are some suggested steps that will help you move toward reduced risk by implementing more effective information governance processes:

  1. Map out all data and data sources within the enterprise
  2. Develop and/or refresh organization-wide information governance policies and processes
  3. Have your legal counsel review and approve all new and changed policies
  4. Educate all employees and partners, at least annually, on their specific responsibilities
  5. Limit data held exclusively by individual employees
  6. Audit all policies to ensure employee compliance
  7. Enforce penalties for non-compliance

Healthcare information is by nature heterogeneous. While administrative information systems are highly structured, some 80% of healthcare information is unstructured or free form.  Securing and managing large amounts of unstructured patient as well as business data is extremely difficult and costly without an information governance capability that allows you to recognize content immediately, classify content accurately, retain content appropriately and dispose of content defensibly.