The Lifecycle of Information – Updated


Organizations habitually over-retain information, especially unstructured electronic information, for all kinds of reasons. Many organizations simply have not addressed what to do with it so many of them fall back on relying on individual employees to decide what should be kept and for how long and what should be disposed of. On the opposite end of the spectrum a minority of organizations have tried centralized enterprise content management systems and have found them to be difficult to use so employees find ways around them and end up keeping huge amounts of data locally on their workstations, on removable media, in cloud accounts or on rogue SharePoint sites and are used as “data dumps” with or no records management or IT supervision. Much of this information is transitory, expired, or of questionable business value. Because of this lack of management, information continues to accumulate. This information build-up raises the cost of storage as well as the risk associated with eDiscovery. In reality, as information ages, it probability of re-use and therefore its value, shrinks quickly. Fred Moore, Founder of Horison Information Strategies, wrote about this concept years ago as the Lifecycle of Data. Figure 1 below shows that as data ages, the probability of reuse goes down…very quickly as the amount of saved data rises. Once data has aged 10 to 15 days, its probability of ever being looked at again approaches 1% and as it continues to age approaches but never quite reaches zero (figure 1 – blue shading).

Lifecycle of Data 1

Figure 1: The Lifecycle of Information

Contrast that with the possibility that a large part of any organizational data store has little of no business, legal or regulatory value. In fact the Compliance, Governance and Oversight Counsel (CGOC) conducted a survey in 2012 that showed that on the average, 1% of organizational data is subject to litigation hold, 5% is subject to regulatory retention and 25% had some business value (figure 2 – green shading). This means that approximately 69% of an organizations data store has no business value and could be disposed of without legal, regulatory or business consequences. The average employee creates, sends, receives and stores conservatively 20 MB of data per day. This means that at the end of 15 business days, they have accumulated 220 MB of new data, at the end of 90 days, 1.26 GB of data and at the end of three years, 15.12 GB of data (if they don’t delete anything). So how much of this accumulated data needs to be retained? Again referring to figure 2 below, the red shaded area represents the information that probably has no legal, regulatory or business value according to the 2012 CGOC survey. At the end of three years, the amount of retained data from a single employee that could be disposed of without adverse effects to the organization is 10.43 GB. Now multiply that by the total number of employees and you are looking at some very large data stores.

Lifecycle of Data 2

Figure 2: The Lifecycle of information Value

The above Lifecycle of Information Value graphic above shows us that employees really don’t need all of the data they squirrel away (because its probability of re-use drops to 1% at around 15 days) and based on the CGOC survey, approximately 69% of organizational data is not required for legal, regulatory retention or has business value. The difficult piece of this whole process is how can an organization efficiently determine what data is not needed and dispose of it using automation (because employees probably won’t)… As unstructured data volumes continue to grow, automatic categorization of data is quickly becoming the only realistic way to get ahead of the data flood. Without accurate automated categorization, the ability to find the data you need, quickly will never be realized. Even better, if data categorization can be based on the value of the content, not just a simple rule or keyword match, highly accurate categorization and therefore information governance is achievable.

Advertisement

Infobesity in the Healthcare Industry: A Well-Balanced Diet of Predictive Governance is needed


Fat TwitterWith the rapid advances in healthcare technology, the movement to electronic health records, and the relentless accumulation of regulatory requirements, the shift from records management to information governance is increasingly becoming a necessary reality.

In a 2012 CGOC (Compliance, Governance and Oversight Counsel) Summit survey, it was found that on the average 1% of an organization’s data is subject to legal hold, 5% falls under regulatory retention requirements and 25% has business value. This means that 69% of an organization’s ESI is not needed and could be disposed of without impact to the organization. I would argue that for the healthcare industry, especially for covered entities with medical record stewardship, those retention percentages are somewhat higher, especially the regulatory retention requirements.

According to an April 9, 2013 article on ZDNet.com, by 2015, 80% of new healthcare information will be composed of unstructured information; information that’s much harder to classify and manage because it doesn’t conform to the “rows & columns” format used in the past. Examples of unstructured information include clinical notes, emails & attachments, scanned lab reports, office work documents, radiology images, SMS, and instant messages. Despite a push for more organization and process in managing unstructured data, healthcare organizations continue to binge on unstructured data with little regard to the overall health of their enterprises.

So how does this info-gluttony, (the unrestricted saving of unstructured data because data storage is cheap and saving everything is just easier), affect the health of the organization? Obviously you’ll look terrible in horizontal stripes, but also finding specific information quickly (or at all) is impossible, you’ll spend more on storage, data breaches will could occur more often, litigation/eDiscovery expenses will rise, and you won’t want to go to your 15th high school reunion…

To combat this unstructured info-gain, we need an intelligent information governance solution – STAT!  And that solution must include a defensible process to systematically dispose of information that’s no longer subject to regulatory requirements, litigation hold requirements or because it no longer has business value.

To enable this information governance/defensible disposal Infobesity cure, healthcare information governance solutions must be able to extract meaning from all of this unstructured content, or in other words understand and differentiate content conceptually. The automated classification/categorization of unstructured content based on content meaning cannot accurately or consistently differentiate the meaning in electronic content by simply relying on simple rules or keywords. To accurately automate the categorization and management of unstructured content, a machine learning capability to “train by example” is a precondition. This ability to systematically derive meaning from unstructured content as well as machine learning to accurately automate information governance is something we call “Predictive Governance”.

A side benefit of Predictive Governance is (you’ll actually look taller) previously lost organizational knowledge and business intelligence can be automatically compiled and made available throughout the organization.

Finding the Cure for the Healthcare Unstructured Data Problem


Healthcare information/ and records continue to grow with the introduction of new devices and expanding regulatory requirements such as The Affordable Care Act, The Health Insurance Portability and Accountability Act (HIPAA), and the Health Information Technology for Economic and Clinical Health Act (HITECH). In the past, healthcare records were made up of mostly paper forms or structured billing data; relatively easy to categorize, store, and manage.  That trend has been changing as new technologies enable faster and more convenient ways to share and consume medical data.

According to an April 9, 2013 article on ZDNet.com, by 2015, 80% of new healthcare information will be composed of unstructured information; information that’s much harder to classify and manage because it doesn’t conform to the “rows & columns” format used in the past. Examples of unstructured information include clinical notes, emails & attachments, scanned lab reports, office work documents, radiology images, SMS, and instant messages.

Who or what is going to actually manage this growing mountain of unstructured information?

To insure regulatory compliance and the confidentiality and security of this unstructured information, the healthcare industry will have to 1) hire a lot more professionals to manually categorize and mange it or 2) acquire technology to do it automatically.

Looking at the first solution; the cost to have people manually categorize and manage unstructured information would be prohibitively expensive not to mention slow. It also exposes private patient data to even more individuals.  That leaves the second solution; information governance technology. Because of the nature of unstructured information, a technology solution would have to:

  1. Recognize and work with hundreds of data formats
  2. Communicate with the most popular healthcare applications and data repositories
  3. Draw conceptual understanding from “free-form” content so that categorization can be accomplished at an extremely high accuracy rate
  4. Enable proper access security levels based on content
  5. Accurately retain information based on regulatory requirements
  6. Securely and permanently dispose of information when required

An exciting emerging information governance technology that can actually address the above requirements uses the same next generation technology the legal industry has adopted…proactive information governance technology based on conceptual understanding of content,  machine learning and iterative “train by example” capabilities

 

The lifecycle of information


Organizations habitually over-retain information, especially unstructured electronic information, for all kinds of reasons. Many organizations simply have not addressed what to do with it so many of them fall back on relying on individual employees to decide what should be kept and for how long and what should be disposed of. On the opposite end of the spectrum a minority of organizations have tried centralized enterprise content management systems and have found them to be difficult to use so employees find ways around them and end up keeping huge amounts of data locally on their workstations, on removable media, in cloud accounts or on rogue SharePoint sites and are used as “data dumps” with or no records management or IT supervision. Much of this information is transitory, expired, or of questionable business value. Because of this lack of management, information continues to accumulate. This information build-up raises the cost of storage as well as the risk associated with eDiscovery.

In reality, as information ages, it probability of re-use and therefore its value, shrinks quickly. Fred Moore, Founder of Horison Information Strategies, wrote about this concept years ago.

The figure 1 below shows that as data ages, the probability of reuse goes down…very quickly as the amount of saved data rises. Once data has aged 10 to 15 days, its probability of ever being looked at again approaches 1% and as it continues to age approaches but never quite reaches zero (figure 1 – red shading).

Contrast that with the possibility that a large part of any organizational data store has little of no business, legal or regulatory value. In fact the Compliance, Governance and Oversight Counsel (CGOC) conducted a survey in 2012 that showed that on the average, 1% of organizational data is subject to litigation hold, 5% is subject to regulatory retention and 25% had some business value (figure 1 – green shading). This means that approximately 69% of an organizations data store has no business value and could be disposed of without legal, regulatory or business consequences.

The average employee creates, sends, receives and stores conservatively 20 MB of data per day. This means that at the end of 15 business days, they have accumulated 220 MB of new data, at the end of 90 days, 1.26 GB of data and at the end of three years, 15.12 GB of data. So how much of this accumulated data needs to be retained? Again referring to figure 1 below, the blue shaded area represents the information that probably has no legal, regulatory or business value according to the 2012 CGOC survey. At the end of three years, the amount of retained data from a single employee that could be disposed of without adverse effects to the organization is 10.43 GB. Now multiply that by the total number of employees and you are looking at some very large data stores.

Figure 1: The Lifecycle of data

The above lifecycle of data shows us that employees really don’t need all of the data they squirrel away (because its probability of re-use drops to 1% at around 15 days) and based on the CGOC survey, approximately 69% of organizational data is not required for legal, regulatory retention or has business value. The difficult piece of this whole process is how can an organization efficiently determine what data is not needed and dispose of it automatically…

As unstructured data volumes continue to grow, automatic categorization of data is quickly becoming the only way to get ahead of the data flood. Without accurate automated categorization, the ability to find the data you need, quickly, will never be realized. Even better, if data categorization can be based on the meaning of the content, not just a simple rule or keyword match, highly accurate categorization and therefore information governance is achievable.

Healthcare Information Governance Requires a New Urgency


From safeguarding the privacy of patient medical records to ensuring every staff member can rapidly locate emergency procedures, healthcare organizations have an ethical, legal, and commercial responsibility to protect and manage the information in their care. Inadequate information management processes can result in:

  • A breach of protected health information (PHI) costing millions of dollars and ruined reputations.
  • A situation where accreditation is jeopardized due to a team-member’s inability to demonstrate the location of a critical policy.
  • A premature release of information about a planned merger causing the deal to fail or incurring additional liability.

The benefits of effectively protecting and managing healthcare information are widely recognized but many organizations have struggled to implement effective information governance solutions. Complex technical, organizational, regulatory and cultural challenges have increased implementation risks and costs and have led to relatively high failure rates.  Ultimately, many of these challenges are related to information governance.

In January 2013, The U.S. Department of Health and Human Services published a set of modifications to the HIPAA privacy, security, enforcement and breach notification rules.  These included:

  • Making business associates directly liable for data breaches
  • Clarifying and increasing the breach notification process and penalties
  • Strengthening limitations on data usage for marketing
  • Expanding patient rights to the disclosure of data when they pay cash for care

Effective Healthcare Information Governance steps

Inadvertent or just plain sloppy non-compliance with regulatory requirements can cost your healthcare organization millions of dollars in regulatory fines and legal penalties. For those new to the healthcare information governance topic, below are some suggested steps that will help you move toward reduced risk by implementing more effective information governance processes:

  1. Map out all data and data sources within the enterprise
  2. Develop and/or refresh organization-wide information governance policies and processes
  3. Have your legal counsel review and approve all new and changed policies
  4. Educate all employees and partners, at least annually, on their specific responsibilities
  5. Limit data held exclusively by individual employees
  6. Audit all policies to ensure employee compliance
  7. Enforce penalties for non-compliance

Healthcare information is by nature heterogeneous. While administrative information systems are highly structured, some 80% of healthcare information is unstructured or free form.  Securing and managing large amounts of unstructured patient as well as business data is extremely difficult and costly without an information governance capability that allows you to recognize content immediately, classify content accurately, retain content appropriately and dispose of content defensibly.

Coming to Terms with Defensible Disposal; Part 1


Last week at LegalTech New York 2013 I had the opportunity to moderate a panel titled: “Defensible Disposal: If it doesn’t exist, I don’t have to review it…right?” with an impressive roster of panelists. They included: Bennett Borden, Partner, Chair eDiscovery & Information Governance Section, Williams Mullen, Clifton C. Dutton, Senior Vice President, Director of Strategy and eDiscovery, American International Group and John Rosenthal, Chair, eDiscovery and Information Management Practice, Winston & Strawn and Dean Gonsowski, Associate General Counsel, Recommind Inc.

During the panel session it was agreed that organizations have been over-retaining ESI (which accounts for at least 95% of all data in organizations) even if it’s no longer needed for business or legal reasons. Other factors driving this over-retention of ESI were the fear of inadvertently deleting evidence, otherwise called spoliation. In fact an ESG survey published in December of 2012 showed that the “fear of the inability to furnish data requested as part of a legal or regulatory matter” was the highest ranked reason organizations chose not to dispose of ESI.

Other reasons cited included not having defined policies for managing and disposing of electronic information and adversely, organizations having defined retention policies to actually keep all data indefinitely (usually because of the fear of spoliation).

One of the principal information governance gaps most organizations haven’t yet addressed is the difference between “records” and “information”. Many organizations have “records” retention/disposition policies to manage those official company records required to be retained under regulatory or legal requirements. But those documents and files that fall under legal hold and regulatory requirements amount to approximately 6% of an organization’s retained electronic data (1% legal hold and 5% regulatory).

Another interesting survey published by Kahn Consulting in 2012 showed levels of employee understanding of their information governance-related responsibilities. In this survey only 21% of respondents had a good idea of what information needed to be retained/deleted and only 19% knew how  information should be retained or disposed of. In that same survey, only 15% of respondents had a general idea of their legal hold and eDiscovery responsibilities.

The above surveys highlight the fact that organizations aren’t disposing of information in a systematic process mainly because they aren’t managing their information, especially their electronic information and therefore don’t know what information to keep and what to dispose of.

An effective defensible disposal process is dependent on an effective information governance process. To know what can be deleted and when, an organization has to know what information needs to be kept and for how long based on regulatory, legal and business value reasons.

Over the coming weeks, I will address those defensible disposal questions and responses the LegalTech panel discussed. Stay tuned…