Bring your dark data out of the shadows


NosferatuShadowDark data, otherwise known as unstructured, unmanaged, and uncategorized information is a major problem for many organizations. Many organizations don’t have the will or systems in place to automatically index and categorize their rapidly growing unstructured dark data, especially in file shares, and instead rely on employees to manually manage their own information. This reliance on employees is a no-win situation because employees have neither the incentive nor the time to actively manage their information.

Organizations find themselves trying to figure out what to do with huge amounts of dark data, particularly when they’re purchasing TBs of new storage annually because they’ve run out.

Issues with dark data:

  • Consumes costly storage space and resources – Most medium to large organizations provide terabytes of file share storage space for employees and departments to utilize. Employees drag and drop all kinds of work related files (and personal files like personal photos, MP3 music files, and personal communications) as well as PSTs and work station backup files. The vast majority of these files are unmanaged and are never looked at again by the employee or anyone else.
  • Consumes IT resources – Personnel are required to perform nightly backups, DR planning, and IT personnel to find or restore files employees could not find.
  • Masks security risks – File shares act as “catch-alls” for employees. Sensitive company information regularly finds its way to these repositories. These file shares are almost never secure so sensitive information like personally identifiable information (PII), protected health information (PHI, and intellectual property can be inadvertently leaked.
  • Raises eDiscovery costs – Almost everything is discoverable in litigation if it pertains to the case. The fact that tens or hundreds of terabytes of unindexed content is being stored on file shares means that those terabytes of files may have to be reviewed to determine if they are relevant in a given legal case. That can add hundreds of thousands or millions of dollars of additional cost to a single eDiscovery request.

To bring this dark data under control, IT must take positive steps to address the problem and do something about it. The first step is to look to your file shares.

Advertisement

Infobesity in the Healthcare Industry: A Well-Balanced Diet of Predictive Governance is needed


Fat TwitterWith the rapid advances in healthcare technology, the movement to electronic health records, and the relentless accumulation of regulatory requirements, the shift from records management to information governance is increasingly becoming a necessary reality.

In a 2012 CGOC (Compliance, Governance and Oversight Counsel) Summit survey, it was found that on the average 1% of an organization’s data is subject to legal hold, 5% falls under regulatory retention requirements and 25% has business value. This means that 69% of an organization’s ESI is not needed and could be disposed of without impact to the organization. I would argue that for the healthcare industry, especially for covered entities with medical record stewardship, those retention percentages are somewhat higher, especially the regulatory retention requirements.

According to an April 9, 2013 article on ZDNet.com, by 2015, 80% of new healthcare information will be composed of unstructured information; information that’s much harder to classify and manage because it doesn’t conform to the “rows & columns” format used in the past. Examples of unstructured information include clinical notes, emails & attachments, scanned lab reports, office work documents, radiology images, SMS, and instant messages. Despite a push for more organization and process in managing unstructured data, healthcare organizations continue to binge on unstructured data with little regard to the overall health of their enterprises.

So how does this info-gluttony, (the unrestricted saving of unstructured data because data storage is cheap and saving everything is just easier), affect the health of the organization? Obviously you’ll look terrible in horizontal stripes, but also finding specific information quickly (or at all) is impossible, you’ll spend more on storage, data breaches will could occur more often, litigation/eDiscovery expenses will rise, and you won’t want to go to your 15th high school reunion…

To combat this unstructured info-gain, we need an intelligent information governance solution – STAT!  And that solution must include a defensible process to systematically dispose of information that’s no longer subject to regulatory requirements, litigation hold requirements or because it no longer has business value.

To enable this information governance/defensible disposal Infobesity cure, healthcare information governance solutions must be able to extract meaning from all of this unstructured content, or in other words understand and differentiate content conceptually. The automated classification/categorization of unstructured content based on content meaning cannot accurately or consistently differentiate the meaning in electronic content by simply relying on simple rules or keywords. To accurately automate the categorization and management of unstructured content, a machine learning capability to “train by example” is a precondition. This ability to systematically derive meaning from unstructured content as well as machine learning to accurately automate information governance is something we call “Predictive Governance”.

A side benefit of Predictive Governance is (you’ll actually look taller) previously lost organizational knowledge and business intelligence can be automatically compiled and made available throughout the organization.