The Lifecycle of Information – Updated


Organizations habitually over-retain information, especially unstructured electronic information, for all kinds of reasons. Many organizations simply have not addressed what to do with it, so they fall back on individual employees to decide what should be kept, for how long, and what should be disposed of. At the opposite end of the spectrum, a minority of organizations have tried centralized enterprise content management systems and found them difficult to use, so employees find ways around them. Those employees end up keeping huge amounts of data locally on their workstations, on removable media, in cloud accounts, or on rogue SharePoint sites that serve as “data dumps” with little or no records management or IT supervision. Much of this information is transitory, expired, or of questionable business value. Because of this lack of management, information continues to accumulate, and this build-up raises both the cost of storage and the risk associated with eDiscovery.

In reality, as information ages, its probability of re-use, and therefore its value, shrinks quickly. Fred Moore, founder of Horison Information Strategies, wrote about this concept years ago as the Lifecycle of Data. Figure 1 below shows that as data ages, the probability of reuse drops very quickly while the amount of saved data rises. Once data has aged 10 to 15 days, its probability of ever being looked at again approaches 1%, and as it continues to age, that probability approaches, but never quite reaches, zero (figure 1, blue shading).

Figure 1: The Lifecycle of Information

Contrast that with the possibility that a large part of any organizational data store has little or no business, legal, or regulatory value. In fact, the Compliance, Governance and Oversight Council (CGOC) conducted a survey in 2012 showing that, on average, 1% of organizational data is subject to litigation hold, 5% is subject to regulatory retention, and 25% has some business value (figure 2, green shading). This means that approximately 69% of an organization's data store has no business value and could be disposed of without legal, regulatory, or business consequences. The average employee conservatively creates, sends, receives, and stores 20 MB of data per working day. This means that at the end of 15 days (about 11 working days), they have accumulated 220 MB of new data; at the end of 90 days (about 63 working days), 1.26 GB; and at the end of three years (about 756 working days), 15.12 GB, assuming they don't delete anything. So how much of this accumulated data needs to be retained? Again referring to figure 2 below, the red shaded area represents the information that probably has no legal, regulatory, or business value according to the 2012 CGOC survey. At the end of three years, the amount of retained data from a single employee that could be disposed of without adverse effects to the organization is 10.43 GB (69% of 15.12 GB). Now multiply that by the total number of employees and you are looking at some very large data stores.
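The accumulation figures above can be checked with a small Python sketch. It assumes about 21 working days per month (so 15 days is roughly 11 working days, 90 days roughly 63, and three years 756) and decimal units (1 GB = 1,000 MB). Those two assumptions are mine, chosen because they reproduce the 220 MB, 1.26 GB, and 15.12 GB figures; the 20 MB per day rate and the CGOC percentages come from the text, and the function name is illustrative.

```python
# Minimal sketch of per-employee data accumulation.
# Assumptions (not from the original source): ~21 working days per month,
# decimal units (1 GB = 1,000 MB).
MB_PER_WORKING_DAY = 20
# Share with no legal, regulatory, or business value per the 2012 CGOC figures:
# 100% - (1% litigation hold + 5% regulatory + 25% business value) = 69%.
DISPOSABLE_SHARE = 1 - (0.01 + 0.05 + 0.25)


def accumulated_gb(working_days: int) -> float:
    """Data accumulated after `working_days` working days, in decimal GB."""
    return working_days * MB_PER_WORKING_DAY / 1000


periods = {"15 days": 11, "90 days": 63, "3 years": 36 * 21}
for label, days in periods.items():
    total = accumulated_gb(days)
    print(f"{label}: {total:.2f} GB accumulated, "
          f"{total * DISPOSABLE_SHARE:.2f} GB could be disposed of")
```

The three-year line reports 15.12 GB accumulated and 10.43 GB that could be disposed of, matching the figures in the paragraph.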

Figure 2: The Lifecycle of Information Value

The Lifecycle of Information Value graphic above shows that employees really don't need all of the data they squirrel away (because its probability of re-use drops to about 1% at around 15 days), and, based on the CGOC survey, approximately 69% of organizational data is not required for legal or regulatory retention and has no business value. The difficult part of this whole process is determining how an organization can efficiently identify the data it doesn't need and dispose of it using automation (because employees probably won't). As unstructured data volumes continue to grow, automatic categorization of data is quickly becoming the only realistic way to get ahead of the data flood. Without accurate automated categorization, the ability to quickly find the data you need will never be realized. Even better, if data categorization can be based on the value of the content, not just a simple rule or keyword match, highly accurate categorization, and therefore information governance, is achievable.

Productivity and InfoGov: Are They Related?


Yes, they are. Employee productivity is adversely affected by a lack of information governance (IG) in two ways. First, without IG, employees spend time “managing” their work files, contacts, emails, and attachments. This management time includes reviewing content, deciding whether a particular file or email should be kept or deleted, deciding how long and where required emails will be kept, and finally, moving these files to their final storage location. Many research organizations and experts estimate that this content management consumes anywhere from two to four hours per week. Consider a conservative example of two hours per week: this translates to 104 hours per year per employee or, for an organization of 5,000 employees, 520,000 hours per year devoted to individually managing data, work that may or may not be performed efficiently or effectively.
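The management-time estimate is simple multiplication, and a short Python sketch makes its assumptions explicit. The 52-week year is an assumption implied by the 104-hour figure, and the function name is illustrative.

```python
def annual_hours(hours_per_week: float, employees: int = 1,
                 weeks_per_year: int = 52) -> float:
    """Hours per year spent on an activity across `employees` people."""
    return hours_per_week * weeks_per_year * employees


per_employee = annual_hours(2)                   # conservative 2 hours/week
organization = annual_hours(2, employees=5_000)  # 5,000-employee organization
print(f"{per_employee:,.0f} hours per employee per year")
print(f"{organization:,.0f} hours per year across 5,000 employees")
```

Changing the first argument to 4 gives the upper end of the quoted range: 208 hours per employee and 1,040,000 hours for 5,000 employees.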

A second measure of lost employee productivity is the number of hours per week that employees spend searching for information within the enterprise. Organizations without a centrally managed information management capability usually don't actively manage employee file shares. When searchable central indexes are not available, employees fall back on simple keyword searches, which rarely produce the results the employee is looking for in a timely manner, if at all. In some cases, stored information might not be found due to weak or incorrect search terms, poor file naming, or the fact that the file wasn't actually saved at all (i.e., the employee only thought it was).

This lack of information management can cost an organization a great deal without the organization even realizing it.

InfoGov: Productivity Gains Equal Revenue Gains


A great deal has been written on lost productivity and the benefits of information governance. The theory is that an information governance program will raise employee productivity, thereby saving the organization money. This theory is fairly well accepted, based on the common-sense realization, backed by market data, that information workers spend many hours per week looking for the information they need to do their jobs. One data point comes from a 2013 post on the Wortzmans e-Discovery Feed blog titled “The Business Case for Information Governance – Reduce Lost Productivity!”, which states that employees spend up to nine hours per week (roughly one week per month, or 12 weeks per year) looking for information. The first question to consider: how much of that search time could be saved with an effective information governance program?

InfoGov Productivity Savings

Three months out of every year spent looking for information seems a little high, so what would a more conservative number be? In my travels through the archiving, records management, eDiscovery, and information governance industries, I have spoken to many research analysts and many, many more customers, and have generally seen numbers in the 2 to 4 hours per week range. Assuming four hours per week, the average employee spends 208 hours per year (26 eight-hour working days, or 5.2 forty-hour weeks) looking for information. Let's further assume that an effective information governance program that captures, indexes, stores, and manages (including disposal) all electronically stored information (ESI) according to centralized policies would save 50% of the time employees spend looking for information (not an unrealistic estimate, in my humble opinion), or 104 hours per year (13 days, or 2.6 weeks). To bring this number home, let's dollarize employee time.
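The day and week conversions above assume 8-hour days and 40-hour weeks, which is what makes 208 hours equal 26 days and 5.2 weeks. A minimal Python sketch of the calculation, with an illustrative function name:

```python
def search_time(hours_per_week: float, reduction: float,
                hours_per_day: float = 8, hours_per_work_week: float = 40,
                weeks_per_year: int = 52) -> dict:
    """Annual search time and the portion a governance program might save."""
    annual = hours_per_week * weeks_per_year
    saved = annual * reduction
    return {
        "annual_hours": annual,
        "annual_days": annual / hours_per_day,
        "annual_weeks": annual / hours_per_work_week,
        "saved_hours": saved,
        "saved_days": saved / hours_per_day,
        "saved_weeks": saved / hours_per_work_week,
    }


# Four hours of search per week, half of which an IG program is assumed to recover.
for key, value in search_time(hours_per_week=4, reduction=0.50).items():
    print(f"{key}: {value:g}")
```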

Table 1 lays out the assumptions we will use for the productivity calculations including the average annual and hourly salary per employee.

Table 1: Productivity Assumptions

Table 2 below shows the calculations based on the assumptions in table 1 for weekly and annual time periods.

Table 2: Weekly and Annual Search Time and Cost

Assuming a workforce of 1,000 employees, the total annual cost of search at this company is $7.5 million. Assuming a 50% reduction in search time gives us an estimated $3.75 million in savings from recovered employee productivity. In most cases, $3.75 million in annual savings would more than pay for an effective information governance program for a company of 1,000 employees. But those potential savings are only one of three sources of recoverable dollars.
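Tables 1 and 2 are not reproduced here, so this Python sketch assumes an average salary of $75,000 spread over 2,080 paid hours (52 weeks of 40 hours), or about $36.06 per hour. Those inputs are my assumption, chosen because they reproduce the $7.5 million total; substitute your own salary data.

```python
# Assumed inputs (not taken from the original tables): $75,000 salary, 2,080 paid hours.
ANNUAL_SALARY = 75_000
PAID_HOURS_PER_YEAR = 2_080
EMPLOYEES = 1_000
SEARCH_HOURS_PER_YEAR = 4 * 52   # 4 hours per week, from the text
SEARCH_TIME_REDUCTION = 0.50     # share of search time an IG program recovers

hourly_rate = ANNUAL_SALARY / PAID_HOURS_PER_YEAR
annual_search_cost = hourly_rate * SEARCH_HOURS_PER_YEAR * EMPLOYEES
search_savings = annual_search_cost * SEARCH_TIME_REDUCTION

print(f"Hourly rate: ${hourly_rate:,.2f}")
print(f"Annual cost of search: ${annual_search_cost:,.0f}")
print(f"Recovered productivity: ${search_savings:,.0f}")
```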

Another productivity cost factor is the amount of time spent recreating data that existed but couldn't be found during search. The additional variables used in these calculations are:

Table 3: Variables for Recreating Data Not Found

Most employees will agree that a certain percentage of their search time is spent looking for information they don't find… until well after their need has passed. This number is very hard to estimate, but based on my own experience, I use 40%. The other important variable is the amount of time spent actually recreating the data you couldn't find, expressed as a percentage of the hours spent searching for it unsuccessfully. I use 200%, meaning recreation takes twice as long as the failed search (table 3).

Table 4: Hours and Cost of Recreating Data Not Found

Table 4 above lays out the calculations, showing that 166,400 hours, or $6 million, are wasted across the entire company recreating data that should have been found. The assumption is that an effective information governance program would reduce this wasted time to zero.
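The recreation figures follow from the search-time assumptions: 208 search hours per year, 40% of which are spent on information that isn't found, with recreation taking 200% of that time. This Python sketch assumes a hypothetical $75,000 salary over 2,080 paid hours (about $36.06 per hour), the rate implied by the $7.5 million search cost; it is not taken from the original tables.

```python
SEARCH_HOURS_PER_YEAR = 208      # 4 hours/week x 52 weeks
NOT_FOUND_SHARE = 0.40           # share of search time that ends without a result
RECREATION_MULTIPLIER = 2.00     # recreating takes 200% of the failed search time
EMPLOYEES = 1_000
HOURLY_RATE = 75_000 / 2_080     # assumption: $75,000 salary over 2,080 paid hours

recreation_hours_per_employee = (SEARCH_HOURS_PER_YEAR * NOT_FOUND_SHARE
                                 * RECREATION_MULTIPLIER)
recreation_hours = recreation_hours_per_employee * EMPLOYEES
recreation_cost = recreation_hours * HOURLY_RATE

print(f"{recreation_hours_per_employee:.1f} hours per employee")
print(f"{recreation_hours:,.0f} hours company-wide, costing ${recreation_cost:,.0f}")
```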

So far, the estimated savings from recovered productivity for this company of 1,000 employees, if it adopted an information governance program, are $3.75 million plus $6 million, or $9.75 million (table 5).

Table 5: Estimated Savings from Recovered Productivity

The last (and most controversial) calculation is based on the revenue opportunity cost; in other words, what additional revenue could be generated by the employee hours recovered through improved productivity? For these calculations we need one more number: the company's annual revenue. Divide this by the number of employees to get the average revenue per employee, and divide that by the number of working hours per year to get the average revenue per employee per hour (table 6).

Table 6: Revenue Assumptions

How Does Productivity Affect Revenue

The last variable that needs an explanation is the “discount factor for revenue recovery” (table 6). This discount factor is based on the assumption that not every recovered hour will translate, one for one, into an additional hour of average revenue per employee. Common sense tells us that won't happen, but common sense also tells us that more productive employees generate more revenue. So in this example, I will use a revenue recovery discount factor of 60%, crediting only 40% of the $101.92 per hour figure. This is meant to add a degree of believability to the calculation.
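Table 6 is not reproduced here, but the $101.92 per hour figure is consistent with annual revenue of $212 million across 1,000 employees working 2,080 hours each. That revenue figure and the hour count are assumptions in this Python sketch, not numbers from the original table; the sketch only shows how the revenue per hour and the discounted amount credited per recovered hour are derived.

```python
ANNUAL_REVENUE = 212_000_000     # assumption: reproduces the $101.92/hour figure
EMPLOYEES = 1_000
PAID_HOURS_PER_YEAR = 2_080      # assumption: 52 weeks x 40 hours
DISCOUNT_FACTOR = 0.60           # revenue recovery discount factor from the text

revenue_per_employee = ANNUAL_REVENUE / EMPLOYEES
revenue_per_hour = revenue_per_employee / PAID_HOURS_PER_YEAR
# Only (1 - discount) of each recovered hour's revenue is credited.
credited_per_hour = revenue_per_hour * (1 - DISCOUNT_FACTOR)

print(f"Revenue per employee: ${revenue_per_employee:,.0f}")
print(f"Revenue per employee per hour: ${revenue_per_hour:,.2f}")
print(f"Credited per recovered hour: ${credited_per_hour:,.2f}")
```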

To calculate the total (discounted) recoverable revenue from improved information search, we use the following formula: estimated recoverable search hours × average revenue per employee per hour × (1 − revenue recovery discount factor), or 104,000 × $101.92 × (1 − 60%), which equals $4,239,872, or about $4.24 million.

To calculate the (discounted) recovered revenue from the productivity gained by no longer recreating data that wasn't found, we use the same formula with the recreation hours: estimated total hours spent recreating data not found × average revenue per employee per hour × (1 − revenue recovery discount factor), or 166,400 × $101.92 × (1 − 60%), which equals approximately $6.78 million.
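Both revenue formulas have the same shape, so a single Python function covers them. This sketch uses the rounded $101.92 rate from the text: the search product is $4,239,872 and the recreation product is $6,783,795.20, which rounds to $6.78 million (a slightly more precise hourly rate gives exactly $6,784,000). The function name is illustrative, and the last step simply adds the four amounts discussed in this post.

```python
REVENUE_PER_HOUR = 101.92        # average revenue per employee per hour (table 6)
DISCOUNT_FACTOR = 0.60           # revenue recovery discount factor


def discounted_revenue(hours: float, revenue_per_hour: float = REVENUE_PER_HOUR,
                       discount: float = DISCOUNT_FACTOR) -> float:
    """Revenue credited to recovered hours after applying the discount factor."""
    return hours * revenue_per_hour * (1 - discount)


search_revenue = discounted_revenue(104_000)       # recovered search hours
recreation_revenue = discounted_revenue(166_400)   # hours no longer spent recreating data
productivity_savings = 3_750_000 + 6_000_000       # search + recreation savings

total = productivity_savings + search_revenue + recreation_revenue
print(f"Revenue from recovered search time: ${search_revenue:,.2f}")
print(f"Revenue from avoided recreation: ${recreation_revenue:,.2f}")
print(f"Total savings plus recovered revenue: ${total:,.0f}")
```

Because the result depends heavily on the discount factor, calling `discounted_revenue` with a few different `discount` values is a quick way to see how sensitive the total is to that single assumption.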

So, to wrap up this painful experiment in math, the potential dollar savings and increased revenue from the adoption of an information governance program are:

Table 7: Total Potential Savings and Recovered Revenue

The point of this discussion was to explore using recovered revenue from productivity increases, brought about by more effective management of information (information governance), as part of the ROI case. You may (and probably will) disagree with the numbers used, but I think calculating an InfoGov ROI using revenue recovered through productivity gains is realistic.