Can ChatGPT Solve Information Management’s Biggest Challenge?

I have often spoken about the coming data privacy inflection point for information management in various blogs, articles, webinars, and podcasts. This inflection point involves the need to manage ALL data within an organization to ensure that new data privacy rights can be addressed.

Now, most organizations currently only capture and centrally manage between 5% and 10% of all data generated and received within the company. This means that between 90% and 95% of all corporate data is not tracked, indexed, or viewable by a central IT authority or Records Management personnel. In reality, this unmanaged data is controlled by individual employees and stored locally on employee devices without any visibility by records management or IT.

However, for companies to comply with the new data privacy laws (and the associated data subject rights), they will need access to all corporate data, including that data held by individual employees. Some have pushed back, asking why employees can’t simply continue to manage their own non-records, with the company asking them for specific PII when it is needed.

Two points here. First, most employees don’t manage their local data, i.e., keep it in a logical file structure in one specific location, attach keywords, and assign retention/disposition policies. This is usually due to a lack of training and of time to spend performing records management duties.

And second, the number of end-user data subject access requests (DSAR) for information on their PII will skyrocket in the next couple of years, potentially reaching thousands of requests per year. This fact will make it nearly impossible to rely on employees to search for specific PII on a case-by-case basis.

Because of this current employee information management challenge, companies cannot perform a complete search for, report on, or delete PII when requested due to the lack of data visibility on employee devices.

With this new reality, information governance professionals are facing a formidable task: gaining access to and managing 10 to 20 times more data than they do today. In this article, however, I will discuss the potential for managing that much more information (after the access challenge is solved) by using AI to categorize data, specifically by looking at ChatGPT for document auto-categorization.

ChatGPT and Information Management

ChatGPT, a large language model developed by OpenAI, has received a great deal of press in the last several months. Because of the notoriety, I, and I’m sure many others, started to think about how AI could address the needs of this new information management reality.

I started this exercise by asking the ChatGPT AI the following question:

Can ChatGPT auto-categorize documents based on content and context?

The AI produced the following answer:

Yes, ChatGPT can be trained to auto-categorize documents based on their content and context. This is a common application of natural language processing (NLP) and machine learning, where the model is trained on a large dataset of categorized documents to learn patterns in the language and context that are associated with each category.

The process of auto-categorization typically involves several steps. First, the text of each document is preprocessed to extract relevant features, such as word frequencies, sentence structure, and topic keywords. Then, these features are fed into a machine learning algorithm, such as a decision tree or a neural network, that learns to classify the documents into predefined categories based on their features.

To train such a model, a large dataset of categorized documents is required. This dataset is used to train the machine learning algorithm by presenting it with examples of documents and their corresponding categories. The algorithm learns to recognize patterns in the language and context of the documents that are associated with each category, and it uses these patterns to predict the category of new, unseen documents.

Overall, auto-categorization is a powerful tool for organizing and analyzing large volumes of text data, and it has a wide range of applications in fields such as content management, marketing, and customer service.

This answer was eye-opening because it reminded me of my time at Recommind many years ago, an early player and pioneer in machine learning-based predictive coding for reviewing large evidence data sets for eDiscovery.

Early predictive coding involved providing the application with both correct and incorrect examples of case-responsive content. This process, called supervised machine learning, involved many training cycles to bring the accuracy rate into the 90% to 95% range so that judges would readily accept the results.

But the payoff for eDiscovery was that responsive content could be very quickly and accurately found in gigantic data sets, dramatically reducing the cost of the eDiscovery review process. This cost savings was realized because attorneys no longer had to rely on groups of paralegals or expensive contract attorneys to read every page of content and determine relevance to the case.

With the eDiscovery/predictive coding example in mind, what if AI could accurately auto-categorize, tag, apply retention/disposition policies, and file vast amounts of data moving around an enterprise?

This would remove the requirement for employees to spend their time trying to manage their local content (which most don’t do anyway) while also making all the local data manageable and visible to central authorities.

For those unfamiliar with the concept of auto-categorization, it refers to the automatic classification, tagging, indexing, and management of documents based on their content and context.

As an example of why context matters, let me offer one that some of you will be familiar with from years ago. How would an application that auto-categorizes based on keywords (a standard capability right now) file a document that features the word Penguin? How would the application recognize whether the document was referring to the black and white flightless bird, the publishing house, or the Batman comic book villain? By understanding the context of the additional content in the document, the AI would understand which Penguin the document was referring to and be able to categorize, tag, and file it accurately.
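To make the Penguin example a little more concrete, here is a minimal sketch of context-based categorization. It does not show how ChatGPT works internally; it swaps in a much simpler technique (a small Naive Bayes classifier built on word frequencies), and the category names and training sentences are invented for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Tiny multinomial Naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of training documents
        self.vocabulary = set()

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def categorize(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category, counts in self.word_counts.items():
            # Log prior for the category plus log likelihood of each word.
            score = math.log(self.doc_counts[category] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical, hand-written training examples (not real corporate data).
TRAINING = [
    ("The penguin colony huddles on the Antarctic ice and dives for fish", "wildlife"),
    ("Emperor penguin chicks hatch in winter and the birds cannot fly", "wildlife"),
    ("Penguin will publish the paperback edition of the novel next spring", "publishing"),
    ("The author signed a book contract with Penguin and its editors", "publishing"),
    ("Batman chased the Penguin through Gotham after the bank robbery", "comics"),
    ("The Penguin is a villain in the Batman comic series", "comics"),
]

categorizer = NaiveBayesCategorizer()
for text, category in TRAINING:
    categorizer.train(text, category)

# Every document mentions "penguin"; only the surrounding words differ.
bird_doc = "The penguin dives under the ice to catch fish"
publisher_doc = "Penguin announced a new book by the author"
villain_doc = "The Penguin escaped from Batman in Gotham"

for doc in (bird_doc, publisher_doc, villain_doc):
    print(categorizer.categorize(doc), "<-", doc)
```

A pure keyword rule would file all three documents in the same place because each contains the word Penguin. Here the surrounding words tip the decision, which is the same idea, at toy scale, that a large language model applies with far richer context.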

Circling back (sorry for the reference) to the data privacy law inflection point, this capability can be particularly useful for all businesses that deal with, collect, use, and sell PII.

Traditionally (some) employees have manually performed data categorization and management, which can be time-consuming and less-than-accurate. And as I mentioned above, most employees don’t have the time or training to categorize and manage all the documents they encounter daily.

However, with the advent of new AI capabilities, such as ChatGPT and now the more advanced GPT-4, consistent and accurate auto-categorization can be performed across massive data sets, potentially even in real time.

One of the primary benefits of using ChatGPT for document auto-categorization is that it is incredibly consistent and accurate. ChatGPT is a machine learning model that has already been trained on vast amounts of data, and it can use this data to predict the correct category for each document.

Because ChatGPT has been trained on a very large dataset, it can recognize patterns and make highly accurate categorization predictions. This means businesses will be able to rely on ChatGPT to correctly and consistently categorize their documents without needing manual/employee intervention.

Another benefit of using ChatGPT for document auto-categorization is that it is incredibly fast. Speed is of the essence when dealing with huge volumes of live documents. This means that businesses can process their documents much more rapidly, improving efficiency and consistency and relieving employees of these non-productive requirements.

Additionally, because ChatGPT can quickly categorize documents, it can be utilized in real-time (live) data flows, which will be particularly useful for businesses that now must “read” and categorize much larger data set flows (live data) due to the data privacy laws.

Using ChatGPT for records auto-categorization will also lead to cost savings for businesses. Traditionally, document categorization has been done manually by employees, which can be inaccurate, time-consuming, and labor-intensive.

However, by using ChatGPT, organizations can free up employees to work on other tasks, raising productivity. Additionally, because ChatGPT can categorize documents quickly and accurately, businesses can avoid costly errors arising from inaccurate manual document categorization.

Finally, ChatGPT is a machine-learning model that can learn and improve over time. As businesses continue to use ChatGPT for document categorization, the model will become more accurate and efficient, leading to even greater benefits in the long run. As ChatGPT continues to evolve, it will likely become even more sophisticated, which means that businesses can look forward to even more significant benefits in the future.

What this means for users and vendors

ChatGPT is quickly being built into many platforms, including Microsoft’s Bing search engine and the Azure Cloud infrastructure.

What does this mean for information/records management applications in the Azure Cloud? Soon, vendors with native Azure applications will be able to design ChatGPT capabilities into their information management applications to provide highly accurate auto-categorization, tagging, litigation hold placement, field-level encryption (of PII), and retention/disposition policy placement.

However, this is only half of the solution I referenced concerning the information management inflection point challenge. The other important requirement all companies will face is gaining access to and managing all corporate data, including that data controlled by individual employees.

The bottom line for information management application vendors is that using ChatGPT for records auto-categorization and related capabilities is a no-brainer because it will offer a wide range of benefits for businesses: improved accuracy, faster processing times, greater employee productivity, and, most importantly, compliance with the new data privacy laws.

Those information management vendors that ignore or are slow to include these new capabilities will lose.


Challenges on the Horizon for Companies if the ADPPA does not make it into Law

The US House Energy and Commerce Committee approved the proposed American Data Privacy and Protection Act (ADPPA) by a 53-2 margin on July 20, 2022. With this accomplishment, the ADPPA has made it further along the federal legislative process than any other data privacy regulation in US history.

Both Republicans and Democrats in the House and the Senate support the bill, and its passage could radically change the privacy landscape in the US. Still, if it is not passed, or if its preemption clause is removed, companies doing business in the US could be looking at a highly complex environment to operate in.

If the ADPPA is signed into law, it will preempt all other state data privacy laws, which means businesses operating in the US would have one data privacy law to comply with instead of separate regulations from each state with differing definitions, timelines, and requirements. So, a business currently subject to the new Connecticut, California, Virginia, Utah, or Colorado laws would instead need to comply with the single ADPPA. This preemption provision of the ADPPA would greatly simplify data privacy compliance in the US.

Preemption is a significant stumbling block in the ADPPA. Many states, namely California, don’t want their laws to be superseded by the ADPPA.

As technology advances and data privacy becomes increasingly important to individuals, the states are stepping up to create their own data privacy laws. By the end of 2022, five states had enacted data privacy laws. In the first month of 2023, approximately nine state legislatures filed new data privacy bills. How many will be passed is anyone’s guess, but by the end of 2024, the majority of states will have passed data privacy bills.

If the ADPPA does not become law with the preemption provision, what does that mean for businesses? With the prospect of most states having their own (differing) data privacy laws soon, companies collecting personal information will face more significant complexities and spiraling compliance costs.

Data privacy laws are designed to protect personal information from being misused or mishandled. By granting individuals greater control over how their data is collected, secured, used, and shared, the data privacy laws are expected to help to ensure that personal information remains secure and that businesses are held accountable for how they handle data.

Consider this: an organization collecting personal information will need to track, for each individual, the state of residence, the consent that was received, when it was received, the applicable state law on how long the data can be retained, and the differing state law definitions and exemptions.
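A minimal sketch of what such a per-individual tracking record could look like is shown here. The field names are my own, and the retention periods in the lookup table are made-up placeholders, not the actual limits in any state law.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention limits in days, for illustration only.
# These are NOT the actual statutory values for any state.
RETENTION_DAYS_BY_STATE = {"CA": 365, "VA": 730, "CO": 540}


@dataclass(frozen=True)
class ConsentRecord:
    """One individual's consent, tracked against their state of residence."""
    subject_id: str
    state_of_residence: str
    purpose: str            # the specific use the individual consented to
    consent_received: date

    def retention_deadline(self) -> date:
        """Date after which the data may no longer be retained (placeholder rule)."""
        days = RETENTION_DAYS_BY_STATE[self.state_of_residence]
        return self.consent_received + timedelta(days=days)

    def is_expired(self, today: date) -> bool:
        return today > self.retention_deadline()


record = ConsentRecord(
    subject_id="subject-001",
    state_of_residence="CA",
    purpose="order fulfillment",
    consent_received=date(2023, 1, 15),
)
print(record.retention_deadline())          # 2024-01-15
print(record.is_expired(date(2024, 3, 1)))  # True
```

Even this toy version shows the problem: every additional state adds another row of rules, and every record has to be evaluated against the right row.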

Additionally, each state data privacy law includes specific data subject rights such as the right to query the company about the detailed personal information that has been collected on them as well as the right to have their personal information erased – if no other laws stop the erasure such as federal data retention requirements (financial services) and involvement in litigation. These rights are absolute, meaning an organization must fully comply – not just give it their best effort.

Companies will need to invest in new technologies and procedures in order to comply with the various state laws. In addition, they will likely need to hire additional staff to monitor compliance and ensure they follow all applicable individual laws. Implementing such measures will be expensive, especially for small businesses.

These new data privacy laws include data security requirements, which could consist of implementing additional security measures like encryption (I hope), multi-factor authentication, or, eventually, zero-trust architectures. This will also ultimately mean providing more transparency into customer data use.

Furthermore, businesses will face greater legal liability if they fail to comply with state-level data privacy laws. Companies that violate the law could face fines, civil penalties, or even criminal charges brought by state attorneys general. This could result in a significant financial hit for businesses as well as bad publicity, especially for companies that are not prepared for such an eventuality.

Finally, businesses will also experience a loss of customer trust if they fail to comply with state-level data privacy laws. As customers become more educated about how their data is being used, they may be less likely to trust a business if they feel their data is not adequately protected. This could lead to a loss of existing customers and a decrease in overall sales and profits.

Overall, the impact on businesses could be significant if each state passes its own data privacy law and the federal ADPPA either fails or passes without preemption. Companies will no doubt face increased compliance costs, stricter regulations, greater legal liability, and a loss of customer trust if they are not in compliance with the law.

As such, businesses should be sure to prepare for these potential outcomes and ensure they comply with any applicable data privacy laws. Doing so can help to ensure their data remains secure and their customers remain confident in their data privacy practices.

Data Privacy Laws: An Inflection Point for Information Managers

I have written about this topic several times, but with recent changes, I wanted to jump into it again. The basic premise is that with the rising numbers of data privacy bills becoming law, the Information Management/Records Management profession will face managing much greater amounts of corporate data.

The progression of cloud-based computing and data management has led to an explosion of data collection, data selling, data analysis, and data hoarding (the opposite of data minimization) by companies worldwide. As a result, there has been growing concern that data security and privacy need to catch up with new cyber-theft technologies, leading to the inevitable implementation of new data privacy laws. These more recent data privacy laws, such as the EU’s GDPR and California’s CCPA/CPRA, are becoming an inflection point for the information management profession.

The Impact of Data Privacy Laws

Data privacy laws require companies to obtain consent from individuals before collecting and using their personal information (PI). They also require companies (if requested) to disclose how they will use this data and to allow individuals to access, correct, or delete their data upon request. Failure to comply with these laws can result in significant fines, legal action, and bad press.

The EU’s GDPR and California’s CCPA/CPRA data privacy laws have significantly impacted how companies collect and use data. They have forced companies to be more transparent about their data collection and use practices and to ensure that individuals have greater control over their PI. In addition, these laws have increased awareness of data privacy issues among individuals, leading to more informed decisions about how they share their personal information and to increasing numbers of data subject access requests (DSARs) being filed with companies.

With more states passing data privacy laws, data collectors are being forced to adapt to an increasingly complex data privacy landscape. Imagine being required to track each individual’s PI based on individual state data privacy definitions, rights, and requirements, including when consent was given and for what specific use.

Data Privacy Laws and Information Management

New privacy laws are beginning to significantly impact information management practices, and will continue to do so. Companies must now take a more strategic and inclusive approach to data collection and management, considering the potential legal and financial risks associated with non-compliance. This is leading to a necessary shift in the way companies think about and manage data, with a greater emphasis on data inclusion, governance, and compliance.

Data inclusion refers to the need for data not currently centrally managed by information management applications, such as that data held locally by employees on their individual workstations and laptops, to be included in ongoing information management activities.

Could employees be storing content that includes PI on their laptops?

Data governance refers to the policies, procedures, and technologies that enable organizations to manage their data assets. This includes data quality management, data security, and data privacy. With the implementation of data privacy laws, companies must now incorporate data privacy into their data governance strategies, ensuring that personal data is collected, used, and stored in a compliant manner.

Because of the new laws, companies will now be forced to manage ALL data within their environment, including all data held locally on employee devices.


Data subjects now have the right to query companies on what of their PI the company is storing, whether it has been sold, how it’s being used, and for what purposes. Data subjects now also have the right to have their PI permanently deleted (if there are no regulatory or legal requirements to keep it). These rights are absolute, meaning an organization must completely comply with data subject requests, not just give it their best try – all within a specific timeframe.

For example, what if Bob Smith filed a data subject access request (DSAR) asking if the company was storing any of his PI and, if so, requesting that it be deleted? How would IT search all employee devices for all PI on Bob Smith?

Because of these new data privacy rights, companies will be forced to either somehow ensure all PI cannot be stored on local employee workstations or actively manage all employee data centrally. Aside from the cultural impact of IT having access to all data on an employee’s laptop, indexing that data for easy search and applying retention/disposition policies will be a significant undertaking.
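To give a sense of what even the simplest version of such a search involves, here is a hedged sketch that walks a directory tree and lists text files mentioning a data subject by name. It assumes IT already has read access to a copy of each device’s files under one root folder (the hard part discussed above), it only handles plain-text formats, and the function name, folder names, and file contents are all hypothetical.

```python
import re
import tempfile
from pathlib import Path


def find_subject_mentions(root, full_name, extensions=(".txt", ".csv", ".md")):
    """Return sorted paths of text files under root that mention full_name."""
    # Allow any run of whitespace between the parts of the name; ignore case.
    parts = [re.escape(part) for part in full_name.split()]
    pattern = re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if pattern.search(text):
                hits.append(path)
    return sorted(hits)


# Demo on a throwaway directory standing in for two employee laptops.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "laptop_a").mkdir()
    (root / "laptop_b").mkdir()
    (root / "laptop_a" / "orders.csv").write_text("id,name\n7,Bob Smith\n", encoding="utf-8")
    (root / "laptop_b" / "notes.txt").write_text("Call Alice Jones on Monday.\n", encoding="utf-8")
    matches = [p.relative_to(root).as_posix() for p in find_subject_mentions(root, "Bob Smith")]
    print(matches)  # ['laptop_a/orders.csv']
```

A name match like this would miss nicknames, misspellings, account numbers, and anything inside PDFs, email stores, or images, which is exactly why a centrally indexed and categorized data store matters for DSAR response.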

Consider that organizations currently manage 5-10% of all corporate data, only the data that they consider “regulated records.” Now, IT and information management professionals will be looking at 10 to 20 times more data to manage with more complex and granular policies.

New ways to manage all corporate data

Data privacy laws have also led to the development of new technologies and solutions to manage personal data. For example, consent management platforms enable companies to obtain and manage consent from individuals for collecting and using their personal data.

Data mapping tools will help companies identify where personal data is located within their central enterprise and how it is used. But do these data mapping tools have the ability to scan individual employee laptops?

Additionally, “manage in place” applications rarely reach out to individual workstations – making total PI management impossible.

The Future of Information Management

Data privacy laws are just the beginning of a new era of information management. As technology continues to evolve, the amount of data collected and used by companies will only increase. This will require new strategies and solutions to ensure that personal data is managed in a compliant and secure manner.

One area of focus for the future of information management will be the use of artificial intelligence (AI) and machine learning (ML) to automate data privacy compliance. AI and ML can be used to analyze data collection and usage patterns, identify potential risks, and automate data subject access requests. This will enable companies to manage personal data more efficiently and effectively while reducing the risk of non-compliance.

Another area of focus for the future of information management will be the development of new technologies and solutions to protect personal data. This will include using blockchain technology, which can be used to create secure, decentralized systems for managing personal data. It will also include developing and using new data encryption technologies such as field-level encryption, secure multiparty computation, data masking, and homomorphic encryption – which allows encrypted data to be used without needing to decrypt.

This means that PI will need to be encrypted in transit, at rest, AND while in use, ensuring that the company and individual data subjects cannot be extorted by criminals threatening to release the PI on the dark web.

These new security measures will help protect personal data from cyber theft, ransomware, and extortionware.
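As a small illustration of data masking, one of the techniques listed above, the sketch below replaces email addresses and US Social Security number patterns in a piece of text. The regular expressions are deliberately simplified assumptions and would miss many real-world formats, and masking is not a substitute for encryption.

```python
import re

# Simplified patterns for illustration; real PI detection needs far more coverage.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_pi(text):
    """Replace emails with a placeholder and keep only the last four SSN digits."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub(lambda match: "***-**-" + match.group()[-4:], text)
    return text


sample = "Contact bob.smith@example.com, SSN 123-45-6789."
print(mask_pi(sample))  # Contact [EMAIL], SSN ***-**-6789.
```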

Effective data privacy is dependent on evolving data security

Data privacy laws are the new inflection point for the information management profession. The laws have forced companies to take a more strategic approach to data collection and management, incorporating data privacy and security into their data governance strategies. They have also led to the development of new technologies and solutions to manage personal data anywhere in the enterprise.

The amount of data collected and used by companies will only increase. Additionally, as data privacy laws and technology continue to evolve, organizational risk will continue to rise. This new environment will require new strategies and solutions to ensure that personal data is managed in a compliant and secure manner.

However, AI and ML will partially automate data privacy compliance, including who can move PI, where, and who can access it. AI will automatically recognize PI in documents, encrypt it with the correct permissions, and store it in special, secure repositories.

Additionally, AI/ML-assisted granular data security capabilities and more pervasive data encryption use will ensure cyber-theft and extortionware will be less successful, which will, in turn, possibly reduce cyber-liability insurance rates.

But information management professionals will quickly be dealing with a great deal more data to manage.

Office 365 Journaling to Create a Comprehensive eDiscovery Archive

Does your organization utilize Office 365 for email? Is your organization required to journal email for compliance, legal, or business requirements? Do your attorneys complain about the time it takes to find information for an eDiscovery request? If the answer is yes to any of these questions, then keep reading.

The Right to be Forgotten Versus The Need to Backup

A great deal has been written about the GDPR and CCPA privacy laws, both of which include a “right to be forgotten.” The right to be forgotten is an idea that was put into practice in the European Union (EU) in May 2018 with the General Data Protection Regulation (GDPR).

The Lifecycle of Information – Updated

Organizations habitually over-retain information, especially unstructured electronic information, for all kinds of reasons. Many organizations simply have not addressed what to do with it, so they fall back on relying on individual employees to decide what should be kept, for how long, and what should be disposed of. On the opposite end of the spectrum, a minority of organizations have tried centralized enterprise content management systems and have found them difficult to use, so employees find ways around them and end up keeping huge amounts of data locally on their workstations, on removable media, in cloud accounts, or on rogue SharePoint sites that are used as “data dumps” with little or no records management or IT supervision.

Much of this information is transitory, expired, or of questionable business value. Because of this lack of management, information continues to accumulate. This information build-up raises the cost of storage as well as the risk associated with eDiscovery.

In reality, as information ages, its probability of re-use, and therefore its value, shrinks quickly. Fred Moore, Founder of Horison Information Strategies, wrote about this concept years ago as the Lifecycle of Data. Figure 1 below shows that as data ages, the probability of reuse goes down very quickly as the amount of saved data rises. Once data has aged 10 to 15 days, its probability of ever being looked at again approaches 1%, and as it continues to age, it approaches but never quite reaches zero (figure 1, blue shading).


Figure 1: The Lifecycle of Information

Contrast that with the possibility that a large part of any organizational data store has little or no business, legal, or regulatory value. In fact, the Compliance, Governance and Oversight Council (CGOC) conducted a survey in 2012 that showed that, on average, 1% of organizational data is subject to litigation hold, 5% is subject to regulatory retention, and 25% has some business value (figure 2, green shading). This means that approximately 69% of an organization’s data store has no business value and could be disposed of without legal, regulatory, or business consequences.

The average employee creates, sends, receives, and stores conservatively 20 MB of data per day. This means that at the end of 15 days (about 11 business days), they have accumulated 220 MB of new data; at the end of 90 days, 1.26 GB of data; and at the end of three years, 15.12 GB of data (if they don’t delete anything). So how much of this accumulated data needs to be retained? Again referring to figure 2 below, the red shaded area represents the information that probably has no legal, regulatory, or business value according to the 2012 CGOC survey. At the end of three years, the amount of retained data from a single employee that could be disposed of without adverse effects to the organization is 10.43 GB. Now multiply that by the total number of employees and you are looking at some very large data stores.
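The arithmetic behind these figures can be reproduced with a few lines of code. This is a sketch under two assumptions that are my inference rather than something stated in the survey: roughly 21 business days per month (252 per year), and decimal units (1 GB = 1,000 MB). Under those assumptions the 220 MB figure corresponds to 11 business days and the 1.26 GB figure to 63 business days.

```python
MB_PER_DAY = 20                # conservative per-employee daily volume from the text
BUSINESS_DAYS_PER_YEAR = 252   # assumption: about 21 business days per month

# CGOC 2012 survey shares (percent of data with some retention value).
LITIGATION_HOLD, REGULATORY, BUSINESS_VALUE = 1, 5, 25
DISPOSABLE_PERCENT = 100 - (LITIGATION_HOLD + REGULATORY + BUSINESS_VALUE)  # 69


def accumulated_gb(business_days):
    """Data accumulated by one employee, in decimal gigabytes."""
    return MB_PER_DAY * business_days / 1000


three_year_gb = accumulated_gb(3 * BUSINESS_DAYS_PER_YEAR)
disposable_gb = three_year_gb * DISPOSABLE_PERCENT / 100

print(round(accumulated_gb(11), 2))  # 0.22 GB, i.e. 220 MB
print(round(accumulated_gb(63), 2))  # 1.26 GB
print(round(three_year_gb, 2))       # 15.12 GB
print(round(disposable_gb, 2))       # 10.43 GB
```

Multiplying `disposable_gb` by headcount gives the organization-wide figure; for a hypothetical 5,000-employee company that is roughly 52 TB of data with no retention value after three years.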


Figure 2: The Lifecycle of Information Value

The Lifecycle of Information Value graphic above shows us that employees really don’t need all of the data they squirrel away (because its probability of re-use drops to 1% at around 15 days) and, based on the CGOC survey, approximately 69% of organizational data is not required for legal or regulatory retention and has no business value. The difficult piece of this whole process is how an organization can efficiently determine what data is not needed and dispose of it using automation (because employees probably won’t). As unstructured data volumes continue to grow, automatic categorization of data is quickly becoming the only realistic way to get ahead of the data flood. Without accurate automated categorization, the ability to find the data you need quickly will never be realized. Even better, if data categorization can be based on the value of the content, not just a simple rule or keyword match, highly accurate categorization, and therefore information governance, is achievable.

Productivity and InfoGov; Are they Related?

Yes, they are. Employee productivity is adversely affected by a lack of information governance (IG) in two ways. First, without IG, employees spend time “managing” their work files, contacts, emails, and attachments. This management time includes reviewing content, deciding whether a particular file or email should be kept or deleted, deciding how long required emails will be kept and where, and finally, moving these files to their final storage location. Many research organizations and experts estimate that this content management time consumes anywhere from two to four hours per week. Consider a conservative example of two hours per week for this activity: this translates to 104 hours per year per employee or, for an organization of 5,000 employees, 520,000 hours per year devoted to individually managing data, work that may or may not have been performed efficiently or effectively.
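For readers who want to plug in their own numbers, the calculation above is sketched below. The function name and the 52-week default are my own choices; the two-to-four-hour range and the 5,000-employee example come from the paragraph above.

```python
def annual_management_hours(hours_per_week, employees, weeks_per_year=52):
    """Total hours per year an organization spends on individual file management."""
    return hours_per_week * weeks_per_year * employees


low_estimate = annual_management_hours(2, 5_000)
high_estimate = annual_management_hours(4, 5_000)
print(low_estimate)   # 520000
print(high_estimate)  # 1040000
```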

A second measure of lost employee productivity is in the number of hours per week that employees spend searching for information within the enterprise. Organizations without a centrally managed information management capability usually don’t actively manage employee file shares. When searchable central indexes are not available, employees fall back on simple keyword searches – which rarely produce the results the employee is looking for in a timely manner, if at all. In some cases, stored information might not be found due to weak or incorrect search terms, poor file naming, or the fact that the file wasn’t actually saved at all (i.e. the employee just thought it was).

This lack of information management can cost an organization a great deal without the organization even realizing it.

Are Law Firms the Weakest Link in the Information Security Chain?

Many law firms are unwittingly setting themselves up to be a prime target for cyber criminals. But it is not the firm’s data that hackers might be looking for – it is the huge volume of client data that law firms handle on a daily basis that makes them so appealing for cyber criminals to target.

eDiscovery continues to generate huge and ever-growing data sets of ESI for law firms to manage. Those data sets are often passed to the client’s law firm for processing, review, and production. The end result is that law firms are sitting on huge amounts of sensitive client data, which is at risk if the firm is not diligent about managing it, securing it, and disposing of it at the conclusion of the case. And absent serious reforms in the Rules of Civil Procedure, these data volumes will only continue to grow.

A 2014 ABA Legal Technology Survey Report found that 14% of law firms experienced a security breach in 2013, which included a lost or stolen computer or smartphone, a cyber-attack, a physical break-in, or a website exploit. That same survey reported that 45% of respondents had experienced a virus-based technology infection, and that boutique firms of 2 to 9 attorneys were the most likely to have experienced an infection. Law firms of 10 to 49 attorneys were the most likely to suffer security breaches.

A growing number of clients are demanding their law firms take data security more seriously and are laying down the law – “give us what we want or we will find another law firm that will…” Generally speaking, law firms have never been accused of being technology “early adopters,” and while they still don’t need to be, they do need to take client (and firm) data security and management seriously and adopt technology and processes that will satisfy both their clients’ rising expectations and their cyber insurance providers’ best practices.

At the end of the day, law firms should ask themselves a basic question: is my law firm prepared and equipped to protect our clients’ data and if not, what’s the best strategy for my law firm going forward?

For more detail on this topic, download the Paragon white paper on this subject.

Email Use Policies: The beginning of the end?

A December 2014 National Labor Relations Board (NLRB) decision in the Purple Communications, Inc. case might have started the decline of employers’ rights over how their property and systems can be used by employees.

In the 2007 Guard Publishing decision, the NLRB held that the National Labor Relations Act does not give employees the right to use an employer’s email system for union-related business, i.e., activity not related to the running of the business. Partly because of this decision, employers have regularly created and enforced email use policies that forbid the use of the employer’s email system for anything other than actual company business. This decision was supposedly based on the NLRB’s comparison of an employer’s bulletin board, telephone system, copy machines and PA systems to the employer’s email system. In other words, employees did not have carte blanche to utilize these other systems for non-business-related activities either.

The NLRB Purple Communications decision reversed the 2007 ruling and held that employees do now have the presumptive right to use their employer’s email system for non-work NLRB-protected purposes. But does this decision also reverse the practice of employers restricting the use of the other systems (copy machines, bulletin boards, etc.) to strictly business-related purposes?

There are several points to keep in mind before taking over your employer’s copy machine to print 1,000 garage sale flyers.

  • The 2014 NLRB-Purple Communications decision was limited to email systems only.
  • The 2014 NLRB-Purple Communications decision was limited to actual employees of the company—not family members or anyone else.
  • The 2014 NLRB-Purple Communications decision relates to activities protected by the National Labor Relations Act, i.e., union-related activities only.
  • The NLRB did not invalidate existing prohibitions on the non-work use of company physical property such as the previously mentioned copy machines, bulletin boards, and telephone systems.

Another interesting fact from the 2014 case is that the NLRB (re)confirmed an employer’s right to monitor its email system for “legitimate management purposes” and that employees continue to have no expectation of privacy in their use of the employer’s email system. But the NLRB stated that the employer may not increase employee email monitoring during union-organizing campaigns or focus monitoring activities on “protected” conduct or union activists specifically.

Obviously, the NLRB decision was directed specifically to companies with union membership and activities. But this raises the question of the use of employer equipment and systems for non-union-related activities. Will this decision be used to erode employer restrictions on the use of company property in the future?

InfoGov: Productivity Gains Equal Revenue Gains

A great deal has been written on lost productivity and the benefits of information governance. The theory is that an information governance program will raise employee productivity, thereby saving the organization money. This theory is well accepted, based on common sense and on market data showing that information workers spend many hours per week looking for information to do their jobs. One data point comes from a 2013 Wortzmans e-Discovery Feed blog titled “The Business Case for Information Governance – Reduce Lost Productivity!”, which states that employees spend up to nine hours per week (or 1 week per month or 12 weeks per year) looking for information. The first question to consider is how much of that search time could be saved with an effective information governance program.

InfoGov Productivity Savings

Three months out of every year spent looking for information seems a little high… so what would a more conservative number be for time spent searching for information? In my travels through the archiving, records management, eDiscovery, and information governance industries, I have spoken to many research analysts and many, many more customers and have generally seen numbers in the 2 to 4 hours per week range thrown around. Assuming the four hours per week estimate, the average employee spends 208 hours per year (26 working days or 5.2 weeks) looking for information. Let’s further assume that an effective information governance program that captures, indexes, stores, and manages (including disposal) all ESI per centralized policies would save 50% of the time employees spend looking for information (not an unrealistic estimate in my humble opinion), or 104 hours per year (13 days or 2.6 weeks). To bring this number home, let’s dollarize employee time.

Table 1 lays out the assumptions we will use for the productivity calculations including the average annual and hourly salary per employee.

[Table 1: assumptions for the productivity calculations]

Table 2 below shows the calculations based on the assumptions in table 1 for weekly and annual time periods.

[Table 2: weekly and annual cost-of-search calculations]

Assuming a work force of 1000 employees at this company, the total annual cost of search is $7.5 million. Assuming a 50% increase in search productivity gives us an estimated $3.75 million saving from recovered employee productivity. In most cases, a $3.75 million annual savings would more than pay for an effective information governance program for a company of 1000 employees. But that potential savings is only a third of the recoverable dollars.
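
The cost-of-search arithmetic can be sketched in Python. The tables are not reproduced here, so the $75,000 average salary and the 2,080 work hours per year used below are assumptions, back-solved so that the total matches the $7.5 million figure; all variable names are illustrative.

```python
# Illustrative sketch of the annual cost of search and the recoverable savings.
# ASSUMPTION: annual_salary and WORK_HOURS_PER_YEAR are back-solved from the
# $7.5 million total quoted in the text, not taken from Table 1.
WEEKS_PER_YEAR = 52
WORK_HOURS_PER_YEAR = 2080      # 52 weeks x 40 hours

employees = 1000
search_hours_per_week = 4       # upper end of the 2 to 4 hour range
annual_salary = 75_000          # assumed average salary
productivity_gain = 0.50        # share of search time an IG program would save

annual_search_hours = search_hours_per_week * WEEKS_PER_YEAR * employees
annual_search_cost = annual_search_hours * annual_salary / WORK_HOURS_PER_YEAR
annual_savings = annual_search_cost * productivity_gain

print(f"Hours spent searching: {annual_search_hours:,}")
print(f"Annual cost of search: ${annual_search_cost:,.0f}")
print(f"Recoverable savings:   ${annual_savings:,.0f}")
```

Changing `search_hours_per_week` to 2 shows how sensitive the result is to the search-time estimate: the cost and the savings both halve.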

Another productivity cost factor is the amount of time spent recreating data that couldn’t be found (but existed) during search. Additional variables to be used for calculations include:

[Table 3: additional variables for the cost of recreating data]

Most employees will agree that a certain percentage of their search time is spent looking for information they don’t find… until well after their need has passed. This number is very hard to estimate, but based on my own experience, I use a percentage of 40%. The other important variable is the amount of time spent actually recreating the data you couldn’t find, expressed as a percentage (200%) of the hours spent searching for that information without finding it (table 3).

[Table 4: hours and cost of recreating data not found]

Table 4 above lays out the calculations showing the total time wasted recreating data that should have been found: 166,400 hours across the entire company, or $6 million. The assumption is that this wasted time spent recreating data not found would be reduced to zero with an effective information governance program.
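
The recreation-cost figures can be reproduced with a short Python sketch. The 40% and 200% factors come from the text; the hourly rate ($75,000 over 2,080 hours) is an assumption back-solved from the $6 million total, and the variable names are illustrative.

```python
# Illustrative sketch of the hours and cost of recreating data that was not found.
# ASSUMPTION: the hourly rate is back-solved from the $6 million total in the text.
annual_search_hours = 4 * 52 * 1000     # 4 hours/week, 52 weeks, 1000 employees
share_not_found = 0.40                  # share of search time that finds nothing
recreate_multiplier = 2.00              # recreating takes 200% of the failed search time
assumed_hourly_rate = 75_000 / 2080     # assumed average salary / work hours per year

failed_search_hours = annual_search_hours * share_not_found
recreate_hours = failed_search_hours * recreate_multiplier
recreate_cost = recreate_hours * assumed_hourly_rate

print(f"Hours searching without success: {failed_search_hours:,.0f}")
print(f"Hours recreating data:           {recreate_hours:,.0f}")
print(f"Cost of recreating data:         ${recreate_cost:,.0f}")
```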

So far, the estimated savings from recovered productivity for this company of 1000 employees, if it adopted an information governance program, are $3.75 million plus $6 million, or $9.75 million (table 5).

[Table 5: estimated savings from recovered productivity]

The last (and most controversial) calculation is based on the revenue opportunity cost. In other words, what additional revenue could be generated with the employee hours recovered through higher productivity? For these calculations we need an additional number: the annual revenue for the company. Divide this by the number of employees and you will get the average revenue per employee, and from that the average revenue per employee per hour (table 6).

[Table 6: revenue assumptions, including the discount factor for revenue recovery]

How Does Productivity Affect Revenue

The last variable that needs an explanation is the “discount factor for revenue recovery” (table 6). This discount factor is based on the assumption that not every recovered hour will translate one for one into an additional hour of average revenue per employee. Common sense tells us this will not happen, but common sense also tells us that employees who are more productive generate more revenue. So in this example, I will use a revenue recovery discount factor of 60%, which counts only 40% of the $101.92 per hour figure. This is meant to impose a degree of believability on the calculation.

To calculate the total (discounted) recoverable revenue from improved information search we use the following formula: Estimated recoverable productivity hours for wasted search time * (the average revenue per employee per hour * (1 – the revenue recovery discount factor)) or 104,000*($101.92*(1-60%)) which equals $4,239,872 or $4.24 million.

To calculate the (discounted) recovered revenue from productivity gains from recreating data not found we use the following formula: Estimated total hours spent recreating data not found * (1 – the revenue recovery discount factor) * the average revenue per employee per hour, or (166,400*(1-60%)*$101.92), which equals approximately $6,784,000 or $6.78 million.
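
Both revenue formulas share the same shape, so they can be sketched as one small Python function. The function and variable names are hypothetical; the inputs are the figures quoted in the text. Note that with the rounded $101.92 rate the second result comes out a little under $6,784,000, which suggests the original figure used an unrounded hourly revenue.

```python
# Illustrative sketch of the discounted revenue recovery formula:
#   recovered hours * revenue per employee per hour * (1 - discount factor)
# The helper name is hypothetical; inputs are the figures quoted in the text.


def discounted_revenue(recovered_hours, revenue_per_hour, discount_factor):
    """Revenue attributed to recovered hours after applying the discount factor."""
    return recovered_hours * revenue_per_hour * (1 - discount_factor)


revenue_per_hour = 101.92   # average revenue per employee per hour
discount_factor = 0.60      # only 40% of each recovered hour counts as revenue

search_revenue = discounted_revenue(104_000, revenue_per_hour, discount_factor)
recreate_revenue = discounted_revenue(166_400, revenue_per_hour, discount_factor)

print(f"Recovered revenue, search time:     ${search_revenue:,.0f}")
print(f"Recovered revenue, recreating data: ${recreate_revenue:,.0f}")
```

Keeping the discount factor as an explicit parameter makes it easy to test more or less optimistic assumptions about how recovered hours turn into revenue.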

So to wrap up this painful experiment in math, the potential dollar savings and increased revenue from the adoption of an information governance program is:

[Table 7: summary of potential savings and increased revenue]

The point of this discussion was to explore the idea of counting recovered revenue from productivity gains as a benefit of more effective information management, that is, of information governance. You may (and probably will) disagree with the numbers used, but I think calculating an InfoGov ROI using recovered revenue due to productivity gains is realistic.