By common estimates, around 90% of a company's data remains unused. In many cases, the company does not even know this data exists. Yet it can be crucial to address this “Dark Data,” because it harbors important information that can enhance competitiveness, open up revenue potential, and reduce risks.
The causes of Dark Data are diverse. With the advent of digitalization, the number of data sources and the volume of data collected have increased significantly. At the same time, the cost of data storage has plummeted. Many companies store all kinds of data as a precaution, for later analyses that are never carried out. Data and documents are often copied, modified, and stored in various silos and versions within the company, accessible only to specific employees or departments. Some data is erroneous; other data simply becomes obsolete and is never deleted. In the worst case, data that could provide real value is ignored, whether because it is unknown, cannot be found, is inaccessible, or because the resources and skills needed for analysis are lacking.
To uncover these hidden data treasures, you first need to find them. There are two ways to do this: conduct a comprehensive inventory of all data within the company (a data assessment), or search for specific information with suitable data retrieval and information retrieval tools and methods, provided the necessary access rights are in place.
Next, it’s essential to separate the wheat from the chaff, or more precisely, to distinguish ROT data from business-relevant data. “ROT” stands for redundant, obsolete, and trivial: data that is duplicated, outdated, or otherwise without value, including damaged or erroneous records. Such data is primarily a cost factor and should be deleted. In addition, appropriate data management should prevent ROT data from accumulating again in the future.
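As a rough illustration of how such a ROT check could work, the following Python sketch labels files as redundant (identical content already seen), obsolete (not modified for a long time), or trivial (empty). The `FileRecord` type, the function name `classify_rot`, and the three-year threshold are assumptions made for this example, not fixed rules; real criteria depend on the company's retention policy.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical in-memory description of one stored file."""
    path: str
    content: bytes
    modified: datetime


def classify_rot(records, now, max_age_days=1095, min_size_bytes=1):
    """Label each record as 'redundant', 'obsolete', 'trivial', or 'keep'.

    Thresholds are illustrative assumptions, not recommendations.
    """
    cutoff = now - timedelta(days=max_age_days)
    seen_hashes = set()
    labels = {}
    # Newest first, so the most recently modified copy counts as the original.
    for rec in sorted(records, key=lambda r: r.modified, reverse=True):
        digest = hashlib.sha256(rec.content).hexdigest()
        if len(rec.content) < min_size_bytes:
            labels[rec.path] = "trivial"
        elif digest in seen_hashes:
            labels[rec.path] = "redundant"
        elif rec.modified < cutoff:
            labels[rec.path] = "obsolete"
        else:
            labels[rec.path] = "keep"
        seen_hashes.add(digest)
    return labels
```

Comparing content hashes only finds exact copies. Slightly modified versions of a document would need a similarity measure instead, which this sketch does not attempt.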
Dark Data is business-relevant if it represents a risk or an opportunity. Risks arise, for example, when “forgotten” data is subject to lower security standards and becomes a target for attackers. If the compromised data includes personal information, the breach can quickly become very costly for the company. Under the GDPR, personal data should generally be quick to find, correct, export, and delete, in line with the rights of access, rectification, data portability, and erasure.
Conversely, the data can hold plenty of opportunities: call recordings from customer service or email complaints, for instance, can provide important insights into pain points, customer sentiment, or potential improvements to products and services. Log files offer clues about website visitor behavior and ways to improve website performance. Geodata can be used to reconstruct customer movement patterns and to inform further business planning (geo-tagging).
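To make the log file example more concrete, here is a minimal Python sketch that counts page hits and error responses per URL path. It assumes the web server writes the widely used Common Log Format; the function name `summarize_access_log` is chosen purely for illustration, and lines in any other format are skipped.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize_access_log(lines):
    """Return (hits per path, error responses per path) as two Counters."""
    hits = Counter()
    errors = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # not in the assumed format; ignore the line
        path = match.group("path")
        hits[path] += 1
        # 4xx and 5xx status codes indicate client or server errors.
        if match.group("status").startswith(("4", "5")):
            errors[path] += 1
    return hits, errors
```

Frequently requested paths hint at what visitors care about, while paths with many error responses are candidates for fixing first.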
To evaluate this diverse structured and unstructured data, traditional statistical methods as well as data mining and text mining techniques can be used. In this way, patterns can be identified (through classification, segmentation, forecasting, dependency analysis, and deviation analysis), and key topics, sentiments, and trends can be pinpointed and then put to use in marketing, sales, and service.
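A very simple form of text mining can be sketched in a few lines of Python: counting frequent topic words and computing a naive sentiment score per message. The word lists below are tiny, made-up examples and the function name `analyze_feedback` is hypothetical; a real project would rely on a curated lexicon or a trained model.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumptions for this sketch only).
POSITIVE = {"great", "fast", "helpful", "love"}
NEGATIVE = {"slow", "broken", "late", "refund"}
STOPWORDS = {"the", "a", "is", "was", "and", "i", "my", "to", "it", "very"}
NON_TOPIC_WORDS = STOPWORDS | POSITIVE | NEGATIVE


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def analyze_feedback(messages):
    """Return (topic word counts, sentiment score per message).

    The score is the number of positive words minus the number of
    negative words, so values below zero suggest a complaint.
    """
    topics = Counter()
    scores = []
    for message in messages:
        tokens = tokenize(message)
        topics.update(t for t in tokens if t not in NON_TOPIC_WORDS)
        positive_hits = sum(t in POSITIVE for t in tokens)
        negative_hits = sum(t in NEGATIVE for t in tokens)
        scores.append(positive_hits - negative_hits)
    return topics, scores
```

Applied to transcribed calls or complaint emails, the most common topic words among negatively scored messages give a first indication of where customers' pain points lie.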
Naturally, a company can start by tackling its Dark Data on a project-by-project basis. In the long run, however, data usage should be strategically planned and firmly anchored within the company. This requires a comprehensive data strategy aligned with the business strategy. Such a strategy sets the framework for data retention, data quality assurance, data provisioning, and data usage, and ensures that IT architecture, organizational structure, and data value creation fit together seamlessly.
If you have any questions or need support in uncovering your hidden data treasures, please feel free to contact us now!