Companies have data — lots of data — but a large amount of that data goes unused. According to Forbes, dark data is:
“All the information companies collect in their regular business processes, don’t use, have no plans to use, but will never throw out.”
Not using that data is like encouraging employees to sit idle. No organization consciously lets its team members waste their workday, so why does important data remain unused?
Some of the common reasons companies let data go to waste include not having the right tools to access the data, or feeling like there is simply too much data to process.
In other words, businesses do not have a clear strategy for how to access data across their entire company. They need a plan for how to manage the knowledge contained in those bytes of stored data.
Knowledge management.
The most concise definition of knowledge management comes from Tom Davenport, who wrote in 1994 that “Knowledge Management is the process of capturing, distributing, and effectively using knowledge.”
Since 1994, knowledge management has grown in importance. Today, knowledge is viewed as an enduring competitive advantage. Every company has a unique knowledge base, which sets it apart from the competition. However, a business’s ability to leverage that information is what determines its success.
Turning dark data into usable knowledge can lead to improved revenue and reduced operating costs. These improvements are achieved through increased employee productivity, streamlined processes, and better decision-making. All of these efforts can also result in increased customer satisfaction.
Accessing data.
If employees can’t access data, they can’t use it. That is why lack of access is a leading reason data goes dark. But finding the right tools can be a challenge because not all data is created equal. Data comes in many forms, such as security logs, emails, spreadsheets, and video files, and each form presents its own technical obstacles to data integration.
Structured vs. unstructured data.
Data is commonly classified as structured, unstructured, or semi-structured, with metadata as a related category. Each has its own characteristics.
- Structured Data. Computers love structured data. Every piece of data is in a specific location and is easy to find. Examples of structured data include spreadsheets and relational databases. Because of its tabular format, it is easy to aggregate data and return search results quickly.
- Unstructured Data. Computers are not so enamored with unstructured data because it has no pre-defined organizational pattern. Text-heavy documents, audio files, and video files are examples of unstructured data. Searching through these files to find values that match filters is time-consuming and prone to error. However, newer technologies such as machine learning have improved the ability to store and search unstructured data.
- Semi-structured Data. Data that does not fit a formal data-structure model, such as a relational database, but contains some self-describing structure is classified as semi-structured. Formats such as XML and JSON provide this structure.
- Metadata. Metadata provides information about data. For example, graphic or video files can contain information on where and when the files were created. It can also include tags that point to a subject, such as a lion exhibit or running emu. Search tools look at the metadata when trying to respond to a specific request.
The different ways data is stored present a technical challenge when trying to develop tools for accessing data across an enterprise.
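To make the four categories concrete, here is a minimal Python sketch that runs the same keyword against one tiny example of each. All of the sample records, field names, and the `matches` helper are hypothetical and exist only for illustration; a real enterprise tool would rely on dedicated connectors and indexes rather than in-memory strings.

```python
import csv
import io
import json

# Structured: tabular rows with fixed columns (hypothetical sample data).
structured = io.StringIO("id,title,owner\n1,Security audit,ana\n2,Budget plan,raj\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing JSON; fields can vary from record to record.
semi_structured = json.loads('{"title": "Incident report", "tags": ["security", "2023"]}')

# Unstructured: free text with no predefined fields.
unstructured = "Notes from the meeting: the security review slipped to next quarter."

# Metadata: information about a file rather than the file's content.
metadata = {"file": "lion_exhibit.mp4", "created": "2021-06-01", "tags": ["lion exhibit"]}


def matches(term: str) -> dict:
    """Report where a search term appears in each kind of data."""
    term = term.lower()
    return {
        # Structured data can be filtered on a known column.
        "structured": [r["id"] for r in rows if term in r["title"].lower()],
        # Semi-structured data is walked field by field.
        "semi_structured": any(term in str(v).lower() for v in semi_structured.values()),
        # Unstructured data can only be scanned as raw text here.
        "unstructured": term in unstructured.lower(),
        # Metadata tags describe content that cannot be scanned as text.
        "metadata": any(term in tag.lower() for tag in metadata["tags"]),
    }


print(matches("security"))
print(matches("lion"))
```

Each category needs its own lookup logic even in this toy version, which hints at why a single search tool across all of them is hard to build.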
Fuzzy searches.
Online search engines are the best examples of fuzzy search capabilities. Fuzzy searches make assumptions about what the end-user is asking. How many times does Google return results with questions such as, “Did you mean cantaloupe?” These programs compensate for human error or inaccuracies.
For example, an employee is looking for a document that has the phrase “security-based analysis” in the title. An exact-match search returns nothing. With a fuzzy search, the results would include “security based analysis,” “security-based analyses,” and “security based analyses.” The ability to include or exclude hyphens, for example, provides a better employee experience and eliminates the need for an employee to conduct multiple searches to find the desired document.
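The hyphen example can be sketched in a few lines of Python. This is one simple approach, not how any particular search product works: it normalizes hyphens and case, then scores similarity with the standard library's `difflib.SequenceMatcher`. The `normalize` and `fuzzy_search` helpers, the sample titles, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def fuzzy_search(query: str, titles: list[str], cutoff: float = 0.8) -> list[str]:
    """Return titles whose normalized form is similar enough to the query."""
    q = normalize(query)
    scored = [(SequenceMatcher(None, q, normalize(t)).ratio(), t) for t in titles]
    # Best matches first; titles below the cutoff are dropped.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [title for score, title in ranked if score >= cutoff]


# Hypothetical document titles.
titles = [
    "security based analysis",
    "security-based analyses",
    "security based analyses",
    "quarterly budget",
]

print(fuzzy_search("security-based analysis", titles))
```

An exact string comparison would match none of these titles, while the normalized, scored comparison keeps the three security documents and discards the unrelated one. Raising the cutoff makes the search stricter; lowering it admits looser matches.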
Data protection.
Data security is part of data accessibility. Not everyone should have access to all information, especially in organizations where privacy protection laws apply. Educational organizations must comply with FERPA, and medical facilities with HIPAA. Any company accepting credit or debit card payments must adhere to the PCI DSS standard.
Making sure that search results are in compliance with all regulatory guidelines adds to the technical challenges of enterprise-wide searches. Not only do technical tools have to search multiple data forms and intuit what the end-user meant to ask, but they also have to ensure that data is not accessible to unauthorized personnel.
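One common way to keep restricted documents out of search results is to filter on the searcher's permissions before anything is returned. The sketch below assumes a simple role-based model; the `Document` class, role names, and sample records are hypothetical, and a filter like this does not by itself make a system compliant with any regulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    body: str
    allowed_roles: frozenset  # roles permitted to see this document


def search(query: str, documents: list[Document], user_roles: set[str]) -> list[str]:
    """Return titles that match the query AND that the user may see."""
    q = query.lower()
    results = []
    for doc in documents:
        # Check permissions first so restricted content is never inspected
        # on behalf of an unauthorized user.
        if not (user_roles & doc.allowed_roles):
            continue
        if q in f"{doc.title} {doc.body}".lower():
            results.append(doc.title)
    return results


# Hypothetical records for a medical office.
DOCS = [
    Document("Patient intake records", "Clinic visit details", frozenset({"clinician"})),
    Document("Clinic opening hours", "Open 9 to 5", frozenset({"clinician", "front_desk"})),
]

print(search("clinic", DOCS, {"front_desk"}))
print(search("clinic", DOCS, {"clinician"}))
```

The same query returns different results depending on who is asking, which is the behavior an enterprise search tool needs in regulated environments.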
Finding the best solutions for making dark data accessible requires locating companies that follow the best practices for delivering enterprise search capabilities. The right solution incorporates the latest in machine learning technologies to improve corporate-wide productivity.
Enterprise search challenges.
Enterprise searches are a powerful way to address the underutilization of dark data. By using a range of technologies, including artificial intelligence and machine learning, these tools can collect, index, search and return information from across an enterprise. With enterprise searches, employees can access the information they need in order to deliver solutions that improve the customer experience. Without such capabilities, it becomes increasingly difficult for businesses to compete with those companies that have developed a knowledge management strategy. The right strategy can help organizations realize the bottom-line profitability that comes when dark data becomes part of the decision-making process throughout the enterprise.