Getting started in text analytics can be challenging. The amount of unstructured content in most organizations is overwhelming, and that content is usually poorly organized and poorly understood. Attempting to analyze all of it in hopes that “insights” will rise to the surface tends to end in unclear results and project failure.
Another difficulty in starting a text analytics project or program is securing the necessary knowledge and skills. While there are many roles with overlapping skill sets within an organization, it is rare to find one person who has the prerequisite knowledge to perform text analytics and to use the required software at an enterprise scale.
Even with the right people, most text analytics endeavors fail due to lack of clear direction and results, not because of insufficient technology or analysis. What can you do to make a good start in text analytics which will benefit your organization? Most of the steps involved in getting started in text analytics should be familiar to you and are what you might do to start any information project.
Identify the Problem
What are you trying to do? What is keeping you from accomplishing your goals? Surprisingly, it’s common to have undefined goals and a technology solution in search of a problem. The real aim of the program gets lost.
First, articulate a clear problem statement. This could be a single sentence or several related statements. Once the challenge is identified and clearly stated, it becomes easier to evaluate possible solutions. For example, here is a possible problem statement: “Our product improvement time to market is slow: we know there are customer complaints, but our analysis of and response to these complaints take too long.”
In this scenario, a company has a product that could be improved, but the improvements come slowly because customer complaints aren’t collected, analyzed, or acted upon in a timely fashion.
Once a problem is identified, establish goals that define when the challenge has been resolved and how improvement will be measured. For example, in response to the problem statement above, a goal statement might be: “We would like to reduce our product improvement time to market by 3 months by identifying issues from customer reviews and acting on them more efficiently.” The goal statement presents a clear objective and a measurable target, and it points toward a body of content that can help solve the problem.
In this scenario, it becomes clear that customer feedback has to be identified and analyzed. The next questions, naturally, are: “Where do I find this content? Do I own it or do I need to find it externally? Do I have access to this content? How do I analyze this content more quickly and efficiently?” And so on.
Simply knowing and clearly stating the problem and the goal can help set the boundaries for the program and avoid scope creep.
The Knowledge Audit
Though not exclusive to text analytics projects, conducting a knowledge audit can lay the groundwork for a successful text analytics project. A knowledge or content audit is a comprehensive effort to identify and characterize content within an organization. The goal is to map the information terrain and to assess it as it stands. Patrick Lambe, of Straits Knowledge, describes knowledge audits in this short video. A knowledge audit typically involves a combination of subject matter expert interviews, system identification and analysis, process analysis, and content analysis. The results of a knowledge audit might unveil issues with current procedures, the way content is stored, accessed, and archived, or content coverage vital to the business.
A knowledge audit might come before or after identifying the problem. For instance, a knowledge audit for one initiative may uncover content and processes that turn out to be essential to another project. A knowledge audit may also follow problem identification, homing in on particular aspects of the findings. In our previous example, a knowledge audit may focus strictly on content related to customer reviews and target only the processes and systems that touch this content directly. In this case, it might include customer center calls and transcripts, third-party product reviews purchased by the marketing department, or other information streams that go directly to the product development department.
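As a rough illustration, the output of a narrowly scoped audit can be captured as a simple inventory and filtered by topic. The sketch below is only one possible shape for such an inventory: the `ContentSource` fields, the `in_scope` helper, and the system names ("CRM", "Shared drive", "Intranet") are all hypothetical and not taken from any particular audit method.

```python
from dataclasses import dataclass, field


@dataclass
class ContentSource:
    """One entry in a knowledge audit inventory (hypothetical fields)."""
    name: str
    owner: str      # department responsible for the content
    system: str     # where the content lives
    topics: set = field(default_factory=set)


def in_scope(sources, topic):
    """Return the names of sources tagged with the given topic, in input order."""
    return [s.name for s in sources if topic in s.topics]


# Illustrative inventory; the system names are assumptions.
inventory = [
    ContentSource("Call transcripts", "Customer center", "CRM", {"customer reviews"}),
    ContentSource("Purchased product reviews", "Marketing", "Shared drive", {"customer reviews"}),
    ContentSource("HR policies", "Human resources", "Intranet", {"policy"}),
]

print(in_scope(inventory, "customer reviews"))
```

Even a flat list like this makes it easier to see which departments and systems a focused audit needs to involve.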
The Proof of Concept
Text analytics can both benefit and suffer from scale. When there is a large amount of known and clean information being analyzed, the results can be visualized and presented clearly for quick decisions. Prior to that scenario, however, poking around blindly in huge repositories of unstructured text may reveal very little of interest. Hence, starting with a narrowly focused proof of concept (PoC) involving smaller quantities of text and clear objectives can lead to better results. PoCs are generally smaller in scale and carry less risk in terms of time, upfront cost, and disappointing results. In addition, because text analytics initiatives are complex, they benefit from a PoC that sets a framework for communicating the work and findings to both executives and working teams.
A text analytics PoC can include a few hundred hand-selected documents of interest, preferably documents which set a gold standard for what information is being sought. In addition, existing taxonomies or ontologies can help jump-start the work by framing the concepts important to the organization.
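To make the idea of framing a PoC with an existing taxonomy more concrete, here is a minimal sketch that tags documents with taxonomy concepts by simple term matching. The taxonomy contents and the function names `tag_document` and `concept_counts` are assumptions made for illustration. Real text analytics software would typically do far more, such as handling word variants and context.

```python
import re
from collections import Counter

# Hypothetical taxonomy: concept -> terms that signal it (illustration only).
TAXONOMY = {
    "battery": ["battery", "charge", "charging"],
    "shipping": ["shipping", "delivery", "arrived late"],
}


def tag_document(text, taxonomy=TAXONOMY):
    """Return the set of taxonomy concepts whose terms appear in the text."""
    lowered = text.lower()
    found = set()
    for concept, terms in taxonomy.items():
        for term in terms:
            # Word-boundary match so "charge" does not match inside "discharged".
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(concept)
                break
    return found


def concept_counts(documents, taxonomy=TAXONOMY):
    """Count how many documents mention each concept."""
    counts = Counter()
    for doc in documents:
        counts.update(tag_document(doc, taxonomy))
    return counts
```

Running `concept_counts` over a few hundred hand-selected documents gives a quick first view of which concepts the sample actually covers.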
The goal of the PoC is to show that text analytics processes can be honed to arrive at the desired, meaningful results. Don’t expect every text analytics PoC to be a success, though, as the nature of the effort is to determine whether the project can feasibly address the original problem statement. Even a failed PoC is valuable because it can reveal a lot about the organization, such as not having the appropriate content to address the problem, the inability of the organization to adapt to new techniques and processes, or a lack of maturity in the information landscape to handle a text analytics program.
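One common way to judge whether a PoC is arriving at the desired results is to compare automated output against hand-assigned labels on the gold standard documents. The sketch below uses precision and recall for that comparison. This measure is an assumption on my part rather than something the PoC approach prescribes, and the function name and data layout are hypothetical.

```python
def precision_recall(predicted, gold):
    """Compare predicted tags with hand-assigned gold tags.

    Both arguments map a document id to a set of tags. Only documents
    present in `gold` are scored; extra predicted documents are ignored.
    """
    true_pos = false_pos = false_neg = 0
    for doc_id, gold_tags in gold.items():
        pred_tags = predicted.get(doc_id, set())
        true_pos += len(pred_tags & gold_tags)    # tags found correctly
        false_pos += len(pred_tags - gold_tags)   # tags assigned in error
        false_neg += len(gold_tags - pred_tags)   # tags that were missed
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall
```

Low recall can point to missing content or missing taxonomy terms, while low precision suggests the matching rules are too loose. Either finding is useful input when deciding whether the project is feasible.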
Typically, a PoC is conducted by an external consultancy or software vendor specializing in text analytics. However, this doesn’t preclude an internally conducted PoC if the right skill sets and access to software are available.
Upon completing a PoC, set a plan in motion to scale up with the resulting methodology. There are many ways to surface the results of text analytics, including reports sent directly to interested business stakeholders, visualization dashboards, and findings embedded in search results. Whatever the means of consuming the results, processes and governance must be in place to scale beyond the PoC.
One method of scaling up is to identify a key function or group that will benefit from the results. This will likely be the same group that sponsored the PoC in the first place, and it will be the first to reap the benefits of the resulting program. Whether driven by bottom-up expansion or top-down directive, the text analytics program can be rolled out function by function or location by location and adapted in each case.
Another key to scaling up and maintaining a proper text analytics program rather than a one-off effort is to build the right team. I outline what that entails in this blog. The text analytics team may comprise several roles in several functions, but the key is to have a clear charter, clear governance and processes, and a path forward to scale the effort across the enterprise.
Text analytics projects can be challenging, but waiting until they get easier isn’t practical. Start outlining a path now to get the most out of your knowledge assets.