Sift through your unstructured text with cloud-native products, machine learning tools, or specialized text analytics programs.

Text analytics, sometimes called text data mining, is the process of uncovering insightful and actionable information, trends, or patterns from text. The extracted and structured data is much more convenient than the original text, making it easier to determine the information’s data quality and usefulness. Developers and data scientists can then use the mined data in downstream data visualizations, analytics, machine learning, and applications.

Text analytics aims to identify facts, relationships, sentiments, or other contextual information. The types of information extracted often start with tagging entities such as people’s names, places, and products. It can advance to assigning topics, determining categories, and discovering sentiments. When measures such as currencies, dates, or quantities are extracted, establishing their relationship to other entities (and any qualifiers) is a key text analytics capability.

Extracting data from documents versus form fields

The hardest challenges in text analytics are processing enterprise repositories and large documents such as aggregated news from websites, corporate SEC filings, electronic health records, and other unstructured or semistructured documents.

Parsing documents has some unique challenges as the document’s size and structure often dictate domain-specific preprocessing rules and NLP (natural language processing) algorithms. For example, categorizing a 1,000-word blog post is a lot easier than ranking all of the topics found in a book collection. Also, larger documents often require validating the extracted information based on context; for instance, the medical conditions of a patient should be categorized independently from the conditions listed in their family history.
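The family-history example can be sketched in a few lines of Python. This is only an illustration of context-aware extraction, not a clinical NLP method: the section headings, the condition list, and the sample note are all hypothetical, and a production pipeline would likely rely on a trained model rather than keyword matching.

```python
import re

# Hypothetical section headings and condition vocabulary, for illustration only.
SECTION_HEADINGS = ("HISTORY OF PRESENT ILLNESS", "FAMILY HISTORY")
CONDITIONS = ("diabetes", "hypertension", "asthma")


def split_sections(note):
    """Split a note into {heading: body}, using lines that match a known heading."""
    sections = {}
    current = None
    for line in note.splitlines():
        heading = line.strip().rstrip(":").upper()
        if heading in SECTION_HEADINGS:
            current = heading
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def conditions_by_section(note):
    """Tag each condition with the section it appears in, so context is preserved."""
    found = {}
    for heading, body in split_sections(note).items():
        found[heading] = [
            c for c in CONDITIONS
            if re.search(rf"\b{re.escape(c)}\b", body, re.IGNORECASE)
        ]
    return found


# Hypothetical sample note.
NOTE = """History of Present Illness:
Patient reports worsening asthma over the past month.
Family History:
Father had diabetes and hypertension.
"""

result = conditions_by_section(NOTE)
print(result)
```

Keeping the section name alongside each extracted condition lets downstream code treat the patient’s own conditions separately from those of relatives.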
But what if you want to perform the potentially simpler task of extracting information from a form field or other short text snippet? Consider these possible scenarios:

- Quantify feedback from an employee survey’s open-ended responses
- Process social media posts for their sentiments about brands or products
- Categorize different types of chatbot interactions
- Assign topics to user stories on an agile backlog
- Route service desk requests based on the problem details
- Parse information submitted to marketing on your website

These problems call for simpler algorithms than parsing documents because the text fields are identifiable, short, and often carry a specific type of information.

Let’s say you need to leverage unstructured field data in an application or are asked to include insightful information extracted from text in a data visualization. Text analytics is an important first step, and agile data science teams often use spikes to conduct discovery work. The team needs tools, skills, and methodologies to perform text analytics. Here are three different approaches.

1. Use a public cloud’s NLP and cognitive services

The major public clouds offer natural language processing and other cognitive services, so teams already working in these environments and skilled at using these algorithms should research these options.

Azure Cognitive Services offers several related services. Form Recognizer can extract key/value pairs from text fields and documents, and Text Analytics can identify entities, sentiment, and key phrases. The more advanced Language Understanding capability can be used for developing NLP models in chatbot, mobile, and IoT applications.

Google Cloud Platform has two separate natural language offerings. Developers can use the Natural Language API to analyze basic entities, extract sentiment, and categorize content into 700 predefined categories. The more advanced AutoML Natural Language creates custom categorization and sentiment models.
Amazon Comprehend has similar text analytics and NLP features with APIs for detecting entities, events, key phrases, topics, sentiments, and personally identifiable information. Developers and data scientists can also use Amazon SageMaker to test, train, and deploy NLP models built with tools such as BlazingText, BERT (Bidirectional Encoder Representations from Transformers), or spaCy.

IBM Watson Natural Language Understanding can extract entities, sentiment, categories, and concepts but also has more sophisticated features for identifying relations, emotions, and semantic roles.

2. Use text analytics tools in data integration and machine learning platforms

If your organization has invested in data integration, machine learning, or analytics platforms, then it’s likely that one of them has some text analytics and NLP capabilities. Using these platforms may be an easier and faster way to perform lightweight text analytics than coding to APIs or working in data science notebooks. Here are some examples:

- Alteryx Designer has text mining functions for preprocessing, topic modeling, and sentiment analysis.
- IBM SPSS Modeler Text Analytics can be used for categorization and is a common tool in market research for processing survey responses.
- SAS Visual Text Analytics is a visual tool and open platform for parsing, information extraction, NLP modeling, sentiment analysis, and trend analysis.

Other data science platforms such as RapidMiner, KNIME, and Dataiku offer text mining functions natively, through plug-ins, or through integrations with public cloud services.

3. Use specialized text analytics tools

If coding on public cloud platforms is too complex, and if your organization does not already have an analytics, data science, or machine learning platform with text mining capabilities, then you’re probably seeking a third option. Specialized text analytics tools may be the answer.
Take a look at Keatext, Lexalytics, MeaningCloud, MonkeyLearn, NetOwl, Provalis Research, Rosette Text Analytics, and other platforms that offer text analytics capabilities. Text analytics is also common in customer experience, marketing automation, market research, social listening, chatbot, and other platforms that capture qualitative information around customers and sales prospects.

It’s no surprise that many tools have text analytics capabilities. Some offer simple on-ramps with prebuilt models based on standardized entities, categories, and topics, whereas others enable robust model building. The platforms also differ by target use cases, with some focusing on specific industries, document types, integration requirements, or technology use cases.

If you’re just getting started with text analytics, there are a few best practices. Begin any data and analytics discovery exercise by defining questions and target outcomes that potentially deliver business value. From there, consider the overall complexity of the document, content, and text fields that require processing, and examine the details around the target entities, topics, and semantics.

Understanding the problem’s complexity can help determine whether an agile spike using a lightweight approach is viable or whether a more extensive proof of concept, co-constructed with text mining experts, is needed.

Most importantly, recognize that text analytics and natural language processing are forms of machine learning. Arriving at robust solutions requires experimenting with different algorithms, improving models, adding new data sources, and validating the results’ quality.

For organizations trying to improve customer experiences, text analytics is an important capability to develop.
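As a small illustration of what validating the results’ quality can look like in a lightweight spike, the Python sketch below routes short service desk requests with keyword rules and scores them against a handful of hand-labeled examples. The rules, categories, and sample requests are all hypothetical; a real project would use a larger labeled set and, most likely, a trained model or one of the services named above.

```python
import re

# Hypothetical routing rules (category -> keywords), for illustration only.
RULES = {
    "access": ("password", "login", "locked"),
    "hardware": ("laptop", "monitor", "keyboard"),
    "network": ("vpn", "wifi", "network"),
}


def categorize(text):
    """Return the category with the most keyword hits, or 'other' if none match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {cat: sum(tokens.count(k) for k in kws) for cat, kws in RULES.items()}
    best = max(scores, key=scores.get)  # ties go to the category listed first
    return best if scores[best] > 0 else "other"


# Hypothetical hand-labeled requests: (text, expected category).
LABELED = [
    ("I forgot my password and my account is locked", "access"),
    ("My laptop keyboard stopped working", "hardware"),
    ("Cannot connect to the VPN from home", "network"),
    ("The wifi drops whenever I open my laptop", "network"),
]

predictions = [(text, expected, categorize(text)) for text, expected in LABELED]
misses = [p for p in predictions if p[1] != p[2]]
accuracy = 1 - len(misses) / len(LABELED)

print(f"accuracy: {accuracy:.2f}")
for text, expected, predicted in misses:
    print(f"expected {expected}, got {predicted}: {text}")
```

On these four examples the sketch misroutes the last request, because "wifi" and "laptop" each score one hit and the tie goes to the category listed first. Reviewing misses like this is what drives the next round of experimentation, whether that means refining rules or switching to a trained model.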