
While it takes human judgment to determine the usefulness of statistical analysis, analyzing quantitative data using the many statistical methods available to us is a relatively straightforward process. Analyzing qualitative data, on the other hand, is not so clear-cut. Qualitative data usually come in the form of words and are collected from a variety of sources, depending on the type of research being conducted.

Questionnaires, surveys, interviews and focus groups are some of the ways used to collect information and opinions from people on specific topics. Historical documents are sources of data that researchers use to analyze patterns and compare the current state of things to the past. Literary works have been analyzed to compare styles of authors and find common themes. The data collected are made understandable and useful through text analysis. Below are some of the common methods of text analysis. A good starting point for further understanding of text analysis is the Duke University Libraries text analysis website.

Uses of Text Analysis

Text analysis is often used to detect recurring words and common themes in qualitative data sets (such as responses to surveys, newspaper and magazine articles, books and research papers). The researcher uses the recurrences to sort the data into categories. Depending on the type of research, this may be the end of the project, or the researcher may go on to explore the themes in more depth. For example, literary scholars may run text analysis on multiple works by the same author to determine whether he or she uses certain words more often than might be expected. Once the most common words are found, the scholars can run the same analysis on works by other authors to see whether they also used those words in their writing.
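The author comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (relative_frequencies, compare_usage) and the two short sample strings are hypothetical, and a real study would use full texts and a more careful way of splitting text into words.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Return each word's share of all the words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}

def compare_usage(text_a, text_b, top_n=5):
    """For the top_n most frequent words in text_a, report how often
    each one appears (as a share of all words) in text_a and in text_b."""
    freq_a = relative_frequencies(text_a)
    freq_b = relative_frequencies(text_b)
    top_words = sorted(freq_a, key=freq_a.get, reverse=True)[:top_n]
    return [(word, freq_a[word], freq_b.get(word, 0.0)) for word in top_words]

# Hypothetical sample texts standing in for two authors' works.
author_a = "the sea the sea and the storm"
author_b = "the hills and the quiet hills"

for word, share_a, share_b in compare_usage(author_a, author_b, top_n=2):
    print(f"{word}: author A {share_a:.2f}, author B {share_b:.2f}")
```

Shares are used instead of raw counts so that a long work and a short work can be compared fairly.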

Advertisers and public relations professionals analyze responses to surveys to determine consumers’ reactions to products and advertising campaigns. They can use the results of their analysis to modify the message or, depending on the situation, the product itself. For example, if consumers consistently use specific words or phrases to describe a product, the advertisers may use those same words in product descriptions or promotional materials.

Text analysis is also used to determine the reading level of books and other written works. Features such as the length of sentences and words are used to assign a reading level expressed in years of schooling. (For example, a reading level of 10.1 indicates a student in the first month of the tenth grade should be able to understand the text.) Studies of the reading (or education) level of the State of the Union addresses by the Presidents of the United States show a general decline over time. The Guardian website shows the education level of each address, from George Washington’s first one in 1790, which had an education level of 20.4, to Barack Obama’s 2016 address, which had an education level of 10.1. The address with the highest education level was James Madison’s in 1815, at 25.3. The lowest was George H.W. Bush’s in 1992, at 7.6.
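To make the idea of a reading level concrete, here is a minimal sketch of one widely used formula, the Flesch-Kincaid grade level, which combines average sentence length with average syllables per word. It is an assumption that this formula resembles the one behind the figures quoted above; the article does not say which method was used. The syllable counter is a rough heuristic (it counts groups of vowels), and the function names are made up for this example.

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Estimate a U.S. school grade level with the Flesch-Kincaid formula:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Hypothetical sample passages: short plain words versus long formal ones.
simple_text = "The cat sat. The dog ran."
formal_text = "Qualitative information requires considerable interpretation."

print(round(flesch_kincaid_grade(simple_text), 1))
print(round(flesch_kincaid_grade(formal_text), 1))
```

Very short samples like these produce extreme scores, so the formula is best applied to passages of at least a few hundred words.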

In the security world, messages written by criminals and suspected criminals can be analyzed to find patterns of speech, which can be clues to the identity of the person or people who wrote them. Text analysis can also be used in the search for secret codes. An early use of text analysis helped Allied analysts break the German and Japanese codes during World War II, which allowed the Allies to decode intercepted messages and plan their defensive moves or go on the offensive before the enemy attacked.

Software

Text analysis can be carried out by a computer or by people, depending on the depth of analysis being performed. Computer software has become increasingly sophisticated and can carry out simple analyses, such as word counts (how often each word or specific words appear in the document), and complex analyses, like those that determine reading levels. Text analytics is a common feature of data mining software. These programs look for patterns and connections in such things as the search terms entered by consumers.
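A word count is the simplest of these analyses. The sketch below uses only Python's standard library; the function name word_counts and the sample sentence are invented for illustration, and the pattern used to split text into words is a simplifying assumption that ignores numbers and hyphenated words.

```python
import re
from collections import Counter

def word_counts(text):
    """Count how often each word appears, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical survey response.
sample = "The product is easy to use. Easy setup, easy returns."
counts = word_counts(sample)

# Show the three most frequent words and their counts.
print(counts.most_common(3))

# Look up a specific word of interest.
print(counts["easy"])
```

In practice, very common words such as "the" and "is" are often filtered out first so that the meaningful words stand out.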

Both proprietary and open-source software is available for text analysis. The type of software needed will depend on the goals of the research (and, possibly, budget constraints). The Predictive Analytics Today website provides a review of 62 software programs that provide text analysis. Many of the proprietary software providers offer free trials, so try a program before purchasing it to make sure it does what you need it to do.


A word cloud is a visual summary of the words in a document, typically showing the words that appear most often in the largest type. There are numerous word cloud generators, such as the WorditOut website.

Limitations

As with any type of research, text analysis has its limitations. One of them, particularly for historical and literary research, is that the sources (such as old books and public records) are often not in a form that can be read by computer software. Turning printed text into a machine-readable format can be a time-consuming and costly process. As a result, the data sources for such research may be limited, or the researcher (and her assistants) may have to spend time analyzing the data by hand.

Learn More

If topics such as text analysis are of interest to you, you may want to consider pursuing a degree in data science.