What exactly is ‘text analytics’?
If you asked ten different researchers this question, you would get ten different answers. I prefer the term ‘text mining’ to explain this form of research. It’s a useful analogy: in mining, the goal is to extract ore from the earth, and in analyzing text, the goal is to extract something valuable from a vast, unstructured data set. Essentially, you are giving structure to something that is unstructured.
Alongside text mining, there is natural language processing (NLP), which can be applied as part of text mining. The distinction between the two is where researchers disagree. I find NLP to be a valuable tool in text mining, as it allows for part-of-speech tagging, among other things. Unlike numbers, text is not naturally set up as a matrix. But through sentence parsing, keyword extraction, word counts, and other text summaries, you can start to create matrices of words and terms.
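To make the idea of a matrix of words and terms concrete, here is a minimal sketch in Python using only the standard library. The sample comments and the function names (tokenize, document_term_matrix) are made up for illustration and are not taken from the interview. Real projects would typically add steps such as stop-word removal or stemming.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct term seen in any document, sorted.
    vocabulary = sorted(set().union(*counts))
    # Each cell holds how often that term appears in that document.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical free-text comments, invented for this example.
docs = [
    "The lectures were clear and the labs were useful.",
    "Labs were rushed, but the lectures were clear.",
]
vocab, dtm = document_term_matrix(docs)

print(vocab)
for row in dtm:
    print(row)
```

Each row is now a numeric summary of one comment, which is what lets ordinary statistical tools work on text.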
‘Text analytics’ is a phrase that encompasses a lot of analytical methods, as well as tools—in a holistic sense, what is the true power of text analytics?
Text is our natural language—words are the way in which we communicate and the way in which we think. When using words, you can express complex ideas in much more detailed ways. This creates a very rich data set, which is also less distorted and more realistic than just capturing numbers or numbered scales.
We often try to conform our thinking to how computers process things, which can distort meaning, when it should be the other way around: computers should conform to how we naturally communicate and think, through language and words. Text mining allows us to gain richer insights into what people think.
What are some of the opportunities you see where text analytics can be useful in higher education?
Course evaluations are a perfect example of “low-hanging fruit” in higher education where text mining can provide incredibly valuable insights, because students are free to give open-ended feedback in their own words. If, for instance, a professor has 200 students in a semester, it becomes difficult to read every single evaluation, let alone compare topics across them. But by running a sentiment analysis on those evaluations, you can extract common themes and feelings. Because the responses are open-ended, this summary often contains information that the numeric portion of the survey did not address.
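As a rough illustration of what a sentiment analysis of evaluations does, here is a minimal lexicon-based sketch in Python. The word lists, the sample comments, and the function names are all hypothetical and chosen for this example. This is one simple approach and not necessarily the method used at Campus Labs. A real analysis would use a much larger, validated lexicon and handle negation (for example, "not helpful").

```python
import re

# Tiny, made-up sentiment lexicons for illustration only.
POSITIVE = {"clear", "helpful", "engaging", "great", "useful"}
NEGATIVE = {"confusing", "rushed", "boring", "unfair", "disorganized"}


def sentiment_score(comment):
    """Count positive words minus negative words in one comment."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def summarize(comments):
    """Tally how many comments score positive, negative, or neutral."""
    summary = {"positive": 0, "negative": 0, "neutral": 0}
    for comment in comments:
        score = sentiment_score(comment)
        if score > 0:
            summary["positive"] += 1
        elif score < 0:
            summary["negative"] += 1
        else:
            summary["neutral"] += 1
    return summary


# Hypothetical evaluation comments, invented for this example.
evaluations = [
    "The lectures were clear and the examples were helpful.",
    "Grading felt unfair and the pace was rushed.",
    "Class met on Tuesdays.",
]

print(summarize(evaluations))
```

Even a crude tally like this lets someone with 200 evaluations see the overall balance of feeling before reading individual comments.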
We often think of mining data that is generated by students, but it’s also valuable to analyze the questions developed for surveys. Categorizing questions into buckets allows for comparisons between differently worded questions that are getting at the same construct. Course outcomes are another area where institutions have rich datasets that could benefit from text analytics. You can run analyses to identify whether outcomes are measurable and to map where soft skills are developed through a course’s outcomes.
There’s also a lot of potential for colleges and universities to use text analytics tools to streamline FAQs. Tools like Siri and Alexa have conditioned us to ask a question aloud and immediately receive a response. Students, especially, don’t expect to have to search in depth for answers to common questions. There’s also the opportunity to map themes within student involvement data and recommend clubs or events to get involved in, or even courses that align with a student’s interests.
How can text analytics contribute specifically to cross-campus student success initiatives?
There is a lot of potential in using text analytics to explore student interests and motivation. Not only can we identify what someone is interested in, but we can start to reference those interests in an array of learning opportunities. For instance, when I was a teacher I learned what my students were interested in—then I knew that with certain students, to have them engage more fully, I could help them learn through the context of their specific interests. You can do that on a much larger scale with text analyses.
By examining the types of co-curricular activities someone is involved in, the interests highlighted in admissions essays, and the descriptions of courses someone registers for, we can suggest co-curricular activities, involvement opportunities, or even courses that better align with that student’s interests and previous engagement. Interests are what link together our experiences, and all student experiences are connected in some way. This also creates incredible value for students by emphasizing a well-rounded educational experience.
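One simple way to picture interest-based recommendations is to compare the words a student uses with the words in each activity description. The Python sketch below scores that overlap with Jaccard similarity (shared keywords divided by all keywords). The student text, the club descriptions, the stop-word list, and the function names are all invented for illustration. A production system would likely use richer text representations than raw keyword overlap.

```python
import re

# Small, made-up stop-word list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to",
             "with", "i", "my", "on", "at", "about"}


def keywords(text):
    """Return the set of lowercase words in the text, minus stop words."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(student_text, activities, top_n=2):
    """Rank activities by keyword overlap with the student's own words."""
    student = keywords(student_text)
    scored = [(jaccard(student, keywords(desc)), name)
              for name, desc in activities.items()]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical essay excerpt and activity descriptions.
student_text = "I volunteer at a community garden and enjoy writing about the environment."
activities = {
    "Sustainability Club": "Campus garden projects and environment advocacy.",
    "Chess Club": "Weekly chess matches and tournaments.",
    "Student Newspaper": "Reporting and writing about campus news.",
}

print(recommend(student_text, activities))
```

Activities with no overlap at all are left out, so the list only contains suggestions that connect to something the student actually wrote.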
Do you have to be a data scientist to effectively perform text analyses of data?
No, but it really depends on the level of analysis someone wants to undertake. It absolutely helps to have a strong research background or training. That said, there are a lot of resources and educational opportunities that can help people grow in their abilities, though it takes time and effort. There are also different levels of tools. For instance, most people are familiar with word clouds, which are a simple way to gain high-level insight into text.
What’s the first step for someone to take if they want to utilize text analytics on their campus?
I would suggest that someone really ask themselves what they are interested in and what questions they want answered. There are costs associated with collecting data and with the tools needed to analyze it. It’s also important to ask, “Do I have the data already?” If you are brand new to text analytics and the questions you want answered are in depth, it’s important to recognize that you might not have the ability to undertake analyses to the desired degree. That’s when you might consider partnering with a researcher with a strong text analysis background to help with the analyses.
Also, examine the data readiness on your campus. Silos exist in higher education, so identify who owns the data you will need and whether relationships and policies exist for that data to be shared. It doesn’t matter how useful or innovative your questions are if you can’t get access to data in the first place. Most importantly, think about your course of action once you have the information you want: how are you going to act on it? It doesn’t matter what kind of analysis you do if you don’t act on it.
What resources are available for someone to learn more about exploring text analyses of their data?
Again, the first step is to start assessing what it is you want to analyze and explore the data readiness of your campus. Beyond that it really depends on the depth you want to get into. I highly recommend anyone interested in text analytics read the book The Secret Life of Pronouns: What Our Words Say About Us.
Further, there is a great TED Talk by MIT researcher Deb Roy called “The birth of a word” that explores the topic of language and how we learn. The Stanford Natural Language Processing Group also has several online resources for more information. And there are several online tools for basic text analyses, including Wordle, a word cloud tool, and Voyant Tools, which lets you analyze different trends in writing.
Interested in learning more about text analytics?
Join Campus Labs at the 2018 Data & Analytics Summit, presented by Achieving the Dream. This two-day event is designed for student success and data-minded professionals at two- and four-year institutions.
And don’t miss “Unlock the Power of Your Data through Text Analytics,” our interactive pre-conference workshop led by members of the Campus Labs data science team—you will learn how to use free resources to conduct text analyses and unlock new insights from the data you already collect, including course evaluations, early alerts and more.
Tyler Rinker, PhD, leads the data science team at Campus Labs, working closely with both our Campus Success consultants and the product development team. His areas of expertise include text analysis, computational discourse analysis, multimodal analysis, data visualization, as well as engagement, motivation, and feedback. To refine his research methods, he uses R, an open-source programming language and software environment for statistical computing and graphics. When not at the office doing analysis for our Member Campuses, he blogs about data science best practices.