Data in Higher Education Series | Episode 9
The Data Science behind Text Analytics: Building a Bridge for Untapped Data
Published February 25, 2019
What can you do with the power of text? JD White, PhD, Vice President of Product Management, and Tyler Rinker, PhD, Lead Data Scientist, discuss the buzz around text analytics, its increasing use on campus, and the power it can give you when assessing your qualitative data.
What is text analytics?
When you consider text analytics, you really have to talk about the problem first. Humans communicate in natural language that’s both efficient and effective. But the problem is computers can’t understand us. Campuses have massive stores of data (50% of our data is qualitative text data), but computers don’t understand it out of the box. Text analytics is the bridge between rich, natural human language and a form that computers can understand, so we can analyze it and make meaning out of it.
What types of data do we have?
Student-generated data like course evaluation comments first come to mind, but there’s more to it than that. Students also make social responses through emails, texts or on Twitter. They also generate text in club and event descriptions. Faculty generate text in course outcomes, programs and course descriptions, and rubrics. At the institution level there’s text data like the questions on course evaluations, which tell you what the campus feels is important. We can look at forms, applications, newsfeeds, website FAQs. There’s really a ton of data on campus.
We also see a concept of a comprehensive student record. This is about making sure a student’s experience and record of their experience doesn’t just include steps inside the classroom, but also the steps outside the classroom. Organizations like NASPA, Lumina Foundation, and AACRAO (American Association of Collegiate Registrars and Admissions Officers) are really thinking about how to bring together those two components.
What’s the difference between natural language processing and text analytics?
Success is one part and being able to reach across campus is another part.
Let’s touch on success first. In the past, we’ve been good at collecting quantitative information (GPAs, SATs, survey responses) that gives us a student snapshot. But natural language is rich with information about the student, as well. We can use this to make better decisions for the student or even policies across campus. If we can analyze the text data, we get a more complete understanding of students.
It’s also great for combining information across campus. What if we could combine things like quality of entrance essays or sentiment scores from course evaluation comments? We ask things on a numeric scale about agreeing or disagreeing, but when you do sentiment analysis you can capture things like frustration, sadness, or disconnection from campus. These are rich layers we can connect across campus domains. Open-ended questions give students autonomy, so they can offer insights you might not get from closed-ended questions.
What are some definitive text analytics uses?
There is a lot of interest around sentiment analysis. Campuses wonder: Are there positive or negative feelings at our institution? They realize they can take this large quantity of text and start condensing and summarizing it into a way they can understand.
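To make the idea of condensing comments into positive or negative feelings concrete, here is a minimal sketch of lexicon-based sentiment scoring. The word lists and the function name sentiment_score are hypothetical and chosen only for illustration; they are not the method used in the app or discussed in the episode, and real projects usually rely on a published lexicon or a trained model.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"great", "helpful", "clear", "engaging", "enjoyed"}
NEGATIVE = {"confusing", "frustrating", "boring", "unfair", "disconnected"}


def sentiment_score(comment: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", comment.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    # Comments with no lexicon words are treated as neutral.
    return 0.0 if matched == 0 else (pos - neg) / matched


comments = [
    "The lectures were clear and the instructor was helpful.",
    "Grading felt unfair and the assignments were confusing.",
    "Class met on Tuesdays.",
]
scores = [sentiment_score(c) for c in comments]
print(scores)
```

Averaging scores like these across a course or department is one simple way to summarize a large pile of comments, though a word list this small will miss negation, sarcasm, and context.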
Also, readability is important: people want to know how understandable their own text is. When they can understand how difficult or easy it is, lightbulbs will start to come on about how that may affect student success. If a question demands a high level of reading comprehension, it might be hard for students to process the question itself, which affects the response.
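As a rough illustration of readability scoring, the sketch below computes the Flesch Reading Ease score, where higher values mean easier text. This is one common readability formula and not necessarily the one used in the app; the syllable counter is a crude vowel-group heuristic assumed here for simplicity, and the two sample survey questions are made up.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Two made-up survey questions asking roughly the same thing.
simple = "Did you like the class? Tell us why."
dense = ("Evaluate the pedagogical effectiveness of instructional "
         "methodologies utilized throughout the semester.")

print(round(flesch_reading_ease(simple), 1))
print(round(flesch_reading_ease(dense), 1))
```

The plainly worded question scores much higher than the dense one, which is the kind of gap that could prompt a rewrite of survey items before they go out to students.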
How can decision makers leverage the use of text analytics?
Qualitative data is rich language data but it’s difficult to scale up and read closely. The higher up you get at the institution level the harder it may be to take these qualitative sources of data and make meaning across them. But what if we can start applying quantitative methods to that qualitative data? That’s what text analytics is. It makes it scalable to do summarizations and give a more complete picture about what’s happening on campuses and to students.
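One of the simplest ways to apply a quantitative method to qualitative data is counting words across many comments. The sketch below is an assumption-laden example: the stopword list, the function name top_terms, and the sample comments are all hypothetical.

```python
import re
from collections import Counter

# Small hypothetical stopword list; real analyses use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "were",
             "is", "in", "for", "i", "it"}


def top_terms(comments, n=3):
    """Return the n most frequent non-stopword terms across all comments."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


comments = [
    "The advising office was helpful.",
    "Advising wait times were long.",
    "I needed more advising for registration.",
]
print(top_terms(comments, 1))
```

Even a count this basic shows which topic keeps coming up without anyone reading every response, which is what makes the approach scale as the volume of text grows.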
What are the key takeaways about text analytics?
i) There’s a ton of untapped, potential data – 50% of the data on your campus is text data, giving you a more complete picture of the student if you can bring it into decision making.
ii) Text analytics unlocks that potential – It can take that qualitative information and put it in a format that computers can understand.
iii) Text analytics is accessible – Lots of people will think it’s not something they can do, but while at the Data and Analytics summit we provided a web app that allowed campuses to engage in sentiment analysis, topic modeling, readability scoring, and word counts.
Use the app: https://campuslabs.shinyapps.io/text_analytics/
How do you get started with text analytics?
Let’s keep it simple and ask why, what and how.
i) Why am I considering even doing this? If you’re doing it because it’s the latest and greatest technique, that’s bad motivation. If you’re doing it because you realize there’s a lot of untapped potential about your students and it’ll give you a more complete understanding of them, then you’ll be motivated to move on to the what.
ii) What types of data do I have? What types of questions do I have? Start thinking about what text data really is on campus and it’ll start generating questions, curiosity and wonder. Then you need to get methods to answer the questions.
iii) How do I answer my new questions related to my text data? How do I approach this qualitative data? There will be costs associated with getting the methods to answer your new questions, usually involving people, education or software. You may need to hire people to analyze the text data, or train your people to do the analysis, or purchase software to make the analysis easier. Or it could be all three!