The original conversation and source unstructured data were in Ukrainian. For the purposes of this article, both the input data and the interaction with ChatGPT have been translated into English.
We all know how to draw conclusions from structured data to some extent, but most people get lost when asked to analyze unstructured data. Today, I’ll try to take a closer look at this process.
Just in case: unstructured data refers to data that does not have a defined structure. Classic examples include product or service reviews, social media messages, and website search results.
A bit about the data itself: When students start the PRO ANALYTICS course, they fill out a short questionnaire. Essentially, it includes two fields:
And the answers, of course, are written in free form. I always read them, but since the course has been running for over 3 years now, there are a lot of responses, and I wanted to get the big picture.
Here’s an example of such data:
As you can see, there's a lot of varied input. But there is a solution. We’ll use ChatGPT for this analysis — and spoiler alert: I think it did a great job. The distribution of students by position can be seen in the screenshot below. The analysis of expectations and practical skills will follow later.
Of course, there’s a large “Unknown” portion, but that’s understandable given some of the responses from students who are taking the course for the second or third time, such as:
Below, I’ll explain a bit more about the process and share some life hacks for this type of analysis — because in reality, just uploading a file and saying “make a chart” doesn’t work. Well, a chart might be generated, but it won’t be useful. An example of such a superficial analysis is shown in the screenshot below.
So let’s figure out how to do it properly. Of course, there are many prompting strategies, but I’ll share the one that works best for me:
Let’s go over each point in more detail below.
If the “straightforward” approach doesn’t work, you can try asking ChatGPT to categorize the responses — but most likely, it will group them in ways that don’t match your needs. For example, in my case, it created a separate group for “data engineers,” although there was only one such student in the entire history of the course.
That’s why the best approach is to ask ChatGPT to classify the responses using your own predefined categories. I did this in two steps:
In my case, I used the approach described above because I already knew most of the professional categories of students who take my course. I just didn’t expect SEO specialists to make up such a significant group. But if you don’t know the potential categories ahead of time, there’s another way — ask ChatGPT to analyze the frequency of words in the responses and form category suggestions based on that. Here’s an example of what such a ChatGPT response might look like:
The issue with this approach is that it doesn’t take into account logical groupings that are obvious to us. For example, Marketer, Marketing, and Digital all essentially describe a marketer, and the same goes for PPC and contextual advertising. So identifying keywords is only the beginning. Next, you need to group them into logical clusters. In my case, it looked like this:
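Separately from the ChatGPT output, the same two steps (count keywords, then merge synonyms into clusters) can be sketched in a few lines of Python. This is only a rough illustration: the sample responses, the stop-word list, and the cluster mapping are all invented here and are not taken from the course questionnaire.

```python
import re
from collections import Counter

# Hypothetical sample answers; not the real questionnaire data.
responses = [
    "Marketer at an e-commerce store",
    "PPC specialist",
    "Digital marketing manager",
    "SEO specialist",
    "Contextual advertising, freelance",
    "Web analyst",
]

# Assumed stop words, only to keep the frequency list readable.
STOP_WORDS = {"at", "an", "a", "the", "and", "in", "of"}

def word_frequencies(texts):
    """Step 1: count lowercase word occurrences across all responses."""
    words = []
    for text in texts:
        tokens = re.findall(r"[a-z-]+", text.lower())
        words += [w for w in tokens if w not in STOP_WORDS]
    return Counter(words)

# Step 2: keywords that mean the same thing are merged into one cluster.
# This mapping is a hand-written assumption for illustration.
CLUSTERS = {
    "Marketing": {"marketer", "marketing", "digital"},
    "PPC": {"ppc", "contextual"},
    "SEO": {"seo"},
    "Analytics": {"analyst", "analytics"},
}

def cluster_frequencies(freqs):
    """Sum keyword hits per cluster (keyword hits, not students)."""
    return {name: sum(freqs[k] for k in keys) for name, keys in CLUSTERS.items()}

freqs = word_frequencies(responses)
clusters = cluster_frequencies(freqs)
print(freqs.most_common(5))
print(clusters)
```

Note that this counts keyword hits rather than people: "Digital marketing manager" contributes two hits to the Marketing cluster, which is one reason the clusters still need a per-student labeling step afterwards.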
As you can see, this method is very helpful when forming categories, even if you’re not familiar with the data. But our ultimate goal isn’t just to define categories — it’s to determine which category (profession) each student belongs to. The next step, then, is to assign each student a standardized profession label. Here's an example of what that looked like:
While this approach seems to work well, there were still plenty of situations where ChatGPT labeled a response as “Other” — even when, through logical reasoning, we could have easily placed it in a relevant category. One such example is “CMO” — for me, it’s clear this is a marketing role, but since this response was rare, ChatGPT simply ignored it as a keyword during analysis. To correctly classify such cases, we’ll use one more technique.
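To see why a purely keyword-based pass misses cases like “CMO,” here is a minimal Python sketch of that kind of labeling. It is an analogy for the behavior described above, not a reproduction of what ChatGPT does internally; the category keywords and sample responses are assumptions made up for this example.

```python
# Illustrative keyword lists per category; hand-written assumptions.
CATEGORY_KEYWORDS = {
    "Marketing": ["marketer", "marketing", "digital"],
    "PPC": ["ppc", "contextual advertising"],
    "SEO": ["seo"],
    "Analytics": ["analyst", "analytics"],
}

def label_by_keywords(response):
    """Return the first category whose keyword occurs in the response, else 'Other'."""
    text = response.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"

# Hypothetical responses, including a rare title with no matching keyword.
samples = ["Digital marketer", "PPC specialist", "CMO"]
labels = {s: label_by_keywords(s) for s in samples}
print(labels)
```

“CMO” contains none of the listed keywords, so it falls into “Other” even though a person would immediately read it as a marketing role. Adding more keywords patches individual cases, but it does not replace reasoning about what a title means.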
As you’ve likely guessed by now, our goal is no longer just to search for keywords — we need to perform logical reasoning. Here’s an example of that kind of logic-based analysis:
And here’s how ChatGPT suggests you prompt it to perform such an analysis. Just use the phrase: “Flexible logic-based classification.” This became the final step in my categorization. Of course, I also asked it to list what was left in the “Other” group after the logic-based analysis, and it was indeed truly “Other.”
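If you run this kind of classification repeatedly, it can help to assemble the prompt from your category list instead of retyping it. The sketch below builds such a prompt as plain text. Only the phrase “Flexible logic-based classification” comes from the conversation described above; the rest of the wording, the function name, and the sample responses are assumptions for illustration, not the exact prompt used for this analysis.

```python
def build_classification_prompt(categories, responses):
    """Assemble a plain-text prompt asking for one category per response."""
    category_list = ", ".join(categories)
    numbered = "\n".join(
        f"{i}. {text}" for i, text in enumerate(responses, start=1)
    )
    return (
        "Flexible logic-based classification.\n"
        f"Assign each response below to exactly one of these categories: "
        f"{category_list}, Other.\n"
        "Reason about what the role means, not only about keywords. "
        "Use Other only when no category fits.\n"
        "After classifying, list every response left in Other.\n\n"
        f"Responses:\n{numbered}"
    )

# Hypothetical categories and responses.
prompt = build_classification_prompt(
    ["Marketing", "PPC", "SEO", "Analytics"],
    ["CMO", "Head of performance", "Taking the course again"],
)
print(prompt)
```

The resulting text can be pasted into ChatGPT together with the data file. Keeping the category list in one place makes it easier to rerun the same classification when new responses arrive.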
You already saw the final breakdown by profession at the beginning of the article. I also conducted a similar analysis for the responses to the question: “What knowledge and practical skills would you like to gain from the course? What are your expectations?”
Here are the results:
This article is not meant to be an in-depth guide to unstructured data analysis. In reality, the process is usually more complex than it might seem. But my main goal was to show that even someone with no technical background, equipped with unstructured data and ChatGPT, can extract useful insights using simple techniques and in a short amount of time. And I hope I succeeded.
How do you analyze your unstructured data? Do you use ChatGPT for such analysis — and if so, what techniques do you apply?
Web Analyst, Marketer