Topic modeling

Spopgg · April 6, 2021, 10:24pm

I’m new to NLP and NLU and I’m looking for a way to do topic modeling, such as extracting the topic of a sentence and its tags.

Is Rasa a good choice for that (and if so, how), or are there other libraries/frameworks better suited to this task?

koaning · April 12, 2021, 12:01pm

There are two “kinds” of topic models I guess.

One kind of topic modelling tries to provide tags to text. In a lot of cases these tags will be known ahead of time and the task is to attach a tag to a new text. For example it might say “this newspaper article is about politics” vs “about tech”. This use-case falls into the realm of “supervised learning” and I might call this “classification” or “tagging”.
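To make the supervised case concrete, here is a minimal sketch using scikit-learn rather than Rasa. The example texts, the two tags, and the TF-IDF plus logistic regression pipeline are all assumptions chosen for illustration; this is not how Rasa classifies intents internally.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled data, invented for this example.
texts = [
    "parliament votes on the new budget",
    "the senator announced her election campaign",
    "the minister defended the government policy",
    "new smartphone ships with a faster chip",
    "the startup released an open source compiler",
    "software update fixes the laptop battery bug",
]
labels = ["politics", "politics", "politics", "tech", "tech", "tech"]

# Turn each text into TF-IDF features, then fit a linear classifier on the known tags.
tagger = make_pipeline(TfidfVectorizer(), LogisticRegression())
tagger.fit(texts, labels)

# Attach a tag to a text the model has not seen before.
predicted = tagger.predict(["the government announced a new election"])
print(predicted[0])
```

With real data you would need far more examples per tag and a held-out set to check accuracy.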

The other kind of topic modelling is unsupervised. Here there are no labels and you’d be more interested in figuring out if there are clusters in the texts that have been provided. Commonly, you don’t just want to have clusters but you’d also be interested in having some interpretation for each cluster as well.

Rasa comes with algorithms to do the former (supervised), not the latter (unsupervised). The main use-case for us is that we want to detect the “intent” of a message that comes to our assistant.

For unsupervised topic modelling there are a lot of techniques. It’s kind of its own field. I’ve done a bit of work in this field and my favourite “trick” is the one demonstrated in my video on Bulk Labelling. Another popular technique is latent Dirichlet allocation (LDA), and you can find an implementation of it in scikit-learn.

One thing to remember about unsupervised topic modelling (and this also holds for unsupervised learning in general): it’s incredibly hard to argue whether the topics found are appropriate. Without labels that represent ground truth, it can be very hard to quantify how well an approach works.
