Rasa Extracting unsupervised entities

sameertikoo · August 14, 2019, 6:43pm

I am trying to find ways to extract entities that I haven't trained my model on.

For example, "install python" has the intent install_software. I have more than 20 examples in NLU.md of a person asking to install software; however, my model only recognises the software names that I have explicitly mentioned.

It should also understand "install slack" and extract slack from it, which will later be saved in the slot software_name. Currently I am using CRFEntityExtractor, which, to my knowledge, is meant for supervised learning.

I guess I need an unsupervised extractor too. Any suggestions?

Tanja · August 15, 2019, 7:31am

Just to clarify: let's assume your training data looks like this:

## intent:install_software
- can you install [python](software_name)
- please install [rasa](software_name)
- install [rasa-x](software_name)
- ...

You train your bot and when you start the bot, the bot recognizes python as software_name but not slack, for example. Is that correct?

Normally that should not happen. Your CRFEntityExtractor should generalize and also recognize software names that were not mentioned explicitly in the training data. How much training data do you have? If you have just a couple of examples that include a software_name, it might be hard for the CRFEntityExtractor to generalize, so try adding more examples to your training data. Lookup tables (see Training Data Format in the docs) could also help. If you have a list of software names that you want to detect, you can add them to your training data as a lookup table. Lookup tables basically add a new feature to the CRFEntityExtractor, which should help improve its performance.
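To make "a lookup table adds a new feature" concrete, here is a minimal sketch of the idea in plain Python. It is not Rasa's implementation: the helpers `build_lookup_regex` and `lookup_features` are hypothetical, and the sketch assumes whitespace tokenization. The list of names is compiled into one case-insensitive, whole-word regex, and each token gets a binary feature that is True when it falls inside a match.

```python
import re
from typing import List


def build_lookup_regex(names: List[str]) -> "re.Pattern":
    """Compile a case-insensitive regex matching any listed name as a whole word."""
    # Longest names first, so "rasa-x" is tried before "rasa".
    ordered = sorted(names, key=len, reverse=True)
    # re.escape makes characters such as '+' in "c++" match literally.
    alternatives = "|".join(re.escape(n) for n in ordered)
    # (?<!\w) / (?!\w) stop "slack" from matching inside "slackware".
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def lookup_features(tokens: List[str], pattern: "re.Pattern") -> List[bool]:
    """Return one binary feature per token: True if the token overlaps a lookup match."""
    text = " ".join(tokens)
    # Character span of every token inside the joined text.
    spans = []
    pos = 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok) + 1
    matches = [m.span() for m in pattern.finditer(text)]
    return [any(s < me and ms < e for ms, me in matches) for s, e in spans]


if __name__ == "__main__":
    pattern = build_lookup_regex(["slack", "python", "rasa-x", "c++"])
    print(lookup_features(["please", "install", "slack"], pattern))
```

Running it prints `[False, False, True]`. The CRF receives this flag as one more input, so it can still weigh the surrounding words; a name that is missing from the list is not excluded outright, it just doesn't get the extra signal.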

sameertikoo · August 15, 2019, 11:48am

You have understood my issue correctly. I have tried with more than 20 examples, as you can see in the screenshot I took from Botfront.

Here is my config.

Here is how it understands a pretrained entity.

Here is how it doesn’t understand an untrained entity.

I am also aware of lookup tables; however, I want to extract the entities generically.

Tanja · August 16, 2019, 12:08pm

It seems like you are only using "zoom", "python" and "outlook" as software name examples in the training data. Can you try using more diverse software names and retrain? Your current NER might just be overfitting to those software names. However, as I mentioned, the NER should be able to generalize. I guess you just need to give it more diverse training data.
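One way to get more diverse training data without writing every line by hand is to pair phrasing templates with a varied list of software names. The sketch below assumes the Rasa Markdown format shown earlier in the thread; the template list, the name list, and the helper `make_examples` are all hypothetical. It rotates through the templates so no single name is tied to a single phrasing, which is the kind of overfitting described above.

```python
from typing import List

# Hypothetical phrasing templates; "{name}" marks where the entity goes.
TEMPLATES = [
    "can you install {name}",
    "please install {name}",
    "install {name}",
    "i need {name} on my laptop",
]

# Hypothetical, deliberately varied software names.
SOFTWARE = ["python", "zoom", "outlook", "slack", "vs code", "docker", "gimp", "7-zip"]


def make_examples(templates: List[str], names: List[str]) -> List[str]:
    """Emit one Rasa Markdown example line per name, rotating through the templates."""
    lines = []
    for i, name in enumerate(names):
        template = templates[i % len(templates)]
        # Wrap the name in the [value](entity) annotation.
        lines.append("- " + template.format(name=f"[{name}](software_name)"))
    return lines


if __name__ == "__main__":
    print("## intent:install_software")
    for line in make_examples(TEMPLATES, SOFTWARE):
        print(line)
```

The first generated line is `- can you install [python](software_name)`. Templated examples can introduce their own repetitive patterns, so they are probably best mixed with real phrasings from users rather than used on their own.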

sameertikoo · August 19, 2019, 6:41am

@Tanja you are bang on target. Overfitting was the problem here. Thanks a lot for the help!

znat · August 23, 2019, 9:37pm

Another thing you might try is removing the "low" feature from the middle word. The low, prefix, and suffix features encourage memorization. If the surrounding words are usually the same, removing them should help the model generalize.
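The effect of that suggestion can be illustrated with a toy version of CRF-style windowed features. This is a sketch, not Rasa's code. The feature names are borrowed from the names CRFEntityExtractor uses in its `features` option ("low", "title", "upper", "prefix2", "suffix3", "digit"), but the helpers `word_features` and `sent2features` and the two window configurations are hypothetical. With word-identity features ("low", prefix, suffix) removed from the middle position, "install slack" and "install python" produce identical features for the entity token, so the model has to rely on the context word "install".

```python
from typing import Dict, List, Sequence


def word_features(word: str, position: str, names: Sequence[str]) -> Dict[str, object]:
    """Compute the requested features for one word, keyed by window position."""
    available = {
        "low": word.lower(),
        "title": word.istitle(),
        "upper": word.isupper(),
        "prefix2": word[:2],
        "suffix3": word[-3:],
        "digit": word.isdigit(),
    }
    return {f"{position}:{n}": available[n] for n in names}


def sent2features(tokens: List[str], window: Sequence[Sequence[str]]) -> List[Dict[str, object]]:
    """window holds three feature-name lists: previous, current, and next token."""
    feats = []
    for i in range(len(tokens)):
        f: Dict[str, object] = {}
        for offset, names in zip((-1, 0, 1), window):
            j = i + offset
            if 0 <= j < len(tokens):
                f.update(word_features(tokens[j], str(offset), names))
        feats.append(f)
    return feats


# Middle word keeps its identity ("low", prefix, suffix): easy to memorize names.
WITH_IDENTITY = [["low", "title", "upper"],
                 ["low", "prefix2", "suffix3", "title", "upper", "digit"],
                 ["low", "title", "upper"]]
# Middle word reduced to shape features: the context has to do the work.
CONTEXT_ONLY = [["low", "title", "upper"],
                ["title", "upper", "digit"],
                ["low", "title", "upper"]]

if __name__ == "__main__":
    a = sent2features(["install", "slack"], CONTEXT_ONLY)[1]
    b = sent2features(["install", "python"], CONTEXT_ONLY)[1]
    print(a == b)
```

This prints `True`. Under `WITH_IDENTITY` the two dictionaries differ in `0:low`, `0:prefix2`, and `0:suffix3`, which gives the CRF a way to memorize the specific names it saw in training.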
