Choosing an NLU pipeline

I have training data with the following characteristics/stats:

  • intent examples: 11263 (2 distinct intents)
    • Found intents: 'general', 'irrelevant'
    • Number of response examples: 0 (0 distinct response)
    • entity examples: 9407 (22 distinct entities)
    • found entities: '', 'company', 'amount_price_target', 'analyst', 'financial_topic', 'financial_instrument', 'period', 'person', 'price_movement', 'hashtag', 'publication', 'ticker', 'amount', 'percent', 'number', 'media_type', 'location', 'rating_agency', 'event', 'exchange', 'product', 'sector'

I want to check whether supervised_embeddings.yml will outperform pretrained_embeddings_spacy.yml for entity extraction, so I run:

    rasa test nlu --config pretrained_embeddings_spacy.yml supervised_embeddings.yml --nlu CF_model/config_en.json --runs 3 --percentages 0 25 50 70 90

Is this approach OK, or will the results be the same for both approaches as far as entity extraction is concerned? In this dataset we hardly use the intents (there are just two), and we rely on financial entities.

Welcome to the community, @igormis :tada:

ner_crf currently does not use any features from the intent classification part, so there shouldn't be any difference between the two.

tnx Tobias, however I get a memory error whenever I run the test command…

@igormis I just talked to one of our researchers and my answer was wrong :see_no_evil: Depending on the configuration of your CRF component, the features of previous components can affect the entity extraction (see NLU Training Data).
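
For example, with the CRFEntityExtractor in Rasa 1.x, the features parameter decides which word-level features the CRF sees, and the "pattern" feature comes from a RegexFeaturizer that runs earlier in the pipeline. A rough sketch of such a configuration (the feature lists below are the documented Rasa 1.x defaults, so double-check them against your version):

    pipeline:
      - name: "WhitespaceTokenizer"
      - name: "RegexFeaturizer"          # provides the "pattern" feature used by the CRF below
      - name: "CRFEntityExtractor"
        # feature lists for [previous word, current word, next word]
        features:
          - ["low", "title", "upper"]
          - ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3",
             "suffix2", "upper", "title", "digit", "pattern"]
          - ["low", "title", "upper"]

So whether the two pipelines give you the same entity extraction results depends on which of these features your CRF is configured to use.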


  • How much training data do you have?
  • Which Rasa version are you using?
  • What's the error message in detail?

The output is not very descriptive:

    2019-12-12 11:58:59 INFO rasa.nlu.model - Finished training component.
    2019-12-12 11:58:59 INFO rasa.nlu.model - Starting to train component EntitySynonymMapper
    2019-12-12 11:58:59 INFO rasa.nlu.model - Finished training component.
    2019-12-12 11:58:59 INFO rasa.nlu.model - Starting to train component CountVectorsFeaturizer
    2019-12-12 11:59:00 INFO rasa.nlu.model - Finished training component.
    2019-12-12 11:59:00 INFO rasa.nlu.model - Starting to train component CountVectorsFeaturizer
    Killed

  • Rasa version = 1.5.1

  • Training data stats:

    • intent examples: 11262 (2 distinct intents)
      • Found intents: 'irrelevant', 'general'
      • Number of response examples: 0 (0 distinct response)
      • entity examples: 9407 (22 distinct entities)
      • found entities: '', 'percent', 'financial_topic', 'financial_instrument', 'amount_price_target', 'media_type', 'product', 'period', 'event', 'sector', 'rating_agency', 'analyst', 'person', 'ticker', 'location', 'company', 'publication', 'amount', 'price_movement', 'number', 'exchange', 'hashtag'
  • Data size (./CF_model/config_en.json): 13 MB in JSON format or 2.7 MB in Markdown format (I tried with both)

  • Command: rasa test nlu -u CF_model/config_en.json --config supervised_embeddings.yml --cross-validation

  • supervised_embeddings.yml

    language: "en"
    pipeline: "supervised_embeddings"

  • Same problem when I run: rasa test nlu --config pretrained_embeddings_spacy.yml supervised_embeddings.yml --nlu CF_model/config_en.json --runs 3 --percentages 0 25 50 70 90

  • pretrained_embeddings_spacy.yml

    language: "en"
    pipeline: "pretrained_embeddings_spacy"

How much memory does your machine have? It seems the vocabulary size for the CountVectorsFeaturizer is getting too big for your machine. You can restrict the size of the vocabulary using the min_df and max_df parameters (see Components).
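
To do that you need to write out the pipeline instead of using the "supervised_embeddings" shorthand. Your log shows the process getting killed while training the second CountVectorsFeaturizer, which in the template is the character n-gram one, so that's the most likely culprit. A rough sketch (the pipeline below is the Rasa 1.x supervised_embeddings template written out; the min_df / max_df / max_features values are only illustrative and should be tuned for your data):

    language: "en"
    pipeline:
      - name: "WhitespaceTokenizer"
      - name: "RegexFeaturizer"
      - name: "CRFEntityExtractor"
      - name: "EntitySynonymMapper"
      - name: "CountVectorsFeaturizer"
        min_df: 2            # drop tokens that appear in fewer than 2 examples
        max_df: 0.9          # drop tokens that appear in more than 90% of examples
      - name: "CountVectorsFeaturizer"
        analyzer: "char_wb"
        min_ngram: 1
        max_ngram: 4
        max_features: 50000  # hard cap on the character n-gram vocabulary (illustrative value)
      - name: "EmbeddingIntentClassifier"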

It is 16 GB of RAM, and only this process is memory-intensive.