Supervised embeddings are not giving good accuracy

There are 2000 intents in total, and each intent has a minimum of 10 to 12 examples. There are no entities in the examples. I tried the following tests:

Test 1

-- pipeline: supervised_embeddings
-- augmentation 20
-- emb_dim = 50
-- batch_size = [128, 512]
-- split_symbol = "+"
-----------------------------------------------------------------------
Accuracy  = 0.568

Test 2

-- pipeline: supervised_embeddings
-- augmentation 40
-- emb_dim = 50
-- batch_size = [64, 256]
-- split_symbol = "-"
-----------------------------------------------------------------------
Accuracy  = 0.581

Test 3

-- pipeline: supervised_embeddings
-- augmentation 50
-- emb_dim = 70
-- batch_size = [256, 1024]
-- split_symbol = "+"
-----------------------------------------------------------------------
Accuracy  = 0.567

Test 4

-- pipeline: supervised_embeddings
-- augmentation 70
-- emb_dim = 70
-- batch_size = [512, 2048]
-- split_symbol = "+"
-----------------------------------------------------------------------
Accuracy  = 0.566

Why is the accuracy nearly constant across these tests? Here is a sample of the training data (nlu.md):

## intent:intent_number_1
- Where can I find a lawyer?
- find something on find lawyer?
- show me something on find lawyer?
- I need something about find lawyer?
- I'm looking for something about find lawyer?
- I'd like to see something about find lawyer?
- find something about find lawyer?
- search for something about find lawyer?
- get me something about find lawyer?
- something about find lawyer?
- I need something on find lawyer?
- I want to see something on find lawyer?
- I would like to see something on find lawyer?
- I want you to show me find lawyer?
- I want something about find lawyer?
- I would like to see find lawyer?

## intent:intent_number_2
- Where can I go for free legal help?
- find something on go free legal help?
- show me something on go free legal help?
- I need something about go free legal help?
- I'm looking for something about go free legal help?
- I'd like to see something about go free legal help?
- find something about go free legal help?
- search for something about go free legal help?
- get me something about go free legal help?
- something about go free legal help?
- I need something on go free legal help?
- I want to see something on go free legal help?
- I would like to see something on go free legal help?
- I want you to show me go free legal help?
- I want something about go free legal help?
- I would like to see go free legal help?

stories.md

## Story Number: 1
* intent_number_1
  - utter_intent_number_1

## Story Number: 2
* intent_number_2
  - utter_intent_number_2

please help

Hi @prasad01dalavi. I would guess that 10-12 examples per intent is simply not enough when you have 2000 intents. I also doubt that you really need 2000 intents; chances are that some intents have examples that are too similar, which leads the model to make lots of mistakes. For example, just looking at the sample you provided, I can already see that the examples for both intents are very similar in vocabulary and sentence structure. I would suggest diversifying them a bit more and then testing the performance again.
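To make the "very similar vocabulary" point concrete, here is a minimal sketch that measures how much of the vocabulary two intents share. It is plain Python, not part of Rasa; the helper names `vocabulary` and `jaccard` are made up for illustration, and only the first few examples of each intent from the post above are used.

```python
import re

def vocabulary(examples):
    """Return the set of lowercase word tokens used across all examples."""
    tokens = set()
    for text in examples:
        tokens.update(re.findall(r"[a-z0-9']+", text.lower()))
    return tokens

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# A few examples copied from the two intents in the original post.
intent_1 = [
    "Where can I find a lawyer?",
    "find something on find lawyer?",
    "show me something on find lawyer?",
    "I need something about find lawyer?",
]
intent_2 = [
    "Where can I go for free legal help?",
    "find something on go free legal help?",
    "show me something on go free legal help?",
    "I need something about go free legal help?",
]

overlap = jaccard(vocabulary(intent_1), vocabulary(intent_2))
print(f"vocabulary overlap: {overlap:.2f}")
```

A high overlap between two different intents suggests that most tokens are shared template words ("find something on ...") rather than words that distinguish the intents, which is roughly what a bag-of-words featurizer would see.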

Thanks Juste. I agree we may need more examples. But there are 2000 intents because they are all different questions and answers. And you are right that the model is getting confused on the section questions, as shown below:

## intent:intent_number_1972
- what is section 170 of IPC?
- find something on section 170 of IPC?
- show me something on section 170 of IPC?
- I need something about section 170 of IPC?
- I'm looking for something about section 170 of IPC?
- I'd like to see something about section 170 of IPC?
- find something about section 170 of IPC?
- search for something about section 170 of IPC?
- get me something about section 170 of IPC?
- something about section 170 of IPC?
- I need something on section 170 of IPC?
- I want to see something on section 170 of IPC?
- I would like to see something on section 170 of IPC?
- I want you to show me section 170 of IPC?
- I want something about section 170 of IPC?
- I would like to see section 170 of IPC?

## intent:intent_number_1974
- what is section 171 of IPC?
- find something on section 171 of IPC?
- show me something on section 171 of IPC?
- I need something about section 171 of IPC?
- I'm looking for something about section 171 of IPC?
- I'd like to see something about section 171 of IPC?
- find something about section 171 of IPC?
- search for something about section 171 of IPC?
- get me something about section 171 of IPC?
- something about section 171 of IPC?
- I need something on section 171 of IPC?
- I want to see something on section 171 of IPC?
- I would like to see something on section 171 of IPC?
- I want you to show me section 171 of IPC?
- I want something about section 171 of IPC?
- I would like to see section 171 of IPC?

There are many more sections like these. How do I make them different? The data is what it is; users will ask exactly like that. It works fine for other intents but fails completely for the section intents because they are so similar to each other.

Question 2: Could you please tell me which of the four test configurations listed above is best, so that I can stick with it? More augmentation and more epochs make training take a lot longer.

Question 3: spaCy does not seem to be a good choice here, since you mention in the pipeline docs that for more than 1000 examples we need supervised embeddings.

I would also like to thank you, as a team and individually; your explanations on YouTube are very clear.

Hi, may I know the status, please? It's really important for me to implement this as soon as possible.

You may have better luck treating this as a NER problem and trying to extract the section number from the text and have some conversation flow/database query action based on that.

In the examples in your initial post you could do something similar, or also just do a keyword lookup (though not actually sure how to do that in Rasa short of a custom action that just searches the input string)
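As a rough illustration of the "just search the input string" idea, the sketch below pulls a section number out of a message with a regular expression. It is plain Python and does not use any Rasa API; in a real assistant something like it might live inside a custom action. The names `SECTION_PATTERN` and `extract_section_number` are hypothetical.

```python
import re
from typing import Optional

# Matches "section" followed by a number, e.g. "section 170" or "Section 171".
SECTION_PATTERN = re.compile(r"\bsection\s+(\d+)\b", re.IGNORECASE)

def extract_section_number(message: str) -> Optional[int]:
    """Return the first section number mentioned in the message, or None."""
    match = SECTION_PATTERN.search(message)
    return int(match.group(1)) if match else None

print(extract_section_number("what is section 170 of IPC?"))
print(extract_section_number("Where can I find a lawyer?"))
```

The extracted number could then be used as the key for a database or document lookup, instead of training one intent per section.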

Thank you for the suggestion. That can easily be done using Elasticsearch as well. I expected that Rasa NLU would be able to understand this on its own.

Hi @prasad01dalavi. You should definitely go the NER route for this use case. It is almost impossible to build a good classification model when the examples for different intents are identical except for the entity that appears in the sentence. For the examples you included, you should have just one intent, for example intent_number, and write your examples as follows:

- what is section [171](number) of IPC? 

If you use the Duckling component to extract the numbers, you will not need to do any data labelling.

By going this route you will need far less data to achieve much better results for your assistant. I am almost sure that you will be able to cut your data at least in half by designing the training examples better (using entities instead of just intents).
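If the existing per-section data needs to be converted, one possible approach is sketched below: wrap each number in the `[171](number)` annotation shown above and merge all section intents into a single example list. This is plain Python written for illustration; `annotate_numbers` and `merge_intents` are hypothetical helpers, and the annotation step would only matter for a trained entity extractor, since with Duckling no labelling is needed.

```python
import re

NUMBER = re.compile(r"\b\d+\b")

def annotate_numbers(example, entity="number"):
    """Wrap every standalone number in Markdown entity syntax, e.g. [171](number)."""
    return NUMBER.sub(lambda m: f"[{m.group(0)}]({entity})", example)

def merge_intents(intents):
    """Collapse many per-section intents into one de-duplicated list of examples."""
    merged = []
    seen = set()
    for examples in intents.values():
        for example in examples:
            annotated = annotate_numbers(example)
            if annotated not in seen:
                seen.add(annotated)
                merged.append(annotated)
    return merged

# Two of the per-section intents from the thread, shortened.
per_section = {
    "intent_number_1972": ["what is section 170 of IPC?", "something about section 170 of IPC?"],
    "intent_number_1974": ["what is section 171 of IPC?", "something about section 171 of IPC?"],
}

for line in merge_intents(per_section):
    print(f"- {line}")
```

The printed lines can be placed under a single intent heading in nlu.md, replacing the separate per-section intents.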

Cool. Let me try with this entity extraction method. thank you so much for the clarification. I tried all possible ways.

Hi Juste,

I looked at entity extraction, but I am not able to get it working. I have the following config.yml:

language: en
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
  features: [
    ["low", "title", "upper"],
    ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit", "pattern"],
    ["low", "title", "upper"]
  ]

and nlu.md looks similar to this:

## intent:intent_number_2
- Where can I go for free legal help?
- find something on go free legal help?
- show me something on go free legal help?
- I need something about go free legal help?
- I'm looking for something about go free legal help?
- I'd like to see something about go free legal help?
- find something about go free legal help?
- search for something about go free legal help?
- get me something about go free legal help?
- something about go free legal help?
- I need something on go free legal help?
- I want to see something on go free legal help?
- I would like to see something on go free legal help?
- I want you to show me go free legal help?
- I want something about go free legal help?
- I would like to see go free legal help?

## intent:section_search
- What is section 279 in IPC
- find something on section 279 IPC
- show me something on section 279 IPC
- I need something about section 279 IPC
- I'm looking for something about section 279
- I'd like to see something about section 279
- find something about section 279
- search for something about section 279
- get me something about section 279
- something about section 279
- I need something on section 279
- I want to see something on section 279
- I would like to see something on section 279
- I want you to show me section 279
- I want something about section 279
- I would like to see section 279

## intent:article_search
- What is the Article 44?
- find something on Article 44?
- show me something on Article 44?
- I need something about Article 44?
- I'm looking for something about Article 44?
- I'd like to see something about Article 44?
- find something about Article 44?
- search for something about Article 44?
- get me something about Article 44?
- something about Article 44?
- I need something on Article 44?
- I want to see something on Article 44?
- I would like to see something on Article 44?
- I want you to show me Article 44?
- I want something about Article 44?
- I would like to see Article 44?

## lookup:article
- 44
- 3
- 17

## lookup:section
- 9
- 159
- 497

I wrote the examples like this: I would like to see section 279

But I get an empty response after training, even for a greeting:

{ "intent": { "name": null, "confidence": 0.0 }, "entities": [], "text": "hi" }

How do I deal with this? Is my configuration wrong, or my dataset? I have referred to the following Rasa blog:
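One likely cause of the empty intent, stated here as an assumption rather than something confirmed in the thread: the pipeline above lists an entity extractor but no intent classifier, so nothing ever fills in the `intent` field. The sketch below is a plain-Python sanity check for that, not a Rasa API. The set of classifier names is taken from Rasa 1.x component names and is not exhaustive, and `has_intent_classifier` is a hypothetical helper.

```python
# Intent classifier component names from Rasa 1.x (assumed, not exhaustive).
INTENT_CLASSIFIERS = {
    "EmbeddingIntentClassifier",
    "SklearnIntentClassifier",
    "KeywordIntentClassifier",
    "MitieIntentClassifier",
}

def has_intent_classifier(pipeline):
    """Return True if any component in the pipeline is a known intent classifier."""
    return any(component.get("name") in INTENT_CLASSIFIERS for component in pipeline)

# The pipeline from the post above, written as a list of dicts.
crf_only = [
    {"name": "SpacyNLP"},
    {"name": "SpacyTokenizer"},
    {"name": "RegexFeaturizer"},
    {"name": "CRFEntityExtractor"},
]

if not has_intent_classifier(crf_only):
    print("No intent classifier in the pipeline; the intent would stay null.")
```

If that is the cause, adding a featurizer and an intent classifier to the pipeline should give non-null intents.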

You should use the Duckling extractor to extract numbers. What is your reasoning behind using lookup tables here?


Thank you so much, It worked with the following configuration.

language: en
pipeline:
- name: "CountVectorsFeaturizer"
- name: "EmbeddingIntentClassifier"
- name: "DucklingHTTPExtractor"
  # url of the running duckling server
  url: "http://localhost:8000"
  # dimensions to extract
  dimensions: ["time", "number", "amount-of-money", "distance"]
  # allows you to configure the locale, by default the language is
  # used
  locale: "de_DE"
  # if not set the default timezone of Duckling is going to be used
  # needed to calculate dates from relative expressions like "tomorrow"
  timezone: "Europe/Berlin"

# Configuration for Rasa Core.
# Policies
policies:
- name: MemoizationPolicy
- name: KerasPolicy
- name: MappingPolicy

I am from India. What should the timezone value be in the configuration above, as given by Rasa? :innocent:


Perfect :slight_smile: Happy to see it worked. The locale and timezone you used should work for Berlin
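On the India question: assuming the `timezone` setting takes an IANA timezone name, as the "Europe/Berlin" value suggests, the corresponding name for India is "Asia/Kolkata". The sketch below uses Python's standard `zoneinfo` module (Python 3.9+, with system timezone data available) to check that the name is valid and to show the difference from Berlin. It does not talk to Duckling.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

india = ZoneInfo("Asia/Kolkata")   # IANA name for Indian Standard Time
berlin = ZoneInfo("Europe/Berlin")

# Noon in Berlin on a winter day, converted to Indian time.
moment = datetime(2020, 1, 15, 12, 0, tzinfo=berlin)
in_india = moment.astimezone(india)

print("Berlin:", moment.isoformat())
print("India: ", in_india.isoformat())
```

With the wrong timezone, relative expressions such as "tomorrow" may resolve to a different date than the user means, so it is probably worth setting `timezone: "Asia/Kolkata"` for users in India. The `locale` is a separate setting and would also need to match the language of the messages.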