Huge intent recognition problems


I have an intent `goodbye` with some German examples for saying goodbye:

- bis später
- tschüss
- bye

and here is my `config.yml` for the spaCy pipeline:

```yaml
language: "de_core_news_sm"

pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "ner_crf"
  features: [
    ["low", "title", "upper"],
    ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit", "pattern", "pos", "pos2"],
    ["low", "title", "upper"]]
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"
```

After training NLU and Core I try to communicate with the bot, but there are some major problems. If I send it total nonsense like syedxgvhkmijgedsfhdrwysa, it falls into the intent goodbye with a confidence of about 47%:

```
UserUttered(text: syedxgvhkmijgedsfhdrwysa, intent: {'name': 'goodbye', 'confidence': 0.4790237086878828}, entities: [])
```

So one method to prevent such problems is to create a fallback policy and configure it with a high NLU confidence threshold. But are there any other methods to prevent these problems? Maybe configuration directly in spaCy?
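For reference, this is roughly what I mean by a fallback policy, as a sketch of the Core policy configuration (assuming a Rasa Core version that ships `FallbackPolicy`; the threshold values are placeholders, not recommendations):

```yaml
policies:
- name: "FallbackPolicy"
  nlu_threshold: 0.6          # run the fallback action below this intent confidence
  core_threshold: 0.3
  fallback_action_name: "action_default_fallback"
```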

Another question: the string above is a very weird combination of letters, with absolutely no resemblance to my goodbye examples. Why do I get such a high confidence?
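To make the question concrete: my understanding (an illustrative sketch, not Rasa internals) is that a probabilistic classifier has to distribute all probability mass over the known intents, since there is no built-in "none of the above" class. With only a couple of intents, even an input the model knows nothing about ends up with a sizeable score:

```python
import numpy as np

def softmax(scores):
    """Turn raw decision scores into probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

# Hypothetical decision scores for ["greet", "goodbye"] on a nonsense
# input: both near zero, i.e. the model has learned almost nothing
# about this string -- yet the probabilities must still sum to 1.
probs = softmax(np.array([0.0, 0.1]))
print(probs)  # one intent still ends up with roughly 50% "confidence"
```

So a ~47% confidence on gibberish may say more about the small number of intents than about any real similarity to the goodbye examples.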

I would like to know this too, since I am having the same problem.

I’m running into the same issues and was going to make a similar post.

An example of my issue: I have an intent such as “RegisterVisitor” with an utterance like “John Doe will be visiting me at 2PM”, and when I test it with something like “Phil is visiting tomorrow at 4PM”, one of two things happens. Either it correctly identifies the intent as RegisterVisitor but with a low confidence (0.60 or lower), or it identifies the wrong intent.

What are some ways around this?


I would suggest you try replacing de_core_news_sm with de_core_news_md. This could improve your intent recognition. You might also want to add more data to your NLU. Most of the time, intent recognition or entity extraction issues come from a lack of data and stories, but I could be wrong.
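The swap is a one-line change to the `config.yml` above (assuming the md model has been downloaded first, e.g. with `python -m spacy download de_core_news_md`):

```yaml
language: "de_core_news_md"   # the md model ships word vectors; the sm model does not
```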

I don’t think that increasing the NLU confidence threshold is the right move.


Hi @lindig

following @blacknight's suggestion: imagine it like a kid you want to teach the intent “goodbye”. Three samples are simply not enough to distinguish it from other intents, especially with so few words in them.
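As a concrete sketch, the goodbye intent could be padded out in the NLU training data (Markdown training format; the added farewells are illustrative examples, not taken from your post):

```md
## intent:goodbye
- bis später
- tschüss
- bye
- auf wiedersehen
- bis bald
- mach's gut
- ciao
- wir sehen uns
```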

To give you more advice, we would need to know your other intents, to judge whether they are distinct enough.

If you need more help, feel free to ask!
