How does the n-gram based model work?

I have built a RASA NLU model with the following intents:


  • hi
  • hello
  • hey


  • how are you
  • are you okay
  • how are you doing

NLU Pipeline Configuration

- name: "WhitespaceTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
  analyzer: "word"
  stop_words: "english"
  min_ngram: 1
  max_ngram: 3
- name: "EmbeddingIntentClassifier"

My question is: when I test sentences like the ones below,

  • are you how
  • you doing are how

RASA NLU still predicts the intent as ask_how_doing with 91% confidence, and even when I give the sentence in its proper order the confidence is only 97%. I feel the tri-gram features are not given proper weight when the word order is broken, because n-grams of every length from 1 to 3 are used, and the unigrams alone still match.
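As far as I understand, CountVectorsFeaturizer is based on scikit-learn's CountVectorizer, so the behaviour can be reproduced outside of RASA. A minimal sketch (an assumption about what the featurizer does under the hood, using the training utterances from the question) shows why the scrambled sentence still scores highly: most of its active features are unigrams that also appear in the training data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# training utterances of the ask_how_doing intent (from the question)
train = ["how are you", "are you okay", "how are you doing"]

# min_ngram: 1 / max_ngram: 3 corresponds to ngram_range=(1, 3)
vec = CountVectorsFeaturizer = CountVectorizer(analyzer="word", ngram_range=(1, 3))
X = vec.fit_transform(train)

# features active for the scrambled query vs. "how are you doing"
query_feats = set(vec.inverse_transform(vec.transform(["you doing are how"]))[0])
train_feats = set(vec.inverse_transform(X[2])[0])

# the unigrams 'how', 'are', 'you', 'doing' (plus the bigram 'you doing')
# survive the scrambling, so most of the feature overlap is preserved
print(query_feats & train_feats)
```

Only the word-order-sensitive bigrams and trigrams ("doing are", "are how", "you doing are", ...) fail to match, which is why the confidence drops only from 97% to 91%.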

But if you restrict the n-gram range to 2 to 3, as in the configuration below:

  • min_ngram: 2
  • max_ngram: 3

This solves the word-order problem, but single-word utterances like "hi", "hello", and "hey" can no longer be classified, because individual words (unigrams) are no longer featurized by the model.
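The unigram gap can be demonstrated the same way. With a minimum n-gram length of 2, a one-word message produces no word bigrams or trigrams at all, so its feature vector is empty. Again a scikit-learn sketch, under the assumption that CountVectorsFeaturizer behaves like CountVectorizer:

```python
from sklearn.feature_extraction.text import CountVectorizer

# all training utterances from the two intents above
train = ["hi", "hello", "hey",
         "how are you", "are you okay", "how are you doing"]

# min_ngram: 2 / max_ngram: 3 corresponds to ngram_range=(2, 3)
vec = CountVectorizer(analyzer="word", ngram_range=(2, 3))
vec.fit(train)

# a single word yields no 2- or 3-grams, so its vector is all zeros
q = vec.transform(["hi"])
print(q.nnz)  # 0 active features -- the classifier sees an empty input
```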

So, how do I tackle this kind of problem in the RASA NLU pipeline?