Intents do not reach the NLU threshold because of the chatbot's domain specificity

Because the domain we are covering is very specific, intents can lie extremely close together. Even when a very ‘straightforward’ question is asked, from which the intent should be easy to detect (and the entities are detected correctly), the NLU confidence threshold is not reached, so the chatbot asks the user to rephrase the question.

To create stories we use ‘chatette’, a module that generates stories with a similar structure by swapping some words for their synonyms. We then sample from the possible stories generated by this module so that enough sample stories are produced for chatbot training without the processing time growing too much.

Is there an optimal way to cope with intents that cover very closely related but distinct topics (for example, by assigning more weight to the core words that represent an intent when they are detected in a story)?

This is a good question! You can use regex features (see Training Data Format in the docs) to achieve what you want to some extent.
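To give a feel for what a regex feature contributes, here is a rough sketch of the idea in plain Python. It is not Rasa's implementation: the pattern names, the patterns themselves, and the `regex_flags` helper are made up for illustration. Each pattern simply becomes a 0/1 flag that a classifier can use as an extra signal for a core word.

```python
import re
from typing import Dict, List

# Hypothetical core-word patterns; replace them with words from your own domain.
CORE_WORD_PATTERNS: Dict[str, str] = {
    "invoice": r"\binvoices?\b",
    "refund": r"\brefund(s|ed)?\b",
}


def regex_flags(message: str, patterns: Dict[str, str]) -> List[int]:
    """Return one 0/1 flag per pattern, in the order the patterns are defined."""
    return [
        1 if re.search(pattern, message, flags=re.IGNORECASE) else 0
        for pattern in patterns.values()
    ]


if __name__ == "__main__":
    print(regex_flags("Where is my refund?", CORE_WORD_PATTERNS))
```

The flags only say whether a core word occurred, so they help most when two intents share nearly all of their other vocabulary.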

Probably the better approach is to do a hyperparameter search on the parameters of your pipeline. Say, for example, you’re using the tensorflow embedding pipeline. Split it up into its components so that their parameters can be set explicitly:

language: en
pipeline:
- name: "intent_featurizer_count_vectors"
  analyzer: char_wb
  min_df: 0.006789048157425257
  max_df: 0.4343982144945721
  max_ngram: 7
- name: "intent_classifier_tensorflow_embedding"
  epochs: 34
  batch_size: 165
  embed_dim: 60
  C2: 0.0002167140015537944
  C_emb: 0.00022451701750527038
  droprate: 0.17386092794138916
  num_hidden_layers_a: 0
  hidden_layer_size_a: 170
  num_hidden_layers_b: 4
  hidden_layer_size_b: 130

Then use a library like hyperopt to optimize those parameters with respect to your loss function. It might make sense to penalize confusion between closely related intents more heavily than other errors to achieve what you want.
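As a minimal sketch of such a loss, assuming you already have the true and predicted intents for a held-out test set: the function below counts an ordinary mistake as 1 and a mix-up between two intents from the same topic group as `related_penalty`. The function name, the group mapping, and the penalty value of 3.0 are all assumptions for illustration, not anything Rasa or hyperopt provides.

```python
from typing import Dict, List


def weighted_confusion_loss(
    true_intents: List[str],
    predicted_intents: List[str],
    related_groups: Dict[str, str],
    related_penalty: float = 3.0,
) -> float:
    """Mean error in which confusing two intents of the same group costs more."""
    if len(true_intents) != len(predicted_intents):
        raise ValueError("true and predicted intents must have the same length")
    if not true_intents:
        return 0.0

    total = 0.0
    for truth, guess in zip(true_intents, predicted_intents):
        if truth == guess:
            continue  # correct prediction, no cost
        # Both intents must be in the mapping and share a group to count as related.
        same_group = (
            truth in related_groups
            and related_groups.get(truth) == related_groups.get(guess)
        )
        total += related_penalty if same_group else 1.0
    return total / len(true_intents)


if __name__ == "__main__":
    # Hypothetical intents: a1 and a2 are closely related, b1 is not.
    groups = {"a1": "a", "a2": "a", "b1": "b"}
    print(weighted_confusion_loss(["a1", "a2", "b1"], ["a1", "a1", "a1"], groups))
```

The returned number could serve as the value your objective function hands back to the optimizer after training and evaluating one pipeline configuration.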

Another possibility is to use multi-intents (see Choosing a Rasa NLU Pipeline in the docs) to construct intent labels like main_topic+subtopic_1, main_topic+subtopic_2, etc.
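The reason this can help is that the labels themselves get split into tokens, so two related intents share their main topic token instead of being treated as completely unrelated classes. A small plain-Python illustration of that idea (the helper names are hypothetical, and this is not how Rasa does it internally):

```python
from typing import List


def split_multi_intent(label: str, split_symbol: str = "+") -> List[str]:
    """Split a multi-intent label such as 'main_topic+subtopic_1' into its parts."""
    return label.split(split_symbol)


def shared_tokens(label_a: str, label_b: str, split_symbol: str = "+") -> List[str]:
    """Return the parts of label_a that also occur in label_b, in label_a's order."""
    tokens_b = set(split_multi_intent(label_b, split_symbol))
    return [t for t in split_multi_intent(label_a, split_symbol) if t in tokens_b]


if __name__ == "__main__":
    print(split_multi_intent("main_topic+subtopic_1"))
    print(shared_tokens("main_topic+subtopic_1", "main_topic+subtopic_2"))
```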

Many thanks for the answer! :smile: These are definitely some interesting suggestions to try out :wink: I’ll keep you posted about the most efficient technique to solve the issue I am facing!