DIETClassifier splits one entity into a list of tokens

I am migrating from Rasa 1.5.3 to 1.10.14.

In the previous version, entity extraction worked fine, but now a slot value that contains a special character such as / or - is split into a list of tokens. For example, I want to extract IT / Engineering as a department entity, but it is extracted as the list ["it", "engineering"].

This is my NLU data

intent: inform

Department slot in the domain looks like:

    type: unfeaturized

And this is the pipeline I am using:

# Configuration for Rasa NLU.
language: en
pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  batch_strategy: sequence
  epochs: 90
  ranking_length: 5
- name: EntitySynonymMapper
- name: "ResponseSelector"
  retrieval_intent: zigzag
  scale_loss: false

I think the DIETClassifier component causes this, because when I remove it everything works fine. Which hyperparameter should I change to fix this problem, please?

Hi @nadachaabani1 ,

Please see the explanation of this behavior, and possible solutions, in this forum post.

Thank you so much @Arjaan. I'm now using SpacyTokenizer and everything works fine.
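
For anyone landing here later, this is roughly what the changed pipeline could look like. It is only a sketch, assuming Rasa 1.10.x with spaCy installed. The model name `en_core_web_md` is an assumption and was not stated in this thread; the other components are copied unchanged from the original config. SpacyTokenizer needs the SpacyNLP component earlier in the pipeline. As far as I understand, spaCy tends to keep punctuation such as / as a separate token instead of dropping it, which is probably why the entity is no longer split.

```yaml
# Configuration for Rasa NLU (sketch, assumes Rasa 1.10.x).
language: en
pipeline:
# SpacyNLP loads the spaCy model; it must come before SpacyTokenizer.
# The model name below is an assumption, use whichever model you have installed.
- name: SpacyNLP
  model: en_core_web_md
# Replaces WhitespaceTokenizer from the original pipeline.
- name: SpacyTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  batch_strategy: sequence
  epochs: 90
  ranking_length: 5
- name: EntitySynonymMapper
- name: ResponseSelector
  retrieval_intent: zigzag
  scale_loss: false
```

After changing the tokenizer the model has to be retrained (`rasa train nlu`), since the tokens and features are computed at training time.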
