Hi, I am trying to build a chatbot for train booking (just a university project; the chatbot will interact with a mock database). I have a problem with entity recognition, in particular with distinguishing the role of the entity “location” between departure and destination. We do not have data available, so we created the NLU training data from scratch. The part of the NLU data that should train location recognition looks like this (these are examples under the intent inform):
- I need a train going to [Osnabrueck]{"entity": "location", "role": "destination"}
- I need a train from [Hannover]{"entity": "location", "role": "departure"} to [Berlin]{"entity": "location", "role": "destination"} in the [morning](time) of the [15-03-2023](date).
- I want to leave from [Hanover]{"entity": "location", "role": "departure"} to [Hengelo]{"entity": "location", "role": "destination"}
- I want to leave from [Berlin]{"entity": "location", "role": "departure"}
- I need to go to [Osnabrueck]{"entity": "location", "role": "destination"}
- want to go from [Munich]{"entity": "location", "role": "departure"} to [Hanover]{"entity": "location", "role": "destination"}
- I want to travel to [Berlin]{"entity": "location", "role": "destination"}
- I want to travel to [Frankfurt]{"entity": "location", "role": "destination"} from [Munich]{"entity": "location", "role": "departure"}
- I would like to go from [Hamburg]{"entity": "location", "role": "departure"} to [Osnabrueck]{"entity": "location", "role": "destination"}
- I'd like to go to [Hengelo]{"entity": "location", "role": "destination"} from [Osnabrueck]{"entity": "location", "role": "departure"}
- I'd like to go from [Berlin]{"entity": "location", "role": "departure"} to [Hengelo]{"entity": "location", "role": "destination"}
- from [Osnabrueck]{"entity": "location", "role": "departure"}
- to [Osnabrueck]{"entity": "location", "role": "destination"}
- from [berlin]{"entity": "location", "role": "departure"}
- to [Berlin]{"entity": "location", "role": "destination"}
- from [frankfurt]{"entity": "location", "role": "departure"}
- to [frankfurt]{"entity": "location", "role": "destination"}
- from [Munich]{"entity": "location", "role": "departure"}
- to [Munich]{"entity": "location", "role": "destination"}
- from [Hanover]{"entity": "location", "role": "departure"}
- to [Hannover]{"entity": "location", "role": "destination"}
- from [Hengelo]{"entity": "location", "role": "departure"}
- to [Hengelo]{"entity": "location", "role": "destination"}
- destination city is [muenchen]{"entity": "location", "role": "destination"}
- departure city is [hannover]{"entity": "location", "role": "departure"}
- I'll depart from [berlin]{"entity": "location", "role": "departure"}
- I'll go to [frankfurt]{"entity": "location", "role": "destination"}
- Departure will be [osnabrueck]{"entity": "location", "role": "departure"}
- Destination will be [hengelo]{"entity": "location", "role": "destination"}
- We are going to [Berlin]{"entity": "location", "role": "destination"}
- The train should go from [frankfurt]{"entity": "location", "role": "departure"} to [muenchen]{"entity": "location", "role": "destination"}
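Since we wrote all of these annotations by hand, a small script could help to sanity-check them before training. The following is only a rough sketch: the helper `check_example`, the regex, and the heuristic that "to" implies destination and "from" implies departure are assumptions made for illustration and are not part of Rasa. The sample lines at the bottom are hypothetical.

```python
import json
import re

# Matches [value]{...json...}, tolerating stray whitespace before the brace.
ANNOTATION = re.compile(r"\[([^\]]+)\](\s*)(\{[^}]*\})")

# Heuristic (an assumption for this sketch): the word right before the
# entity hints at the role we would expect.
EXPECTED_ROLE = {"to": "destination", "from": "departure"}


def check_example(line):
    """Return a list of human-readable warnings for one NLU example line."""
    warnings = []
    for match in ANNOTATION.finditer(line):
        value, gap, raw = match.groups()
        role = json.loads(raw).get("role")
        if gap:
            warnings.append(f"{value}: whitespace between ']' and '{{'")
        # Look at the last word before the opening bracket.
        preceding = line[:match.start()].split()
        cue = preceding[-1].lower() if preceding else ""
        expected = EXPECTED_ROLE.get(cue)
        if expected and role != expected:
            warnings.append(f"{value}: '{cue}' suggests {expected}, got {role}")
    return warnings


if __name__ == "__main__":
    # Hypothetical sample lines in the same annotation style.
    samples = [
        '- from [Kassel]{"entity": "location", "role": "departure"}',
        '- to [Kassel]{"entity": "location", "role": "departure"}',
        '- I will go to [Bremen] {"entity": "location", "role": "destination"}',
    ]
    for sample in samples:
        for warning in check_example(sample):
            print(f"{sample!r} -> {warning}")
```

Phrases such as "destination city is ..." carry no "to"/"from" cue, so the script simply skips the role check for them; it only catches the most obvious slips.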
This is the configuration we are using:
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 2
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
    model_confidence: softmax
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 50
    constrain_similarities: true
  - name: RulePolicy
What could I do to improve the recognition of departure and destination?
Thank you in advance.