How do I handle a slot value that starts with a ".", as in .net?
UserWarning: Misaligned entity annotation in message 'Have you ever programmed in .net' with intent 'word2url'. Make sure the start and end values of entities ([(28, 32, '.net')]) in the training data match the token boundaries ([(0, 4, 'Have'), (5, 8, 'you'), (9, 13, 'ever'), (14, 24, 'programmed'), (25, 27, 'in'), (29, 32, 'net')]). Common causes:
- entities include trailing whitespaces or punctuation
- the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation
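To make the warning concrete, here is a rough Python sketch of the kind of check it describes. The tokenizer below is a simplified stand-in (split on whitespace, then strip a few punctuation characters from the edges of each chunk), not Rasa's actual WhitespaceTokenizer, and the function names `simple_tokens` and `is_aligned` are made up for illustration. It shows why the annotated span (28, 32) for ".net" does not line up with the token (29, 32) for "net".

```python
import re

# Assumption: a small set of edge punctuation, chosen only for this illustration.
EDGE_PUNCTUATION = ".,!?;:"


def simple_tokens(text):
    """Split on whitespace and strip edge punctuation, keeping character offsets."""
    tokens = []
    for match in re.finditer(r"\S+", text):
        chunk = match.group()
        stripped = chunk.strip(EDGE_PUNCTUATION)
        if not stripped:
            continue  # the chunk was punctuation only
        # Number of characters removed from the left edge of the chunk.
        left_trim = len(chunk) - len(chunk.lstrip(EDGE_PUNCTUATION))
        start = match.start() + left_trim
        tokens.append((start, start + len(stripped), stripped))
    return tokens


def is_aligned(entity, tokens):
    """An entity is aligned if it starts at a token start and ends at a token end."""
    start, end, _ = entity
    token_starts = {token[0] for token in tokens}
    token_ends = {token[1] for token in tokens}
    return start in token_starts and end in token_ends


text = "Have you ever programmed in .net"
entity = (28, 32, ".net")
tokens = simple_tokens(text)

print(tokens[-1])                  # last token, with the leading "." stripped
print(is_aligned(entity, tokens))  # whether the annotated span matches token boundaries
```

In this sketch the entity starts at offset 28, but no token starts there because the leading "." was dropped, so the span is reported as misaligned.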
Hi @joggerjoel, would you be able to share the contents of your config.yml file?
I would like to double-check the tokenizer(s) and entity extractor you’re using.
Also, where does this user warning appear? Which rasa command(s) did you use?
Finally, have you tried running rasa shell --debug to check if your entity gets extracted as expected during a conversation?
- intent: word2url
  examples: |
    - My programming languages are [C#](language)
    - I enjoy [C++](language)
    - Do you program in [PHP](language)
    - I am new to [Python](language)
    - Do you know [Laravel](language)
    - Have you ever programmed in [.net](language)
    - Can you help me with [HTML](language)
    - How do I learn about [kotlin](language)
    - Are you any good at [C](language)
    - When did you learn [golang](language)
    - Have you worked on [CSS](language)
    - Do you play [minecraft](language)
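As a quick way to spot other examples that might hit the same warning, the sketch below scans training examples for annotated entity values that start or end with a non-alphanumeric character. It is only a heuristic: the helper name `flag_edge_punctuation` is hypothetical, the regex only covers the simple `[text](entity)` annotation form, and whether a flagged value actually triggers the warning depends on the tokenizer in use. Only ".net" is reported in the warning above.

```python
import re

# Matches simple inline annotations such as "[.net](language)".
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def flag_edge_punctuation(examples):
    """Return (entity_text, entity_type) pairs whose text starts or ends with a non-alphanumeric character."""
    flagged = []
    for example in examples:
        for value, label in ANNOTATION.findall(example):
            if not value[0].isalnum() or not value[-1].isalnum():
                flagged.append((value, label))
    return flagged


examples = [
    "My programming languages are [C#](language)",
    "I enjoy [C++](language)",
    "Have you ever programmed in [.net](language)",
    "I am new to [Python](language)",
]

for value, label in flag_edge_punctuation(examples):
    print(f"check tokenization of {value!r} ({label})")
```

Any value flagged here is worth testing with rasa shell --debug to see how it is actually tokenized.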
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  # - name: RegexEntityExtractor
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: chitchat
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
  - name: "DucklingHTTPExtractor"
    # url of the running duckling server
    url: "http://localhost:8000"
    # dimensions to extract
    dimensions: ["number", "email"]
    # allows you to configure the locale, by default the language is
    # used
    # locale: "de_DE"
    # if not set the default timezone of Duckling is going to be used
    # needed to calculate dates from relative expressions like "tomorrow"
    # timezone: "Europe/Berlin"
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy
    core_fallback_threshold: 0.4
    core_fallback_action_name: "action_default_fallback"
    enable_fallback_prediction: True