Entities ending with punctuations

vishu1994 · June 7, 2020, 6:11pm

My NLU data looks like this:

- i am looking for [O +](blood_group) blood
- i am looking for [O+](blood_group) blood
- i am looking for [O-](blood_group) blood
- i am looking for [O -](blood_group) blood
- i am looking for [B +](blood_group) blood
- i am looking for [B+](blood_group) blood
- i am looking for [B -](blood_group) blood
- i am looking for [B-](blood_group) blood
- i am looking for [A+](blood_group) blood
- i am looking for [A +](blood_group) blood
- i am looking for [A-](blood_group) blood
- i am looking for [A -](blood_group) blood
- i am looking for [AB-](blood_group) blood
- i am looking for [AB -](blood_group) blood

I get this warning:

Misaligned entity annotation in message ‘i am looking for O - blood’ with intent ‘filter’. Make sure the start and end values of entities in the training data match the token boundaries (e.g. entities don’t include trailing whitespaces or punctuation).

How can I get rid of this warning? I can't change the structure of the entities.

MuraliChandran14 · June 8, 2020, 8:25am

Hi @vishu1994,

The warning is coming from WhitespaceTokenizer.

If you are really concerned about the warning, try a different tokenizer, such as SpacyTokenizer:

pipeline:
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

Or use Rasa X to edit your nlu.md so that the start and end values of the entities match the token boundaries.

vishu1994 · June 8, 2020, 10:55am

Hey, but it's not just about the warnings: those entities are not even considered during training.

MuraliChandran14 · June 8, 2020, 11:33am

@vishu1994, yes, you are right. I just did a dry run.

With WhitespaceTokenizer, the entities are not detected because of the warning you mentioned above.

Then I used SpacyTokenizer, and the entities are detected.

This might be because WhitespaceTokenizer is not meant to handle special characters.
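
Here is a rough sketch of why the annotation gets flagged, assuming the tokenizer drops trailing punctuation from tokens (as the warning text suggests). `simple_tokens` and `is_aligned` are hypothetical helpers, not Rasa functions, and the regex is a simplification rather than Rasa's actual tokenization rule.

```python
import re


def simple_tokens(text):
    """Hypothetical tokenizer: runs of word characters with their offsets.

    Punctuation such as '+' or '-' never ends up inside a token here.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def is_aligned(text, entity_start, entity_end):
    """Return True if the entity span starts and ends on token boundaries."""
    tokens = simple_tokens(text)
    starts = {start for _, start, _ in tokens}
    ends = {end for _, _, end in tokens}
    return entity_start in starts and entity_end in ends


text = "i am looking for O+ blood"
start = text.index("O+")
end = start + len("O+")

# The annotated span covers "O+", but the token is only "O",
# so the entity end does not coincide with any token end.
print(is_aligned(text, start, end))

# A span covering only "O" lines up with the token boundaries.
print(is_aligned(text, start, start + 1))
```

Under this assumption, a tokenizer that keeps "+" and "-" as part of a token or as separate tokens would give the annotated span a boundary to align with, which would be consistent with what I saw after switching tokenizers.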

vishu1994 · June 8, 2020, 12:48pm

Thanks a lot for taking the time and sharing the necessary details.

Do you have any idea about a tokenizer for the Hindi language?

I actually have both Hindi and English data in my NLU data. Should I go for the BERT language model support and try a multilingual model that can handle many languages?

MuraliChandran14 · June 8, 2020, 2:36pm

Sorry, I don't have any idea. Maybe this Stack Overflow post will be of help to you.
