Write intent

Alonso · December 13, 2021, 7:27am

Hello again,

I want the user to type a code consisting of + or -, followed by three letters and a number, for example: +COJ1, -TPL23, +ERD12

But when I train I get the following:

UserWarning: Misaligned entity annotation in message 'Code +COJ1' with intent 'code_specific'. Make sure the start and end values of entities ([(8, 13, '+COJ1')]) in the training data match the token boundaries ([(0, 7, 'Code'), (9, 13, 'COJ1')]). Common causes:
  1) entities include trailing whitespaces or punctuation
  2) the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation
  More info at https://rasa.com/docs/rasa/training-data-format#nlu-training-data

This is the intent:

- intent: code_specific
  examples: |
    - Code [new code](code_number)
    - Code [+COJ1](code_number)

The problem is that if I put Code [1](code_number) in the intent instead, the next action works correctly when I run rasa shell, but with the examples as shown above it does not.

Does anyone know why?

Thank you very much!

ChrisRahme · December 13, 2021, 9:39am

Try using Regex to parse the entities.

It’s the recommended way for parsing ordered data like yours, always +/- followed by 3 letters and a number.
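As for why the warning appears: the token boundaries it prints show COJ1 without the sign, which suggests the tokenizer dropped the leading + so the annotated span no longer starts on a token boundary. Here is a rough sketch of that check. The simple_tokens and is_aligned helpers are made up for illustration and only imitate that behaviour; they are not Rasa's tokenizer, and the offsets are computed on the string as written here.

```python
import re


def simple_tokens(text):
    """Split on whitespace, then drop leading/trailing non-word characters.

    This is a simplification for illustration, not Rasa's actual tokenizer.
    """
    tokens = []
    for chunk in re.finditer(r"\S+", text):
        core = re.search(r"\w(?:.*\w)?", chunk.group())
        if core:
            start = chunk.start() + core.start()
            tokens.append((start, start + len(core.group()), core.group()))
    return tokens


def is_aligned(text, entity):
    """True if the entity's start and end both coincide with token boundaries."""
    start = text.index(entity)
    end = start + len(entity)
    tokens = simple_tokens(text)
    return start in {t[0] for t in tokens} and end in {t[1] for t in tokens}


message = "Code +COJ1"
tokens = simple_tokens(message)
print(tokens)                         # [(0, 4, 'Code'), (6, 10, 'COJ1')]
print(is_aligned(message, "+COJ1"))   # False: the span starts on the dropped '+'
print(is_aligned(message, "COJ1"))    # True
```

That would also explain why an annotation like Code [1](code_number) trains cleanly: the span has no leading sign to lose.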

Alonso · December 13, 2021, 10:03am

I’ve read about it, but I don’t understand how to apply it.

The code doesn’t need to be split up: it’s a single group of characters, so the sign doesn’t have to be separated from the letters and the number.

nik202 · December 13, 2021, 12:15pm

@Alonso

Here is demo code with a regular expression. Please double-check the regular expression yourself, e.g. at https://regex101.com

nlu:
- regex: code_specific
  examples: |
    - [+-a-zA-Z0-9a-zA-Z]+
- intent: inform
  examples: |
    - Code [-TPL23](code_specific)
    - Code [+COJ1](code_specific)

You just need to update the config file with RegexFeaturizer and RegexEntityExtractor; see the NLU Training Data page of the Rasa docs.

I hope this will help you.

Alonso · December 13, 2021, 2:58pm

Yes, thank you both very much

ChrisRahme · December 13, 2021, 3:37pm

I'd like to propose another regex, since Nik’s matches almost anything (m_m, 4>3 and 12+A all match):

^(\+|\-)[a-zA-Z]{3}\d+$

Nik’s matches any run of characters drawn from +, letters, digits, and every ASCII character between + and a, in any order (basically, the only printable ASCII characters that break the pattern are #32 to #42, plus {, |, } and ~).

Mine matches tokens that start with + or -, followed by exactly 3 uppercase or lowercase letters, and end with at least one digit.
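To see the difference, here is a quick check with Python's re module (plain Python, outside Rasa). The sample strings are the ones from this thread; the variable names are just for illustration.

```python
import re

# Nik's pattern: the "+-a" part is read as a character range from '+' to 'a'
NIK = re.compile(r"[+-a-zA-Z0-9a-zA-Z]+")
# The stricter pattern: sign, exactly three letters, one or more digits
CHRIS = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d+$")

samples = ["+COJ1", "-TPL23", "+ERD12", "m_m", "4>3", "12+A"]


def matches(pattern, text):
    """True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


nik_results = {s: matches(NIK, s) for s in samples}
chris_results = {s: matches(CHRIS, s) for s in samples}

for s in samples:
    print(f"{s!r:10} permissive={nik_results[s]} strict={chris_results[s]}")
```

The permissive pattern accepts all six strings, while the strict one accepts only the three real codes.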

In my regex above, any number of digits at the end is accepted. If you want, for example, a minimum of 1 digit and a maximum of 4, you can do the following:

^(\+|\-)[a-zA-Z]{3}\d{1,4}$
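A small sanity check of the bounded version, again with plain Python re. The candidate strings beyond the three codes from the first post are invented here to probe the edges.

```python
import re

# Sign, exactly three letters, then between 1 and 4 digits
BOUNDED = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d{1,4}$")

candidates = [
    "+COJ1",      # valid
    "-TPL23",     # valid
    "+ERD1234",   # valid: four digits is the maximum
    "+ERD12345",  # too many digits
    "+ERD",       # no digit at all
    "COJ1",       # missing sign
    "+CO1",       # only two letters
]

accepted = [c for c in candidates if BOUNDED.match(c)]
print(accepted)  # ['+COJ1', '-TPL23', '+ERD1234']
```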

And, as in Nik's demo, you need to keep at least 2 annotated examples of the entity in your training data and add RegexFeaturizer and RegexEntityExtractor to your pipeline.
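One thing worth checking before relying on the anchored patterns: if the extractor searches the whole message text rather than individual tokens (an assumption here, please verify against your Rasa version), then ^ and $ only match when the message is the code and nothing else. The sketch below uses only Python's re module to show the effect on "Code +COJ1". The whitespace-delimited variant is a hypothetical alternative, not something tested in Rasa.

```python
import re

# Anchored pattern: the entire string must be the code
ANCHORED = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d{1,4}$")
# Hypothetical variant: the code must be delimited by whitespace or string edges
DELIMITED = re.compile(r"(?<!\S)[+-][a-zA-Z]{3}\d{1,4}(?!\S)")

message = "Code +COJ1"

anchored_hit = ANCHORED.search(message)
delimited_hit = DELIMITED.search(message)

print(anchored_hit)                                  # None
print(delimited_hit.group(), delimited_hit.span())   # +COJ1 (5, 10)
```

If the anchored pattern extracts nothing from longer messages in rasa shell, dropping the anchors or using delimiters like these may be worth a try.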

Alonso · December 13, 2021, 4:05pm

Thanks!!!
