Write intent

Hello again,

I’m trying to get the user to type a character string consisting of + or -, followed by three letters and a number, for example: +COJ1, -TPL23, +ERD12

But when I train I get the following:

UserWarning: Misaligned entity annotation in message 'Code +COJ1' with intent 'code_specific'. Make sure the start and end values of entities ([(8, 13, '+COJ1')]) in the training data match the token boundaries ([(0, 7, 'Code'), (9, 13, 'COJ1')]). Common causes:
  1) entities include trailing whitespaces or punctuation
  2) the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation
  More info at https://rasa.com/docs/rasa/training-data-format#nlu-training-data

This is the intent:

- intent: code_specific
  examples: |
    - Code [new code](code_number)
    - Code [+COJ1](code_number)

The problem is that if I put Code [1](code_number) in the intent, when I run rasa shell the next action works correctly, but if I leave it like this it does not.

Does anyone know why?

Thank you very much!

Try using Regex to parse the entities.

It’s the recommended way for parsing ordered data like yours, always +/- followed by 3 letters and a number.
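For background on the warning in the question: it seems to say that the annotated entity span does not start and end on token boundaries, because the tokenizer dropped the leading sign. Here is a minimal Python sketch of that check. The tokenize function is a hypothetical stand-in (it is not Rasa's tokenizer), and the offsets are computed from the literal string 'Code +COJ1', so they need not equal the ones printed in the log.

```python
import re

def tokenize(text):
    """Hypothetical stand-in tokenizer (NOT Rasa's): keep runs of
    letters/digits and drop everything else, including a leading + or -."""
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"[A-Za-z0-9]+", text)]

def is_aligned(text, start, end):
    """An entity span is aligned when it starts at some token's start
    and ends at some token's end."""
    tokens = tokenize(text)
    starts = {s for s, _, _ in tokens}
    ends = {e for _, e, _ in tokens}
    return start in starts and end in ends

text = "Code +COJ1"
entity = "+COJ1"
start = text.index(entity)
end = start + len(entity)

print(tokenize(text))
print(is_aligned(text, start, end))      # span includes the "+"
print(is_aligned(text, start + 1, end))  # span without the sign
```

With this stand-in tokenizer the span that includes the sign is not aligned, while the span without it is, which is the kind of mismatch the warning describes.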

I’ve read it but I don’t understand how to apply it :frowning:

The code doesn’t need to be split up: it’s a single group of characters, so the sign doesn’t have to be separated from the letters and the number.

@Alonso

Here is some demo code with a regular expression. Please double-check the regular expression yourself, for example at https://regex101.com

nlu:
- regex: code_specific
  examples: |
    - [+-a-zA-Z0-9a-zA-Z]+
- intent: inform
  examples: |
    - Code [-TPL23](code_specific)
    - Code [+COJ1](code_specific)

You just need to update the config file with RegexFeaturizer and RegexEntityExtractor; see the NLU Training Data page in the Rasa docs.

I hope this will help you.

Yes, thank you both very much

I just wanna propose another Regex since Nik’s will work for anything (m_m, 4>3, 12+A will all match):

^(\+|\-)[a-zA-Z]{3}\d+$

Nik’s works for anything made up of +, letters, digits, and every ASCII character between + and a, all optional and in any order (among printable ASCII characters, the only ones that break the pattern are #32 to #42 and #123 to #126).

Mine works for tokens that start with + or -, followed by 3 uppercase or lowercase letters, and end with at least one digit.
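To see the difference between the two patterns, here is a small check with Python's re module. This is only a sketch: Python's re is used as a stand-in, and how Rasa's extractor applies the pattern internally is not something this snippet verifies. The sample strings "COJ1" and "+CO1" are extra illustrations that were not in the thread.

```python
import re

# Pattern from the earlier reply; the "+-a" part is read as a character range.
BROAD = re.compile(r"[+-a-zA-Z0-9a-zA-Z]+")
# Stricter pattern proposed in this reply.
STRICT = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d+$")

def matches(pattern, text):
    """Return True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

samples = ["+COJ1", "-TPL23", "+ERD12", "m_m", "4>3", "12+A", "COJ1", "+CO1"]
for s in samples:
    print(f"{s!r:10} broad={matches(BROAD, s)} strict={matches(STRICT, s)}")

# The anchors refer to the whole string, so a code embedded in a longer
# message is not found by the strict pattern in plain Python.
print(STRICT.search("Code +COJ1"))
```

One thing worth checking in rasa shell: because of ^ and $, the strict pattern only matches when the string it is applied to is exactly the code. If the extractor applies the pattern to the full message rather than to single tokens (an assumption to verify for your Rasa version), the anchors may need to be dropped.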

In my Regex above, any number of digits at the end will be accepted. If you want, for example, a minimum of 1 digit and a maximum of 4, you can do the following:

^(\+|\-)[a-zA-Z]{3}\d{1,4}$
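A quick way to sanity-check the bounded version, again with Python's re as a stand-in; is_valid_code is a hypothetical helper name and the sample strings are made up for illustration:

```python
import re

# Sign, exactly three letters, then between one and four digits.
BOUNDED = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d{1,4}$")

def is_valid_code(text):
    """Hypothetical helper: True when the whole string is a valid code."""
    return BOUNDED.fullmatch(text) is not None

for candidate in ["+COJ1", "-TPL23", "+ERD1234", "+ERD12345", "+ERD"]:
    print(candidate, is_valid_code(candidate))
```

The first three candidates pass; "+ERD12345" fails because it has five digits, and "+ERD" fails because it has none.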

And, as Nik said, you need to keep at least 2 examples and add RegexFeaturizer and RegexEntityExtractor in your pipeline.

Thanks!!!
