I’m trying to get the user to type a code consisting of + or -, followed by three letters and a number, for example: +COJ1, -TPL23, +ERD12
But when I train, I get the following warning:
UserWarning: Misaligned entity annotation in message 'Code +COJ1' with intent 'code_specific'. Make sure the start and end values of entities ([(8, 13, '+COJ1')]) in the training data match the token boundaries ([(0, 7, 'Code'), (9, 13, 'COJ1')]). Common causes:
1) entities include trailing whitespaces or punctuation
2) the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation
More info at https://rasa.com/docs/rasa/training-data-format#nlu-training-data
The problem is that if I put Code [1](code_number) in the intent, the next action works correctly when I run rasa shell, but if I leave it as shown above, it does not.
I’ve read the linked documentation, but I don’t understand how to apply it.
The code doesn’t need to be split up: it’s a single group of characters, so the sign doesn’t need to be separated from the letters and the number.
I just wanna propose another Regex since Nik’s will work for anything (m_m, 4>3, 12+A will all match):
^(\+|\-)[a-zA-Z]{3}\d+$
Nik’s works for everything containing +, letters, numbers, and every ASCII character between + and a, all optional and in any order (basically the only things that will break the pattern are the ASCII characters #32 to #42).
Mine works for tokens starting with + or -, followed by 3 uppercase or lowercase letters, and ending with at least one digit.
In my Regex above, any number of digits at the end will be accepted. If you want, for example, a minimum of 1 digit and a maximum of 4, you can do the following:
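If you want to sanity-check the pattern outside of Rasa, here is a minimal Python sketch that runs it against the codes from the question and the three counterexamples mentioned above. The names CODE_PATTERN and is_code are made up for illustration, and this only exercises Python's re module, not the Rasa pipeline itself.

```python
import re

# Pattern from this answer: a sign, exactly three letters, one or more digits.
CODE_PATTERN = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d+$")


def is_code(token: str) -> bool:
    """Return True if the token matches the pattern (hypothetical helper)."""
    return CODE_PATTERN.match(token) is not None


if __name__ == "__main__":
    # Codes from the question, then strings that should be rejected.
    for token in ["+COJ1", "-TPL23", "+ERD12", "m_m", "4>3", "12+A", "COJ1"]:
        print(token, is_code(token))
```

The last sample, COJ1 without a sign, is there to show that the leading + or - is required.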
^(\+|\-)[a-zA-Z]{3}\d{1,4}$
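To see the effect of the {1,4} limit, here is a small Python sketch, again using only the re module and not Rasa. BOUNDED_PATTERN and is_bounded_code are hypothetical names chosen for this example.

```python
import re

# Same pattern as above, but the trailing number is limited to 1-4 digits.
BOUNDED_PATTERN = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d{1,4}$")


def is_bounded_code(token: str) -> bool:
    """Return True if the token matches the bounded pattern (hypothetical helper)."""
    return BOUNDED_PATTERN.match(token) is not None


if __name__ == "__main__":
    # One to four digits are accepted; zero or five digits are rejected.
    for token in ["+ERD1", "+ERD1234", "+ERD12345", "+ERD"]:
        print(token, is_bounded_code(token))
```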
And, as Nik said, you need to keep at least 2 examples and add RegexFeaturizer and RegexEntityExtractor in your pipeline.