I just wanna propose another Regex since Nik’s will work for anything (m_m, 4>3, 12+A will all match):
^(\+|\-)[a-zA-Z]{3}\d+$
Nik’s works for everything containing +, letters, digits, and every ASCII character between + and a, all optional and in any order (basically the only things that will break the pattern are the ASCII characters #32 to #42).
Mine works for tokens starting with + or -, followed by exactly 3 uppercase or lowercase letters, and ending with at least one digit.
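If you want to sanity-check the pattern outside of Rasa, here is a minimal sketch using Python's built-in re module. The helper name is_token and the sample strings are just made up for illustration (apart from the three strings mentioned earlier), and Rasa may apply the pattern to your text differently than a plain whole-string match does.

```python
import re

# The pattern proposed above: a sign, exactly three letters, one or more digits.
TOKEN_RE = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d+$")


def is_token(text: str) -> bool:
    """Return True if the whole string matches the proposed pattern."""
    return TOKEN_RE.match(text) is not None


# Hypothetical samples, plus the three strings mentioned earlier.
samples = ["+abc1", "-XYZ2024", "m_m", "4>3", "12+A", "+ab12", "abc1"]
for sample in samples:
    print(f"{sample!r}: {is_token(sample)}")
```

Only the first two samples should come out as matches; the rest fail because they lack the leading sign, the three letters, or the trailing digits.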
In my Regex above, any number of digits (at least one) at the end will be accepted. If you want, for example, a minimum of 1 digit and a maximum of 4, you can do the following:
^(\+|\-)[a-zA-Z]{3}\d{1,4}$
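To see the difference between the two versions, here is a small sketch in plain Python (again just the standard re module, not Rasa itself). The names UNBOUNDED, BOUNDED and matches, and the test strings, are only for illustration.

```python
import re

# First version: one or more digits at the end.
UNBOUNDED = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d+$")
# Second version: between 1 and 4 digits at the end.
BOUNDED = re.compile(r"^(\+|\-)[a-zA-Z]{3}\d{1,4}$")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string matches the given compiled pattern."""
    return pattern.match(text) is not None


# Hypothetical tokens with 1, 4 and 5 trailing digits.
for token in ["+abc1", "-abc1234", "+abc12345"]:
    print(token, matches(UNBOUNDED, token), matches(BOUNDED, token))
```

The token with five digits should be accepted by the first pattern but rejected by the second; the other two are accepted by both.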
And, as Nik said, you need to keep at least 2 examples and add RegexFeaturizer and RegexEntityExtractor in your pipeline.