Rasa not picking up special characters in an entity

Hi team,

Rasa NLU is unable to pick up special characters in an entity, such as ( alfred,rodger , europe/london ). Can you help me with this, @akelad?
Thanks

Hi @shubham1140!

( alfred,rodger , europe/london )

What special characters are you referring to? Can you point me to them?

I have the same problem with special characters, such as the "." in .net and the "+" in C++.

- Knowledge in [C++](competency), [Python](competency), [Linux](competency) and [GIT](competency)
- i am good with [c#](competency) , [.net](competency) and [react](competency)
 f"Misaligned entity annotation for '{collected_text}' "
c:\users\mega\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\extractors\crf_entity_extractor.py:533: UserWarning: Misaligned entity annotation for 'C' in sentence 'Knowledge in C++, Python, Linux and GIT, administration tools' with intent 'inform'. Make sure the start and end values of the annotated training examples end at token boundaries (e.g. don't include trailing whitespaces or punctuation).
  f"Misaligned entity annotation for '{collected_text}' "
c:\users\mega\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\extractors\crf_entity_extractor.py:533: UserWarning: Misaligned entity annotation for 'net' in sentence 'i am good with c# , .net and react' with intent 'inform'. Make sure the start and end values of the annotated training examples end at token boundaries (e.g. don't include trailing whitespaces or punctuation).
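
To make the warning above concrete, here is a minimal sketch of what "misaligned" means. It does not use Rasa at all: `tokenize`, `is_aligned` and `check` are hypothetical helpers, and the `\w+` pattern is only an assumed stand-in for a tokenizer that drops punctuation, not the regex Rasa actually ships. An annotation counts as aligned only if it starts where some token starts and ends where some token ends.

```python
import re

def tokenize(text, pattern=r"\w+"):
    """Return (token, start, end) for every regex match in the text."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]

def is_aligned(tokens, start, end):
    """True if the span starts at a token start and ends at a token end."""
    starts = {s for _, s, _ in tokens}
    ends = {e for _, _, e in tokens}
    return start in starts and end in ends

def check(text, value):
    """Locate `value` in `text` and report whether it sits on token boundaries."""
    start = text.index(value)
    return is_aligned(tokenize(text), start, start + len(value))

sentence_1 = "Knowledge in C++, Python, Linux and GIT, administration tools"
sentence_2 = "i am good with c# , .net and react"

# Assumed punctuation-dropping tokenizer: "C++" becomes the token "C",
# and ".net" becomes "net", so neither annotation lines up with a token.
for sentence, value in [(sentence_1, "C++"), (sentence_1, "Python"),
                        (sentence_2, ".net"), (sentence_2, "react")]:
    print(f"{value!r}: aligned={check(sentence, value)}")
```

With this stand-in tokenizer, "Python" and "react" are reported as aligned, while "C++" and ".net" are not, which matches the two entities named in the warnings.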

Hello @nadachaabani1, @shubham - there’s a potential solution for you in this thread: Having trouble formatting training examples that contains a '-' or other punctuation signs.

tl;dr - there’s a regex that, in typical cases, does not treat special characters inside a string as delimiters. So a case where I want the string 75-100 to be extracted as two entities, 75 and 100, would fail. A solution would be to modify the regex to suit your specific need.

The issue for me is that I need "/" and "," to be treated as part of a single entity, not as separators. For example, city/state is a single entity for me, and the problem is that Rasa NLU is not picking it up because of the "/" in it. @ganeshv, @akelad

@Tanja special characters like "," and "/" in entity values are not picked up by Rasa NLU. Another example: if I write "doesn’t", it is also not picked up, even though the model was trained on it.

As already pointed out by @ganeshv, we have a regex in place that splits words on those characters into separate tokens. So if you are using the WhitespaceTokenizer, this will happen. If you want to keep such words intact, you can first try a different tokenizer, or update the regex by writing a custom tokenizer (you can use the WhitespaceTokenizer as an example and just update the regex there).
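
As a rough illustration of the kind of splitting rule a custom tokenizer could apply, here is a standalone sketch. It is not Rasa's WhitespaceTokenizer and not its regex: `tokenize_keep_inner`, `LEADING` and `TRAILING` are hypothetical names, and the character sets are assumptions you would tune to your own data. The idea is to split on whitespace only and trim punctuation from the edges of each chunk, so that "/", ",", "+", "#" and a leading "." survive inside a token. The component class you would wrap this in differs between Rasa versions, so it is left out here.

```python
# Characters trimmed from the edges of each whitespace-separated chunk.
# These sets are assumptions for illustration; adjust them to your data.
LEADING = "(\"'"
TRAILING = ",;:!?).\"'"

def tokenize_keep_inner(text):
    """Split on whitespace and trim edge punctuation, keeping inner characters."""
    tokens = []
    for chunk in text.split():
        token = chunk.lstrip(LEADING).rstrip(TRAILING)
        if token:  # skip chunks that were punctuation only, e.g. a lone ","
            tokens.append(token)
    return tokens

print(tokenize_keep_inner("i am good with c# , .net and react"))
print(tokenize_keep_inner("Knowledge in C++, Python, Linux and GIT."))
print(tokenize_keep_inner("my timezone is europe/london"))
```

One trade-off of this rule: because a trailing "." is always trimmed, an entity that genuinely ends in a period would lose it, so the trailing set needs care.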

Hey shubham, how are you?

I am having the same issue… have you solved it? I wonder which tokenizer you are using… I am working with SpacyTokenizer, but even though the vanilla spaCy library keeps things like 05/02 together, this component seems to treat them differently.

We use SpacyTokenizer and I’m not having any problems with “-” or “.” in examples.

Yeah, spaCy solved my issue here with detecting single-character symbols.