I want to recognize patterns of two characters followed by up to three digits in user input and identify them as code entities. When I train my bot and try it out, it only recognizes the examples I’ve given under intent:course_code and fails on any new pattern. Here is my config.yml file:
I think your config file and regex part are fine.
The number of examples you gave for the intent course_code is too small; you need to give at least 10 to 15 examples for it to be effective.
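For reference, the pattern described in the question can be sanity-checked outside Rasa before it goes into the training data. This is only a sketch: it assumes "two characters" means two ASCII letters and that at least one digit is required, and the name `find_course_codes` is made up for illustration.

```python
import re

# Assumption: a course code is exactly two ASCII letters followed by
# one to three digits, standing alone as a word (e.g. "CS101", "ma7").
COURSE_CODE = re.compile(r"\b[A-Za-z]{2}\d{1,3}\b")


def find_course_codes(text):
    """Return every substring of `text` that looks like a course code."""
    return COURSE_CODE.findall(text)


if __name__ == "__main__":
    print(find_course_codes("tell me about CS101 and ma7"))
```

If the pattern behaves as expected here, the same expression can be used for the regex feature in the training data; the model still needs enough annotated examples to learn to rely on it.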
Thanks a lot for replying and giving such a detailed answer! I found that the problem was a lot simpler than expected. While I was reading this article I noticed this little detail:
make sure RegexFeaturizer is in your nlp pipeline and present before CRFEntityExtractor
I checked my config.yml file and it wasn’t, so I made the change and it is working perfectly now. I also made sure to provide a few more examples, as you mentioned.
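The ordering rule from the article can be expressed as a quick check. This is a minimal sketch and not part of Rasa: the helper `featurizer_precedes_extractor` and the example pipeline are hypothetical, and the pipeline is assumed to be a list of dicts with a `name` key, mirroring the entries of config.yml.

```python
def featurizer_precedes_extractor(pipeline,
                                  featurizer="RegexFeaturizer",
                                  extractor="CRFEntityExtractor"):
    """Return True only if both components are present and the
    featurizer comes earlier in the pipeline than the extractor."""
    names = [component["name"] for component in pipeline]
    if featurizer not in names or extractor not in names:
        return False
    return names.index(featurizer) < names.index(extractor)


# Hypothetical pipeline for illustration; a real config.yml will
# contain more components than these three.
pipeline = [
    {"name": "WhitespaceTokenizer"},
    {"name": "RegexFeaturizer"},
    {"name": "CRFEntityExtractor"},
]

if __name__ == "__main__":
    print(featurizer_precedes_extractor(pipeline))
```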
I’m actually using a lookup table which has values of professor names for a different intent. This is how this looks:
## intent:professor_email
- what's the email of mr [Johnson](professor_names)
- email of mr [Smith](professor_names)
- can you tell me mrs [Jones](professor_names) email
My lookup table is in a separate professor_names.txt file, and in my domain.yml I’ve defined a professor_names entity and slot. The entity extraction works perfectly when I provide professor names that are included in the training data, but when I enter a new one it’s not accurate at all.
Do you believe this is a problem with the number of training examples?
I was afraid that providing many examples would make the bot ignore the lookup table, as mentioned here in the docs.
(Editing this: I was wrong in my initial reply about the whole concept of lookup tables. I apologize.)
Extracting names is a very difficult task, especially as they vary widely and follow no fixed pattern. Even I am struggling with this. But do have a look at CRFEntityExtractor.
Other than that, run Rasa with the --debug flag and check whether the entities are being extracted.
If not, I guess you’ll have to add more examples from the lookup table to the training data.
As the documentation says:
For lookup tables to be effective, there must be a few examples of matches in your training data. Otherwise the model will not learn to use the lookup table match features.
Also (and I’m not sure about this), tokenization may be case sensitive, so entity extraction may be failing because of that.
To work around this, if you are using WhitespaceTokenizer, try adding "case_sensitive": False (again, not sure about this).
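To show why case could matter here, the sketch below compares a token against lookup entries with and without case normalization. It is not how Rasa implements lookup tables; the helper `matches_lookup` and the three names (taken from the training examples above) are only an illustration of the suspected effect.

```python
def matches_lookup(token, lookup, case_sensitive=True):
    """Return True if `token` is one of the lookup entries.

    With case_sensitive=False, both sides are lowercased before comparing.
    """
    if case_sensitive:
        return token in lookup
    return token.lower() in {entry.lower() for entry in lookup}


# Entries as they might appear in professor_names.txt (assumed contents).
professor_names = {"Johnson", "Smith", "Jones"}

if __name__ == "__main__":
    print(matches_lookup("johnson", professor_names))
    print(matches_lookup("johnson", professor_names, case_sensitive=False))
```

A user typing "johnson" in lowercase would miss an exact-case match against "Johnson", which is the kind of failure the case_sensitive setting is meant to rule out.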
Do let me know if you find a suitable solution somewhere.