Hi, sorry for what may be a trivial question (I’m still a RASA “newbie”).
I’m experimenting with RASA 2.8 capabilities for conversational apps in Italian. As suggested in the RASA documentation, I added the SpacyNLP component with an Italian language model. Here is my config.yml:
language: it
pipeline:
  # pip3 install rasa[spacy]
  # python3 -m spacy download it_core_news_sm
  # python3 -m spacy download it_core_news_lg
  - name: "SpacyNLP"
    # language model to load
    # italian large model: it_core_news_lg
    # italian small model: it_core_news_sm
    model: "it_core_news_sm"
    case_sensitive: false
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
policies:
  - name: MemoizationPolicy
  - name: RulePolicy
    core_fallback_threshold: 0.4
    core_fallback_action_name: "action_default_fallback"
    enable_fallback_prediction: True
  - name: TEDPolicy
    max_history: 10
    epochs: 100
    constrain_similarities: true
Now, what is not clear to me is how this component improves the RASA NLU. Reading Vincent's recent article “Non English Tools for Rasa”, I understand there are three possible helpers: tokenizer, featurizer, and entity extractor. I can broadly see the added value of the tokenizer and the entity extractor (how the featurizer helps is a bit more obscure to me). Anyway, I have three questions:
Q1. Is there any practical example or “benchmark” showing how much an external model (spaCy or others) helps the RASA NLU perform better? Is there any article that goes deeper into this topic?
Q2. About spaCy NER: I know that with the SpacyNLP component in the pipeline I can get spaCy entities under their spaCy names. E.g. spaCy detects my name and surname as a PERSON entity when I check the sentence: mi chiamo Giorgio Robino e vivo a Genova in Italia
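The spaCy check mentioned above can be reproduced outside RASA with a few lines of Python. This is only a minimal sketch: it assumes spaCy and the it_core_news_sm model are installed (see the pip/download comments in the config), and the helper name spacy_entities is mine, not part of any library. I make no claim here about which labels the model returns; the function just lists whatever it finds.

```python
def spacy_entities(text, model_name="it_core_news_sm"):
    """Return (entity text, entity label) pairs that spaCy finds in `text`.

    Assumption: spaCy and the named model are installed locally.
    """
    # Imported lazily so that defining the helper does not require spaCy.
    import spacy

    nlp = spacy.load(model_name)  # load the same model used by SpacyNLP
    doc = nlp(text)               # run the full spaCy pipeline, NER included
    return [(ent.text, ent.label_) for ent in doc.ents]


# Example call (needs spaCy and the model to be installed):
# print(spacy_entities("mi chiamo Giorgio Robino e vivo a Genova in Italia"))
```

Running the example call prints the entity spans together with the label names used by the chosen model, which makes it easy to compare them with what RASA returns for the same sentence.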
Now, if I test the above sentence in one of my RASA chatbots, I don't see the expected PERSON entity:
$ rasa shell nlu
2021-08-24 15:52:07 INFO rasa.model - Loading model models/20210823-171315.tar.gz...
2021-08-24 15:52:24 INFO rasa.nlu.components - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-it_core_news_sm'.
NLU model loaded. Type a message and press enter to parse it.
Next message:
Mi chiamo Giorgio Robino e vivo a Genova, in Italia
{
  "text": "Mi chiamo Giorgio Robino e vivo a Genova, in Italia",
  "intent": {
    "id": 5671084092719348945,
    "name": "goodbye",
    "confidence": 0.5075552463531494
  },
  "entities": [
    {
      "entity": "oxygen_saturation",
      "start": 25,
      "end": 26,
      "confidence_entity": 0.9176041483879089,
      "value": "e",
      "extractor": "DIETClassifier"
    }
  ],
Is it because I have to add a PERSON entity, and at least one intent containing that entity, to my training data? Or am I missing something in my configuration file?
Q3. A related question: can I augment the entities from spaCy (or another external entity extractor) with those defined internally in the RASA training data?
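To make it easier to see which pipeline component produced each entity, the parse result can be grouped by its extractor field. The snippet below is a small sketch using only the fields visible in the output above (the output is truncated there, so nothing else is reproduced); the helper name entities_by_extractor is my own.

```python
from collections import defaultdict

# Parse result copied from the `rasa shell nlu` output above
# (only the fields shown there; the original output is truncated).
parse_result = {
    "text": "Mi chiamo Giorgio Robino e vivo a Genova, in Italia",
    "intent": {
        "id": 5671084092719348945,
        "name": "goodbye",
        "confidence": 0.5075552463531494,
    },
    "entities": [
        {
            "entity": "oxygen_saturation",
            "start": 25,
            "end": 26,
            "confidence_entity": 0.9176041483879089,
            "value": "e",
            "extractor": "DIETClassifier",
        }
    ],
}


def entities_by_extractor(result):
    """Group (entity, value) pairs by the component named in `extractor`."""
    grouped = defaultdict(list)
    for ent in result.get("entities", []):
        grouped[ent.get("extractor", "unknown")].append((ent["entity"], ent["value"]))
    return dict(grouped)


print(entities_by_extractor(parse_result))
# {'DIETClassifier': [('oxygen_saturation', 'e')]}
```

For this message the only extractor that shows up is DIETClassifier, and the single entity it returns is the conjunction "e" at characters 25 to 26.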
For example: suppose I would like to extend the spaCy PERSON entity set with other application-defined names in RASA (maybe extending Italian names with Arabic or English names, etc.). How can I do that?
What if I define in my RASA domain an entity lookup table named PERSON? Do I get the union of the spaCy PERSON set and the RASA PERSON set?
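To make the behaviour I am asking about concrete, here is a plain-Python sketch of the "union" I have in mind. It is not how RASA merges entities (that is exactly what I am asking); the function names, the extractor labels model_sketch and lookup_sketch, the example sentence, and the name list are all hypothetical.

```python
import re


def lookup_matches(text, names, label="PERSON"):
    """Find whole-word, case-insensitive occurrences of the given names."""
    found = []
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append({
                "entity": label,
                "start": m.start(),
                "end": m.end(),
                "value": m.group(0),
                "extractor": "lookup_sketch",  # hypothetical label
            })
    return found


def overlaps(a, b):
    """True if the character spans of two entities intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def union_entities(primary, secondary):
    """Keep all primary entities, add secondary ones that do not overlap them."""
    merged = list(primary)
    for cand in secondary:
        if not any(overlaps(cand, kept) for kept in merged):
            merged.append(cand)
    return sorted(merged, key=lambda e: e["start"])


# Hypothetical example: one entity from an external model, plus app-defined names.
text = "Mi chiamo Giorgio Robino e il mio collega si chiama Ahmed"
start = text.index("Giorgio Robino")
model_entities = [{
    "entity": "PERSON",
    "start": start,
    "end": start + len("Giorgio Robino"),
    "value": "Giorgio Robino",
    "extractor": "model_sketch",  # hypothetical label
}]
app_names = ["Ahmed", "Giorgio"]

merged = union_entities(model_entities, lookup_matches(text, app_names))
for ent in merged:
    print(ent["value"], ent["extractor"])
```

Here the lookup match for "Giorgio" is dropped because it overlaps the longer "Giorgio Robino" span, while "Ahmed" is added from the application-defined list. That is the kind of combined result I would hope to get.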