I have two different intents to capture body temperature and oxygen saturation values:
- Intent: body_temperature_data (containing the body_temperature custom entity)
- Intent: oxygen_saturation_data (containing the oxygen_saturation custom entity)
Here below is a data/intents.yml excerpt:
- intent: body_temperature_data
  examples: |
    - sto bene
    - niente febbre
    - normale
    - sono senza febbre
    - non ho febbre
    - non mi sento la febbre
    - ho qualche linea
    - ho la febbre
    - credo di avere la febbre
    - mi sento un po di febbre
    - mi sento caldo
    - poca
    - molto poca
    - bassa
    - alta
    - molto alta
    - [35](body_temperature)
    - [35.5](body_temperature)
    - [35.6](body_temperature)
    - [35.7](body_temperature)
    - [35.8](body_temperature)
    - [35.9](body_temperature)
    - [35 e 9](body_temperature)
    - [35,9](body_temperature)
    - [36](body_temperature)
    - [36.0](body_temperature)
    - [36.1](body_temperature)
    - [36.2](body_temperature)
    - [36.3](body_temperature)
    - [36.4](body_temperature)
    - [36.5](body_temperature)
    - [36.6](body_temperature)
    - [36.7](body_temperature)
    - [36.8](body_temperature)
    - [36.9](body_temperature)
    - [36 e 7](body_temperature)
    - [36,9](body_temperature)
    - [37](body_temperature)
    - [37.0](body_temperature)
    - [37.1](body_temperature)
    - [37.2](body_temperature)
    - [37.3](body_temperature)
    - [37.4](body_temperature)
    - [37.5](body_temperature)
    - [37.6](body_temperature)
    - [37.7](body_temperature)
    - [37.8](body_temperature)
    - [37.9](body_temperature)
    - [37,2](body_temperature)
    - [37.5](body_temperature)
    - [37 , 6](body_temperature)
    - [37 . 6](body_temperature)
    - [38](body_temperature)
    - [38.0](body_temperature)
    - [38.1](body_temperature)
    - [38.2](body_temperature)
    - [38.3](body_temperature)
    - [38.4](body_temperature)
    - [38.5](body_temperature)
    - [38.6](body_temperature)
    - [38.7](body_temperature)
    - [38.8](body_temperature)
    - [38.9](body_temperature)
    - [38 , 1](body_temperature)
    - [38 . 2](body_temperature)
    - [39](body_temperature)
    - [39,1](body_temperature)
    - [39.1](body_temperature)
    - [39.2](body_temperature)
    - [39.3](body_temperature)
    - [39.5](body_temperature)
    - [39.6](body_temperature)
    - [39.7](body_temperature)
    - [39.8](body_temperature)
    - [39.9](body_temperature)
    - [40](body_temperature)
    - [41](body_temperature)
    - [trentacinque](body_temperature)
    - [trentasei](body_temperature)
    - [trentasei e otto](body_temperature)
    - [trentasette](body_temperature)
    - [trentasette emmezzo](body_temperature)
    - [trentasette e mezzo](body_temperature)
    - [trentasette punto otto](body_temperature)
    - [trentasette e quattro lineette](body_temperature)
    - [trentasette e 6 linee](body_temperature)
    - [trentasette virgola sei](body_temperature)
    - [trentasette punto sette](body_temperature)
    - [trentasette punto otto](body_temperature)
    - [trentotto](body_temperature)
    - [trentotto punto uno](body_temperature)
    - [trentotto e 2 linee](body_temperature)
    - [trentotto e due](body_temperature)
    - [trentotto punto tre](body_temperature)
    - [trentotto e quattro](body_temperature)
    - [trentotto virgola quattro](body_temperature)
    - [trentotto emmezzo](body_temperature)
    - [trentanove](body_temperature)
    - [trentanove e due](body_temperature)
    - [trentanove emmezzo](body_temperature)
    - [quaranta](body_temperature)
    - [quarantuno](body_temperature)
- intent: oxygen_saturation_data
  examples: |
    - [70](oxygen_saturation)
    - [71](oxygen_saturation)
    - [72](oxygen_saturation)
    - [73](oxygen_saturation)
    - [74](oxygen_saturation)
    - [75](oxygen_saturation)
    - [76](oxygen_saturation)
    - [77 e 9](oxygen_saturation)
    - [78,7](oxygen_saturation)
    - [79](oxygen_saturation)
    - [80](oxygen_saturation)
    - [80.5](oxygen_saturation)
    - [81](oxygen_saturation)
    - [81.6](oxygen_saturation)
    - [82](oxygen_saturation)
    - [82.4](oxygen_saturation)
    - [83](oxygen_saturation)
    - [83.7](oxygen_saturation)
    - [84](oxygen_saturation)
    - [84.1](oxygen_saturation)
    - [85](oxygen_saturation)
    - [85.2](oxygen_saturation)
    - [86](oxygen_saturation)
    - [86.9](oxygen_saturation)
    - [87](oxygen_saturation)
    - [87.8](oxygen_saturation)
    - [88](oxygen_saturation)
    - [88.0](oxygen_saturation)
    - [88.1](oxygen_saturation)
    - [89.0](oxygen_saturation)
    - [89](oxygen_saturation)
    - [89.7](oxygen_saturation)
    - [90](oxygen_saturation)
    - [90.0](oxygen_saturation)
    - [90.1](oxygen_saturation)
    - [90.2](oxygen_saturation)
    - [90.3](oxygen_saturation)
    - [90.4](oxygen_saturation)
    - [90.5](oxygen_saturation)
    - [90.6](oxygen_saturation)
    - [90.7](oxygen_saturation)
    - [90.8](oxygen_saturation)
    - [90.9](oxygen_saturation)
    - [91](oxygen_saturation)
    - [91.6](oxygen_saturation)
    - [92](oxygen_saturation)
    - [92.9](oxygen_saturation)
    - [93](oxygen_saturation)
    - [93.8](oxygen_saturation)
    - [94](oxygen_saturation)
    - [94.5](oxygen_saturation)
    - [95](oxygen_saturation)
    - [95.4](oxygen_saturation)
    - [96](oxygen_saturation)
    - [96.7](oxygen_saturation)
    - [97](oxygen_saturation)
    - [97.5](oxygen_saturation)
    - [98](oxygen_saturation)
    - [98.4](oxygen_saturation)
    - [99](oxygen_saturation)
    - [99 e 1](oxygen_saturation)
    - [99.0](oxygen_saturation)
    - [99.9](oxygen_saturation)
    - [100](oxygen_saturation)
    - [settanta](oxygen_saturation)
    - [settantuno](oxygen_saturation)
    - [settantadue](oxygen_saturation)
    - [settantatre](oxygen_saturation)
    - [settantaquattro](oxygen_saturation)
    - [settantacinque](oxygen_saturation)
    - [settantasei](oxygen_saturation)
    - [settantasette](oxygen_saturation)
    - [settantotto](oxygen_saturation)
    - [settantanove](oxygen_saturation)
    - [ottanta](oxygen_saturation)
    - [ottantuno](oxygen_saturation)
    - [ottantadue](oxygen_saturation)
    - [ottantatre](oxygen_saturation)
    - [ottantatre punto cinque](oxygen_saturation)
    - [ottantatre punto sei](oxygen_saturation)
    - [ottantaquattro emmezzo](oxygen_saturation)
    - [ottantaquattro e sei](oxygen_saturation)
    - [ottantaquattro punto sette](oxygen_saturation)
    - [ottantaquattro punto sei](oxygen_saturation)
    - [ottantaquattro punto nove](oxygen_saturation)
    - [ottantacinque](oxygen_saturation)
    - [ottantacinque punto cinque](oxygen_saturation)
    - [ottantacinque e quattro](oxygen_saturation)
    - [ottantasei](oxygen_saturation)
    - [ottantasette](oxygen_saturation)
    - [ottantotto](oxygen_saturation)
    - [ottantanove](oxygen_saturation)
    - [novanta](oxygen_saturation)
    - [novanta punto tre](oxygen_saturation)
    - [novanta punto otto](oxygen_saturation)
    - [novantuno](oxygen_saturation)
    - [novantuno punto cinque](oxygen_saturation)
    - [novantuno punto nove](oxygen_saturation)
    - [novantadue](oxygen_saturation)
    - [novantatre](oxygen_saturation)
    - [novantatre e sei](oxygen_saturation)
    - [novantatre punto sette](oxygen_saturation)
    - [novantatre virgola otto](oxygen_saturation)
    - [novantatre e due](oxygen_saturation)
    - [novantatre punto uno](oxygen_saturation)
    - [novantatre virgola nove](oxygen_saturation)
    - [novantaquattro](oxygen_saturation)
    - [novantaquattro virgola due](oxygen_saturation)
    - [novantaquattro virgola otto](oxygen_saturation)
    - [novantacinque](oxygen_saturation)
    - [novantacinque e cinque](oxygen_saturation)
    - [novantacinque punto cinque](oxygen_saturation)
    - [novantasei](oxygen_saturation)
    - [novantasei e uno](oxygen_saturation)
    - [novantasei e cinque](oxygen_saturation)
    - [novantasette](oxygen_saturation)
    - [novantasette e due](oxygen_saturation)
    - [novantasette e sei](oxygen_saturation)
    - [novantotto](oxygen_saturation)
    - [novantotto e cinque](oxygen_saturation)
    - [novantanove](oxygen_saturation)
    - [novantanove emmezzo](oxygen_saturation)
    - [cento](oxygen_saturation)
As the examples show, I would like to get entity values (and afterwards slots in a form) expressed as:
- numbers as digits (35.5), possibly typed by the user on a chat messaging channel
- numbers as words (trentacinque punto cinque), possibly entered via speech, since the speech recognition engine generally returns a literal transcript for numbers.
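To make the two input styles concrete, here is a minimal sketch of how an extracted entity value could be normalized to a float before filling a slot. It is only an illustration: the function name parse_measurement and the lookup tables are hypothetical, the word tables cover only the ranges that appear in my examples (30s, 40s, 70s to 100), and nothing here is part of Rasa.

```python
import re
from typing import Optional

# Hypothetical lookup tables, limited to the ranges used in the examples.
UNITS = {"uno": 1, "due": 2, "tre": 3, "quattro": 4, "cinque": 5,
         "sei": 6, "sette": 7, "otto": 8, "nove": 9}
TENS = {"trenta": 30, "quaranta": 40, "settanta": 70,
        "ottanta": 80, "novanta": 90}


def _build_whole_words() -> dict:
    """Build compounds such as 'trentasei' or 'novantotto'."""
    words = {"cento": 100}
    for tens_word, tens_val in TENS.items():
        words[tens_word] = tens_val
        for unit_word, unit_val in UNITS.items():
            # Italian drops the final vowel of the tens before 'uno'/'otto'.
            stem = tens_word[:-1] if unit_word[0] in "uo" else tens_word
            words[stem + unit_word] = tens_val + unit_val
    return words


WHOLE_WORDS = _build_whole_words()
DECIMAL_WORDS = {"zero": 0, **UNITS, "mezzo": 5}
# Words that only separate or decorate the two parts of the number.
SKIP = {"e", "punto", "virgola", "linea", "linee", "lineette"}


def _to_int(token: str, table: dict) -> Optional[int]:
    if token.isdecimal():
        return int(token)
    return table.get(token)


def parse_measurement(text: str) -> Optional[float]:
    """Return the value as a float, or None if it is not recognized."""
    t = text.lower().strip().replace("emmezzo", " e mezzo")
    # Collapse "37,6", "37 , 6" and "37 . 6" into "37.6".
    t = re.sub(r"(?<=\d)\s*[.,]\s*(?=\d)", ".", t)
    if re.fullmatch(r"\d+(?:\.\d+)?", t):
        return float(t)
    tokens = [tok for tok in t.split() if tok not in SKIP]
    if not 1 <= len(tokens) <= 2:
        return None
    whole = _to_int(tokens[0], WHOLE_WORDS)
    if whole is None:
        return None
    if len(tokens) == 1:
        return float(whole)
    tenths = _to_int(tokens[1], DECIMAL_WORDS)
    if tenths is None or not 0 <= tenths <= 9:
        return None
    return round(whole + tenths / 10, 1)


for sample in ["35.5", "37 , 6", "trentasette emmezzo", "novanta punto tre"]:
    print(sample, "->", parse_measurement(sample))
```

A helper like this would run after entity extraction (for example in a form validation action), so it does not affect how the NLU model classifies the message.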
Here is what happens when I test the Rasa NLU:
$ rasa shell nlu --quiet
NLU model loaded. Type a message and press enter to parse it.
Next message:
90.3
{
  "text": "90.3",
  "intent": {
    "id": -6401318193538980427,
    "name": "body_temperature_data",
    "confidence": 0.7580949664115906
  },
  "entities": [
    {
      "entity": "body_temperature",
      "start": 0,
      "end": 4,
      "confidence_entity": 0.7051703929901123,
      "value": "90.3",
      "extractor": "DIETClassifier"
    }
  ],
  "intent_ranking": [
    {
      "id": -6401318193538980427,
      "name": "body_temperature_data",
      "confidence": 0.7580949664115906
    },
    {
      "id": 8358940020600517004,
      "name": "oxygen_saturation_data",
      "confidence": 0.18363729119300842
    },
    {
      "id": -860430617479998517,
      "name": "mood_unhappy",
      "confidence": 0.010874141938984394
    },
Next message:
novanta punto tre
{
  "text": "novanta punto tre",
  "intent": {
    "id": 8358940020600517004,
    "name": "oxygen_saturation_data",
    "confidence": 0.9999997615814209
  },
  "entities": [
    {
      "entity": "oxygen_saturation",
      "start": 0,
      "end": 17,
      "confidence_entity": 0.9956320524215698,
      "value": "novanta punto tre",
      "extractor": "DIETClassifier"
    }
  ],
  "intent_ranking": [
    {
      "id": 8358940020600517004,
      "name": "oxygen_saturation_data",
      "confidence": 0.9999997615814209
    },
    {
      "id": -860430617479998517,
      "name": "mood_unhappy",
      "confidence": 4.465892544658345e-08
    },
So what happens is that if numbers are entered as words, Rasa correctly classifies the intent oxygen_saturation_data and the entity oxygen_saturation. So far, so good.
But if I enter numbers as digits (e.g. 90.3), both the intent and the entity are wrongly classified.
This surprises me because the example sets of the two intents (body_temperature_data and oxygen_saturation_data) are two completely separate sets of texts!
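As a sanity check of that claim for the digit examples, here is a minimal sketch that reads annotated lines in the [value](entity) format, keeps only the values written as digits, and compares the numeric range seen for each entity. The function name digit_ranges is hypothetical and the sample list is just a handful of lines copied from the excerpt above, not the full file.

```python
import re

# Matches one annotation of the form [value](entity).
ANNOTATION = re.compile(r"\[([^\]]+)\]\((\w+)\)")


def digit_ranges(lines):
    """Return {entity: (min, max)} over the values written as digits."""
    ranges = {}
    for line in lines:
        match = ANNOTATION.search(line)
        if not match:
            continue
        raw, entity = match.groups()
        # "37 , 6" and "37,6" both become "37.6".
        compact = re.sub(r"\s+", "", raw).replace(",", ".")
        # Skip spelled-out numbers and forms like "35 e 9"
        # (which would otherwise be misread as the float 35e9).
        if not re.fullmatch(r"\d+(?:\.\d+)?", compact):
            continue
        value = float(compact)
        low, high = ranges.get(entity, (value, value))
        ranges[entity] = (min(low, value), max(high, value))
    return ranges


sample = [
    "- [35](body_temperature)",
    "- [37,2](body_temperature)",
    "- [38 . 2](body_temperature)",
    "- [41](body_temperature)",
    "- [trentotto](body_temperature)",
    "- [70](oxygen_saturation)",
    "- [90.3](oxygen_saturation)",
    "- [100](oxygen_saturation)",
]

ranges = digit_ranges(sample)
temp_low, temp_high = ranges["body_temperature"]
oxy_low, oxy_high = ranges["oxygen_saturation"]
disjoint = temp_high < oxy_low or oxy_high < temp_low
print(ranges, "disjoint:", disjoint)
```

On these sample lines the two ranges do not overlap, which is what I mean by separate sets. It says nothing about how the featurizers see the tokens.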
My question is: WHY are the intent and entity wrongly classified?
BTW, I tried adding quotation marks in the examples:
- ['35.5'](oxygen_saturation)
Instead of:
- [35.5](oxygen_saturation)
but this raises the following warning at training time:
/home/giorgio/.local/lib/python3.8/site-packages/rasa/shared/utils/io.py:97: UserWarning: Misaligned entity annotation in message ''35.5'' with intent 'body_temperature_data'. Make sure the start and end values of entities ([(0, 6, "'35.5'")]) in the training data match the token boundaries ([(0, 5, "'35.5")]). Common causes:
- entities include trailing whitespaces or punctuation
- the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation
More info at Training Data Format
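To show what the warning is complaining about, here is a minimal sketch of the alignment check. The tokenizer below is a toy of my own (whitespace split, then trailing punctuation stripped) chosen only because it reproduces the token boundaries printed in the warning. It is an assumption, not Rasa's actual WhitespaceTokenizer logic, and toy_tokenize and is_aligned are hypothetical names.

```python
import string


def toy_tokenize(text):
    """Split on whitespace and strip trailing punctuation.

    Returns a list of (start, end, token) tuples. This is a stand-in
    that mimics the boundaries shown in the warning, nothing more.
    """
    tokens = []
    position = 0
    for chunk in text.split():
        start = text.index(chunk, position)
        position = start + len(chunk)
        stripped = chunk.rstrip(string.punctuation)
        if stripped:
            tokens.append((start, start + len(stripped), stripped))
    return tokens


def is_aligned(entity_start, entity_end, tokens):
    """An entity span is aligned if it starts and ends on token boundaries."""
    starts = {start for start, _, _ in tokens}
    ends = {end for _, end, _ in tokens}
    return entity_start in starts and entity_end in ends


quoted = "'35.5'"
plain = "35.5"

# The annotation covers the whole message in both cases.
print(toy_tokenize(quoted), is_aligned(0, len(quoted), toy_tokenize(quoted)))
print(toy_tokenize(plain), is_aligned(0, len(plain), toy_tokenize(plain)))
```

With the quoted example the entity ends at offset 6 while the only token ends at offset 5, so the span is misaligned. Without the quotes both end at offset 4.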
My doubt is about using numbers (e.g. floating-point numbers written as digit strings, such as 35.5) as entities (and as intent examples). Could this be the reason why Rasa NLU fails (see the rasa shell nlu output above)?
Any ideas?
$ cat config.yml
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: it
pipeline:
  # pip3 install rasa[spacy]
  # python3 -m spacy download it_core_news_sm
  # python3 -m spacy download it_core_news_lg
  # https://rasa.com/docs/rasa/components#spacynlp
  - name: "SpacyNLP"
    # language model to load
    # italian large model: it_core_news_lg
    # italian small model: it_core_news_sm
    model: "it_core_news_sm"
    # when retrieving word vectors, this will decide if the casing
    # of the word is relevant. E.g. `hello` and `Hello` will
    # retrieve the same vector, if set to `False`. For some
    # applications and models it makes sense to differentiate
    # between these two words, therefore setting this to `True`.
    case_sensitive: false
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
policies:
Thanks