Difficulty Separating Entities with a Number in Them

Hi, I’m using Rasa 1.2.2 with supervised_embeddings on English text and am having trouble extracting weights (e.g. 10 grams, ten grams, 100 grams). My model keeps splitting inputs such as “10 mg” into two entities, “integer” and “dosage”. Ideally I would get a single entity called “dosage” with the value “10 mg”. My model does this for some inputs but not others:

Input      MValue                   Entities              Entity Confidence
10 ounce   ['10 ounce']             ['dosage']            0.991
10 pound   ['pound', '10']          ['dosage', 'integer'] 0.854
100 grams  ['gram', '100']          ['dosage', 'integer'] 0.862
100 iu     ['100 iu']               ['dosage']            0.991
100 mcg    ['100 microgram']        ['dosage']            0.991
100 mg     ['milligram', '100']     ['dosage', 'integer'] 0.949
1000 iu    ['1000 iu']              ['dosage']            0.991
1000 mg    ['milligram', '1000']    ['dosage', 'integer'] 0.949
10000 iu   ['10000 iu']             ['dosage']            0.991

How should I proceed to ensure that a number followed by a weight unit is always recognised as a single entity? Do I need to provide more training data, or should I add a manual check in my code further down the line?

Are you using the integer entity at all? Where is it coming from?

Yes, both are being used in Elasticsearch. Items in Elasticsearch have a dosage field that looks like ["100 mg"], so ideally I would like to extract the dosage as an integer followed by a weight unit.

Hmm, what does your NLU data look like? Ideally, if this is what you want it to extract, it would look something like

- [100](integer) [mg](weight)

However, have you also tried looking into Duckling to extract the integers (as numbers)?
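
If you do end up going with a manual check further down the line, here is a rough sketch of what it could look like. It assumes each extracted entity is a dict with "entity", "start", "end" and "value" keys, as in the NLU parse output; the function name merge_number_and_unit and the default label names are made up for illustration. It joins an "integer" entity with a "dosage" entity that follows it directly and takes the merged value from the original text, so "100 mg" stays "100 mg" rather than becoming "100 milligram".

```python
def merge_number_and_unit(text, entities, number_label="integer", unit_label="dosage"):
    """Merge a number entity that is directly followed by a unit entity.

    Hypothetical helper: `entities` is assumed to be a list of dicts with
    "entity", "start", "end" and "value" keys.
    """
    # Work in text order so that neighbours in the list are neighbours in the text.
    ordered = sorted(entities, key=lambda e: e["start"])
    merged = []
    i = 0
    while i < len(ordered):
        current = ordered[i]
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if (
            nxt is not None
            and current["entity"] == number_label
            and nxt["entity"] == unit_label
            and current["end"] <= nxt["start"]
            # Only whitespace (or nothing) may sit between the number and the unit.
            and text[current["end"]:nxt["start"]].strip() == ""
        ):
            merged.append({
                "entity": unit_label,
                "start": current["start"],
                "end": nxt["end"],
                # Use the surface text so the value matches fields like "100 mg".
                "value": text[current["start"]:nxt["end"]],
            })
            i += 2  # Skip the unit entity, it is now part of the merged one.
        else:
            merged.append(current)
            i += 1
    return merged


# Example with made-up offsets for the message "take 100 mg daily".
message = "take 100 mg daily"
extracted = [
    {"entity": "dosage", "start": 9, "end": 11, "value": "milligram"},
    {"entity": "integer", "start": 5, "end": 8, "value": "100"},
]
print(merge_number_and_unit(message, extracted))
```

Entities that already come back as a single "dosage" (like "100 iu" in your table) pass through unchanged, so it should be safe to run this on every parse result. It only patches the symptom, though, so consistent training examples are still worth having.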