How to handle numbers that come in as text (e.g. five, six)

Hi All,

I just started my journey with Rasa and am working on a voice-to-text AI bot app. The user input sometimes comes in as:

  • I need five pizzas
  • Need two pizzas
  • Looking for two pizzas.

How can I define a slot type and a set of intents for this, and extract the number of pizzas mentioned by the customer?

I am looking for [two](order_count) pizzas.
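For context on the annotation syntax itself: in Rasa's Markdown training format, `[two](order_count)` marks the span "two" as an `order_count` entity, and what gets trained on is the plain sentence plus the character offsets of that span. The sketch below is a hypothetical helper (`parse_annotated` is not part of Rasa) that shows what such an annotation turns into; it only handles the simple `[text](entity)` form, not synonyms or other annotation variants.

```python
import re
from typing import Any, Dict, List, Tuple

# Matches the simple Markdown annotation form: [entity text](entity_name)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(example: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Hypothetical helper: strip [text](entity) markup and record entity offsets."""
    plain_parts: List[str] = []
    entities: List[Dict[str, Any]] = []
    last = 0    # position in the annotated string
    offset = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(example):
        before = example[last:match.start()]
        plain_parts.append(before)
        offset += len(before)
        text, entity = match.group(1), match.group(2)
        entities.append(
            {"start": offset, "end": offset + len(text), "value": text, "entity": entity}
        )
        plain_parts.append(text)
        offset += len(text)
        last = match.end()
    plain_parts.append(example[last:])
    return "".join(plain_parts), entities


text, entities = parse_annotated("I am looking for [two](order_count) pizzas.")
print(text)      # I am looking for two pizzas.
print(entities)  # [{'start': 17, 'end': 20, 'value': 'two', 'entity': 'order_count'}]
```

Note that the entity value is still the string "two"; turning it into the integer 2 is a separate step.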

Appreciate your help.

Thanks, Hari

You can use spaCy. Numbers written as words are captured under its CARDINAL entity type: I am looking for [two](CARDINAL) pizzas.

You can look at this visualizer to see what can be captured: spaCy entity visualizer

Thank you Srikar.

This is a great suggestion. If I go with this approach, do I need to run the spaCy model separately to find the entities? Maybe build a pipeline that runs Rasa NLU intent extraction and spaCy and then merges the results?

Is there a way to tell the Rasa NLU pipeline to use spaCy models and give me the expected slot value? Please clarify.

Yes, you can do that. First add ner_spacy to the pipeline in your config file. Then declare CARDINAL as a slot in the domain and provide a few examples in the NLU data, like this: I am looking for [two](CARDINAL) pizzas.

Hi, thank you for your reply. I have been trying your solution, but the results are not consistent.

Here is my NLU data:

- I have around [six](CARDINAL) years  of experience on java development.
- I have around [two](CARDINAL) years  of experience on java development. 
- I have around [three](CARDINAL) years  of experience on SAP development. 
- I have around [four](CARDINAL) years  of experience as business analyst. 

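In actions.py, I have this code:

    # Inside the custom action's run(self, dispatcher, tracker, domain) method
    cardinal_val = next(tracker.get_latest_entity_values("CARDINAL"), None)
    print("cardinal_Val entity value -->", cardinal_val)

    input_data = {}  # defined earlier in the real action; shown so the snippet is complete
    if tracker.get_slot("CARDINAL") is not None:
        input_data["CARDINAL"] = tracker.get_slot("CARDINAL")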

I don't get the slot value every time. Sometimes the entity comes as a plain number and other times as number text. A few times I don't get any value at all, even though I trained with the same number.

Output:
    cardinal_Val entity value --> 7
    cardinal_Val entity value --> two

Can you please suggest what could have gone wrong?
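The mixed output above (7 in one case, two in another) means the action can receive either a digit or a word for the same entity. One way to make the action robust to that, whichever extractor produced the value, is to normalize the value before using it. The sketch below is a hypothetical helper (`to_number` is not a Rasa API) that only covers English words for 0 to 99; anything else returns None so the action can ask the user again.

```python
from typing import Optional, Union

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def to_number(value: Union[int, float, str, None]) -> Optional[int]:
    """Normalize 7, 7.0, "7", "two" or "twenty-five" to an int; None if unrecognized."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, float):
        # Reject 2.5, nan and inf; accept whole floats such as 7.0
        return int(value) if value.is_integer() else None
    if isinstance(value, int):
        return value  # already numeric, e.g. produced by Duckling
    text = str(value).strip().lower()
    if text.isdecimal():
        return int(text)  # digits such as "7" or "16"
    words = text.replace("-", " ").split()
    if len(words) == 1:
        if words[0] in UNITS:
            return UNITS[words[0]]
        if words[0] in TENS:
            return TENS[words[0]]
    # Compound form such as "twenty five": a tens word followed by one..nine
    if len(words) == 2 and words[0] in TENS and 1 <= UNITS.get(words[1], 0) <= 9:
        return TENS[words[0]] + UNITS[words[1]]
    return None


for raw in [7, "7", "two", "Twenty-five", "a few"]:
    print(repr(raw), "->", to_number(raw))
```

In the action, something like `to_number(next(tracker.get_latest_entity_values("CARDINAL"), None))` would then give an int or None regardless of how the number was written.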

Does your training data include digits as well or just examples of cardinals? How many examples do you have in the training set?

Alternatively, you can try the ner_duckling_http component and see if it helps.

Here are my full sample intents:

## intent:about_exp1
- I have around [8](CARDINAL) years  of experience on web development. 
- i have [16](CARDINAL) years experience in [html](technology),[css](technology), and [javascript](technology),and [jquery](technology)
- i worked for [10](CARDINAL) plus years
- Have been working around [20](CARDINAL) plus years in technology
- in last [5](CARDINAL) years worked in [html](technology),[css](technology), and [javascript](technology),and [jquery](technology) 
- I started my career with [compindia](company)
- I worked in [IBM](company)
- I worked in [Apple](company)
- worked for [facebook](company)
- worked in [tirupati](location)  
- I have I had started as a [php developer](skill) initially 
- I worked there for [two](CARDINAL) years,so ahhh 
- worked on [web technologies](skill) like [html](technology),[css](technology), and [javascript](technology),and [jquery](technology) library and also worked on [php frameworks](technology) like [Mojavi](technology) and [Codeigniter](technology)
- have been working for [ten](CARDINAL)
- have been working for [10](CARDINAL)
- have been working for [ten](CARDINAL)
- I was working for [ten](CARDINAL) years
- worked for [fifteen](CARDINAL) years
- I have around [seven](CARDINAL) years  of experience on web development. 
- I have around [six](CARDINAL) years  of experience on java development.
- I have around [two](CARDINAL) years  of experience on java development. 
- I have around [three](CARDINAL) years  of experience on SAP development. 
- I have around [four](CARDINAL) years  of experience as business analyst. 

Here is my current pipeline:

pipeline:
- name: tokenizer_whitespace
- name: ner_crf
- name: nlp_spacy
- name: ner_spacy
- name: intent_featurizer_count_vectors
  OOV_token: oov
  token_pattern: (?u)\b\w+\b
- name: intent_classifier_tensorflow_embedding
  epochs: 50
- name: ner_duckling_http
  url: http://localhost:8000
  dimensions:
  - email
  - number
  - amount-of-money
- name: ner_synonyms
language: en

I just tried ner_duckling_http and got the same output.

Do I have a very bad pipeline, or very limited training data?

Oh, your data is perfectly fine. You have enough examples and your pipeline is great. I'm not sure why the number sometimes isn't recognized.

I would suggest using ner_duckling_http for this, as it extracts numbers written both as words and as digits and returns both as numeric values. I can see you already have ner_duckling_http in your pipeline, which is great. Are you including number as an entity in your domain?
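To see what ner_duckling_http works with, you can query the Duckling server directly. The sketch below assumes Duckling is running at http://localhost:8000 (the URL in the pipeline above), that its /parse endpoint accepts form-encoded `text` and `locale` fields, and that number matches come back shaped like `{"dim": "number", "body": "five", "value": {"value": 5, ...}}`, as described in the Duckling README. `parse_with_duckling` and `numbers_in` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def parse_with_duckling(
    text: str, url: str = "http://localhost:8000", locale: str = "en_US"
) -> List[Dict[str, Any]]:
    """POST the text to a running Duckling server and return its parsed JSON list."""
    payload = urllib.parse.urlencode({"text": text, "locale": locale}).encode("utf-8")
    # Passing data= makes urlopen send a POST with a form-encoded body.
    with urllib.request.urlopen(f"{url}/parse", data=payload, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def numbers_in(matches: List[Dict[str, Any]]) -> List[Any]:
    """Keep only Duckling 'number' matches and return their numeric values."""
    return [m["value"]["value"] for m in matches if m.get("dim") == "number"]
```

With the server running, `numbers_in(parse_with_duckling("I need five pizzas"))` should return a list containing 5, which is the same conversion ner_duckling_http performs inside the pipeline.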

Thank you, Juste, for your quick response. I appreciate your support on this. I changed the training data to use number as the entity, as shown below:

## intent:about_exp1
- I have around [8](number) years  of experience on web development. 
- i have [16](number) years experience in [html](technology),[css](technology), and [javascript](technology),and [jquery](technology)
- i worked for [10](number) plus years
- Have been working around [20](number) plus years in technology
- in last [5](number) years worked in [html](technology),[css](technology), and [javascript](technology),and [jquery](technology) 
- I started my career with [compindia](company)
- I worked in [IBM](company)
- I worked in [Apple](company)
- worked for [facebook](company)
- worked in [tirupati](location)  
- I have I had started as a [php developer](skill) initially 
- I worked there for [two](number) years,so ahhh 
- worked on [web technologies](skill) like [html](technology),[css](technology), and [javascript](technology),and [jquery](technology) library and also worked on [php frameworks](technology) like [Mojavi](technology) and [Codeigniter](technology)
- have been working for [ten](number)
- have been working for [10](number)
- have been working for [ten](number)
- I was working for [ten](number) years
- worked for [fifteen](number) years
- I have around [seven](number) years  of experience on web development. 
- I have around [six](number) years  of experience on java development.
- I have around [two](number) years  of experience on java development. 
- I have around [three](number) years  of experience on SAP development. 
- I have around [four](number) years  of experience as business analyst.

I'm still getting the same responses, as shown below:

    number entity value --> 7
    number entity value --> two

When using Duckling you don't actually have to label the numbers in your sentences. After looking more closely at your pipeline, I noticed that you are using two NER components (ner_crf and ner_spacy). Is there a specific reason for that?
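A possible explanation for the 7-versus-two results (an assumption, not confirmed in this thread): with ner_crf trained on number labels and ner_duckling_http also emitting number, the same span can come back twice, once as raw text from the CRF and once as a numeric value from Duckling, and `next(tracker.get_latest_entity_values(...))` simply takes whichever comes first. The hypothetical helper below keeps one entity per character span, preferring the extractors listed first. It assumes each entity dict has `start`, `end` and `extractor` keys, as in Rasa's parse output.

```python
from typing import Any, Dict, List, Sequence, Tuple


def prefer_extractor(
    entities: List[Dict[str, Any]], preference: Sequence[str]
) -> List[Dict[str, Any]]:
    """Keep one entity per (start, end) span, choosing the most preferred extractor.

    Extractors not listed in `preference` rank after all listed ones.
    """
    rank = {name: i for i, name in enumerate(preference)}
    worst = len(rank)
    best: Dict[Tuple[int, int], Dict[str, Any]] = {}
    for entity in entities:
        span = (entity["start"], entity["end"])
        current = best.get(span)
        if current is None or rank.get(entity.get("extractor"), worst) < rank.get(
            current.get("extractor"), worst
        ):
            best[span] = entity
    # Return the survivors in text order
    return sorted(best.values(), key=lambda e: e["start"])
```

Called with `preference=["ner_duckling_http", "ner_crf"]`, a span tagged by both would keep only Duckling's numeric value. Spans that differ by even one character count as different entities, so this is a simplification; dropping the redundant extractor from the pipeline avoids the duplication altogether.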

I ran an example on my machine, and you should get output similar to this:

{'intent': {'name': 'restaurant', 'confidence': 0.8300734758377075}, 'entities': [{'start': 12, 'end': 16, 'text': 'five', 'value': 5, 'confidence': 1.0, 'additional_info': {'value': 5, 'type': 'value'}, 'entity': 'number', 'extractor': 'ner_duckling_http'}], 'intent_ranking': [{'name': 'restaurant', 'confidence': 0.8300734758377075}, {'name': 'about_exp1', 'confidence': 0.39224401116371155}], 'text': 'Looking for five chocolate cakes'}

Take a look at the entity extracted by the ner_duckling_http extractor. I think it gives the result you are looking for: the entity is number and the text is five, but the value is 5.

Thank you, Juste, for testing this out.

As I am new to the Rasa framework, how can I get this response/result in actions.py? Can you share the action Python code?
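A minimal sketch of reading that value in actions.py, assuming rasa_sdk's `Tracker.latest_message` holds the parsed message as a dict with an `entities` list shaped like the output above. The helper `duckling_number` is hypothetical, and the extractor name is an assumption that depends on your Rasa version: the output above reports `ner_duckling_http`, while newer configs name the component `DucklingHTTPExtractor`. The `run` method is left as a comment because it needs rasa_sdk installed, and the slot name `number` is assumed to exist in your domain.

```python
from typing import Any, Dict, Optional


def duckling_number(
    latest_message: Dict[str, Any], extractor: str = "ner_duckling_http"
) -> Optional[Any]:
    """Return the numeric value of the first 'number' entity found by Duckling."""
    for entity in latest_message.get("entities", []):
        if entity.get("entity") == "number" and entity.get("extractor") == extractor:
            return entity.get("value")  # e.g. 5 for the text "five"
    return None


# Inside your custom action (a rasa_sdk Action subclass), something like:
#
#     from rasa_sdk.events import SlotSet
#
#     def run(self, dispatcher, tracker, domain):
#         count = duckling_number(tracker.latest_message)
#         if count is None:
#             dispatcher.utter_message(text="How many pizzas would you like?")
#             return []
#         dispatcher.utter_message(text=f"Got it, {count} pizzas.")
#         return [SlotSet("number", count)]  # assumes a slot named "number"
```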

As for ner_spacy, I was trying out the spaCy options suggested by another member here. I will take them out if Duckling works.

@Juste Srikar's point about spaCy seems to imply that ner_spacy exposes the regular spaCy NER types it is trained on. Is that correct? Or does it only work because of the training examples labeled CARDINAL?

This post is old, but for anyone looking for a solution: I needed to extract 13 as well as thirteen, so I did the following.

  1. Added Duckling to the pipeline:

         - name: "DucklingHTTPExtractor"
           url: "http://localhost:8000"
           dimensions: ["number", "ordinal"]

  2. Ran Duckling on Docker.

And you are done.

Similarly, you can add other Duckling dimensions to the array, such as time (which covers dates), if you want to extract those entities too.