Entity extraction/slot type differences in Lex to Rasa Migration

I am trying to migrate my chatbot from AWS Lex to Rasa, but I have been running into entity extraction problems. Lex treats entity extraction/slot filling as a variable in the utterance. What is the best way to replicate this in Rasa? I am attempting to use lookup tables, but partial queries are not working.

Here is how I would set up simple training data in AWS Lex:

Training Data:

  • Intent: I want {food}, Buy some {food}, etc.
  • Entity/slot (food): “pepperoni pizza”, “pad thai”, “ice cream”, etc.

Example Queries:

  • “I want to buy pepperoni pizza” -> food = pepperoni pizza
  • “I want to buy thai” -> food = pad thai
  • “Lets buy some ice” -> food = ice cream
  • “Lets buy some peperoni” -> failed

My (failed) attempts in Rasa

Training Data:

  • Intent: I want some [pepperoni pizza]-(food), Buy some [ice]-(food)
  • Lookup table (food): “pepperoni pizza”, “pad thai”, “ice cream”, etc.

Example Queries:

  • “I want to buy a pepperoni pizza” -> food = pepperoni pizza
  • “I want to buy thai” -> food = thai
  • “Lets buy some ice” -> food = ice
  • “Lets buy some peperoni” -> food = peperoni

Also, this is something Lex could not do, but I would like to figure out how to handle misspellings. How would I map “peperoni piza” to “pepperoni pizza”? Thanks!

Hi @jordanisaacs,

First of all, I’d like to point you to the docs on NLU training data, which show how training data and situations like yours are handled in Rasa. For your specific problem, you could go this way:

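version: "2.0"

nlu:
- intent: order_food
  examples: |
    - I want some [pepperoni pizza](food)
    - I want some [ice]{"entity": "food"}
# A synonym is keyed by the canonical value; its examples are the variants mapped to it.
- synonym: pepperoni pizza
  examples: |
    - peperoni pizza
- synonym: ice cream
  examples: |
    - icecr3am
- lookup: food
  examples: |
    - pepperoni pizza
    - pad thai
    - ice cream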

You have to keep in mind though, that your training data needs to include a certain amount of samples to make a lookup table work properly:

When using lookup tables with RegexFeaturizer, provide enough examples for the intent or entity you want to match so that the model can learn to use the generated regular expression as a feature. When using lookup tables with RegexEntityExtractor, provide at least two annotated examples of the entity so that the NLU model can register it as an entity at training time.
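As a rough sketch of what “enough examples” could look like for the partial queries in the question, the intent below carries several annotated examples and uses Rasa's inline synonym syntax (the "value" key) to map “thai” and “ice” to their full names. The example sentences are taken from the question; how many examples you actually need is an assumption you would have to check against your own data.

```yaml
version: "2.0"

nlu:
- intent: order_food
  examples: |
    # Full names, annotated with the short entity syntax
    - I want to buy [pepperoni pizza](food)
    - Buy some [pad thai](food)
    - I want some [ice cream](food)
    # Partial names, mapped to the canonical value via "value"
    - I want to buy [thai]{"entity": "food", "value": "pad thai"}
    - Lets buy some [ice]{"entity": "food", "value": "ice cream"}
```

The "value" mapping is only applied at prediction time if EntitySynonymMapper is part of the pipeline.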

If you go this way, don’t forget to add RegexFeaturizer, RegexEntityExtractor and EntitySynonymMapper to your config.yml. Also be aware that synonym mapping only happens after an entity has been extracted, so EntitySynonymMapper has to come after the entity extractors in the pipeline.
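A minimal sketch of such a config.yml could look like the following. Only RegexFeaturizer, RegexEntityExtractor and EntitySynonymMapper come from the advice above; the tokenizer, the other featurizers, DIETClassifier and the epochs value are assumptions added to make the pipeline complete, so adjust them to your project.

```yaml
language: en

pipeline:
  - name: WhitespaceTokenizer
  # Turns lookup tables and regexes into features for the trainable extractor
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  # Assumed intent classifier and trainable entity extractor
  - name: DIETClassifier
    epochs: 100
  # Extracts entities that literally match a lookup table entry
  - name: RegexEntityExtractor
    use_lookup_tables: true
  # Must come after the extractors: maps extracted values to their synonyms
  - name: EntitySynonymMapper
```

With this order, a value such as “peperoni pizza” first has to be extracted as a food entity before EntitySynonymMapper can rewrite it to “pepperoni pizza”.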

I hope that helps!

Kind regards
Julian