Inform acting quirky, might it be my training data?

jonathanpwheat · July 8, 2020, 2:52pm

Sorry this is a long post; it’s mostly code blocks so you can see what I’m doing.

TL;DR

My inform intent is only picking up on one example phrase, unless I use the exact values from my training data in my chat; then it works properly.

The Long explanation of what I’ve done so far

I’m having difficulty getting my inform to work properly, so I decided to look at some of the Rasa projects because I’m a learn-by-example kind of guy.

NLU Data

Examining the inform intent in the financial-demo NLU data, it looks like I have the proper formatting. Here’s a random snippet from the inform intent in its nlu.md:

- at [starbucks](vendor_name)
- [target](vendor_name)
- [Amazon](vendor_name)
- [Starbucks](vendor_name)
- [Target](vendor_name)
- I want to pay the [current balance](payment_amount)

My nlu.md data (that subsequently is giving me a headache) looks like this (same formatting):

## intent:inform
- my first name is [Steve](customer_first_name)
- my last name is [Franklin](customer_last_name)
- my email is [kayla@gmail.com](customer_email)
- please change my first name to [Jeffrey](customer_first_name)
- my first name should be [Jennifer](customer_first_name)
- that's wrong, my first name is not [John](customer_incorrect_first_name) it is [Jon](customer_first_name)
- my last name is [Johnson](customer_last_name) not [Jonson](customer_incorrect_last_name)
- please change my last name to [Simpson](customer_last_name)
- my last name should be [Marks](customer_last_name)
- that's wrong, my last name is not [Stevens](customer_incorrect_last_name) it is [Stephens](customer_last_name)
- can you hyphenate my last name [Franklin-Marshall](customer_last_name)
- my married name is now [Ford](customer_last_name)
- my maiden name is [Steppen](customer_last_name)
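
A quick sanity check on data like this is to tally which values each entity type is annotated with. The following is a minimal Python sketch, not part of Rasa: the regular expression and the helper name values_per_entity are assumptions made for illustration, and the lines are copied from the inform intent above.

```python
import re
from collections import defaultdict

# Matches Rasa Markdown entity annotations of the form [value](entity_name).
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

# Training lines copied from the inform intent above.
NLU_LINES = """\
- my first name is [Steve](customer_first_name)
- my last name is [Franklin](customer_last_name)
- my email is [kayla@gmail.com](customer_email)
- please change my first name to [Jeffrey](customer_first_name)
- my first name should be [Jennifer](customer_first_name)
- that's wrong, my first name is not [John](customer_incorrect_first_name) it is [Jon](customer_first_name)
- my last name is [Johnson](customer_last_name) not [Jonson](customer_incorrect_last_name)
- please change my last name to [Simpson](customer_last_name)
- my last name should be [Marks](customer_last_name)
- that's wrong, my last name is not [Stevens](customer_incorrect_last_name) it is [Stephens](customer_last_name)
- can you hyphenate my last name [Franklin-Marshall](customer_last_name)
- my married name is now [Ford](customer_last_name)
- my maiden name is [Steppen](customer_last_name)
""".splitlines()


def values_per_entity(lines):
    """Map each entity name to the set of distinct values annotated for it."""
    values = defaultdict(set)
    for line in lines:
        for value, entity in ENTITY_PATTERN.findall(line):
            values[entity].add(value)
    return dict(values)


if __name__ == "__main__":
    seen = values_per_entity(NLU_LINES)
    for entity, vals in sorted(seen.items()):
        print(f"{entity}: {len(vals)} distinct -> {sorted(vals)}")
    # Names that are tested by hand further down in this post.
    for name in ("Jon", "Franklin", "Wheat"):
        hits = sorted(e for e, vals in seen.items() if name in vals)
        print(f"{name!r} annotated as: {hits or 'not in training data'}")
```

On this data the tally gives 4 distinct first names and 8 distinct last names. It also shows that Jon is already annotated as a customer_first_name, while Wheat does not appear anywhere, which is worth keeping in mind when reading the test results below.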

Stories

What is interesting to me (and quite confusing) is that there are no story examples that reference the inform intent in the financial-demo or the helpdesk-assistant either.

I, however, have two stories that look like this:

## Profile change contact first name
* inform{"customer_first_name": "Steven"}
  - utter_tell_new_customer_name

## Profile change contact last name
* inform{"customer_last_name": "Myers"}
  - utter_tell_new_customer_name

The issue is - I can say my first name is Jon - and that works and sets the entity customer_first_name properly as seen here:

Next message:
my first name is Jon
{
  "intent": {
    "name": "inform",
    "confidence": 0.9971489310264587
  },
  "entities": [
    {
      "entity": "customer_first_name",
      "start": 17,
      "end": 20,
      "value": "Jon",
      "extractor": "DIETClassifier"
    }
  ],

BUT… if I type my last name is Wheat, it picks up on the inform intent but doesn’t set the entity, as seen here:

Next message:
my last name is Wheat
{
  "intent": {
    "name": "inform",
    "confidence": 0.9683238863945007
  },
  "entities": [],

The kicker is, if I use the exact phrase from my training data (my last name is Franklin), it works:

Next message:
my last name is Franklin
{
  "intent": {
    "name": "inform",
    "confidence": 0.9954349994659424
  },
  "entities": [
    {
      "entity": "customer_last_name",
      "start": 16,
      "end": 24,
      "value": "Franklin",
      "extractor": "DIETClassifier"
    }
  ],
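
For readers unfamiliar with the parse output: start and end are character offsets into the message, with end exclusive. A minimal sketch that checks this against the two successful parses above (the helper name mismatched_offsets is hypothetical, not a Rasa function):

```python
def mismatched_offsets(text, entities):
    """Return entities whose start/end offsets do not slice out their value."""
    return [e for e in entities if text[e["start"]:e["end"]] != e["value"]]


# Entity dicts copied from the two successful parses in this post.
first = {"entity": "customer_first_name", "start": 17, "end": 20, "value": "Jon"}
last = {"entity": "customer_last_name", "start": 16, "end": 24, "value": "Franklin"}

if __name__ == "__main__":
    print(mismatched_offsets("my first name is Jon", [first]))
    print(mismatched_offsets("my last name is Franklin", [last]))
```

Both calls return an empty list, so the extractor is pointing at the right span whenever it finds an entity at all.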

For the record, if I use the exact phrasing of any of my NLU data, it responds properly in all cases.

It is interesting that the first name phrase works with a random first name yet the last name does not. If I was doing things 100% wrong, I would expect nothing to work.

Maybe my config needs to get tweaked?

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

policies:
  - name: AugmentedMemoizationPolicy
  - name: FormPolicy
  - name: MappingPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MemoizationPolicy

Am I missing the larger point of the training data or how inform is supposed to work? Maybe I need more “identical” lines with different last names? Or maybe I’m doing it completely wrong?
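
If more lines with different last names turn out to be the way to go, they don't have to be typed by hand. Below is a minimal Python sketch, assuming the carrier phrases from the inform intent above and a made-up list of surnames (Wheat and Harvey come from the tests in this thread, the rest are hypothetical). Whether this actually fixes the extraction is not established here; it only shows how such lines could be produced in the nlu.md format.

```python
import itertools

# Carrier phrases taken from the inform examples earlier in this post.
TEMPLATES = [
    "my last name is [{name}](customer_last_name)",
    "please change my last name to [{name}](customer_last_name)",
    "my last name should be [{name}](customer_last_name)",
    "my married name is now [{name}](customer_last_name)",
]

# Illustrative surnames only; a real project would use a larger, varied list.
LAST_NAMES = ["Wheat", "Harvey", "Nguyen", "Garcia", "Okafor"]


def generate_examples(templates, names):
    """Build one nlu.md list item for every template/name combination."""
    return [
        "- " + template.format(name=name)
        for template, name in itertools.product(templates, names)
    ]


if __name__ == "__main__":
    print("## intent:inform")
    for line in generate_examples(TEMPLATES, LAST_NAMES):
        print(line)
```

The output can be pasted under the existing inform intent before retraining with rasa train.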

Thanks for reading, and I’m open to suggestions.

saurabh-m523 · July 10, 2020, 11:02am

Hi @jonathanpwheat!

I have a suggestion: please try putting a dense featurizer (like SpacyFeaturizer) in front of the DIETClassifier in the config. It could improve entity extraction.

jonathanpwheat · July 10, 2020, 1:36pm

I’ll try that right now, thank you for the suggestion. I’ll post back in a bit.

jonathanpwheat · July 10, 2020, 3:55pm

Hey @saurabh-m523, sadly, I get the same result with spaCy. This is my new pipeline:

pipeline:
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
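
Since the suggestion was to put the featurizer in front of DIETClassifier, a quick order check can rule out a misplaced component. This is a minimal Python sketch, assuming the pipeline above written out as a plain list of component names; treating every name ending in "Featurizer" as a featurizer is a heuristic for illustration, not something Rasa itself does.

```python
# Component names copied, in order, from the pipeline above.
PIPELINE = [
    "SpacyNLP",
    "SpacyTokenizer",
    "SpacyFeaturizer",
    "RegexFeaturizer",
    "LexicalSyntacticFeaturizer",
    "CountVectorsFeaturizer",
    "CountVectorsFeaturizer",
    "DIETClassifier",
    "EntitySynonymMapper",
    "ResponseSelector",
]


def featurizers_after(pipeline, classifier="DIETClassifier"):
    """List featurizer-like components placed after the given classifier."""
    position = pipeline.index(classifier)
    return [name for name in pipeline[position + 1:] if name.endswith("Featurizer")]


if __name__ == "__main__":
    late = featurizers_after(PIPELINE)
    print("misplaced featurizers:", late or "none")
```

For this pipeline the list is empty, so component order does not look like the problem.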

Testing with rasa shell nlu, I type my last name is Wheat and it spits back:

Next message:
my last name is Wheat
{
  "intent": {
    "name": "inform",
    "confidence": 0.9849584698677063
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "inform",
      "confidence": 0.9849584698677063
    },
    {

saurabh-m523 · July 11, 2020, 3:45am

Hi @jonathanpwheat!

Hmm…

Well, the CRFEntityExtractor is able to use the dense features from the featurizers for entity extraction, so I thought maybe DIET could do that too.

Anyway, one last thing to try from my side: keep the SpacyFeaturizer and put CRFEntityExtractor in the config as well (before DIET).

Please try it out and let’s see if it works.

jonathanpwheat · July 13, 2020, 5:58pm

Same thing

I decided to run rasa interactive and try to manually train it.

Here’s a snip of that run (Spoiler alert, it’s not changing its behavior)

? Your input -> my last name is Wheat
? Your NLU model classified 'my last name is Wheat' with intent 'inform' and there are no entities, is this correct?  No
? What intent is it?  0.99 inform
? Please mark the entities using [value](type) notation my last name is [Wheat](customer_last_name)
------
? The bot wants to run 'utter_tell_new_customer_name', correct?  Yes
------
      Thank you, I've changed your name to Jon Wheat

So I ran it again with a different last name and it STILL doesn’t pick it up. Clearly I’m doing something wrong.

? Your input -> my last name is Harvey
? Your NLU model classified 'my last name is Harvey' with intent 'inform' and there are no entities, is this correct?  No
? What intent is it?  0.99 inform
? Please mark the entities using [value](type) notation my last name is [Harvey](customer_last_name)

Maybe I had the wrong types associated with the slots?

  customer_first_name:
    type: unfeaturized
    auto_fill: true
  customer_last_name:
    type: unfeaturized
    auto_fill: true

Or do I need to set use_entities when defining my inform intent?

For grins, I created a new rasa project if anyone wants to take a look at what I’m doing/seeing. It is super basic.
