Model training advice: Entities are misrecognized or wrong values extracted

Hello,

We’ve been building a chatbot that will, as part of its job, assist in updating particular fields in our invoice processing pipeline. The user provides the field-value pairs that need to be updated. The custom action is designed to simply return a JSON string with the recognized entity names and values as key/value pairs. The service that relays the user message to the chatbot handles the rest, including validating the chatbot's response to ensure no fields are misclassified. It’s not acceptable for the chatbot to misclassify an entity and return a key that does not need updating.
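To make the validation step concrete, here is a minimal sketch of the kind of check the relaying service could run on the chatbot's JSON string. The function name `validate_response` and the `ALLOWED_FIELDS` set are hypothetical; the field names are simply the entity names used in the training examples at the end of this post, and the real service presumably checks more than this.

```python
import json

# Hypothetical whitelist: the entity names used in the training examples.
ALLOWED_FIELDS = {
    "freight", "packing", "invoice_tax", "invoice_number",
    "invoice_date", "po_number", "delivery_note", "payment_terms",
}


def validate_response(raw):
    """Parse the chatbot's JSON string and reject keys that are not known fields."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")
    unknown = sorted(set(payload) - ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return payload


if __name__ == "__main__":
    print(validate_response('{"freight": "10,00", "invoice_tax": "19"}'))
```

A check like this only catches keys outside the known set. It cannot tell that a value was filed under the wrong known field, which is why the entity names still need to be right on the model side.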

The model we have trained has been great at intent recognition for this case. However, the entity classification and/or extraction has been abysmal. I’m posting here to request some help to understand why it is failing, and suggestions on how I can improve the model (perhaps my training examples are incorrect, inadequate or too few, or I’m using the wrong pipeline config?). The complete and final list of training examples can be found at the end of the post.

As a first step, I added more training examples to the NLU file, increasing the example count from 15 to 35. While this improved the entity value extraction, there were still problems with correctly naming the entities, so I experimented with different notations to see if they would improve the results.

The example sentence: “freight cost is 10,00 euros”

Initially, I was using the following notation in the NLU examples:

- freight cost is [10,00](freight) euros

Then I switched to the more detailed notation, as shown in the Training Data Format documentation:

 - [freight cost is 10,00]{"entity":"freight","value":"10,00"} euros

While this drastically improved entity recognition (no misrecognized entities), the values were extracted as

{"entity":"freight", "value":"freight cost is 10,00"} 

instead of

{"entity":"freight", "value":"10,00"}

Interestingly enough, if the provided sentence was “freight cost: 10,00 euros” or “freight: EUR 10,00” instead of “freight cost is 10,00 euros”, then the extraction would correctly be {"entity":"freight", "value":"10,00"}
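For what it's worth, my understanding of the annotation format is that the bracketed text is the span the extractor learns to tag, while "value" only acts as a synonym for that exact text. The sketch below is not Rasa code; it is a small standalone parser (the regex and the `to_spans` helper are my own, hypothetical names) that shows which characters each notation actually labels.

```python
import json
import re

# Matches the extended annotation form: [span]{"entity": ..., "value": ...}
ANNOTATION = re.compile(r"\[([^\]]+)\]\{([^}]*)\}")


def to_spans(example):
    """Return the plain text and the character span each annotation labels."""
    text, spans, cursor = "", [], 0
    for match in ANNOTATION.finditer(example):
        text += example[cursor:match.start()]
        meta = json.loads("{" + match.group(2) + "}")
        start = len(text)
        text += match.group(1)
        spans.append({
            "start": start,
            "end": len(text),
            "labelled_text": match.group(1),
            "entity": meta["entity"],
            "value": meta.get("value", match.group(1)),
        })
        cursor = match.end()
    text += example[cursor:]
    return text, spans


if __name__ == "__main__":
    wide = '[freight cost is 10,00]{"entity":"freight","value":"10,00"} euros'
    narrow = 'freight cost is [10,00]{"entity":"freight","value":"10,00"} euros'
    for example in (wide, narrow):
        print(to_spans(example))
```

Both examples produce the same plain sentence, but the first labels characters 0 to 21 ("freight cost is 10,00") as the entity, while the second labels only characters 16 to 21 ("10,00"). If that reading of the format is right, the wide bracket would explain why the whole phrase came back as the extracted text.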

Next, in order to fix the value extraction issue, I changed the notation of the nlu examples to the following:

- freight cost is [10,00]{"entity":"freight","value":"10,00"} euros

This produced a kind-of-middle-ground model where the values are extracted very well, but about 20-30% of the entity names are misclassified, which was still a significant improvement compared to earlier notation of

 "freight cost is [10,00](freight) euros".

Yet it’s still not nearly enough to be actually useful for our application. So, I’ve come to ask for help. What am I doing wrong in training the model, if anything? How can I improve it so that it does not misclassify entity names, or fix the issue of the entity value being extracted as “freight cost is 10,00” instead of just “10,00”?

Any help is much appreciated, thanks.

As mentioned, here is the final version of the examples for the relevant intent. All possible entities and slots are defined in the domain. I'm using the default config.

- Freight is [5,00]{"entity":"freight","value":"5,00"} and tax rate is [19]{"entity":"invoice_tax","value":"19"}
- Payment terms are [30 Tage Netto]{"entity":"payment_terms","value":"30 Tage Netto"}
- Here are the fields you requested. Delivery note [LS1517156]{"entity":"delivery_note","value":"LS1517156"} order number [21021522]{"entity":"po_number","value":"21021522"}
- Invoice number and date is [737564]{"entity":"invoice_number","value":"737564"}, [15.05.2022]{"entity":"invoice_date","value":"15.05.2022"}
- Freight is [7,50]{"entity":"freight","value":"7,50"} packing is [0,00]{"entity":"packing","value":"0,00"}
- Freight is [15,00]{"entity":"freight","value":"15,00"} packing is [5,00]{"entity":"packing","value":"5,00"}
- payment terms: [10 Tage 2 30 Tage Netto]{"entity":"payment_terms","value":"10 Tage 2 30 Tage Netto"}
- Freight is EUR [10,00]{"entity":"freight","value":"10,00"} packing is EUR [5,00]{"entity":"packing","value":"5,00"}
- freight: [8.50]{"entity":"freight","value":"8.50"} packing: [5.00]{"entity":"packing","value":"5.00"}
- freight is [10]{"entity":"freight","value":"10"} euros and packing is [5]{"entity":"packing","value":"5"} euros
- Order number is [21022037]{"entity":"po_number","value":"21022037"}
- Order number is [21020805]{"entity":"po_number","value":"21020805"} and invoice number is [20437789929495/51]{"entity":"invoice_number","value":"20437789929495/51"}
- Tax is [0]{"entity":"invoice_tax","value":"0"}
- [19]{"entity":"invoice_tax","value":"19"} tax and [60 Tage netto]{"entity":"payment_terms","value":"60 Tage netto"}
- freight and packing are [10,00]{"entity":"freight","value":"10,00"} and [5,00]{"entity":"packing","value":"5,00"}
- packing and freight are [10,00]{"entity":"packing","value":"10,00"} and [20,00]{"entity":"freight","value":"20,00"}
- invoice number is [90387715]{"entity":"invoice_number","value":"90387715"} and delivery note is [8019937361]{"entity":"delivery_note","value":"8019937361"}
- date: [16.11.2022]{"entity":"invoice_date","value":"16.11.2022"} tax: [0]{"entity":"invoice_tax","value":"0"}
- [8 Tage 2 14 Tage Netto]{"entity":"payment_terms","value":"8 Tage 2 14 Tage Netto"}
- order number [21021626]{"entity":"po_number","value":"21021626"}
- invoice is [91182441]{"entity":"invoice_number","value":"91182441"} on [21.10.2022]{"entity":"invoice_date","value":"21.10.2022"}
- [0]{"entity":"invoice_tax","value":"0"} tax terms: [Bis zum 16.03.2023 ohne Abzug]{"entity":"payment_terms","value":"Bis zum 16.03.2023 ohne Abzug"}
- purchased on [12.12.2021]{"entity":"invoice_date","value":"12.12.2021"}
- delivery note is [22304696]{"entity":"delivery_note","value":"22304696"} payment terms are [14 Tage mit 2  Skonto 30 Tage netto]{"entity":"payment_terms","value":"14 Tage mit 2  Skonto 30 Tage netto"}
- delivery note: [22304699]{"entity":"delivery_note","value":"22304699"} freight: [0,00]{"entity":"freight","value":"0,00"}
- invoice tax is [19]{"entity":"invoice_tax","value":"19"}
- freight is EUR [8,00]{"entity":"freight","value":"8,00"}
- there's a [19]{"entity":"invoice_tax","value":"19"} tax applied Invoice is sent on [12.06.2022]{"entity":"invoice_date","value":"12.06.2022"}
- tax rate is [7]{"entity":"invoice_tax","value":"7"}
- freight cost is [20,00]{"entity":"freight","value":"20,00"} euros
- freight cost is [8]{"entity":"freight","value":"8"} euros
- there's a [5,00]{"entity":"freight","value":"5,00"} EUR shipping cost and the tax rate is [19]{"entity":"invoice_tax","value":"19"}
- invoice received on [10.08.21]{"entity":"invoice_date","value":"10.08.21"}
- invoice date: [10.08.21]{"entity":"invoice_date","value":"10.08.21"}
- delivery note is [1807033174]{"entity":"delivery_note","value":"1807033174"} invoice number is [257646966]{"entity":"invoice_number","value":"257646966"}
- invoice nr is [RE380183711]{"entity":"invoice_number","value":"RE380183711"} invoice arrived on [05.09.20]{"entity":"invoice_date","value":"05.09.20"}
- freight is [5]{"entity":"freight","value":"5"} euros and invoice tax is [7]{"entity":"invoice_tax","value":"7"}
- packing is [5]{"entity":"packing","value":"5"} euros and invoice number is [18152219]{"entity":"invoice_number","value":"18152219"}
- payment terms are [14 Tage netto]{"entity":"payment_terms","value":"14 Tage netto"} and invoice tax is [19]{"entity":"invoice_tax","value":"19"}
- [19]{"entity":"invoice_tax","value":"19"} invoice tax and [0]{"entity":"freight","value":"0"} freight cost
- delivery note is [LS484927]{"entity":"delivery_note","value":"LS484927"} invoice tax rate is [19]{"entity":"invoice_tax","value":"19"}
- invoice number is [1495023]{"entity":"invoice_number","value":"1495023"} and tax rate is [19]{"entity":"invoice_tax","value":"19"}
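
With this many hand-written annotations, a quick sanity check before training may help. The following is only a sketch: the `audit` helper is a hypothetical name, and the two sample lines are made up for illustration rather than taken from the list above. It counts how many examples each entity has and flags annotations where the bracketed text and the "value" differ, since a differing value would be treated as a synonym mapping.

```python
import json
import re
from collections import Counter

# Matches the extended annotation form: [span]{"entity": ..., "value": ...}
ANNOTATION = re.compile(r"\[([^\]]+)\]\{([^}]*)\}")


def audit(examples):
    """Count annotations per entity and collect span/value mismatches."""
    counts = Counter()
    mismatches = []
    for line_no, example in enumerate(examples, start=1):
        for span, meta_body in ANNOTATION.findall(example):
            meta = json.loads("{" + meta_body + "}")
            counts[meta["entity"]] += 1
            value = meta.get("value", span)
            if value != span:
                mismatches.append((line_no, meta["entity"], span, value))
    return counts, mismatches


if __name__ == "__main__":
    # Made-up sample lines; the second one has a deliberate mismatch.
    sample = [
        'freight is [5]{"entity":"freight","value":"5"} euros',
        'packing is [20,00]{"entity":"packing","value":"5,00"}',
    ]
    counts, mismatches = audit(sample)
    print(dict(counts))
    print(mismatches)
```

Entities with only a handful of examples, or whose values look just like another entity's values (plain numbers, for instance), are the ones I would expect to be confused most often, so the per-entity counts are worth a look.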

Are you using duckling? There’s a good overview of entity extraction here.
