OVERCOMING LIMITATIONS OF CONTEXT-INDEPENDENT RASA NLU SERVER
English, especially in its American variety, is a language whose words tend to be highly ambiguous. Many words can serve as a noun, a verb and an adjective at once (e.g. access, hit, cool). Names, both of people and of entities such as businesses, pose a similar problem. Unlike other languages, where specialized words are used for first (given) names and even for last names (surnames), in English (once again, especially in its American variety) almost any word can end up as a first or last name. Not just nouns like "taylor" but even adjectives such as "sharp", "strong" and "smart" can be used as first or last names. And because of the English and American custom of using last names as first names (e.g. Stuart, Madison, Washington, Jefferson, Lincoln, Grant, Jackson), first names as last names (e.g. James, Warren, David, Abraham), and both of those as place names or company names, figuring out what a word means without any context is very difficult.
Rasa NLU does a good job of extracting the specific meaning of an ambiguous word used in a particular context (disambiguation). Given enough training data, the word "taylor" in "my name is Taylor" and "I've been working as a taylor for 15 years" will be properly parsed by Rasa NLU as {name} in the first case and {profession} in the second. But because Rasa NLU depends on context for disambiguation, disambiguation becomes impossible when little or no context is provided. Consider the following dialogue:
B: Hi, what is your name? (ask_first_name)
U: Georgia (give_first_name)
B: What is your last name? (ask_last_name)
U: James (give_last_name)
B: Which city do you live in? (ask_city)
U: Jackson (give_city)
B: What state is that in? (ask_state)
U: Washington (give_state)
B: Do you work for Taylor Systems or Madison Inc.? (ask_company_name)
U: Madison (give_company_name)
We, as humans, understand that we are talking with Georgia (first name) James (last name), who lives in Jackson (city), Washington (state) and works for a company called "Madison". But Rasa NLU cannot classify the intent of these responses or extract entities from them, since any one of the responses could be a first name, a last name, a city name, a state name or a company name. This is because Rasa NLU's models are context-independent: each message is parsed on its own, without reference to what the bot has just asked. Having Rasa Core use a context-independent NLU model makes the NLU models much simpler to create, but it also leads to the limitation described above.
The current approach to overcoming this limitation is to use a general "inform" intent that collects one-word or short answers and parses them as a general entity, e.g. "name" or "location". A custom action then tries to work out the exact meaning of the entity, e.g. whether "name" is a first_name or a last_name, based on the state of the dialogue.
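To make this workaround concrete, here is a minimal sketch of the lookup such a custom action might perform. The mapping table, the function name and the dictionary shapes are hypothetical illustrations, not part of Rasa's API, and treating a company as a generic "name" entity is our assumption.

```python
# Hypothetical mapping: last bot question -> (generic entity expected, specific slot to fill).
QUESTION_TO_SLOT = {
    "ask_first_name": ("name", "first_name"),
    "ask_last_name": ("name", "last_name"),
    "ask_city": ("location", "city"),
    "ask_state": ("location", "state"),
    "ask_company_name": ("name", "company_name"),  # assumption: companies parse as "name"
}


def resolve_generic_entity(last_bot_action, entity):
    """Turn a generic entity such as {"entity": "name", "value": "James"}
    into a specific slot assignment, based on the last question asked.

    Returns a {slot_name: value} dict, or an empty dict when the question is
    unknown or the entity type is not what that question was asking for.
    """
    expected = QUESTION_TO_SLOT.get(last_bot_action)
    if expected is None:
        return {}
    generic_type, slot_name = expected
    if entity.get("entity") != generic_type:
        return {}
    return {slot_name: entity["value"]}


first = resolve_generic_entity("ask_first_name", {"entity": "name", "value": "Georgia"})
last = resolve_generic_entity("ask_last_name", {"entity": "name", "value": "James"})
print(first, last)
```

The same word is resolved differently depending only on which question preceded it, which is exactly the piece of dialogue state the context-independent NLU model does not see.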
But we think there's a better, more standard way to do this. If we run a number of specialized NLU models on different servers, each dedicated to understanding only one entity and one intent, and invoke the appropriate server based on the dialogue state, we can solve this problem trivially.
Thus intent classification and entity identification will be implied by the state of the dialogue, much as they are in human conversation, while a specialized NLU model for the intent and entity in question provides a highly reliable way to detect and validate the user's response.
Let's apply our approach to the dialogue above. We will run 5 NLU servers: NLU_first, NLU_last, NLU_city, NLU_state and NLU_company. Each of these servers will have an NLU model trained to identify the entity in a short ("it's taylor") or one-word response in a definite way, i.e. NLU_first will identify all entities as "first_name", NLU_city as "city", etc. Intents will be classified in the same way, according to the intent each server is dedicated to.
We will also create a mapping between the utter actions that prompted the user response and the NLU server to be used. So "ask_first_name" is mapped to NLU_first, "ask_last_name" is mapped to NLU_last, etc. Finally, we will create a CustomAction that is invoked after the general "inform" intent. It looks up the last question the bot asked the user, maps that question to the appropriate NLU microservice, and sends the user's input to that microservice for entity extraction and validation.
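The routing step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not our actual implementation: the host names, ports and function names are placeholders, and the endpoint path and payload follow the shape of the Rasa HTTP API's parse endpoint (older standalone Rasa NLU servers used a different path and field name), so check them against the server version you run.

```python
import json
import urllib.request

# Hypothetical mapping: bot question -> base URL of the dedicated NLU server.
# Host names and ports are placeholders.
ACTION_TO_NLU_URL = {
    "ask_first_name": "http://nlu-first:5005",
    "ask_last_name": "http://nlu-last:5005",
    "ask_city": "http://nlu-city:5005",
    "ask_state": "http://nlu-state:5005",
    "ask_company_name": "http://nlu-company:5005",
}


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def parse_with_context(last_bot_action, user_text, send=post_json):
    """Send the user's text to the NLU server mapped to the last bot question."""
    base_url = ACTION_TO_NLU_URL.get(last_bot_action)
    if base_url is None:
        raise KeyError(f"no NLU server mapped to {last_bot_action!r}")
    # Assumed endpoint and payload shape; adjust to the NLU server in use.
    return send(base_url + "/model/parse", {"text": user_text})
```

Inside the CustomAction, `last_bot_action` would come from the dialogue state and the returned entities would be written to slots. The `send` parameter is injectable so the routing logic can be exercised without a running server.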
Now the user-bot interaction looks like this:
B: Hi, what is your name? (ask_first_name)
U: Georgia
CustomAction: Send "Georgia" to NLU_first server for processing, receive back entity: "first_name: Georgia", intent: "give_first_name"
B: What is your last name? (ask_last_name)
U: James
CustomAction: Send "James" to NLU_last server for processing, receive back entity: "last_name: James", intent: "give_last_name"
B: Which city do you live in? (ask_city)
U: Jackson
CustomAction: Send "Jackson" to NLU_city server for processing, receive back entity: "city: Jackson", intent: "give_city"
B: What state is that in? (ask_state)
U: Washington
CustomAction: Send "Washington" to NLU_state server for processing, receive back entity: "state: Washington", intent: "give_state"
B: Do you work for Taylor Systems or Madison Inc.? (ask_company_name)
U: Madison
CustomAction: Send "Madison" to NLU_company server for processing, receive back entity: "company_name: Madison", intent: "give_company_name"
Here's an example of how this type of dialogue would look in the stories:
- greet
- ask_first_name
- inform{"text": "It's Georgia"}
- CustomActionCNLU (send to first_name NLU)
- slot{"first_name": "Georgia"}
- ask_last_name
- inform{"text": "James"}
- CustomActionCNLU (send to last_name NLU)
- slot{"last_name": "James"}
- ask_city
- inform{"text": "Taylor"}
- CustomActionCNLU (send to city NLU)
- slot{"city": "Taylor"}
- ask_state
- inform{"text": "I live in Washington"}
- CustomActionCNLU (send to state NLU)
- slot{"state": "Washington"}
- utter_Bye
With the architecture suggested above, where each specialized question has an NLU server dedicated to processing it, we can decouple the NLU service (Rasa NLU) from the dialogue management service (Rasa Core) while still providing the dialogue context to the NLU service.
And if the community takes this up, we can distribute our NLU intelligence through the public cloud. Imagine that you have to deploy your bot in a different country, where addresses and names have a different format. Instead of rewriting your NLU training data and retraining your NLU models, you could conceivably just point your CustomActionCNLU to a different microservice and migrate in no time.
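As a rough sketch of what "pointing to a different microservice" might look like, the example below keeps the NLU endpoints in a per-locale registry, so that switching country is a configuration change rather than a retraining job. The locale codes, URLs and function name are all hypothetical placeholders.

```python
# Hypothetical registry of NLU microservices per locale; all URLs are placeholders.
NLU_ENDPOINTS = {
    "en_US": {
        "ask_city": "http://nlu-city-us:5005",
        "ask_state": "http://nlu-state-us:5005",
    },
    "de_DE": {
        "ask_city": "http://nlu-city-de:5005",
        "ask_state": "http://nlu-state-de:5005",
    },
}


def endpoint_for(locale, last_bot_action, registry=NLU_ENDPOINTS):
    """Return the NLU server URL for a given locale and bot question."""
    try:
        return registry[locale][last_bot_action]
    except KeyError:
        raise LookupError(
            f"no NLU server registered for {last_bot_action!r} in locale {locale!r}"
        ) from None


print(endpoint_for("en_US", "ask_city"))
print(endpoint_for("de_DE", "ask_city"))
```

The dialogue logic and stories stay the same across locales; only the registry entry that the CustomAction consults changes.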
There are some limitations to this approach, of course. You still need a good model to identify the "inform" intent that triggers the CustomAction call to the CNLU servers. And your CNLU servers have to use good models to be able to distinguish clearly irrelevant user replies, such as "Hello", "Yes" or "No", from legitimate responses, as bizarre as they may be, e.g. Kennesaw Mountain Landis (that's actually a person, not a place).
Here at Vocinity we hope that this approach to solving the general "inform" problem will help the community. We have developed a proof of concept of this solution and hope to release our code for it soon.
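One way the CustomAction might guard against irrelevant replies is to accept a specialized server's answer only when the expected intent comes back with sufficient confidence. The sketch below assumes a parse result shaped like {"intent": {"name", "confidence"}, "entities": [...]}; the function name and the 0.7 threshold are arbitrary illustrations and would need tuning against real data.

```python
def accept_reply(parse_result, expected_intent, expected_entity, min_confidence=0.7):
    """Return the extracted value if the reply looks like a legitimate answer,
    or None if it should be rejected (e.g. "Hello" in reply to "ask_city").

    min_confidence is an arbitrary illustrative threshold.
    """
    intent = parse_result.get("intent") or {}
    if intent.get("name") != expected_intent:
        return None
    if (intent.get("confidence") or 0.0) < min_confidence:
        return None
    # Take the first entity of the type this server is dedicated to.
    for entity in parse_result.get("entities", []):
        if entity.get("entity") == expected_entity:
            return entity.get("value")
    return None


good = {
    "intent": {"name": "give_city", "confidence": 0.93},
    "entities": [{"entity": "city", "value": "Jackson"}],
}
noise = {"intent": {"name": "greet", "confidence": 0.88}, "entities": []}

print(accept_reply(good, "give_city", "city"))
print(accept_reply(noise, "give_city", "city"))
```

When the reply is rejected, the bot can repeat the question or hand the message back to the general NLU model instead of filling the slot.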