Providing conversation context to the NLU using microservices

(Leo) #1


English, especially in its American variety, is a language whose words tend to be highly ambiguous in their meaning. Many words can simultaneously be nouns, verbs and adjectives (e.g. access, hit, cool). Names, both personal names and the names of entities such as businesses, present a similar problem. Unlike languages that reserve specialized words for first (given) names and even for last names (surnames), in English (once again, especially in its American variety) almost any word can potentially end up as a first name or last name: not just nouns like “taylor” but even adjectives such as “sharp”, “strong” and “smart”. Add to that the English and American custom of using last names as first names (e.g. Stuart, Madison, Washington, Jefferson, Lincoln, Grant, Jackson, etc.), first names as last names (e.g. James, Warren, David, Abraham), and both of those as place names or company names, and trying to figure out what a word means without any context becomes very difficult.

Rasa NLU does a good job of extracting the specific meaning of an ambiguous word used in a particular context (disambiguation). Given enough training data, the word “taylor” in “my name is Taylor” and in “I’ve been working as a taylor for 15 years” will be properly parsed by Rasa NLU: as {name} in the first case and {profession} in the second. But since Rasa NLU depends on context for disambiguation, when little or no context is provided, disambiguation becomes impossible. Consider the following dialogue:

B: Hi what is your name? (ask_first_name)

	U:  Georgia  (give_first_name)

B: what is your last name? (ask_last_name)

	U:   James  (give_last_name)

B: which city do you live in? (ask_city)

	U:  Jackson  (give_city)

B: what state is that in? (ask_state)

	U:  Washington  (give_state)

B: Do you work for Taylor Systems or Madison Inc.? (ask_company_name)

	U:  Madison  (give_company_name)

We, as humans, understand that we are talking with Georgia (first name) James (last name), who lives in Jackson, Washington (state) and works for a company called “Madison”. But it is impossible for Rasa NLU to classify these responses by intent or to extract entities from them, since any one of the responses could be, variously, a first name, a last name, a city name, a state name or a company name. This is because Rasa NLU’s models are context-independent. Having Rasa Core use a context-independent NLU model makes the NLU models much simpler to create, but it also leads to the limitation described above.

The current approach to overcoming this limitation is to use a general “inform” intent that collects one-word or short answers, parse them as a general entity, e.g. “name” or “location”, and then have a custom action that tries to work out the exact meaning of the entity, e.g. whether “name” is a first_name or a last_name, based on the state of the dialogue.
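To make that conventional approach concrete, here is a minimal sketch of the state-based disambiguation step. The mapping, function name and slot names are illustrative assumptions for this sketch, not part of Rasa’s API:

```python
# Sketch of the conventional single-model approach: one generic "inform"
# intent yields a generic entity value, and a custom action decides which
# specific slot it belongs to based on the last question the bot asked.
# LAST_QUESTION_TO_SLOT and resolve_generic_entity are hypothetical names.

LAST_QUESTION_TO_SLOT = {
    "ask_first_name": "first_name",
    "ask_last_name": "last_name",
    "ask_city": "city",
    "ask_state": "state",
    "ask_company_name": "company_name",
}

def resolve_generic_entity(last_bot_action, generic_value):
    """Map a generic entity value to the slot implied by the dialogue state."""
    slot = LAST_QUESTION_TO_SLOT.get(last_bot_action)
    if slot is None:
        return None  # no question pending; nothing to fill
    return {slot: generic_value}
```

For example, after the bot asked `ask_last_name` and the user said “James”, `resolve_generic_entity("ask_last_name", "James")` yields `{"last_name": "James"}`.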

But we think there’s a better, more standard way to do this. If we run a number of specialized NLU models on different servers, each dedicated to understanding only one entity and one intent, and invoke the specific server based on the dialogue state, we can solve this problem trivially.

Thus intent classification and entity identification will be implied by the state of the dialogue, similar to the way this process happens in human speech, while having a specialized NLU model for the intent/entity in question will enable a highly reliable way to detect and validate the user’s response.

Let’s apply our approach to the dialogue above. We will run five NLU servers: NLU_first, NLU_last, NLU_city, NLU_state, NLU_company. Each of these servers will have an NLU model trained to identify the entity in a short (“it’s Taylor”) or one-word response in a definite way, i.e. NLU_first will identify all entities as “first_name”, NLU_city as “city”, etc. The intents will be classified similarly, based on the intent each server is dedicated to.
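For illustration, the training data for NLU_first might look like this, using Rasa NLU’s Markdown training-data format (the examples themselves are invented; every entity is labeled as first_name and every utterance as give_first_name):

```md
## intent:give_first_name
- [Taylor](first_name)
- [Georgia](first_name)
- it's [Madison](first_name)
- my first name is [Sharp](first_name)
```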

We will also create a mapping between the utter_actions that prompted the user’s response and the NLU server to be used. So “ask_first_name” is mapped to NLU_first, “ask_last_name” is mapped to NLU_last, etc. We will also create a CustomAction that is invoked following the general “inform” intent, looks up the last question the bot asked the user, maps that question to the appropriate NLU microservice, and sends the user’s input to that microservice for entity extraction and validation.
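A rough sketch of what that CustomActionCNLU could look like. The ports, helper names and the `/parse` request shape (the older Rasa NLU HTTP API took a JSON body with a `"q"` field) are assumptions for this sketch, not taken from the original code:

```python
# Route the user's raw text to the specialized NLU microservice implied by
# the bot's last question, then turn the parse result into slot-set events.
# ACTION_TO_NLU, nlu_server_for, parse_with_server and slot_events_from_parse
# are all hypothetical names; the localhost ports are placeholders.
import json
from urllib.request import Request, urlopen

# utter_action -> (dedicated NLU server, slot it fills)
ACTION_TO_NLU = {
    "ask_first_name":   ("http://localhost:5001", "first_name"),
    "ask_last_name":    ("http://localhost:5002", "last_name"),
    "ask_city":         ("http://localhost:5003", "city"),
    "ask_state":        ("http://localhost:5004", "state"),
    "ask_company_name": ("http://localhost:5005", "company_name"),
}

def nlu_server_for(last_utter_action):
    """Pick the specialized NLU endpoint for the question the bot just asked."""
    return ACTION_TO_NLU.get(last_utter_action)

def parse_with_server(base_url, text):
    """Send the user's text to a dedicated NLU server's /parse endpoint."""
    req = Request(base_url + "/parse",
                  data=json.dumps({"q": text}).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)

def slot_events_from_parse(last_utter_action, parse_result):
    """Convert the microservice's entities into slot-set events for Rasa Core."""
    target = nlu_server_for(last_utter_action)
    if target is None:
        return []
    _, slot_name = target
    return [{"event": "slot", "name": e["entity"], "value": e["value"]}
            for e in parse_result.get("entities", [])
            if e["entity"] == slot_name]
```

In an actual custom action, `run()` would read the last bot action from the tracker, call `parse_with_server` with the latest user message, and return the events from `slot_events_from_parse`.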

Now the user-bot interaction looks like this:

B: Hi what is your name? (ask_first_name)

	U:  Georgia  

CustomAction: Send “Georgia” to NLU_first server for processing, receive back entity: “first_name: Georgia”, intent: “give_first_name”

B: what is your last name? (ask_last_name)

	U:   James  

CustomAction: Send “James” to NLU_last server for processing, receive back entity: “last_name: James”, intent: “give_last_name”

B: which city do you live in? (ask_city)

	U:  Jackson  

CustomAction: Send “Jackson” to NLU_city server for processing, receive back entity: “city: Jackson”, intent: “give_city”

B: what state is that in? (ask_state)

	U:  Washington  

CustomAction: Send “Washington” to NLU_state server for processing, receive back entity: “state: Washington”, intent: “give_state”

B: Do you work for Taylor Systems or Madison Inc.? (ask_company_name)

	U:  Madison  

CustomAction: Send “Madison” to NLU_company server for processing, receive back entity: “company_name: Madison”, intent: “give_company_name”

Here’s an example of how this type of dialogue would look in the stories:

  • greet

    • ask_first_name

  • inform{"text": "It's Georgia"}

    • CustomActionCNLU (send to first_name NLU)

    • slot{"first_name": "Georgia"}

    • ask_last_name

  • inform{"text": "James"}

    • CustomActionCNLU (send to last_name NLU)

    • slot{"last_name": "James"}

    • ask_city

  • inform{"text": "Taylor"}

    • CustomActionCNLU (send to city NLU)

    • slot{"city": "Taylor"}

    • ask_state

  • inform{"text": "I live in Washington"}

    • CustomActionCNLU (send to state NLU)

    • slot{"state": "Washington"}

    • utter_Bye

With the architecture suggested above, where each specialized question has an NLU server dedicated to its processing, we can decouple the NLU service (Rasa NLU) from the dialogue management service (Rasa Core), while at the same time providing the dialogue context to the NLU service.

And if this is taken up by the community, we can distribute our NLU intelligence through the public cloud. Imagine that you have to deploy your bot in a different country, where addresses and names have a different format. Instead of rewriting your NLU training data and retraining your NLU models, you could conceivably just point your CustomActionCNLU to a different microservice and migrate in no time.

There are some limitations to this approach, of course. You still need a good model to identify the “inform” intent that triggers the CustomAction call to the CNLU servers. And your CNLU servers have to use good models to distinguish clearly irrelevant user replies, such as “Hello”, “Yes”, “No”, from legitimate responses, as bizarre as they may be, e.g. Kennesaw Mountain Landis (that’s actually a person, not a place).
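One simple way to filter out irrelevant replies is to threshold the specialized server’s intent confidence before filling the slot. The function name and the 0.5 cutoff below are assumptions for this sketch, not from the original implementation:

```python
# Sketch: accept the user's reply only if the dedicated NLU model classified
# it as the expected intent with reasonable confidence. The 0.5 threshold is
# an arbitrary placeholder that would need tuning per model.
def is_legitimate_response(parse_result, expected_intent, threshold=0.5):
    """Reject replies like "Hello"/"Yes"/"No" that the model is unsure about."""
    intent = parse_result.get("intent", {})
    return (intent.get("name") == expected_intent
            and intent.get("confidence", 0.0) >= threshold)
```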

Here at Vocinity we hope that this approach to solving the general “inform” problem will help the community. We developed a proof-of-concept of this solution and hope to release our code for it soon.

(Juste) #2

Super cool @lgrinberg! Thanks a lot for sharing! :slight_smile::rocket:

(Alan Nichol) #3

thanks @lgrinberg for sharing - really cool stuff

(Leo) #4

Would you like me to post the code we used to achieve this?

(Akshit) #5

It will be great to understand this with an example! :slight_smile:

(Leo) #6

I posted the files here:

A few notes: I ran the separate Rasa NLU servers on my local machine, so the URLs are all localhost:$PORT_NUMBER, with only the port number varying. Obviously you can run them from different URLs by modifying the code slightly. I also used pretty much the same NLU data for my different services. Using the same data for first names and last names worked very well, and it worked decently for state, county and city names. It worked really badly for profession names. That’s expected, since profession names differ markedly in format and content from the general pool of first/last names. Lastly, I also used lookup tables of first names and last names, but you really don’t need to do that, as I didn’t use any lookup tables for city and county names.

The code is raw and needs cleaning up and reformatting but it should do the job.