Providing conversation context to the NLU using microservices

OVERCOMING THE LIMITATIONS OF A CONTEXT-INDEPENDENT RASA NLU SERVER

English, especially in its American variety, is a language whose words tend to be highly ambiguous in their meaning. Many words can simultaneously serve as nouns, verbs and adjectives (e.g. access, hit, cool). Names, both personal names and entity names such as business names, pose a similar problem. Unlike languages that reserve specialized words for first (given) names and even for last names (surnames), in English (once again, especially in its American variety) almost any word can end up as a first name or last name. Not just nouns like “taylor”, but even adjectives such as “sharp”, “strong” and “smart” can serve as first or last names. And given the English and American custom of using last names as first names (e.g. Stuart, Madison, Washington, Jefferson, Lincoln, Grant, Jackson etc.), first names as last names (e.g. James, Warren, David, Abraham), and both of those as place names or company names, trying to figure out what a word means without any context is very difficult.

Rasa NLU does a good job of extracting the specific meaning of an ambiguous word used in a particular context (disambiguation). Given enough training data, the word “taylor” in “my name is Taylor” and in “I’ve been working as a taylor for 15 years” will be properly parsed by Rasa NLU as a {name} in the first case and a {profession} in the second.
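For illustration, in the legacy Rasa NLU markdown training format these two usages would be annotated under different intents, with the surrounding words supplying the context the model learns from:

```
## intent:give_name
- my name is [Taylor](name)

## intent:give_profession
- I've been working as a [taylor](profession) for 15 years
```

But since Rasa NLU depends on context for disambiguation, when no context or limited context is provided, disambiguation becomes impossible. Consider the following dialogue: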

B: Hi what is your name? (ask_first_name)

	U:  Georgia  (give_first_name)

B: what is your last name? (ask_last_name)

	U:   James  (give_last_name)

B: which city do you live in? (ask_city)

	U:  Jackson  (give_city)

B: what state is that in? (ask_state)

	U:  Washington  (give_state)

B: Do you work for Taylor Systems or Madison Inc.? (ask_company_name)

	U:  Madison  (give_company_name)

We, as humans, understand that we are talking with Georgia (first name) James (last name), who lives in Jackson, Washington (state) and works for a company called “Madison”. But it is impossible for Rasa NLU to classify these responses by intent or to extract entities from them, since any one of the responses could be, variously, a first name, a last name, a city name, a state name or a company name. This is because Rasa NLU’s models are context-independent. Having Rasa Core use a context-independent NLU model makes the NLU models much simpler to create, but it also leads to the limitation described above.

The current approach to overcoming this limitation is to use a general “inform” intent that collects one-word or short answers, parses them as a general entity, e.g. “name” or “location”, and then have a custom action that tries to determine the exact meaning of the entity, e.g. whether “name” is a first_name or a last_name, based on the state of the dialogue.
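For example, the disambiguating logic inside such a custom action might look like this (a sketch; the action and entity names are hypothetical):

```python
# Sketch: resolve a generic "name" entity to a specific slot based on
# the last question the bot asked (names here are hypothetical).
def resolve_name_entity(last_bot_question, value):
    if last_bot_question == "ask_first_name":
        return ("first_name", value)
    if last_bot_question == "ask_last_name":
        return ("last_name", value)
    return ("name", value)  # fall back to the generic entity
```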

But we think there’s a better, more standard way to do this. If we use a number of specialized NLU models, running on different servers, each dedicated to understanding only one entity and one intent, and invoke the appropriate server based on the dialogue state, we can solve this problem trivially.

Thus intent classification and entity identification will be implied by the state of the dialogue, similar to the way this process happens in human speech, while having a specialized NLU model for the intent/entity in question will enable a highly reliable way to detect and validate the user’s response.

Let’s apply our approach to the dialogue above. We will run 5 NLU servers: NLU_first, NLU_last, NLU_city, NLU_state, NLU_company. Each of these servers will have an NLU model trained to identify the entity in a short (“it’s Taylor”) or one-word response in a definite way, i.e. NLU_first will identify all entities as “first_name”, NLU_city as “city”, etc. Intents will be classified similarly, based on the intent the server is dedicated to.
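For example, the training data for NLU_first would label every plausible answer as a first name, no matter how ambiguous the word would be elsewhere (a sketch in the legacy markdown format; the examples are illustrative):

```
## intent:give_first_name
- [Georgia](first_name)
- it's [Taylor](first_name)
- my name is [Madison](first_name)
- [Jackson](first_name)
```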

We will also create a mapping between the utter_actions that prompted the user response and the NLU server to be used. So “ask_first_name” is mapped to NLU_first, “ask_last_name” is mapped to NLU_last, etc. We will also create a CustomAction that is invoked following the general “inform” intent, looks up the last question the bot asked the user, maps that question to the appropriate NLU microservice, and sends the user’s input to that microservice for entity extraction and validation.
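Here is a minimal sketch of what such a CustomAction could look like, assuming a rasa_core_sdk action server and legacy Rasa NLU servers exposing the /parse endpoint; the ports, the mapping and the action name are illustrative, not our exact code:

```python
import requests
from rasa_core_sdk import Action
from rasa_core_sdk.events import SlotSet

# Map each question the bot can ask to its dedicated NLU microservice.
QUESTION_TO_NLU = {
    "ask_first_name":   "http://localhost:5001/parse",
    "ask_last_name":    "http://localhost:5002/parse",
    "ask_city":         "http://localhost:5003/parse",
    "ask_state":        "http://localhost:5004/parse",
    "ask_company_name": "http://localhost:5005/parse",
}

class CustomActionCNLU(Action):
    def name(self):
        return "CustomActionCNLU"

    def run(self, dispatcher, tracker, domain):
        # Look up the last question the bot asked the user.
        last_question = None
        for event in reversed(tracker.events):
            if event.get("event") == "action" and event.get("name") in QUESTION_TO_NLU:
                last_question = event["name"]
                break
        if last_question is None:
            return []

        # Send the raw user input to the dedicated NLU server.
        text = tracker.latest_message.get("text", "")
        result = requests.post(QUESTION_TO_NLU[last_question],
                               json={"q": text}).json()

        # Store each extracted entity in the slot of the same name.
        return [SlotSet(ent["entity"], ent["value"])
                for ent in result.get("entities", [])]
```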

Now the user-bot interaction looks like this:

B: Hi what is your name? (ask_first_name)

	U:  Georgia  

CustomAction: Send “Georgia” to NLU_first server for processing, receive back entity: “first_name: Georgia”, intent: “give_first_name”

B: what is your last name? (ask_last_name)

	U:   James  

CustomAction: Send “James” to NLU_last server for processing, receive back entity: “last_name: James”, intent: “give_last_name”

B: which city do you live in? (ask_city)

	U:  Jackson  

CustomAction: Send “Jackson” to NLU_city server for processing, receive back entity: “city: Jackson”, intent: “give_city”

B: what state is that in? (ask_state)

	U:  Washington  

CustomAction: Send “Washington” to NLU_state server for processing, receive back entity: “state: Washington”, intent: “give_state”

B: Do you work for Taylor Systems or Madison Inc.? (ask_company_name)

	U:  Madison  

CustomAction: Send “Madison” to NLU_company server for processing, receive back entity: “company_name: Madison”, intent: “give_company_name”

Here’s an example of how this type of dialogue would look in the stories:

* greet
  - ask_first_name
* inform{"text": "It's Georgia"}
  - CustomActionCNLU   (send to first_name NLU)
  - slot{"first_name": "Georgia"}
  - ask_last_name
* inform{"text": "James"}
  - CustomActionCNLU   (send to last_name NLU)
  - slot{"last_name": "James"}
  - ask_city
* inform{"text": "Taylor"}
  - CustomActionCNLU   (send to city NLU)
  - slot{"city": "Taylor"}
  - ask_state
* inform{"text": "I live in Washington"}
  - CustomActionCNLU   (send to state NLU)
  - slot{"state": "Washington"}
  - utter_Bye

With the architecture suggested above, where each specialized question has an NLU server dedicated to its processing, we can decouple the NLU service (Rasa NLU) from the dialogue management service (Rasa Core), while at the same time providing dialogue context to the NLU service.

And if this is taken up by the community, we can distribute our NLU intelligence through the public cloud. Imagine that you have to deploy your bot in a different country, where addresses and names have a different format. Instead of rewriting your NLU training data and retraining your NLU models, you could conceivably just point your CustomActionCNLU to a different microservice and migrate in no time.

There are some limitations to this approach, of course. You still need a good model to identify the “inform” intent that will trigger the CustomAction call to the CNLU servers. And your CNLU servers have to use good models to be able to distinguish clearly irrelevant user replies, such as “Hello”, “Yes” or “No”, from legitimate responses, as bizarre as they may be, e.g. Kennesaw Mountain Landis (that’s actually a person, not a place).

Here at Vocinity we hope that this approach to solving the general “inform” problem will help the community. We developed a proof-of-concept of this solution and hope to release our code for it soon.


Super cool @lgrinberg! Thanks a lot for sharing! :slight_smile::rocket:


thanks @lgrinberg for sharing - really cool stuff

Would you like me to post the code we used to achieve this?


It will be great to understand this with an example! :slight_smile:

I posted the files here:

A few notes: I ran the separate Rasa NLU servers on my local machine, so the URLs are all localhost:$PORT_NUMBER, with only the port number varying. Obviously you can run it from different URLs by modifying the code slightly. I also used pretty much the same NLU data for my different services. Using the same data for first names and last names worked very well, and it worked decently for state, county and city names. It worked really badly for profession names. That’s expected, since profession names differ markedly in format and content from the general pool of first/last names. Lastly, I also used lookup tables of first names and last names, but you really don’t need to do that, as I didn’t use any lookup tables for city and county names.

The code is raw and needs cleaning up and reformatting but it should do the job.


Thank you for the detailed write-up! We at Dialogue (dialogue.co) are also thinking about providing more context to NLU. Our immediate use case is support for multi-language chats (the user’s language preference is the “context” the NLU service needs).

Have you considered the idea of extending the interpreter to pass the tracker object into it, so that the interpreter can use the slots to invoke the right NLU endpoint? And, going further, passing the tracker state into the /parse API so that the NLU service itself can use the slot values as context?

So we did it differently, because we did not want to modify the existing Rasa code, and because in our use case we only need to provide context after an “inform” intent, so we don’t need the context passing and processing built into the interpreter, which would be invoked on every user message. Your case, if I understand it correctly, is that you will have user input in French and English and two different NLU models, but the same bot logic. So you want to run one bot with two different NLU models and send your user input to either the French or the English NLU model throughout the conversation, based on a slot that you set in the beginning.

We have a similar problem in the case of a multi-user conference, where we have to track which user said what and interpret different users’ input as different intents (e.g. user_a_hello and user_b_hello). But instead of extending the interpreter, we are thinking of using a proxy server that acts as a RasaNLUHTTPInterpreter, so that rasa_core would send the messages to what it thinks is a Rasa NLU server, and that proxy server would distribute the messages accordingly.

If you insert a proxy server that the RasaNLUHTTPInterpreter would be calling, wouldn’t that server need to receive the tracker state in order to determine which NLU service to call?

In the implementation I’m working on, I subclass RasaNLUHTTPInterpreter to create a ContextualRasaNLUHTTPInterpreter that accepts the tracker as a parameter to the parse() method, analyzes the slot value and determines the NLU HTTP endpoint to call.
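Roughly like this, assuming rasa_core’s RasaNLUHttpInterpreter (exact class and attribute names vary by version); the “language” slot and the endpoint mapping are illustrative:

```python
from rasa_core.interpreter import RasaNLUHttpInterpreter
from rasa_core.utils import EndpointConfig

# Hypothetical per-language NLU endpoints.
NLU_ENDPOINTS = {
    "en": EndpointConfig("http://localhost:5005"),
    "fr": EndpointConfig("http://localhost:5006"),
}

class ContextualRasaNLUHttpInterpreter(RasaNLUHttpInterpreter):
    def parse(self, text, tracker=None):
        # Pick the NLU endpoint from the conversation state before
        # delegating to the stock parsing logic.
        if tracker is not None:
            language = tracker.get_slot("language")
            if language in NLU_ENDPOINTS:
                self.endpoint = NLU_ENDPOINTS[language]
        return super(ContextualRasaNLUHttpInterpreter, self).parse(text)
```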

You’re right, that is the better way to do it. We were thinking of avoiding passing the tracker to the interpreter by having the proxy server query an external server that would store the speaker_identity value passed to it from our gateway. So while we wouldn’t be touching the interpreter, we’d need three pieces here: the proxy server, our voice gateway that would tell us which speaker the message came from, and the external server that would store the speaker_identity.

But besides being cumbersome, this approach has the fatal flaw of only being able to support one simultaneous conversation. So in the end we decided to pass the tracker to the interpreter too.

Great, we all seem to be aligned that making the interpreter aware of the conversation state would be a good addition to Rasa.

I’ve logged Dynamic configuration of NLU endpoint (Tracker Aware Interpreter) · Issue #1764 · RasaHQ/rasa_core · GitHub, will submit a PR, and would love your feedback.


I built a very crude proof-of-concept for this: you can pass the tracker to the interpreter in _parse_message in processor.py, and then configure the interpreter to take the tracker as an input and put your custom functionality in it.
It works. We’re working on building a cleaner proof-of-concept now and hope to have it ready by next week.
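For reference, the change amounts to something like this (the actual body of _parse_message differs between rasa_core versions, so treat it as a sketch):

```python
# In rasa_core/processor.py: pass the tracker through to the interpreter
# so it can pick an NLU endpoint based on conversation state.
def _parse_message(self, message, tracker=None):
    # ...the existing handling of "/intent"-style shortcut messages
    # stays unchanged...
    parse_data = self.interpreter.parse(message.text, tracker=tracker)
    return parse_data
```

The processor’s message-handling code then passes the current tracker into _parse_message, and the interpreter’s parse() is extended to accept it, as discussed above.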

Many thanks @lgrinberg for sharing your concept.

Your example is about words with different meanings (e.g. last name vs. city). This is a good example of where context matters, but there are many more situations where context is key. Even words that are not ambiguous can have a different meaning depending on the context.

Let’s say we have a chatbot that talks about renting or buying real estate:

  1. Context: None / User message: “apartment”
    • Since we have no context, we basically do not know what the intent of the user is (even if the user may think it is clear that he wants to buy an apartment)
  2. Context: Rental / User message: “apartment”
    • The user mentioned that he wants to rent real estate. Since we are in the rental context, it is clear that he wants to rent an apartment (and not a house). The utterance “I want to rent an apartment” and the utterance “apartment” should match the same intent
  3. Context: Buy / User message: “apartment”
    • Another user is more interested in buying an apartment and mentioned that in the previous dialog. Now “apartment” should match the same intent as “I would like to buy an apartment”

Another great example is “yes” and “no”, whose meaning depends fully on the context (“yes” can be equal to “yes, I want to rent a house” OR “yes, I want to sell my apartment”).

These examples show two things:

  • A single word can have different intents depending on the dialog context
  • The same utterances/samples can appear in multiple intents

Your approach is definitely a way to solve this problem with Rasa, but it gets extremely complex if you have a lot of context information. It would be really great if it were possible to define a matching context for the defined intents (“Only match in the context…”). Some commercial NLU platforms like Dialogflow support this, and it makes it super easy to configure the intents.

@Juste, @amn41 I am wondering about the opinion of the Rasa team. Is this something you see in a future release, or do you think it is not an important feature? Do you have any other suggestions on the context topic?

Many thanks to all of you!

Alex, are you working on a PR for this? I was thinking of modifying the Rasa code myself to take multiple interpreters from the command line and pass the tracker to them, but I don’t want to duplicate efforts.

You guys are doing some great work, and I think you’re heading in the right direction. But, I’d like to interject an idea if I could. I may be trying to solve a completely different problem, but it seems that the solution is the same, i.e. narrowing the context Rasa is trying to work in. The difference between your problem space and mine is that you’re working with a question-response system, and I’m working with a command-and-control system.

Let me provide my context. I’m working on what might be called an “intermediate”-sized project. Currently, it has 45 distinct intents with 24 custom entities, where nearly all the entities could be any free-form text. Essentially, the user tells the computer how to control a web browser. I’m starting to think this just might be the worst-case scenario for any sort of NLU. This project was working reasonably well using Dialogflow on the backend, but I’ve been tasked with removing the program’s requirement for a network connection. (Sigh!) And so begins my journey into learning NLU :slight_smile:

It is clear to me that understanding without context will not work in my case. How would Rasa ever be able to pick the button entity out of “Click on the ‘Send lgrinberg a message’ button”? So directing each utterance to a model with custom entities for just that intent is the solution. The issue I have with the microservices approach is the deployment and maintenance of 45 services, especially as model development is ongoing. The process would be to train 45 context models plus the “interpreter”, maintain the mapping structure, and then deploy them all.

Would it be possible to modify Rasa to do this for us? What I’m thinking is a modification to the pipeline. Rasa would do exactly what it does now with two additions.

A) When training, an overall model covering all intents would be created, just as is done now. In addition, a model for each individual intent, using only the data for that intent, would also be created.

B) When used, the exact same pipeline would be run up to the point where an utterance is classified to an intent. At that point, the utterance would be handed off to the model of the leading intent, where the pipeline would be repeated. Spacy or Duckling could be used, but the real work would be done by a custom recognizer like ner_crf or mitie.

Rasa would respond with the output from the specific intent.
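Something like this two-pass flow, assuming per-intent models loaded with rasa_nlu’s Interpreter (the paths and intent names are illustrative):

```python
from rasa_nlu.model import Interpreter

# Pass 1 model: trained on all intents; pass 2 models: one per intent.
general = Interpreter.load("models/all_intents")
per_intent = {
    "click_button": Interpreter.load("models/click_button"),
    # ... one entry per intent ...
}

def two_pass_parse(text):
    first = general.parse(text)          # classify with the overall model
    intent = first["intent"]["name"]
    specific = per_intent.get(intent)    # hand off to the leading intent's model
    return specific.parse(text) if specific else first
```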

Again, I’m only a couple of weeks into this whole NLU world, so I’m kind of hoping someone will respond that this is already done if I just flip a configuration switch somewhere. But from a production perspective, I think this would be transparent to the maintainer and the user.

I am not sure I understood you correctly, but I think you don’t want to go through the hassle of using microservices and instead want those extra contextual models “built in”. The advantage of the microservices approach is that it uses existing Rasa functionality. But to avoid the hassle of maintaining separate microservices, I’ve been working on modifying the Rasa code to use multiple NLU interpreters. I’ll probably post a write-up on the code changes (use at your own risk) by the end of the month or so. If you want the information sooner, send me an email at leo@vocinity.com

@lgrinberg Sir, can you provide this GitHub repo, another example, or any other blog? Thanks in advance, I am waiting for your reply.

If using this, I guess it is not possible to handle a case like this:

bot: what is your first name?
user: Mike
bot: what is your last name?
user: oh sorry, first name is John, and last name is Mike

Correct me if I am wrong.

No, I think you could. You would have to extend your microservices model to support multiple entities. Your regular model should be able to understand “first name is {first_name}, last name is {last_name}”. So if you merge your regular model’s training data for this intent and also train it to understand the one-word answer as {name}, your microservice could return {name}, {first_name} or {last_name} as the entity, and then your action code should be smart enough to understand that {name} should be interpreted as {last_name} based on the context (the last question asked). I hope that makes sense.
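For instance, the merged training data for the microservice might look like this in the legacy markdown format (the examples are illustrative):

```
## intent:give_name
- [Mike](name)
- last name is [Mike](last_name)
- first name is [John](first_name), and last name is [Mike](last_name)
- oh sorry, first name is [John](first_name), and last name is [Mike](last_name)
```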

I tried to build and apply a joint model (intent + entity) on the NLU side instead of using pipelines, but I would like to give this a try.

Very good idea!

One thing though: I am guessing you can only create as many micro NLU servers as there are available ports? This may mean that for a big chatbot that supports multiple domains, there would need to be a lot of micro NLU servers, correct?