Removing projects for Rasa NLU server

Another VA product that I work on has this feature and includes algorithms that intelligently switch between “projects”. ML as an underlying technology suffers from a lack of modularity. It is pretty hard to “divide and conquer” using the current approach, so complexity does not scale well. I doubt that one could get it completely right in a single attempt. Evolution is needed. It would be good to get multiple ideas on the table at once to stimulate thinking. My 2 cents’ worth.

@maybeno Yeah, I guess I’d be uncomfortable with multiple clients sharing the same model, but for a single app that just needs to enforce partitioning, it seems like it would work. I just haven’t dug into the pipeline protocol and components in much detail yet.

Arguably the problem is just NLU-related, because none of the dialog training would cross context so it seems unlikely that any of the policies would make an illegal prediction (and, again, some failsafe could probably be put into place).

A related thought (perhaps uglier) is to modify the training pipeline to insert a context-specific token into all the training data, and then at runtime insert the same token depending on the context (a rough sketch follows the example below).

e.g.,

intent:restaurant_greet

  • hi there
  • hello

… When training, the pipeline takes the first part of the intent name and inserts a corresponding token into each example:

intent:restaurant_greet

  • restaurant_magic_token hi there
  • restaurant_magic_token hello

So the NLU model is actually trained with that unique token. Then at runtime, the user enters “hey there” into the restaurant-context chatbot, and this is turned into “restaurant_magic_token hey there” before being processed by Rasa.

Slightly embarrassing to write that, but it might work?
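Here is a minimal sketch of what that token insertion could look like, written as a plain preprocessing step outside of Rasa; the `CONTEXT_TOKENS` mapping and the helper names are purely illustrative, not Rasa APIs. The same logic could also live in a custom pipeline component.

```python
# Hypothetical sketch: prepend a context token derived from the intent prefix.
# CONTEXT_TOKENS and the helper functions are illustrative, not Rasa APIs.

CONTEXT_TOKENS = {
    "restaurant": "restaurant_magic_token",
    "hotel": "hotel_magic_token",
}

def token_for_intent(intent_name: str) -> str:
    """Derive the context token from the first part of the intent name."""
    prefix = intent_name.split("_", 1)[0]          # "restaurant_greet" -> "restaurant"
    return CONTEXT_TOKENS.get(prefix, "")

def rewrite_training_example(intent_name: str, text: str) -> str:
    """At training time, prepend the context token to every example."""
    token = token_for_intent(intent_name)
    return f"{token} {text}" if token else text

def rewrite_user_message(context: str, text: str) -> str:
    """At runtime, prepend the same token based on the active context."""
    token = CONTEXT_TOKENS.get(context, "")
    return f"{token} {text}" if token else text

# rewrite_training_example("restaurant_greet", "hi there")
#   -> "restaurant_magic_token hi there"
# rewrite_user_message("restaurant", "hey there")
#   -> "restaurant_magic_token hey there"
```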

@gvasend I guess the issue is whether the approach is a “soft” partition within a model or a hard partition outside a model. Is that a reasonable way to think about it? Is there something in-between?

Hi @mmm3bbb,

I think putting everything in a single instance would cause performance issues in the end, and you would lose the overview of your system’s needs when everything is represented in a single config file.

How about a three-step approach:

  1. Set up a server that provides an API which uses the domain context to identify the service holding the particular model and passes the request through (a rough pass-through sketch follows below).
  2. Set up a Docker environment in which a docker-compose file orchestrates every single Rasa instance needed for the languages (an action server can even be run alongside each one).
  3. Pass input labeled with a given domain context to its specific Docker instance.

If both “systems” are placed on one server, the HTTP overhead (or whatever transport you might want to use) would be acceptable, and you end up with a highly scalable architecture (Cloud Foundry, Kubernetes).
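A minimal sketch of the pass-through API from step 1, assuming a Flask app and illustrative container hostnames and ports; /webhooks/rest/webhook is Rasa’s REST channel endpoint, but check the paths and ports against your own compose setup.

```python
# Hypothetical pass-through proxy: route each message to the Rasa instance
# that owns its domain context. Hostnames and the route are illustrative.
from flask import Flask, jsonify, request
import requests

app = Flask(__name__)

# Assumed mapping from domain context to the Rasa container serving it.
RASA_INSTANCES = {
    "restaurant": "http://rasa-restaurant:5005",
    "hotel": "http://rasa-hotel:5005",
}

@app.route("/chat/<context>", methods=["POST"])
def route_message(context: str):
    """Forward the message to the Rasa instance for this domain context."""
    base_url = RASA_INSTANCES.get(context)
    if base_url is None:
        return jsonify({"error": f"unknown context '{context}'"}), 404
    payload = {
        "sender": request.json.get("sender", "default"),
        "message": request.json["message"],
    }
    resp = requests.post(f"{base_url}/webhooks/rest/webhook", json=payload)
    return jsonify(resp.json()), resp.status_code
```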

Most important: every language is kept separate, which restores the overview and configuration capabilities.

What do you think?

Regards Julian

@mmm3bbb I think it depends on the problem. In reality, complex problems may have overlapping domains. Trying to apply a one-size-fits-all approach to NLU ends up with a lot of ambiguity across complex, overlapping domains. If the problem is fairly narrow and not interrelated with other domains, then hard partitioning helps to simplify control over the user experience. It is a pretty complex problem, so I would take baby steps. First, allow “projects” to coexist (i.e. break down barriers). I would make sure that partitions can be programmable (hard or soft), and then, as experience with that architecture evolves, one can add more intelligence to the partition. Personally, I think this is an interesting baby step towards AGI. I would focus on making it flexible so simple problems can be done simply, while still allowing more intelligence to be added for complex, overlapping problems.

Another variation of the approach would be to encapsulate a project at one level and then have a level up that allows interoperation between projects.

@JulianGerhard Right now I have two Rasa servers (and action servers) running and a Flask app that accepts requests from the app (which includes a scenario id) and routes it to the correct Rasa server’s REST connector.

So I’d replace the two Rasa servers with what you suggest. Definitely worth looking into. I’m a little scared of what running 20 Rasa instances plus 20 action servers, times whatever horizontal scaling is required, means for system resources though. :-/

I guess one nice thing about this is that I might be able to move some of the smalltalk into a single Rasa model and out of the individual scenario models… and then cache smalltalk results in the Flask app.
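As a rough sketch of that caching idea, assuming a shared smalltalk Rasa instance whose replies don’t depend on conversation state; the URL, port, and helper name are illustrative.

```python
# Hypothetical smalltalk cache in the routing app: query the shared smalltalk
# model once per unique message text and reuse the result afterwards.
from functools import lru_cache
import requests

SMALLTALK_URL = "http://localhost:5005/webhooks/rest/webhook"  # assumed port

@lru_cache(maxsize=1024)
def smalltalk_reply(text: str) -> tuple:
    """Return cached smalltalk responses for a given message text."""
    # Fixed sender id: this assumes smalltalk replies are stateless.
    resp = requests.post(SMALLTALK_URL, json={"sender": "cache", "message": text})
    # Tuples are hashable/immutable, so they are safe to keep in the cache.
    return tuple(r.get("text", "") for r in resp.json())
```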


Hey @tyd could you elaborate on what alternatives other users have found?

Also would love to take you up on the offer to do a zoom call if you’re available. We’re trying to scale out our project to multiple customers (each with their own model) and not having projects is making that infeasible.


I’m using multiple containers for two different chatbots, which I access through a single frontend “queue” API that sends the messages to different containers according to destination.

That way I can swap models for each chatbot.

Each Rasa instance has its own action server, that I can update separately.


An alternative is what we’re doing to support multilingual bots. We forked and changed Rasa so a single bot supports an arbitrary number of languages (one model for every language), and we also adapted the fingerprints so you only retrain a language if its data or pipeline changed. Usually a project has 2 or 3 languages; this probably doesn’t scale that well with many projects, as memory requirements can grow a lot. See GitHub - botfront/rasa-for-botfront: A fork to be used with Botfront, an open source chatbot platform built with Rasa.


One approach we have been working on is to have an NLU-only server for each model, front-ended by a proxy (you can use an AWS ALB or something with similar capabilities). To determine the context for the first request that comes in, the proxy sends the request to all NLU servers for intent classification. The NLU server that returns the highest confidence for the matched intent wins. Once the context is established, subsequent requests can go directly to that NLU server and the downstream components behind it, like Rasa Core, the action server, etc.
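A rough sketch of that fan-out step, with illustrative server URLs; /model/parse is Rasa’s HTTP API endpoint for NLU parsing, but verify it against the Rasa version you run.

```python
# Hypothetical fan-out: ask every NLU-only server to parse the first message
# and pick the context whose server reports the highest intent confidence.
import requests

NLU_SERVERS = {
    "restaurant": "http://localhost:5005",
    "hotel": "http://localhost:5006",
}

def pick_context(text: str) -> str:
    """Return the context whose NLU server is most confident about the intent."""
    best_context, best_confidence = None, -1.0
    for context, base_url in NLU_SERVERS.items():
        result = requests.post(f"{base_url}/model/parse", json={"text": text}).json()
        confidence = result.get("intent", {}).get("confidence", 0.0)
        if confidence > best_confidence:
            best_context, best_confidence = context, confidence
    return best_context

# Subsequent requests for the session can then be routed straight to
# NLU_SERVERS[pick_context(first_message)].
```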

Can you explain how to load models dynamically at run time?

Or you could load the models into a dictionary dynamically.
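A rough sketch of that idea, assuming Rasa’s Python API (rasa.core.agent.Agent); the model paths are illustrative, and the exact parse call differs between Rasa versions, so verify against the one you’re on.

```python
# Hypothetical sketch: load several packaged models into a dictionary at
# startup and look up the right agent per bot. Paths are illustrative.
import asyncio
from rasa.core.agent import Agent

MODEL_PATHS = {
    "restaurant": "models/restaurant.tar.gz",
    "hotel": "models/hotel.tar.gz",
}

# Load each packaged model once and keep the agents keyed by bot name.
agents = {name: Agent.load(path) for name, path in MODEL_PATHS.items()}

async def parse(bot: str, text: str) -> dict:
    """Run NLU parsing with the agent that belongs to the given bot."""
    # Rasa 3.x exposes Agent.parse_message; older versions use
    # parse_message_using_nlu_interpreter instead.
    return await agents[bot].parse_message(text)

# Example: asyncio.run(parse("restaurant", "hey there"))
```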

I also have a similar requirement. I am running 10 different chatbots in my organization, and using 10 different high-configuration servers would increase costs heavily. Projects would have been my go-to for this, but the feature has been removed, and I am still trying to find alternatives that aren’t workarounds.


Have you found a way? I am also in the same situation, with 30 different use-case chatbots that are multilingual too. Your guidance would save a lot.
