Removing projects for Rasa NLU server

We are going to remove projects from the Rasa NLU server in the next major release. That is, your Rasa NLU server will no longer be able to handle multiple projects; it will have just a single Rasa NLU model loaded at any time.

Why are we doing this?

The main reason is that usability for the everyday user suffers:

  • when training models locally you often only need a single project, yet you have to specify it everywhere
  • the project management parts of the codebase are quite fragile and lead to multiple out of memory and race condition issues at scale (e.g. models getting unloaded when someone is still using them)
  • the implementation does not work well together with asyncio
  • we need to unify model handling between Rasa Core & NLU after the merge and Rasa Core does not have the concept of projects

However, the idea is to reintroduce a similar logic at a later point. To be able to build a solution that fits everyone best, we would like to collect some feedback from you:

  • Are you using projects currently?
  • How do you use them?
  • Do you miss any functionality?

Please, feel free to comment with your feedback. Thank you!


Hi Rasa,

Actually, I am trying to use one Agent with different NLU models. Making the agent switch models while many users are interacting with it is very costly in terms of time and leads to a bad user experience. On the other hand, if I create multiple agents, it will be difficult to manage them. So, what does Rasa advise its users to do in such cases?

Thanks

No problem, you can use multiple agents for multiple models:

```python
agent1 = Agent.load(model1)
agent2 = Agent.load(model2)
```

Thanks @soundaraj for your reply.

My case is different: I might have more than 7 models. In that case, I will need to find a good approach for switching between the agents. That is the reason why I mentioned in my comment that it will be difficult to manage those agents.
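One way to keep 7+ agents manageable is to preload them all into a single registry keyed by name, instead of swapping a single agent at runtime. This is a sketch, not official Rasa guidance: `FakeAgent`, the model paths, and the registry layout are illustrative stand-ins (in a real deployment the loader would be `rasa.core.agent.Agent.load`, whose import path varies by Rasa version).

```python
# Sketch: load every model once at startup and route requests by key.
# FakeAgent stands in for a real Rasa Agent so the sketch is self-contained.

class FakeAgent:
    def __init__(self, model_path):
        self.model_path = model_path

    def handle_text(self, text):
        # A real Agent would run the NLU pipeline / dialogue here.
        return f"[{self.model_path}] {text}"

def load_agents(model_paths, loader=FakeAgent):
    """Build the registry once; no model switching during runtime."""
    return {name: loader(path) for name, path in model_paths.items()}

agents = load_agents({
    "restaurant": "models/restaurant.tar.gz",
    "interview": "models/interview.tar.gz",
    # ... up to 7 or more models
})

def handle(model_name, text):
    """Pick the right preloaded agent by name."""
    return agents[model_name].handle_text(text)
```

The design choice is simply to trade memory (all models resident) for latency (no load on request), which matches the concern about switching being costly.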


@varton

Hey,

I am not sure if I got your problem right, but currently I am using docker to manage this. Let's say we have:

  • 3 customers
  • 2 bots per customer
  • 1 or more servers

In total that makes 6 different bots, all depending on different data. I have developed a fairly efficient way to take JSON as an input source and create the bots from it in a single request. Since the customers and their bots have unique IDs, generated by my DBMS, I am able to create the 6 bots with:

  • docker-compose file
  • NLG server service
  • Action server service
  • MongoDB shared DB service
  • MongoExpress service

To run them, even on the same server, you simply have to calculate different ports for the services such that the exposed ports inside the services remain as they have to be and the ports on the host are unique - meaning that "speaking with a particular bot" just means "picking the right port".

By doing it this way, scalability and maintenance are relatively easy.
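The port calculation described above can be sketched in a few lines. The base port, the per-bot spacing, and the service offsets are assumptions for illustration; only the idea (fixed container-internal ports, unique host ports per bot) comes from the post.

```python
# Sketch: derive a unique host port per (customer, bot, service) so that
# "speaking with a particular bot" is just "picking the right port".
# BASE_PORT, spacing, and offsets are invented for this example.

BASE_PORT = 6000
BOTS_PER_CUSTOMER = 2

def host_port(customer_idx, bot_idx, service_offset=0):
    """Unique host port; the port inside the container (e.g. 5005) stays fixed."""
    bot_number = customer_idx * BOTS_PER_CUSTOMER + bot_idx
    return BASE_PORT + bot_number * 10 + service_offset

# service_offset could distinguish e.g. the Rasa server (0) from the
# action server (1) of the same bot in a generated docker-compose file.
```

With 3 customers and 2 bots each, this yields 6 non-overlapping port blocks on one host, so a generated docker-compose file never collides.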

Does that answer your question?

Hi,

I'm interested in this as well. Do you run 6 Rasa instances (on different ports), each with one model, or a single Rasa instance that loads a model whenever a request comes in for a specific agent?

Hi @nickopris,

actually those are 6 different instances, with one model per instance. Maybe I should explain it a little further:

Rasa leaves us many possibilities. It is possible to do it my way or many others... in terms of performance, I didn't want to load a model during runtime. The latency while doing this might be low, but imho it's not the best architecture.

On the meta level there is a latency measurement, and as soon as the response time goes over a predefined threshold, another container / worker / swarm node can easily be provided behind a load balancer. That's why I would recommend docker or a docker-like architecture.

Actually the NLG server seems to be a bottleneck in terms of performance, and this is why I don't want to "lose time" loading/requesting the model.

Feel free to ask if you are interested.

Regards

That is exactly what I had in mind, seeing that it takes a little while for Rasa to reset once I switch the models. I haven't started on this yet, waiting on the client to give us the go-ahead, but will most likely have to do something similar.

Is it possible for you to share an example of your JSON file? I assume you just have entries for intents, actions, forms, slots, templates and stories, which you parse and spread across the files that Rasa uses for training?

What happens if you need a new custom action? Again I assume that it needs to be in place before your json file is uploaded?

Hi,

glad to hear that I could help. Regarding the JSONs:

Basically I have forked an empty Rasa bot into a builder directory, meaning that during runtime a Python library fetches the data needed for the bot and takes the builder directory to fill in the required information. I decided that JSON is much easier to use than markdown in terms of serialization and persistence. The only thing remaining is the stories.md - the JSON is converted into proper markdown in the Rasa input format. So in this step, there is actually no magic. The data for the JSON training files can be stored elsewhere. If I had to recommend a message broker, I'd choose:

https://www.rabbitmq.com/

since it can easily be plugged into any pipeline.
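The JSON-to-stories conversion mentioned above might look like the following sketch. The JSON schema here is invented for illustration; the thread does not specify the actual format used.

```python
# Sketch: convert a JSON story description into Rasa markdown stories.
# The schema ("stories" -> "name"/"steps" -> "intent"/"actions") is an
# assumption, not the poster's real format.
import json

def stories_to_md(stories_json):
    lines = []
    for story in stories_json["stories"]:
        lines.append(f"## {story['name']}")          # story title
        for step in story["steps"]:
            lines.append(f"* {step['intent']}")       # user intent
            for action in step.get("actions", []):
                lines.append(f"  - {action}")         # bot action(s)
    return "\n".join(lines)

data = json.loads("""
{"stories": [{"name": "greet path",
              "steps": [{"intent": "greet", "actions": ["utter_greet"]}]}]}
""")
print(stories_to_md(data))
```

The benefit of keeping the source of truth in JSON is exactly what the post describes: it serializes cleanly, so it can be stored in a database or shipped over a message broker, with markdown generated only at build time.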

Now the exciting thing: actions.py

I have asked myself how to generalize those CustomActions or FormActions - is it even possible? I thought about 3 scenarios:

  1. Having exactly one predefined GeneralAction(Action) and one predefined GeneralFormAction(FormAction) which do all the heavy lifting
  2. Accepting the fact that actions.py has to be predefined by a developer for specific cases
  3. Preparing actions.py in a way that, after instantiation, it adds every needed class and its methods at runtime based on the information in domain.yml

For deciding how to proceed it is important to think about:

  1. What are the things needed at build time?
  2. What are the things needed at run time?

Keep in mind that it would of course be possible to just modify the source code, since it is open source, but I wanted to keep it mostly update-safe. I can't go into deep detail in this posting since that would cause a wall of text - I try to keep my explanations simple, so please don't hesitate to ask.

For scenario one, you have to modify your stories and your domain, since both action types need the name that registers the action with Rasa Core. Then every time you start an action or form, it will be the general one. The trigger that decides which information actually has to be filled into the action depends on the intent which led to the execution. If this is a feasible approach for you, you then have to modify the validation method such that it calls the "old" version: validating every slot in one single method. The rules for validation, slot_mapping and submit can be injected during runtime.
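Scenario one can be sketched roughly as below. To keep the example self-contained, the `rasa_sdk` base class is omitted and the dispatcher/tracker machinery is reduced to plain arguments; the rule names and responses are invented.

```python
# Sketch of scenario 1: a single generic action registered under one name,
# whose behaviour is selected by the intent that triggered it. The rules
# are plain data that can be injected or replaced at runtime.

# Runtime-injected rules: intent name -> handler (assumed format).
RULES = {
    "ask_opening_hours": lambda slots: "We open at 9am.",
    "ask_address": lambda slots: "Main Street 1.",
}

class GeneralAction:
    def name(self):
        # The one name that appears in domain.yml and in every story.
        return "action_general"

    def run(self, latest_intent, slots):
        # Dispatch on the triggering intent instead of on the class name.
        handler = RULES.get(latest_intent)
        if handler is None:
            return "Sorry, I can't help with that."
        return handler(slots)
```

A `GeneralFormAction` would follow the same pattern, with the injected rules covering `slot_mappings`, per-slot validation, and `submit`.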

Scenario two tends to be self-explanatory. Imho a feasible way would be to open the file in the browser and let a developer decide how to proceed. This way has a strong dependency on UI/UX since it is advanced.

I have actually implemented scenario three, using a facade pattern to add classes and methods at runtime. After longer investigation, I don't believe this is a proper approach, since it heavily depends on validating every rule that inserts a class or a method. A single change in the source data could cause several other problems, and for production use I really can't recommend this version for now.

If I should help you or give you more advice, it would be crucial to know more about your specific scenario, because I think generalization is good BUT has its limits.

Regards

Thank you for taking the time to detail this.

I'm not at the stage of building this yet, so in theory this sounds great. The single predefined generic Action and Form classes will take us only so far. They may be okay as a starting point, but depending on client requirements it can expand to a lot of custom code when data needs to be fetched from or pushed to various external services.

I'll know more once I get a full spec from the client. Then the fun begins. :)

I think that removing projects significantly increases infrastructure complexity, since you have to deploy multiple NLU instances. IMHO it's far more difficult to deploy Rasa in production if your current platform allows creating bots at any time.

I understand why this was done, but still, that is my feedback.

@Tanja

Right now we use multiple projects within a single Rasa instance, as each project contains domain-specific information and, subsequently, a model that the bot is trained on. With this change, we would theoretically need to bring up one instance of Rasa for each project (model), running on a separate port on the same box or on a separate server using the same port. Did I understand that correctly? What is the recommendation on that?

We do not build our application services in Python; instead we use a .NET Web API to interact with the Rasa instance via its API. The intention is that the NLU service will score and return a result back to the Web API, and only then can our application progress.

@saucepleez

I am using Bot Framework for the same. What it does is create an instance of Rasa NLU for every Bot Framework user and then work accordingly, like normal API calls. It is still in the development phase, but I am integrating the NLU pipeline with Microsoft Bot Framework for a much bigger user base.

I understand where you are going with that. That makes sense, I guess, in some cases, but in the context of our usage we do not have the luxury of using cloud or externally hosted providers for this work, due to data sensitivity and privacy issues; there is no compromise on that. I also do not have the convenience of simply 'spinning up' containers and assigning each project to a container IP for training and scoring.

The only option left is, as mentioned above, to figure out how to expose each "bot" or "model" on a different port on the same machine. So we will need some way to constantly invoke a new CLI instance or manage multiple CLI instances. This sounds like more of a headache to manage, especially as we do not have ready access to "production" servers in a secure development environment.

Bit late to the party here, but just to add our use case: we are currently using the NLU-only docker image in Kubernetes and use projects to separate the models used in an account-based chatbot platform. Each account has a different model. Looking at the current Rasa documentation, it seems this removal is already in place, as there doesn't appear to be any option for projects in the current docker rasa/rasa image. The current plan is to run one Rasa server per account, which is quite inefficient from an infrastructure usage and management standpoint. Are there any further plans for allowing multiple models in the future?


@Tanja is there any update on this? This thread and the closed github issue have been quiet for a while.

Does this work have a place on the roadmap yet?


Hi @jamesmf. We have spoken to a bunch of users and learned that most have found alternatives, so we are leaning towards not reintroducing them in the near term. Happy to hop on a quick Zoom call to hear more about how you are using them, if you are up for it.

I'm currently looking at alternatives to this, as I will possibly be needing a large number of individual bots. Each having its own separate web server seems a bit over-provisioned, but I will know more as I dig in. I'm slightly disappointed to hear it won't be brought back into the code base in the near term.

@tyd Could you maybe do a blog post on these alternatives that are working for people? My situation is a voice-based language practice app that has 20+ different models. Each of these represents a different domain (ordering at restaurant, job interview, exchanging currency, etc.). Spinning up 20 Rasa servers and then figuring out how to scale individual higher-demand scenario servers seems like a pretty tough approach.

I'm wondering if it's feasible to jam everything into one giant model and then pass the current scenario context to custom NLU pipeline components that will only predict intents from that scenario context (e.g., all intents are prefixed by a scenario code like job_greet, job_deny, ... restaurant_greet, restaurant_deny...).

Then if context: 'job' is sent in, the pipeline will discard all non-job_* intents in its prediction.

Or is that too naive?
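The filtering idea above can be sketched outside of a pipeline component. The structure of the result dict follows Rasa's `intent_ranking` output format, but the component wiring (`Component` subclass, `process` method) is omitted and the confidences are invented.

```python
# Sketch: drop intents that don't match the current scenario context,
# then renormalise the remaining confidences. A real implementation
# would live in a custom NLU pipeline component's process() method.

def filter_by_context(parse_result, context):
    prefix = context + "_"
    ranking = [i for i in parse_result["intent_ranking"]
               if i["name"].startswith(prefix)]
    total = sum(i["confidence"] for i in ranking) or 1.0  # avoid /0
    ranking = [{"name": i["name"], "confidence": i["confidence"] / total}
               for i in ranking]
    return {"intent": ranking[0] if ranking else None,
            "intent_ranking": ranking}

result = filter_by_context(
    {"intent_ranking": [{"name": "job_greet", "confidence": 0.3},
                        {"name": "restaurant_greet", "confidence": 0.6},
                        {"name": "job_deny", "confidence": 0.1}]},
    "job")
```

Note the caveat raised in the reply below still applies: even with filtering, the single shared model's confidences were trained across all scenarios, so renormalising cannot fully undo cross-scenario confusion.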

I thought of something similar, @mmm3bbb, but I'm not sure how feasible it would be. There would have to be some guarantee that the model could never switch to a different context. How disruptive would it be if you were doing an interview and were then asked if you would like fries with that :). And if companies are building these for clients, the last thing they would want is a mix-up of one company with another.