Replacing the NLU Pipeline with a Custom Interpreter in Rasa 2.0

Hello,

I am seeking your advice on how to replace the NLU pipeline in Rasa 2.0 and on an “interpreter injection” concern I noticed during the training phase.

First, let’s discuss the discrepancy between rasa train core and rasa shell.

The startup sequences of rasa train core and rasa shell differ. When running rasa shell, the create_interpreter factory function is called before the agent is created: the factory uses the EndpointConfig (from endpoints.yml) to instantiate our custom interpreter and forwards that interpreter to the Agent. When running rasa train core, the creation of the interpreter is delegated to the Agent. The EndpointConfig parameter is not forwarded to the Agent’s constructor, so the Agent calls the factory without it and gets a RegexInterpreter.
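The two startup paths described above can be modeled with a small, self-contained sketch. It does not import Rasa: every class below is a stand-in that only borrows the Rasa name, CustomInterpreter, start_shell and start_train_core are hypothetical names, and the dispatch inside create_interpreter is a simplified assumption about what the real factory does, not a copy of it.

```python
from dataclasses import dataclass
from typing import Optional


class RegexInterpreter:
    """Stand-in for the fallback interpreter."""


class CustomInterpreter:
    """Hypothetical stand-in for the interpreter configured in endpoints.yml."""

    def __init__(self, url: str) -> None:
        self.url = url


@dataclass
class EndpointConfig:
    """Stand-in for the `nlu` section of endpoints.yml."""

    url: str
    type: Optional[str] = None


def create_interpreter(obj=None):
    """Simplified factory: pass through, build from config, or fall back."""
    if isinstance(obj, (RegexInterpreter, CustomInterpreter)):
        return obj
    if isinstance(obj, EndpointConfig):
        return CustomInterpreter(obj.url)
    # Nothing usable was passed in, so the fallback is returned.
    return RegexInterpreter()


class Agent:
    def __init__(self, interpreter=None) -> None:
        # The agent always goes through the factory; with no argument
        # it ends up with the fallback interpreter.
        self.interpreter = create_interpreter(interpreter)


def start_shell(nlu_endpoint: EndpointConfig) -> Agent:
    # "rasa shell" path: the interpreter is built first, then handed over.
    return Agent(interpreter=create_interpreter(nlu_endpoint))


def start_train_core(nlu_endpoint: EndpointConfig) -> Agent:
    # "rasa train core" path: the endpoint config never reaches the agent.
    return Agent()


endpoint = EndpointConfig(url="http://my.nlu.server:5000/nlu", type="http")
print(type(start_shell(endpoint).interpreter).__name__)       # CustomInterpreter
print(type(start_train_core(endpoint).interpreter).__name__)  # RegexInterpreter
```

In this model the only difference between the two paths is whether the endpoint configuration reaches the factory, which is the behavior reported above.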

My concern is that the RegexInterpreter.featurize_message method would be called during policy training instead of our own featurization function. For now, RegexInterpreter.featurize_message “does nothing”, so there would not be a mismatch in the featurization.

Please let me know what you think: from my perspective, the startup sequence of rasa train needs to be updated to allow instantiating a custom NLU (or a RasaNLUHttpInterpreter) to make sure that the core featurization introduced in New core featurization #6296 uses the right interpreter when featurizing instead of RegexInterpreter’s featurization function.

  • I found out that this major refactoring is in progress: Refactor Agent / Processor / TrackerStore #5257, so perhaps the “interpreter injection” concern I am raising will be addressed in-or-around this issue and for the moment I can experiment with my custom interpreter with the certainty that there is no featurization happening.

Now let’s discuss the best way to implement a custom NLU in Rasa 2.0.

I would like to confirm that the 1.0 mechanics described in the Legacy Docs are still supported and that they are still the recommended approach. I am referring to these articles:

I am able to instantiate a RasaNLUHttpInterpreter with the sample endpoints.yml below during the evaluation phase (rasa shell), therefore I believe that the legacy documentation is still valid.

# endpoints.yml
nlu:
  type: http
  url: http://my.nlu.server:5000/nlu

I have identified two other approaches:

  1. Keep using the Rasa NLU pipeline and replace the components with a single, custom component that extends rasa.nlu.components.Component

  2. Replace the Rasa NLU with a custom interpreter that extends rasa.shared.nlu.interpreter.NaturalLanguageInterpreter (or RegexInterpreter)

    • Pros:
      • Full control over the behavior of the parse and featurize_message methods
      • Function invocation instead of a REST call
    • Cons:
      • Not much documentation available online on how to proceed (not a big deal)
      • As mentioned above, rasa train core does not load my interpreter

The second approach is very similar to using a RasaNLUHttpInterpreter except that I can skip a REST call to the server. It has the caveat that right now I can’t instantiate it during the training phase.
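For illustration, the second approach could look roughly like the sketch below. To keep it runnable without Rasa installed, NaturalLanguageInterpreter here is a local stand-in rather than the real rasa.shared.nlu.interpreter class, and its method signatures are an assumption about the 2.0 interface that should be checked against the installed version. KeywordInterpreter and its keyword table are hypothetical, and the shape of the returned dictionary (text, intent, entities) is assumed to match what Core expects from a parse result.

```python
import asyncio
from typing import Any, Dict, Optional


class NaturalLanguageInterpreter:
    """Local stand-in for the Rasa base class (signatures are assumed)."""

    async def parse(
        self,
        text: str,
        message_id: Optional[str] = None,
        tracker: Any = None,
        metadata: Optional[Dict] = None,
    ) -> Dict[str, Any]:
        raise NotImplementedError

    def featurize_message(self, message: Any) -> Optional[Any]:
        return None


class KeywordInterpreter(NaturalLanguageInterpreter):
    """Hypothetical in-process interpreter: keyword lookup, no REST call."""

    def __init__(self, keywords: Dict[str, str]) -> None:
        # Maps a lowercase keyword to an intent name.
        self.keywords = keywords

    async def parse(self, text, message_id=None, tracker=None, metadata=None):
        lowered = text.lower()
        for keyword, intent in self.keywords.items():
            if keyword in lowered:
                return self._result(text, intent, 1.0)
        # No keyword matched: report an empty intent with zero confidence.
        return self._result(text, None, 0.0)

    def featurize_message(self, message):
        # Placeholder for custom featurization; returns the message unchanged.
        return message

    @staticmethod
    def _result(text: str, intent: Optional[str], confidence: float) -> Dict[str, Any]:
        return {
            "text": text,
            "intent": {"name": intent, "confidence": confidence},
            "entities": [],
        }


interpreter = KeywordInterpreter({"hello": "greet", "bye": "goodbye"})
result = asyncio.run(interpreter.parse("Hello there"))
print(result["intent"])  # {'name': 'greet', 'confidence': 1.0}
```

The parse call is a plain coroutine invocation inside the same process, which is the function-invocation advantage listed under the pros; featurize_message is the hook that would need to be reached during training for the concern raised earlier to go away.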

Please let me know what you recommend.

Thanks for the help! Simon