Experimentation framework support


I am trying to learn about experimentation frameworks that can be used to manage Rasa experiments.

We run a large number of experiments for hyperparameter tuning and are looking for ways to manage them more efficiently, i.e. be able to visually track experiments, inspect precision/accuracy differences, view configs, and optionally integrate with a downstream model registry.

Is there any first-class framework support available for this? Either open source or commercial solutions would work.


Hi Sumeet,

Usually, before advising folks on a large grid search, I prefer to give a small warning. Have you shown your assistant to actual users yet? If not, I’d like to briefly mention this Venn diagram.

There’s a risk that you’re optimizing for intents and entities that your users aren’t actually interested in. If you’ve collected data from a source other than real users, this is a common pitfall to be aware of. In that situation it’s better to put effort into data quality than into looping over many settings.

That said, for running large grid searches we’ve got a few resources that might help.

  • There’s a small benchmarking guide here. This is the pattern I personally like to follow: I have a custom script that uses Jinja to generate lots of settings files.
  • I’ve created rasalit to help with visualising these results.
  • We’ve also got an nlu-hyperopt project that might make it easier to run larger grid searches.
  • If you use DIET/TED then you can also configure them to log to TensorBoard. There’s an introductory blog post here, but be sure to read the docs too.
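
To make the first bullet more concrete, here’s a minimal sketch of the "generate lots of settings files" pattern. The post mentions Jinja; this sketch uses Python’s stdlib `string.Template` instead to stay dependency-free. The pipeline components shown and the hyperparameter grid (`epochs`, `embedding_dimension`) are illustrative choices, not recommendations from the post:

```python
# Sketch: write one Rasa config file per hyperparameter combination,
# ready to be passed to `rasa train` / `rasa test` in a loop.
from itertools import product
from pathlib import Path
from string import Template

# Illustrative config template; adjust the pipeline to your own setup.
TEMPLATE = Template("""\
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: $epochs
    embedding_dimension: $embedding_dimension
""")

# Hypothetical grid of settings to sweep over.
GRID = {
    "epochs": [50, 100, 200],
    "embedding_dimension": [20, 40],
}

def generate_configs(out_dir="configs"):
    """Write a config file for every combination in GRID; return the paths."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    keys = sorted(GRID)
    paths = []
    for values in product(*(GRID[k] for k in keys)):
        settings = dict(zip(keys, values))
        # Encode the settings in the filename so runs are easy to identify.
        name = "config-" + "-".join(f"{k}-{v}" for k, v in settings.items())
        path = out / f"{name}.yml"
        path.write_text(TEMPLATE.substitute(settings))
        paths.append(path)
    return paths
```

Each generated file can then be trained and evaluated separately, e.g. `rasa train nlu --config configs/config-embedding_dimension-20-epochs-50.yml`, which is essentially what the benchmarking guide automates.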

Related: I’ve been thinking about adding support for Weights & Biases in rasa nlu examples but have yet to start investigating that.

Wow! This is really helpful. Thank you @koaning for sharing.