Visualize training metrics in TensorBoard

With Rasa Open Source 1.9, we added support for TensorBoard. TensorBoard provides visualizations and tooling for machine learning experiments. In Rasa Open Source 1.9 we use TensorBoard to visualize training metrics of our in-house machine learning models, i.e. EmbeddingIntentClassifier, DIETClassifier, ResponseSelector, EmbeddingPolicy, and TEDPolicy. Visualizing training metrics helps you understand whether your model has trained properly. You can, for example, see whether you trained your model long enough, i.e. whether you specified the correct number of epochs. If you enable the option to evaluate your model every x epochs on a hold-out validation dataset, via the options evaluate_on_number_of_examples and evaluate_every_number_of_epochs, you can also see whether your model generalizes well and does not overfit.

How to enable TensorBoard?

To enable TensorBoard you need to set the model option tensorboard_log_directory to a valid directory in your config.yml file. You can set this option for EmbeddingIntentClassifier, DIETClassifier, ResponseSelector, EmbeddingPolicy, or TEDPolicy. If a valid directory is provided, the training metrics are written to that directory during training. By default we write the training metrics after every epoch. If you want to write the training metrics for every training step, i.e. after every minibatch, set the option tensorboard_log_level to "minibatch" instead of "epoch" in your config.yml file.
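For example, a minimal sketch of what this could look like for TEDPolicy in the policies section of your config.yml (the epochs value here is just an illustrative choice):

policies:
  - name: TEDPolicy
    epochs: 100
    tensorboard_log_directory: ".tensorboard"
    tensorboard_log_level: "epoch"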

After you have trained your model, for example via rasa train, all metrics are written to the provided directory. The directory will contain a subdirectory with the model name and another subdirectory with a timestamp. This allows you to reuse the same directory for multiple models and runs. To start TensorBoard, execute the following command:

tensorboard --logdir <path-to-directory>

Once you open a browser at http://localhost:6006/, you can see the training metrics.
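For instance, if you set tensorboard_log_directory to ".tensorboard", as in the example below, the command becomes:

tensorboard --logdir .tensorboard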

Let’s take a look at an example

The following config was used to train Sara.

pipeline:
  - name: WhitespaceTokenizer
  - name: CRFEntityExtractor
  - name: CountVectorsFeaturizer
    OOV_token: "oov"
    token_pattern: (?u)\b\w+\b
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: EmbeddingIntentClassifier
    epochs: 50
    ranking_length: 5
    evaluate_on_number_of_examples: 500
    evaluate_every_number_of_epochs: 5
    tensorboard_log_directory: ".tensorboard"
    tensorboard_log_level: "epoch"

As you can see, we specified a TensorBoard log directory. We also specified that we want to evaluate our model every 5 epochs on a hold-out validation dataset of 500 examples. After we trained the model, we can see the training metrics in TensorBoard.

The orange curve corresponds to the hold-out validation dataset and the blue curve shows the metrics for the training data (see the legend on the left). If you use, for example, the DIETClassifier, you will see plots for the following metrics: i_loss, i_acc, e_loss, e_f1, and t_loss. i is short for intent, e for entity, and t_loss shows the total loss.

We might add further support for TensorBoard in the future. Until then, we would love to hear your feedback and ideas about what else we could add to TensorBoard.


Hi @Tanja,

If anyone stumbles upon a blank white page after following your tutorial: this is a known issue which can be fixed by downgrading TensorBoard to 2.0.0.
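For example, assuming you installed Rasa Open Source with pip, the downgrade could be done with:

pip install tensorboard==2.0.0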

Thanks for your explanation!

Kind regards
Julian
