`checkpoint_model: True` leads to `KeyError: 'val_i_acc'` after 2.4.0 update

After updating Rasa from 2.3.4 to 2.4.2, training fails with this error:

Epochs:   4%|█████▏                                                                                                                    | 6/141 [00:42<15:50,  7.04s/it, t_loss=3.81, i_acc=0.945, e_f1=0.706]
Traceback (most recent call last):
  File "E:\Program Files\Python\Python38\lib\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "E:\Program Files\Python\Python38\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\Documents\USJ\Engineering\Semestre 6\FYP\Workspace\_venv\lib\site-packages\rasa\__main__.py", line 134, in <module>
    main()
  File "E:\Documents\USJ\Engineering\Semestre 6\FYP\Workspace\_venv\lib\site-packages\rasa\__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "E:\Documents\USJ\Engineering\Semestre 6\FYP\Workspace\_venv\lib\site-packages\rasa\cli\train.py", line 58, in <lambda>
    train_parser.set_defaults(func=lambda args: train(args, can_exit=True))
  File "E:\Documents\USJ\Engineering\Semestre 6\FYP\Workspace\_venv\lib\site-packages\rasa\cli\train.py", line 90, in train
    training_result = rasa.train(
  File "E:\Documents\USJ\Engineering\Semestre 6\FYP\Workspace\_venv\lib\site-packages\rasa\train.py", line 94, in train
    return rasa.utils.common.run_in_loop(
  File "E:\Documents\USJ\Engineering\Semestre 6\FYP\Workspace\_venv\lib\site-packages\rasa\utils\common.py", line 307, in run_in_loop
    result = loop.run_until_complete(f)
  File "E:\Program Files\Python\Python38\lib\asyncio\base_events.py", line 608, in run_until_complete
    return future.result()
  File "E:\...\_venv\lib\site-packages\rasa\train.py", line 163, in train_async
    return await _train_async_internal(
  File "E:\...\_venv\lib\site-packages\rasa\train.py", line 342, in _train_async_internal
    await _do_training(
  File "E:\...\_venv\lib\site-packages\rasa\train.py", line 388, in _do_training
    model_path = await _train_nlu_with_validated_data(
  File "E:\...\_venv\lib\site-packages\rasa\train.py", line 812, in _train_nlu_with_validated_data
    await rasa.nlu.train(
  File "E:\...\_venv\lib\site-packages\rasa\nlu\train.py", line 115, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "E:\...\_venv\lib\site-packages\rasa\nlu\model.py", line 209, in train
    updates = component.train(working_data, self.config, **context)
  File "E:\...\_venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 854, in train
    self.model.fit(
  File "E:\...\_venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "E:\..\_venv\lib\site-packages\rasa\utils\tensorflow\temp_keras_modules.py", line 229, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "E:\...\_venv\lib\site-packages\tensorflow\python\keras\callbacks.py", line 416, in on_epoch_end
    callback.on_epoch_end(epoch, numpy_logs)
  File "E:\...\_venv\lib\site-packages\rasa\utils\tensorflow\callback.py", line 68, in on_epoch_end
    if self._does_model_improve(logs):
  File "E:\...\_venv\lib\site-packages\rasa\utils\tensorflow\callback.py", line 90, in _does_model_improve
    [
  File "E:\...\_venv\lib\site-packages\rasa\utils\tensorflow\callback.py", line 91, in <listcomp>
    float(current_results[key]) > self.best_metrics_so_far[key]
KeyError: 'val_i_acc'

The error is caused by having `checkpoint_model: True` in the pipeline. Training works after removing it.
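For anyone reading the traceback: the last frame indexes `current_results[key]` and `self.best_metrics_so_far[key]` with the same key, so the `KeyError` means one of those dicts has no `val_i_acc` entry for that epoch. One plausible reading (an assumption, not confirmed from the Rasa source) is that validation metrics are only logged on some epochs, so the per-epoch logs sometimes lack the `val_*` keys. Below is a minimal sketch of a defensive comparison under that assumption. `does_model_improve` is a hypothetical stand-in written for illustration, not Rasa's actual callback code.

```python
from typing import Dict, Mapping


def does_model_improve(
    current_results: Mapping[str, float],
    best_metrics_so_far: Dict[str, float],
) -> bool:
    """Hypothetical stand-in for the comparison seen in the traceback."""
    # Assumption: some epochs log no validation metrics. Skip those epochs
    # instead of indexing current_results[key] and raising KeyError.
    if any(key not in current_results for key in best_metrics_so_far):
        return False

    # First epoch with validation metrics: record them as the best so far.
    if not best_metrics_so_far:
        val_metrics = {
            key: float(value)
            for key, value in current_results.items()
            if key.startswith("val_")
        }
        best_metrics_so_far.update(val_metrics)
        return bool(val_metrics)

    # Later epochs: every tracked metric must strictly improve.
    improved = all(
        float(current_results[key]) > best
        for key, best in best_metrics_so_far.items()
    )
    if improved:
        for key in best_metrics_so_far:
            best_metrics_so_far[key] = float(current_results[key])
    return improved


best: Dict[str, float] = {}
does_model_improve({"t_loss": 3.90, "val_i_acc": 0.90}, best)  # validation epoch
does_model_improve({"t_loss": 3.81, "i_acc": 0.945}, best)  # no val_* keys, no KeyError
```

With the guard in place, an epoch without `val_i_acc` is simply treated as "no improvement" rather than crashing the training run.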

Minimal reproducible example:

  • Run `rasa init` in an empty folder
  • Replace config.yml with the attached file: config.yml (1.1 KB)

Opened an issue on GitHub.

Thanks for reporting. It makes sense to address this on GitHub, since it is a bug report =)