Incompatible Tensor Shape when training DIETClassifier

Dear friends,

I unfortunately ran into incompatible tensor shapes when training the DIETClassifier. I essentially use the spaCy pipeline recommended in Tuning Your NLU Model, with the difference that I removed the two CountVectorsFeaturizer components.

My pipeline (a slightly modified variant of the one generated by rasa init) is as follows:

language: en

pipeline:
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

Rasa version:

Rasa Version     : 2.0.6
Rasa SDK Version : 2.0.0
Rasa X Version   : None
Python Version   : 3.7.3
Operating System : Darwin-18.7.0-x86_64-i386-64bit

The full stack trace is as follows:

Traceback (most recent call last):
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/bin/rasa", line 10, in <module>
    sys.exit(main())
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/cli/train.py", line 90, in train
    nlu_additional_arguments=extract_nlu_additional_arguments(args),
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/train.py", line 55, in train
    loop,
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/utils/common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/train.py", line 110, in train_async
    nlu_additional_arguments=nlu_additional_arguments,
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/train.py", line 207, in _train_async_internal
    old_model_zip_path=old_model,
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/train.py", line 246, in _do_training
    additional_arguments=nlu_additional_arguments,
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/train.py", line 547, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/nlu/train.py", line 114, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/nlu/model.py", line 206, in train
    updates = component.train(working_data, self.config, **context)
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 768, in train
    self.component_config[BATCH_STRATEGY],
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 184, in fit
    ) = self._get_tf_train_functions(eager, model_data, batch_strategy)
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 426, in _get_tf_train_functions
    train_dataset_function, self.train_on_batch, eager, "train"
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 408, in _get_tf_call_model_function
    tf_call_model_function(next(iter(init_dataset)))
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/Users/alexeyrodriguez/.pyenv/versions/3.7.3/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  In[0] mismatch In[1] shape: 2 vs. 1: [1,4,1,2] [1,4,1,64] 0 0
	 [[node text_encoder/transformer_encoder_layer/multi_head_attention/MatMul_1 (defined at /lib/python3.7/site-packages/rasa/utils/tensorflow/transformer.py:309) ]] [Op:__inference_train_on_batch_6896]

Errors may have originated from an input operation.

Hi @alexey.rodriguez, just so I understand: does this error happen only when you aren’t using CountVectorsFeaturizer?

Thank you for the reply.

I remember I managed to break it with a different variation of the pipeline, but I can’t reproduce it at the moment. If I do, I will post it here.

Alexey

Hi there, I think I found the issue here.

Basically, I was using the spaCy pipeline with a “small” model. Small models do not include word vectors. So, if I remove the CountVectorsFeaturizer components, the tokens end up having no features, and I guess that produces tensors with unusual shapes. If I use a large spaCy model that has embeddings, the problem goes away (I tried with a large German model). Keep in mind that I didn’t double-check that tokens indeed have no features when there are neither embeddings nor count vector featurizers. It’s still a hypothesis, but it may be useful for someone running into the same issue.
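
A minimal sketch of the workaround described above, assuming the same pipeline as in the original post. The model name en_core_web_md is an assumption chosen for illustration (the test reported above used a large German model), and it has to be downloaded separately before training. The `model` key on SpacyNLP is how the model is selected here; if your setup loads the model from the `language` setting instead, adjust accordingly.

```yaml
language: en

pipeline:
  - name: SpacyNLP
    # Assumption: any spaCy model that ships with word vectors;
    # en_core_web_md is only an illustrative choice.
    model: en_core_web_md
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
```

If switching models is not an option, the other route suggested by the hypothesis is to keep the small model and leave the two CountVectorsFeaturizer entries from the recommended pipeline in place, since the error only showed up after they were removed.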