### Rasa Open Source version
2.8.17
### Python version
3.8
### What happened?
Given:
1. A config with `SpacyTokenizer` configured with a `token_pattern`, `SpacyFeaturizer`, and at least one token-based featurizer that is not spaCy-based (`CountVectorsFeaturizer`, `LexicalSyntacticFeaturizer`):
```
language: en
pipeline:
- name: SpacyNLP
  case_sensitive: false
  model: en_core_web_md
- name: SpacyTokenizer
  token_pattern: "(\\d+|\\D+)"
- name: SpacyFeaturizer
- name: CountVectorsFeaturizer
- name: DIETClassifier
  epochs: 10
  constrain_similarities: true
```
2. Input in which multiple digits are directly attached to a word, e.g.:
```
it's 12euro
```
At inference time, the example sentence above causes `DIETClassifier` to fail with mismatched input dimensions (stack trace below). With `SpacyFeaturizer` as the only featurizer this does not happen, which implies that the tokens `SpacyFeaturizer` iterates over differ from those the non-spaCy featurizer iterates over. A likely cause is that `SpacyFeaturizer` uses the _spaCy_ tokens [here](https://github.com/RasaHQ/rasa/blob/main/rasa/nlu/featurizers/dense_featurizer/spacy_featurizer.py#L68), which are not necessarily the tokens Rasa uses.
The goal is that when the amount and the currency are written together without whitespace, like this:
```
it's 12euro
```
the input is split into the tokens `it's`, `12`, and `euro`.
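The suspected mismatch can be illustrated with a minimal sketch that does not use Rasa or spaCy. It assumes a plain whitespace split as a stand-in for the spaCy tokens, and a hypothetical helper `apply_token_pattern` that re-splits each token with the configured `token_pattern`. It is not Rasa's actual implementation, only a picture of how two featurizers could end up with different token counts for the same message.

```python
import re

# The token_pattern from the config above, as a compiled regex.
TOKEN_PATTERN = re.compile(r"(\d+|\D+)")


def base_tokens(text):
    # Assumption: whitespace splitting stands in for the spaCy tokens
    # that a dense featurizer might read directly from the spaCy doc.
    return text.split()


def apply_token_pattern(tokens):
    # Hypothetical helper: re-split every token with the pattern,
    # keeping the original token if the pattern matches nothing.
    result = []
    for token in tokens:
        pieces = [p for p in TOKEN_PATTERN.findall(token) if p]
        result.extend(pieces or [token])
    return result


text = "it's 12euro"
dense_tokens = base_tokens(text)                  # tokens before the pattern
sparse_tokens = apply_token_pattern(dense_tokens)  # tokens after the pattern

print(len(dense_tokens), dense_tokens)
print(len(sparse_tokens), sparse_tokens)
```

In this sketch the two lists have different lengths, so per-token feature matrices built from them would have different sequence lengths.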
### Command / Request
```shell
rasa train nlu
rasa shell nlu
> it's 12euro
```
### Relevant log output
```shell
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/nlu/model.py", line 470, in parse
    component.process(message, **self.context)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 988, in process
    out = self._predict(message)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 904, in _predict
    return self.model.run_inference(model_data)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/utils/tensorflow/models.py", line 318, in run_inference
    ] = self._rasa_predict(batch_in)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/utils/tensorflow/models.py", line 280, in _rasa_predict
    outputs = tf_utils.sync_to_numpy_or_python_type(self.predict_step(batch_in))
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/utils/tensorflow/models.py", line 228, in predict_step
    return self.batch_predict(batch_in)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 1707, in batch_predict
    text_transformed, _, _, _, _, attention_weights = self._tf_layers[
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/utils/tensorflow/rasa_layers.py", line 992, in call
    seq_sent_features, mask_combined_sequence_sentence = self._tf_layers[
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/utils/tensorflow/rasa_layers.py", line 645, in call
    sequence_features_combined = self._combine_sequence_level_features(
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/utils/tensorflow/rasa_layers.py", line 570, in _combine_sequence_level_features
    sequence_features_combined = self._tf_layers[f"sparse_dense.{SEQUENCE}"](
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/rasa/utils/tensorflow/rasa_layers.py", line 338, in call
    return tf.concat(dense_features, axis=-1)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 1769, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1213, in concat_v2
    _ops.raise_from_not_ok_status(e, name)
  File "/Users/melinda/.pyenv/versions/rasa2/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6941, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,3,128] vs. shape[1] = [1,2,300] [Op:ConcatV2] name: concat
```
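To read the final error line: concatenating along the last axis requires every other dimension to match, and here the second dimension (presumably the number of tokens) is 3 for one input and 2 for the other. The following pure-Python sketch reproduces that check on the two shapes from the log. The function name `mismatched_axes` is hypothetical and is not part of TensorFlow or Rasa.

```python
def mismatched_axes(shape_a, shape_b):
    # Hypothetical helper: return the indices of all axes except the
    # last one on which the two shapes disagree. Concatenation along
    # the last axis is only possible when this list is empty.
    if len(shape_a) != len(shape_b):
        raise ValueError("shapes must have the same rank")
    return [
        axis
        for axis, (a, b) in enumerate(zip(shape_a[:-1], shape_b[:-1]))
        if a != b
    ]


# Shapes copied from the InvalidArgumentError in the log above.
first_input = (1, 3, 128)
second_input = (1, 2, 300)

bad_axes = mismatched_axes(first_input, second_input)
print(bad_axes)
```

The differing last dimensions (128 and 300) are not the problem, since that is the axis being concatenated; only the disagreement on axis 1 makes the operation fail.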
### Definition of Done
- [ ] Facilitate discussion to scope fix
- [ ] Create new issue for refinement for fix