ValueError: Sequence dimensions for sparse and dense features don't coincide

saurabh-m523 · December 27, 2019, 10:10am

Here is my config:

language: en
pipeline:
- name: "SpacyNLP"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
  min_ngram: 1
  max_ngram: 3
  lowercase: true
  OOV_token: 'oov'
- name: "SpacyFeaturizer"
  return_sequence: true
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
  loss_type: "margin"
- name: "ResponseSelector"

I upgraded to rasa 1.6.0 and wanted to see how custom features will affect the CRFEntityExtractor. I don’t understand why this error occurs. Please help.

Here is the complete traceback:

Traceback (most recent call last):
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\saurabhd\AppData\Local\Programs\Python\Python36\Scripts\rasa.exe\__main__.py", line 7, in <module>
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\cli\train.py", line 76, in train
    kwargs=extract_additional_arguments(args),
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 46, in train
    kwargs=kwargs,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\asyncio\base_events.py", line 484, in run_until_complete
    return future.result()
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 97, in train_async
    kwargs,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 184, in _train_async_internal
    kwargs=kwargs,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 241, in _do_training
    persist_nlu_training_data=persist_nlu_training_data,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 470, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\train.py", line 86, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\model.py", line 191, in train
    updates = component.train(working_data, self.config, **context)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\classifiers\embedding_intent_classifier.py", line 708, in train
    session_data = self.preprocess_train_data(training_data)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\classifiers\embedding_intent_classifier.py", line 684, in preprocess_train_data
    label_attribute=INTENT_ATTRIBUTE,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\classifiers\embedding_intent_classifier.py", line 411, in _create_session_data
    _sparse, _dense = self._extract_and_add_features(e, TEXT_ATTRIBUTE)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\classifiers\embedding_intent_classifier.py", line 303, in _extract_and_add_features
    f"Sequence dimensions for sparse and dense features "
ValueError: Sequence dimensions for sparse and dense features don't coincide in 'fabric in house' for attribute 'text'.

Tanja · January 2, 2020, 9:45am

@saurabh-m523 You need to update your config

pipeline:
- name: "SpacyNLP"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
  min_ngram: 1
  max_ngram: 3
  lowercase: true
  OOV_token: 'oov'
  return_sequence: true
- name: "SpacyFeaturizer"
  return_sequence: true
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
  loss_type: "margin"
- name: "ResponseSelector"

I added return_sequence: True to the CountVectorsFeaturizer. return_sequence tells the featurizer to return a feature vector per token. If you want to use it, you need to set it on all featurizers; otherwise the dimensions of the features in the EmbeddingIntentClassifier do not match.
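To illustrate why the dimensions stop matching, here is a minimal sketch in plain Python. It is not Rasa's code: `featurize`, `combine`, and the feature sizes are hypothetical stand-ins. It assumes a featurizer returns one row per token when return_sequence is true and a single row for the whole message otherwise.

```python
from typing import List


def featurize(tokens: List[str], dim: int, return_sequence: bool) -> List[List[float]]:
    """Hypothetical featurizer: one row per token, or one row per message."""
    rows = len(tokens) if return_sequence else 1
    return [[0.0] * dim for _ in range(rows)]


def combine(sparse: List[List[float]], dense: List[List[float]]) -> List[List[float]]:
    """Concatenate features column-wise; the row counts must agree."""
    if len(sparse) != len(dense):
        raise ValueError(
            f"Sequence dimensions don't coincide: {len(sparse)} != {len(dense)}"
        )
    return [s + d for s, d in zip(sparse, dense)]


tokens = "fabric in house".split()

# Mixed settings: 1 row of sparse features vs. 3 rows of dense features.
try:
    combine(featurize(tokens, 5, False), featurize(tokens, 4, True))
    mixed_failed = False
except ValueError as err:
    mixed_failed = True
    print(err)

# Same setting everywhere: 3 rows each, so concatenation works.
combined = combine(featurize(tokens, 5, True), featurize(tokens, 4, True))
print(len(combined), len(combined[0]))
```

With mixed settings the row counts differ and the concatenation is rejected; with the same setting on both featurizers each token gets one combined row.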

saurabh-m523 · January 2, 2020, 10:55am

I thought return_sequence: True is required only for ‘Dense featurizer’ (according to doc).

If you want to pass custom features, such as pre-trained word embeddings, to CRFEntityExtractor , you can add any dense featurizer (except ConveRTFeaturizer ) to the pipeline before the CRFEntityExtractor . Make sure to set "return_sequence" to True for the corresponding dense featurizer.

Since CountVectorsFeaturizer is a ‘Sparse featurizer’ (doc), I didn’t add return_sequence: True to it.

Now it gives the following error:

ValueError: Cannot concatenate sparse features as sequence dimension does not match: 7 != 1. Make sure to set 'return_sequence' to the same value for all your featurizers.

Here is the complete traceback:

Traceback (most recent call last):
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\saurabhd\AppData\Local\Programs\Python\Python36\Scripts\rasa.exe\__main__.py", line 7, in <module>
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\cli\train.py", line 76, in train
    kwargs=extract_additional_arguments(args),
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 46, in train
    kwargs=kwargs,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\asyncio\base_events.py", line 484, in run_until_complete
    return future.result()
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 97, in train_async
    kwargs,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 184, in _train_async_internal
    kwargs=kwargs,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 241, in _do_training
    persist_nlu_training_data=persist_nlu_training_data,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\train.py", line 470, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\train.py", line 86, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\model.py", line 191, in train
    updates = component.train(working_data, self.config, **context)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\featurizers\sparse_featurizer\regex_featurizer.py", line 67, in train
    self._text_features_with_regex(example, attribute)
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\featurizers\sparse_featurizer\regex_featurizer.py", line 76, in _text_features_with_regex
    message, extras, feature_name=SPARSE_FEATURE_NAMES[attribute]
  File "c:\users\saurabhd\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\featurizers\featurizer.py", line 67, in _combine_with_existing_sparse_features
    f"Cannot concatenate sparse features as sequence dimension does not "
ValueError: Cannot concatenate sparse features as sequence dimension does not match: 7 != 1. Make sure to set 'return_sequence' to the same value for all your featurizers.

Just out of curiosity, I tried adding return_sequence: True to RegexFeaturizer, but with no luck. It gives a slightly different error: ValueError: Cannot concatenate sparse features as sequence dimension does not match: 1 != 0. Make sure to set 'return_sequence' to the same value for all your featurizers.

Tanja · January 2, 2020, 12:01pm

If you want to add custom features to your CRFEntityExtractor you need to add return_sequence: True to one of the sparse featurizers. However, if you set return_sequence: True on one featurizer you need to also set it for all other featurizers, otherwise the training of the EmbeddingIntentClassifier will fail. So, setting return_sequence: True for the RegexFeaturizer is correct. Sorry, I missed it earlier.
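The "same value on every featurizer" rule can be checked before training. The sketch below is only an illustration: the helper names are hypothetical, the pipeline is written as a Python list instead of being loaded from config.yml, and it assumes that a featurizer without the option behaves as return_sequence: false.

```python
from typing import Any, Dict, List, Tuple


def return_sequence_settings(pipeline: List[Dict[str, Any]]) -> List[Tuple[str, bool]]:
    """List (name, return_sequence) for every component whose name ends in 'Featurizer'."""
    return [
        # Assumption: a missing option means return_sequence is false.
        (component["name"], bool(component.get("return_sequence", False)))
        for component in pipeline
        if component["name"].endswith("Featurizer")
    ]


def is_consistent(pipeline: List[Dict[str, Any]]) -> bool:
    """True if all featurizers share the same return_sequence value."""
    values = {value for _, value in return_sequence_settings(pipeline)}
    return len(values) <= 1


original = [
    {"name": "SpacyNLP"},
    {"name": "SpacyTokenizer"},
    {"name": "CountVectorsFeaturizer"},
    {"name": "SpacyFeaturizer", "return_sequence": True},
    {"name": "RegexFeaturizer"},
    {"name": "CRFEntityExtractor"},
]

# Same pipeline, with the option set on every featurizer.
fixed = [
    {**c, "return_sequence": True} if c["name"].endswith("Featurizer") else c
    for c in original
]

print(return_sequence_settings(original))
print(is_consistent(original), is_consistent(fixed))
```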

I used your pipeline

language: en

pipeline:
- name: "SpacyNLP"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
  min_ngram: 1
  max_ngram: 3
  lowercase: true
  OOV_token: 'oov'
  return_sequence: true
- name: "SpacyFeaturizer"
  return_sequence: true
- name: "RegexFeaturizer"
  return_sequence: true
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
  loss_type: "margin"
- name: "ResponseSelector"

to train the formbot example (rasa/examples/formbot) and it trained without any errors. Can you please do the same? If that works for you, the problem seems to be related to the data. Not sure why exactly. Any chance you can share part of your data? And can you please share the complete log? Thanks.

saurabh-m523 · January 28, 2020, 9:52am

Hi @Tanja! Sorry for the late reply. I figured out that this happens when I include a custom TrainingDataImporter in my config. Below is my complete config:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
- name: "SpacyNLP"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
  min_ngram: 1
  max_ngram: 3
  lowercase: true
  OOV_token: oov
  return_sequence: true
- name: "SpacyFeaturizer"
  return_sequence: true
- name: "RegexFeaturizer"
  return_sequence: true
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
  loss_type: "margin"
- name: "ResponseSelector"
- name: "retrieval_action_fallback.ResponseThreshold"


# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: "MemoizationPolicy"
    max_history: 21
  - name: "KerasPolicy"
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 21
      state_featurizer:
        - name: BinarySingleStateFeaturizer
  - name: "MappingPolicy"
  - name: "FormPolicy"
  - name: "FallbackPolicy"
    nlu_threshold: 0.6
    core_threshold: 0.3
    fallback_action_name: "action_default_fallback"


importers:
- name: "RasaFileImporter"
- name: "my_training_data_importer.MyImporter"
  project_name: "pro"

Since a config file is required by the importer, I have provided the below config:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
- name: "SpacyNLP"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
  min_ngram: 1
  max_ngram: 3
  lowercase: true
  OOV_token: oov
  return_sequence: true
- name: "SpacyFeaturizer"
  return_sequence: true
- name: "RegexFeaturizer"
  return_sequence: true
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
  loss_type: "margin"
- name: "ResponseSelector"
- name: "retrieval_action_fallback.ResponseThreshold"


# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: "MemoizationPolicy"
    max_history: 21
  - name: "KerasPolicy"
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 21
      state_featurizer:
        - name: BinarySingleStateFeaturizer
  - name: "MappingPolicy"
  - name: "FormPolicy"
  - name: "FallbackPolicy"
    nlu_threshold: 0.6
    core_threshold: 0.3
    fallback_action_name: "action_default_fallback"

I am importing NLU training data for the response selector from another folder. I can’t think of any reason why including a custom importer would break the training.

It is training fine without the importer. Please help.

Tanja · January 28, 2020, 10:34am

Are you still getting the above error? Or something else?

saurabh-m523 · January 28, 2020, 10:45am

Getting the same error.

thusithaC · January 29, 2020, 11:10am

I seem to have the same issue.

f"Cannot concatenate sparse features as sequence dimension does not "
ValueError: Cannot concatenate sparse features as sequence dimension does not match: 0 != 1. Make sure to set 'return_sequence' to the same value for all your featurizers.

My config is as below.

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: "en"
pipeline:
- name: "custom.synonym_replacer_module.SynonymReplacerModule"
  language: "en"
- name: "SpacyNLP"
  model: "spacy/en/wiki"
- name: "SpacyTokenizer"
- name: "RegexFeaturizer"
  return_sequence: True
- name: "SpacyFeaturizer"
  return_sequence: True
- name: "CRFEntityExtractor"
  features: [ ["low", "title", "upper"],
    [
        "bias",
        "low",
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "upper",
        "title",
        "digit",
        "pattern",
        "text_dense_features",
    ],
   ["low", "title", "upper"]]
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
  return_sequence: True
- name: "CountVectorsFeaturizer"
  return_sequence: True
  analyzer: "char_wb"
  min_ngram: 2
  max_ngram: 4
- name: "EmbeddingIntentClassifier"
  embed_dim: 20
  droprate: 0.2
  epochs: 75
  random_seed: 45
  C2: 0.001

Tanja · January 29, 2020, 11:40am

We simplified the code a bit and removed the option return_sequence. The featurizers and classifiers/extractors will take care of it. You don’t have to worry about adding any additional option anymore. We are going to release a new version of Rasa with this update today. Can you please update to Rasa 1.7.0 once it is out and try again? Please let me know how it went. Thanks.

Tanja · January 29, 2020, 4:14pm

Sorry, my bad. The changes are not yet released, they will be released (most likely) with Rasa 1.8.0. I’ll take a closer look tomorrow, might have an idea why this is happening.

saurabh-m523 · January 29, 2020, 4:40pm

Just saying, when I trained it on rasa 1.5, it trained without any errors. This error occurs in rasa 1.6.0 and 1.6.1.

JiteshGaikwad · January 30, 2020, 4:17am

Hey @Tanja, the error still occurs. I am getting the same error even after updating rasa to version 1.7.0. Below is the error I got:

ValueError: Cannot concatenate sparse features as sequence dimension does not match: 1 != 0. Make sure to set 'return_sequence' to the same value for all your featurizers.

my pipeline config:

language: en

pipeline:
- name: "SpacyNLP"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
  min_ngram: 1
  max_ngram: 3
  lowercase: true
  OOV_token: 'oov'
  return_sequence: true
- name: "SpacyFeaturizer"
  return_sequence: true
- name: "RegexFeaturizer"
  return_sequence: true
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
  loss_type: "margin"
- name: "ResponseSelector"

Another thing: changelog entry #4978 mentions the return_sequence option of the featurizers, but I didn’t find any proper documentation on how to use it. I checked the documentation for RegexFeaturizer and CountVectorsFeaturizer (since both of them are sparse featurizers), but I couldn’t find out which values need to be set for the return_sequence option.

Correct me if I missed something in the documentation.

JiteshGaikwad · January 30, 2020, 4:42am

Hey @Tanja, I made some changes to the pipeline. It works if I specify only RegexFeaturizer or only CountVectorsFeaturizer, but if I specify both of them it fails while training RegexFeaturizer. Below are the pipelines that work and the one that fails:

Worked (only CountVectorsFeaturizer used):

language: en
pipeline:
- name: SpacyNLP
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: CountVectorsFeaturizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: EmbeddingIntentClassifier
- name: ResponseSelector

Worked (only RegexFeaturizer used):

language: en
pipeline:
- name: SpacyNLP
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: RegexFeaturizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: SklearnIntentClassifier
- name: ResponseSelector

Failed (both CountVectorsFeaturizer and RegexFeaturizer used):

language: en
pipeline:
- name: SpacyNLP
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: CountVectorsFeaturizer
- name: RegexFeaturizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: EmbeddingIntentClassifier
- name: ResponseSelector

Let me know if I am missing something in the configuration.

Thanks.

saurabh-m523 · January 30, 2020, 5:12am

This will be fixed in rasa 1.8.0, not in 1.7.0.

Tanja · January 30, 2020, 8:27am

I’m sorry for the confusion, but I mixed up a couple of things.

In the Rasa 1.7.0 release we removed the option return_sequence (see Releases · RasaHQ/rasa · GitHub). You don’t need to set the option anymore in your config file. The featurizers and classifiers will take care of it.

@JiteshGaikwad We added a paragraph to TextFeaturizer on the Components page. Does that help?

Maybe a general explanation: We are currently developing a sequence model to classify entities and intents. In preparation for that, we needed to adapt our featurizers. In Rasa 1.5.0 all featurizers would just return one feature vector for the whole message. However, sequence models need a feature vector per token. That is the reason we introduced this change. Now, in Rasa 1.7.0, all featurizers will return a feature vector per token plus an additional feature vector for the complete message. Thus, the classifier/extractor can decide what vector to use. If the model is a sequence model it will use the feature vectors of the tokens, otherwise the feature vector of the complete message. Hope that clarifies things a bit.
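The layout described above can be sketched in a few lines of Python. This is only an illustration, not Rasa's implementation: `featurize` and `select_features` are hypothetical, the toy features are made up, and placing the whole-message vector in the last row is an assumption of this example.

```python
from typing import List

Vector = List[float]


def featurize(tokens: List[str]) -> List[Vector]:
    """Hypothetical featurizer: one vector per token plus a final message vector."""
    # Toy features: token length and whether the token is title-cased.
    token_vectors = [[float(len(t)), float(t.istitle())] for t in tokens]
    # Illustrative choice: the message vector is the column-wise sum of the token vectors.
    message_vector = [sum(column) for column in zip(*token_vectors)]
    return token_vectors + [message_vector]


def select_features(features: List[Vector], sequence_model: bool) -> List[Vector]:
    """Sequence models take the per-token rows; other models take the message row."""
    return features[:-1] if sequence_model else features[-1:]


features = featurize(["fabric", "in", "house"])

per_token = select_features(features, sequence_model=True)
per_message = select_features(features, sequence_model=False)

print(len(features), len(per_token), per_message)
```

Because every featurizer produces the same number of rows for a message, the features can always be concatenated, and each downstream component picks the rows it needs.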

@saurabh-m523 Do you still get the error after upgrading to Rasa 1.7.0? If yes, can you please share the stack trace again? I want to see when exactly the error occurs.

@JiteshGaikwad Can you please also share your stack trace? I’ll update the error message as the return_sequence option does not exist anymore.

saurabh-m523 · January 30, 2020, 10:15am

Hi @Tanja! Yes I’m still getting the same error on rasa 1.7.0.

Here is my config:

language: en
pipeline:
- name: "SpacyNLP"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
  min_ngram: 1
  max_ngram: 3
  lowercase: true
  OOV_token: 'oov'
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
  loss_type: "margin"
- name: "ResponseSelector"
- name: "retrieval_action_fallback.ResponseThreshold"

Here is the complete traceback:

Training NLU model...
2020-01-30 15:35:22 INFO     absl  - Entry Point [tensor2tensor.envs.tic_tac_toe_env:TicTacToeEnv] registered with id [T2TEnv-TicTacToeEnv-v0]
initialised the class
2020-01-30 15:35:23 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en'
2020-01-30 15:35:39 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en'.
2020-01-30 15:35:39 INFO     rasa.nlu.selectors.embedding_response_selector  - Retrieval intent parameter was left to its default value. This response selector will be trainedon training examples combining all retrieval intents.
2020-01-30 15:35:44 INFO     rasa.nlu.training_data.training_data  - Training data stats:
        - intent examples: 15809 (11 distinct intents)
        - Found intents: 'affirm', 'greet', 'get_costing', 'get_dpr_status', 'deny', 'get_fg_status', 'out_of_scope', 'make_po', 'inform', 'mark_activity_close', 'faq'
        - Number of response examples: 834 (85 distinct response)
        - entity examples: 14848 (13 distinct entities)
        - found entities: 'po_detail_type_raw_number', 'num_type', 'po_type', 'user_garment_text_description', 'po_activity', 'po_item_type', 'activity_name', 'raw_vendor_name', 'thing_to_make', 'id', 'item_type', 'info_type', 'po_detail_type'

2020-01-30 15:35:45 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2020-01-30 15:36:05 INFO     rasa.nlu.model  - Finished training component.
2020-01-30 15:36:05 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2020-01-30 15:36:06 INFO     rasa.nlu.model  - Finished training component.
2020-01-30 15:36:06 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\utils\common.py:351: UserWarning: The out of vocabulary token 'oov' was configured, but could not be found in any one of the NLU message training examples. All unseen words will be ignored during prediction.
  More info at https://rasa.com/docs/rasa/nlu/components/#countvectorsfeaturizer
2020-01-30 15:36:19 INFO     rasa.nlu.model  - Finished training component.
2020-01-30 15:36:19 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
Traceback (most recent call last):
  File "C:\Users\saurabhd\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\saurabhd\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\Saurabh\bot_demo\bot_staging\venv\Scripts\rasa.exe\__main__.py", line 9, in <module>
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\cli\train.py", line 140, in train_nlu
    persist_nlu_training_data=args.persist_nlu_data,
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\train.py", line 414, in train_nlu
    persist_nlu_training_data,
  File "C:\Users\saurabhd\AppData\Local\Programs\Python\Python36\lib\asyncio\base_events.py", line 484, in run_until_complete
    return future.result()
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\train.py", line 445, in _train_nlu_async
    persist_nlu_training_data=persist_nlu_training_data,
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\train.py", line 474, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\nlu\train.py", line 86, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\nlu\model.py", line 191, in train
    updates = component.train(working_data, self.config, **context)
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\nlu\featurizers\sparse_featurizer\regex_featurizer.py", line 60, in train
    self._text_features_with_regex(example, attribute)
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\nlu\featurizers\sparse_featurizer\regex_featurizer.py", line 69, in _text_features_with_regex
    message, extras, feature_name=SPARSE_FEATURE_NAMES[attribute]
  File "e:\saurabh\bot_demo\bot_staging\venv\lib\site-packages\rasa\nlu\featurizers\featurizer.py", line 62, in _combine_with_existing_sparse_features
    f"Cannot concatenate sparse features as sequence dimension does not "
ValueError: Cannot concatenate sparse features as sequence dimension does not match: 1 != 0. Make sure to set 'return_sequence' to the same value for all your featurizers.

Tanja · January 30, 2020, 1:04pm

I just tried to reproduce the problem but failed. I guess the problem is related to your data.

Can you please try the following: Use the following pipeline to train a model with the formbot example data (rasa/examples/formbot at master · RasaHQ/rasa · GitHub).

pipeline:
- name: "SpacyNLP"
  case_sensitive: false
- name: "SpacyTokenizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
  min_ngram: 1
  max_ngram: 3
  lowercase: true
  OOV_token: 'oov'
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "EmbeddingIntentClassifier"
  loss_type: "margin"

Is that working for you? If yes, the problem must be related to your data. Can you send me your data or a part of it that is not working? That would help me to debug the problem further. Thanks for the help!

JiteshGaikwad · January 31, 2020, 3:21am

Hey @Tanja, sorry for the late reply. Thanks for the clarification. My bad, I didn’t read it properly. The TextFeaturizer doc helped me understand that. I have attached a screenshot of the error trace, let me know if this helps:

saurabh-m523 · January 31, 2020, 12:10pm

Hi @Tanja!

I have created this project repo. Please clone it and begin training to reproduce the problem (on Rasa 1.7.0).

Thanks and Regards

Saurabh Kumar

saurabh-m523 · February 3, 2020, 5:09am

Hi @Tanja!

Were you able to reproduce the problem?
