Error while training Rasa with the small English model (en_core_web_sm) and SklearnIntentClassifier

Error while training with rasa train nlu:

Training NLU model...
2021-01-27 14:30:42 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en_core_web_sm'
2021-01-27 14:30:43 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en_core_web_sm'.
2021-01-27 14:30:43 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2021-01-27 14:30:43 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 6 (2 distinct intents)
2021-01-27 14:30:43 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'greet', 'inform'
2021-01-27 14:30:43 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2021-01-27 14:30:43 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 4 (2 distinct entities)
2021-01-27 14:30:43 INFO     rasa.shared.nlu.training_data.training_data  -   Found entity types: 'account_number', 'name'
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Finished training component.
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Finished training component.
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Finished training component.
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Finished training component.
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Starting to train component CRFEntityExtractor
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Finished training component.
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Finished training component.
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Starting to train component RegexEntityExtractor
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Finished training component.
2021-01-27 14:30:43 INFO     rasa.nlu.model  - Starting to train component SklearnIntentClassifier
Traceback (most recent call last):
  File "/root/nlp-search/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/cli/train.py", line 159, in train_nlu
    domain=args.domain,
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/train.py", line 470, in train_nlu
    domain=domain,
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/utils/common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/train.py", line 512, in _train_nlu_async
    additional_arguments=additional_arguments,
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/train.py", line 547, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/nlu/train.py", line 114, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/nlu/model.py", line 204, in train
    updates = component.train(working_data, self.config, **context)
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/nlu/classifiers/sklearn_intent_classifier.py", line 111, in train
    for example in training_data.intent_examples
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/nlu/classifiers/sklearn_intent_classifier.py", line 111, in <listcomp>
    for example in training_data.intent_examples
  File "/root/nlp-search/lib/python3.6/site-packages/rasa/nlu/classifiers/sklearn_intent_classifier.py", line 133, in _get_sentence_features
    "No sentence features present. Not able to train sklearn policy."
ValueError: No sentence features present. Not able to train sklearn policy.
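The traceback ends in SklearnIntentClassifier._get_sentence_features, which raises when a training example carries no sentence features. The following is a simplified, hypothetical model of that interaction, not Rasa's actual code. It assumes (based on Rasa 2.x's SpacyFeaturizer) that the featurizer attaches nothing when the loaded spaCy model reports zero-length word vectors. The names Message, featurize and train are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical, stripped-down stand-in for a Rasa training message."""
    text: str
    sentence_features: Optional[List[float]] = None


def featurize(message: Message, vectors_length: int) -> None:
    """Hypothetical stand-in for SpacyFeaturizer.

    Assumption: if the spaCy model has no word vectors (vectors_length == 0),
    no features are attached to the message at all.
    """
    if vectors_length == 0:
        return
    # Toy sentence feature of the right width; a real featurizer pools
    # the per-token word vectors instead.
    n_tokens = len(message.text.split())
    message.sentence_features = [float(n_tokens)] * vectors_length


def get_sentence_features(message: Message) -> List[float]:
    """Mirrors the check that raises at the end of the traceback above."""
    if message.sentence_features is not None:
        return message.sentence_features
    raise ValueError("No sentence features present. Not able to train sklearn policy.")


def train(texts: List[str], vectors_length: int) -> List[List[float]]:
    """Featurize every example, then collect one sentence vector per example."""
    messages = [Message(text) for text in texts]
    for message in messages:
        featurize(message, vectors_length)
    return [get_sentence_features(m) for m in messages]
```

Calling train(["Hey", "Hi Sara"], vectors_length=0) raises the same ValueError as the log, while any non-zero width (as with a model that ships word vectors) returns one feature vector per example.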

Below are the pipeline configuration and the training data I used.

Pipeline configuration (I do not want to use DIETClassifier):

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
  - name: SpacyNLP
    model: en_core_web_sm
    case_sensitive: false
  - name: SpacyTokenizer
    intent_tokenization_flag: True
    intent_split_symbol: " "
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
    "features": [
      ["low", "title", "upper"],
      [
        "bias",
        "pattern",
      ],
      ["low", "title", "upper"],
    ]
  - name: EntitySynonymMapper
  - name: "RegexEntityExtractor"
  - name: SklearnIntentClassifier

Training data:

version: "2.0"

nlu:
- regex: account_number
  examples: |
    - \d{10,12}

- intent: inform
  examples: |
    - my account number is [1234567891](account_number)
    - This is my account number [1234567891](account_number)

- intent: greet
  examples: |
    - Hey
    - Hi
    - hey there [Sara](name)
    - Hi [Sara](name)
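The regex entry above is a plain regular expression, so Python's re module gives a quick way to see what \d{10,12} will and will not match. This is only an approximation: it assumes the pattern is applied as an unanchored search over the message text, and Rasa's own regex handling may differ in details such as word boundaries.

```python
import re

# The pattern from the "regex: account_number" entry in the training data.
ACCOUNT_NUMBER = re.compile(r"\d{10,12}")

samples = [
    "my account number is 1234567891",      # 10 digits: matches
    "This is my account number 123456789",  # 9 digits: too short, no match
    "card 123456789012345",                 # 15 digits: unanchored, matches the first 12
]

for text in samples:
    match = ACCOUNT_NUMBER.search(text)
    print(f"{text!r} -> {match.group(0) if match else None}")
```

Because the pattern is unanchored, a longer digit run still yields a 12-digit match; wrapping it as \b\d{10,12}\b would reject such runs, if that is the intended behavior.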

It works fine when I replace the small model (en_core_web_sm) with the medium one (en_core_web_md). Any pointers on what is causing the issue?

Regards, Priyanka


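Sounds like a limitation of the spaCy model itself. The small English model (en_core_web_sm) does not ship with static word vectors, while en_core_web_md and en_core_web_lg do. SpacyFeaturizer builds its features from those vectors, so with the small model no sentence features are produced, and SklearnIntentClassifier, which expects them, fails with the error above. If you want to keep this pipeline, sticking with en_core_web_md (or en_core_web_lg) is the simplest fix.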
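To confirm this on your own machine, you can check whether a model ships word vectors by inspecting nlp.vocab.vectors_length, which is 0 for a pipeline without static vectors. A minimal sketch, assuming spaCy and the model packages are installed (it reports any that are not); the helper names are hypothetical.

```python
from typing import Optional


def vectors_length(model_name: str) -> Optional[int]:
    """Return the word-vector width of an installed spaCy model, or None if unavailable."""
    try:
        import spacy
    except ImportError:
        return None
    try:
        nlp = spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found.
        return None
    return nlp.vocab.vectors_length


def has_word_vectors(width: Optional[int]) -> bool:
    """True only for a non-zero vector width (0 or None means no usable vectors)."""
    return bool(width)


if __name__ == "__main__":
    for name in ("en_core_web_sm", "en_core_web_md"):
        width = vectors_length(name)
        if width is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}: vectors_length={width}, word vectors: {'yes' if has_word_vectors(width) else 'no'}")
```

With both packages installed, the small model is expected to report 0 and the medium model a non-zero width, which matches the behavior described in the question.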