How to configure token_pattern for CountVectorsFeaturizer in config.yml?

journey · July 22, 2019, 12:02pm

In config.yml I want to set token_pattern: r'(?u)\b\w+\b' for Chinese under the CountVectorsFeaturizer component, but it doesn't work.

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: "zh"
pipeline:
- name: "JiebaTokenizer"
  dictionary_path: "jieba_dict"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
  token_pattern: r'(?u)\b\w+\b'
- name: "EmbeddingIntentClassifier"

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy
  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    core_threshold: 0.3
    fallback_action_name: 'action_default_fallback'

CountVectorsFeaturizer reads my configured value r'(?u)\b\w+\b' as a plain string, not as a regex. Training then fails in the train method and ends up in the exception branch:

        try:
            # noinspection PyPep8Naming
            X = self.vectorizer.fit_transform(lem_exs).toarray()
        except ValueError:
            self.vectorizer = None  # execution ends up here
            return
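
For reference, here is a minimal sketch of what the pattern is meant to do, using only Python's re module. The sample sentence is made up, and it assumes the featurizer applies token_pattern to the Jieba tokens joined by spaces. An unquoted YAML value is taken literally, so the leading r and the quotes become part of the string. A single-quoted YAML scalar such as '(?u)\b\w+\b' (without the r prefix) keeps the backslashes as-is and is probably what the component needs, though I have not verified that against Rasa itself.

```python
import re

# Hypothetical Jieba output, joined with spaces.
text = "我 想 订 机票"

# scikit-learn's default token_pattern requires at least two word characters.
default_pattern = r"(?u)\b\w\w+\b"
# The pattern from the post accepts single-character tokens as well.
wanted_pattern = r"(?u)\b\w+\b"

default_tokens = re.findall(default_pattern, text)
wanted_tokens = re.findall(wanted_pattern, text)

# An unquoted YAML scalar is taken literally, so this is the string the
# component would receive: the r and both quotes are ordinary characters.
received = r"r'(?u)\b\w+\b'"

print(default_tokens)               # single-character words are dropped
print(wanted_tokens)                # every word is kept
print(received == wanted_pattern)   # the two strings are not the same
```

With the default pattern only the two-character word survives, which is why the custom pattern matters for Chinese.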
