Valid Custom Pipeline?

Hi, I was testing out some custom pipelines, namely adding the ConveRT featurizer to the supervised embeddings pipeline, and was wondering whether the format of the pipeline is correct.

# Configuration for Rasa NLU.

# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: ConveRTTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: ConveRTFeaturizer
  - name: EmbeddingIntentClassifier
    epochs: 300
    embed_dim: 20

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: FormPolicy
  - name: MemoizationPolicy
  - name: MappingPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.2
    core_threshold: 0.2
  - name: EmbeddingPolicy               # Recurrent Embedding Dialogue Policy, uses RNN for dialogue management
    epochs: 150
    max_history: 50
    batch_size: [32,64]
    featurizer:
      - name: MaxHistoryTrackerFeaturizer
        state_featurizer:
        - name: LabelTokenizerSingleStateFeaturizer
    augmentation_factor: 0

Hi, what version of Rasa are you using? If you are using Rasa 1.7.0 or later, you don’t need the WhitespaceTokenizer in there.

Yes, I am using 1.7.0. I’ve removed the WhitespaceTokenizer and the CountVectorsFeaturizers without any impact on accuracy. The accuracy is at 0.999, but the loss never goes below 0.7. Is this a worry, or does it not matter since the accuracy is so high?
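
For reference, the trimmed NLU pipeline described above would presumably look like the sketch below. It assumes that only the WhitespaceTokenizer and both CountVectorsFeaturizer entries were removed and that every other component and setting was left as in the original config.

```yaml
# Sketch of the reduced NLU pipeline (assumption: all remaining
# components and hyperparameters are unchanged from the original post).
language: en
pipeline:
  - name: ConveRTTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper
  - name: ConveRTFeaturizer
  - name: EmbeddingIntentClassifier
    epochs: 300
    embed_dim: 20
```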

It would be helpful to create a separate test data split and evaluate the model's accuracy on that split. You can use rasa data split nlu to create it.
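
A rough sketch of that workflow on the command line is shown below. The paths data/nlu.md and config.yml are assumptions based on the default project layout, and the 80/20 split is only an example value, so adjust them to your project.

```shell
# Split the NLU data into a training set and a held-out test set.
# Assumption: the NLU data lives in data/nlu.md (default project layout).
rasa data split nlu --nlu data/nlu.md --training-fraction 0.8 --out train_test_split

# Train an NLU model on the training portion only.
# Assumption: the pipeline shown above is saved as config.yml.
rasa train nlu --nlu train_test_split/training_data.md --config config.yml

# Evaluate the most recently trained model on the held-out portion.
rasa test nlu --nlu train_test_split/test_data.md
```

If the accuracy on the held-out split stays close to the training accuracy, the model is probably not just memorising the training examples.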