Is it possible to train two intent classifiers on different datasets?

In my example the classifiers are DIET and KNN over Faiss. DIET is good enough to handle general, well linearly separable intents, e.g. hello, yes, no and so on. I need KNN to handle the overlapping intents used in my custom policy.

So my problem is how to attach different sources to different classifiers in the pipeline config.


is how to attach different sources to different classifiers in pipeline config

What do you mean by this? Are you asking how to choose between the predictions of different classifiers? Did you write a custom classifier for KNN, and are the embeddings pre-trained?

I need two classifiers and one NLU pipeline.

# config.yml

pipeline:
- name: WhitespaceTokenizer
- name: LanguageModelFeaturizer
  model_name: "bert"
  model_weights: "rasa/LaBSE"
- name: FaissClassifier
  prefix: faiss
- name: DIETClassifier
  prefix: diet

For this example I train the DIET classifier on intents that start with the “diet_” prefix (e.g. diet_hello), and FaissClassifier trains on intents that start with the “faiss_” prefix (e.g. faiss_skill_cities).

# nlu.yml:

nlu:
- intent: faiss_skill_cities
  examples: |
    - run game cities
    - lets play in cities game
    - go play a game cities

- intent: diet_hello
  examples: |
    - good morning
    - hi
    - hello

I want to split my NLU data so that different classifiers train on different data.

The NLU result stays as before: intent_ranking = sorted(knn_intent_ranking + diet_intent_ranking) and intent = intent_ranking[0].
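A minimal sketch of that merge step, assuming each ranking is a list of dicts with "name" and "confidence" keys and that the confidences of the two classifiers are comparable. The helper name `merge_intent_rankings` and the confidence values are made up for illustration:

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def merge_intent_rankings(
    knn_ranking: List[Prediction], diet_ranking: List[Prediction]
) -> List[Prediction]:
    """Concatenate both rankings and sort by confidence, highest first."""
    return sorted(
        knn_ranking + diet_ranking,
        key=lambda p: p["confidence"],
        reverse=True,
    )


# Illustrative values only, not real model output.
knn_intent_ranking = [
    {"name": "faiss_skill_cities", "confidence": 0.62},
]
diet_intent_ranking = [
    {"name": "diet_hello", "confidence": 0.91},
    {"name": "diet_yes", "confidence": 0.05},
]

intent_ranking = merge_intent_rankings(knn_intent_ranking, diet_intent_ranking)
intent = intent_ranking[0]
```

Note that a plain `sorted()` on dicts would raise a `TypeError`, so the sort needs an explicit key such as the confidence.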

So my question: is there any other workaround to do the same?

I’m curious why not train on both? And then use a custom component to select the intent?

There are some reasons:

  1. We train the faiss intents every hour, but the diet intents once a day or even less often. We have around 20 predefined diet intents and we don't change them much. So we train on different sources.
  2. The faiss intents often overlap due to domain and context limitations, and the diet intents do not. The diet intents are linearly separable because they are domain- and context-agnostic. So we train on different sources.

We have 20k intents with 5-10 examples each, so we solve a ranking task instead of a classification task. In my experiments DIET fails on the ranking task, so we use a custom classifier based on Faiss.

I’m curious why not train on both? And then use a custom component to select the intent?

There is no problem selecting the intent. The problem is training on different sources. When we use the 20k dataset to train DIET, it fails to solve the classification task, as expected.

Thank you a lot for your questions and your help!

Oh, I see. There is no out-of-the-box solution for this. You can either:

  1. Add logic to the train methods of your components which selectively ignores training data (you would need to extend DIET to add this). It sounds like you are doing something similar already.

OR:

  2. You can pre-train the faiss component separately. This is somewhat similar to how the spaCy components (for example SpacyFeaturizer) work: they are not actually trained within Rasa. This may be a bit cleaner if you’re training the faiss component more frequently.
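As a rough illustration of the first option, the selection logic could look like the sketch below. It works on plain dicts rather than Rasa's training data objects, and `select_examples` is a hypothetical helper, not part of Rasa; in a real component the same filter would be applied to the training data inside the train method before fitting:

```python
from typing import Dict, List


def select_examples(
    examples: List[Dict[str, str]], prefix: str
) -> List[Dict[str, str]]:
    """Keep only the examples whose intent name starts with '<prefix>_'."""
    marker = f"{prefix}_"
    return [ex for ex in examples if ex["intent"].startswith(marker)]


# Examples taken from the nlu.yml snippet earlier in the thread.
examples = [
    {"intent": "faiss_skill_cities", "text": "run game cities"},
    {"intent": "faiss_skill_cities", "text": "lets play in cities game"},
    {"intent": "diet_hello", "text": "good morning"},
    {"intent": "diet_hello", "text": "hi"},
]

# Each classifier would only see its own slice of the data.
diet_examples = select_examples(examples, "diet")
faiss_examples = select_examples(examples, "faiss")
```

The prefix would come from the component's own config (the `prefix` key in the pipeline above), so both classifiers can share one nlu.yml.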