Is it possible to train two intent classifiers on different datasets?

artemsnegirev · August 11, 2021, 5:33am

In my setup the classifiers are DIET and a KNN classifier built on Faiss. DIET is good enough for general, linearly separable intents such as hello, yes, and no. I need KNN to handle the overlapping intents used in my custom policy.

So my question is: how do I attach different data sources to different classifiers in the pipeline config?

fkoerner · August 12, 2021, 7:31am

is how to attach different sources to different classifiers in pipeline config

What do you mean by this? Are you asking how to choose between the predictions of different classifiers? Did you write a custom classifier for KNN, and are the embeddings pre-trained?

artemsnegirev · August 12, 2021, 9:19am

I need two classifiers and one NLU pipeline.

# config.yml

pipeline:
- name: WhitespaceTokenizer
- name: LanguageModelFeaturizer
  model_name: "bert"
  model_weights: "rasa/LaBSE"
- name: FaissClassifier
  prefix: faiss
- name: DIET
  prefix: diet

In this example I train the DIET classifier on intents that start with the “diet_” prefix (e.g. diet_hello), and FaissClassifier trains on intents that start with the “faiss_” prefix.

# nlu.yml:

nlu:
- intent: faiss_skill_cities
  examples: |
    - run game cities
    - lets play in cities game
    - go play a game cities

- intent: diet_hello
  examples: |
    - good morning
    - hi
    - hello

I want to split my NLU data so that different classifiers train on different data.
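The prefix-based split described above can be sketched in plain Python. This is only an illustration of the idea, not Rasa's API: the `split_by_prefix` helper and the `(intent, text)` tuple format are assumptions made for the example, and a real component would apply the same filter to whatever training data object it receives.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (intent name, example text); hypothetical format


def split_by_prefix(
    examples: List[Example], prefixes: List[str]
) -> Dict[str, List[Example]]:
    """Group examples by the classifier prefix of their intent name."""
    buckets: Dict[str, List[Example]] = {prefix: [] for prefix in prefixes}
    for intent, text in examples:
        for prefix in prefixes:
            # "faiss" matches "faiss_skill_cities", "diet" matches "diet_hello"
            if intent.startswith(prefix + "_"):
                buckets[prefix].append((intent, text))
                break
    return buckets


examples = [
    ("faiss_skill_cities", "run game cities"),
    ("faiss_skill_cities", "lets play in cities game"),
    ("diet_hello", "good morning"),
    ("diet_hello", "hi"),
]

buckets = split_by_prefix(examples, ["faiss", "diet"])
# buckets["faiss"] would feed the KNN classifier, buckets["diet"] would feed DIET
```

Examples whose intent matches no prefix are dropped silently here; whether that is the right behaviour depends on the project.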

The NLU result stays the same as before:

intent_ranking = sorted(knn_intent_ranking + diet_intent_ranking)
intent = intent_ranking[0]

So my question is: are there any other workarounds to do the same thing?
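A minimal sketch of that merge step, assuming each ranking is a list of dicts with "name" and "confidence" keys. The intents faiss_skill_weather and diet_yes and all confidence values are made up for illustration.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]  # {"name": ..., "confidence": ...}


def merge_intent_rankings(
    knn_ranking: List[Prediction], diet_ranking: List[Prediction]
) -> List[Prediction]:
    """Concatenate both rankings and sort by confidence, highest first."""
    return sorted(
        knn_ranking + diet_ranking,
        key=lambda prediction: prediction["confidence"],
        reverse=True,
    )


# Hypothetical outputs of the two classifiers for one message
knn_intent_ranking = [
    {"name": "faiss_skill_cities", "confidence": 0.83},
    {"name": "faiss_skill_weather", "confidence": 0.41},
]
diet_intent_ranking = [
    {"name": "diet_hello", "confidence": 0.12},
    {"name": "diet_yes", "confidence": 0.05},
]

intent_ranking = merge_intent_rankings(knn_intent_ranking, diet_intent_ranking)
intent = intent_ranking[0]
```

Sorting needs an explicit key because dicts cannot be compared directly. Scores from a similarity search and from a neural classifier may not be on the same scale, so some calibration could be needed before merging them this way.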

fkoerner · August 17, 2021, 7:02am

I’m curious why not train on both? And then use a custom component to select the intent?

artemsnegirev · August 17, 2021, 7:21am

There are some reasons:

We retrain the faiss intents every hour, but the diet intents once a day or less often. We have around 20 predefined diet intents and rarely change them. So we train on different sources.
The faiss intents often overlap because of domain and context limitations, while the diet intents do not. The diet intents are linearly separable because they are domain- and context-agnostic. So we train on different sources.

We have 20k intents with 5-10 examples each, so we solve a ranking task instead of a classification task. In my experiments DIET fails on the ranking task, so we use a custom classifier based on Faiss.

I’m curious why not train on both? And then use a custom component to select the intent?

Selecting the intent is not a problem. The problem is training on different sources. When we use the 20k dataset to train DIET, it fails to solve the classification task, as expected.

Thank you very much for your questions and help!

fkoerner · August 18, 2021, 8:36am

Oh, I see. There is no out-of-the-box solution for this. You can either:

Add logic to the train methods of your components which selectively ignores training data (you would need to extend DIET to add this). It sounds like you are doing something similar already.

OR:

You can pre-train the faiss component separately. This is somewhat similar to how the spaCy components (for example SpacyFeaturizer) work: they are not actually trained within Rasa. This may be a bit cleaner if you’re training the faiss component more frequently.
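The second option could look roughly like the sketch below: an offline job builds and saves the index, and the component only loads it at inference time. To keep the example self-contained, a brute-force cosine search stands in for Faiss and a toy bag-of-words vector stands in for the LaBSE embeddings. All function names, the JSON file format, and faiss_skill_weather are assumptions for illustration.

```python
import json
import math
import os
import tempfile
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy L2-normalised bag-of-words vector (stand-in for a real encoder)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {token: c / norm for token, c in counts.items()}


def build_index(examples: List[Tuple[str, str]], path: str) -> None:
    """Offline step (e.g. the hourly job): embed examples and save them."""
    rows = [{"intent": intent, "vector": embed(text)} for intent, text in examples]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f)


def load_index(path: str) -> List[dict]:
    """Inference-time step: load the pre-built index, no training involved."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rank(index: List[dict], text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k nearest stored examples as (intent, cosine similarity)."""
    query = embed(text)
    scored = [
        (row["intent"], sum(w * row["vector"].get(tok, 0.0) for tok, w in query.items()))
        for row in index
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


examples = [
    ("faiss_skill_cities", "run game cities"),
    ("faiss_skill_cities", "lets play in cities game"),
    ("faiss_skill_weather", "what is the weather today"),
]
index_path = os.path.join(tempfile.mkdtemp(), "knn_index.json")
build_index(examples, index_path)   # runs outside of Rasa training
index = load_index(index_path)      # runs when the component is loaded
top = rank(index, "play cities game", k=1)
```

With this split, the index can be rebuilt as often as needed while the DIET model is retrained on its own schedule.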
