Custom training data load into DIET

statimo · October 20, 2021, 5:32pm

Hello

I am trying to build a custom training data loader for the DIET Classifier. I guess it’s the method _create_model_data that loads the training data for training, but I’m not sure.

In my case I don’t want to change the training data and I want to use the pipeline in the usual way. I have training data with multiple intents but the DIET classifier should only train on the second intent (and skip the first one). Does anyone have a hint where I can change the way the intent labels are loaded into the DIET Classifier?

Thanks a lot!

statimo · October 21, 2021, 8:19am

Or do I need to code a custom tokenizer that selects the “right intent” for training the DIET Classifier?

koaning · October 27, 2021, 11:41am

Before diving into technicalities here, is there a reason why you need this “second intent” feature? What goal are you trying to accomplish in your conversation?

statimo · October 28, 2021, 8:55am

Hi, I want to train two models on the same messages. Each model is trained to find a different intent, under the assumption that every message has two intents. It is also for research purposes.

I tried to select the first intent via the tokenizer:

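    # Requires: from typing import List, Text
    def _tokenize_on_split_symbol(self, text: Text) -> List[Text]:
        # Keep only the first sub-intent; wrap it in a list to match the return type.
        words = (
            [text.split(self.intent_split_symbol)[0]]
            if self.intent_tokenization_flag
            else [text]
        )

        return words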

But nothing happens: I still get both intents in the DIET Classifier during training, not only the first one selected via [0]…
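
A side note on the tokenizer approach, independent of Rasa: indexing the result of `str.split` gives a single string, whereas a method annotated to return a list of tokens should return a list. The sketch below shows the difference in plain Python. `first_part_as_list` is a hypothetical helper and `+` is assumed to be the split symbol. Whether changing the tokenization affects the label set DIET trains on is a separate question that this sketch does not answer.

```python
from typing import List


def first_part_as_list(label: str, split_symbol: str = "+") -> List[str]:
    """Return the first sub-label wrapped in a list (hypothetical helper)."""
    return [label.split(split_symbol)[0]]


label = "intent_order+product_pressurecooker"

# Indexing the split result gives a plain string, not a list of tokens.
as_string = label.split("+")[0]

# Wrapping it keeps the list-of-strings contract that callers expect.
as_list = first_part_as_list(label)
```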

koaning · October 29, 2021, 12:36pm

Have you seen our portion on multi intent classification?

statimo · October 29, 2021, 2:38pm

Hi,

yes, I’ve seen everything. The Multi-Intent-Classification is not an option for me.

That’s why I want to do it this way.

koaning · November 1, 2021, 9:01am

Right. In that case, Rasa does not natively allow for classifiers that detect two intents. We have the “multi-intent trick” but to my knowledge that’s it. An alternative might be to use end-to-end learning but that’s more like “intent-less” action prediction.

statimo · November 1, 2021, 9:23am

I know! I don’t want to use a classifier that detects two intents. That’s why I need to do a custom load into the classifier…

In my case I don’t want to change the training data and I want to use the pipeline in the usual way. I have training data with multiple intents but the DIET classifier should only train on the second intent (and skip the first one).

statimo · November 1, 2021, 9:26am

I can’t find how the data gets into the classifier.

I thought that with multi-intents I could use the tokenizer to get only one. But it doesn’t work (see above).

koaning · November 1, 2021, 10:17am

It’s unclear what exactly you need here. Could you explain your use-case a bit more? As in, could you describe the kind of virtual assistant you’re trying to create and what needs to happen? That would help me understand what is broken.

statimo · November 1, 2021, 11:01am

I have training data with multiple intents (abc+xyz). I will use two pipelines, one for each intent, so I need a custom training data load into the DIET Classifiers.

Simple example: I want to order something to do fast cooking. (intent_order+product_pressurecooker)

The first pipeline will detect main intents, the second will detect products in that case. It’s just an example, but I hope you understand the idea behind that.

What I need:

Rasa classes are highly interrelated and I struggle to find the way the training data/messages get loaded into the DIET Classifier. I tried it via the tokenizer, but it doesn’t work. I want to split the intents in the training data and only use the first/second one.
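
The split described here can be sketched outside of Rasa as a preprocessing step that builds one single-intent copy of the data per pipeline. This is only an illustration under stated assumptions: examples are represented as plain dicts with `text` and `intent` keys, `+` is the separator, and `split_multi_intent` and `datasets_per_pipeline` are hypothetical names, not Rasa functions.

```python
from typing import Dict, List, Tuple

Example = Dict[str, str]


def split_multi_intent(label: str, split_symbol: str = "+") -> Tuple[str, str]:
    """Split 'abc+xyz' into ('abc', 'xyz'); raise if there are not exactly two parts."""
    parts = label.split(split_symbol)
    if len(parts) != 2:
        raise ValueError(f"expected two sub-intents in {label!r}")
    return parts[0], parts[1]


def datasets_per_pipeline(
    examples: List[Example],
) -> Tuple[List[Example], List[Example]]:
    """Build one copy of the examples per pipeline, each with a single-intent label."""
    first: List[Example] = []
    second: List[Example] = []
    for example in examples:
        main_intent, sub_intent = split_multi_intent(example["intent"])
        # Copy each example so the original training data stays unchanged.
        first.append({**example, "intent": main_intent})
        second.append({**example, "intent": sub_intent})
    return first, second


examples = [
    {
        "text": "I want to order something to do fast cooking.",
        "intent": "intent_order+product_pressurecooker",
    },
]
main_data, product_data = datasets_per_pipeline(examples)
```

Each resulting list could then be handed to its own pipeline, so neither classifier ever sees the combined label.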

koaning · November 1, 2021, 2:11pm

In your example though … why not have an intent buy and then have the product of interest be an entity? That way, a user can indicate that they’re interested in buying multiple items in a single utterance.

statimo · November 1, 2021, 3:06pm

It was just an example and even here: You cannot use entities. “Something to cut vegetables very fast”, “an instrument to find metal in the ground” are not entities in the usual sense.

I am informed about entities, multi-intent classification and so on.

I described the information I need. Could you please help me out?

statimo · November 9, 2021, 1:08pm

Maybe a more specific question: Is it enough to modify the method preprocess_train_data to filter the intents? Or do I need to modify the training data even earlier?
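
To make the "filter inside preprocessing" option concrete, here is a rough sketch of the override pattern. It uses a hypothetical stand-in class, not Rasa's actual `DIETClassifier`, and assumes that the preprocessing step is where the label-to-index mapping is built. If other components read the intent labels earlier, the rewrite would have to happen before that point instead.

```python
import copy
from typing import Dict, List


class StandInClassifier:
    """Hypothetical stand-in for a classifier; NOT Rasa's DIETClassifier."""

    def preprocess_train_data(self, examples: List[Dict[str, str]]) -> Dict[str, int]:
        # Build a label -> index mapping from whatever labels arrive here.
        labels = sorted({example["intent"] for example in examples})
        return {label: index for index, label in enumerate(labels)}


class SecondIntentClassifier(StandInClassifier):
    """Rewrites labels to the second sub-intent before the parent sees them."""

    split_symbol = "+"  # assumed separator

    def preprocess_train_data(self, examples: List[Dict[str, str]]) -> Dict[str, int]:
        rewritten = copy.deepcopy(examples)  # leave the shared training data intact
        for example in rewritten:
            parts = example["intent"].split(self.split_symbol)
            # Fall back to the full label when there is no second part.
            example["intent"] = parts[1] if len(parts) > 1 else parts[0]
        return super().preprocess_train_data(rewritten)


examples = [
    {"text": "order a pressure cooker", "intent": "intent_order+product_pressurecooker"},
    {"text": "order a blender", "intent": "intent_order+product_blender"},
]
label_index = SecondIntentClassifier().preprocess_train_data(examples)
```

Because the rewrite works on a deep copy, a second classifier in another pipeline could apply the same pattern with the first sub-intent without either one affecting the other.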
