Train a classifier using results of a previously trained one

Hello everyone,

I have a setup with two classifiers: the built-in embedding_classifier and a custom one.

The custom one needs to analyze the results of the embedding classifier and decide on some confidence thresholds. This would be its “training” step. Right now I am using an ugly workaround: an external script does the analysis in a dedicated run and dumps a pickle file, which the custom classifier then reads at prediction time. I would like to move this into the “train” method of the class, so that I can do everything in one go and persist the confidence thresholds with the model.
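To make the workaround concrete, here is a minimal sketch of the two-step flow described above. Everything in it is hypothetical and only for illustration: `compute_thresholds`, `ThresholdClassifier`, the file name, the quantile rule, and the sample results are placeholders, not the actual script and not part of any library API.

```python
import pickle
import tempfile
from pathlib import Path


def compute_thresholds(results, quantile=0.1):
    """Hypothetical analysis step: for each intent, take a low quantile
    of the confidences reported by the first classifier."""
    by_intent = {}
    for intent, confidence in results:
        by_intent.setdefault(intent, []).append(confidence)
    thresholds = {}
    for intent, confidences in by_intent.items():
        ordered = sorted(confidences)
        index = int(quantile * (len(ordered) - 1))
        thresholds[intent] = ordered[index]
    return thresholds


class ThresholdClassifier:
    """Stand-in for the custom classifier; not a real library class."""

    def __init__(self, thresholds_path):
        # The thresholds come from the pickle written by the external run.
        with open(thresholds_path, "rb") as handle:
            self.thresholds = pickle.load(handle)

    def accept(self, intent, confidence):
        # Intents without a stored threshold are always rejected.
        return confidence >= self.thresholds.get(intent, float("inf"))


# Step 1: dedicated analysis run, dumping the thresholds to disk.
results = [("greet", 0.9), ("greet", 0.7), ("bye", 0.6), ("bye", 0.8)]
path = Path(tempfile.mkdtemp()) / "thresholds.pkl"
with open(path, "wb") as handle:
    pickle.dump(compute_thresholds(results), handle)

# Step 2: at prediction time, the custom classifier reads the pickle back.
classifier = ThresholdClassifier(path)
```

The drawback is visible in the sketch: the thresholds live in a separate file produced by a separate run, so they are not saved together with the trained model.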

Here are my questions:

  1. Is the training order guaranteed to be the order in which the classifiers are listed in the configuration? Can I be sure that when the training of the second classifier starts, the embedding_classifier has already been trained?
  2. How do I access the freshly trained embedding_classifier instance from within the custom classifier during the training session?

Thanks for your help,

Andrea.

I have been looking at the code for the Trainer class, and I think this is currently not supported.

As far as I understand, the Trainer instance creates all components in the pipeline and stores them in one of its members. The main problem is that (unless I am mistaken) the components themselves have no link back to the Trainer that created them. This makes it impossible at training time to access another component instance already trained by the same Trainer instance.
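The structure described above can be pictured with a toy model. This is only a sketch of my reading of the code, not the real classes: `ToyTrainer` and `ToyComponent` are made-up names, and the assumption that components are trained one after another in list order is exactly what is being asked about, not something confirmed.

```python
class ToyComponent:
    """Toy stand-in for a pipeline component (not the real class)."""

    def __init__(self, name):
        self.name = name
        self.trained = False
        # Note: no reference to the trainer or to sibling components.

    def train(self, training_data):
        # A real component would fit a model here; the toy only sets a flag.
        self.trained = True


class ToyTrainer:
    """Toy stand-in for the Trainer as described above."""

    def __init__(self, component_names):
        # The trainer creates every component and keeps them in a member.
        self.pipeline = [ToyComponent(name) for name in component_names]

    def train(self, training_data):
        # Assumption: components are trained sequentially in list order.
        for component in self.pipeline:
            component.train(training_data)


trainer = ToyTrainer(["embedding_classifier", "custom_classifier"])
trainer.train(training_data=[])

# The trainer can reach both components, but from inside the second
# component there is no attribute leading to the trainer or to the first.
custom = trainer.pipeline[1]
```

In this picture only the trainer holds the list, so the custom component has no way to look up the already trained embedding_classifier from inside its own `train` method.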

I also see that the ComponentBuilder has some caching capability. But again, I am not sure one can get from a Component back to the ComponentBuilder that created it, so this does not help in my case. Also, the embedding_classifier leaves the cache_key as None, so caching is disabled for it in any case.

Could one of the developers confirm whether I am mistaken?

Is this limitation intentional, or has the use case just never arisen?

Thanks for your help,

Andrea.