Using my own language tokenizer and custom classifiers

Hi. Suppose I have a Thai tokenizer and my own trained PyTorch classifiers. Let's say I have 2 classifiers for different purposes. Both read the same sentence and return their predictions as output1 and output2, as follows:

output1 = model1(["first_tokenized_word", "second_tokenized_word", ..., "n_th_tokenized_word"])
output2 = model2(["first_tokenized_word", "second_tokenized_word", ..., "n_th_tokenized_word"])
  1. How can I plug my models into the Rasa pipeline? I have no intention of training the models in Rasa, because the training data might be imbalanced and I can't use ImbalancedDatasetSampler there to fix it.

I have read the documentation, but could not understand it.

  2. I have seen pipelines with adjustable parameters, but the Component's constructor has no **kwargs. How can I supply my model configuration in nlu_config.yml for a model like this?
model = RNN(input_size=10, hidden_size=256, n_layer=2, dropout=0.2)

After some trial and error I figured out the answers to some of my questions. I had been confused by the PyTorch training loop.

To answer the first question:

  1. Create the file __init__.py so that my local directory becomes a module.
  2. Create the file sentiment.py, implement your own components in it, and set the pipeline like this:
    language: "en"

    pipeline:
    - name: "sentiment.MyComponentA"
    - name: "sentiment.MyComponentB"
  3. Instantiate model = Model() somewhere in the class (MyComponentA or MyComponentB).

  4. To complete train(), access the dataset and follow the PyTorch DataLoader pattern. The training set has to be read from training_data.training_examples:

     (Pdb) training_data.training_examples[0].text
     'hey'
     (Pdb) training_data.training_examples[0].data
     {'intent': 'greet'}
    
  5. Then save the model.
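
The steps above can be sketched roughly as follows. This is only a sketch under stated assumptions, not Rasa's actual implementation: MyComponentA is written as a plain Python class so that it runs without Rasa or PyTorch installed, whereas in a real project it would subclass Rasa's Component base class. The SimpleNamespace objects stand in for Rasa's training data, and the hyperparameter names in defaults, the samples attribute, and the JSON file layout are hypothetical. It also assumes that extra keys placed next to name in a pipeline entry reach the constructor as a component_config dict, which would be one way to pass model settings without **kwargs.

```python
import json
import os
from types import SimpleNamespace


class MyComponentA:
    """Plain-Python stand-in for a custom component defined in sentiment.py.

    Assumption: a real component would subclass Rasa's Component base class.
    """

    # Hypothetical hyperparameters; keys given in the pipeline entry override them.
    defaults = {"input_size": 10, "hidden_size": 256, "n_layer": 2, "dropout": 0.2}

    def __init__(self, component_config=None):
        # Merge the per-pipeline settings over the class-level defaults.
        self.component_config = {**self.defaults, **(component_config or {})}
        self.label_to_index = {}
        self.samples = []

    def train(self, training_data, cfg=None, **kwargs):
        # Read (text, intent) pairs from training_data.training_examples.
        pairs = [
            (ex.text, ex.data["intent"])
            for ex in training_data.training_examples
            if "intent" in ex.data
        ]
        # Map each intent name to an integer class index.
        labels = sorted({intent for _, intent in pairs})
        self.label_to_index = {label: i for i, label in enumerate(labels)}
        # These pairs are what a PyTorch Dataset/DataLoader would be built from.
        self.samples = [(text, self.label_to_index[intent]) for text, intent in pairs]

    def persist(self, file_name, model_dir):
        # Save what is needed to rebuild the component; model weights would go here too.
        target = file_name + ".json"
        with open(os.path.join(model_dir, target), "w", encoding="utf-8") as f:
            json.dump(
                {"config": self.component_config, "label_to_index": self.label_to_index},
                f,
            )
        return {"file": target}


# Stand-in for the training data object seen in the Pdb session above.
training_data = SimpleNamespace(
    training_examples=[
        SimpleNamespace(text="hey", data={"intent": "greet"}),
        SimpleNamespace(text="bye", data={"intent": "goodbye"}),
    ]
)

component = MyComponentA({"hidden_size": 128})
component.train(training_data)
print(component.samples)  # [('hey', 1), ('bye', 0)]
```

Inside train(), the samples list could then be wrapped in a torch.utils.data.Dataset, and a sampler such as ImbalancedDatasetSampler could be passed to the DataLoader through its sampler argument, since the training loop stays under your own control.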