Problems with ConveRT hyperparameters

I’m trying to train my model with the ConveRT pipeline, and would like to explore the fine-tuning of the hyperparameters.

I’m getting this error:

Training model 'config_388' failed. Error: If embeddings are shared, text features and label features must coincide. Check the output dimensions of previous components.

with this config file:

language: en
pipeline:
- name: "ConveRTTokenizer"
- name: "ConveRTFeaturizer"
- name: "EmbeddingIntentClassifier"
  share_hidden_layers: True
  embed_dim: 30
  num_neg: 30
  similarity_type: 'auto'
  batch_strategy: 'sequence'
  loss_type: 'softmax'
  ranking_length: 10
  hidden_layers_sizes_b: [256, 128]

What can be the reason?

Thank you, Tiziano

@tiziano I believe you need to set the intent_tokenization_flag and intent_split_symbol parameters in your tokenizer, see here. This is because you set share_hidden_layers to true, so the intent labels need to be featurized too.
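For example, something along these lines (the "+" split symbol is just an illustration, use whatever separator your intent names actually contain):

```yaml
language: en
pipeline:
- name: "ConveRTTokenizer"
  intent_tokenization_flag: True
  intent_split_symbol: "+"
- name: "ConveRTFeaturizer"
- name: "EmbeddingIntentClassifier"
  share_hidden_layers: True
```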


Ok, thank you, I’ll try that!

I just tried setting intent_tokenization_flag to True, but I’m getting the same error :confused:

This is my config file:

language: en
pipeline:
- name: "ConveRTTokenizer"
  intent_tokenization_flag: True
- name: "ConveRTFeaturizer"
- name: "EmbeddingIntentClassifier"
  hidden_layers_sizes_b: [256, 128]
  share_hidden_layers: True
  embed_dim: 40
  num_neg: 30
  loss_type: 'margin'
  mu_pos: 0.9
  mu_neg: -0.5
  epochs: 100
  evaluate_every_num_epochs: 25
  use_max_sim_neg: False
  C2: 0.003
  C_emb: 0.8
  droprate: 0.1

Hidden layers cannot be shared when a dense featurizer (like ConveRTFeaturizer) is used.
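So with ConveRTFeaturizer in the pipeline, dropping the sharing-related options should let training run. A sketch, keeping the other values from the config above:

```yaml
language: en
pipeline:
- name: "ConveRTTokenizer"
- name: "ConveRTFeaturizer"
- name: "EmbeddingIntentClassifier"
  share_hidden_layers: False
  embed_dim: 40
  num_neg: 30
  loss_type: 'margin'
  mu_pos: 0.9
  mu_neg: -0.5
  epochs: 100
```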

Ok, thank you.

Can you please tell me where I can find more information about how the components work? In the docs each hyperparameter has a brief line of text explaining what it is, but often that’s not enough to really understand it.

As an example:

loss_type sets the type of the loss function, it should be either softmax or margin

What are softmax and margin? How do they work?

Thank you, Tiziano

Unfortunately, we don’t have a written-down explanation. The best way to understand them is to check the corresponding code.
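Roughly, for anyone finding this later: embedding-style classifiers score a similarity between the message embedding and each label embedding, and the loss decides how those similarities are trained. A minimal sketch of the two objectives (not Rasa's exact implementation; the mu_pos/mu_neg defaults here are only illustrative):

```python
import math

def softmax_loss(sim_pos, sim_negs):
    """Cross-entropy over similarities: treat the positive label as the
    correct class among [positive] + negatives, so the positive similarity
    is pushed up relative to all negatives at once."""
    logits = [sim_pos] + list(sim_negs)
    log_sum = math.log(sum(math.exp(l) for l in logits))
    return log_sum - sim_pos  # = -log softmax(sim_pos)

def margin_loss(sim_pos, sim_negs, mu_pos=0.8, mu_neg=-0.4):
    """Hinge-style (max-margin) loss: push the positive similarity above
    mu_pos and every negative similarity below -mu_neg; once those margins
    are satisfied, the loss is exactly zero."""
    loss = max(0.0, mu_pos - sim_pos)
    loss += sum(max(0.0, mu_neg + s) for s in sim_negs)
    return loss
```

The practical difference: the margin loss stops training an example once its margins are met, while the softmax loss keeps nudging similarities even for well-separated examples.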

I recommend using the softmax loss.

Ok thanks, I’ll do it

Why? I’ve done some tests on my data and margin resulted in a better F1-score.

It is theoretically the better loss function, but if the other one works better for you, then use that.
