Strange behaviour with tensorflow embedding - need advice

Some sentences that are in the training data for one intent are not recognized as that intent when I test on the very same sentence. The strange thing is that most of their words do not appear in the examples for other intents. I have 400 intent examples across 6 intents.

The only words the right intent and the misclassified intent share are "what" and "must". But these words are also common in the right intent examples…
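One way to double-check the word overlap described above is to compare the vocabularies of the two intents directly. The following is only a rough sketch: the intent names and example sentences are made up for illustration, and the words are split on whitespace and lowercased, which may differ from what the featurizer in the actual pipeline does.

```python
from collections import defaultdict


def vocab_by_intent(examples):
    """Map each intent to the set of lowercased, whitespace-split words in its examples."""
    vocab = defaultdict(set)
    for text, intent in examples:
        vocab[intent].update(text.lower().split())
    return dict(vocab)


def shared_words(examples, intent_a, intent_b):
    """Return the words that occur in the examples of both intents."""
    vocab = vocab_by_intent(examples)
    return vocab.get(intent_a, set()) & vocab.get(intent_b, set())


# Hypothetical toy data, not the real training set from this thread.
examples = [
    ("what must I bring to the exam", "exam_requirements"),
    ("what documents must be uploaded", "exam_requirements"),
    ("what must an invoice contain", "billing_info"),
]

print(sorted(shared_words(examples, "exam_requirements", "billing_info")))
```

If the overlap really is limited to a couple of very common words, looking at how many examples each intent has may be the next thing to compare.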

What could be going on here?

Is there any good advice to follow for good intent classification?

What pipeline are you using?

pipeline:
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
- name: "tokenizer_whitespace"
- name: "intent_entity_featurizer_regex"
- name: "ner_crf"

That’s strange. When you say "some", is it a large percentage? As always, it might help if you share your training data. But I don’t think we’ve ever encountered something like this before, where two completely different intents are misclassified.
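To put a number on "some", the training examples can be run back through the trained model and the mismatches counted. This is a minimal sketch under assumptions: `predict` is a hypothetical callable that returns the top intent name for a sentence (in practice it would wrap whatever call your trained model exposes), and `toy_predict` plus the example data are stand-ins invented for illustration.

```python
def misclassified(examples, predict):
    """Return (text, expected, predicted) for every example the model gets wrong."""
    errors = []
    for text, expected in examples:
        predicted = predict(text)
        if predicted != expected:
            errors.append((text, expected, predicted))
    return errors


def error_rate(examples, predict):
    """Fraction of examples whose predicted intent differs from the labelled one."""
    if not examples:
        return 0.0
    return len(misclassified(examples, predict)) / len(examples)


def toy_predict(text):
    """Stand-in for a real model (assumption): a trivial keyword rule."""
    return "billing_info" if "invoice" in text.lower() else "exam_requirements"


# Hypothetical labelled sentences, not the real training data.
training_examples = [
    ("what must I bring to the exam", "exam_requirements"),
    ("what documents must be uploaded", "exam_requirements"),
    ("what must an invoice contain", "billing_info"),
    ("what must I pay", "billing_info"),
]

for text, expected, predicted in misclassified(training_examples, toy_predict):
    print(f"{text!r}: expected {expected}, got {predicted}")
print(f"error rate on training data: {error_rate(training_examples, toy_predict):.0%}")
```

Sharing the resulting list of misclassified sentences together with the training data would make it much easier for others to reproduce the problem.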