I am trying to switch over from the spacy_sklearn pipeline to the supervised_embeddings pipeline because it seems to handle my domain-specific words better. Intent classification has improved, but entity extraction now fails to pick up all the required entities — it sometimes ignores entities it should have extracted. I don’t think the amount of data is the problem: I have 1,000 training examples per intent, and upping it to 10,000 made no difference. Anyone have any idea why the entity extraction isn’t working correctly?
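For context, if I understand the Rasa 1.x docs correctly, the `supervised_embeddings` shorthand expands to roughly this component list (my reading of the docs, not my exact config):

```yaml
language: en
pipeline:
  - name: WhitespaceTokenizer        # splits on whitespace only
  - name: RegexFeaturizer
  - name: CRFEntityExtractor         # handles entity extraction
  - name: EntitySynonymMapper
  - name: CountVectorsFeaturizer
  - name: EmbeddingIntentClassifier  # trains embeddings on your own data
```

Note the tokenizer here is `WhitespaceTokenizer`, not spaCy's tokenizer.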
Hey @emayor, can you share your pipeline configuration?
I actually figured it out — it seems the whitespace tokenizer in this pipeline doesn’t handle entities that have no spaces between them, e.g. ‘9k’, where 9 is the quantity and k is the unit. I retrained with punctuation stripped and proper spacing between value and unit, and everything seems to work now.
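That matches my understanding — here is a minimal standalone sketch (not Rasa's actual code) of why a pure whitespace split causes this: ‘9k’ survives as a single token, so an entity annotation covering only the ‘9’ or only the ‘k’ can never line up with a token boundary, and the extractor drops it.

```python
def whitespace_tokenize(text):
    """Toy stand-in for a whitespace-only tokenizer."""
    return text.split()

# '9k' stays one token, so a span covering just '9' (quantity)
# or just 'k' (unit) cannot align to any token:
print(whitespace_tokenize("add 9k to the order"))
# After adding a space, value and unit become separate tokens
# that entity annotations can align to:
print(whitespace_tokenize("add 9 k to the order"))
```

With the space added, ‘9’ and ‘k’ are separate tokens, which is why retraining with proper spacing fixed it.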