Hi! I am trying to understand best practices for the DIET classifier. I will only consider intent classification for simplicity and not NER. Here is my config:
```yaml
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
```
As you can see, I followed the recommended defaults and added two CountVectorsFeaturizers: one at the word level and another at the character level. Including character n-grams should give a better approximation of out-of-vocabulary words and also add robustness to misspellings at inference time.
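To illustrate what I mean about misspelling robustness, here is a small sketch using scikit-learn's `CountVectorizer` with the same `char_wb` analyzer and n-gram range (an assumption on my part: that this mirrors what Rasa's `CountVectorsFeaturizer` does internally):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Character n-grams of length 1 to 4, padded at word boundaries,
# matching the analyzer/min_ngram/max_ngram settings in the config above.
vec = CountVectorizer(analyzer="char_wb", ngram_range=(1, 4))
analyze = vec.build_analyzer()

def shared_ngrams(a: str, b: str) -> int:
    """Count the character n-grams two tokens have in common."""
    return len(set(analyze(a)) & set(analyze(b)))

# A misspelling like "helo" still shares many n-grams with "hello",
# so its sparse feature vector stays close to the correct spelling,
# while an unrelated word overlaps far less.
print(shared_ngrams("hello", "helo"), shared_ngrams("hello", "bye"))
```

This is why I expect the character-level featurizer to help at inference time even for tokens never seen during training.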
@koaning I would love to see a video lecture on Semantic Hashing. I also have a couple of questions:
- Are the features from the two CountVectorsFeaturizers combined, and if so, how?
- How does the positional encoding work with this?
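For context on that last question, here is my current understanding of the standard sinusoidal positional encoding from "Attention Is All You Need" (just a sketch; I am not sure DIET's transformer uses exactly this form, which is partly why I am asking):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding: one d_model-dim row per token position."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])     # even dims: sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])     # odd dims: cosine
    return enc

pe = positional_encoding(10, 16)  # added to the token features before attention
```

What I do not yet see is how this interacts with features built from bags of n-grams rather than learned embeddings.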