Semantic Hashing with DIETClassifier

Hi! I am trying to understand best practices for the DIETClassifier. For simplicity, I will only consider intent classification, not NER. Here is my config:

```yaml
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
```

As you can see, I followed the recommended defaults and added two CountVectorsFeaturizers: one at the word level and another at the character level. The inclusion of character n-grams should lead to a better approximation of out-of-vocabulary words and also add robustness against misspellings at inference time.
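To illustrate what I mean, here is a minimal sketch using scikit-learn directly (not Rasa internals), with the same analyzer settings as the second featurizer above. A typo still shares most of its character n-grams with the correct word:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Same analyzer settings as the char-level CountVectorsFeaturizer above.
vec = CountVectorizer(analyzer="char_wb", ngram_range=(1, 4))
analyze = vec.build_analyzer()

correct = set(analyze("goodbye"))
typo = set(analyze("goodbey"))  # a misspelling at inference time

shared = correct & typo
print(f"shared n-grams: {len(shared)} of {len(correct | typo)}")
# Most n-grams survive the typo, so the two sparse vectors stay close.
```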

@koaning I would love to see a video lecture on Semantic Hashing. I also have a couple of questions:

  • Are the features from the two CountVectorsFeaturizers simply concatenated?
  • How does the positional encoding work with this?

Thanks!

I’m not 100% familiar with the term “semantic hashing”, but after googling it, it sounds like it’s essentially what most embedding models do: they “hash” tokens to a numeric representation where the distance between the hashed tokens is a proxy for similarity.

To answer your questions:

  1. Yes, they are simply concatenated. All sparse features are concatenated together, and so are all dense features. The diagram below (from this blogpost) shows this nicely; there is also a small code sketch of the concatenation after this list.

  2. To clarify, you’re talking about the positional encoding in the transformer layer of DIET? If so, it’s worth pointing out that the effect of the positional encoding isn’t that great when we’re dealing with short texts. Second, the positional encoding applies to the tokens going in. It’s effectively just “a vector that we add depending on whether the token is the 1st or the nth token in the utterance” (see the second sketch below for the classic recipe). Feel free to ask for more details, but just to check, have you seen this part of our series on the attention mechanism?
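To make the concatenation concrete, here is a minimal sketch using scikit-learn and scipy directly, so not DIET’s actual code; the two vectorizers stand in for the two CountVectorsFeaturizers in your pipeline:

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer

texts = ["hello there", "good morning"]

# Stand-ins for the word-level and char-level CountVectorsFeaturizers.
word_vec = CountVectorizer(analyzer="word").fit(texts)
char_vec = CountVectorizer(analyzer="char_wb", ngram_range=(1, 4)).fit(texts)

word_feats = word_vec.transform(texts)  # sparse, (2, n_word_features)
char_feats = char_vec.transform(texts)  # sparse, (2, n_char_features)

# The two sparse matrices are simply stacked side by side.
combined = sp.hstack([word_feats, char_feats])
print(combined.shape)  # (2, n_word_features + n_char_features)
```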
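And here is a rough sketch of the classic sinusoidal positional encoding from the original transformer paper. DIET’s exact implementation may differ, but the idea of adding a position-dependent vector to each token embedding is the same:

```python
import numpy as np

def positional_encoding(seq_len: int, dim: int) -> np.ndarray:
    """Classic sinusoidal encoding: one fixed vector per position."""
    positions = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    freqs = np.exp(-np.log(10000.0) * (np.arange(0, dim, 2) / dim))
    enc = np.zeros((seq_len, dim))
    enc[:, 0::2] = np.sin(positions * freqs)                      # even dims
    enc[:, 1::2] = np.cos(positions * freqs)                      # odd dims
    return enc

# Each token's embedding gets the vector for its position added to it.
token_embeddings = np.random.randn(5, 16)  # 5 tokens, embedding dim 16
with_positions = token_embeddings + positional_encoding(5, 16)
```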

Thanks, @koaning! Very helpful. I actually got the term Semantic Hashing and the corresponding paper directly from the Rasa code: rasa/count_vectors_featurizer.py at 41e3b227101e6ace3f85c2d99a7f48f4528a8b93 · RasaHQ/rasa · GitHub
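For anyone who lands here later: the subword semantic hashing trick that paper describes boils down to roughly the following. This is a rough sketch; the exact boundary marker and n-gram size are my assumptions, so check the paper for specifics:

```python
def semantic_hash(token: str, n: int = 3) -> list[str]:
    # Wrap the token in boundary markers, then emit character n-grams.
    # These subword pieces act as the "hash" features for counting.
    padded = f"#{token}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(semantic_hash("hello"))
# ['#he', 'hel', 'ell', 'llo', 'lo#']
```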