Question about how DIET processes new data

Hello, I am quite new to Rasa and right now I am trying to understand the DIET architecture.

I have read through some resources on this and I understand how we can calculate a loss through the DIET architecture to adjust the weights on the edges of the neural networks used.

But how do we get the entities and intents for a new, previously unseen sentence outside of the training phase?

For example: I ask the agent “Please give me the number of person X”

Do all the tokens outside of the training phase still get through all the components as shown in the widget (http://bl.ocks.org/koaning/raw/f40ca790612a03067caca2bde81e7aaf/)?

If yes, are we then iterating through all the entities our system has, putting them into the CRF, and then looking for the best fit?

Thanks for your help!

Sincerely botsi155

Hey @botsi155, welcome to the forum and apologies for the long wait!

When a previously trained DIET model is running in prediction mode, it takes a new sentence (in the form of tokens) and passes it through all the components except the ones that depend on knowing the intent and entities in that sentence. Looking at the DIET diagram you’ve linked to:

  • the nodes Intent: play_game, Entity: O, Entity: game_name won’t exist
  • the similarity and loss nodes also won’t run as they’re meant for comparing the predicted and the true intent/entities
  • additionally, at prediction time, no masking is applied, i.e. the __MASK__ input will instead be pong and, in turn, the pong node in the diagram won’t exist

At prediction time, the model knows (from training) which entity types and intents are possible. The CRF will consider all possible taggings of the tokens and will output the most “promising” one (such as the O; game_name; game_name tagging in the diagram). The transformer followed by the embedding layer will output a representation (embedding) of the sentence, which will be compared with the learned “prototypical” representations of all intents. The predicted intent will then be the one whose representation is the most similar to the sentence’s representation.
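To make the two prediction-time steps more concrete, here is a minimal NumPy sketch. It is not Rasa's actual code: the function names, scores, and embeddings are made up for illustration, and it assumes a linear-chain CRF decoded with the Viterbi algorithm plus a plain dot-product similarity for intents.

```python
import numpy as np


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence for one sentence.

    emissions:   (num_tokens, num_tags) score of each tag for each token
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    num_tokens, _ = emissions.shape
    score = emissions[0].copy()  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, num_tokens):
        # candidate[i, j]: best path ending in tag i at t-1, followed by tag j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(candidate.argmax(axis=0))
        score = candidate.max(axis=0)
    # Follow the backpointers from the best final tag to recover the path.
    best_path = [int(score.argmax())]
    for pointers in reversed(backpointers):
        best_path.append(int(pointers[best_path[-1]]))
    return best_path[::-1]


def predict_intent(sentence_embedding, intent_embeddings, intent_names):
    """Pick the intent whose learned embedding is most similar to the sentence."""
    similarities = intent_embeddings @ sentence_embedding  # dot product (assumed)
    return intent_names[int(similarities.argmax())]


# Hypothetical numbers for the sentence "play ping pong".
tag_names = ["O", "game_name"]
emissions = np.array([[2.0, 0.1], [0.2, 1.5], [0.3, 1.2]])
transitions = np.array([[0.5, 0.0], [0.0, 0.5]])
predicted_tags = [tag_names[i] for i in viterbi_decode(emissions, transitions)]

intent_names = ["play_game", "greet"]
intent_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
sentence_embedding = np.array([0.9, 0.2])
predicted_intent = predict_intent(sentence_embedding, intent_embeddings, intent_names)

print(predicted_tags, predicted_intent)
```

Note that the CRF step does not literally loop over every possible tagging: the dynamic programming in `viterbi_decode` finds the best one while only keeping, for each token, the best score per tag.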

Let me know if this helps :slight_smile:
