What is the most robust and easy way of reusing common language knowledge when providing data for intent detection? I think most bot authors need to handle the many similar ways of phrasing certain questions, and from what I see, the current best practice is just copy-pasting or generating many variations for each project.
For example, ‘expressing interest in’ could be a generic intent, to be tuned to each bot author’s domain, but right now each project needs to list ‘could you tell me more about’, ‘I would like to hear about’, ‘I am interested in…’ and countless other ways of saying essentially the same thing. The same goes for greetings and chitchat, which are currently provided as copy-pasted data.
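To make the idea concrete, here is a rough sketch of the kind of reuse I mean. It is not something any framework offers as far as I know: `GENERIC_INTENTS`, `build_training_data` and the `{topic}` placeholder are all hypothetical names made up for illustration, and the output is just a list of (text, intent) pairs that would still need converting to whatever format a given NLU tool expects.

```python
# Hypothetical shared library of generic intents (illustrative names only,
# not taken from any real framework).
GENERIC_INTENTS = {
    "express_interest": [
        "could you tell me more about {topic}",
        "I would like to hear about {topic}",
        "I am interested in {topic}",
    ],
    "greet": ["hello", "hi there", "good morning"],
}


def build_training_data(generic_intents, topics):
    """Expand shared templates with project-specific topics.

    Returns a list of (text, intent) pairs.
    """
    examples = []
    for intent, templates in generic_intents.items():
        for template in templates:
            if "{topic}" in template:
                # Domain tuning: fill the slot with each topic of this project.
                for topic in topics:
                    examples.append((template.format(topic=topic), intent))
            else:
                # Templates without a slot (greetings, chitchat) are reused as-is.
                examples.append((template, intent))
    return examples


# Each project would only supply its own topics.
examples = build_training_data(GENERIC_INTENTS, ["pricing", "opening hours"])
for text, intent in examples:
    print(f"{intent}: {text}")
```

This only moves the copy-pasting into one shared place. It still enumerates phrasings by hand rather than using what a language model already knows about their similarity, which is the part I am really asking about.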
From what I understand, even if modern language models can tell that such phrases are very similar, there’s no easy or practical way of leveraging that in intent detection at the moment.