I have a bot that uses fuzzy matching to pair suburbs and postcodes. These aren't currently part of any intent; instead they're parsed manually at certain points in the form. I also have an 'interrupt' intent that covers things like quit, restart, breaking out of the form, and so on.
The problem is that if I want good detection on single-word responses for suburbs, I need a large-ish training set (a lookup table isn't quite right because I need the fuzzy matching, and my understanding from the blog post is that that isn't available yet). But if I have a lot of training data for the suburbs, I end up with misclassifications for words like 'quit', because there are suburbs like 'Asquith' that are close matches. Single-word replies make this tricky, which is why for now I have relied on context, but this still led to some conflicts with the 'quit' intent.
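To make the 'quit' vs 'Asquith' collision concrete, here is a minimal sketch using the stdlib `difflib` similarity (as a stand-in for whatever fuzzy matcher the bot actually uses; the suburb list here is just an illustrative sample):

```python
import difflib

# A tiny illustrative slice of a ~10k suburb list.
suburbs = ["asquith", "ashfield", "penrith"]

# 'quit' is a contiguous substring of 'asquith', so character-level
# similarity is high: 2 * 4 matched chars / (4 + 7) total chars ≈ 0.73.
score = difflib.SequenceMatcher(None, "quit", "asquith").ratio()
print(score)  # ≈ 0.727

# With a typical fuzzy-match cutoff of 0.6, 'quit' is "matched" to a suburb.
print(difflib.get_close_matches("quit", suburbs, n=1, cutoff=0.6))
# → ['asquith']
```

Any character-distance matcher with a moderate threshold will show the same behaviour, which is why a pure similarity score can't separate the interrupt word from the suburb.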
I also played with a separate NLU classifier for just this purpose, discriminating either suburb vs everything else or quit vs everything else. The latter had fewer false positives but generalised very poorly.
Given that the number of suburbs (~10k) is almost always going to be far greater than the number of ways I can say 'quit', what do people advise? Also, are there larger databases for this sort of thing I should be looking at?
A rough workaround is to maintain a list of short 'quit' phrases and check against it before parsing the suburbs, but that list will likely have many gaps. A basic word-distance approach for this also fails pretty quickly.
Advice greatly appreciated!