I am very new to Rasa and NLP. I have spent time reading the documentation and watching videos, and have finally gathered the courage to get my hands dirty with chatbot creation.
I am trying to make a chatbot for a medical shop. It will provide information about medicines and suggest doses etc. based on the person’s profile.
I have a basic chatbot ready, but the issue is that it does not recognize medicine names beyond the ones I have trained it on. I need to tell the chatbot about all the medicines, which is a big list.
I can generate a list of medicines, or I can collect articles about medicines from the net. But I am not sure how to train the chatbot on this list.
At present my pipeline looks like this (same as created by default)
As messages from users will be in English, I want to continue using the pre-trained data, but I wish to extend it with my domain-specific words.
Is there any other way to do it? I thought of using a lookup table, but I am not sure whether it is a good idea, given that there can be thousands of medicines and people can write the words of a name in a different order.
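For reference, a lookup table in Rasa's YAML training data is just a named list of examples, so a large medicine list can be generated rather than typed by hand. The sketch below is a minimal illustration, not a definitive recipe: the entity name `med_name`, the format version `"3.1"`, and the helper `render_lookup_table` are assumptions made for this example. Lookup tables typically still need a few annotated training sentences alongside them.

```python
from typing import Iterable


def render_lookup_table(entity: str, names: Iterable[str], version: str = "3.1") -> str:
    """Render a Rasa NLU YAML fragment holding one lookup table.

    The version string is an assumption; use the one your Rasa release expects.
    """
    # Deduplicate case-insensitively while keeping the first spelling seen.
    seen = set()
    unique = []
    for name in names:
        cleaned = " ".join(name.split())  # collapse stray whitespace
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)

    lines = [f'version: "{version}"', "nlu:", f"- lookup: {entity}", "  examples: |"]
    lines.extend(f"    - {name}" for name in unique)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical medicine names, for illustration only.
    print(render_lookup_table("med_name", ["Paracetamol", "ibuprofen", "paracetamol "]))
```

The output can be written to a file in the `data/` folder of the project next to the other NLU data.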
I’m not an expert, but I use a lookup table for this. I already built one from the “French firstnames list of 2019”, so several thousand entries, and it works well for detecting the entity when associated with some intent training examples.
In the Rasa NLU examples repo, FlashText is used for lookups against large lists, using exact matches. I would imagine that medicine names are pretty unique most of the time.
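To make "exact match against a large list" concrete, here is a small standard-library sketch. It is not FlashText and not the Rasa NLU examples component, only a stand-in that shows the idea: each lookup entry is split into tokens, and the user message is scanned for the longest token sequence that equals an entry. The function names and the medicine names are made up for illustration.

```python
import re
from typing import Dict, List, Tuple

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep only alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def build_index(names: List[str]) -> Dict[Tuple[str, ...], str]:
    """Map each entry's token tuple to its canonical spelling."""
    return {tuple(tokenize(n)): n for n in names if tokenize(n)}


def extract_exact(text: str, index: Dict[Tuple[str, ...], str]) -> List[str]:
    """Return lookup entries found in text, preferring the longest match."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in index), default=0)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, then shrink it.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            hit = index.get(tuple(tokens[i:i + size]))
            if hit is not None:
                found.append(hit)
                i += size
                break
        else:
            i += 1  # no entry starts at this token
    return found


if __name__ == "__main__":
    index = build_index(["Vitamin C", "Vitamin C Forte", "Ibuprofen"])
    print(extract_exact("Do you have vitamin c forte or ibuprofen?", index))
    print(extract_exact("forte c vitamin", index))  # reordered words are missed
```

The second call shows the limitation raised in the question: exact matching finds nothing when the words of a name arrive in a different order.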
I tried regex; the issue is that the extracted entities do not always contain all the words of the lookup entry. I wanted a way to extract the whole lookup entry. From my understanding, I needed a way to featurize whole strings and use that to find the best match in the user’s input.
I followed the steps described here: https://towardsdatascience.com/give-some-semantic-love-to-your-keyword-search-c35f16df2ee
I created a file ‘action_helper.py’ containing a class LookupExtractor, which builds a model to extract the full text of an entry from the lookup table.
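The shape of such a class might look like the sketch below. It is not the code from the post: instead of spaCy embeddings it uses character trigram counts, purely so the example runs with the standard library alone, and the scoring rule and the `threshold` value are assumptions. Trigrams are counted per word, so the order of words in the message does not matter.

```python
import re
from collections import Counter
from typing import List, Optional

WORD = re.compile(r"[a-z0-9]+")


def char_trigrams(text: str) -> Counter:
    """Count character trigrams word by word, ignoring word order."""
    grams: Counter = Counter()
    for word in WORD.findall(text.lower()):
        padded = f" {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


class LookupExtractor:
    """Return the lookup entry that is best covered by the user's sentence."""

    def __init__(self, entries: List[str], threshold: float = 0.6) -> None:
        self.threshold = threshold  # assumed cut-off, tune on real data
        # Featurize every entry once, at construction time.
        self.entries = [(entry, char_trigrams(entry)) for entry in entries]

    def extract(self, sentence: str) -> Optional[str]:
        sent = char_trigrams(sentence)
        best, best_score = None, 0.0
        for entry, grams in self.entries:
            total = sum(grams.values())
            if total == 0:
                continue
            # Fraction of the entry's trigrams that also occur in the sentence.
            score = sum((grams & sent).values()) / total
            if score > best_score:
                best, best_score = entry, score
        return best if best_score >= self.threshold else None


if __name__ == "__main__":
    # Hypothetical lookup entries, for illustration only.
    extractor = LookupExtractor(["Paracetamol 500mg", "Ibuprofen", "Vitamin C Forte"])
    print(extractor.extract("I need 500mg paracetamol please"))
    print(extractor.extract("do you stock ibuprofin"))
    print(extractor.extract("hello there"))
```

Because the score is normalized by the length of the entry, a message that mentions only part of a long name may fall below the threshold. An embedding-based version would replace `char_trigrams` with a vector and the overlap with a similarity measure.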
In my Rasa chatbot, I fill the slots using a form, in which I ask the user for the various items.
My approach: when the bot receives a sentence from the user in answer to a form question, ValidateForm is called. In ValidateForm, I initialize the LookupExtractor, which creates an embedding vector for each lookup entry based on the model en_core_web_sm.
When ValidateForm is called for the lookup item, I pass the whole sentence to the LookupExtractor, and it returns the full text of the matching lookup entry. I set the extracted value in the slot “medName”.
This solution works, but I believe it is not optimal:
In the LookupExtractor initialization, I load en_core_web_sm using spaCy; the Rasa pipeline does the same, so the model is loaded twice.
When ValidateForm is called, the slot medName has already been filled by NLU. Then, using the lookup extraction, I fill the slot again with `return {"med_name": med_name}`. When running Rasa in interactive mode, I see that the slot has two values: the original one extracted by the NLU processor, and the value I set after the lookup extraction. I think this is not correct, because the older value will probably also influence the conversation.
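On the first point, one common pattern is to load the model once per process and reuse it, instead of loading it every time the validation code runs. The sketch below shows this with `functools.lru_cache`. The function `load_model` is a hypothetical stand-in for an expensive call such as `spacy.load("en_core_web_sm")`, so the example runs without spaCy installed. If the action server runs as its own process (the usual `rasa run actions` setup), it probably cannot share the pipeline's in-memory model, so this only avoids repeated loads inside the action server.

```python
from functools import lru_cache
from typing import Callable, Dict, List

# Counts how often the expensive load really happens (for demonstration only).
STATS: Dict[str, int] = {"loads": 0}


def load_model() -> Callable[[str], List[str]]:
    """Hypothetical stand-in for an expensive call such as spacy.load(...)."""
    STATS["loads"] += 1
    return lambda text: text.lower().split()


@lru_cache(maxsize=1)
def get_model() -> Callable[[str], List[str]]:
    """Load the model on first use and return the same object afterwards."""
    return load_model()


if __name__ == "__main__":
    first = get_model()
    second = get_model()
    # Both names refer to one object, and the load ran a single time.
    print(first is second, STATS["loads"])
```

An extractor built on top of `get_model()` can be cached the same way, so that the lookup entries are embedded once rather than on every form validation.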
Is there any way to remove the older value of the slot medName before setting the new name?
With my very minimal knowledge of Rasa, I think I need to write a custom featurizer and a custom classifier. Please correct me if I am wrong! If it is true, please point me to a simple example or tutorial. I could not find one that is easy enough to understand and use to create my own.
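As far as I understand Rasa, a slot holds a single value at a time, so the two values seen in interactive mode are probably two successive slot-set events, with the later one replacing the earlier. Under that assumption, the validation step only has to return the value it wants to keep. The sketch below isolates that logic in a plain function so it runs without `rasa_sdk`. The function name, the slot name `med_name`, and the toy extractor are assumptions. Returning `None` for a slot from a form validation method resets it, which makes the form ask for it again.

```python
from typing import Any, Callable, Dict, Optional


def validated_med_name(
    user_text: str,
    extract: Callable[[str], Optional[str]],
    slot_name: str = "med_name",  # assumed slot name
) -> Dict[str, Any]:
    """Build the dict a form validation method would return for this slot.

    The canonical lookup entry replaces whatever NLU extracted; None clears
    the slot when nothing in the lookup matches.
    """
    return {slot_name: extract(user_text)}


if __name__ == "__main__":
    # Toy extractor over hypothetical entries, for illustration only.
    known = {"paracetamol": "Paracetamol 500mg"}

    def toy_extract(text: str) -> Optional[str]:
        for word in text.lower().split():
            if word in known:
                return known[word]
        return None

    print(validated_med_name("i want paracetamol", toy_extract))
    print(validated_med_name("i want something", toy_extract))
```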