In Rasa, chatbot users might enter their PII, but I don’t want to store this information in the SQLite tracker store. Is there any mechanism to scrub PII?
Currently there isn’t anything built in to handle this. What would this PII data look like? What would you store in the tracker store instead?
We are planning to store user conversations, but we can’t store PII from user input. Do you have any suggestions, even outside of Rasa?
I’d recommend building a middleware API layer that processes the user input and redacts the PII. You can redact it in two ways:
- Before sending it to the Rasa webhook. This ensures that no PII reaches Rasa, but any chat functionality that needs the PII to work will be lost.
- After sending it to the Rasa webhook: redact the PII when storing the conversation in a custom database table that tracks the chat history, and have a policy for deleting records from the tracker store. This retains all functionality while limiting PII exposure to the tracker-store records that have not been deleted yet.
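A minimal sketch of the second option, assuming a Python middleware that receives each user message, forwards it to Rasa's REST channel (`POST /webhooks/rest/webhook` with `sender` and `message` fields, default port 5005), and keeps its own redacted history. The table name `chat_history` and all helper names are hypothetical, and the redactor only masks e-mail addresses to keep the example short. The purge step assumes Rasa's `SQLTrackerStore` keeps events in an `events` table with a UNIX `timestamp` column; verify that against your Rasa version before deleting anything.

```python
import json
import re
import sqlite3
import time
import urllib.request
from typing import Callable, List, Optional

# Hypothetical minimal redactor: masks e-mail addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    return EMAIL_RE.sub("<EMAIL>", text)


def forward_to_rasa(
    sender_id: str,
    text: str,
    url: str = "http://localhost:5005/webhooks/rest/webhook",
) -> List[dict]:
    """Send a message to Rasa's REST channel and return the bot replies."""
    payload = json.dumps({"sender": sender_id, "message": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def init_history(conn: sqlite3.Connection) -> None:
    # Hypothetical custom table that only ever holds redacted text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_history "
        "(sender_id TEXT, text TEXT, timestamp REAL)"
    )


def handle_message(
    conn: sqlite3.Connection,
    sender_id: str,
    text: str,
    forward: Callable[[str, str], List[dict]] = forward_to_rasa,
) -> List[dict]:
    # Option 2: Rasa receives the original text, so PII-dependent features work.
    # (For option 1, pass redact(text) to forward() instead.)
    replies = forward(sender_id, text)
    # Only the redacted text is written to the custom history table.
    conn.execute(
        "INSERT INTO chat_history (sender_id, text, timestamp) VALUES (?, ?, ?)",
        (sender_id, redact(text), time.time()),
    )
    conn.commit()
    return replies


def purge_tracker_events(
    conn: sqlite3.Connection, retention_seconds: float, now: Optional[float] = None
) -> int:
    # Assumes Rasa's SQLTrackerStore schema: an `events` table with a UNIX
    # `timestamp` column. Verify this against your Rasa version first.
    cutoff = (time.time() if now is None else now) - retention_seconds
    cursor = conn.execute("DELETE FROM events WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```

In practice the purge would run on a schedule against the SQLite file your tracker store is configured to use (in `endpoints.yml`), while `chat_history` can live in a separate database. Deleting old events shortens the window in which raw PII exists without touching the redacted history you keep for analysis.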
Some options to redact PII data:
- deduce (GitHub: vmenger/deduce): de-identification method for Dutch medical text
- CommonRegex (GitHub: madisonmay/CommonRegex): a collection of common regular expressions bundled with an easy-to-use interface
- redact-pii (GitHub: solvvy/redact-pii): removes personally identifiable information from text
- Google Cloud Data Loss Prevention (DLP) documentation
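To illustrate the regex approach that libraries such as CommonRegex and redact-pii build on, here is a minimal hand-rolled sketch; it is not either library's API. It also suggests one answer to what could be stored instead of the raw values: typed placeholders such as `<EMAIL>`. The patterns and the `redact_pii` name are illustrative assumptions. They will miss some formats and can over-match (a date like `2024-01-15` matches the phone pattern), so a dedicated library or a managed service such as Cloud DLP is the more robust choice for production.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only; they will miss some formats and may over-match.
# Order matters: IP addresses are replaced before the looser phone pattern,
# which would otherwise also match strings like "10.0.0.1".
PII_PATTERNS: List[Tuple[str, re.Pattern]] = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact_pii(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each PII match with a typed placeholder and count replacements."""
    counts: Dict[str, int] = {}
    for label, pattern in PII_PATTERNS:
        text, n = pattern.subn(f"<{label}>", text)
        if n:
            counts[label] = n
    return text, counts


if __name__ == "__main__":
    print(redact_pii("Mail me at jane.doe@example.com or call +1 555-123-4567"))
```

Typed placeholders keep the redacted transcript readable (you can still see that the user supplied an e-mail address at that point), and the per-type counts make it easy to log how much PII was removed without logging the PII itself.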