Entity extraction problem due to Slack adding link

Rasa Core version: 0.12.3

Python version: 3.6

Operating system: Windows, 64-bit

Issue: Cannot extract an entity from messages coming from Slack, because Slack recognizes the entity as a link/email address

Hey,

So I’m trying to connect my bot to Slack, which works fine. However, I have an email address entity that is recognized correctly when I test the bot locally but not when the message comes through Slack. The reason is that Slack delivers the email address as:

mailto:example@example.com|example@example.com

I found something about turning off unfurling or parsing, but I’m not sure how to do this with Rasa. My ideas for fixing this so far:

  • Manipulate the strings coming from Slack - but at which point can I do that?
  • Pass some arguments to the app to disable the link formatting above - but where can I do that?

Thanks!

For your first option (manipulate the strings coming from Slack - but at which point can I do that?), you can do it at two points:

  • A custom component in the pipeline. Note that it will manipulate all the strings, not only the ones from Slack (if you have the information about which channel the message comes from, you may be able to restrict it) (Custom Components)
  • A custom channel that handles messages coming from Slack and modifies each message before it goes into the pipeline (Chat & Voice platforms)
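To make the "only for Slack" idea concrete, here is a rough sketch in plain Python that deliberately uses no Rasa API. The names `clean_slack_text` and `preprocess` and the `channel` value `"slack"` are made up for illustration; how a custom channel or component actually learns where a message came from depends on your Rasa version.

```python
def clean_slack_text(text: str) -> str:
    """Replace tokens like 'mailto:a@b.com|a@b.com' with the part after '|'.

    Assumes the link is a whitespace-separated token; runs of whitespace
    are collapsed to single spaces as a side effect.
    """
    cleaned = []
    for token in text.split():
        # Slack may wrap the link in angle brackets (assumption); strip them.
        core = token.strip("<>")
        if core.startswith("mailto:") and "|" in core:
            # Keep only the display part after the first '|'.
            cleaned.append(core.split("|", 1)[1])
        else:
            cleaned.append(token)
    return " ".join(cleaned)


def preprocess(text: str, channel: str) -> str:
    """Only touch messages that came from Slack (hypothetical channel name)."""
    if channel == "slack":
        return clean_slack_text(text)
    return text
```

Messages from any other channel pass through unchanged, which avoids the side effect of rewriting every string in the pipeline.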

I don’t know about the second option. I had the same problem but didn’t manage to find a solution.


@lauraperge did you find a solution for this? I am facing a similar issue

@akelad can you help with this issue?

@erohmensing any ideas about this?

I think this would be a great PR for someone to contribute. It seems like we should manipulate the strings coming from Slack in the input channel. There is already some sanitisation there that removes the formatting of user tags: rasa/slack.py at master · RasaHQ/rasa · GitHub

@zain @huberrom @lauraperge would any of you be interested in working on this problem? It would probably be as simple as adding another regex to that sanitisation step that checks for the mailto: format and converts it back into a regular email address.
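For anyone picking this up, a regex along these lines might be a starting point. This is only a sketch, not the code in slack.py: `MAILTO_PATTERN` and `strip_mailto` are hypothetical names, and the optional angle brackets are an assumption about how the raw Slack payload wraps links.

```python
import re

# Matches 'mailto:address|label', optionally wrapped in angle brackets,
# and captures the label (the part after '|').
MAILTO_PATTERN = re.compile(r"<?mailto:[^|\s>]+\|([^\s>]+)>?")


def strip_mailto(text: str) -> str:
    """Replace Slack-style mailto links with the plain email address."""
    return MAILTO_PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    print(strip_mailto("my email is mailto:example@example.com|example@example.com"))
```

Text without a mailto: link is returned unchanged, so the substitution should be safe to run on every incoming Slack message.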