How to extract intent and entities from a tawk.to json file

ralex · May 4, 2020, 8:25am

Hi all, I am new to developing intelligent chatbots. My company has been collecting many conversations from tawk.to as JSON files. Each file holds a lot of information, but the most important part is the chat between the customer and the agent. Below is an example of what it looks like:

"messages": [
{
  "sender": {
    "t": "v"
  },
  "type": "msg",
  "time": "2020-04-15T07:35:19.452Z",
  "msg": "Salve, mi serve un certificato di diploma conforme agli articoli 23-24 direttiva europea 2005/36 prr il riconoscimento della qualifoca professionale di medico all estero. Come fare per ottenerlo?"
},
{
  "sender": {
    "t": "a",
    "n": "Operatore7"
  },
  "type": "msg",
  "time": "2020-04-15T07:37:58.348Z",
  "msg": "gentile studente, deve fare richiesta inviando una mail a: prova@dummy.it"
}]

As you can see, we have the sequence of messages between the customer (identified by the “v” value of the “t” key) and the agent (identified by the “a” value of the “t” key). Keep in mind this is just an example; a complete file generally contains many question-and-answer blocks within a single conversation. So my question is: how can I parse this custom JSON file to extract the “intent” and “entities”? We have hundreds of files, and it would be impossible to extract this information manually, one by one. Thanks for your attention.
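For the parsing half of the question, here is a minimal sketch of how the visitor messages might be pulled out of such files before any intent or entity labeling. It assumes each file is a JSON object with a top-level "messages" key shaped like the sample above; the function names `extract_turns` and `visitor_utterances` and the folder layout are hypothetical, not part of tawk.to or Rasa.

```python
import json
from pathlib import Path

# Sender codes as they appear in the sample: "v" = visitor, "a" = agent.
ROLE_BY_CODE = {"v": "visitor", "a": "agent"}


def extract_turns(chat):
    """Return (role, text) pairs for every chat message in one parsed export."""
    turns = []
    for message in chat.get("messages", []):
        if message.get("type") != "msg":
            continue  # skip anything that is not a chat message
        code = message.get("sender", {}).get("t")
        role = ROLE_BY_CODE.get(code, "unknown")
        turns.append((role, message.get("msg", "")))
    return turns


def visitor_utterances(folder):
    """Collect visitor messages from every *.json file in a folder (assumed layout)."""
    utterances = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as handle:
            chat = json.load(handle)
        utterances.extend(
            text for role, text in extract_turns(chat) if role == "visitor"
        )
    return utterances


# Shortened stand-in for the sample conversation above.
sample = {
    "messages": [
        {"sender": {"t": "v"}, "type": "msg", "msg": "Come fare per ottenerlo?"},
        {"sender": {"t": "a", "n": "Operatore7"}, "type": "msg", "msg": "Invii una mail."},
    ]
}
print(extract_turns(sample))
```

This only yields raw utterances. The intent and entity labels themselves are not present in the export, so they still have to come from somewhere else.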

mloubser · May 14, 2020, 7:48pm

Labeling unlabeled data is tricky: without a model that already has some knowledge of the intents and entities to look for, there is no rule to govern the labels. What you could do is bootstrap your model: decide on the main entities and intents in the corpus, label a small set of examples by hand, and train on them. Then use that model to predict intents for more data, manually correct the predictions, retrain on the corrected data to improve the model, and so on. In the end, though, you’re going to need to start by hand.
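The bootstrap loop described above can be sketched roughly as follows. This is only an illustration: a toy word-overlap scorer stands in for a real NLU model, none of it is Rasa API, and the intent names (`request_certificate`, `ask_fees`, `greet`), the seed sentences, and the 0.5 threshold are all made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def train(labelled):
    """Build one word-count profile per intent from (text, intent) pairs."""
    profiles = {}
    for text, intent in labelled:
        profiles.setdefault(intent, Counter()).update(tokenize(text))
    return profiles


def predict(profiles, text):
    """Return (intent, score); score is the share of words already seen for that intent."""
    words = tokenize(text)
    best_intent, best_score = None, 0.0
    for intent, counts in profiles.items():
        score = sum(1 for word in words if word in counts) / max(len(words), 1)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score


def triage(profiles, unlabelled, threshold=0.5):
    """Split texts into suggested labels and a queue that needs manual labeling."""
    suggested, review = [], []
    for text in unlabelled:
        intent, score = predict(profiles, text)
        (suggested if score >= threshold else review).append((text, intent))
    return suggested, review


# Step 1: a few hand-labelled examples (hypothetical intents).
seed = [
    ("come posso ottenere il certificato di diploma", "request_certificate"),
    ("quanto costa l iscrizione", "ask_fees"),
]
profiles = train(seed)

# Step 2: let the model propose labels for new visitor messages.
suggested, review = triage(
    profiles, ["mi serve un certificato di diploma", "buongiorno"]
)

# Step 3: a person confirms or corrects the labels, then the model is retrained.
corrected = suggested + [("buongiorno", "greet")]
profiles = train(seed + corrected)
```

Each pass should leave fewer messages in the review queue, but the first batch of labels, and every correction, still has to be done by hand.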
