How to extract intents and entities from a tawk.to JSON file

Hi all, I am new to developing intelligent chatbots. My company has collected many conversations from tawk.to as JSON files. Each file holds a lot of information, but the most important part is the chat between the customer and the agent. Here is an example of what it looks like:

"messages": [
{
  "sender": {
    "t": "v"
  },
  "type": "msg",
  "time": "2020-04-15T07:35:19.452Z",
  "msg": "Salve, mi serve un certificato di diploma conforme agli articoli 23-24 direttiva europea 2005/36 prr il riconoscimento della qualifoca professionale di medico all estero. Come fare per ottenerlo?"
},
{
  "sender": {
    "t": "a",
    "n": "Operatore7"
  },
  "type": "msg",
  "time": "2020-04-15T07:37:58.348Z",
  "msg": "gentile studente, deve fare richiesta inviando una mail a: prova@dummy.it"
}]

As you can see, this is a sequence of messages between the customer (identified by the “v” value of the “t” key) and the agent (identified by the “a” value of the “t” key). Keep in mind this is just an example; a complete file usually contains many question-and-answer blocks within a single conversation. My question is: how can I parse this custom JSON file to extract the “intent” and “entities”? We have hundreds of files, and it would be impossible to extract this information manually, one by one. Thanks for your attention.

Labeling unlabeled data is tricky: without a model that already has some knowledge of the intents and entities to look for, there’s no rule to govern the labels. What you could do is bootstrap your model. Decide on the main intents and entities in the corpus, label a small set of messages by hand, and train a first model on them. Then use that model to predict intents for more data, manually correct the predictions, retrain on the corrected data to improve the model, and so on. In the end, though, you’re going to need to start by hand.
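
To get started, a minimal Python sketch along these lines could pull the customer messages out of each file so they can be labeled. It assumes that "messages" is a top-level key of the file and that every message has the "sender"/"t", "type" and "msg" fields shown in the question. The intent name and keyword list are hypothetical, and the keyword lookup is only a crude stand-in for a first model: it pre-labels what it can and leaves the rest for manual labeling.

```python
import json


def visitor_utterances(chat):
    """Return the text of every visitor ("t" == "v") message in one parsed chat."""
    return [
        m["msg"]
        for m in chat.get("messages", [])
        if m.get("type") == "msg" and m.get("sender", {}).get("t") == "v"
    ]


def seed_label(text, keyword_intents):
    """Guess an intent from hand-picked keywords; None means 'label by hand'."""
    lowered = text.lower()
    for intent, keywords in keyword_intents.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return None


# Hypothetical seed intents, chosen by hand after reading some conversations.
KEYWORD_INTENTS = {"request_certificate": ["certificato"]}

# Shortened, illustrative sample shaped like the excerpt in the question.
sample = json.loads("""
{
  "messages": [
    {"sender": {"t": "v"}, "type": "msg",
     "time": "2020-04-15T07:35:19.452Z",
     "msg": "Salve, mi serve un certificato di diploma. Come fare per ottenerlo?"},
    {"sender": {"t": "a", "n": "Operatore7"}, "type": "msg",
     "time": "2020-04-15T07:37:58.348Z",
     "msg": "gentile studente, deve fare richiesta inviando una mail a: prova@dummy.it"}
  ]
}
""")

# Print each customer message next to its provisional label for review.
for text in visitor_utterances(sample):
    print(seed_label(text, KEYWORD_INTENTS), "|", text)
```

For real data you would replace the inline sample with json.load on each of your files and write the rows to a spreadsheet or CSV, so that someone can correct the guessed labels before you train on them.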