I have an unusual, application-specific goal, apparently unrelated to common chatbot development (with RASA), but it could be a common and interesting case (please vote this thread/reply to confirm it, if this is true).
My goal is to build a “data extrapolator” that extracts structured data (a sort of semantic NER), having:
- In input: the full conversation transcript between two or more humans, conversing about some (more or less defined) topic/domain, in a sort of almost free-form dialogue. Common scenarios? Business meeting-minutes recordings, a phone conversation, a patient-monitoring visit, etc. I call each of these blocks of multi-user conversation a dialog session.
- In output: I want to obtain a sort of “summarization”, not in the usual sense of producing a natural-language abstract of a full text (the full conversation in this case), but instead collecting relevant data (in a set of domains/topics), extracted by following the context of the conversation.
Here's a rough block diagram:
user_id: 1     user_id: 2     user_id: N
    │              │              │
    │              │              │
┌───┼──────────────┼──────────────┼───────┐
│                                         │ [1]
│  - user_id: '1'                         │
│    sentence: bla bla bla                │
│    timestamp: '1631519401'              │
│                                         │
│  - user_id: '2'                         │
│    sentence: bla bla bla bla bla bla    │
│    timestamp: '1631519443'              │
│                                         │
│  - user_id: '2'                         │
│    sentence: bla bla                    │
│    timestamp: '1631519522'              │
│                                         │
│  - user_id: '1'                         │
│    sentence: bla bla bla bla            │
│    timestamp: '1631519589'              │
│                                         │
└───────────────────┬─────────────────────┘
     transcript.yaml│
                    │
       ┌────────────┼────────────┐
       │                         │ [2]
       │     structured data     │
       │       extrapolator      │
       │                         │
       └────────────┬────────────┘
                    │
          data.json │
     ┌──────────────┼──────────────┐
     │ {                           │ [3]
     │   "data1":                  │
     │     {                       │
     │       "attribute1": "...",  │
     │       "attribute2": "...",  │
     │       "attribute3": "..."   │
     │     },                      │
     │   "data2":                  │
     │     {                       │
     │       "attribute4": "...",  │
     │       "attribute5": "..."   │
     │     },                      │
     │                             │
     │   "data3": "...",           │
     │   "data4": "..."            │
     │ }                           │
     └─────────────────────────────┘
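To make the extrapolator [2] concrete, here is a toy off-line sketch. The extraction rule (capitalized words as pseudo-entities) and the output keys are invented placeholders, not a proposal for the real NLU; the point is only the input/output shape: a transcript list in, a data.json-like structure out, with each extracted item annotated with who said it and when.

```python
# extrapolator_sketch.py — toy, rule-based stand-in for the
# "structured data extrapolator" [2]; rules and keys are placeholders.
import json
import re

def extrapolate(transcript):
    """transcript: list of {user_id, sentence, timestamp} dicts (as in transcript.yaml)."""
    data = {
        "participants": sorted({turn["user_id"] for turn in transcript}),
        "mentions": [],  # each mention annotated with who said it and when
    }
    for turn in transcript:
        # placeholder rule: treat any capitalized word as an "entity"
        for word in re.findall(r"\b[A-Z][a-z]+\b", turn["sentence"]):
            data["mentions"].append({
                "entity": word,
                "user_id": turn["user_id"],
                "timestamp": turn["timestamp"],
            })
    return data

transcript = [
    {"user_id": "1", "sentence": "Let's plan the Rome demo", "timestamp": "1631519401"},
    {"user_id": "2", "sentence": "Ok, Friday works for me", "timestamp": "1631519443"},
]
print(json.dumps(extrapolate(transcript), indent=2))
```

Of course the real extrapolator would replace the regex with proper intent/entity recognition; this only pins down the data contract between [1] and [3].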
Goal 1: As the main need, I want to collect a data summarization as a structure of relevant data, with a postprocessor (the “extrapolator”), i.e. software acting as an off-line elaboration. It could possibly annotate the data sources (e.g. who said some entity, and in which context of the discourse), but a first result could be just a rough structure.
In the long term, as a secondary refined goal, I want the bot acting in real-time, being part of the conversation among humans (in this hypothetical scenario the tool would be a sort of “bot-in-the-loop”)… but for the moment, let's focus on the off-line elaboration (goal 1).
Now my point is how to build this “conversation data extrapolator” [2] using RASA. I confess I'm a bit confused, and I'm here to ask for help with a solution using RASA or any other useful tool.
Here is my first guess, thinking in RASA terms: in the human-to-human turn taking, I want to recognize conversational patterns, i.e. typical sequences of turns. These pattern sequences are maybe comparable to RASA stories.
Makes sense?
An idea I'm thinking about is to recognize all the relevant stories contained in the full conversation (transcript.yaml [1]). I imagine each of these stories as a multi-turn sequence terminated by an action that collects the relevant slots (elaborating intents/entities). For example:
# stories.yml concept
stories:
- story: alfa
steps:
- intent: intent_alfa_1 # user_1 say
- intent: intent_alfa_2 # user_2 reply
- intent: intent_alfa_3 # user_1 say
- action: store_slots_alfa
- story: beta
steps:
- intent: intent_beta_1 # user_2 say
- intent: intent_beta_2 # user_1 reply
- intent: intent_beta_3 # user_2 say
- action: store_slots_beta
- story: gamma
steps:
- ...
- ...
So the weird (but I guess feasible) approach is to model, with RASA stories, each different-user sentence as an intent (containing entities). That's something different from the usual chatbot-like intent-action sequence: here we have almost a concatenation of intents, with a final action that stores the collected slots.
UPDATE
At run-time, we have to set up a RASA client interface (via HTTP or the Rasa SDK?), so that the RASA run-time engine returns feedback to the client for each elaborated sentence. So I believe each story must be a sequence of intent/action steps, as usual… where in this case the action could be a unique next custom action that tracks/logs the intent/entities info and returns “ok, submit the next sentence” to the caller.
The previous stories.yml becomes in practice:
# stories.yml
# set of stories (sub-conversation patterns)
# practical implementation
stories:
- story: alfa
steps:
- intent: intent_alfa_1 # input: user_1 say something
- action: next # output: track who/when/intent/entities & feedback to client
- intent: intent_alfa_2 # input: user_2 reply something
- action: next # output: track who/when/intent/entities & feedback to client
- intent: intent_alfa_3 # input: user_1 say something
- action: store_slots_alfa
- story: beta
steps:
- intent: intent_beta_1 # input: user_2 say
- action: next # output: track who/when/intent/entities & feedback to client
- intent: intent_beta_2 # input: user_1 reply
- action: next # output: track who/when/intent/entities & feedback to client
- intent: intent_beta_3 # input: user_2 say
- action: store_slots_beta
- story: gamma
steps:
- ...
- ...
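Those next / store_slots_* steps could look something like this with the Rasa SDK. This is only a sketch of my idea: the class names, the in-memory log (standing in for a real store), and the slot names are my assumptions; the rasa_sdk import is guarded only so the pure turn-tracking helper can be tried without Rasa installed.

```python
# actions.py — hypothetical implementation of the `next` and
# `store_slots_alfa` custom actions from the stories above.

def track_turn(sender_id, latest_message, turns_log):
    """Record who/intent/entities for one elaborated sentence."""
    turns_log.append({
        "sender_id": sender_id,
        "intent": (latest_message.get("intent") or {}).get("name"),
        "entities": latest_message.get("entities", []),
    })
    return turns_log

TURNS_LOG = []  # placeholder: a real system would persist this per session

try:
    from rasa_sdk import Action, Tracker
    from rasa_sdk.executor import CollectingDispatcher

    class ActionNext(Action):
        """Track the turn and tell the batch client to send the next sentence."""
        def name(self):
            return "next"

        def run(self, dispatcher, tracker, domain):
            track_turn(tracker.sender_id, tracker.latest_message, TURNS_LOG)
            dispatcher.utter_message(text="ok, submit the next sentence")
            return []

    class ActionStoreSlotsAlfa(Action):
        """Close the 'alfa' story: dump the collected slots into the data structure."""
        def name(self):
            return "store_slots_alfa"

        def run(self, dispatcher, tracker, domain):
            # slot names are placeholders for the alfa-story attributes;
            # a real version would merge them into the final data.json object
            collected = {k: tracker.get_slot(k)
                         for k in ("attribute1", "attribute2", "attribute3")}
            dispatcher.utter_message(text="story alfa stored")
            return []
except ImportError:
    pass  # rasa-sdk not installed: only the pure helper above is usable
```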
The hypothesis is this:
- At train-time I submit to RASA a relevant set of stories, covering the possible conversations.
- At run-time a batch program injects the transcript into RASA step by step, sequentially submitting each user's utterance (reading from transcript.yaml [1]). During processing, RASA custom actions store slots in the final “data structure” object, with some logic/reasoning about the collected slots.
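The run-time batch program could be sketched like this. My assumptions here: the trained model is served with `rasa run --enable-api` on the default port, the REST channel is used, and user_id/timestamp travel in the payload's metadata field (PyYAML would be needed to actually read transcript.yaml).

```python
# batch_driver.py — rough sketch: submit each transcript.yaml utterance to a
# running RASA server through the REST channel, one tracker per dialog session.
import json
import urllib.request

RASA_REST_URL = "http://localhost:5005/webhooks/rest/webhook"  # Rasa default

def build_payload(session_id, turn):
    """Map one transcript entry onto a REST-channel message payload."""
    return {
        "sender": session_id,          # one RASA tracker per dialog session
        "message": turn["sentence"],
        "metadata": {                  # who spoke and when (my assumption)
            "user_id": turn["user_id"],
            "timestamp": turn["timestamp"],
        },
    }

def submit(session_id, turn):
    """POST one utterance; returns RASA's replies (e.g. the 'ok, ...' feedback)."""
    request = urllib.request.Request(
        RASA_REST_URL,
        data=json.dumps(build_payload(session_id, turn)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# usage (with PyYAML, against a live server):
#   for turn in yaml.safe_load(open("transcript.yaml")):
#       submit("session-1", turn)
```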
user_id: 1     user_id: 2     user_id: N
    │              │              │
    │              │              │
┌───┼──────────────┼──────────────┼───────┐
│                                         │
│  - user_id: '1'                         │
│    sentence: bla bla bla                │       domain-related stories
│    timestamp: '1631519401'              │         ┌──────────┐
│                                         │        ┌┴─────────┐│
│  - user_id: '2'                         │       ┌┴─────────┐├┘
│    sentence: bla bla bla bla bla bla    ├───────┤          ├┘
│    timestamp: '1631519443'              ├───────┤          │
│                                         │       └────┬─────┘
│  - user_id: '2'                         │            │
│    sentence: bla bla                    │   ┌────────┼────────┐
│    timestamp: '1631519522'              │   │                 │
│                                         │   │    RASA train   │
│  - user_id: '1'                         │   │                 │
│    sentence: bla bla bla bla            │   ├─────────────────┤
│    timestamp: '1631519589'              │   │                 │
│                                         │   │   RASA model    │
└───────────────────┬─────────────────────┘   │                 │
     transcript.yaml│                         ├─────────────────┤
                    │                         │                 │
       ┌────────────┼────────────┐            │     RASA run    │
       │                         ├────────────┤                 │
       │     structured data     │            └─────────────────┘
       │      extrapolator       │
       │                         │
       └────────────┬────────────┘
                    │
                    │ conversation_data.json
     ┌──────────────┼──────────────┐
     │ {                           │
     │   "data1":                  │
     │     {                       │
     │       "attribute1": "...",  │
     │       "attribute2": "...",  │
     │       "attribute3": "..."   │
     │     },                      │
     │   "data2":                  │
     │     {                       │
     │       "attribute4": "...",  │
     │       "attribute5": "..."   │
     │     },                      │
     │                             │
     │   "data3": "...",           │
     │   "data4": "..."            │
     │ }                           │
     └─────────────────────────────┘
Does all this make sense?
Any suggestions / drawbacks / pitfalls?
Or is there any smarter alternative (maybe without using RASA)?
Thanks
Giorgio