Human-to-Human Conversations Data Extrapolator (using RASA)

I have an unusual, application-specific goal that is apparently unrelated to common chatbot development (with RASA), though it could turn out to be a common and interesting case (please vote for this thread or reply to confirm it, if that is true).

My goal is to build a “data extrapolator” that extracts structured data (a sort of semantic NER), having:

  • In input: the full conversation transcript between two or more humans conversing about some (more or less defined) topic/domain, in an almost free-form dialogue. Common scenarios: business meeting recordings, a phone conversation, a patient-monitoring visit, etc. I call each such block of multi-user conversation a dialog session.

  • In output: a sort of “summarization”, not in the usual sense of producing a natural-language abstract of a full text (the full conversation, in this case); instead, I want to collect relevant data (in a set of domains/topics), extracting it according to the context of the conversation.

Here is a rough block diagram:

            user_id: 1     user_id: 2    user_id: N
                │             │             │
                │             │             │
             ┌──▼─────────────▼─────────────▼──────┐
             │                                     │ [1]
             │ - user_id: '1'                      │
             │   sentence: bla bla bla             │
             │   timestamp: '1631519401'           │
             │                                     │
             │ - user_id: '2'                      │
             │   sentence: bla bla bla bla bla bla │
             │   timestamp: '1631519443'           │
             │                                     │
             │ - user_id: '2'                      │
             │   sentence: bla bla                 │
             │   timestamp: '1631519522'           │
             │                                     │
             │ - user_id: '1'                      │
             │   sentence: bla bla bla bla         │
             │   timestamp: '1631519589'           │
             │                                     │
             └───────────────────┬─────────────────┘
             transcript.yaml     │
                                 │
                    ┌────────────▼────────────┐
                    │                         │ [2]
                    │     structured data     │
                    │       extrapolator      │
                    │                         │
                    └────────────┬────────────┘
                                 │
                   data.json     │
                  ┌──────────────▼──────────────┐
                  │ {                           │ [3]
                  │   "data1":                  │
                  │   {                         │
                  │       "attribute1": "...",  │
                  │       "attribute2": "...",  │
                  │       "attribute3": "..."   │
                  │                             │
                  │   },                        │
                  │   "data2":                  │
                  │   {                         │
                  │       "attribute4": "...",  │
                  │       "attribute5": "..."   │
                  │   },                        │
                  │                             │
                  │   "data3": "...",           │
                  │   "data4": "..."            │
                  │                             │
                  │ }                           │
                  └─────────────────────────────┘

Goal 1: My main need is to collect a data summarization as a structure of relevant data, using a postprocessor (the “extrapolator”), that is, software performing off-line processing. Ideally it would annotate the data sources (e.g. who mentioned an entity, and in what context of the discourse), but a rough structure would be an acceptable first result.
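
To make the “annotated data source” idea more concrete, here is a minimal Python sketch. The class names `Turn` and `Annotated`, the helper `annotate`, and the `context` field are my own hypothetical choices, not anything RASA prescribes; the field names of `Turn` simply mirror transcript.yaml [1].

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Turn:
    """One utterance, with the same fields as an entry of transcript.yaml [1]."""
    user_id: str
    sentence: str
    timestamp: str


@dataclass
class Annotated:
    """An extracted value plus its provenance (hypothetical layout)."""
    value: str
    user_id: str    # who said it
    timestamp: str  # when it was said
    context: str    # e.g. the conversational pattern it was found in


def annotate(value: str, turn: Turn, context: str) -> Annotated:
    """Attach the speaker and time of `turn` to an extracted value."""
    return Annotated(value, turn.user_id, turn.timestamp, context)


turn = Turn(user_id="2", sentence="bla bla bla", timestamp="1631519443")
# Same shape as data.json [3], but each attribute carries its source.
data = {"data1": {"attribute1": asdict(annotate("...", turn, "alfa"))}}
print(json.dumps(data, indent=2))
```

A first rough version could drop `Annotated` and store plain strings, exactly as in data.json [3].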

In the long term, as a secondary, refined goal, I want the bot to act in real time, taking part in the conversation among humans (in this hypothetical scenario the tool would be a sort of “bot-in-the-loop”)… but for the moment, let's focus on off-line processing (goal 1).

Now my question is how to build this “conversation data extrapolator” [2] using RASA. I confess I'm a bit confused, and I'm here to ask for help with a solution using RASA or any other useful tool.

Here is my first line of thinking in RASA terms: in human-to-human turn-taking, I want to recognize conversational patterns made of typical sequences of turns. These pattern sequences are perhaps comparable to RASA stories.

Makes sense?

An idea I'm considering is to recognize all relevant stories contained in the full conversation (transcript.yaml [1]). I imagine each of these stories as a multi-turn sequence terminated by an action that collects the relevant slots (processing intents/entities). For example:

# stories.yml concept
stories:
- story: alfa
  steps:
  - intent: intent_alfa_1 # user_1 says
  - intent: intent_alfa_2 # user_2 replies
  - intent: intent_alfa_3 # user_1 says
  - action: store_slots_alfa

- story: beta
  steps:
  - intent: intent_beta_1 # user_2 says
  - intent: intent_beta_2 # user_1 replies
  - intent: intent_beta_3 # user_2 says
  - action: store_slots_beta

- story: gamma
  steps:
  - ...
  - ...

So the weird (but I guess feasible) approach is to model, with RASA stories, each sentence from a different user as an intent (containing entities). That's different from the usual chatbot-like intent-action sequence: here we have almost a concatenation of intents, with a final action that stores the collected slots.
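
Just to illustrate the “concatenation of intents” idea outside of RASA, here is a minimal plain-Python sketch. It is not how RASA's dialogue policies work internally; it only scans a list of already-classified intents for contiguous patterns. The `PATTERNS` table, the `find_stories` function and the `chitchat` intent are hypothetical names of mine.

```python
from typing import Dict, List, Tuple

# Hypothetical patterns mirroring the stories above: name -> intent sequence.
PATTERNS: Dict[str, List[str]] = {
    "alfa": ["intent_alfa_1", "intent_alfa_2", "intent_alfa_3"],
    "beta": ["intent_beta_1", "intent_beta_2", "intent_beta_3"],
}


def find_stories(intents: List[str],
                 patterns: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Return (story_name, start_index) for every contiguous pattern match."""
    found = []
    for start in range(len(intents)):
        for name, seq in patterns.items():
            if intents[start:start + len(seq)] == seq:
                found.append((name, start))
    return found


# One intent per turn of a dialog session, whoever the speaker is.
session = [
    "intent_alfa_1", "intent_alfa_2", "intent_alfa_3",
    "chitchat",
    "intent_beta_1", "intent_beta_2", "intent_beta_3",
]
print(find_stories(session, PATTERNS))  # [('alfa', 0), ('beta', 4)]
```

Exact contiguous matching is of course brittle for free-form dialogue; the hope is that a trained model generalizes better over interleaved or incomplete patterns.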

UPDATE: At run-time, we have to set up a RASA client interface (via HTTP or the RASA SDK?), and the RASA run-time engine must return feedback to the client for each processed sentence. So I believe each story must be a sequence of intent/action pairs, as usual, where in this case the action could be a single custom action (next) that tracks/logs the intent/entities info and returns “ok, submit the next sentence” to the caller.

In practice, the previous stories.yml becomes:

# stories.yml
# set of stories (sub-conversation patterns)
# practical implementation 
stories:
- story: alfa
  steps:
  - intent: intent_alfa_1 # input: user_1 says something
  - action: next          # output: track who/when/intent/entities & feedback to client
  - intent: intent_alfa_2 # input: user_2 replies something
  - action: next          # output: track who/when/intent/entities & feedback to client
  - intent: intent_alfa_3 # input: user_1 says something
  - action: store_slots_alfa

- story: beta
  steps:
  - intent: intent_beta_1 # input: user_2 says
  - action: next          # output: track who/when/intent/entities & feedback to client
  - intent: intent_beta_2 # input: user_1 replies
  - action: next          # output: track who/when/intent/entities & feedback to client
  - intent: intent_beta_3 # input: user_2 says
  - action: store_slots_beta

- story: gamma
  steps:
  - ...
  - ...
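
What the `next` action would do on each turn can be sketched as a plain Python function. The input dict is meant to resemble what a RASA SDK custom action sees as `tracker.latest_message` (with `intent` and `entities` keys); the `metadata` part carrying `user_id` and `timestamp` is purely my assumption about what the client would send, and `track_turn` is a hypothetical name.

```python
from typing import Any, Dict, List


def track_turn(latest_message: Dict[str, Any], log: List[Dict[str, Any]]) -> str:
    """Record who/when/intent/entities for one turn and return the feedback text."""
    meta = latest_message.get("metadata") or {}
    log.append({
        "user_id": meta.get("user_id"),      # assumed to be sent by the client
        "timestamp": meta.get("timestamp"),  # assumed to be sent by the client
        "intent": (latest_message.get("intent") or {}).get("name"),
        "entities": {
            e["entity"]: e["value"] for e in latest_message.get("entities", [])
        },
    })
    return "ok, submit the next sentence"


log: List[Dict[str, Any]] = []
message = {
    "text": "bla bla bla",
    "intent": {"name": "intent_alfa_1", "confidence": 0.9},
    "entities": [{"entity": "attribute1", "value": "..."}],
    "metadata": {"user_id": "1", "timestamp": "1631519401"},
}
print(track_turn(message, log))
print(log)
```

Inside a real custom action, the returned text would be sent back through the dispatcher, and the final store_slots_* action would read the accumulated log.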

The hypothesis is this:

  • At train-time, I submit to RASA a relevant set of stories covering the possible conversations.
  • At run-time, a batch program feeds RASA step by step, sequentially submitting each user's utterance (read from transcript.yaml [1]). During processing, RASA custom actions store slots in the final “data structure” object, applying some logic/reasoning to the collected slots.

   user_id: 1     user_id: 2    user_id: N
       │             │             │
       │             │             │
    ┌──▼─────────────▼─────────────▼──────┐
    │                                     │
    │ - user_id: '1'                      │
    │   sentence: bla bla bla             │            domain-related stories
    │   timestamp: '1631519401'           │            ┌──────────┐
    │                                     │            │          │
    │ - user_id: '2'                      │            │  ┌───────┴──┐
    │   sentence: bla bla bla bla bla bla │            └──┤          │
    │   timestamp: '1631519443'           │               │  ┌───────┴───┐
    │                                     │               └──┤           │
    │ - user_id: '2'                      │                  │           │
    │   sentence: bla bla                 │                  └─────┬─────┘
    │   timestamp: '1631519522'           │                        │
    │                                     │                ┌───────▼─────────┐
    │ - user_id: '1'                      │                │                 │
    │   sentence: bla bla bla bla         │                │  RASA train     │
    │   timestamp: '1631519589'           │                │                 │
    │                                     │                ├─────────────────┤
    └───────────────────┬─────────────────┘                │                 │
    transcript.yaml     │                                  │    RASA model   │
                        │                                  │                 │
           ┌────────────▼────────────┐                     ├─────────────────┤
           │                         │                     │                 │
           │     structured data     │                     │    RASA run     │
           │       extrapolator      ◄─────────────────────┤                 │
           │                         │                     └─────────────────┘
           └────────────┬────────────┘
                        │
 conversation_data.json │
         ┌──────────────▼──────────────┐
         │ {                           │
         │   "data1":                  │
         │   {                         │
         │       "attribute1": "...",  │
         │       "attribute2": "...",  │
         │       "attribute3": "..."   │
         │                             │
         │   },                        │
         │   "data2":                  │
         │   {                         │
         │       "attribute4": "...",  │
         │       "attribute5": "..."   │
         │   },                        │
         │                             │
         │   "data3": "...",           │
         │   "data4": "..."            │
         │                             │
         │ }                           │
         └─────────────────────────────┘
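
The run-time batch step could look roughly like the following Python sketch. It assumes a RASA server running locally with the REST channel enabled, reachable at `/webhooks/rest/webhook` on the default port 5005. It also assumes that the speaker can be passed along in a `metadata` field; whether that field reaches the custom actions depends on the channel and the RASA version, so treat it as an assumption. `replay`, `http_post` and `fake_post` are hypothetical names, and the demo uses `fake_post` so that it runs without a server.

```python
import json
from typing import Callable, Dict, Iterable, List
from urllib import request

# Assumed local server; the REST channel must be enabled in credentials.yml.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def http_post(url: str, payload: Dict) -> List[Dict]:
    """POST a JSON payload and decode the JSON reply (standard library only)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def replay(session_id: str, turns: Iterable[Dict],
           post: Callable[[str, Dict], List[Dict]] = http_post) -> List[List[Dict]]:
    """Submit each transcript turn in order, using one conversation per dialog session."""
    replies = []
    for turn in turns:
        payload = {
            "sender": session_id,         # same conversation for all speakers
            "message": turn["sentence"],
            "metadata": {                 # assumption: speaker info as metadata
                "user_id": turn["user_id"],
                "timestamp": turn["timestamp"],
            },
        }
        replies.append(post(RASA_URL, payload))
    return replies


# Dry run with a stand-in for the server.
sent: List[Dict] = []


def fake_post(url: str, payload: Dict) -> List[Dict]:
    sent.append(payload)
    return [{"recipient_id": payload["sender"], "text": "ok, submit the next sentence"}]


turns = [
    {"user_id": "1", "sentence": "bla bla bla", "timestamp": "1631519401"},
    {"user_id": "2", "sentence": "bla bla bla bla bla bla", "timestamp": "1631519443"},
]
replies = replay("session-1", turns, post=fake_post)
print(len(sent), replies[0][0]["text"])
```

Using the dialog session as the `sender`, rather than each human's user_id, is a deliberate choice here: all turns end up in the same conversation history, which is what lets multi-speaker stories match at all.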

Does all this make sense?
Any suggestions, drawbacks, or pitfalls?
Or is there a smarter alternative (maybe without using RASA)?

Thanks
Giorgio

This makes me think of database transactions. I'm curious why you want a “concatenation of intents”, storing after a few user sentences, instead of storing after every user sentence. Intuitively that saves network calls, but I'm not sure it matters much, since users don't expect extremely fast responses in chatbot applications.