I have an unusual, application-specific goal, apparently unrelated to common chatbot development (with RASA), but it could be a common and interesting case (please vote this thread/reply to confirm it, if this is true).
My goal is to build a “data extrapolator” that extracts structured data (a sort of semantic NER), having:
- In input: the full conversation transcript between two or more humans, conversing about some (more or less defined) topic/domain, in a sort of almost free-form dialogue. Common scenarios? Business meeting-minutes recordings, a phone conversation, a patient-monitoring visit, etc. I call each of these blocks of multi-user conversation a dialog session.
- In output: I want to obtain a sort of “summarization”, not in the usual sense of producing a natural-language abstract of a full text (the full conversation in this case), but instead collecting relevant data (in a set of domains/topics), extracted by following the context of the conversation.
Here's a rough block diagram:
user_id: 1     user_id: 2     user_id: N
    │              │              │
    │              │              │
┌───┼──────────────┼──────────────┼───────┐
│                                         │ [1]
│  - user_id: '1'                         │
│    sentence: bla bla bla                │
│    timestamp: '1631519401'              │
│                                         │
│  - user_id: '2'                         │
│    sentence: bla bla bla bla bla bla    │
│    timestamp: '1631519443'              │
│                                         │
│  - user_id: '2'                         │
│    sentence: bla bla                    │
│    timestamp: '1631519522'              │
│                                         │
│  - user_id: '1'                         │
│    sentence: bla bla bla bla            │
│    timestamp: '1631519589'              │
│                                         │
└───────────────────┬─────────────────────┘
     transcript.yaml│
                    │
       ┌────────────┼────────────┐
       │                         │ [2]
       │     structured data     │
       │       extrapolator      │
       │                         │
       └────────────┬────────────┘
                    │
          data.json │
     ┌──────────────┼──────────────┐
     │ {                           │ [3]
     │   "data1":                  │
     │     {                       │
     │       "attribute1": "...",  │
     │       "attribute2": "...",  │
     │       "attribute3": "..."   │
     │     },                      │
     │   "data2":                  │
     │     {                       │
     │       "attribute4": "...",  │
     │       "attribute5": "..."   │
     │     },                      │
     │                             │
     │   "data3": "...",           │
     │   "data4": "..."            │
     │ }                           │
     └─────────────────────────────┘
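To make the extrapolator [2] concrete, here is a toy off-line sketch. The extraction rule (capitalized words as pseudo-entities) and the output keys are invented placeholders, not a proposal for the real NLU; the point is only the input/output shape: a transcript list in, a data.json-like structure out, with each extracted item annotated with who said it and when.

```python
# extrapolator_sketch.py — toy, rule-based stand-in for the
# "structured data extrapolator" [2]; rules and keys are placeholders.
import json
import re

def extrapolate(transcript):
    """transcript: list of {user_id, sentence, timestamp} dicts (as in transcript.yaml)."""
    data = {
        "participants": sorted({turn["user_id"] for turn in transcript}),
        "mentions": [],  # each mention annotated with who said it and when
    }
    for turn in transcript:
        # placeholder rule: treat any capitalized word as an "entity"
        for word in re.findall(r"\b[A-Z][a-z]+\b", turn["sentence"]):
            data["mentions"].append({
                "entity": word,
                "user_id": turn["user_id"],
                "timestamp": turn["timestamp"],
            })
    return data

transcript = [
    {"user_id": "1", "sentence": "Let's plan the Rome demo", "timestamp": "1631519401"},
    {"user_id": "2", "sentence": "Ok, Friday works for me", "timestamp": "1631519443"},
]
print(json.dumps(extrapolate(transcript), indent=2))
```

Of course the real extrapolator would replace the regex with proper intent/entity recognition; this only pins down the data contract between [1] and [3].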
Goal 1: As the main need, I want to collect a data summarization as a structure of relevant data, with a postprocessor (the “extrapolator”), i.e. software acting as an off-line elaboration. It could possibly annotate the data sources (e.g. who said some entity, and in which context of the discourse), but a first result could be just a rough structure.
In the long term, as a secondary refined goal, I want the bot acting in real-time, being part of the conversation among humans (in this hypothetical scenario the tool would be a sort of “bot-in-the-loop”)… but for the moment, let's focus on the off-line elaboration (goal 1).
Now my point is how to build this “conversation data extrapolator” [2] using RASA. I confess I'm a bit confused, and I'm here to ask for help with a solution using RASA or any other useful tool.
Here is my first guess, thinking in RASA terms: in the human-to-human turn taking, I want to recognize conversational patterns, i.e. typical sequences of turns. These pattern sequences are maybe comparable to RASA stories.
Makes sense?
An idea I'm thinking about is to recognize all the relevant stories contained in the full conversation (transcript.yaml [1]). I imagine each of these stories as a multi-turn sequence terminated by an action that collects the relevant slots (elaborating intents/entities). For example:
# stories.yml concept
stories:
- story: alfa
steps:
- intent: intent_alfa_1 # user_1 say
- intent: intent_alfa_2 # user_2 reply
- intent: intent_alfa_3 # user_1 say
- action: store_slots_alfa
- story: beta
steps:
- intent: intent_beta_1 # user_2 say
- intent: intent_beta_2 # user_1 reply
- intent: intent_beta_3 # user_2 say
- action: store_slots_beta
- story: gamma
steps:
- ...
- ...
So the weird (but I guess feasible) approach is to model, with RASA stories, each different-user sentence as an intent (containing entities). That's something different from the usual chatbot-like intent-action sequence: here we have almost a concatenation of intents, with a final action that stores the collected slots.
UPDATE
At run-time, we have to set up a RASA client interface (via HTTP or the Rasa SDK?), so that the RASA run-time engine returns feedback to the client for each elaborated sentence. So I believe each story must be a sequence of intent/action steps, as usual… where in this case the action could be a unique next custom action that tracks/logs the intent/entities info and returns “ok, submit the next sentence” to the caller.
The previous stories.yml becomes in practice:
# stories.yml
# set of stories (sub-conversation patterns)
# practical implementation
stories:
- story: alfa
steps:
- intent: intent_alfa_1 # input: user_1 say something
- action: next # output: track who/when/intent/entities & feedback to client
- intent: intent_alfa_2 # input: user_2 reply something
- action: next # output: track who/when/intent/entities & feedback to client
- intent: intent_alfa_3 # input: user_1 say something
- action: store_slots_alfa
- story: beta
steps:
- intent: intent_beta_1 # input: user_2 say
- action: next # output: track who/when/intent/entities & feedback to client
- intent: intent_beta_2 # input: user_1 reply
- action: next # output: track who/when/intent/entities & feedback to client
- intent: intent_beta_3 # input: user_2 say
- action: store_slots_beta
- story: gamma
steps:
- ...
- ...
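Those next / store_slots_* steps could look something like this with the Rasa SDK. This is only a sketch of my idea: the class names, the in-memory log (standing in for a real store), and the slot names are my assumptions; the rasa_sdk import is guarded only so the pure turn-tracking helper can be tried without Rasa installed.

```python
# actions.py — hypothetical implementation of the `next` and
# `store_slots_alfa` custom actions from the stories above.

def track_turn(sender_id, latest_message, turns_log):
    """Record who/intent/entities for one elaborated sentence."""
    turns_log.append({
        "sender_id": sender_id,
        "intent": (latest_message.get("intent") or {}).get("name"),
        "entities": latest_message.get("entities", []),
    })
    return turns_log

TURNS_LOG = []  # placeholder: a real system would persist this per session

try:
    from rasa_sdk import Action, Tracker
    from rasa_sdk.executor import CollectingDispatcher

    class ActionNext(Action):
        """Track the turn and tell the batch client to send the next sentence."""
        def name(self):
            return "next"

        def run(self, dispatcher, tracker, domain):
            track_turn(tracker.sender_id, tracker.latest_message, TURNS_LOG)
            dispatcher.utter_message(text="ok, submit the next sentence")
            return []

    class ActionStoreSlotsAlfa(Action):
        """Close the 'alfa' story: dump the collected slots into the data structure."""
        def name(self):
            return "store_slots_alfa"

        def run(self, dispatcher, tracker, domain):
            # slot names are placeholders for the alfa-story attributes;
            # a real version would merge them into the final data.json object
            collected = {k: tracker.get_slot(k)
                         for k in ("attribute1", "attribute2", "attribute3")}
            dispatcher.utter_message(text="story alfa stored")
            return []
except ImportError:
    pass  # rasa-sdk not installed: only the pure helper above is usable
```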
The hypothesis is this:
- At train-time I submit to RASA a relevant set of stories, covering the possible conversations.
- At run-time a batch program injects the transcript into RASA step by step, sequentially submitting each user's utterance (reading from transcript.yaml [1]). During processing, RASA custom actions store slots in the final “data structure” object, with some logic/reasoning about the collected slots.
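The run-time batch program could be sketched like this. My assumptions here: the trained model is served with `rasa run --enable-api` on the default port, the REST channel is used, and user_id/timestamp travel in the payload's metadata field (PyYAML would be needed to actually read transcript.yaml).

```python
# batch_driver.py — rough sketch: submit each transcript.yaml utterance to a
# running RASA server through the REST channel, one tracker per dialog session.
import json
import urllib.request

RASA_REST_URL = "http://localhost:5005/webhooks/rest/webhook"  # Rasa default

def build_payload(session_id, turn):
    """Map one transcript entry onto a REST-channel message payload."""
    return {
        "sender": session_id,          # one RASA tracker per dialog session
        "message": turn["sentence"],
        "metadata": {                  # who spoke and when (my assumption)
            "user_id": turn["user_id"],
            "timestamp": turn["timestamp"],
        },
    }

def submit(session_id, turn):
    """POST one utterance; returns RASA's replies (e.g. the 'ok, ...' feedback)."""
    request = urllib.request.Request(
        RASA_REST_URL,
        data=json.dumps(build_payload(session_id, turn)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# usage (with PyYAML, against a live server):
#   for turn in yaml.safe_load(open("transcript.yaml")):
#       submit("session-1", turn)
```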
user_id: 1     user_id: 2     user_id: N
    │              │              │
    │              │              │
┌───┼──────────────┼──────────────┼───────┐
│                                         │
│  - user_id: '1'                         │
│    sentence: bla bla bla                │       domain-related stories
│    timestamp: '1631519401'              │         ┌──────────┐
│                                         │        ┌┴─────────┐│
│  - user_id: '2'                         │       ┌┴─────────┐├┘
│    sentence: bla bla bla bla bla bla    ├───────┤          ├┘
│    timestamp: '1631519443'              ├───────┤          │
│                                         │       └────┬─────┘
│  - user_id: '2'                         │            │
│    sentence: bla bla                    │   ┌────────┼────────┐
│    timestamp: '1631519522'              │   │                 │
│                                         │   │    RASA train   │
│  - user_id: '1'                         │   │                 │
│    sentence: bla bla bla bla            │   ├─────────────────┤
│    timestamp: '1631519589'              │   │                 │
│                                         │   │   RASA model    │
└───────────────────┬─────────────────────┘   │                 │
     transcript.yaml│                         ├─────────────────┤
                    │                         │                 │
       ┌────────────┼────────────┐            │     RASA run    │
       │                         ├────────────┤                 │
       │     structured data     │            └─────────────────┘
       │      extrapolator       │
       │                         │
       └────────────┬────────────┘
                    │
                    │ conversation_data.json
     ┌──────────────┼──────────────┐
     │ {                           │
     │   "data1":                  │
     │     {                       │
     │       "attribute1": "...",  │
     │       "attribute2": "...",  │
     │       "attribute3": "..."   │
     │     },                      │
     │   "data2":                  │
     │     {                       │
     │       "attribute4": "...",  │
     │       "attribute5": "..."   │
     │     },                      │
     │                             │
     │   "data3": "...",           │
     │   "data4": "..."            │
     │ }                           │
     └─────────────────────────────┘
Does all this make sense?
Any suggestions / drawbacks / pitfalls?
Or is there any smarter alternative (maybe without using RASA)?
Thanks
Giorgio