Tables in rasa.db and tracker.db? need a reply URGENTLY

Can you please explain each of the tables and their columns: in rasa.db and tracker.db?

The tables in tracker.db : events.

The tables in rasa.db :

alembic_version intent analytics_cache lookup_table chat_tokens message_log conversation model conversation_action_metadata model_tag conversation_action_statistic nlu_evaluation conversation_entity_metadata nlu_evaluation_prediction conversation_entity_statistic nlu_training_data conversation_event nlu_training_data_entity conversation_intent_metadata password conversation_intent_statistic permission conversation_message_correction platform_feature conversation_policy_metadata project conversation_policy_statistic rasa_x_user conversation_session regex_feature conversation_statistic single_use_token domain story domain_action template domain_entity temporary_intent_example domain_intent user_goal domain_slot user_goal_intent entity_synonym user_role environment user_role_mapping

Hi @sheetal,

I think describing each of the tables is a bit too extensive, but do you have any specific questions?

Hi. What information does the message_log table hold?

I thought it held unique messages, but I realized that this is not the case. I saw messages in the conversation_event table that users sent(type_name=user) but did not exist in the message_log table at all (They were not in the nlu_training_data table).

I is supposed to store unique messages which the users sent. How did you check if the messages from conversation_event are in message_log?. nlu_training_data stores the training data - so that’s completely different.

I also thought this table stores unique user messages which you have now confirmed.

I noticed that sometimes some user messages are not stored in this table. I was just testing some messages. At first, when I saw a message that was not in this table but was one of my training examples, I thought that maybe if the user message is one of our training examples, it will not be saved in this table. This assumption was violated by further testing. Anyway, It was strange to me why some messages are not in this table.

I wrote this code for you. Let me show you:

import json
import psycopg2

con = psycopg2.connect(database="rasa", user="admin", host="", port="5435", password="")

cur = con.cursor()
cur.execute("""SELECT count(*) FROM conversation_event WHERE type_name=\'user\';""")
record = cur.fetchone()
cur.close()
print("number of user events:", record[0])

user_events = set()
cur = con.cursor()
cur.execute("""SELECT id,data FROM conversation_event WHERE type_name=\'user\';""")
records = cur.fetchall()
for row in records:
    parsed_data = json.loads(row[1])
    user_events.add(parsed_data['text'])
print("number of user events:", len(records))
print("number of unique user events:", len(user_events))
cur.close()

message_log = set()
cur = con.cursor()
cur.execute("""SELECT event_id,text FROM message_log;""")
records = cur.fetchall()
for row in records:
    message_log.add(row[1])
cur.close()
print("number of unique text messages:", len(message_log))

print(len(user_events.intersection(message_log)))
print(len(user_events.difference(message_log)))

con.commit()

This is the result:

3q5slCVUA3smxpAWert5_1603725061568

I received 1,110,845 messages from users. I had 425,409 unique user events but only 381,939 of them are stored in this table and 43,470 of them are not.

Hi @z.fallahn,

thanks for the detective’s work! This is weird indeed. Can we please check the content of some of these events which are in conversation_event but not in message_log? We are using md5 to hash the texts, but I doubt these are all collisions :thinking:

Hi @Tobias_Wochinger,

I checked them and they looked like normal messages. (User messages are not in English, otherwise I would have shared them with you.)

I changed the code and checked them based on the md5 hash. I hope this is the change you were looking for:

import json
import psycopg2
from hashlib import md5


def get_text_hash(text):
    """Calculate the md5 hash of a string."""
    if text is None:
        text = b""
    elif not isinstance(text, bytes):
        text = text.encode()
    return md5(text).hexdigest()


con = psycopg2.connect(database="rasa", user="admin", host="", port="5435", password="")

cur = con.cursor()
cur.execute("""SELECT count(*) FROM conversation_event WHERE type_name=\'user\';""")
record = cur.fetchone()
cur.close()
print("number of user events:", record[0])

user_events = set()
cur = con.cursor()
cur.execute("""SELECT id,data FROM conversation_event WHERE type_name=\'user\';""")
records = cur.fetchall()
for row in records:
    parsed_data = json.loads(row[1])
    user_events.add(get_text_hash(parsed_data['text']))
print("number of user events:", len(records))
print("number of unique user events:", len(user_events))
cur.close()

message_log = set()
cur = con.cursor()
cur.execute("""SELECT hash FROM message_log;""")
records = cur.fetchall()
for row in records:
    message_log.add(row[0])
cur.close()
print("number of unique text messages:", len(message_log))

print(len(user_events.intersection(message_log)))
print(len(user_events.difference(message_log)))
print(len(message_log.difference(user_events)))

con.commit()

This is the result:

Capture

It’s so weird. I also have 12 messages in the message_log that are not in the conversation_event!

If it needs further investigation, I will do whatever you want to help solve the problems and improve rasa. I am using rasax=0.24.7, rasa=1.6.1, and rasa_sdk=1.6.1.

Hi @sheetal , could you please tell me how did you opened and viewed the rasa.db and the tracker.db files?