Do persistent tracker stores persist forever?

Hi, I know there are several posts discussing this topic, but I’m still a bit confused about how the tracker and tracker stores work. So far I’ve just been using the InMemoryTrackerStore, but now I want to get real and use MongoDB to persist the tracker history.

I understand that the main purpose of doing this is to maintain state if the rasa_core process restarts. But I’m trying to wrap my head around whether the MongoTrackerStore persists events indefinitely (if max_event_history isn’t set). I tried the “restart” command/event, and understand that it doesn’t delete anything from the tracker store, although the resulting tracker state history is wiped by the “restart” event. Do “old events” ever get deleted from the tracker store? If not, it’s obviously a good persistent store from which to retrieve logs to annotate and create more training data, and I don’t have to bother setting up an event broker to persist all the events. But if the tracker history for a given user just keeps growing, doesn’t it at some point cause latency issues? The bot I’m making is intended as a power-user tool, so it will hopefully get heavy usage.

I have a feeling I’m not understanding where the history is/should be limited… what is the best practice?
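For context, here’s roughly how I’m pointing the tracker store at MongoDB in endpoints.yml — a sketch based on the documented format, where the host, database name, and credentials are placeholders:

```yaml
tracker_store:
  type: mongod
  url: mongodb://localhost:27017   # placeholder host
  db: rasa                         # placeholder database name
  username: mongo_user             # placeholder credentials
  password: mongo_pass
```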

Hi @einarbmag - welcome to the Rasa forums.

Regarding latency: I’ve found that it can depend on the channel you use.

In my case, I have a bot which is hosted on a webpage and it uses MongoDB on the backend for the tracker store.

Initially I tried the scalableminds/chatroom widget - which uses a custom channel - but quickly dropped it, as the latency got crazy. The widget seems to poll the tracker state once per second, and the bot’s responsiveness degenerates very quickly because the tracker storage format is quite verbose and tends to grow as conversations go on. In one of my test dialogs, the widget ended up polling the same ~100 KB of tracker data every second. (My apologies to the chatroom developers if this is down to a PEBKAC error on my part - but it made the widget unsuitable for my application.)

I’ve since switched to the mrbot-ai/webchat widget, which uses the socketio channel to send user utterances to core and has responses pushed back to it - this is much lighter on communications. The widget also caches the conversation locally (in either session or local storage). This works out really well for my use case, and I can have lengthy conversations without any noticeable latency on the website.

I haven’t connected my bot to any other channels (yet) so I can’t comment on latency for those.

Regarding persistence: using the default in-memory tracker store, or Redis without one of its persistence modes enabled, will obviously limit the tracker store’s lifespan to that of the process - a restart or a power-cycle of the host machine wipes it.

Regarding best practice for trimming history: I have no data on this at the moment, as I’ve only started to consider the issue myself since reading your post :slight_smile:

Hope that helps.

Hi Steve, thanks for your reply. Good points about the channel mattering as well for latency. But if I understand the core code correctly, the agent retrieves the tracker from the tracker store every time it predicts the next action, which is where I would worry about latency if we don’t set a max_event_history.

Regarding history configuration: we can specify max_history for dialogue policies, and max_event_history for dialogue trackers. I guess the point of specifying a longer history for the tracker than for the policy is that an action might want to access a longer record of what the user has said, e.g. to tell the user which persons have been mentioned in the conversation so far.
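As a rough illustration of why an action might want that longer event history - this models tracker events as plain dicts in the shape Rasa serializes them, but it is not the rasa_sdk API, and the “person” entity name is just an example:

```python
# Sketch: an action-side helper that scans the whole event history for
# entities, which only works if the tracker keeps more events than the
# policy's max_history window. Events are plain dicts here for illustration.

def persons_mentioned(events):
    """Collect every 'person' entity value seen anywhere in the conversation,
    preserving first-mention order and skipping duplicates."""
    persons = []
    for event in events:
        if event.get("event") != "user":
            continue
        for entity in event.get("parse_data", {}).get("entities", []):
            if entity.get("entity") == "person" and entity["value"] not in persons:
                persons.append(entity["value"])
    return persons

events = [
    {"event": "user", "parse_data": {"entities": [{"entity": "person", "value": "Alice"}]}},
    {"event": "bot"},
    {"event": "user", "parse_data": {"entities": [{"entity": "person", "value": "Bob"}]}},
]
print(persons_mentioned(events))  # ['Alice', 'Bob']
```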

I’m guessing best practice is to limit the tracker max_event_history (e.g. 20-100?) to avoid bloating the tracker store, and use the event broker approach to persist every event for separate analysis/annotation to improve training data. Would be great to get authoritative input on this from someone like @akelad.
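For reference, the event broker half of that setup would also live in endpoints.yml - a sketch assuming the Pika/RabbitMQ broker, with placeholder host, credentials, and queue name:

```yaml
event_broker:
  type: pika
  url: localhost         # placeholder broker host
  username: rabbit_user  # placeholder credentials
  password: rabbit_pass
  queue: rasa_events     # placeholder queue name
```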

Another idea that came up in my team is to decide heuristically on some “session” definition, e.g. new session each day or after a specific period of inactivity, and append a session_id to the sender_id, and thus start a new tracker store object for each session, thus avoiding the object growing out of hand. But that doesn’t really sound like the Rasa way to me.
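To make the idea concrete, here’s a minimal sketch of that session heuristic - the four-hour inactivity gap and the timestamp suffix format are my own assumptions, not Rasa features:

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=4)  # assumed inactivity threshold

def session_sender_id(sender_id, now, last_seen=None, suffix=None):
    """Return (suffixed_sender_id, suffix). Mint a new suffix (the session's
    start time) on the first message, after a day change, or after a long
    gap of inactivity; otherwise reuse the current suffix, so all messages
    in one session land in the same tracker."""
    if (last_seen is None or suffix is None
            or now.date() != last_seen.date()
            or now - last_seen > SESSION_GAP):
        suffix = now.strftime("%Y%m%d%H%M%S")
    return f"{sender_id}_{suffix}", suffix

# Same tracker while messages keep coming, a fresh one after a long pause:
t0 = datetime(2019, 5, 1, 9, 0)
sid, suf = session_sender_id("einar", t0)
sid_soon, _ = session_sender_id("einar", t0 + timedelta(minutes=5), t0, suf)
sid_later, _ = session_sender_id("einar", t0 + timedelta(hours=6), t0, suf)
assert sid == sid_soon and sid != sid_later
```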

Hi Einar,

Ok, I see where you are going with this a little better now and like the ideas (trim tracker state for core, use brokered remote persistence of the conversation, and session heuristics).

@einarbmag I wouldn’t limit max_event_history in the tracker store right now, because it means your event history will get overwritten every time in MongoDB. You should be fine storing the whole tracker - we haven’t had any issues with this with our demo bot so far.

Hi Einar, we’ve also found the persistent tracker store to be a critical problem:

Also note that max_event_history has no impact on the tracker (see the comment on this issue: Tracker gets very big for long conversations · Issue #3011 · RasaHQ/rasa · GitHub).

I think this is why responses are getting slow.