Memoization policy ignored in some cases - probably something fundamental I'm not understanding

Apologies in advance for the long post. :frowning: I’m using rasa version 2.4.3 and rasa-sdk version 2.4.1.

The background is that after thoroughly testing the bot locally, I trained a model (let’s call this file model-ci.tar.gz for convenience) through a CI process that takes the same training examples (nlu and stories) and config.yml that I had pushed to my git repo.

My initial understanding was that, once created, the model file can function on its own without the training data needing to be accessible at runtime.

The bot in production that uses model-ci.tar.gz started to incorrectly fall back after the very first utterance. I downloaded the same model-ci.tar.gz file, served it on a local Rasa server, ran the exact same utterance, and it gave the right response. This difference in behavior was consistently reproducible.

Also, note that I trigger a conversation by passing the bot entities extracted before the conversation starts, and in both cases all slots set before the first intent are correctly recorded in the tracker.

I investigated the tracker for both of these conversations. For the correct behavior (model-ci.tar.gz served on a local Rasa server), I saw the correct actions predicted by policy_0_MemoizationPolicy. For the wrong behavior, however (model-ci.tar.gz served in the production environment), I noticed that a correct action was first predicted by policy_1_TEDPolicy instead, and then the wrong action that broke the bot was predicted by policy_2_RulePolicy.

My config.yml is the same in my local files and in the git repo from which CI trained model-ci.tar.gz. Here's what it looks like -

# Alternative pipeline with spaCy
language: "en_core_web_md"  # your two-letter language code

pipeline:
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.6
    ambiguity_threshold: 0.1

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
  - name: RulePolicy
    core_fallback_threshold: 0.3
    core_fallback_action_name: fallback_action

I’m confused as to why the Memoization policy was ignored in the case where the bot failed. I don’t have any rules added, since I’m strictly using stories.yml to manage my dialogs. I’m using the exact same model-ci.tar.gz, and I’m passing it identical curl requests with identical payloads to continue the conversation. So why is one behavior so vastly different from the other and consistently so?

Does the memoization policy have any kind of dependency that I need to set up? Does the order of the policies in config.yml matter (since Rasa also has default priorities set)? Are the training examples and config.yml still needed once the model is served? What else should I look into to resolve this discrepancy?

Many thanks in advance for any suggestions you can share.

It could be your tracker store. Memoization looks back up to max_history turns to find an exact match with a training story, and if it doesn't find one, it makes no prediction. It is possible that your local setup is using an InMemoryTrackerStore that empties every time you restart, while in prod you might be using Redis; the RedisTrackerStore persists previous conversations or sessions, and maybe that leads to an incorrect prediction because the history is different. Since you have not provided the max_history parameter in your config, it is taking the last 5 turns.
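
If you want to control that look-back explicitly, you can set max_history on the policy yourself. This is only a minimal sketch based on your config above; 5 is just an illustrative value:

policies:
  - name: MemoizationPolicy
    max_history: 5   # example value; how many turns Memoization compares against the stories
  - name: TEDPolicy
  - name: RulePolicy
    core_fallback_threshold: 0.3
    core_fallback_action_name: fallback_action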


@souvikg10 Thank you for replying :slight_smile:. I love your question, because that's one of the places it didn't strike me to look.

Let me check with my engineering team on whether we have a tracker store connected to the deployed bot. I believe we did not connect one.

In the meantime, a couple of clarifications -

  1. I have been using new conversation IDs each time while running the test. I generate a unique string of characters and use it as my conversation ID to run identical curl requests against the machines running Rasa (localhost and the production address), simulating how the bot is run automatically. So in effect, each success on local and each failure on the deployed bot happens in a brand-new conversation.

  2. Another thing is that the failure happens on the first user utterance, so I'm not sure how history could affect each of these new conversations. How I handle it: I first call the endpoint /conversations/<conversation_id>/trigger_intent to trigger an intent with entities captured from the data we already have about the caller before the conversation starts (like which number the user called from, etc.), then run ONE action to store those as slots for the rest of that conversation. The bot then goes to action_listen. The actual user utterance is passed to it through the endpoint /webhooks/rest/webhook (both calls are sketched below). Up to this action_listen, the behavior of the bot is identical on local and production, and it then fails on the first user utterance in the way I described.
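
For reference, the two calls look roughly like this - just a sketch, where the intent, entity, and message values are made-up placeholders and the real conversation ID is freshly generated each run:

# Request bodies shown as YAML for readability; they are sent as JSON.
# POST /conversations/<conversation_id>/trigger_intent
trigger_intent_body:
  name: inform_caller_details        # hypothetical intent name
  entities:
    caller_number: "+15551234567"    # example entity known before the conversation starts

# POST /webhooks/rest/webhook
rest_webhook_body:
  sender: "<conversation_id>"
  message: "first user utterance"    # placeholder text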

Let me know if you have any other ideas and questions. :slight_smile:

The slots you set from the data - are they featurized slots?

If so, check whether the data you use to fill the slots is the same on your server and on your test machine.

Memoization won't work at all if any of the features don't resemble the features in the stories (intent, slots, entities, prev_action, etc.).

You can start the server in debug mode and check whether everything captured before the user utterance is exactly the same. You can even probe the tracker store to see this.

Hello @souvikg10, none of the slots are featurized. They’re all set to slot type: any and have influence_conversation: false.
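
So in the domain they're defined along these lines (the slot name here is a placeholder, not one of my real ones) -

slots:
  caller_number:                    # hypothetical slot name
    type: any
    influence_conversation: false   # not featurized, so it shouldn't affect policy predictions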

Probing the tracker is how I found that it used MemoizationPolicy in the situation where it picked the right action, and TEDPolicy/RulePolicy in the one where it picked the wrong action. All the other slots up to the point of failure were correctly captured in both situations.