Rasa Run Behaving Differently Than Rasa Shell

First, I ran rasa train to obtain my model: models/20200804-025637.tar.gz.

When I run rasa shell -m models/20200804-025637.tar.gz or rasa x, my bot performs as expected. However, when I deploy Rasa to a server via rasa run -m models/20200804-025637.tar.gz --enable-api, the bot behaves much worse than it does in the shell. Here is my code for interacting with it (mimicking the shell interaction) via Rasa's HTTP API:

import requests

conversation_id = input("Conversation ID: ")
# requests needs an explicit scheme; "my-server" is a placeholder host
url = f"http://my-server:5005/conversations/{conversation_id}/"
messages_url = url + "messages"
predict_url = url + "predict"

while True:
    message = input("Input: ")
    payload = {"text": message, "sender": "user"}
    # Append the user message to the conversation tracker
    messages_response = requests.post(messages_url, json=payload)
    if messages_response.status_code != 200:
        raise Exception("Networking error.")

    # Ask Core to predict the next action for this conversation
    predict_response = requests.post(predict_url)
    predict_data = predict_response.json()

    print(predict_data["scores"][0]["action"])

The NLU performance seems to be as expected; however, the next action predicted by Core does not match what I see in the shell. What am I missing?
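One thing worth ruling out when comparing against the shell: the script prints the first entry of "scores", which is only the predicted action if the server returns that list sorted by score. The sketch below picks the highest-scoring entry explicitly. It assumes each entry carries a "score" field next to "action" (the "score" key is not shown in the code above), and the sample response is made up for illustration.

```python
from typing import Any, Dict, List


def top_action(predict_data: Dict[str, Any]) -> str:
    """Return the name of the action with the highest score."""
    scores: List[Dict[str, Any]] = predict_data["scores"]
    # Do not rely on list order; compare the scores directly.
    best = max(scores, key=lambda entry: entry["score"])
    return best["action"]


# Hypothetical response body, for illustration only.
sample = {
    "scores": [
        {"action": "action_listen", "score": 0.1},
        {"action": "utter_greet", "score": 0.9},
    ]
}
print(top_action(sample))
```

If this prints the same action as scores[0] for real responses, ordering is not the cause and the model-loading question is the next thing to check.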

Bumping this… I am seriously stuck on this.

Do you happen to have a model server already running, or an earlier model in cloud storage? By default, the run command tries the following sources in order, which I don't believe is the case for rasa shell:

  1. Fetch the model from a server (see Fetching Models from a Server), or
  2. Fetch the model from a remote storage (see Cloud Storage).
  3. Load the model specified via -m from your local storage system

(Edit to clarify: even if you specify a model with the -m flag, you'll only get to option 3 if neither 1 nor 2 is configured.)
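To check which model the server actually picked up, you could query its status endpoint and compare the reported model with the one you passed via -m. This is only a sketch: it assumes GET /status returns a JSON body with a "model_file" field that holds the archive path (depending on the version it may be an unpacked directory instead, in which case the name comparison will not match), and the helper names are my own.

```python
import json
import os
import urllib.request
from typing import Any, Dict


def fetch_status(base_url: str) -> Dict[str, Any]:
    """Fetch the server status as a dict (assumed endpoint: GET /status)."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=10) as response:
        return json.load(response)


def is_expected_model(status: Dict[str, Any], expected_file: str) -> bool:
    """Compare the loaded model's file name with the expected one."""
    # "model_file" is assumed to be the key that reports the loaded model.
    loaded = status.get("model_file") or ""
    return os.path.basename(loaded) == os.path.basename(expected_file)


# Hypothetical status payload, for illustration only.
sample_status = {"model_file": "/app/models/20200804-025637.tar.gz"}
print(is_expected_model(sample_status, "models/20200804-025637.tar.gz"))
```

Against the real server you would call fetch_status("http://my-server:5005") and pass the result to is_expected_model. If it returns False, the server is probably serving a model from option 1 or 2 rather than your local file.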