Cannot upload models to rasa-x due to timeout error

I have a model working with Rasa locally and have installed Rasa X via the install script. The Docker containers are all running fine, and everything was working yesterday. Today, however, I can no longer upload a model: it says "something went wrong", and there is a timeout error in the Rasa X logs. The model is only 25M. I thought maybe my git repo did not match the trained model, so I retrained, but I still have the issue.

nginx_1 | 172.25.0.1 - - [29/Nov/2021:18:18:09 +0000] "GET /api/projects/default/entities HTTP/1.1" 200 584 "http://localhost/interactive" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
rasa-x_1 | [2021-11-29 18:18:10 +0000] [23] [ERROR] Exception occurred while handling uri: 'http://localhost/api/projects/default/models?limit=1&offset=0&tag=production'
rasa-x_1 | Traceback (most recent call last):
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/sanic/app.py", line 973, in handle_request
rasa-x_1 |     response = await response
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/rasax/community/api/decorators.py", line 217, in decorated_function
rasa-x_1 |     return await await_and_return_response(args, kwargs, request)
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/rasax/community/api/decorators.py", line 147, in await_and_return_response
rasa-x_1 |     response = await response
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/rasax/community/api/blueprints/models.py", line 45, in get_models
rasa-x_1 |     models, total_models = await _model_service(request).get_models(
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/rasax/community/services/model_service.py", line 747, in get_models
rasa-x_1 |     minimum_compatible_version = await self.minimum_compatible_version()
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/rasax/community/services/model_service.py", line 160, in minimum_compatible_version
rasa-x_1 |     info = await stack_service.version()
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/rasax/community/services/stack_service.py", line 112, in version
rasa-x_1 |     response = await session.get(
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/aiohttp/client.py", line 544, in _request
rasa-x_1 |     await resp.start(conn)
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/aiohttp/client_reqrep.py", line 905, in start
rasa-x_1 |     self._continue = None
rasa-x_1 |   File "/usr/local/lib/python3.8/dist-packages/aiohttp/helpers.py", line 656, in __exit__
rasa-x_1 |     raise asyncio.TimeoutError from None
rasa-x_1 | asyncio.exceptions.TimeoutError

@simonm3 Can you delete the cache and cookies and try again, or try making Chrome your default browser? Please also check the container logs.

I cleared cache and cookies, then switched to Chrome (I was using Brave, albeit with extensions off and shields down). In Chrome one model loaded, but then I could not load another one or reload the first without the error. I can chat but the bot does not respond. The only thing in the logs is the timeout error above, though it happens almost immediately.

@simonm3 Can you upload the models? Yes or no?

@simonm3 I guess you can chat, but the responses are taking time to display? Yes or no?

No. I uploaded one model, then the others failed with the timeout error, and I could not re-upload the original. There is no response to chat from the model I did upload, even after several minutes.

Tomorrow I am going to upgrade to v3, so hopefully the problem will disappear. It would still be good to know what to do if it happens again.

I had this problem as well. It seems to be a problem with Intent Insights and the server not having enough RAM/CPU. To disable it, add the following to your ~/.bashrc file:

alias api_token_json="curl -s -X POST -H \"Content-Type: application/json\" -d '{\"username\":\"me\", \"password\":\"IP@eco@2021\"}' http://10.126.15.10/api/auth"
alias api_token_value="api_token_json | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj[\"access_token\"])'"

# Disable Intent Insights; pass the token printed by api_token_value as the first argument
disable-insights()
{
	curl --request PUT \
		 --url 'http://localhost/api/insights/config' \
		 --header "Authorization: Bearer $1" \
		 --header 'Content-Type: application/json' \
		 --data '{"schedule": null, "cross_validation_folds": 4, "calculator_configuration": null}'
}

On the first line, replace me and IP@eco@2021 with your Rasa X username and password, and 10.126.15.10 with your server's IP.

Then, delete your namespace with kubectl delete namespaces rasa.

Open a new terminal and run api_token_value. Copy the output, then type disable-insights, paste the token after it as the argument, and press Enter.
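For example, a session might look like this (the token shown is a made-up placeholder, not a real one):

api_token_value
# prints a JWT, e.g. eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...
disable-insights eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...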

Related thread: Disable intent insights: REGULAR SERVER CRASH

@nik202 I have not run Insights, and I don't think it is memory: it is a small model, memory is nowhere near used up, and the error happens immediately. The rasa_x log above (the timeout traceback) is all I get.

Also, I have no rules, stories or domain despite being connected to a git repo that has them. Maybe they only show up when a model is loaded?

I am still unable to load a model. I have tried a completely fresh machine on AWS with 8GB of RAM. Everything installs, all containers are running, and I can log in. When I load a model it says "Upload failed. Something went wrong. Please try again." Mostly it disconnects my SSH terminal. When I log back in it says the model has loaded, but when I look at the models there is nothing there.

Sometimes I can get a model loaded. Then if I load a second one I get the failure and all the models disappear.

Memory seems to peak at 5GB, and the processor shoots up to 100% before killing my terminals. The rasa_x container shows a 204 error for production requests; the production and worker containers show no errors.
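For anyone else chasing this, watching container resource usage during an upload makes the spike easy to see (standard Docker/Linux commands, nothing Rasa-specific):

# One-off snapshot of per-container CPU and memory usage
docker stats --no-stream

# Overall host memory
free -h

# Or watch continuously in a second terminal while uploading
watch -n 2 'docker stats --no-stream'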

@nik202

I tried it now on a t3.xlarge, which has 16GB RAM, 4 processors, and AVX available. I removed my docker-compose.override.yml so it is just the standard install. Still the same issue.

That would be great, thanks. I suspect it may be a bug, as I am starting from a completely new machine, running only the supplied docker compose script, then uploading the model from my browser.

I will do some more experiments today and let you know if I make progress. Maybe I will try the Kubernetes install instead, though I am not familiar with Kubernetes. @nik202

I think I just resolved it. Just retesting.

@simonm3 Great, and please share the solution with us.

I am not sure exactly which of these was causing the initial problem, but…

  • using too small a machine. I have 8GB RAM, but much of that is used up by Windows and WSL. I had a lot of disk space free, but Docker and WSL ate it up much faster than expected.

  • version incompatibilities. Version 3 of Rasa is not compatible with Rasa X, and there were other version conflicts between the model trained on my laptop and the Rasa X server. The 0.42.6 script installs Rasa 2.8.11, but if you pip install rasa-x 0.42.6 locally it says you must have >2.8.14. I pinned everything in the end.

  • finally, an embarrassing error :slightly_frowning_face: not the initial cause, but it prolonged the pain. Once I fixed the machine size and versioning, I created a copy of my model to test uploading. Windows copy/paste gave me 20211201-151135.tar - Copy.gz. I did not notice the filename error, and Rasa does not check that the file is valid; it just crashes (see the sanity check below).
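A quick check before uploading would have caught this (a bash sketch using my misnamed file as the example; the upload appears to require a name ending in .tar.gz as well as valid contents):

f="20211201-151135.tar - Copy.gz"   # the misnamed copy from my case
# The name must end in .tar.gz
[[ "$f" == *.tar.gz ]] || echo "bad filename: $f"
# The contents must be a valid gzipped tarball; tar errors out otherwise
tar -tzf "$f" > /dev/null || echo "not a valid gzipped tarball: $f"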

@simonm3 Please mark your solution as the solution for other readers.

You can always use the prune command (docker system prune), and Chris and I both think that could be the cause of this error. I am glad your issue is solved. Good luck!

@nik202 After more experimentation I can make this more explicit:

  • AVX is not mentioned in the docs as a requirement, but if you try to load a model on a machine without AVX, the server crashes (a pre-flight check is sketched after this list).
  • 4GB of RAM is suggested in the docs. On a new AWS server, 2GB is used up by the containers, and uploading a 24MB model pushes RAM usage above 5GB. Maybe 4GB is enough for a very small model, but you may need more.
  • The Rasa server version must be at least the version used to train the model. The version is stored in the model file, but Rasa X does not check it; it just crashes the server if you attempt to upload a model from a newer version.
  • The 0.42.6 docker compose script installs Rasa 2.8.11 on the server; pip install rasa-x 0.42.6 installs Rasa 2.8.15 locally. These are incompatible, which causes a server crash. So it is essential to pin the Rasa version locally, and if there are multiple developers, each needs to use the same version.
  • There is no version of Rasa X that works with Rasa v3.
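If it helps anyone, here is a minimal pre-flight check for the AVX and RAM points above (standard Linux commands; the 5GB figure is just what I observed, not an official requirement):

# The CPU must advertise AVX, or loading a model crashes the server
grep -q avx /proc/cpuinfo && echo "AVX: ok" || echo "AVX: missing"

# Check free memory; my 24MB model pushed usage above 5GB
free -h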

@simonm3 It makes sense, but a few points are not clear to me.

That is to be expected: if you previously trained the model on, say, Rasa 2.8.1 and have now updated Rasa to 2.8.11, you need to retrain and upload the newly trained model.

Did you try installing Rasa X locally? Did you succeed?

The current update is only compatible with Rasa Open Source, not with Rasa X; the developers are working on that.

One would not necessarily expect to have to retrain a model each time a minor release comes out. However, if it is incompatible, then Rasa X should check the version rather than just crashing.

Yes, I had Rasa X working locally with 0.42.6. The issue was that the server version is incompatible with the local install, even though they have the same version number, 0.42.6. I trained my model locally on 0.42.6, which uses Rasa 2.8.15, then uploaded it to the 0.42.6 server, which runs 2.8.11.
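A quick way to spot this mismatch before uploading (a sketch; Rasa X exposes an /api/version endpoint, though the exact response shape may differ between releases):

# Rasa version installed locally, i.e. what the model was trained with
rasa --version

# Versions running on the server (Rasa X plus the rasa production/worker services)
curl -s http://localhost/api/version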

[Note: This discussion is not related to the main topic ]

Interesting. Can you share the solution for the Rasa X install on a local machine, with the latest version, when you get time?

Local install works fine with the instructions in the docs:

 pip install rasa-x --extra-index-url https://pypi.rasa.com/simple

To downgrade both:

pip uninstall rasa-x
pip install rasa==2.8.11
pip install rasa-x==0.42.5 --extra-index-url https://pypi.rasa.com/simple
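
After downgrading, you can confirm what is actually pinned (output format differs slightly between releases):

rasa --version
pip show rasa rasa-x | grep -E '^(Name|Version)'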

Then rasa x runs without error. I have not actually tested it in the browser, as I forgot how to configure the ports from WSL, and I don't need it now that I have the server running.