I’m posting this mostly in case someone else has run into a similar issue.
I’ve deployed my model to a multi-container Docker instance using Elastic Beanstalk. The deployments were working until I upgraded from Rasa OSS 1.9.4 to 1.10.2. I’ve tested that the model works fine locally, and it also runs with eb local run as a sanity check. On the t2.micro instance, however, it hangs in the final portion of loading the model and appears to be stuck in a loop (100% CPU usage and no data from the instance). I’ve found the line before the hang in Rasa’s source, but I’m at a loss as to how to fix this since I can’t reproduce it locally.
Logs that appear when attempting to start the model in the rasa container (the last line is where it hangs):
2020-06-16 22:07:42 DEBUG rasa.cli.utils - Parameter 'credentials' not set. Using default location 'credentials.yml' instead.
2020-06-16 22:08:06 DEBUG rasa.model - Extracted model to '/tmp/tmpcd1pb9bf'.
2020-06-16 22:08:15 DEBUG sanic.root - CORS: Configuring CORS with resources: {'/*': {'origins': [''], 'methods': 'DELETE, GET, HEAD, OPTIONS, PATCH, POST, PUT', 'allow_headers': ['.*'], 'expose_headers': None, 'supports_credentials': True, 'max_age': None, 'send_wildcard': False, 'automatic_options': True, 'vary_header': True, 'resources': {'/*': {'origins': ''}}, 'intercept_exceptions': True, 'always_send': True}}
2020-06-16 22:08:15 DEBUG rasa.core.utils - Available web server routes:
/webhooks/rest GET custom_webhook_RestInput.health
/webhooks/rest/webhook POST custom_webhook_RestInput.receive
/ GET hello
2020-06-16 22:08:15 INFO root - Starting Rasa server on http://localhost:5005
2020-06-16 22:08:15 DEBUG rasa.core.utils - Using the default number of Sanic workers (1).
2020-06-16 22:08:15 INFO root - Enabling coroutine debugging. Loop id 94263349628512.
2020-06-16 22:08:37 DEBUG rasa.model - Extracted model to '/tmp/tmp0iq57lcd'.
2020-06-16 22:08:48 DEBUG rasa.utils.tensorflow.models - Loading the model ...
2020-06-16 22:08:48.429880: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-16 22:09:12 DEBUG rasa.utils.tensorflow.models - Finished loading the model.
2020-06-16 22:09:12 DEBUG rasa.utils.tensorflow.models - Building tensorflow prediction graph...
2020-06-16 22:10:47 DEBUG rasa.utils.tensorflow.models - Finished building tensorflow prediction graph.
2020-06-16 22:10:47 DEBUG rasa.utils.tensorflow.models - Loading the model ...
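In case it helps anyone reading logs like these, here is a small sketch for measuring how long each step took before the next log line appeared. It only parses the timestamp prefix shown in the logs above; step_gaps and SAMPLE are names made up for this example and are not part of Rasa. In the excerpt above, building the prediction graph took 95 seconds (22:09:12 to 22:10:47), and the second "Loading the model ..." line is the last thing printed. This does not explain the hang, it only shows where the time goes.

```python
import re
from datetime import datetime

# Matches the "YYYY-MM-DD HH:MM:SS" prefix used by the log lines above.
# Lines without that prefix (e.g. the route listing) are skipped.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def step_gaps(log_text):
    """Return (seconds_until_next_timestamped_line, line) pairs.

    The last timestamped line has no successor, so it is not included.
    """
    stamped = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = TS.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            stamped.append((when, line))
    return [
        (int((nxt[0] - cur[0]).total_seconds()), cur[1])
        for cur, nxt in zip(stamped, stamped[1:])
    ]


# Excerpt copied from the logs in this post.
SAMPLE = """\
2020-06-16 22:08:48 DEBUG rasa.utils.tensorflow.models - Loading the model ...
2020-06-16 22:09:12 DEBUG rasa.utils.tensorflow.models - Finished loading the model.
2020-06-16 22:09:12 DEBUG rasa.utils.tensorflow.models - Building tensorflow prediction graph...
2020-06-16 22:10:47 DEBUG rasa.utils.tensorflow.models - Finished building tensorflow prediction graph.
2020-06-16 22:10:47 DEBUG rasa.utils.tensorflow.models - Loading the model ...
"""

if __name__ == "__main__":
    # Print the slowest steps first.
    for seconds, line in sorted(step_gaps(SAMPLE), reverse=True):
        print(f"{seconds:>4}s  {line}")
```

Running the same function over the full container log would show whether the slow steps on the t2.micro are the same ones that are slow locally, just stretched out, or whether a different step stalls.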
Anyone have any thoughts?