TensorFlow driver: UNKNOWN ERROR (303)

I’m having this issue as well, on both (Rasa X: 0.31.5, rasa: 1.10.10-full) and (Rasa X: 0.30.1, rasa: 1.10.8). It only happens when I try to train from the Rasa X interface deployed on Kubernetes (I have tried both DigitalOcean and Google Cloud). It did not happen when I was developing on my local machine and training via the CLI. I have tried deploying with both the quick install and the helm install.

Sadly, it also aborts the training.

I understand if you consider this not to be Rasa’s fault or problem, and think it’s solely a hardware problem. That’s fine. I’m sure everyone who supports Rasa and wants to use it would like some form of support though: perhaps an update to, or a closer look at, the minimum OS/hardware requirements in the docs? Or a confirmation that a fresh attempt to host Rasa X on GCP/DigitalOcean/etc. currently works and that the problem is mostly on our (the users’) side? Sorry if that sounded rude.

EDIT: I finally got it to work, and I’m leaving some notes here (what worked for me) for anyone reading this in the future.

  1. Turn on debugMode="true" in your helm chart deployment (other deployment methods have slightly different ways of doing this). It helps you debug your own problems when you look at the pod logs.
  2. If you are using HFTransformers, you might need to solve this issue first: "Training fails when using HFTransformersNLP Rasa X".
  3. If you think your model is quite big (I was using a fairly big BERT model; judging by the pre-trained download, around 500 MB?), increase your RAM from the minimum of 4 GB to 8 GB. <-- this was the last piece of my puzzle.
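For note 3, here is a rough sketch of how you could check how much RAM a machine reports before training. This is my own illustration, not anything from Rasa: the function names, the 8 GiB default and the 10% tolerance are all assumptions (a machine sold as "8 GB" usually reports slightly less). It only uses the Python standard library and works on Linux/Unix. Inside a container it typically reports the host's memory rather than any limit set on the pod.

```python
import os

GIB = 1024 ** 3  # bytes in one GiB


def total_ram_bytes():
    """Return the physical RAM in bytes as reported by the OS (Linux/Unix only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_ram(total_bytes, required_gib=8, tolerance=0.1):
    """Return True if total_bytes reaches required_gib, minus a small tolerance.

    The tolerance is an assumption: an "8 GB" machine rarely reports a full 8 GiB.
    """
    return total_bytes >= required_gib * GIB * (1 - tolerance)


if __name__ == "__main__":
    total = total_ram_bytes()
    print(f"Total RAM: {total / GIB:.1f} GiB")
    if not has_enough_ram(total):
        # Hypothetical warning text; the 8 GB figure comes from note 3 above.
        print("Less than roughly 8 GiB: a large BERT model may need more RAM.")
```

You could run this from a shell on the node (or in a pod, keeping the caveat about host memory in mind) to confirm that a resize actually took effect.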

Increasing the RAM did not make the error disappear, but it did let the training continue. The others were right: the error itself does not actually abort your training.