Version: Rasa - 2.8.15, Rasa-sdk - 2.8.3, Rasa-X - latest
Please provide more info:
- Does training work if you run rasa train?
- What’s your method of deployment?
@ChrisRahme yes, it works with rasa train.
I have deployed Rasa X to an AWS EKS cluster via the Helm chart.
I also connected it to a Git repository, but the training data was not visible in Rasa X. So I tried to upload it from my local machine, but I am not able to upload the data or train the model.
Okay, please check the logs of the rasa-worker pod after training using kubectl logs <pod_name>.
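A minimal sketch of that check, assuming a POSIX shell (the thread uses PowerShell, where the same kubectl arguments apply). It selects pods by the app.kubernetes.io/component label that appears in the pod description later in this thread, so the generated pod name is not needed. The function name recent_logs and the 10-minute default window are illustrative assumptions.

```shell
# Print recent, timestamped logs for every pod carrying a given component label.
# Usage: recent_logs <namespace> <component> [since]
recent_logs() {
  namespace=$1
  component=$2
  since=${3:-10m}   # assumed default window; adjust as needed
  # With a selector, kubectl logs shows only the last 10 lines per pod by default,
  # so --tail=-1 is passed to get every line inside the --since window.
  kubectl --namespace "$namespace" logs \
    --selector "app.kubernetes.io/component=$component" \
    --all-containers \
    --since="$since" \
    --tail=-1 \
    --timestamps
}
```

Run it right after a failed training attempt, for example recent_logs rasahr rasa-worker. Passing a different component value lets the same call inspect other pods of the release, provided they carry that label.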
@ChrisRahme My training data is synced with the Git repository and trains successfully on my local machine, but training fails in Rasa X. The logs of the rasa-worker pod are empty.
PS C:\WINDOWS\system32> kubectl --namespace rasahr logs rasa-x-1641746175-rasa-worker-6dcc468fd-f5kqm
PS C:\WINDOWS\system32>
@ChrisRahme should I check the logs of any other pods?
PS C:\WINDOWS\system32> kubectl --namespace rasahr get pods
NAME                                              READY   STATUS    RESTARTS   AGE
rasa-x-1641746175-app-7cc446595b-dwbfp            1/1     Running   0          17h
rasa-x-1641746175-db-migration-service-0          1/1     Running   0          17h
rasa-x-1641746175-duckling-67b647db54-wxn66       1/1     Running   0          17h
rasa-x-1641746175-event-service-9844b68db-vvjv8   1/1     Running   0          17h
rasa-x-1641746175-nginx-ccc677589-mrhhh           1/1     Running   0          17h
rasa-x-1641746175-postgresql-0                    1/1     Running   0          17h
rasa-x-1641746175-rabbit-0                        1/1     Running   0          17h
rasa-x-1641746175-rasa-worker-6dcc468fd-f5kqm     1/1     Running   0          17h
rasa-x-1641746175-rasa-x-5c85fbbd49-nwxvw         1/1     Running   0          17h
rasa-x-1641746175-redis-master-0                  1/1     Running   0          17h
@abhishekrathi please check your internet connection; training can fail if the connection is slow or unstable. I have read the whole conversation between you and Chris, and everything seems fine to me: all your pods are running, and you are confident that everything was set up as per the documentation. Good luck!
Thanks for the quick response, but I have tried a different connection and the internet speed is good. Everything went perfectly up to connecting the Git repository, but I am stuck at model training and cannot find the issue.
@abhishekrathi check your internet speed, or are you using a VM?
Internet download speed is 109 Mbps and upload speed is 36 Mbps.
Not using a VM. I deployed Rasa X to AWS with the Helm chart, following the process below:
- https://rasa.com/docs/rasa-x/installation-and-setup/install/aws-installation/requirements (AWS Requirements)
- https://rasa.com/docs/rasa-x/installation-and-setup/install/aws-installation/installation (AWS Installation)
- Installation (Helm Chart Installation)
The “Training failed” message appears immediately when I click to train the model. It does not even process for one second.
List of services running on AWS:
PS C:\WINDOWS\system32> kubectl --namespace rasahr get service
NAME                                              TYPE           CLUSTER-IP       EXTERNAL-IP                                                                PORT(S)                                 AGE
rasa-x-1641746175-app                             ClusterIP      10.100.182.236   <none>                                                                     5055/TCP,80/TCP                         21h
rasa-x-1641746175-db-migration-service-headless   ClusterIP      None             <none>                                                                     8000/TCP                                21h
rasa-x-1641746175-duckling                        ClusterIP      10.100.84.56     <none>                                                                     8000/TCP                                21h
rasa-x-1641746175-nginx                           LoadBalancer   10.100.15.95     a7a102abe80f24583b6247273bedc2a4-1408708577.ap-south-1.elb.amazonaws.com   8000:32128/TCP                          21h
rasa-x-1641746175-postgresql                      ClusterIP      10.100.231.180   <none>                                                                     5432/TCP                                21h
rasa-x-1641746175-postgresql-headless             ClusterIP      None             <none>                                                                     5432/TCP                                21h
rasa-x-1641746175-rabbit                          ClusterIP      10.100.136.197   <none>                                                                     4369/TCP,5672/TCP,25672/TCP,15672/TCP   21h
rasa-x-1641746175-rabbit-headless                 ClusterIP      None             <none>                                                                     4369/TCP,5672/TCP,25672/TCP,15672/TCP   21h
rasa-x-1641746175-rasa-worker                     ClusterIP      10.100.113.220   <none>                                                                     5005/TCP                                21h
rasa-x-1641746175-rasa-x                          ClusterIP      10.100.235.253   <none>                                                                     5002/TCP                                21h
rasa-x-1641746175-redis-headless                  ClusterIP      None             <none>                                                                     6379/TCP                                21h
rasa-x-1641746175-redis-master                    ClusterIP      10.100.179.33    <none>                                                                     6379/TCP
For now, just the worker. And do it right after training fails.
If nothing shows, try adding --previous to the command.
PS C:\WINDOWS\system32> kubectl --namespace rasahr logs rasa-x-1641746175-rasa-worker-6dcc468fd-f5kqm --previous
Error from server (BadRequest): previous terminated container "rasa-x" in pod "rasa-x-1641746175-rasa-worker-6dcc468fd-f5kqm" not found
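The BadRequest error is expected here: --previous only returns logs from an earlier, terminated instance of a container, and this pod has never restarted (RESTARTS is 0 in the pod list above). A small sketch, assuming a POSIX shell and a pod with one main container, that asks for previous logs only when a restart has actually happened. The function name logs_with_previous is hypothetical.

```shell
# Print current logs, and previous-instance logs only if the container restarted.
# Usage: logs_with_previous <namespace> <pod>
logs_with_previous() {
  namespace=$1
  pod=$2
  # Restart count of the first regular container (init containers are tracked separately).
  restarts=$(kubectl --namespace "$namespace" get pod "$pod" \
    --output jsonpath='{.status.containerStatuses[0].restartCount}')
  kubectl --namespace "$namespace" logs "$pod"
  if [ "${restarts:-0}" -gt 0 ]; then
    echo "--- previous container instance ---"
    kubectl --namespace "$namespace" logs "$pod" --previous
  fi
}
```

For the worker pod in this thread the call would be logs_with_previous rasahr rasa-x-1641746175-rasa-worker-6dcc468fd-f5kqm, and with a restart count of 0 it would skip the --previous request entirely.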
Output of describe pods:
PS C:\WINDOWS\system32> kubectl --namespace rasahr describe pods rasa-x-1641746175-rasa-worker-6dcc468fd-f5kqm
Name:         rasa-x-1641746175-rasa-worker-6dcc468fd-f5kqm
Namespace:    rasahr
Priority:     0
Node:         ip-192-168-19-85.ap-south-1.compute.internal/192.168.19.85
Start Time:   Sun, 09 Jan 2022 22:06:17 +0530
Labels:       app.kubernetes.io/component=rasa-worker
              app.kubernetes.io/instance=rasa-x-1641746175
              app.kubernetes.io/name=rasa-x
              pod-template-hash=6dcc468fd
Annotations:  checksum/rasa-config: 4d99db7beb15d9c7065c913c33e6d17c813cd846c037ba4e710c4a145d54fb48
              checksum/rasa-secret: 52dda2e77938832263b5699996ee0f7054ed130539453b9e7bbde03272b6411f
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           192.168.31.75
IPs:
  IP:           192.168.31.75
Controlled By:  ReplicaSet/rasa-x-1641746175-rasa-worker-6dcc468fd
Init Containers:
  init-db:
    Container ID:  docker://0b19460978031276d5af9553489d91a1c6762f5adca4c0f526e36d331606dda8
    Image:         rasa/rasa:2.8.15-full
    Image ID:      docker-pullable://rasa/rasa@sha256:c6cdf4218b1017abbfcca70df9c842602e2398a3c1191962a7c7eb3d4e6e974b
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      until [[ "$(curl -s http://rasa-x-1641746175-db-migration-service-headless:8000 | grep -c completed)" == "1" ]]; do STATUS=$(curl -s http://rasa-x-1641746175-db-migration-service-headless:8000); if [[ -n "$STATUS" ]];then echo $STATUS; fi; sleep 5; done;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 09 Jan 2022 22:07:38 +0530
      Finished:     Sun, 09 Jan 2022 22:13:46 +0530
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
Containers:
  rasa-x:
    Container ID:  docker://b75e06c42f8a3dfcb9334d30ac6c06f7d64e9f9766496278a23659c5389ea450
    Image:         rasa/rasa:2.8.15-full
    Image ID:      docker-pullable://rasa/rasa@sha256:c6cdf4218b1017abbfcca70df9c842602e2398a3c1191962a7c7eb3d4e6e974b
    Port:          5005/TCP
    Host Port:     0/TCP
    Args:
      x
      --no-prompt
      --production
      --config-endpoint
      http://rasa-x-1641746175-rasa-x.rasahr.svc:5002/api/config?token=$(RASA_X_TOKEN)
      --port
      5005
      --jwt-method
      HS256
      --jwt-secret
      $(JWT_SECRET)
      --auth-token
      $(RASA_TOKEN)
      --cors
      *
    State:          Running
      Started:      Sun, 09 Jan 2022 22:13:49 +0530
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:http/ delay=10s timeout=1s period=10s #success=1 #failure=10
    Environment:
      MPLCONFIGDIR:            /tmp/.matplotlib
      DB_PASSWORD:             <set to the key 'postgresql-password' in secret 'rasa-x-1641746175-postgresql'>  Optional: false
      DB_DATABASE:             worker_tracker
      RASA_X_TOKEN:            <set to the key 'rasaXToken' in secret 'rasa-x-1641746175-rasa'>  Optional: false
      RASA_TOKEN:              <set to the key 'rasaToken' in secret 'rasa-x-1641746175-rasa'>  Optional: false
      JWT_SECRET:              <set to the key 'jwtSecret' in secret 'rasa-x-1641746175-rasa'>  Optional: false
      REDIS_PASSWORD:          <set to the key 'redis-password' in secret 'rasa-x-1641746175-redis'>  Optional: false
      RABBITMQ_PASSWORD:       <set to the key 'rabbitmq-password' in secret 'rasa-x-1641746175-rabbit'>  Optional: false
      RABBITMQ_QUEUE:          rasa_production_events
      RASA_ENVIRONMENT:        worker
      RASA_MODEL_SERVER:       http://rasa-x-1641746175-rasa-x.rasahr.svc:5002/api/projects/default/models/tags/production
      RASA_DUCKLING_HTTP_URL:  http://rasa-x-1641746175-duckling.rasahr.svc:8000
    Mounts:
      /.config from config-dir (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
This is possibly related to the issue I still have here
Hello @abhishekrathi, were you able to solve the issue? If so, how did you do it? I’ve been stuck on this for a week now. I’m unable to train or upload a model as well. I deployed using the Rasa X Helm chart installation. Rasa X version: 1.0.1, Rasa version: 2.8.15.
This is the solution to this bug: Rasa-x 1.0.1 compatibility issue with Rasa 2.8.2 & 2.8.17 models - #25 by virtualroot
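Since the linked thread points to a version incompatibility between Rasa X and Rasa, it can help to confirm which image tags the cluster is actually running. A sketch, assuming a POSIX shell; the function name list_images is hypothetical, and the jsonpath template follows the pattern used in the Kubernetes documentation for listing container images.

```shell
# Print each pod name followed by the images of its containers, one pod per line.
# Usage: list_images <namespace>
list_images() {
  namespace=$1
  kubectl --namespace "$namespace" get pods --output \
    jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.containers[*]}{.image}{" "}{end}{"\n"}{end}'
}
```

For the deployment in this thread, list_images rasahr should report rasa/rasa:2.8.15-full for the worker pod, matching the describe output above. The tag shown for the Rasa X pod is the one to compare against the compatibility notes in the linked thread.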
How can I train a model very fast? I am using GPU (with 4) but it’s very slow. What do I have to do? @ChrisRahme