Segmentation fault (core dumped) on AWS GPU while training

I’m trying to train an intent classification model using the TensorFlow embedding pipeline in rasa_nlu. We have two datasets, with 2500 and 17000 utterances respectively.

With the 2500-utterance dataset I’m able to train the model, but with the 17000-utterance dataset training crashes with a segmentation fault:

```
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
Segmentation fault
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-01-22 11:22:03.435794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2019-01-22 11:22:12.477504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-22 11:22:12.477554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2019-01-22 11:22:12.477564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2019-01-22 11:22:12.481233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10758 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Epochs:   0%|          | 0/300 [00:00<?, ?it/s]
2019-01-22 11:23:03.791310: W tensorflow/core/framework/allocator.cc:108] Allocation of 17455684000 exceeds 10% of system memory.
train_models.sh: line 5:  8420 Segmentation fault      (core dumped) python -m rasa_nlu.train -o ${model_dir} -d ${tr_file} -c ${config_file} --project nlu --fixed_model_name model_1
```

However, I observed the following while training:

The 2500-utterance dataset needs about 3 GB of (CPU) RAM, but the same dataset takes 10.5 GB of the 12 GB of GPU memory.

The 17000-utterance dataset needs about 6 GB of (CPU) RAM, but the same dataset crashes with the segmentation fault on the GPU.
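
Also, the single allocation that TensorFlow warns about right before the crash is already bigger than the whole card. Rough arithmetic below, using only the numbers reported in the log (I’m not sure whether that particular allocation lands on the host or on the GPU):

```python
# Numbers taken from the log above; nothing measured separately.
alloc_bytes = 17455684000      # "Allocation of 17455684000 exceeds 10% of system memory."
gpu_total_gib = 11.17          # "totalMemory: 11.17GiB" reported for the Tesla K80

alloc_gib = alloc_bytes / 1024.0 ** 3
print("requested allocation: %.2f GiB" % alloc_gib)            # ~16.26 GiB
print("fits in GPU memory:   %s" % (alloc_gib < gpu_total_gib))  # False
```

So it looks like something in the embedding training scales with the dataset size and simply doesn’t fit for 17000 utterances, though I can’t tell what from the log alone.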

This seems very odd to me, because training uses more than three times as much GPU memory as CPU memory. Is there a reason for this behaviour?
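
My only guess for the high baseline usage is TensorFlow’s default behaviour of reserving almost all GPU memory up front. In plain TensorFlow 1.x I would limit that with a session config like the sketch below, but I don’t see an obvious way to pass such a config through `python -m rasa_nlu.train`, so this is just a guess:

```python
# Standalone TensorFlow 1.x sketch (not wired into rasa_nlu): with
# allow_growth=True the process only grabs GPU memory as it needs it,
# instead of reserving ~all of the K80's memory at session creation.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, hard-cap the fraction of GPU memory the process may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5

with tf.Session(config=config) as sess:
    x = tf.random_normal([1000, 1000])
    print(sess.run(tf.reduce_sum(x)))   # small op; GPU usage stays small
```

Even if that explains the 10.5 GB for the small dataset, though, I don’t think it explains the crash on the larger one.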

Is there a way to avoid the segmentation fault when training on large datasets on GPU instances?
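
The only workaround I know of so far is forcing the run onto the CPU by hiding the GPU before TensorFlow loads; something like the sketch below shows the idea (for the real run I would set the variable in the shell in front of the `python -m rasa_nlu.train ...` command). But that defeats the point of a GPU instance, so I’d prefer a proper fix.

```python
# CPU-only fallback idea: hide the GPU before TensorFlow is imported.
# For the actual training I would set CUDA_VISIBLE_DEVICES in the shell
# before the `python -m rasa_nlu.train ...` command rather than in code.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""   # TensorFlow will see no GPUs

from tensorflow.python.client import device_lib

# Should list only CPU devices now, so training falls back to the CPU.
print([d.name for d in device_lib.list_local_devices()])
```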