Training NER model using TensorFlow pipeline

abhijeetdas057 · January 29, 2019, 11:52am

I am trying to train an NER model using Rasa with the TensorFlow pipeline. My use case is extracting key entities from US driving licenses. For example, when a user uploads an image of a driving license (DL), I get the OCR text from the image and pass this raw text to my NER model, which should return only the key entities such as DLNumber, DOB, DOE, address, etc.

Can someone suggest how to make my NER model more robust, so that it returns all the entities for any US driving license? The issue I am facing is that it returns duplicate entity names for wrong values. For example, dl_issue is returned twice in the response below:

{
  "intent": { "confidence": 0.9285378456115723, "name": "DL" },
  "project": "default",
  "entities": [
    { "start": 48, "confidence": 0.9092500118874686, "entity": "dl_number", "extractor": "ner_crf", "end": 57, "value": "999999999" },
    { "start": 63, "confidence": 0.9973167577546155, "entity": "dl_dob", "extractor": "ner_crf", "end": 73, "value": "04-27-1970" },
    { "start": 146, "confidence": 0.8391978961972846, "entity": "dl_postal_code", "extractor": "ner_crf", "end": 151, "value": "72203" },
    { "start": 168, "confidence": 0.4135534135801666, "entity": "dl_issue", "extractor": "ner_crf", "end": 178, "value": "04-27-2010" },
    { "start": 179, "confidence": 0.5787973704711238, "entity": "dl_issue", "extractor": "ner_crf", "end": 189, "value": "04-27-2014" }
  ],
  "model": "model_20190129-114027",
  "intent_ranking": [
    { "confidence": 0.9285378456115723, "name": "DL" },
    { "confidence": 0.0, "name": "AOI" },
    { "confidence": 0.0, "name": "W9" },
    { "confidence": 0.0, "name": "Passport" }
  ],
  "text": "ARKANSAS Natural State DRIVER’S LICENSE DL DLN: 999999999 DOB: 04-27-1970 CLASS: D dushin glamcole Susan Sample SAMPLE 123 Easy S: Little Rock AR 72203 issued Expirese 04-27-2010 04-27-2014 Height Eyes: 5-8 BR Endors Restri B ORGAN DONOR"
}
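To make the problem easy to spot across many test images, a response like this one can be checked for labels that occur more than once. The following is only a minimal sketch: the helper name `find_duplicate_entities` and the trimmed `sample` dictionary are illustrative assumptions, not part of Rasa, and the code only flags duplicates without deciding which value is correct.

```python
from collections import defaultdict


def find_duplicate_entities(response):
    """Return {label: [entities]} for every label that occurs more than once.

    `response` is assumed to be the parsed JSON dict returned by the NLU
    model, with an "entities" list shaped like the output shown in this post.
    """
    by_label = defaultdict(list)
    for entity in response.get("entities", []):
        by_label[entity["entity"]].append(entity)
    # Keep only labels that were assigned to more than one span.
    return {label: ents for label, ents in by_label.items() if len(ents) > 1}


# Trimmed copy of the entities from the response in this post (hypothetical
# variable name; only the fields used by the helper are kept).
sample = {
    "entities": [
        {"start": 48, "end": 57, "entity": "dl_number",
         "confidence": 0.9092500118874686, "value": "999999999"},
        {"start": 63, "end": 73, "entity": "dl_dob",
         "confidence": 0.9973167577546155, "value": "04-27-1970"},
        {"start": 146, "end": 151, "entity": "dl_postal_code",
         "confidence": 0.8391978961972846, "value": "72203"},
        {"start": 168, "end": 178, "entity": "dl_issue",
         "confidence": 0.4135534135801666, "value": "04-27-2010"},
        {"start": 179, "end": 189, "entity": "dl_issue",
         "confidence": 0.5787973704711238, "value": "04-27-2014"},
    ]
}

duplicates = find_duplicate_entities(sample)
for label, ents in duplicates.items():
    values = [e["value"] for e in ents]
    print(f"{label} was assigned to {len(ents)} spans: {values}")
```

For the response above, this flags only dl_issue, with both date values listed. Both duplicates also have the lowest confidence scores in the response, so logging the confidence alongside the flag may help when reviewing such cases.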

Gjouini · September 15, 2020, 9:30am

Hello, I’m working on the same kind of project as you. What did you use for OCR extraction?
