Custom Entity and Relation Extraction

rasa-nlu

(sourav) #1

Hello everyone, I have a specific requirement as below:

I have a document containing the text: “The named insurer is ABC and his Date of Birth is 1/01/2001.”

For the above text, my training_data.json has:

{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "The named insurer is ABC and his Date of Birth is 1/01/2001. ",
        "intent": "WhoIsPolicyHolder",
        "entities": [
          { "start": , "end": , "value": "ABC", "entity": "NAMED INSURED" }
        ]
      },
      {
        "text": "The named insurer is ABC and his Date of Birth is 1/01/2001. ",
        "intent": "WhatIsDOBofInsurer",
        "entities": [
          { "start": , "end": , "value": "1/01/2001", "entity": "DOB" }
        ]
      }
    ]
  }
}
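The `start` and `end` fields in this format are character offsets into `text`, with `end` exclusive, so that `text[start:end] == value`. A minimal sketch of how they could be computed, assuming each value occurs once in the sentence; `entity_span` is a hypothetical helper, not part of Rasa NLU:

```python
def entity_span(text, value, entity):
    """Build one entity annotation whose offsets satisfy text[start:end] == value."""
    start = text.find(value)  # first occurrence only (assumption: value is unique)
    if start == -1:
        raise ValueError("%r does not occur in %r" % (value, text))
    return {"start": start, "end": start + len(value),
            "value": value, "entity": entity}


text = "The named insurer is ABC and his Date of Birth is 1/01/2001."
name_span = entity_span(text, "ABC", "NAMED INSURED")
dob_span = entity_span(text, "1/01/2001", "DOB")

print(name_span)
print(dob_span)
```

If the offsets are missing or do not line up with the value, the entity annotation cannot be used for training, so this is worth checking before looking at the pipeline.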

I have trained a model with 10 training examples for each of the intents WhoIsPolicyHolder and WhatIsDOBofInsurer. This is what I got as output:

{
  'intent': {'name': 'WhatIsDOBOfPolicyHolder', 'confidence': 0.5965509303964964},
  'entities': [],
  'intent_ranking': [
    {'name': 'WhatIsDOBOfPolicyHolder', 'confidence': 0.5965509303964964},
    {'name': 'WhoIsPolicyHolder', 'confidence': 0.4034490696035035}
  ],
  'text': 'The named insurer is XYZ and his Date of Birth is 2/02/2002.'
}

Can we get the output to include the WhoIsPolicyHolder and WhatIsDOBofInsurer relations (the entity and value from the test data) instead of only the intent with its confidence score?


(Deepak Shetty) #2

You seem to have multiple intents for the same text, so to pick those up you need to follow https://blog.rasa.com/how-to-handle-multiple-intents-per-input-using-rasa-nlu-tensorflow-pipeline/

Your second problem is that your entities aren't being picked up, which might indicate an issue with your data. In the response you can see that your entities array is empty (also, be careful with dates!).
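Once entities are extracted, they arrive in the `entities` list of the same parse result, so the entity/value pairs can be read off next to the intent ranking. A rough sketch, assuming a result shaped like the output above; the filled-in `entities` list and confidences here are hypothetical, and `summarize` is an illustrative helper rather than a Rasa function:

```python
def summarize(result):
    """Collect the ranked intents and the entity -> value pairs from a parse result."""
    return {
        "intents": [(r["name"], r["confidence"])
                    for r in result.get("intent_ranking", [])],
        "entities": {e["entity"]: e["value"]
                     for e in result.get("entities", [])},
    }


# Hypothetical parse result with entities present (not output from the thread).
example = {
    "intent": {"name": "WhoIsPolicyHolder", "confidence": 0.6},
    "entities": [
        {"start": 21, "end": 24, "value": "XYZ", "entity": "NAMEDINSURED"},
        {"start": 50, "end": 59, "value": "2/02/2002", "entity": "DOB"},
    ],
    "intent_ranking": [
        {"name": "WhoIsPolicyHolder", "confidence": 0.6},
        {"name": "WhatIsDOBofInsurer", "confidence": 0.4},
    ],
    "text": "The named insurer is XYZ and his Date of Birth is 2/02/2002.",
}

summary = summarize(example)
print(summary["entities"])
```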


(sourav) #3

Hi Deepak, you are right, my entities are not picked up in the output. Can you please help me with what I can do to fix that?


(Deepak Shetty) #4

It's hard to say without your input data (and your pipeline, etc.). It's usually incorrect or too little training data. If you can share your data and pipeline file, someone might take a look. If you can't, then reduce the problem (say, to one entity and one intent), train just that, and see.


(sourav) #5

Hi Deepak,

My input data is: -- some text -- The named insurer is ABC and his Date of Birth is 1/01/2001 -- some text -- (repeated multiple times with different names and DOBs).

My pipeline is:

language: "en"
pipeline:
  - name: "nlp_spacy"
    model: "en"
  - name: "tokenizer_spacy"
  - name: "ner_crf"
  - name: "intent_featurizer_spacy"
  - name: "intent_classifier_sklearn"

(Deepak Shetty) #6

Your pipeline has an incorrect order and is missing some components if you were intending to use spacy_sklearn (see https://rasa.com/docs/nlu/pipeline/#section-pipeline). I tried with spacy_sklearn and it worked for me (single intent though; as before, if you want multiple intents you need tensorflow_embedding).

language: en
pipeline: spacy_sklearn
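One way to see the ordering and missing-component problem concretely is to diff the hand-written component list from post #5 against what the `spacy_sklearn` template expands to. The expansion below is an assumption taken from the rasa_nlu 0.13 pipeline documentation, so check it against your installed version; the variable names are illustrative only.

```python
# Assumed expansion of the spacy_sklearn template (rasa_nlu 0.13 docs).
SPACY_SKLEARN = [
    "nlp_spacy",
    "tokenizer_spacy",
    "intent_entity_featurizer_regex",
    "intent_featurizer_spacy",
    "ner_crf",
    "ner_synonyms",
    "intent_classifier_sklearn",
]

# Component list from post #5.
user_pipeline = [
    "nlp_spacy",
    "tokenizer_spacy",
    "ner_crf",
    "intent_featurizer_spacy",
    "intent_classifier_sklearn",
]

# Components the template has that the hand-written list lacks.
missing = [c for c in SPACY_SKLEARN if c not in user_pipeline]

# Compare the relative order of the components both lists share.
shared_template_order = [c for c in SPACY_SKLEARN if c in user_pipeline]
shared_user_order = [c for c in user_pipeline if c in SPACY_SKLEARN]
order_differs = shared_template_order != shared_user_order

print("missing:", missing)
print("order differs:", order_differs)
```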

Here is the Chatito file I used to generate data for the NLU:

%[greet]
    Hi
    Hello
    Howdy

%[WhoIsPolicyHolder]('training': '100')
    The named insurer is @[NAMEDINSURED] and his Date of Birth is @[DOB]

@[NAMEDINSURED]
    ABC
    DEF
    GHI
    TEXT
    abdfgh
    bghtery
    qwerty
    singte
    AsdF
    BlahBlah

@[DOB]
    01/01/2001
    04/12/2013
    03/11/1987
    02/12/1987
    01/18/1964
    11/23/1945
    12/12/2012
    07/11/1999
    03/14/2017
    07/07/2007
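For readers without Chatito, roughly the same kind of data can be produced with a few lines of Python. This is only a sketch of the idea, not Chatito's generation algorithm: `build_example` is a hypothetical helper, only a few slot values are used, and every name/DOB combination is emitted instead of a sample of 100.

```python
import itertools
import json


def build_example(name, dob, intent="WhoIsPolicyHolder"):
    """Assemble one training example, recording entity offsets while building the text."""
    parts = [
        ("The named insurer is ", None),
        (name, "NAMEDINSURED"),
        (" and his Date of Birth is ", None),
        (dob, "DOB"),
    ]
    text, entities = "", []
    for chunk, label in parts:
        if label is not None:
            # Offsets follow the rasa_nlu convention: text[start:end] == value.
            entities.append({"start": len(text), "end": len(text) + len(chunk),
                             "value": chunk, "entity": label})
        text += chunk
    return {"text": text, "intent": intent, "entities": entities}


names = ["ABC", "DEF", "GHI"]
dobs = ["01/01/2001", "04/12/2013"]
examples = [build_example(n, d) for n, d in itertools.product(names, dobs)]
training_data = {"rasa_nlu_data": {"common_examples": examples}}

print(len(examples))
print(json.dumps(examples[0], indent=2))
```

Building the text piece by piece keeps the offsets correct even when a name happens to appear elsewhere in the sentence, which a plain substring search would not guarantee.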

(sourav) #7

Thanks Deepak, let me try with spacy_sklearn and see the output.

Also, if I put this in my config.yml:

language: "en"
pipeline:
  - name: "tensorflow_embedding"

it throws an error: Exception: Failed to find component class for 'tensorflow_embedding'. Unknown component name. Check your configured pipeline and make sure the mentioned component is not misspelled. If you are creating your own component, make sure it is either listed as part of the component_classes in rasa_nlu.registry.py or is a proper name of a class in a module.

Any idea how to resolve this?


(sourav) #8

I just changed config.yml to

language: "en"
pipeline: "tensorflow_embedding"

and it works now.
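A likely explanation for the difference (my reading of the rasa_nlu 0.13 docs, not something confirmed in this thread): `tensorflow_embedding` is the name of a pre-configured pipeline template, not of a single component, so it is accepted as the whole `pipeline:` value but not as a `name:` entry in a component list. The toy lookup below mimics that distinction. `TEMPLATES`, `COMPONENTS`, and `resolve_pipeline` are hypothetical names for illustration, not Rasa's code, and the component list is assumed from the docs.

```python
# Assumed expansion of the tensorflow_embedding template (rasa_nlu 0.13 docs).
TEMPLATES = {
    "tensorflow_embedding": [
        "tokenizer_whitespace",
        "ner_crf",
        "ner_synonyms",
        "intent_featurizer_count_vectors",
        "intent_classifier_tensorflow_embedding",
    ],
}

# Toy registry: only the component names that appear in the template above.
COMPONENTS = {name for names in TEMPLATES.values() for name in names}


def resolve_pipeline(pipeline):
    """Return component names for a template string or an explicit component list."""
    if isinstance(pipeline, str):
        # A bare string is treated as a template name and expanded.
        return list(TEMPLATES[pipeline])
    names = [entry["name"] for entry in pipeline]
    for name in names:
        if name not in COMPONENTS:
            raise ValueError("Unknown component name: %s" % name)
    return names


print(resolve_pipeline("tensorflow_embedding"))

try:
    resolve_pipeline([{"name": "tensorflow_embedding"}])
except ValueError as err:
    print(err)
```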