Train the RASA NLU to extract the entities and fill slot based on regular expressions

In my data.json, I have given a sample IP, and if I give some other IP address, it still takes the sample IP given in data.json. How can I train the model to extract the correct entity?

{
  "text": "1.2.3.4",
  "intent": "get_ip_reputation",
  "entities": [
    {
      "start": 0,
      "end": 7,
      "value": "1.2.3.4",
      "entity": "ipAddr"
    }
  ]
},

Hi @hemamalini, what does your training data look like? Can you also provide an example like this one, but where it extracts the wrong entity value?

@erohmensing I tried giving different IP addresses. It takes 1.2.3.4 by default even if I type some other IP. Sample story below. How can I use regular expressions to fill the slot? Whenever I type some other IP address, the ipAddr slot gets the value 1.2.3.4.

story_001

  • greeting
    • utter_greet
  • get_ip_reputation
    • utter_ask_ip_addr
  • get_ip_reputation{"ipAddr": "1.2.3.4"}
    • slot{"ipAddr": "1.2.3.4"}
    • get_ipaddr_reputation
    • utter_reply
    • utter_good_bye

Can you try setting the slot type to unfeaturized if it isn’t already so?

Seeing your NLU training data for intent: get_ip_reputation would be helpful too.

Thanks. Do I need to change the stories as well?

Actually, I apologize: as long as the slot type was originally text and not something like categorical, keep it the way it was instead of switching it to unfeaturized. Can you show me your intent data for get_ip_reputation?

Hi, this is the same data for getting the IP. I need to get the IP value dynamically.

{
        "text": "what is the reputation of 10.1.1.1",
        "intent": "get_ip_reputation",
        "entities": [
          {
            "start": 26,
            "end": 34,
            "value": "10.1.1.1",
            "entity": "ipAddr"
          }
        ]
      },

Yes, but how many examples of this entity do you have? In order for it to generalize, you’ll want to have at least 20 examples. With IP addresses, however, your best bet is probably a regex entity extractor, as you thought in your post title. You can use it by adding the regex_features to your training data as described here and adding the intent_entity_featurizer_regex (RegexFeaturizer if on NLU 0.15.0) to your pipeline.
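
To make that concrete, here is a rough sketch (not an official Rasa utility) of how you could generate more varied examples with correct start/end offsets and add a regex_features entry to the same JSON training file. The IP pattern, the extra example sentences, and the helper name make_example are my own assumptions for illustration; the pattern is deliberately loose and does not validate octet ranges. Also keep in mind that, as far as I know, the regex only gives the entity extractor an extra feature to learn from, so you still need varied examples.

```python
import json
import re

# Illustrative IPv4 pattern (an assumption; it does not check that octets are <= 255).
IP_PATTERN = r"\b\d{1,3}(?:\.\d{1,3}){3}\b"


def make_example(text, intent="get_ip_reputation", entity="ipAddr"):
    """Build one common_examples entry, computing start/end offsets from the text."""
    entities = [
        {"start": m.start(), "end": m.end(), "value": m.group(), "entity": entity}
        for m in re.finditer(IP_PATTERN, text)
    ]
    return {"text": text, "intent": intent, "entities": entities}


# The first two sentences come from this thread; the others are made-up variations.
texts = [
    "1.2.3.4",
    "what is the reputation of 10.1.1.1",
    "check 192.168.0.15 for me",
    "is 172.16.254.3 safe",
]

training_data = {
    "rasa_nlu_data": {
        # The "name" is just a label for the regex feature.
        "regex_features": [{"name": "ipAddr", "pattern": IP_PATTERN}],
        "common_examples": [make_example(t) for t in texts],
    }
}

print(json.dumps(training_data, indent=2))
```

Using different IP values in the examples matters: if every example contains 1.2.3.4, the model has nothing to generalize from.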

I added nearly 10 examples. Do we have to modify that in the stories.md file? Any sample on stories would be highly appreciated.

Nope, it shouldn’t need to be in your stories! Did you try out the regex featurizer?

Thanks, it worked. I figured out the issue. Regex was the issue!! Thanks a ton.