RASA Entity extraction

(Prashant Gohel) #1

Rasa NLU version : 0.15.0

Rasa core version : 0.14.3

Python version : 3.7

Operating system (windows, osx, …): windows

Issue : I want to extract entities such as source and destination from a string like “Show me trains from Rajkot to Ahmedabad on 31st may”. Kindly help me with the config.yml file. I am also attaching my training data.

I am annotating the word after “from” as source and the word after “to” as destination. Ordinals such as 3rd, 4th and 13th are annotated as “date”, and “may”, “june”, etc. as “month”.


Kindly help me with a proper config.yml file.

Content of configuration file (config.yml) :

language: "en"
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"

Content of domain file (domain.yml) (if used & relevant):

There is no problem with the spaCy English model, or even with city detection using spaCy.

What I want is to extract the source and destination from a sentence like:

Show me bus from Delhi to Mumbai.

So Delhi is the source city and Mumbai is the destination city.

Sample training data is below:

{
    "text": "I'm looking for trains from mumbai to Delhi on friday",
    "entities": [
        {
            "start": 28,
            "end": 35,
            "value": "mumbai ",
            "entity": "source"
        },
        {
            "start": 38,
            "end": 44,
            "value": "Delhi ",
            "entity": "destination"
        },
        {
            "start": 47,
            "end": 53,
            "value": "friday",
            "entity": "day"
        }
    ]
}

(Ella Rohm-Ensing) #2

I don’t think the issue is in your config (although it would be much easier to pick up entities like “friday” and “31st May” if you add the DucklingHTTPExtractor, and wouldn’t require you to annotate any of those entities). Can you provide an example of how it is not working? E.g. we can help you better if we know whether it:

  1. Isn’t picking up the entities
  2. Is picking up the entities but with the wrong labels
  3. Some other issue

Also just in general, you want to make sure that your entity tags are located directly around the tokens, as I can see spaces after your cities in the training data. It’s unlikely but possible that this is affecting your entity extraction as well.
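
A rough way to check for this across a whole training file is sketched below. It assumes each example uses the same fields as the sample in post #1 ("text", plus "entities" with "start", "end" and "entity"); the helper name `find_span_problems` is made up for illustration and is not part of Rasa.

```python
def find_span_problems(example):
    """Return messages for entity spans that include surrounding whitespace.

    `example` is assumed to look like one training example from post #1:
    a dict with "text" and a list of "entities" carrying "start"/"end".
    """
    problems = []
    text = example["text"]
    for ent in example.get("entities", []):
        # Slice the annotated span out of the text (end is exclusive).
        span = text[ent["start"]:ent["end"]]
        if span != span.strip():
            problems.append(
                f"{ent['entity']}: span {span!r} at "
                f"{ent['start']}-{ent['end']} has surrounding whitespace"
            )
    return problems


# The sample from post #1, copied with its original offsets.
example = {
    "text": "I'm looking for trains from mumbai to Delhi on friday",
    "entities": [
        {"start": 28, "end": 35, "value": "mumbai ", "entity": "source"},
        {"start": 38, "end": 44, "value": "Delhi ", "entity": "destination"},
        {"start": 47, "end": 53, "value": "friday", "entity": "day"},
    ],
}

for problem in find_span_problems(example):
    print(problem)
```

For the sample above this flags the source and destination spans but not the day span; moving the two "end" values back by one character and trimming the values would clear both.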

(Prashant Gohel) #3

If the input is like “show me train from delhi to mumbai on 3rd jan”,

it will not detect delhi as the “source” city and mumbai as the destination.

For your reference, I am attaching the training data here.

data.txt (22.5 KB)

(Prashant Gohel) #4

It is not picking up entities.

Sometimes it picks up only the destination city.

(Prashant Gohel) #5


This is one sample output of the model.

'text': 'show me trains from Ahmedabad to Rajkot'

{'intent': {'name': 'trains_inform', 'confidence': 0.9510094166013943},
 'entities': [{'start': 33, 'end': 39, 'value': 'rajkot', 'entity': 'destination',
               'confidence': 0.9486676451860248, 'extractor': 'CRFEntityExtractor'}],
 'intent_ranking': [{'name': 'trains_inform', 'confidence': 0.9510094166013943},
                    {'name': 'bus_inform', 'confidence': 0.020814976173191724},
                    {'name': 'greet', 'confidence': 0.014977496249056891},
                    {'name': 'goodbye', 'confidence': 0.013198110976357496}],
 'text': 'show me trains from Ahmedabad to Rajkot'}

As you can see, it has extracted Rajkot as the destination,

but it fails to extract Ahmedabad as the source.


(Ella Rohm-Ensing) #6

Hm, you only have 30 examples for source, maybe some more data is the key. I think your use case would be a good one for using Chatito to generate more data.
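
Chatito has its own template language, so the snippet below is not Chatito itself, just a plain-Python stand-in for the same idea: fill a few sentence templates with every ordered pair of cities and compute the entity offsets automatically. The templates, the city list and the function names are all made up for illustration, and it assumes the source placeholder always comes before the destination placeholder. Each example would still need whatever intent label your real data uses.

```python
import itertools
import json

# Hypothetical templates and cities, for illustration only.
TEMPLATES = [
    "show me trains from {source} to {destination}",
    "I'm looking for trains from {source} to {destination}",
    "is there a bus from {source} to {destination}",
]
CITIES = ["Rajkot", "Ahmedabad", "Delhi", "Mumbai"]


def make_example(template, source, destination):
    """Fill one template and compute character offsets for both entities.

    Assumes "{source}" appears before "{destination}" in the template.
    """
    prefix, rest = template.split("{source}")
    middle, suffix = rest.split("{destination}")
    text = prefix + source + middle + destination + suffix
    src_start = len(prefix)
    dst_start = src_start + len(source) + len(middle)
    return {
        "text": text,
        "entities": [
            {"start": src_start, "end": src_start + len(source),
             "value": source, "entity": "source"},
            {"start": dst_start, "end": dst_start + len(destination),
             "value": destination, "entity": "destination"},
        ],
    }


def make_examples():
    """Build one example per template and ordered city pair."""
    return [
        make_example(template, source, destination)
        for template in TEMPLATES
        for source, destination in itertools.permutations(CITIES, 2)
    ]


examples = make_examples()
print(len(examples))
print(json.dumps(examples[0], indent=2))
```

Because the offsets are computed from the template pieces, the spans sit directly around the city names with no stray spaces.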

(Prashant Gohel) #7

@erohmensing How can I solve this problem? Please tell…