Training data and slot entities questions

e8180kimo · October 1, 2018, 10:03pm

I want to fill a slot, ‘slot_topic’. I used a custom action to get the slot. The bot is for helping users write an essay, which means there are many possible topics. Do I have to put as many topics and sentence structures as possible in my training data? If so, this seems inefficient, since there are many different sentence structures. For example:

- [peer review](slot_topic)
- [motivation](slot_topic)
- [bullying](slot_topic)
- My topic is [bullying](slot_topic)
- the topic is [bullying](slot_topic)
- [bullying](slot_topic) is my topic 
- topic is [bullying](slot_topic)
- My topic is [peer review](slot_topic)
...
... and so on.

Is there a way for Rasa to take whatever the user inputs as the slot value, so I don’t need so much training data?
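If you do end up writing many examples, one way to cut down the typing is to generate the annotated lines from a few sentence templates and a list of topics. This is a minimal sketch with a hypothetical helper, not a Rasa feature; it emits lines in the same `- [value](entity)` Markdown training-data syntax used above. Keep in mind that template-generated data can make the model overfit to the templates, so it saves typing but does not replace genuinely varied examples.

```python
from itertools import product
from typing import Iterable, List


def generate_examples(templates: Iterable[str], values: Iterable[str], entity: str) -> List[str]:
    """Hypothetical helper: fill the "{}" placeholder in each template with each
    value, annotated in Markdown training-data syntax as [value](entity)."""
    examples = []
    for template, value in product(templates, values):
        annotated = f"[{value}]({entity})"
        # str.replace instead of str.format, so stray braces in a template are harmless
        examples.append("- " + template.replace("{}", annotated))
    return examples


templates = ["{}", "My topic is {}", "{} is my topic"]
topics = ["bullying", "peer review"]
for line in generate_examples(templates, topics, "slot_topic"):
    print(line)
```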

Part of my actions.py:

class ActionTopic(FormAction):

    RANDOMIZE = False

    @staticmethod
    def required_fields():
        return [
            EntityFormField("slot_topic", "slot_topic")
        ]

    def name(self):
        return 'action_topic'

    def submit(self, dispatcher, tracker, domain):
        Bookinginfo = BookingInfo()
        booking = Bookinginfo.save(tracker.get_slot("slot_topic"))
        # An action must return a list of events, even if it is empty
        return []

akelad · October 2, 2018, 9:07am

You can use tracker.latest_message.get("text") to get the full user utterance and set that as a slot.
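A minimal sketch of this suggestion, assuming `latest_message` is a dict as in the action SDK's tracker. In a real custom action you would return `[SlotSet("slot_topic", text)]`; to keep the snippet runnable without Rasa installed, it builds by hand the plain event dict that `SlotSet` produces. `fill_slot_from_text` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional


def slot_set(key: str, value: Any, timestamp: Optional[float] = None) -> Dict[str, Any]:
    """Build the event dict for setting a slot (the shape SlotSet returns)."""
    return {"event": "slot", "timestamp": timestamp, "name": key, "value": value}


def fill_slot_from_text(latest_message: Dict[str, Any], slot_name: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: use the whole user utterance as the slot value."""
    text = (latest_message.get("text") or "").strip()
    if not text:
        # Nothing usable was typed, so no slot event is returned
        return []
    return [slot_set(slot_name, text)]


# Stand-in for tracker.latest_message after the user typed "cat"
latest_message = {"text": "cat", "intent": {"name": None, "confidence": 0.0}, "entities": []}
events = fill_slot_from_text(latest_message, "slot_topic")
print(events)
# [{'event': 'slot', 'timestamp': None, 'name': 'slot_topic', 'value': 'cat'}]
```

Because the whole utterance is stored, a reply like "My topic is cats" is saved verbatim, not as "cats".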

e8180kimo · October 2, 2018, 6:32pm

What if it is a different sentence structure?

For example:

My topic is ...
Topic is ...
I want to write ...
I like to write ...
I like the topic ...

e8180kimo · October 3, 2018, 2:56am

I followed your suggestion. I revised my actions.py:

import logging

# rasa_core_sdk 0.11-era form API (replaced by a different FormAction in later releases)
from rasa_core_sdk.forms import EntityFormField, FormAction

logger = logging.getLogger(__name__)


class BookingInfo(object):
    def save(self, slot_topic):
        logger.debug("-------slot_topic---------")
        logger.debug(format(slot_topic))
        logger.debug("-------slot_topic---------")
        return


class ActionTopic(FormAction):

    RANDOMIZE = False

    @staticmethod
    def required_fields():
        return [
            EntityFormField("slot_topic", "slot_topic")
        ]

    def name(self):
        return 'action_topic'

    def submit(self, dispatcher, tracker, domain):
        Bookinginfo = BookingInfo()
        booking = Bookinginfo.save(tracker.latest_message.get("text"))
        return []

However, when I type “cat” in answer to the topic question, it does not go into my slot (cat is not in my nlu_data.md). On the action server side, it shows:

DEBUG:rasa_core_sdk.executor:Received request to run 'action_topic'
DEBUG:rasa_core_sdk.executor:Successfully ran 'action_topic'

On the bot terminal side, it shows:

What is your topic?
2018-10-02 19:44:40 DEBUG    rasa_core.policies.ensemble  - Predicted next action using policy_1_MemoizationPolicy
2018-10-02 19:44:40 DEBUG    rasa_core.processor  - Predicted next action 'action_listen' with prob 1.00.
2018-10-02 19:44:40 DEBUG    rasa_core.processor  - Action 'action_listen' ended with events '[]'
127.0.0.1 - - [2018-10-02 19:44:40] "POST /webhooks/rest/webhook?stream=true&token= HTTP/1.1" 200 187 0.024927
cat
2018-10-02 19:44:42 DEBUG    rasa_core.tracker_store  - Recreating tracker for id 'default'
2018-10-02 19:44:42 DEBUG    rasa_core.processor  - Received user message 'cat' with intent '{'name': None, 'confidence': 0.0}' and entities '[]'
2018-10-02 19:44:42 DEBUG    rasa_core.processor  - Logged UserUtterance - tracker now has 14 events
2018-10-02 19:44:42 DEBUG    rasa_core.processor  - Current slot values: 
	requested_slot: None
	slot_ThesisUpdate: None
	slot_thesis: None
	slot_topic: None
2018-10-02 19:44:42 DEBUG    rasa_core.policies.memoization  - Current tracker state [{'prev_action_listen': 1.0, 'intent_confirm_completion': 1.0}, {'intent_confirm_completion': 1.0, 'prev_utter_ask_topic': 1.0}, {'prev_action_listen': 1.0}]
2018-10-02 19:44:42 DEBUG    rasa_core.policies.memoization  - There is no memorised next action
2018-10-02 19:44:42 DEBUG    rasa_core.policies.ensemble  - Predicted next action using policy_2_KerasPolicy
2018-10-02 19:44:42 DEBUG    rasa_core.processor  - Predicted next action 'action_topic' with prob 0.89.
2018-10-02 19:44:42 DEBUG    rasa_core.actions.action  - Calling action endpoint to run action 'action_topic'.
2018-10-02 19:44:42 DEBUG    rasa_core.processor  - Action 'action_topic' ended with events '['SlotSet(key: requested_slot, value: slot_topic)']'
2018-10-02 19:44:42 DEBUG    rasa_core.policies.memoization  - Current tracker state [{'intent_confirm_completion': 1.0, 'prev_utter_ask_topic': 1.0}, {'prev_action_listen': 1.0}, {'prev_action_topic': 1.0}]
2018-10-02 19:44:42 DEBUG    rasa_core.policies.memoization  - There is no memorised next action
2018-10-02 19:44:42 DEBUG    rasa_core.policies.ensemble  - Predicted next action using policy_2_KerasPolicy
2018-10-02 19:44:42 DEBUG    rasa_core.processor  - Predicted next action 'utter_need_other_help' with prob 0.50.
2018-10-02 19:44:42 DEBUG    rasa_core.processor  - Action 'utter_need_other_help' ended with events '[]'
2018-10-02 19:44:42 DEBUG    rasa_core.processor  - Bot utterance 'BotUttered(text: Do you need any other help?, data: {
  "elements": null,
  "buttons": [
    {
      "payload": "Yes",
      "title": "Yes"
    },
    {
      "payload": "No",
      "title": "No"
    }
  ],
  "attachment": null
})'
2018-10-02 19:44:42 DEBUG    rasa_core.policies.memoization  - Current tracker state [{'prev_action_listen': 1.0}, {'prev_action_topic': 1.0}, {'prev_utter_need_other_help': 1.0}]
2018-10-02 19:44:42 DEBUG    rasa_core.policies.memoization  - There is no memorised next action

However, if I type something that is in my training data, such as “dog behaviour”, the slot gets filled.

But your suggestion means that whatever the user types will become the slot value, right?

e8180kimo · October 3, 2018, 11:57pm

I used the above code, but I am still not able to get the value. If the sentence structures vary, should this be done with entity extraction? I’ve searched the forum and found a few people with similar entity extraction problems, for example “entities extraction problem”, but no one was able to help.

On the other hand, if I set it up as a form where the bot asks something like “Please fill in the blank: my topic is ______?”, should I then use “FreeTextFormField” with tracker.latest_message.get("text")? That makes the bot seem stupid, though.

e8180kimo · October 4, 2018, 9:53pm

no one can help here?

akelad · October 5, 2018, 7:46am

Well, “cat” got classified with the intent None because it isn’t in the vocabulary of your trained model. This is probably messing up your Core predictions. If you want to avoid this, you have to enable OOV (out-of-vocabulary) handling.

If you’re frustrated with the docs, could you please clarify which parts exactly, and maybe also create a GitHub issue? We’d also be happy to submit a PR with improvements.
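Here is a toy bag-of-words sketch of what OOV handling changes (an illustration of the idea, not Rasa's implementation). Without it, a word never seen in training, like "cat", contributes nothing, so in this toy the classifier would receive an all-zero input; with it, the word is counted under a shared OOV token that the model can learn to handle. In Rasa this is switched on through the count-vectors featurizer's `OOV_token` option, and as far as I know the token also has to appear in the training data (or be introduced via `OOV_words`) to make it into the vocabulary. The vocabulary and token name below are made up.

```python
from typing import Dict, List

OOV_TOKEN = "__oov__"  # assumed token name; in Rasa you choose your own via OOV_token


def build_vocab(training_texts: List[str]) -> Dict[str, int]:
    """Index every word seen in training, plus one extra index for the OOV token."""
    words = sorted({w for text in training_texts for w in text.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    vocab[OOV_TOKEN] = len(vocab)
    return vocab


def featurize(text: str, vocab: Dict[str, int], use_oov: bool = True) -> List[int]:
    """Bag-of-words counts; unseen words are counted as OOV_TOKEN when use_oov is True."""
    vec = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1
        elif use_oov:
            vec[vocab[OOV_TOKEN]] += 1
        # otherwise the unseen word is silently dropped
    return vec


vocab = build_vocab(["my topic is bullying", "dog behaviour"])
print(featurize("cat", vocab, use_oov=False))  # all zeros: nothing for the classifier to use
print(featurize("cat", vocab))                 # only the last (OOV) position is 1
```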

azarezade · May 14, 2019, 12:58pm

I have the same problem. For example, consider a weather intent with a city entity and a city slot, in the following scenario:

user: how is the weather?

bot: okay, where do you live?

then, the sample replies of the user to the bot can be:

Paris
in Paris
I live in Paris
I am in Paris
I want to know weather in Paris
and so on…

Now, how does the bot decide how to extract the entity from the user’s various replies? Does it use entity extraction, just take the whole text as the city name, or something else?

If it uses entity extraction to fill the city slot (assuming there is no lookup table listing cities), then I think NLU can’t find the entity when the user says, for example, “Paris” or “in Paris”. How does Rasa handle this issue?

akelad · May 24, 2019, 12:13pm

Hmm, I’m not sure I understand why you think it wouldn’t extract “in Paris”? If you have training examples like that, it should be OK.

azarezade · June 8, 2019, 10:45am

@akelad Yes, it may extract the city entity from “in Paris” given enough training data or a lookup_table for the cities. Let me explain better what I mean. Consider the following scenario:

user: call
bot: okay, to whom should I call?

Now if the user answers, for example, “John Doe”, can entity extraction identify “John Doe” as a contact entity and fill the slot?

JulianGerhard · June 8, 2019, 11:46am

Hi @azarezade,

Entity extraction can extract anything it has previously learned to be an entity. There are several ways to learn new entities or use existing ones. You can read further information here:

So I think there are three feasible scenarios:

1. Provide training data in which contacts are annotated with [name](contact). Since it could be very hard to achieve good coverage of names this way, I’d only suggest this option if you want to restrict the slot to the specific contacts it should be filled with.
2. Rename the slot “contact” to “person”, choose the from_entity slot mapping, and use an entity extractor in your pipeline that can recognize persons (spaCy would be a good one here) to extract the person into the slot.
3. If it has to be “contact”, choose the slot mapping from_entity(entity='person'); the rest works as in point 2.

Points 2 and 3 are very similar; choosing between them is mostly a matter of convenience. You might not want to rename contact to person, since a contact is a hyponym of person and thus more specific.
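Option 3 amounts to a lookup from slot name to entity type. The sketch below shows that logic without Rasa: the entity dicts mimic the `entities` list of an NLU parse result, `PERSON` is the label spaCy's English NER models use for people (adjust it to whatever your extractor emits), and `SLOT_MAPPINGS` and `map_entities_to_slots` are hypothetical names. In a Rasa 1.x `FormAction`, the equivalent would be a `slot_mappings` entry such as `{"contact": self.from_entity(entity="PERSON")}`.

```python
from typing import Any, Dict, List

# Hypothetical mapping: fill the "contact" slot from entities labelled PERSON
SLOT_MAPPINGS: Dict[str, str] = {"contact": "PERSON"}


def map_entities_to_slots(entities: List[Dict[str, Any]],
                          mappings: Dict[str, str]) -> Dict[str, Any]:
    """Return {slot: value} for each slot whose entity type was extracted."""
    filled: Dict[str, Any] = {}
    for slot, entity_type in mappings.items():
        for ent in entities:
            if ent.get("entity") == entity_type:
                filled[slot] = ent.get("value")
                break  # take the first matching entity
    return filled


# Example parse result for the message "call John Doe"
entities = [{"entity": "PERSON", "value": "John Doe", "start": 5, "end": 13}]
print(map_entities_to_slots(entities, SLOT_MAPPINGS))  # {'contact': 'John Doe'}
```

Note that this only works when the extractor actually tags the name; if it returns no `PERSON` entity, the slot stays empty.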

Did that help you?

azarezade · June 8, 2019, 1:16pm

@JulianGerhard Thanks for the quick reply.

I know about Rasa entity extraction and have also worked with spaCy. Actually, even a good contact/person entity extractor fails in many cases. For example, try spaCy with some non-European/US names like the famous Chinese surname “Zhang” (see here): it fails, which seems somewhat reasonable, since spaCy knows nothing about our context. But in Rasa we know the context of the conversation, so I was trying to see whether there is any way to use this context to better identify entities.

JulianGerhard · June 8, 2019, 2:27pm

Hi @azarezade,

Okay, got it. Then I think we have to tackle the problem another way, by asking why neither spaCy nor other components can reliably extract those names.

For example, German is a tougher language in terms of NLP/NLU; names are especially tricky because many NER tools fail to extract only the first or last name. So, thinking about your problem, how about training your own component specialised in names:

Using @Juste’s tutorial here:

by adding this one:

might be a good approach. Maybe give it a try?

Regards

azarezade · June 10, 2019, 7:35am

Hi @JulianGerhard

I think the reason is that when we have a single name without any context, such as “Zhang”, it is hard for any entity extraction method to detect Zhang as a contact unless it has seen it before or has it in a lookup table. But when we say “Please call to Zhang” or “Please call to zzz”, a good entity extraction method should easily detect Zhang, or even zzz, as a person, since we only call persons!

Yes, a custom component can be used for this purpose. But I’m still not very confident about how to design a custom entity extractor that uses the context of the conversation to understand that zzz is a contact even when the user replies to the bot’s question “okay, to whom should I call” with only “zzz”, not “please call to zzz”.
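One way to use the dialogue context, sketched under assumptions: when the bot has just asked for a specific slot (tracked in `requested_slot`, as in the logs earlier in this thread) and the extractor found no matching entity, treat the whole reply as the value. This is similar in spirit to the `from_text` slot mapping in Rasa 1.x forms, but the function below and its arguments are hypothetical, not a Rasa API.

```python
from typing import Any, Dict, List, Optional


def fill_requested_slot(slot_name: str,
                        entity_type: str,
                        requested_slot: Optional[str],
                        text: str,
                        entities: List[Dict[str, Any]]) -> Optional[str]:
    """Hypothetical helper: prefer an extracted entity of the right type; otherwise,
    if the bot's last question asked for exactly this slot, use the whole reply."""
    for ent in entities:
        if ent.get("entity") == entity_type:
            return ent.get("value")
    if requested_slot == slot_name and text.strip():
        # The question supplies the context: any reply to "to whom should I call?"
        # is taken as the contact, even a name the extractor has never seen.
        return text.strip()
    return None


print(fill_requested_slot("contact", "PERSON", "contact", "zzz", []))  # zzz
print(fill_requested_slot("contact", "PERSON", None, "zzz", []))       # None
```

The trade-off is that a reply like "actually, never mind" would also be stored as a contact, so you may want to check the intent or validate the value before accepting it.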

Saurbh060 · January 3, 2020, 10:24am

I have the same problem. For example, I’m using pincode as a slot/entity and wrote an action that takes the slots and calls utter_message(), but it does not extract the input slots. Can you please help me out? Here are my files: stories.md (6.4 KB), nlu.md (13.4 KB), actions.py (4.0 KB)

Saurbh060 · January 5, 2020, 1:22pm

Hey @akelad, how can I set the user’s sentence as a slot?

akelad · January 14, 2020, 2:54pm

@Saurbh060 could you clarify what you mean?

huseyinyilmaz01 · October 2, 2020, 4:26pm

Hi @azarezade, if you have some common patterns like “Please call to zzz”, regex might be helpful and can be used as a fallback in a custom action: try one pattern; if it doesn’t fill the PERSON slot, try another; if that doesn’t work either, try the next one…
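The fallback chain described above can be sketched as an ordered list of regular expressions, returning the first captured name. The patterns and the `extract_contact` name are hypothetical examples; patterns for the topic phrasings earlier in the thread ("My topic is ...", "I want to write ...") could be added the same way. When nothing matches, the function returns `None`, so the caller can fall back to another strategy.

```python
import re
from typing import List, Optional

# Hypothetical patterns, tried in order from most to least specific
CONTACT_PATTERNS: List[re.Pattern] = [
    re.compile(r"^\s*(?:please\s+)?call\s+(?:to\s+)?(?P<name>.+?)\s*[.!?]?\s*$", re.IGNORECASE),
    re.compile(r"^\s*(?:i\s+want\s+to\s+)?(?:talk|speak)\s+(?:to|with)\s+(?P<name>.+?)\s*[.!?]?\s*$",
               re.IGNORECASE),
]


def extract_contact(text: str, patterns: List[re.Pattern] = CONTACT_PATTERNS) -> Optional[str]:
    """Try each pattern in turn; return the captured name from the first match."""
    for pattern in patterns:
        match = pattern.match(text)
        if match:
            return match.group("name")
    return None  # no pattern matched: let the caller try something else


print(extract_contact("Please call to Zhang"))        # Zhang
print(extract_contact("I want to speak with Zhang"))  # Zhang
print(extract_contact("Zhang"))                       # None
```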

UlisesVD · October 2, 2020, 11:39pm

I’m in the same situation :'v

My intent looks like this (the examples are in Spanish; “mi marca es” means “my brand is”):

## intent:inform
- [barcel](brand)
- [pepsi](brand)
- [cocacola](brand)
- [sabritas](brand)
- [dell](brand)
- mi marca es [barcel](brand)
- mi marca es [pepsi](brand)
- mi marca es [sabritas](brand)
- mi marca es [cocacola](brand)
- mi marca es [nicke](brand)
- mi marca es [marinera](brand)
- mi marca es [samsung](brand)
- mi marca es [exxon mobil](brand)

Obviously I have many more examples, but when I type a new brand, entity extraction doesn’t work and extracts the user’s whole response. I’m using DIETClassifier; I tried other extractors but got the same result.

This kind of case works fine in Dialogflow. Hehe

UlisesVD · October 3, 2020, 4:55am

I solved this problem by adding a lot of examples to my NLU file and using this configuration:

    pipeline:
      - name: WhitespaceTokenizer
      - name: RegexFeaturizer
      - name: LexicalSyntacticFeaturizer
      - name: CountVectorsFeaturizer
      - name: CountVectorsFeaturizer
        # Analyzer to use, either 'word', 'char', or 'char_wb'
        analyzer: "word"
        # Set the lower and upper boundaries for the n-grams
        min_ngram: 1
        max_ngram: 1
        # Set the out-of-vocabulary token
        OOV_token: "oov"
        # Whether to use a shared vocab
        use_shared_vocab: False
      - name: DIETClassifier
        batch_strategy: sequence
        epochs: 150
      - name: EntitySynonymMapper
      - name: ResponseSelector
        epochs: 100
