How to properly use an `unfeaturized` slot so that it does not affect Core predictions?

(Psds01) #1

Usage problem: I have created a custom NLU component following this. It detects the language of the text and adds that language as an extra entity.

  1. I have not listed this extra entity type under `entities` in the domain.
  2. I have, however, defined a slot for this entity type, with type `unfeaturized`.
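
For point 1, one thing worth checking is whether an entity that is missing from the domain can still end up in the dialogue state. The sketch below is a toy model, not Rasa code: `entity_state_features` and its `filter_by_domain` switch are hypothetical names, and the idea that Core derives an `entity_<name>` feature from each entity in the message is an assumption, not something confirmed here.

```python
from typing import Dict, List


def entity_state_features(
    message_entities: List[Dict],
    domain_entities: List[str],
    filter_by_domain: bool,
) -> Dict[str, float]:
    """Hypothetical helper: map message entities to `entity_<name>` features."""
    features: Dict[str, float] = {}
    for entity in message_entities:
        name = entity["entity"]
        if filter_by_domain and name not in domain_entities:
            # Entities that the domain does not declare are dropped.
            continue
        features["entity_" + name] = 1.0
    return features


# The entity added by the custom component, and the entities from the domain.
entities = [{"entity": "language", "value": "en"}]
declared = ["a", "b"]

unfiltered = entity_state_features(entities, declared, filter_by_domain=False)
filtered = entity_state_features(entities, declared, filter_by_domain=True)
print(unfiltered)
print(filtered)
```

If the state is built without filtering, the `language` entity shows up in every turn even though the slot itself is unfeaturized, which could explain states at prediction time that never appeared in the training stories.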

If I train NLU+core, with LanguageClassifier component and language slot both disabled, everything works as expected.

But if I enable both the LanguageClassifier component and the language slot (type: unfeaturized), Core breaks: the conversation does not move forward at all. The very first action is predicted as “custom_fallback” (with UserUtteranceReverted), which results in an infinite loop.

My guess is that the language slot of type unfeaturized is not behaving as it should. What do you think? How can I solve this?
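
To spell out what I would expect from an unfeaturized slot, here is a minimal sketch. The `Toy*` classes and `featurize` are simplified stand-ins written for this post, not Rasa's own slot classes; the assumption is that a text slot contributes one "is set" feature while an unfeaturized slot contributes none.

```python
from typing import Any, List


class ToySlot:
    """Simplified stand-in for a slot; not Rasa's Slot class."""

    def __init__(self, name: str, value: Any = None) -> None:
        self.name = name
        self.value = value

    def as_feature(self) -> List[float]:
        raise NotImplementedError


class ToyTextSlot(ToySlot):
    def as_feature(self) -> List[float]:
        # One feature: 1.0 if the slot is set, 0.0 otherwise.
        return [1.0 if self.value is not None else 0.0]


class ToyUnfeaturizedSlot(ToySlot):
    def as_feature(self) -> List[float]:
        # No features at all, whatever the value is.
        return []


def featurize(slots: List[ToySlot]) -> List[float]:
    """Concatenate the features of all slots into one vector."""
    vector: List[float] = []
    for slot in slots:
        vector.extend(slot.as_feature())
    return vector


language_unset = featurize(
    [ToyTextSlot("a", "x"), ToyTextSlot("b"), ToyUnfeaturizedSlot("language")]
)
language_set = featurize(
    [ToyTextSlot("a", "x"), ToyTextSlot("b"), ToyUnfeaturizedSlot("language", "en")]
)
print(language_unset, language_set)
```

Under this model the feature vector is identical whether or not `language` is filled, so the slot alone should not change what Core predicts.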

Also, is there a better way to add this new “language” property of the text to the message? Something like:

message.set("language", language, True)

...without affecting the predictions made by Core? Which “message properties” does Core use to predict the next action?
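
A rough sketch of what I have in mind follows. `ToyMessage` is a simplified stand-in that only mimics the `set` / `get` calls used in my component; its `as_output` method is a hypothetical name, and nothing here shows whether Core would read or ignore the extra property.

```python
from typing import Any, Dict, Set


class ToyMessage:
    """Simplified stand-in for the NLU message object; not Rasa's Message."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.data: Dict[str, Any] = {}
        self.output_properties: Set[str] = set()

    def set(self, prop: str, info: Any, add_to_output: bool = False) -> None:
        self.data[prop] = info
        if add_to_output:
            self.output_properties.add(prop)

    def get(self, prop: str, default: Any = None) -> Any:
        return self.data.get(prop, default)

    def as_output(self) -> Dict[str, Any]:
        # Only properties flagged with add_to_output end up in the parse result.
        output = {k: v for k, v in self.data.items() if k in self.output_properties}
        output["text"] = self.text
        return output


message = ToyMessage("hello there")
message.set("entities", [], add_to_output=True)
# Store the language as its own top-level property instead of a fake entity.
message.set("language", "en", add_to_output=True)

parsed = message.as_output()
print(parsed)
```

The point is that the entity list stays untouched and the language travels next to it as a separate key in the parse result.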

Gist of the component class:

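# Imports assumed for Rasa 1.x; adjust the Component path for other versions.
from typing import Any, Dict, List, Optional, Text

from rasa.nlu.components import Component


class LanguageClassifier(Component):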
    requires: List = [
        "entities",
    ]
    provides: List = [
        "entities",
    ]
    # Component configurations. These values can be overwritten in the `config` file
    defaults: Dict = {
        "model_params": {
            "kernel": "linear",
        },
    }
    # Languages this component can handle. Default (None): handles all languages
    language_list = None

    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super(LanguageClassifier, self).__init__(component_config)

    def train(...):
        """
        Train this component.
        """

    def process(self, message, **kwargs):
        lang = "en"  # default when the classifier is not consulted
        if some_condition:
            text = message.text.lower()
            arr = self._featurize_text(text)
            lang = self.clf.predict([arr])[0]
        # I WANT TO DO THIS!!
        # message.set("language", lang, add_to_output=True)
        # Workaround for now: attach the language as an extra entity.
        entity = self.convert_to_rasa(lang, 1.0)
        message.set(
            "entities",
            message.get("entities", []) + [entity],
            add_to_output=True
        )

    @classmethod
    def load(...) -> "LanguageClassifier":
        pass

    def persist(...) -> Optional[Dict[Text, Any]]:
        pass

    def convert_to_rasa(self, value: Text, confidence: float) -> Dict:
        entity = {
            "value": value,
            "confidence": confidence,
            "entity": "language",
            "extractor": "LanguageClassifier",
            "start": 0,
            "end": 0,
        }
        return entity


Content of configuration file (config.yml) (if relevant):

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper
  - name: CountVectorsFeaturizer
  - name: EmbeddingIntentClassifier
  - name: LanguageClassifier

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
    max_history: 10
  - name: KerasPolicy
    max_history: 10
  - name: MappingPolicy
  - name: FallbackPolicy
    fallback_action_name: 'action_default_fallback'
    nlu_threshold: 0.5
    core_threshold: 0.9
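
To show how I read the two thresholds above, here is a minimal sketch. `should_fall_back` is a hypothetical helper, and the rule it encodes (fall back when either confidence is below its threshold) is a simplification of the fallback behavior, not the policy's actual implementation.

```python
def should_fall_back(
    nlu_confidence: float,
    core_confidence: float,
    nlu_threshold: float = 0.5,
    core_threshold: float = 0.9,
) -> bool:
    """Hypothetical helper: fall back if either confidence is too low."""
    return nlu_confidence < nlu_threshold or core_confidence < core_threshold


# Intent recognized confidently, but the best action is only at 0.6.
low_core = should_fall_back(nlu_confidence=0.95, core_confidence=0.6)
# Both confidences clear their thresholds.
all_good = should_fall_back(nlu_confidence=0.95, core_confidence=0.95)
print(low_core, all_good)
```

With `core_threshold` at 0.9, any dialogue state the policies are not nearly certain about ends in the fallback action, which matches what I see on the very first turn.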

Content of domain file (domain.yml) (if relevant):

slots:
  language:
    type: unfeaturized
  a:
    type: text
  b:
    type: text

entities:
  - a
  - b

.... usual stuff