Use of Out of Vocabulary - OOV

How do I use the OOV token in my Rasa bot? I read about it in the documentation but did not get a clear idea of how to use it. Where in my files (actions, domain, NLU) do I insert the oov_word or oov_token? I need a stepwise solution so that this does not become more tiresome than it already is :smile:

Example: I need to capture any free-text reason for an employee leave application. Kindly guide me.

Also, since I have used supervised_embeddings as the pipeline, do I change the pipeline or do I need to add “CountVectorsFeaturizer” to the existing pipeline?

Thanks in Advance!!

Can you look into this? @btotharye @Mappi @JiteshGaikwad @stephens

Hey @varunsapre10, you can check this config.yml of the Sara demo bot:

Let me know if this helps you :slight_smile:

Thanks, @JiteshGaikwad. You would then use the oov token in your NLU training data. For example, our rasa-demo bot uses an oov value in an enter_data intent:

## intent:enter_data
- my budget is oov
- oov
- oov per year

Thanks @stephens @JiteshGaikwad, I think this will do it. :+1:

Also, can you explain what “token_pattern: (?u)\b\w+\b” means in the Sara demo bot config file?

Secondly, can we use “oov” as an entity value within an intent? For example:

## intent:reason
- [sick](reason)
- [emergency](reason)
- [oov](reason)

Will this work??
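On the token_pattern question: the pattern can be tried out directly with Python's re module. This is only a minimal sketch, assuming the pattern is applied as an ordinary Python regular expression; the helper name tokenize and the sample sentence are made up for illustration. `(?u)` switches on Unicode matching, `\b` is a word boundary, and `\w+` is one or more word characters, so single-character words such as “I” or “a” are kept. For comparison, scikit-learn's CountVectorizer defaults to `(?u)\b\w\w+\b`, which drops them.

```python
import re

# Pattern from the Sara demo bot config: one or more word characters
# between word boundaries, with Unicode matching enabled.
TOKEN_PATTERN = r"(?u)\b\w+\b"

# scikit-learn's CountVectorizer default: two or more word characters.
SKLEARN_DEFAULT = r"(?u)\b\w\w+\b"


def tokenize(text, pattern=TOKEN_PATTERN):
    """Hypothetical helper: return every substring that matches the pattern."""
    return re.findall(pattern, text)


print(tokenize("I need a day off, ok?"))
# ['I', 'need', 'a', 'day', 'off', 'ok']
print(tokenize("I need a day off, ok?", SKLEARN_DEFAULT))
# ['need', 'day', 'off', 'ok']
```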

@stephens @JiteshGaikwad @btotharye @JulianGerhard It works if I remove my default fallback response, so either the fallback works or the oov works, but not both. The bot accepts the flow, but when I try to print the value of the slot ‘reason’, it shows ‘None’. Is it because I have used ‘oov’ with an entity/slot?

If I enter any of the examples from this intent, the reason is printed fine. But if I enter anything else, it should print whatever was entered as the reason, and it does not. Please help.
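One possible workaround for the ‘None’ slot, sketched here under assumptions rather than as a confirmed fix: fall back to the raw message text whenever no ‘reason’ entity was extracted. The function name resolve_reason is hypothetical, and the dictionary is assumed to have the same “text” and “entities” keys as the parsed message a custom action sees in tracker.latest_message.

```python
def resolve_reason(latest_message):
    """Hypothetical helper: prefer an extracted 'reason' entity,
    otherwise use the whole user message as the reason."""
    for entity in latest_message.get("entities", []):
        if entity.get("entity") == "reason":
            return entity.get("value")
    # No 'reason' entity was found, so keep whatever the user typed.
    return latest_message.get("text")


# Known example from the training data: the entity value is used.
print(resolve_reason({
    "text": "I am sick",
    "entities": [{"entity": "reason", "value": "sick"}],
}))
# sick

# Unseen reason: the raw text is used instead of None.
print(resolve_reason({"text": "my flight got cancelled", "entities": []}))
# my flight got cancelled
```

Inside a custom action, the returned value could then be stored in the ‘reason’ slot.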

I want to join this discussion, as I found no ready recipe for this particular problem in any of the forum threads. It took me more than a week of frustration and a huge number of attempts to crack this task. IMHO, this should be described in the very first intro Rasa guide, as extracting arbitrary entities from user inputs (based on a set of example inputs having exactly the same structure) is arguably the most popular thing one wants from a bot.

Environment:

  • Rasa Version : 2.0.3
  • Rasa SDK Version : 2.0.0
  • Rasa X Version : None
  • Python Version : 3.7.7
  • Operating System : Darwin-19.6.0-x86_64-i386-64bit

Here is the config pipeline that finally worked for me to extract any word the user puts into the phrase:

pipeline:
- name: WhitespaceTokenizer
- name: CRFEntityExtractor
- name: CountVectorsFeaturizer
  OOV_token: _oov_
  token_pattern: (?u)\b\w+\b
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
- name: FallbackClassifier
  threshold: 0.3
  ambiguity_threshold: 0.1
policies:
  - max_history: 5
    name: AugmentedMemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy

Note the use of CRFEntityExtractor. It is an important part of the success: with the default RegexFeaturizer and LexicalSyntacticFeaturizer I couldn’t make it work.

Then in the nlu.yml file I use training phrases like these:

    - I want to buy some [oranges](fruits)
    - I want to buy some [mandarines](fruits)
    - I want to buy some [grapefruits](fruits)
    - I want to buy some [kiwis](fruits)
    - I want to buy some _oov_

With this setup the bot grabs any word that the user puts in place of _oov_, so if I enter “I want to buy some BMWs”, it will recognise “BMWs” as fruits and save it to the slot. You might have to handle this later, since BMWs are clearly not fruits, but that is a completely different story. I hope this helps somebody.
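For readers wondering what the OOV token does conceptually: words that were never seen in the training data get substituted with the configured token before the message is featurized. The following is a simplified sketch of that idea only, not Rasa's actual implementation; the function names and the lowercase, whitespace-based tokenization are assumptions made for illustration.

```python
OOV_TOKEN = "_oov_"


def build_vocabulary(training_examples):
    """Collect every lowercased, whitespace-separated word seen in training."""
    vocabulary = set()
    for example in training_examples:
        vocabulary.update(example.lower().split())
    return vocabulary


def replace_unseen_words(message, vocabulary, oov_token=OOV_TOKEN):
    """Swap each word that is missing from the vocabulary for the OOV token."""
    words = message.lower().split()
    return " ".join(w if w in vocabulary else oov_token for w in words)


training_examples = [
    "I want to buy some oranges",
    "I want to buy some kiwis",
    "I want to buy some _oov_",
]
vocabulary = build_vocabulary(training_examples)

print(replace_unseen_words("I want to buy some BMWs", vocabulary))
# i want to buy some _oov_
```

Because the training data contains an example with _oov_ in that position, the substituted message looks like something the model has already seen.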

FYI, after some more trials I’ve figured out that oov recognition does not happen at all with DIETClassifier, but sometimes works with CRFEntityExtractor if I provide at least 10 test phrases with different words in place of the oov token.

Nevertheless, it stopped working after I added more variations of the test phrases (rephrased in different but very similar words). Maybe one has to be an NLP pro to quickly and successfully build a bot with Rasa, but for me as a complete beginner it takes enormous effort, and still the results make zero sense.

@Dar0n Note that CRFEntityExtractor doesn’t even use the OOV token in your configuration, because it is listed before the CountVectorsFeaturizer that sets the OOV token. Use this link for more info on pipeline order.
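To make the ordering point concrete: pipeline components run in the order they are listed, so a component can only build on what earlier components produced. The sketch below mirrors the relevant part of the config above as a plain Python list (an assumption made so the example needs no YAML parser) and uses a hypothetical helper to list which components come after the one that sets OOV_token.

```python
# Abbreviated copy of the pipeline above, in the same order.
pipeline = [
    {"name": "WhitespaceTokenizer"},
    {"name": "CRFEntityExtractor"},
    {"name": "CountVectorsFeaturizer", "OOV_token": "_oov_"},
    {"name": "CountVectorsFeaturizer", "analyzer": "char_wb"},
    {"name": "DIETClassifier"},
]


def components_after_oov(pipeline):
    """Hypothetical helper: names of the components listed after the
    first component that defines OOV_token."""
    for index, component in enumerate(pipeline):
        if "OOV_token" in component:
            return [c["name"] for c in pipeline[index + 1:]]
    return []


later = components_after_oov(pipeline)
print("CRFEntityExtractor" in later)  # False: it runs before the OOV featurizer
print("DIETClassifier" in later)      # True: it runs after the OOV featurizer
```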