Rasa does not extract person names in Cyrillic

Hello! I want to extract person names. For example, bot: "What is your name?", user: "Meruyert" (in Cyrillic). The problem is that Rasa only extracts the names I wrote in my intent examples; it does not classify new names as a name.

Hi! Sounds like your model has overfit to a few specific names. What config are you using? How much data do you have?

Hello! I am creating a bot in Russian, and I think I have quite a lot of data. I am using the default config file; I only changed the language from en to ru.

In principle there is no reason this should not work. How much data? The default config has changed across versions; can you post yours here?

This is the config file:

```yaml
language: ru

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

# Configuration for Rasa Core.
# Policies
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.3
    core_threshold: 0.3
    fallback_action_name: "action_fallback"
```

I don’t know how to measure data size, but overall I have 122 intents.

Also, how can I handle this problem: when I type "hello" into rasa shell nlu, it classifies it as the greeting intent with confidence 0.45, which is fine. But if I type "Send me the latest news in sports", the NLU classifies it as the "what is your favourite sport" intent. So I set nlu_threshold in the FallbackPolicy to 0.6 to get rid of these incorrect classifications, but that also affects my greeting intent: after increasing the threshold, the bot no longer recognizes "hello" as a greeting. I want to raise the NLU threshold, but it also cuts off correctly classified intents. How do I solve this?

Hi @MMustafa! I can recommend using the Rasa testing tools to pick the right cutoff (and perhaps adding more data or tweaking your configuration).

You can split your data into train and test sets and use rasa test with the --histogram option; see Testing Your Assistant.

That will show you how the confidence values are distributed.
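The split-and-test workflow described above can be sketched as follows. This is a minimal example assuming a Rasa 2.x CLI; exact flag names and default output paths may differ between versions, so check `rasa test --help` for your install:

```shell
# Hold out a portion of the NLU data as a test set
# (by default this writes a train_test_split/ directory)
rasa data split nlu

# Evaluate the trained model on the held-out set and write
# a confidence histogram image alongside the other reports
rasa test nlu --nlu train_test_split/test_data.yml --histogram hist.png
```

The histogram shows correct predictions and errors bucketed by confidence, which is what lets you pick a threshold that rejects most wrong guesses without cutting off correct ones like "hello".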

Hello Alan! Thanks for the response and advice. I used rasa test, wrote some test stories, and then got this picture. Unfortunately, I don’t understand what the picture means. Can you explain which measure I should look at to determine the nlu/core threshold in the FallbackPolicy?

I still have not solved the problem with extracting a person’s name. My form is "Меруерт". I have 30 examples like this with different names. What should I do next? Can adding more training examples solve this problem? If yes, how many examples do I need; if not, what else can I do?

Hi @MMustafa, the rasa test command should produce a histogram; this blog post might help.

For testing your entity performance, I would recommend creating a train/test split and evaluating on the held-out data.
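One common reason DIET fails to generalize to unseen names is that the training examples lack variety in both the names and the surrounding context. A hypothetical NLU training-data fragment in Rasa 2.x YAML format, with entity annotations on diverse Cyrillic names (the intent and entity names here are made up for illustration, not taken from the thread):

```yaml
version: "2.0"
nlu:
  - intent: inform_name
    examples: |
      - Меня зовут [Меруерт](person_name)
      - Я [Айгерим](person_name)
      - [Данияр](person_name)
      - Моё имя [Алексей](person_name)
      - Можете звать меня [Сауле](person_name)
```

Varying the phrasing around the annotated entity helps the model learn that "the token after 'Меня зовут'" is a name, rather than memorizing the specific names themselves. A lookup table or a pretrained NER component can also help if the set of expected names is large.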