Rasa doesn't work with the Russian language

Hi everyone! I want to use Russian in Rasa NLU, but Rasa recognizes intents incorrectly. How can I resolve this?


Hello, Andrey!

Could you show your NLU pipeline?

# The config recipe.
recipe: default.v1

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: ru

pipeline:
  - name: "SpacyNLP"
    model: "ru_core_news_lg"
    case_sensitive: False
  - name: "SpacyTokenizer"
    "intent_tokenization_flag": False
    "intent_split_symbol": "_"
    "token_pattern": null
  - name: "SpacyFeaturizer"
    "pooling": "mean"
  - name: "RegexFeaturizer"
    "case_sensitive": False
    "use_word_boundaries": True
  - name: "CountVectorsFeaturizer"
  - name: DIETClassifier
    epochs: 50
    batch_strategy: "sequence"
    similarity_type: "inner"
    maximum_positive_similarity: 0.9
    maximum_negative_similarity: 0
    constrain_similarities: true
    model_confidence: "softmax"
    ranking_length: 3
    entity_recognition: False
    evaluate_every_number_of_epochs: -1
    use_masked_language_model: True
  - name: "SpacyEntityExtractor"
    # dimensions to extract
    dimensions: ["PERSON", "LOC", "ORG", "PRODUCT", "LANGUAGE", "PERCENT"]
  - name: FallbackClassifier

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
# # No configuration for policies was provided. The following default policies were used to train your model.
# If you'd like to customize them, uncomment and adjust the policies.
# See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true

Do the default DIET hyperparameters give you the same result? You could try adding some gibberish examples like ‘акакдлщва’ to an out_of_scope intent. You could also develop a custom component to detect such queries (e.g. by looking words up in a large Russian dictionary).
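
To make the dictionary idea more concrete, here is a minimal sketch of the check such a component could run. It is plain Python and does not use the Rasa component API, so wiring it into a pipeline is left out. The names `RUSSIAN_VOCAB`, `unknown_word_ratio`, `looks_out_of_scope` and the 0.5 threshold are all made up for illustration, and the tiny word set stands in for a real dictionary file. Russian is heavily inflected, so in practice you would probably want to match lemmas rather than surface forms.

```python
import re

# Hypothetical tiny vocabulary for illustration only; a real check would
# load a large Russian word list (ideally matched on lemmas).
RUSSIAN_VOCAB = {"привет", "как", "дела", "я", "хочу", "заказать", "пиццу"}

# Runs of Cyrillic letters, including ё/Ё which sit outside the а-я range.
WORD_RE = re.compile(r"[А-Яа-яЁё]+")


def unknown_word_ratio(text, vocabulary):
    """Return the share of Cyrillic words in `text` that are not in `vocabulary`."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        # No Russian words at all: treat the message as fully unknown.
        return 1.0
    unknown = sum(1 for w in words if w not in vocabulary)
    return unknown / len(words)


def looks_out_of_scope(text, vocabulary=RUSSIAN_VOCAB, threshold=0.5):
    """Flag the message when more than `threshold` of its words are unknown."""
    return unknown_word_ratio(text, vocabulary) > threshold
```

With this sketch, `looks_out_of_scope("акакдлщва")` flags the message, while `looks_out_of_scope("Привет, как дела?")` does not. The threshold is a guess and would need tuning on your own data.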