Evaluation of Dialog Policy

I want to evaluate the dialogue policy as it was done in the TED paper, so I'm looking for a way to train several models on varying numbers of dialogues without having to split the data manually.

When I run rasa train core with the --percentages flag as described here, I still end up with a single model instead of one per percentage value.

Is there any built-in support for evaluating core? I also haven't found an option to use cross-validation.
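Right now the only alternative I see is to split the stories by hand and train each subset separately, something like this (where data/stories_25pct.yml would be a hand-made subset containing 25% of the dialogues):

rasa train core -s data/stories_25pct.yml --out models_25pct

repeated for every percentage, which is exactly what I'd like to avoid.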

Hi @dav-92, thanks for flagging. Could you please share your config files with me so I can try to replicate this?

Hi @anca

I used this command:

rasa train core --out comparison_models --runs 3 --percentages 0 5 25 50 70 95

which produces only a single model.

I did not specify different config files as in the example, because I only want to use one config with varying amounts of data; I'm not sure if that is even possible.
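For reference, if I read the docs correctly, the comparison example there passes several config files, e.g.:

rasa train core -c config_1.yml config_2.yml --runs 3 --percentages 0 5 25 50 70 95

and then evaluates the resulting models with:

rasa test core -m comparison_models --evaluate-model-directory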

I used this config file:

language: de
pipeline:
  - name: SpacyNLP
    model: "de_core_news_sm"
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: FallbackClassifier
    threshold: 0.7
    
policies:
  #- name: MemoizationPolicy
  - name: RulePolicy
  - name: TEDPolicy
    max_history: 10
    epochs: 100

The .zip contains the whole project

chatbot.zip (2.5 MB)

To me this looks like a bug. It's a bit tricky to fix, though, as we need some marker other than the current one (multiple config files) to decide whether the user explicitly wants comparison training.
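As an untested workaround until this is fixed, you could try passing the same config file twice, which should trigger the comparison branch (at the cost of training each percentage split twice):

rasa train core -c config.yml config.yml --out comparison_models --runs 3 --percentages 0 5 25 50 70 95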

Could you please create an issue here @dav-92 ?

Thanks!

Issue: Training core using --pecentages only produces one model when only one config is specified · Issue #8673 · RasaHQ/rasa · GitHub

Awesome - thank you!