Thanks to the Rasa team for their wonderful work. I am a new Rasa user and am learning the basics.
I’m building a Chinese weather NLU model that includes a city slot.
After training the model, some entities that are in the lookup table but not in the training examples, such as '西安', are tagged as the common.city entity, while others, such as '日照', are not tagged at all. I am confused by this result. Given the same context and the same entity regex feature value, why is the output different for these two entities?
Am I missing something in the config for Chinese, or do I need to add more data?
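A minimal sketch of how the setup described above is typically laid out in Rasa's YAML training-data format. The file name, the `version` value, the intent name `ask_weather`, and the example sentences are assumptions for illustration only; the entity name and the city names are the ones mentioned in this post.

```yaml
# Hypothetical data/nlu.yml; intent name and sentences are illustrative only.
version: "3.1"                 # assumed; use the version that matches your Rasa install
nlu:
  - intent: ask_weather        # assumed intent name
    examples: |
      - [上海](common.city)今天天气怎么样
      - [苏州](common.city)明天会下雨吗
  - lookup: common.city        # lookup table read by RegexFeaturizer
    examples: |
      - 上海
      - 苏州
      - 西安
      - 日照
```

With `RegexFeaturizer`, a lookup match only adds a feature that `DIETClassifier` can learn from. It does not by itself force a token to be tagged, which would be consistent with two lookup entries being treated differently.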
Yes, this is the entire content of config.yml. I think the policies config is used for training dialogue models from story data? Please correct me if I am wrong.
Because I only need to train the NLU model, I didn’t add any custom policies to it.
Thanks, but config.yml says that if no custom policy is added, the defaults will be used.
And why is the policy config needed if I only train the NLU part?
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true
@chaoyang I only asked to see the complete config file. If you comment everything out, the default config.yml is used; but you have customised it for your use case, so it is no longer the default. Does that make sense now?
Thanks. Below is the complete file. Sorry that I showed the content without the comments; I thought they would not affect training. Based on this complete file, do you have any suggestions?
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: zh
pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: JiebaTokenizer
  - name: RegexFeaturizer
    use_word_boundaries: False
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true
@chaoyang Since your example is in Chinese, it is honestly very difficult for me, so please bear with me. You are building a weather chatbot, so first rename the entity from common.city to just city for values such as 上海 and 苏州, and use the name city in the lookup table as well. Only the city names you list in the lookup table will be picked up; anything else will not be returned. Try this much for now and tell me the result.
I really appreciate your careful and frequent replies on this Chinese-related problem, but I don’t quite understand your suggestion. Does it mean I need to change the entity name from common.city to city? Could you explain this in more detail? Thanks!
Model 1. The lookup table has 18 entities. Trained the NLU model. It does not work well: some entities are not tagged by the model.
Model 2. Added 8 more entities, so the lookup table now has 26 entities. Trained the NLU model. It works well: all entities in the lookup table are tagged by the model.
Model 3. Removed the 8 entities, so the lookup table has the same 18 entities as Model 1. Trained the NLU model. It still works well: all entities in the lookup table are tagged by the model.
@chaoyang Nice, cool. Model 2 works well then. Try providing more training and lookup examples, delete the older models, and re-train. If you have any issues, please let me know. Xièxiè
It could be pure luck due to the randomness of machine learning. If you want to accurately compare two pipeline components or policies across multiple trainings, you could set a seed for DIET, ResponseSelector, and TED, for example:
- name: DIETClassifier
  random_seed: 1
  # other parameters
I also suggest you use TensorBoard to make comparisons and choose an optimal configuration. This is also doable for DIET, ResponseSelector, and TED, for example:
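A minimal sketch of such a configuration, assuming the `DIETClassifier` options `tensorboard_log_directory`, `tensorboard_log_level`, `evaluate_on_number_of_examples`, and `evaluate_every_number_of_epochs`. The log directory and the numeric values are placeholders, not values taken from this thread:

```yaml
pipeline:
  # other pipeline components
  - name: DIETClassifier
    epochs: 100
    random_seed: 1                            # fixed seed so runs are comparable
    tensorboard_log_directory: ./tensorboard  # placeholder path for the logs
    tensorboard_log_level: epoch              # log metrics once per epoch
    evaluate_on_number_of_examples: 20        # placeholder; size of the hold-out set
    evaluate_every_number_of_epochs: 5        # placeholder; evaluation frequency
```

After training, the logged curves can be inspected by pointing TensorBoard at that directory, for instance with `tensorboard --logdir ./tensorboard`.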
Try setting evaluate_on_number_of_examples to about 20% of your total number of examples (of course, this means those examples will not be used for training, so you will have to provide a few more). You can use this script I wrote to count the number of examples you have.