Error in training nlu with BytePairFeaturizer

Hi RASAians

I was trying to train Rasa NLU with a new pipeline config from the recently added GitHub project rasa-nlu-examples and ran into an error. The error appears when I use BytePairFeaturizer in my pipeline. I didn't have any problem training NLU with a normal pipeline config.

rasa version: 1.10.8
python: 3
extra project installation: pip install git+https://github.com/RasaHQ/rasa-nlu-examples

pipeline config:

language: fa
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 5
  - name: rasa_nlu_examples.featurizers.dense.BytePairFeaturizer
    lang: fa
    vs: 200000
    dim: 300
  - name: DIETClassifier
    epochs: 1
  - name: EntitySynonymMapper

rasa train nlu log:

2020-07-20 21:04:27 INFO     rasa.nlu.model  - Starting to train component WhitespaceTokenizer
2020-07-20 21:04:28 INFO     rasa.nlu.model  - Finished training component.
2020-07-20 21:04:28 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2020-07-20 21:04:44 INFO     rasa.nlu.model  - Finished training component.
2020-07-20 21:04:44 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2020-07-20 21:04:58 INFO     rasa.nlu.model  - Finished training component.
2020-07-20 21:04:58 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2020-07-20 21:05:15 INFO     rasa.nlu.model  - Finished training component.
2020-07-20 21:05:15 INFO     rasa.nlu.model  - Starting to train component BytePairFeaturizer
2020-07-20 21:05:16 INFO     rasa.nlu.model  - Finished training component.
2020-07-20 21:05:16 INFO     rasa.nlu.model  - Starting to train component DIETClassifier
2020-07-20 21:05:17.987757: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
Epochs:   0%|                     | 0/1 [00:00<?, ?it/s]
Epochs: 100%|█████████████████████| 1/1 [00:31<00:00, 31.43s/it, t_loss=11.023, i_loss=1.769, entity_loss=6.038, i_acc=0.761, entity_f1=0.400]
2020-07-20 21:06:08 INFO     rasa.utils.tensorflow.models  - Finished training.
2020-07-20 21:06:09 INFO     rasa.nlu.model  - Finished training component.
2020-07-20 21:06:09 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2020-07-20 21:06:09 INFO     rasa.nlu.model  - Finished training component.
    Traceback (most recent call last):
      File "/home/ubuntu/.local/bin/rasa", line 8, in <module>
        sys.exit(main())
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/__main__.py", line 92, in main
        cmdline_arguments.func(cmdline_arguments)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/cli/train.py", line 140, in train_nlu
        persist_nlu_training_data=args.persist_nlu_data,
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/train.py", line 414, in train_nlu
        persist_nlu_training_data,
      File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/train.py", line 453, in _train_nlu_async
        persist_nlu_training_data=persist_nlu_training_data,
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/train.py", line 482, in _train_nlu_with_validated_data
        persist_nlu_training_data=persist_nlu_training_data,
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/nlu/train.py", line 94, in train
        path, persistor, fixed_model_name, persist_nlu_training_data
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/nlu/model.py", line 239, in persist
        Metadata(metadata, dir_name).persist(dir_name)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/nlu/model.py", line 113, in persist
        write_json_to_file(filename, metadata, indent=4)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/nlu/utils/__init__.py", line 57, in write_json_to_file
        write_to_file(filename, json_to_string(obj, **kwargs))
      File "/home/ubuntu/.local/lib/python3.6/site-packages/rasa/nlu/utils/__init__.py", line 51, in json_to_string
        return json.dumps(obj, indent=indent, ensure_ascii=ensure_ascii, **kwargs)
      File "/usr/lib/python3.6/json/__init__.py", line 238, in dumps
        **kw).encode(obj)
      File "/usr/lib/python3.6/json/encoder.py", line 201, in encode
        chunks = list(chunks)
      File "/usr/lib/python3.6/json/encoder.py", line 430, in _iterencode
        yield from _iterencode_dict(o, _current_indent_level)
      File "/usr/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      File "/usr/lib/python3.6/json/encoder.py", line 325, in _iterencode_list
        yield from chunks
      File "/usr/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      File "/usr/lib/python3.6/json/encoder.py", line 437, in _iterencode
        o = _default(o)
      File "/usr/lib/python3.6/json/encoder.py", line 180, in default
        o.__class__.__name__)
    TypeError: Object of type 'PosixPath' is not JSON serializable

Strange. Thanks for flagging this! I’m the maintainer of that project so I’ll gladly help you figure out what is happening.

Just to be sure: does this error go away when you remove the BytePairFeaturizer? My knowledge of Persian is limited, and I want to make sure that the error is not caused by the tokenizer.

Ah, never mind. I think I've found the error. My unit tests were using rasa test nlu to confirm that training would work, but it now seems that rasa train nlu has extra demands on how we handle settings: it persists the model, and the traceback shows the model metadata failing to serialize to JSON. I'm working on a fix now.
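For anyone who hits the same traceback, the failure can be reproduced outside Rasa. This is a minimal sketch, assuming a component keeps a pathlib path among the settings that end up in the model metadata. The metadata dict and the cache_dir key are hypothetical stand-ins, not the actual rasa-nlu-examples code, and converting paths to strings is only one possible fix, not necessarily the one that was merged.

```python
import json
from pathlib import Path

# Hypothetical stand-in for the metadata written on persist: a dict holding a
# list of component dicts, matching the dict -> list -> dict nesting visible
# in the traceback. The "cache_dir" key is illustrative only.
metadata = {
    "pipeline": [
        {"name": "BytePairFeaturizer", "lang": "fa", "cache_dir": Path("/tmp/bpemb")},
    ]
}


def try_dump(obj):
    """Return the JSON string, or the TypeError message if encoding fails."""
    try:
        return json.dumps(obj, indent=4)
    except TypeError as exc:
        return f"TypeError: {exc}"


def stringify_paths(obj):
    """Recursively replace Path objects with plain strings."""
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, dict):
        return {key: stringify_paths(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [stringify_paths(value) for value in obj]
    return obj


failure = try_dump(metadata)                 # json cannot encode a Path
fixed = try_dump(stringify_paths(metadata))  # plain strings encode fine

print(failure)
print(fixed)
```

The first print shows the same kind of TypeError as the log above; the second shows the metadata serialized with the path stored as a string.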

Yep. This was totally a bug on my side. Once again, thanks for reporting it!

I've added the training of the model as a unit test and I just merged this PR, which means that the repo should be fixed on this issue. You will need to uninstall and reinstall the package to pick up the change, but this update should fix your issue.
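A cheaper check that could sit alongside a full training test is to confirm that every setting a component exposes can be encoded as JSON. The sketch below is hypothetical: the helper name and the example settings are made up for illustration and are not taken from the rasa-nlu-examples test suite.

```python
import json
from pathlib import Path


def non_serializable_keys(settings):
    """Return the keys whose values json.dumps cannot encode."""
    bad = []
    for key, value in settings.items():
        try:
            json.dumps(value)
        except TypeError:
            bad.append(key)
    return bad


# Hypothetical component settings; "cache_dir" is an illustrative key.
good_settings = {"lang": "fa", "vs": 200000, "dim": 300}
bad_settings = {**good_settings, "cache_dir": Path("/tmp/bpemb")}

print(non_serializable_keys(good_settings))
print(non_serializable_keys(bad_settings))
```

A test can then assert that the returned list is empty for each component's default settings.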

PS. A small favor: we're interested in understanding whether these embeddings improve the accuracy of your system. Whether they turn out to be helpful or not, could you let us know? Sending me a message here should be fine.

By the way, we are always looking for datasets in new languages to benchmark new versions of Rasa. Would you be willing to (privately) share your dataset with us so we can make sure our models work well for Persian?

Thank you very much. The error is gone. I've used Rasa NLU only for a geo-parsing task: tagging address entities in an address text corpus that was generated by Python code. If you are interested, I will share it. Thanks for the amazing community!

I will report my experience with the new pipeline.
