I am trying to create a custom tokenizer class. After running rasa init, I added the following files.
KLPTTokenizer.py (in the same directory as config.yml):
from __future__ import annotations
from typing import Any, Dict, Text, List
from rasa.engine.recipes.default_recipe import DefaultV1Recipe
from rasa.nlu.tokenizers.tokenizer import Tokenizer, Token
from klpt.tokenize import Tokenize
from rasa.shared.nlu.training_data.message import Message


@DefaultV1Recipe.register(
    component_types=[DefaultV1Recipe.ComponentType.MESSAGE_TOKENIZER],
    is_trainable=True
)
class KLPTTokenizer(Tokenizer):
    def __init__(self, config: Dict[Text, Any]) -> None:
        super().__init__(config)
        self._tokenizer = Tokenize("Kurmanji", "Latin")

    def tokenize(self, message: Message, attribute: Text) -> List[Token]:
        return self._tokenizer.word_tokenize(message.get(attribute))

    @staticmethod
    def required_packages() -> List[Text]:
        current = super().required_packages()
        current.append("klpt")
        return current
config.yml:
recipe: default.v1
language: ku
pipeline:
  - name: KLPTTokenizer
    component_type: message_tokenizer
    class_name: KLPTTokenizer
policies:
requirements.txt (all installed):
rasa[full]
klpt
rasa-nlu
The Rasa version:
Rasa Version : 3.4.2
Minimum Compatible Version: 3.0.0
Rasa SDK Version : 3.4.0
Python Version : 3.8.10
Operating System : Linux-5.15.0-58-generic-x86_64-with-glibc2.29
Python Path : /home/cergo/Desktop/Kurdish-Rasa/venv/bin/python
The error I get when I run rasa train:
InvalidConfigException: Can't load class for name 'KLPTTokenizer'. Please make sure to provide a valid name or module path and to register it using the '@DefaultV1Recipe.register' decorator.
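The error message asks for "a valid name or module path". As a rough illustration of what that distinction could mean, here is a minimal standard-library sketch of resolving a dotted `module.ClassName` string to a class. It is not Rasa's actual loader: `load_class`, `demo_tokenizer`, and `DemoTokenizer` are hypothetical names made up for this example, and the assumption is only that a bare class name carries no information about which file to import.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_class(name: str):
    """Resolve 'module.ClassName' to a class object (hypothetical helper)."""
    module_name, _, class_name = name.rpartition(".")
    if not module_name:
        # A bare name such as 'DemoTokenizer' does not say which module to import.
        raise ValueError(f"Can't load class for name '{name}': no module path given")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical stand-in for a custom component file placed next to config.yml.
workdir = Path(tempfile.mkdtemp())
(workdir / "demo_tokenizer.py").write_text("class DemoTokenizer:\n    pass\n")
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

# The dotted form names both the module (file) and the class inside it.
cls = load_class("demo_tokenizer.DemoTokenizer")
print(cls.__name__)

# The bare class name cannot be resolved by this sketch.
try:
    load_class("DemoTokenizer")
except ValueError as err:
    print(err)
```

If Rasa resolves custom component names in a similar way, the pipeline entry might need the module-qualified form, i.e. `name: KLPTTokenizer.KLPTTokenizer` (file name, then class name). That is an assumption to verify against the Rasa documentation, not something confirmed by the setup above.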