Custom component for spell checking not working in Rasa 2.0

Hi,

I am trying to add a custom spell-checking component to my pipeline.

This is my rasa version:

Rasa Version      :         2.8.3
Minimum Compatible Version: 2.8.0
Rasa SDK Version  :         2.8.1
Rasa X Version    :         None
Python Version    :         3.8.0

Here is the code for my custom component:

from rasa.nlu.components import Component
from rasa.nlu import utils
from rasa.nlu.model import Metadata

from spellchecker import SpellChecker
spell = SpellChecker()

class CorrectSpelling(Component):

    name = "Spell_checker"
    provides = ["message"]
    requires = ["message"]
    language_list = ["en"]

    def __init__(self, component_config=None):
        super(CorrectSpelling, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        """Not needed, because the model is pretrained"""
        pass

    def process(self, message, **kwargs):
        """Retrieve the text message, do spelling correction word by word,
        then append all the words and form the sentence,
        pass it to next component of pipeline"""

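        # Find the raw user text among the message attributes.
        for k, v in message.data.items():
            if k == 'text':
                textdata = v

        # Correct each whitespace-separated word, then rebuild the sentence.
        textdata = textdata.split()
        new_message = ' '.join(spell.correction(w) for w in textdata)

        message.data['text'] = new_message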

    def persist(self, file_name, model_dir):
        """Pass because a pre-trained model is already persisted"""
        pass

It is the same code from here, but adjusted to Rasa 2.0 as someone explained in the comments of that article.
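
Two things in the process method above may be worth ruling out. They are assumptions, not a confirmed cause of the timeout: textdata is never assigned if the message carries no 'text' key, and some pyspellchecker releases return None from correction() when a word has no candidate, which would make ' '.join fail. Below is a minimal sketch of a more defensive correction step, written as a plain function so it can be tested outside Rasa. The names correct_text and correct_word are hypothetical; in the component you would pass spell.correction as correct_word.

```python
from typing import Callable, Optional


def correct_text(text: Optional[str],
                 correct_word: Callable[[str], Optional[str]]) -> Optional[str]:
    """Correct text word by word; return it unchanged if it is empty or None."""
    if not text:
        # No user text on this message (assumed possible), so nothing to correct.
        return text
    corrected = []
    for word in text.split():
        suggestion = correct_word(word)
        # Fall back to the original word when the corrector has no suggestion.
        corrected.append(suggestion if suggestion else word)
    return " ".join(corrected)


# Usage with a stand-in corrector; in the component this would be spell.correction.
fake_corrections = {"helo": "hello"}
print(correct_text("helo wrld", fake_corrections.get))  # hello wrld
```

Inside process this could be used roughly as message.set("text", correct_text(message.get("text"), spell.correction)), assuming the Message.get and Message.set accessors available in Rasa 2.x.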

Also, here is my config.yml:

   - name: SpacyNLP
     model: en_core_web_sm  
   - name: "spellchecking.CorrectSpelling" 
   - name: SpacyTokenizer
   - name: SpacyFeaturizer
   - name: SpacyEntityExtractor
   - name: RegexFeaturizer
   - name: LexicalSyntacticFeaturizer
   - name: CountVectorsFeaturizer
     analyzer: "char_wb"
     min_ngram: 1
     max_ngram: 4
   - name: DIETClassifier
     epochs: 100
     constrain_similarities: true
   - name: FallbackClassifier
     threshold: 0.7
     ambiguity_threshold: 0.1
   - name: EntitySynonymMapper
   - name: ResponseSelector
     epochs: 100

The chatbot trains and runs fine, but when I input a message, I get a TimeoutError. If someone could point out where the problem is, I would greatly appreciate it!

Could you share the full traceback?

Sure, here it is:

ERROR    asyncio  - Task exception was never retrieved
future: <Task finished name='Task-2' coro=<configure_app.<locals>.run_cmdline_io() done, defined at c:\users\summer camp\anaconda3\envs\rasa_env\lib\site-packages\rasa\core\run.py:131> exception=TimeoutError()>
Traceback (most recent call last):
  File "c:\users\summer camp\anaconda3\envs\rasa_env\lib\site-packages\rasa\core\run.py", line 135, in run_cmdline_io
    await console.record_messages(
  File "c:\users\summer camp\anaconda3\envs\rasa_env\lib\site-packages\rasa\core\channels\console.py", line 182, in record_messages
    async for response in bot_responses:
  File "c:\users\summer camp\anaconda3\envs\rasa_env\lib\site-packages\rasa\core\channels\console.py", line 137, in _send_message_receive_stream
    async for line in resp.content:
  File "c:\users\summer camp\anaconda3\envs\rasa_env\lib\site-packages\aiohttp\streams.py", line 39, in __anext__
    rv = await self.read_func()
  File "c:\users\summer camp\anaconda3\envs\rasa_env\lib\site-packages\aiohttp\streams.py", line 338, in readline
    await self._wait("readline")
  File "c:\users\summer camp\anaconda3\envs\rasa_env\lib\site-packages\aiohttp\streams.py", line 306, in _wait
    await waiter
  File "c:\users\summer camp\anaconda3\envs\rasa_env\lib\site-packages\aiohttp\helpers.py", line 656, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

Could you try adding this to your component:


    # Requires: from typing import Any, Dict, Optional, Text
    @classmethod
    def load(
        cls,
        meta: Dict[Text, Any],
        model_dir: Optional[Text] = None,
        model_metadata: Optional["Metadata"] = None,
        cached_component: Optional["Component"] = None,
        **kwargs: Any,
    ) -> "Component":
        """Load this component from file."""

        if cached_component:
            return cached_component

        return cls(meta)

I tried that, and I also imported the rest of the libraries as stated in the Components docs. I even tried increasing the timeout in the console.py file, but I still get the same error :frowning:
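
Since raising the timeout did not help, it may be worth measuring how long the correction step itself takes, outside Rasa. This is a minimal sketch that assumes nothing about Rasa internals. time_correction is a hypothetical helper, and the lambda is a stand-in for spell.correction.

```python
import time
from typing import Callable, Optional


def time_correction(text: str,
                    correct_word: Callable[[str], Optional[str]]) -> float:
    """Return the seconds spent correcting every word of text once."""
    start = time.perf_counter()
    for word in text.split():
        correct_word(word)
    return time.perf_counter() - start


# Stand-in corrector; replace with spell.correction to time pyspellchecker.
elapsed = time_correction("helo wrld how are yu", lambda w: w)
print(f"correction took {elapsed:.4f} s")
```

If the real corrector finishes in a fraction of a second, the component is probably not what the console is waiting on, and the server-side log (for example from running rasa shell with --debug) would be the next place to look.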

@tkranz Heya! Did you create a virtual environment for Rasa? Just curious, I hope you don’t mind my asking. Thanks

No problem, I did indeed create a virtual environment for Rasa.

@tkranz can you please share your Rasa version (the output of rasa --version)?

@tkranz I hope you have seen the blog post "Building Rasa NLU custom component for spell checking in incoming message" by Nikhil Cheke on Medium, or the forum thread "Custom rasa component returning string to the next component" (reply #2 by Juste). If not, please take a look. Hope this helps.

I wonder if the blog post you used was using an old API. Could you verify whether this component works?

Thanks for the help :slight_smile: . The Rasa version I use is stated in my first post. The first link is also included in my first post; that’s where I picked up my spell-checking custom component code from. I tried the code from the second link, but it seems to be outdated just like the first one: I get the same error.

I tried with the code from the link. First, I got the following error when rasa was training:

NotImplementedError: Cannot convert a symbolic Tensor (strided_slice_4:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

But I think I fixed it by installing a compatible version of numpy (1.19.5).

After that, when I trained again, the following warning popped up:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

But I guess it’s OK to ignore, since the model trains and runs just fine.

I still get the same TimeoutError; however, I do get some kind of output below the message input, I guess from the print_message function.

This is all very strange indeed, but I suppose there’s one other thing to check. There’s a suggestion on another GitHub project that it might be related to the Python version. Could you try running this in Python 3.7?