Add Ngram for Word level instead char level

narendraprasath · July 23, 2019, 5:30am

Rasa provides an N-gram featurizer component by default, but it works at the character level.

But I am looking for word level N-gram feature extraction.

How can I achieve that? Or will I need to implement my own component for word-level N-gram feature extraction? If so, does the pipeline below look correct?

    - name: "WhitespaceTokenizer"
    - name: "Tri-GramFeature"  ## own component #rasa
    - name: "CRFEntityExtractor"  
    - name: "EntitySynonymMapper"
    - name: "CountVectorsFeaturizer"
    - name: "EmbeddingIntentClassifier"

SamS · July 23, 2019, 8:13am

Hey @narendraprasath, and welcome to the forum!

The CountVectorsFeaturizer supports word n-grams as well. Take a look at all the available options here. In particular, you can use the (default) word analyzer and set your desired n-gram minimum and maximum lengths like this:

    - name: "CountVectorsFeaturizer"
      analyzer: "word"
      min_ngram: 1
      max_ngram: 3
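
To make concrete what word-level n-grams with a minimum length of 1 and a maximum length of 3 look like, here is a minimal plain-Python sketch. It is only an illustration of the idea, not Rasa's implementation: the function name `word_ngrams` and its parameters are hypothetical, and the real featurizer may tokenize and filter tokens differently.

```python
def word_ngrams(text, min_n=1, max_n=3):
    """Return every word n-gram of length min_n..max_n.

    Illustrative helper only (hypothetical name, not part of Rasa).
    Tokens are obtained by lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(word_ngrams("Book a flight"))
# ['book', 'a', 'flight', 'book a', 'a flight', 'book a flight']
```

Each of these n-grams would become one dimension of a bag-of-words count vector, which is the kind of feature the settings above ask for.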

Does this answer your question? Also, the upcoming Rasa summit is a cool opportunity to meet Rasa contributors, creators, and users, and to discuss anything Rasa-related.

narendraprasath · July 23, 2019, 11:47am

Thanks @SamS

The answer sounds really good. It solves my problem.

mmm3bbb · September 19, 2019, 11:06pm

Would this work for pre-trained embeddings, or just for supervised embeddings?
