Lookup Table doesn't work

Hi, I'm trying to use lookup tables but I'm not having success. This is my pipeline and data.

Example of the “lt_filme” lookup file (filme.txt):

```
filme
filmes
um filme
o filme
de filmes
dos filmes
os filmes
```
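
The file is hooked up from the Markdown training data with a `## lookup:` section, roughly like this (the path shown is just illustrative):

```md
## lookup:lt_filme
data/lookup_tables/filme.txt
```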

When I parse q=movie de comédia, I get this response from Rasa NLU:

```json
{
  "intent": {
    "name": "A1M_SearchByGenre - Content&SearchMEO&V9.0",
    "confidence": 0.933992862701416
  },
  "entities": [
    {
      "start": 0,
      "end": 5,
      "value": "movie",
      "entity": "lt_filme",
      "confidence": 0.3769318939068408,
      "extractor": "CRFEntityExtractor"
    },
```

But “movie” is not even in the training data or the lookup files.

Can someone help, or does anyone have the same problem? Thanks.

Hello,

You seem to have very little training data; adding more would help. Also make sure to include a few training examples that use words from the lookup table (see the sketch below).
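
For example, something along these lines in your nlu.md, so the CRF actually sees the lookup words used in context (the phrases and the shortened intent name here are made-up illustrations):

```md
## intent:A1M_SearchByGenre
- quero ver [um filme](lt_filme) de comédia
- [filmes](lt_filme) de drama
- qual é [o filme](lt_filme) mais recente?
```

A lookup table on its own is not an exact matcher; it only becomes an extra feature for the CRF, so annotated examples like these are what actually teach it the entity.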

Also, as you can see, the confidence for lt_filme is quite low (0.37), and since “movie de comédia” looks a lot like “filme de comédia”, the CRF labels it as an lt_filme entity anyway.
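
Whatever your exact pipeline looks like, the lookup table only has an effect if RegexFeaturizer runs before CRFEntityExtractor, roughly like this (a minimal sketch, assuming a Rasa 1.x supervised_embeddings-style config):

```yaml
language: pt
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer        # turns lookup-table entries into regex/pattern features
  - name: CRFEntityExtractor     # must come after RegexFeaturizer to see those features
  - name: CountVectorsFeaturizer
  - name: EmbeddingIntentClassifier
```

Even then, the CRF treats a lookup match as one more feature rather than a hard rule, which is why near-misses such as “movie” can still come out with a low-confidence lt_filme label.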

Hi @srikar_1996, thanks for answering. I already tried with more data but got the same result. I always put some of the lookup entries in the training phrases. So you are saying that it does not extract only when the word matches the words in the lookup file, but also extracts similar words, just with lower confidence?

For example, when I parse q=“filme de dramático” I get this response from Rasa NLU: (screenshot of the parse result)

and when I try q=“filmes de dramático”:

These results do not seem to make sense, because I have these words in the lookup files, yet the extractor sometimes does not extract the word at all, or extracts it with a low confidence score.

Drama lookup file: (screenshot of the file)

Yes, something like that. This happens with my application as well but I do not use a lookup table. I’m not entirely sure if it’s the same case with lookup tables.

You have multiple entities which are very similar; that is probably why the bot extracts them with low confidence. For example, lt_dramat and lt_comedia have a similar structure.

Can you show your complete nlu file?