Problem with rasa test

Hello!

I suspect there is a bug in `rasa test`, and I need help solving it.

I am running end-to-end (e2e) tests with `rasa test`. In one scenario, Rasa extracts two entities with different names from the same piece of text. This is not a problem when running the chatbot, but it causes an error in the e2e tests: the predicted entities differ from the gold entities, so Rasa reports a mismatch. However, one of the two entities is irrelevant to the tested scenario. So is this a valid mismatch? In my opinion, it is not.
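To make the mismatch concrete, here is a minimal sketch contrasting a strict comparison with one restricted to the entity types a scenario cares about. The helper `entities_match`, its `relevant_types` parameter and the sample entities are hypothetical and not part of Rasa's API; the sketch only assumes that entities are dicts with `entity`, `value`, `start` and `end` keys.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: entities are dicts with "entity", "value", "start" and "end" keys.
Entity = Dict[str, object]


def entity_key(entity: Entity) -> Tuple:
    """Reduce an entity dict to the fields that identify it."""
    return (entity["entity"], entity["value"], entity["start"], entity["end"])


def entities_match(
    gold: List[Entity],
    predicted: List[Entity],
    relevant_types: Optional[Iterable[str]] = None,
) -> bool:
    """Hypothetical helper: compare gold and predicted entities.

    If relevant_types is given, entities of any other type are dropped
    from both lists before comparing.
    """
    if relevant_types is not None:
        keep = set(relevant_types)
        gold = [e for e in gold if e["entity"] in keep]
        predicted = [e for e in predicted if e["entity"] in keep]
    return sorted(map(entity_key, gold)) == sorted(map(entity_key, predicted))


# Hypothetical example: the text "Lisbon" is extracted under two entity names.
gold = [{"entity": "city", "value": "Lisbon", "start": 10, "end": 16}]
predicted = gold + [{"entity": "location", "value": "Lisbon", "start": 10, "end": 16}]

print(entities_match(gold, predicted))  # → False (strict: extra "location" entity)
print(entities_match(gold, predicted, relevant_types={"city"}))  # → True
```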

Moreover, there is code that pads the entity lists when their lengths differ, filling the shorter list with "None" so that both have the same size. However, when the test calls markdown.py, an error occurs: markdown.py reads the `start` key of every entry, and that fails for the "None" padding entries because they are not entity dicts. The details are below:

Code that pads the entity lists when their lengths differ (test.py, line 226):

```python
if entity_gold or predicted_entities:
    if len(entity_gold) > len(predicted_entities):
        predicted_entities = pad_list_to_size(
            predicted_entities, len(entity_gold), "None"
        )
    elif len(predicted_entities) > len(entity_gold):
        entity_gold = pad_list_to_size(entity_gold, len(predicted_entities), "None")
```

Code that sorts the entities by `start` (markdown.py, line 293). The problem is that the padding entries added above are not entity dicts, so they have no `start` key:

```python
entities = sorted(message.get("entities", []), key=lambda k: k["start"])
```
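The failure can be reproduced outside Rasa with a few lines. This is a sketch under assumptions: `pad_list_to_size` below is a stand-in that is assumed to simply append the padding value, `sort_by_start` copies the sort from markdown.py, and the entities are made up. Indexing the string "None" with `"start"` raises a `TypeError`; padding with Python's `None` would fail the same way.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def pad_list_to_size(items: List[Any], size: int, padding_value: Any = None) -> List[Any]:
    """Stand-in for Rasa's helper (assumed behaviour): append padding_value
    until the list has `size` items."""
    return items + [padding_value] * (size - len(items))


def sort_by_start(entities: List[Any]) -> List[Any]:
    """Same sort as markdown.py (line 293); it expects every item to be a dict."""
    return sorted(entities, key=lambda k: k["start"])


gold: List[Entity] = [{"entity": "city", "value": "Lisbon", "start": 10, "end": 16}]
predicted: List[Entity] = gold + [
    {"entity": "location", "value": "Lisbon", "start": 10, "end": 16}
]

# gold is shorter, so it is padded with the string "None", as in test.py.
gold_padded = pad_list_to_size(gold, len(predicted), "None")

try:
    sort_by_start(gold_padded)
except TypeError as err:
    # "None"["start"] indexes a str with a str key, which is a TypeError.
    print("TypeError:", err)
```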

We are using Rasa 1.1.4.

Hi @alessandrarequena, if you think you’ve found a bug, please create a bug report on our GitHub repo: https://github.com/RasaHQ/rasa/issues/new?assignees=&labels=bug&template=bug_report.md&title=

@alessandrarequena I’ve run into a similar issue here: Errors using `rasa test`

I’ve been able to prevent the code from throwing an error by creating a stub dict with the value

`{"start": 0, "end": 0, "entity": "", "value": ""}`

and then using it in place of "None" as the padding value.
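A minimal sketch of that workaround, assuming the padding otherwise works as in the test.py excerpt: `pad_with_stub` and the sample entities are hypothetical, and `STUB_ENTITY` is the stub dict described above. Because every padding entry now has a `start` key, the markdown.py-style sort no longer fails.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]

# The stub dict from the workaround: it carries every key the later code reads.
STUB_ENTITY: Entity = {"start": 0, "end": 0, "entity": "", "value": ""}


def pad_with_stub(entities: List[Entity], size: int) -> List[Entity]:
    """Hypothetical helper: pad with copies of STUB_ENTITY instead of "None".

    Each padding entry is a fresh copy, so mutating one cannot change the others.
    If the list already has `size` items, nothing is added.
    """
    return entities + [dict(STUB_ENTITY) for _ in range(size - len(entities))]


gold = [{"entity": "city", "value": "Lisbon", "start": 10, "end": 16}]
predicted = gold + [{"entity": "location", "value": "Lisbon", "start": 10, "end": 16}]

gold_padded = pad_with_stub(gold, len(predicted))
# The sort from markdown.py now succeeds; the stub (start=0) sorts first.
print(sorted(gold_padded, key=lambda k: k["start"]))
```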

This allows the e2e tests to complete, but I’m not familiar enough with the metrics to know whether this corrupts the results.

Let me know what you think. Thanks.