Entity extraction training time exploded when using roles for entities

Hi everyone! I’m using CRFEntityExtractor to extract entities, and recently started using one of Rasa’s new features, which enables the extraction of entities according to their role. I just added 2 new entities with 2 roles (the roles are the same for both entities). However, since I started using roles, training the CRFEntityExtractor takes far longer than before (almost 2 hrs, while it used to be 5 mins, both with GPU). I tried a lot of things (adding more data with roles, having a balanced number of entities in each group …) but nothing seems to change the training time. I also recently tried using DIET instead of CRF; it takes less time, but it’s still quite long. I haven’t seen anyone complaining about this problem when using roles, so I’m really annoyed. Does anyone know what could cause this increase in training time? Thanks a lot!

Hi @E_Ben, cool to see you using the new feature! The CRFEntityExtractor implementation doesn’t support GPU, so that won’t speed anything up (the DIET implementation, on the other hand, does). I’m curious though why your training time increased. Are you sure it’s only due to adding the entity roles, and that nothing else changed? How many training examples do you have?

Hi @amn41! Thanks a lot for your reply! OK, thanks for the info, I didn’t know that! Yes, I really didn’t change anything apart from adding these entity roles to my data. I even tried training it twice:

  1. With only one new entity with 2 roles (I added around 200 examples), training still took 1h45.
  2. I kept this new entity and the 200 examples but deleted the role annotations from my data (so there weren’t any roles in my data), and the training time went back to 4 min.

In total I have around 250 examples for each new entity created (the average number of training examples for each custom entity is around 300, and I have 16 entities). Overall I have around 15k training sentences (I didn’t know if you were asking about the total number of training examples or the number that I added with roles).
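For anyone who wants to reproduce this kind of with/without comparison, here is a minimal sketch of how the two runs could be timed. It assumes the `rasa` CLI is on the PATH and that the same data has been saved twice; the names `data/nlu_with_roles.md`, `data/nlu_without_roles.md` and `config.yml` are hypothetical placeholders, not my actual files.

```python
import subprocess
import time
from typing import Callable, List


def build_train_command(nlu_file: str, config_file: str) -> List[str]:
    """Build the `rasa train nlu` command line for one data file."""
    return ["rasa", "train", "nlu", "--nlu", nlu_file, "--config", config_file]


def time_call(fn: Callable[[], object]) -> float:
    """Run fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def train_nlu(nlu_file: str, config_file: str) -> None:
    """Train an NLU model; raises CalledProcessError if training fails."""
    subprocess.run(build_train_command(nlu_file, config_file), check=True)


def compare(nlu_files: List[str], config_file: str) -> None:
    """Train once per data file and print how long each run took."""
    for nlu_file in nlu_files:
        seconds = time_call(lambda: train_nlu(nlu_file, config_file))
        print(f"{nlu_file}: {seconds / 60:.1f} min")
```

Calling `compare(["data/nlu_with_roles.md", "data/nlu_without_roles.md"], "config.yml")` would run both trainings back to back with the same pipeline, so the only difference between the two timings is the role annotations.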

thanks! this is super helpful. @Tanja who implemented the feature is on vacation this week, but we’ll make sure to give this a go.

I’ve also left a comment referencing this thread on the GitHub issue “Model Regression tests” (Issue #5830, RasaHQ/rasa), so that we make sure to include this in the performance tests.

Mmhh… that is really interesting. Thanks for sharing this! In our experiments we saw a minimal increase in training time, but nothing compared to what you are reporting. Any chance you could share your bot so that I can have a closer look? Otherwise, could you maybe share some examples that include roles, so I can see what your training data looks like? Also, which exact Rasa version and which pipeline are you using? Thanks.
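If sharing the full data is not possible, a summary of how often each entity/role combination occurs would already help. Below is a rough sketch that counts annotations written in the `[text]{"entity": ..., "role": ...}` style. The helper name `count_entity_roles` and the sample sentences are made up for illustration, and the regex is an assumption that only covers this simple annotation form.

```python
import json
import re
from collections import Counter
from typing import Iterable

# Matches annotations such as [Berlin]{"entity": "city", "role": "departure"}.
ANNOTATION = re.compile(r"\[(?P<text>[^\]]+)\]\{(?P<body>[^}]*)\}")


def count_entity_roles(lines: Iterable[str]) -> Counter:
    """Count (entity, role) pairs; role is None when no role is annotated."""
    counts: Counter = Counter()
    for line in lines:
        for match in ANNOTATION.finditer(line):
            try:
                attrs = json.loads("{" + match.group("body") + "}")
            except json.JSONDecodeError:
                continue  # skip anything that is not a valid annotation
            entity = attrs.get("entity")
            if entity is not None:
                counts[(entity, attrs.get("role"))] += 1
    return counts


# Illustrative sentences only, not taken from the bot discussed here.
SAMPLE = [
    '- fly from [Berlin]{"entity": "city", "role": "departure"} '
    'to [Rome]{"entity": "city", "role": "destination"}',
    '- I live in [Paris]{"entity": "city"}',
]

for (entity, role), n in sorted(count_entity_roles(SAMPLE).items(), key=str):
    print(f"{entity} / {role}: {n}")
```

Running the same function over the lines of the real training file would show how many distinct entity/role combinations the extractor has to learn and how the examples are distributed across them.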

I also left a complaint about this on the introductory post about entity roles and groups by @Tanja.

my version was 1.10.0