Yes, we have implemented our own pipeline.
We initially used MeCab for tokenization, but it was not robust against out-of-vocabulary (OOV) words, so we built another tokenizer that works better!
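To illustrate one common way to make a tokenizer OOV-robust (this is a hypothetical sketch, not our actual implementation): greedily match the longest known vocabulary word, and fall back to single characters for spans the vocabulary doesn't cover, so no input ever fails to tokenize.

```python
def tokenize(text, vocab, max_len=8):
    """Greedy longest-match tokenizer with a character-level OOV fallback.

    `vocab` and `max_len` are illustrative parameters, not from our code.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a 1-char slice always matches,
        # which is what makes unknown words fall back to characters.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocab:
                tokens.append(candidate)
                i += length
                break
    return tokens

vocab = {"東京", "タワー", "に", "行く"}
print(tokenize("東京タワーに行く", vocab))  # ['東京', 'タワー', 'に', '行く']
print(tokenize("大阪に行く", vocab))        # ['大', '阪', 'に', '行く'] — OOV word split into chars
```

The trade-off is that unseen words degrade to character tokens rather than crashing or being dropped, which downstream featurizers can still use.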
We have tried various pipelines for NLU.
First, we trained the basic CRF and StarSpace components provided by Rasa on the same dataset (i.e. nlu.md). We then found that it might be better to use separate datasets for entity extraction (EE) and intent classification (IC); that was our second approach. Finally, we found that training EE and IC jointly improved accuracy further, and that third pipeline gave us the best result of the three.
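For reference, the first and third setups could look roughly like the config sketches below (component names are from Rasa's documentation; our actual configs may differ — in Rasa 1.x the StarSpace-based intent classifier is `EmbeddingIntentClassifier`, and from Rasa 1.8+ the `DIETClassifier` trains intents and entities jointly):

```yaml
# Pipeline 1 (sketch): separate EE and IC components, trained on the same nlu.md
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor          # entity extraction
  - name: CountVectorsFeaturizer
  - name: EmbeddingIntentClassifier   # StarSpace-based intent classification

# Pipeline 3 (sketch): joint training of EE and IC
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier              # one model for both intents and entities
```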