We are using the NLU feature of Rasa to train intents on data from our messaging system, and I would like to know the best practices for what to store as valid training data. We are using the supervised_embeddings pipeline.
Specifically:
- What is the recommended maximum length, in characters, for a single training example? Are long training examples discouraged?
- Does the Python API allow parsing strings of any length (e.g. interpreter.parse(long_text))?
- Is it recommended to filter out certain characters or text, such as URLs, hashes, etc.? We are training on raw message data from our system, which generally contains HTML and URLs in the message body.
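A minimal sketch of the kind of cleaning the last question refers to, using only the Python standard library (this is not part of Rasa). The regexes, the placeholder tokens URL and HASH, the assumption that "hashes" means long hex strings, and the 500-character cap are all illustrative assumptions, not documented Rasa recommendations.

```python
import html
import re

# Hypothetical placeholder tokens; these names are illustrative, not Rasa conventions.
URL_TOKEN = "URL"
HASH_TOKEN = "HASH"

TAG_RE = re.compile(r"<[^>]+>")                 # HTML tags such as <p> or <a href="...">
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # rough URL matcher, not RFC-complete
# Assumption: "hashes" means long hex strings (MD5 is 32 chars, SHA-1 is 40).
HEX_HASH_RE = re.compile(r"\b[0-9a-fA-F]{32,}\b")
WHITESPACE_RE = re.compile(r"\s+")


def clean_message(raw: str, max_chars: int = 500) -> str:
    """Turn a raw message body into a plain-text training example.

    max_chars is an arbitrary cap chosen for illustration,
    not a documented Rasa limit.
    """
    text = TAG_RE.sub(" ", raw)               # drop HTML tags, keep the inner text
    text = html.unescape(text)                # decode entities such as &amp;
    text = URL_RE.sub(URL_TOKEN, text)        # replace URLs with a placeholder token
    text = HEX_HASH_RE.sub(HASH_TOKEN, text)  # replace long hex hashes
    text = WHITESPACE_RE.sub(" ", text).strip()  # collapse runs of whitespace
    return text[:max_chars]                   # truncate overly long examples


if __name__ == "__main__":
    print(clean_message('<p>Hi &amp; welcome, see <a href="https://x.io">https://example.com/docs</a></p>'))
```

Replacing URLs and hashes with a fixed token, rather than deleting them, keeps the fact that a link or hash was present, which may or may not be a useful signal for a given intent. Truncating is also only one option; dropping over-long examples entirely is an equally plausible choice.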
Any documentation on these limits/recommendations would really help.