As you said, you could use sleep() inside custom actions. To make it more human-like, you can sleep() for the amount of time it would take a human to type the text.
Let’s say the average typing speed is 200 CPM (characters per minute). Typing a text of `length` characters would then take length/speed minutes, or 60*length/speed seconds.
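As a quick worked example of that formula (the 100-character reply is just an illustrative length), with $L$ the number of characters:

```latex
t = \frac{60 \cdot L}{\text{cpm}}\ \text{s}, \qquad
L = 100,\ \text{cpm} = 200 \;\Rightarrow\; t = \frac{60 \cdot 100}{200} = 30\ \text{s}
```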
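```python
from time import sleep

cpm = 200  # assumed average typing speed, in characters per minute
text = '...'
# wait as long as a human would need to type the text: 60 * characters / cpm seconds
sleep(60 * len(text) / cpm)
dispatcher.utter_message(text=text)  # `dispatcher` is the one passed to your action's run()
```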
You may want to experiment with the value of cpm to find the one that feels most natural.
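Since long replies can produce long waits at 200 CPM (30 seconds for 100 characters), you might wrap the calculation in a small helper with an upper bound. This is only a sketch: the name `typing_delay` and the `max_seconds` cap are hypothetical additions, not part of Rasa SDK, and the 5-second default is an arbitrary assumption to tune alongside cpm.

```python
# Hypothetical helper: `typing_delay` and `max_seconds` are illustrative
# assumptions, not part of Rasa SDK.
def typing_delay(text: str, cpm: float = 200, max_seconds: float = 5.0) -> float:
    """Seconds a human typing at `cpm` characters per minute would need
    to type `text`, capped at `max_seconds`."""
    if cpm <= 0:
        raise ValueError("cpm must be positive")
    seconds = 60 * len(text) / cpm
    return min(seconds, max_seconds)


if __name__ == "__main__":
    print(typing_delay("Hello there!"))  # 12 characters -> 3.6
    print(typing_delay("x" * 100))       # 30 s uncapped -> 5.0
```

Inside the action you would then call `sleep(typing_delay(text))` right before `dispatcher.utter_message(text=text)`.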