Hey everyone,
My team and I are from the Software Architecture and Design class at Iowa State University. We’ve been digging into Domain-Driven Design (DDD) principles while exploring the Rasa Open Source project, and we wanted to share some insights and get your thoughts.
Domain Model Overview
Rasa’s domain.yml is a great example of a declarative domain model. It includes:
- Intents: user goals (book_flight, greet)
- Entities: extracted values (destination, departure_date)
- Slots: assistant memory for context
- Responses, Actions, and Forms
These work together in bounded contexts:
- NLU: extracts intents/entities
- Dialogue Management: manages slot-filling, forms, and state
- Actions: perform logic, call APIs, update slots
This modular design reflects solid DDD structure.
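To make the domain model above a bit more concrete, here is a minimal sketch of how the sections of a domain.yml could map onto a plain Python structure. DomainModel and load_domain are hypothetical names we made up for illustration; this is not how Rasa loads its domain, and the input dict just stands in for an already-parsed YAML file.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DomainModel:
    """Hypothetical stand-in for the sections of a domain.yml file."""
    intents: tuple = ()
    entities: tuple = ()
    slots: tuple = ()
    responses: tuple = ()
    actions: tuple = ()
    forms: tuple = ()


def load_domain(raw: dict) -> DomainModel:
    """Build a DomainModel from a dict shaped like a parsed domain.yml."""
    known = {f.name for f in fields(DomainModel)}
    unknown = set(raw) - known
    if unknown:
        # Reject sections the model does not know about.
        raise ValueError(f"unknown domain sections: {sorted(unknown)}")
    return DomainModel(**{key: tuple(value) for key, value in raw.items()})


# Assumed example data, reusing the intents and entities named in this post.
flight_domain = load_domain({
    "intents": ["book_flight", "greet"],
    "entities": ["destination", "departure_date"],
    "slots": ["destination", "departure_date"],
    "responses": ["utter_greet"],
})
```

Freezing the dataclass keeps the declarative model read-only once loaded, which matches the idea that the domain is configuration rather than runtime state.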
Entity & Aggregate Analysis
The Domain class stands out as an aggregate root:
class Domain:
    def __init__(self, intents, entities, slots, responses):
        ...
Why it fits DDD:
- Holds both data and behavior
- Controls how domain elements are changed
- Keeps domain logic consistent
But there’s still leakage—some modules update domain state directly, bypassing aggregate control (e.g., slots being set outside Domain logic).
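As a rough illustration of what "aggregate control" over slots could look like, here is a small sketch. SlotStore is a hypothetical class of ours, not part of Rasa; the assumption is simply that every slot change goes through one method that enforces the invariant "only declared slots can be set".

```python
class SlotStore:
    """Hypothetical aggregate root: the only place slot values may change."""

    def __init__(self, declared_slots):
        self._declared = frozenset(declared_slots)
        self._values = {}

    def set_slot(self, name, value):
        # Invariant: only slots declared in the domain can be set.
        if name not in self._declared:
            raise KeyError(f"slot {name!r} is not declared in the domain")
        self._values[name] = value

    def get_slot(self, name):
        # Unset slots read as None.
        return self._values.get(name)

    def snapshot(self):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._values)


store = SlotStore(["destination", "departure_date"])
store.set_slot("destination", "Paris")
```

Because snapshot() hands out a copy, outside modules can read the state but have to call set_slot() to change it, which is the kind of boundary the leakage above bypasses.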
Domain Services
Components like RegexFeaturizer follow the Domain Service pattern well:
- Stateless
- Contain reusable logic
- Easy to test
However, many files like model_training.py mix domain logic, config parsing, and CLI printing in a single function (train()), which breaks clean boundaries.
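For comparison, this is the shape we have in mind when we say "stateless domain service". It is a simplified sketch, not the RegexFeaturizer implementation: regex_features and the pattern names are our own assumptions, and the function takes everything it needs as arguments and does no I/O or printing.

```python
import re


def regex_features(text, patterns):
    """Return a 0/1 vector with one entry per named pattern, in dict order."""
    return [1 if re.search(pattern, text) else 0 for pattern in patterns.values()]


# Assumed example patterns, keyed by a descriptive name.
patterns = {
    "zipcode": r"\b\d{5}\b",
    "greeting": r"(?i)\bhello\b",
}

features = regex_features("Hello from 50011", patterns)
```

Since the result depends only on the inputs, a unit test is a single function call with no setup, which is what makes this style easy to test.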
Domain Events + Integration
Events like UserUttered, SlotSet, and ActionExecuted are at the heart of Rasa’s event-driven architecture. These:
- Store chronological convo history
- Power dialogue prediction
- Support debugging + replay
Custom actions and external APIs also rely on event flow (e.g., SlotSet("restaurant_available", True)) to keep integrations loosely coupled. Very clean design.
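Here is a small sketch of the replay idea: state is never stored directly, it is rebuilt by walking the event history in order. The three event classes below are simplified stand-ins we wrote for this post (they borrow the names but are not Rasa's classes), and replay_slots is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class UserUttered:
    text: str


@dataclass(frozen=True)
class SlotSet:
    key: str
    value: Any


@dataclass(frozen=True)
class ActionExecuted:
    action_name: str


def replay_slots(events):
    """Rebuild slot state by replaying events in chronological order."""
    slots = {}
    for event in events:
        if isinstance(event, SlotSet):
            # Later events overwrite earlier values for the same slot.
            slots[event.key] = event.value
    return slots


history = [
    UserUttered("Is the restaurant open?"),
    ActionExecuted("action_check_availability"),
    SlotSet("restaurant_available", False),
    SlotSet("restaurant_available", True),
]

current_slots = replay_slots(history)
```

Replaying a prefix of the history gives the state at that earlier point in the conversation, which is what makes debugging and replay cheap in an event-driven design.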
Code Smell in model_training.py
The train() function is a DDD anti-pattern: it mixes NLU, Core, config, telemetry, and training logic, which violates bounded contexts and makes the system harder to scale.
Refactor Idea: Introduce a TrainingCoordinator class to handle orchestration, while delegating to NLUTrainer, CoreTrainer, and a DataImporter. This would:
- Respect boundaries
- Improve testability
- Make the training pipeline modular
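A minimal sketch of the refactor idea, under the assumption that each collaborator can sit behind a tiny interface. Everything here is hypothetical: the class names come from our proposal above, the method names (load, train, run) and the fake implementations are invented for illustration, and none of it exists in Rasa.

```python
from dataclasses import dataclass
from typing import Protocol


class DataImporter(Protocol):
    def load(self) -> dict: ...


class Trainer(Protocol):
    def train(self, data: list) -> str: ...


@dataclass
class TrainingCoordinator:
    """Hypothetical orchestrator: sequences the steps, owns no training logic."""
    importer: DataImporter
    nlu_trainer: Trainer
    core_trainer: Trainer

    def run(self) -> dict:
        data = self.importer.load()
        # Each trainer only sees the data for its own bounded context.
        return {
            "nlu": self.nlu_trainer.train(data["nlu"]),
            "core": self.core_trainer.train(data["core"]),
        }


class InMemoryImporter:
    """Fake importer used only for this example."""

    def load(self):
        return {"nlu": ["greet"], "core": ["story_1"]}


class FakeTrainer:
    """Fake trainer that returns a label instead of training anything."""

    def __init__(self, label):
        self.label = label

    def train(self, data):
        return f"{self.label}-model({len(data)} examples)"


coordinator = TrainingCoordinator(
    InMemoryImporter(), FakeTrainer("nlu"), FakeTrainer("core")
)
result = coordinator.run()
```

The point of the fakes is the testability claim: the coordinator can be exercised without touching real training data, config files, or the CLI.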
Open Question
Has anyone tried breaking up or refactoring the training pipeline like this? Or faced similar issues with context leakage in Rasa’s architecture?
Would love to hear how you’ve handled these kinds of boundaries—or thoughts on how Rasa might better align with DDD in future releases.
Thanks for reading!
Team 404 - Iowa State University