Saturday, January 28, 2012

Domain Driven Design and batch processing

In DDD the design of a system is very object centric and therefore focuses on individual objects (or aggregates of objects) that interacts through sending messages (commands, queries and events). This is very unlike traditional batch processing where one or several functions are applied iteratively to a batch of input data. Between the two there is a significant missmatch, but nevertheless, once in a while we need to offer a batch oriented interface to our domain logic or need to call a service that offers a batch interface.

Implementing a batch interface
This is the easiest part. We just need to create a thin script that manages the iteration over the batch, for each entry makes a call to the proper application service (tasked with coordinating calls on the domain model) and collects any response data returned. All business logic needed to perform the batch operation is held inside the domain package, as usual. From a domain model point-of-view, the batch processing script is just another client calling the same application services as any online client would do to perform the same task. It is just that this one is making many requests over a short period of time.

Calling a batch interface
In general, batch processing is just an asynchronous call. Yes, you combine several requests into one, but a call with just one request in the batch would still be a valid one. As long as there is no importance in which requests are made together and no relevance in the responses comming back together or aggregated in some way, that is. But then it isn't truly a batch, then it is just a single request with many input parameters. In the following I will discuss a possible solution for when the service is a true batch, i.e. serving many unrelated requests in one call, asynchronously.
Let's consider a case where we have a type of domain object including a method that is to be implemented with a call to an external service. A service that happens to have a batch processing interface. From a domain perspective the nature of this method is asynchronous. We just don't know how long we will have to wait for the result. But the fact that the call is made in batches, and not one by one is an implementation detail to be handled by the infrastructure layer.
The domain should be fully decoupled from the batch handling, which should be handled by infrastructure code. In the domain we define a service interface that takes the request for one domain object and returns nothing. This service could be called by an application service or by the domain object, or any other object or service in the domain. Then we define an event handler that gets notified when the result of the request arrive. The event handler would be responsible for taking appropriate action on the domain object depending on the result.
In the infrastructure layer we will implement the service with a message queue and on the other side of that queue some code that combines individual requests into suitably sized batch calls. The frequency and size of those batches might be tuned for performance and response times. E.g. one batch for every X number of requests, but at least one batch every Y minutes provided at least one request has been made.
Then we need a batch-driven routine that handles responses, splits them into individual messages and places them on a response queue for the event handler to process in the domain.