SE Thinking - Thoughts from a Software Engineer: September 2012

This post is about Domain Driven Design (DDD) and Service Oriented Architecture (SOA), and how the former can be used to build services for the latter. But it starts off with a few observations about the current state of affairs.

The "DDD needs a database" assumption

I've found it to be a pretty common understanding that software systems using DDD building blocks have to be backed by a database and, even more specifically, have repository implementations that use an ORM framework. This leads to the assumption that domain entities, once added to the repository, are attached to an EntityManager that is responsible for "auto-magically" saving changed data as transactions are completed. I find that these assumptions place too many constraints on a domain driven system architecture without any need. In fact, I can't find anything in the DDD approach, as put forward by Eric Evans, that supports this understanding. Instead I see the domain model as a piece of code completely free of any explicit or implicit dependencies on its surroundings. I've been working this way for a couple of years, and in this post I will describe the architecture of such a system, where repository implementations rely on services published by other systems and also store part of the data in a dedicated database.

SOA as spaghetti on top of CRUD

I've also found Service Oriented Architectures (SOA) to be implemented mostly as CRUD operations for fine-grained data objects, or as services with business logic, sometimes really complex, firmly placed in Transaction Scripts (TS). TS might be good enough for simple things, but the code quickly turns into spaghetti as complexity rises. Complex business logic is where DDD really shines, but given the assumptions discussed above, many people don't see it as an option when there is no underlying database, only a set of CRUD-like services.

And there we have another problem with many service implementations. When we just use simple DTOs as parameters and/or return values and present the client with a CRUD-like service API, what support are we then giving the client-side developer? A bunch of getters and setters that can be called in any combination! How is the developer to know what (from a business point of view) makes up a valid object? And which modifications are valid? Such APIs force the client-side developer to have intimate knowledge of the server-side implementation, and changes to the API won't show up as compiler errors. When APIs that I depend on change, I like the compiler to be able to notify me.

I think we can do much better and offer the client-side developer much better support by extending the service API with some DDD building blocks like Entities and Value Objects. In this post I will explain how we did that, in effect implementing a published domain.

Separation of concerns - The independent domain model

One of the principles of DDD that I think is most important to apply is "Separation of concerns". It is closely related to the "Single responsibility principle" in Robert C Martin's SOLID principles. In short, every piece of code should deal with one problem only and have only one reason to change. It is also an important part of writing clean code, since a piece of code that does only one thing is easier to understand. From the DDD perspective I apply this principle by separating code that models the business domain, in order to solve business problems, from technical code that glues the application together or handles communication with the outside world, such as databases or services exposed by other systems. This leads to a system with a traditional layered architecture with a slight twist: the domain model is kept in the center with no outgoing dependencies whatsoever. The domain model is concerned only with solving business problems, while the surrounding integration layer (or "anti-corruption layer" in Evans' terms) is concerned only with technical issues. The domain model is built using the DDD building blocks (entities, value objects, repositories, services and factories), with repositories (and sometimes also services and factories) represented as interfaces that are used by the domain but implemented in the integration layer. This makes the dependency point from the integration layer to the domain, and that's the twist! This is perhaps an unusual architectural style, but it is not new. Similar models have previously been described as “Hexagonal Architecture” and “Onion Architecture”, among others. Robert C Martin has a nice summary of the history of this architectural style on his blog.

This architectural style brings the following benefits:

- Easy to read business logic, since it is not mixed with code for interaction with databases, messaging services or other technical concerns. This leads to fewer mistakes as the software evolves.

- Easy to deploy. The domain code can be deployed in different runtime environments without changes, since no dependencies exist. At one point we made great use of this as we were moving to a new deployment platform while at the same time developing new functionality for the business.

- Easy to test business logic. All business logic can be covered by automated tests without any need for a runtime container. I don't really have to argue why that is good...
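The layering described above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the names `Timesheet`, `TimesheetRepository` and `InMemoryTimesheetRepository` are hypothetical, and the in-memory map stands in for whatever the integration layer would really talk to. The point is that the domain owns the repository interface and that the business rule can be exercised without any container.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// --- Domain layer: no dependencies on databases, containers or frameworks ---

/** Hypothetical entity; the business rule lives in the entity itself. */
class Timesheet {
    private final String id;
    private int hours;
    private boolean submitted;

    Timesheet(String id) {
        this.id = id;
    }

    String id() { return id; }
    int hours() { return hours; }
    boolean isSubmitted() { return submitted; }

    void register(int additionalHours) {
        if (submitted) {
            throw new IllegalStateException("A submitted timesheet cannot be changed");
        }
        if (additionalHours <= 0) {
            throw new IllegalArgumentException("Hours must be positive");
        }
        hours += additionalHours;
    }

    void submit() { submitted = true; }
}

/** Repository as the domain sees it: an interface owned by the domain. */
interface TimesheetRepository {
    void store(Timesheet timesheet);
    Optional<Timesheet> find(String id);
}

// --- Integration layer: depends on the domain, never the other way around ---

/** Stand-in implementation; a real one could use a database or a remote service. */
class InMemoryTimesheetRepository implements TimesheetRepository {
    private final Map<String, Timesheet> stored = new HashMap<>();

    @Override
    public void store(Timesheet timesheet) { stored.put(timesheet.id(), timesheet); }

    @Override
    public Optional<Timesheet> find(String id) { return Optional.ofNullable(stored.get(id)); }
}

class LayeringDemo {
    public static void main(String[] args) {
        TimesheetRepository repository = new InMemoryTimesheetRepository();
        Timesheet timesheet = new Timesheet("2012-w39");
        timesheet.register(8);
        repository.store(timesheet);
        System.out.println(repository.find("2012-w39").get().hours());
    }
}
```

Because the domain code only sees `TimesheetRepository`, swapping the in-memory stand-in for another implementation requires no change to `Timesheet`.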

Repository on top of services

Since the domain is only concerned with business logic and the business doesn't care how entities are stored, only the fact that they can be stored and at what step in the business process that happens, all repositories in the domain are represented as interfaces. It is now up to the integration layer to provide implementations for those repositories.

Regardless of whether entities are to be stored in a database, in files, by calling a remote service, or by a combination of these, a repository needs access to the internal state of the entity in order to get (for storing) and set (for re-constructing) values. (In my case it was services exposed via a customer specific API on top of stateless EJB, i.e. basically RMI, but it could be web services or any other protocol as well.) Typically most of that state isn't public in the domain model, so the repository implementation needs to gain access. Depending on security settings, reflection might be an alternative. In most cases it is not. Instead we made the members of the entity protected and created a specific sub-class of the entity in the integration layer, which we used both for reading state that was not public and for re-constructing entities on read requests.
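A minimal sketch of the sub-class technique, with hypothetical names (`Timesheet`, `StoredTimesheet`) and a made-up semicolon-separated raw format. In a real project the two classes would live in different packages. One caveat that is specific to Java: across packages a sub-class can only reach protected members through references of its own type, so this sketch reads state from entities that the integration layer itself created or re-constructed.

```java
/** Domain entity: state is protected, so clients have to go through business methods. */
class Timesheet {
    protected String id;
    protected int hours;
    protected boolean submitted;

    public Timesheet(String id) {
        this.id = id;
    }

    public void register(int additionalHours) {
        if (submitted) {
            throw new IllegalStateException("A submitted timesheet cannot be changed");
        }
        hours += additionalHours;
    }

    public void submit() { submitted = true; }

    public boolean isSubmitted() { return submitted; }
}

/** Integration layer: sub-class that may read and set the protected state. */
class StoredTimesheet extends Timesheet {

    /** Re-constructs an entity from raw values, bypassing the business methods. */
    StoredTimesheet(String id, int hours, boolean submitted) {
        super(id);
        this.hours = hours;
        this.submitted = submitted;
    }

    /** Exposes the internal state in a raw format suitable for storing (format is made up). */
    String toRawFormat() {
        return id + ";" + hours + ";" + submitted;
    }

    /** Parses the made-up raw format back into an entity. */
    static StoredTimesheet fromRawFormat(String raw) {
        String[] parts = raw.split(";");
        return new StoredTimesheet(parts[0], Integer.parseInt(parts[1]), Boolean.parseBoolean(parts[2]));
    }
}

class ProtectedStateDemo {
    public static void main(String[] args) {
        Timesheet restored = StoredTimesheet.fromRawFormat("2012-w39;16;false");
        restored.register(8);   // domain code only ever sees a Timesheet
        System.out.println(((StoredTimesheet) restored).toRawFormat());
    }
}
```

The domain class stays free of persistence code; everything that knows about the raw format sits in the sub-class.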

With the problem of gaining access to internal state solved, it is pretty easy to write a repository implementation that reads and writes most of the data over one or several remote service calls and keeps the rest in a few tables in a dedicated database. This split storage model is often a result of the fact that existing external services might not fully support the needs of the domain model. Another reason might be performance. In some cases we had to cache carefully selected pieces of data in our own database to ensure timely retrieval. The beauty of this approach is that we can do all the tricks needed in the repository implementation without having any part of it leak into the domain model code; the separation of concerns is total.
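A split storage repository might look something like the sketch below. All names are hypothetical: `RemoteEmployeeService` stands for a client of a service published by another system, `LocalEmployeeTable` for access to a table in the dedicated database, and the two `Fake...` classes only exist so the sketch can run without a network or a database.

```java
import java.util.HashMap;
import java.util.Map;

/** Domain entity (simplified); it knows nothing about where its data is kept. */
class Employee {
    private final String id;
    private final String name;
    private final String costCenter;

    Employee(String id, String name, String costCenter) {
        this.id = id;
        this.name = name;
        this.costCenter = costCenter;
    }

    String id() { return id; }
    String name() { return name; }
    String costCenter() { return costCenter; }
}

/** Repository interface owned by the domain. */
interface EmployeeRepository {
    Employee find(String id);
    void store(Employee employee);
}

/** Hypothetical client for a service published by another system. */
interface RemoteEmployeeService {
    String fetchName(String id);
    void updateName(String id, String name);
}

/** Hypothetical access to a table in the dedicated database. */
interface LocalEmployeeTable {
    String readCostCenter(String id);
    void writeCostCenter(String id, String costCenter);
}

/** Integration layer: one repository, two storage mechanisms. */
class SplitStorageEmployeeRepository implements EmployeeRepository {
    private final RemoteEmployeeService remote;
    private final LocalEmployeeTable local;

    SplitStorageEmployeeRepository(RemoteEmployeeService remote, LocalEmployeeTable local) {
        this.remote = remote;
        this.local = local;
    }

    @Override
    public Employee find(String id) {
        // Most data comes from the remote service, the rest from our own table.
        return new Employee(id, remote.fetchName(id), local.readCostCenter(id));
    }

    @Override
    public void store(Employee employee) {
        remote.updateName(employee.id(), employee.name());
        local.writeCostCenter(employee.id(), employee.costCenter());
    }
}

/** In-memory stand-in for the remote service. */
class FakeRemoteEmployeeService implements RemoteEmployeeService {
    final Map<String, String> names = new HashMap<>();
    @Override public String fetchName(String id) { return names.get(id); }
    @Override public void updateName(String id, String name) { names.put(id, name); }
}

/** In-memory stand-in for the database table. */
class FakeLocalEmployeeTable implements LocalEmployeeTable {
    final Map<String, String> costCenters = new HashMap<>();
    @Override public String readCostCenter(String id) { return costCenters.get(id); }
    @Override public void writeCostCenter(String id, String costCenter) { costCenters.put(id, costCenter); }
}

class SplitStorageDemo {
    public static void main(String[] args) {
        EmployeeRepository repository = new SplitStorageEmployeeRepository(
                new FakeRemoteEmployeeService(), new FakeLocalEmployeeTable());
        repository.store(new Employee("e1", "Ada", "R&D"));
        Employee found = repository.find("e1");
        System.out.println(found.name() + " / " + found.costCenter());
    }
}
```

Caching for performance would be added inside `SplitStorageEmployeeRepository` in the same way, still invisible to the domain.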

Published domain and services with business meaning

As mentioned above, CRUD-oriented services that let a client store and retrieve DTOs with getters and setters for every attribute push a great burden onto the developer of the client code. If we use DDD to implement the service internally, we can do better by offering that domain knowledge to the client, packaged as a published domain and services that carry business meaning. A CRUD service will never have a place outside the integration layer of the client, while the signature of a service well crafted in business terms might be a good candidate for a service API used directly in the domain model of the client. It would of course take the shape of an interface, with a small implementation in the integration layer to do the technical lifting of making a remote call. But the point is, the service signature can be used verbatim and the integration layer can be kept thin since the client domain doesn’t have to re-define what the service means in business terms; hard earned domain knowledge is reused.

The same goes for the business objects. If we publish those parts of the domain that can be used outside our service implementation, we let the client-side developer benefit directly from the domain knowledge we have gathered in building our service. But what parts can be published? Well, if you think about it, the part of a good API that is most useful is the limitations it imposes, i.e. the help you get to avoid doing stupid things. By publishing a model of domain objects as plain Java objects, with constructors and accessors ("getters", but in business terms and not necessarily following the Java Beans convention), we help the client-side developer to avoid constructing invalid objects and accidentally changing state that, from a business perspective, should be immutable. With only this much the client-side developer gets much better support than with only the raw data format offered by our stateless services. In addition it might also be appropriate to offer a thin layer on top of the service calls that exposes the services in terms of these domain objects and takes care of transforming them to/from the raw format used to go over the wire.
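As an illustration of what a published domain object and its thin translation layer could look like, here is a small sketch. The `Period` value object, its accessor names and the `start/end` wire format are assumptions made for the example, not the actual published model.

```java
import java.time.LocalDate;
import java.util.Objects;

/** Published value object: immutable, and impossible to construct in an invalid state. */
final class Period {
    private final LocalDate start;
    private final LocalDate end;

    Period(LocalDate start, LocalDate end) {
        if (start == null || end == null) {
            throw new IllegalArgumentException("A period needs both a start and an end");
        }
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("A period cannot end before it starts");
        }
        this.start = start;
        this.end = end;
    }

    // Accessors in business terms rather than Java Beans style; there are no setters.
    LocalDate startsOn() { return start; }
    LocalDate endsOn() { return end; }

    boolean includes(LocalDate day) {
        return !day.isBefore(start) && !day.isAfter(end);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Period)) {
            return false;
        }
        Period that = (Period) other;
        return start.equals(that.start) && end.equals(that.end);
    }

    @Override
    public int hashCode() { return Objects.hash(start, end); }
}

/** Thin layer translating between the published domain and a made-up raw wire format. */
final class PeriodWireFormat {
    private PeriodWireFormat() { }

    static String toWire(Period period) {
        return period.startsOn() + "/" + period.endsOn();
    }

    static Period fromWire(String raw) {
        String[] parts = raw.split("/");
        return new Period(LocalDate.parse(parts[0]), LocalDate.parse(parts[1]));
    }
}

class PublishedDomainDemo {
    public static void main(String[] args) {
        Period september = new Period(LocalDate.of(2012, 9, 1), LocalDate.of(2012, 9, 30));
        System.out.println(PeriodWireFormat.toWire(september));
    }
}
```

A client that tries to build a period ending before it starts gets an exception at construction time, instead of discovering the problem after a round trip to the server.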

Having a published domain gives the client-side developer the choice to either just model transformations of it into a more suitable model for the client context or, if contexts are closely related, decide to take on a conformist approach and extend the domain objects with additional functionality. I've done both in different contexts and it is so much better than having to experiment with DTOs to find out what are valid combinations of attribute values.

Conclusion

DDD is suitable for implementing domains on top of external services. It is also suitable for the implementation of such services, and if we carefully select parts of the domain model to publish we offer great help to the client-side developer.

In a system architecture using Domain Driven Design (DDD) you typically find a few building blocks (stereotypes): Entities, Value Objects, Repositories and Domain Services. The first two are stateful objects and the rest are stateless services, implemented either as infrastructure services outside the domain (Repositories) or inside the domain containing only business logic (Domain Services). A common question is “When is it appropriate to design a service instead of placing the business logic in an entity or value object?”

For developers more used to building procedural designs than object-oriented ones, it seems to be more natural to place logic in stateless services and have them operate on objects that are no more than data containers. This is what is commonly referred to as an “anemic domain model”. It is considered an anti-pattern in the DDD community since it decouples data from behavior and thereby produces a much less expressive and less knowledge-dense domain model.

I'm not a fan of stateless services in the domain model. Instead I try to favor bringing business logic into the Entity or Value Object that holds the information needed; in OO terms it is called the "information expert". In most cases it isn’t that hard, especially if functionality is decomposed into short methods, each allocated to the object representing the concept at the heart of the functionality.
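To make the "information expert" idea concrete, here is a small sketch with a hypothetical `Timesheet` entity and an invented overtime rule. The entity holds the registered hours, so it is also the one that answers questions about them, in short methods, rather than handing its data to a stateless service.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical entity that owns both the data and the logic that needs it. */
class Timesheet {
    private final int contractedHours;
    private final List<Integer> registeredHours = new ArrayList<>();

    Timesheet(int contractedHours) {
        if (contractedHours < 0) {
            throw new IllegalArgumentException("Contracted hours cannot be negative");
        }
        this.contractedHours = contractedHours;
    }

    /** Registers the hours worked on one day. */
    void register(int hours) {
        if (hours <= 0 || hours > 24) {
            throw new IllegalArgumentException("Hours for a day must be between 1 and 24");
        }
        registeredHours.add(hours);
    }

    int totalHours() {
        int total = 0;
        for (int hours : registeredHours) {
            total += hours;
        }
        return total;
    }

    /** Invented rule for the example: everything above the contracted hours is overtime. */
    int overtimeHours() {
        return Math.max(0, totalHours() - contractedHours);
    }
}

class InformationExpertDemo {
    public static void main(String[] args) {
        Timesheet timesheet = new Timesheet(40);
        for (int day = 0; day < 5; day++) {
            timesheet.register(9);
        }
        System.out.println(timesheet.overtimeHours());
    }
}
```

In an anemic design the same rule would sit in something like an overtime service that pulls the hours out through getters, and nothing would stop other code from computing overtime differently.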

A specific type of functionality that might be trickier to handle is entity creation. Who is to be responsible? For sure, it can’t be the entity itself. In general my experience is that there will be some sort of hierarchy between concepts. E.g. a SalaryPeriod might be connected to an existing RegistrationPeriod, which also contains submitted Timesheets. Then it is a good fit to have the RegistrationPeriod handle creation of the SalaryPeriod. So in general it is almost always possible to find a good place for rules regarding creation of an entity in a parent concept. The same goes for functionality that has to operate over several entities of a given type.
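The SalaryPeriod example could be sketched like this. The class names come from the text above, but the fields and the creation rule (all timesheets must be submitted before a salary period can be created) are assumptions made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class Timesheet {
    private final int hours;
    private final boolean submitted;

    Timesheet(int hours, boolean submitted) {
        this.hours = hours;
        this.submitted = submitted;
    }

    int hours() { return hours; }
    boolean isSubmitted() { return submitted; }
}

class SalaryPeriod {
    private final String registrationPeriodName;
    private final int hoursToPay;

    // Package-private: in this sketch only RegistrationPeriod is meant to call it.
    SalaryPeriod(String registrationPeriodName, int hoursToPay) {
        this.registrationPeriodName = registrationPeriodName;
        this.hoursToPay = hoursToPay;
    }

    String registrationPeriodName() { return registrationPeriodName; }
    int hoursToPay() { return hoursToPay; }
}

/** The parent concept owns the rules for creating the child concept. */
class RegistrationPeriod {
    private final String name;
    private final List<Timesheet> timesheets = new ArrayList<>();

    RegistrationPeriod(String name) {
        this.name = name;
    }

    void add(Timesheet timesheet) {
        timesheets.add(timesheet);
    }

    /** Assumed rule: a salary period requires every timesheet to be submitted. */
    SalaryPeriod createSalaryPeriod() {
        int hoursToPay = 0;
        for (Timesheet timesheet : timesheets) {
            if (!timesheet.isSubmitted()) {
                throw new IllegalStateException("All timesheets must be submitted first");
            }
            hoursToPay += timesheet.hours();
        }
        return new SalaryPeriod(name, hoursToPay);
    }
}

class EntityCreationDemo {
    public static void main(String[] args) {
        RegistrationPeriod september = new RegistrationPeriod("2012-09");
        september.add(new Timesheet(40, true));
        september.add(new Timesheet(36, true));
        System.out.println(september.createSalaryPeriod().hoursToPay());
    }
}
```

The same parent object is also a natural home for logic that spans several timesheets, such as the summation done during creation here.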

However, there might be cases where no suitable parent concept exists in the domain. That is one of the cases where I find designing a domain service appropriate. And there are others. Here is a small extract from a previous blog post of mine where I briefly describe another situation where I think domain services are a good choice:

"In general I think you could talk about two types of systems, or parts of systems; those mostly concerned with changes in object state and those mostly concerned with processing data streaming through the system. In the first case entities, aggregates and repositories are a natural fit, in the second I think transaction scripts (in DDD context called domain services, since they do only concern domain logic, no infrastructure code [..]) are a nice fit. When the most important feature is to crunch some data, perhaps modify it and then route it further to some recipient (like another system or some persistent store) I think it is the "processing pipeline", i.e. the stateless service code, which should be emphasized. So in those cases the internals of the data isn't very interesting and might be better left in some simple DTO format."

To make the list of appropriate service designs complete, I’ll end by adding a few lines on external services. These services get injected into the domain. In the domain I would have only an interface describing the service in terms of the domain. This is the way to integrate with surrounding infrastructure or other systems. It is the same pattern as with Repositories; in fact, a Repository is just a specialized service.

Another example of an external service is when some part of the business logic, e.g. a calculation, is broken out into a separate service, implemented using another programming language or paradigm. From a domain point of view it is as if we get that calculation service from another system instead of doing it ourselves. The reason might be performance, or it might be that the logic already exists and we want to continue using it instead of re-implementing it. However, each time this happens the maintenance burden is increased a bit, so I think it should be done with careful consideration.
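A sketch of such an external calculation service, seen from the domain. Everything here is hypothetical: the `TaxCalculation` interface, the flat-rate stand-in and the choice to pass the service as a method parameter (constructor injection would work just as well). A real implementation in the integration layer would delegate to the separate calculation service instead of computing anything itself.

```java
/** Domain-side interface, phrased in business terms (hypothetical name). */
interface TaxCalculation {
    long taxInCentsFor(long grossInCents);
}

/** Domain entity that uses the external service without knowing where it runs. */
class SalaryPeriod {
    private final long grossInCents;

    SalaryPeriod(long grossInCents) {
        if (grossInCents < 0) {
            throw new IllegalArgumentException("Gross salary cannot be negative");
        }
        this.grossInCents = grossInCents;
    }

    long netInCents(TaxCalculation taxCalculation) {
        return grossInCents - taxCalculation.taxInCentsFor(grossInCents);
    }
}

/**
 * Integration-layer stand-in. A real adapter would call the separate
 * calculation service here; the flat rate only keeps the sketch runnable.
 */
class FlatRateTaxCalculation implements TaxCalculation {
    private final int ratePercent;

    FlatRateTaxCalculation(int ratePercent) {
        this.ratePercent = ratePercent;
    }

    @Override
    public long taxInCentsFor(long grossInCents) {
        return grossInCents * ratePercent / 100;
    }
}

class ExternalServiceDemo {
    public static void main(String[] args) {
        SalaryPeriod period = new SalaryPeriod(300_000);
        System.out.println(period.netInCents(new FlatRateTaxCalculation(30)));
    }
}
```

Since the interface has a single method, a test can replace the service with a lambda, which keeps the domain testable even when the real calculation lives elsewhere.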

To sum up, favor business logic implementations in the domain objects, not in services. It leads to a more elaborate and therefore more useful domain model, which is easier to keep consistent than logic spread over disparate services. I’m convinced that in the long run this approach makes maintenance of the system easier.

Wednesday, September 26, 2012

DDD on top of Services, DDD inside Services - A Domain Driven approach to SOA

Saturday, September 1, 2012

When is it appropriate to design a service?