From a six-month release cadence to continuous deployment

Since it has been a while, almost ten years, since I blogged about what I'm doing at work, I thought a bit of a summary would be appropriate. I don't intend to talk you through every task I've done, since no one would be interested in that. But I thought it might be interesting to some of you to read about me joining the Swedish Pensions Agency (Pensionsmyndigheten, in Swedish) eight years ago, and how we have collectively since then managed to move from an enforced three-month, but in reality six-month, release cadence for all systems to a continuous deployment scheme where individual teams and the part of the business they serve are in charge of how and when new and changed features are brought to production.

Disclaimer: This is my story, my perception of how it was, what happened and what it has become. Many of my current and former colleagues who are or were involved might have a slightly, or totally, different view. I'm writing this in the hope that it will inspire others to discuss and question matters of organisation and software development processes, and how they affect the daily joy and development efficiency of an IT organisation.

How it was 8 years ago

In 2014 I decided I was done with consulting, at least for now, and wanted to try out being an employee of the organisation where I spend my working days. I found a position as systems architect/developer within the development side of the IT department at the Swedish Pensions Agency. The ad spoke about an agile and forward-looking agency basing its development on Java and open-source tools I liked. At that time we were about 80 developers and testers maintaining and developing systems, some of which had already been around a good ten years, integrating with even older systems with their roots in the 1980s or 1990s.

We were all divided into three silos of roughly the same size, without much direct communication, each managing a related set of systems. Communication and coordination were mainly handled through people being included in smaller or larger projects. Releases to production were mandated to happen every third month (synchronized for all systems), with a good deal of manual end-to-end testing leading up to them. Any release between those was to include only really critical bug fixes. And yes, we did a fair amount of those.

In my silo, "case management", we were further divided into people (most of us) working in the ever (it seemed) ongoing "program for case management" with a six-month release cadence of new functionality, and the maintenance team (the rest of us). In the maintenance team we got a load of new functionality (and code) to maintain every six months. Most of what we did was fixing bugs in someone else's code and then spending a disproportionate amount of time merging the fix into the branches representing the release due in a month's time, the one four months away and sometimes also the one due seven months into the future, since development had already begun there.

We used daily meetings (standing!), but no one would call that "agile" today, and I don't think anyone really did back then either. I think the "agile" in the ad was more a description of the state of mind or spirit within the organisation, which came as a very positive surprise to me. There were already a great number of small experiments (or "pilots" as they were called) running in the organisation, and whenever someone presented a new idea the answer was almost always to try it out as a "pilot". Moving to Jira for internal organisation of work was one of those "pilots" that came to stick.

What happened

At the beginning of 2015 we were all pretty fed up with the situation. One of our neighboring silos, "external web", started an official pilot to try out agile delivery with self-organized, cross-functional teams and a six-week planning horizon. At the same time they had started moving towards a brand new architecture with microservices and continuous deployment pipelines. In the "case management" silo we maintained a large monolithic system of about half a million lines of code and had neither the ambition nor the opportunity to change the architecture, but we still wanted to find another way of working.

With the support of first-line management we organized a first workshop aiming to define what "awesome" would look like with respect to development organisation and delivery process within our silo. The result described cross-functional teams, each focused on one or two subdomains, responsible for both developing new features and maintenance tasks, working in a continuous delivery manner where changes were deployed to production weekly, or more frequently when the need arose. Many would say going to production weekly isn't continuous delivery. I think the important thing is to be able to do it more frequently; actually doing it is a business decision, and the business thought, and still thinks, that once a week is great for the internal user base of case management clerks.

In cooperation with first-line management we formed a small team that met once or twice a week to discuss, develop and coordinate our abilities to meet this description of "awesome case management". We defined roles and teams. We engaged with parties outside our silo to find the boundaries for our new processes and to create ways to cooperate. We enhanced our technical abilities.

In order to do deployments to production every week, instead of just four times a year, we had to do those deployments during office hours, which meant they had to be zero downtime from a business perspective. We also had to make the whole process, from build and test to creation of release notes, automated. Luckily the system already had a heavy set of automated unit and system level tests which made us confident that if all those passed we were good to go to production. With a small team of dedicated engineers, helped by the fact that the whole agency at the same time moved sources to Git/Bitbucket, we built Jenkins pipelines, integrated with Jira and Bitbucket, to manage building artifacts, running tests and creation of release tickets, including information on all changes going into each release.
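To make the step of gathering change information for a release ticket a bit more concrete, here is a minimal sketch in Python. It is not our actual pipeline code, and it makes a few assumptions of its own: that commit subjects mention Jira-style issue keys, that a project key such as "CASE" exists (it is made up), and that the function name release_notes and the output layout are purely illustrative. A real pipeline would read the commits from Git and post the result to Jira, which is left out here.

```python
import re

# Jira-style issue key, e.g. "CASE-12". The project key "CASE" used in the
# example below is hypothetical.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def release_notes(version, commit_subjects):
    """Group commit subjects by the issue key they mention.

    Commits without any issue key end up under "(no issue)" so that
    nothing going into the release is silently dropped.
    """
    by_issue = {}
    for subject in commit_subjects:
        keys = ISSUE_KEY.findall(subject) or ["(no issue)"]
        for key in keys:
            by_issue.setdefault(key, []).append(subject)

    lines = [f"Release {version}"]
    for key, subjects in by_issue.items():
        lines.append(f"{key}:")
        lines.extend(f"  - {s}" for s in subjects)
    return "\n".join(lines)


if __name__ == "__main__":
    print(release_notes("2016.11", [
        "CASE-12 Fix rounding in payment calculation",
        "Bump logging library",
        "CASE-12 Add regression test",
        "CASE-40 New letter template",
    ]))
```

The pattern is deliberately simple and will also pick up strings that merely look like issue keys, so a real setup would probably check the keys against the issue tracker.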

After working out the organisational, process-related and technical issues with everyone involved, we turned our silo over to the new team structure and ways of working in March 2016. After a bit over a year of discussions and preparations we stopped moving people to work and started moving work to people (teams). And we didn't look back. Of course it wasn't perfect, but with continuous improvement processes on several levels we were able to improve as needed. The area where we had the most problems was the prioritization and scheduling of larger activities, all coming from differently sized projects still run in parallel across the agency. This was a bit outside our reach, since it crossed several silos. We needed something more.

Luckily, with two silos having turned their organisation and way of working around in similar ways, there was a pretty good case for the agency to continue forward and find new, more agile, ways of working for the whole development organisation. SAFe is a framework for prioritizing and coordinating work across many teams in an agile and lean way, and since our new practices heavily resembled the team and delivery practices in SAFe, it came naturally for the agency to decide to embark on that route. Of course there was a lot of education and discussion to be done, especially with upper management, but since the director general of the agency was a firm proponent, decisions were made in due time and a first SAFe Agile Release Train (ART) was started up as a pilot in late 2019.

What it has become

After about a year of trying out SAFe practices and organisation structures, as well as changing to full-time remote work during the pandemic, the organisation decided we had gathered enough experience to have the whole development organisation go full-on SAFe. About one and a half years later we are running five ARTs and a few supporting groups for tooling and very specialized services. Each ART is organized around a part of the agency's rather broad business domain and employs both business and IT specialists in its teams and management groups. SAFe's lean portfolio management helps us organize and prioritize all the large or cross-ART initiatives, and the time of projects fighting over individuals for full-time or part-time work is over for good.

With the pandemic no longer directing how society works, we have been able to hold both on-site PI planning events and frequent "teamdays" at the office. Our experience over the past two years has shown us that SAFe practices work both in an on-site and in a fully remote setting. Our ARTs and teams now operate with a mix of working remotely and being at the office. This gives everyone a great deal of personal freedom and also lays the foundation for teams being able to function despite being split over offices in different parts of Sweden.

The infrastructure side has also continued to develop. With a modern cloud platform and CI/CD tooling, including a formalized microservice archetype, it's now possible to spin up new services in a few days and have them deployed to production whenever the business sees fit, several times a day if needed.

Conclusion

It has been a really interesting, sometimes frustrating, but overall joyful journey for eight years. We have moved from projects throwing software "over the wall" to teams taking full responsibility for their systems and services all the way to production, from endless discussions over priorities and attempts to maximize utilization to pull-based planning and a focus on throughput, and from excessive merges and an in-reality six-month release cadence to releasing new features as they are ready. And we still want more. Each day we are looking for ways to make our services better and our organisation a bit more efficient and capable. If you stay tuned, chances are you may hear of some of that on this blog.

If you happen to be living in Sweden (and speak Swedish) and fancy coming to work with us, check out our current job openings. Over the years our IT organisation has grown, and it continues to grow. Our society is in the middle of the digital transformation and IT is at the heart of our agency's business. I think we will see many interesting and challenging opportunities going forward.

SE Thinking - Thoughts from a Software Engineer

Thursday, November 17, 2022