Thursday, November 17, 2022

From a six-month release cadence to continuous deployment

Since it has been a while, almost ten years, since I blogged about what I'm doing at work, I thought a bit of a summary would be appropriate. I don't intend to talk you through every task I've done, since no one would be interested in that. But I thought it might be interesting to some of you to read about me joining the Swedish Pensions Agency (Pensionsmyndigheten, in Swedish) eight years ago, and how we have collectively since then managed to move from an enforced three-month, but in reality six-month, release cadence for all systems to a continuous deployment scheme where individual teams, and the part of the business they serve, are in charge of how and when new and changed features are brought to production.

Disclaimer: This is my story, my perception of how it was, what happened and what it has become. Many of my current and former colleagues who are or were involved might have a slightly, or totally, different view. I'm writing this in the hope that it will inspire others to discuss and question matters of organisation and software development processes, and how they affect the daily joy and development efficiency of an IT organisation.

How it was 8 years ago

In 2014 I decided I was done with consulting, at least for now, and wanted to try out being an employee of the organisation where I spend my working days. I found a position as systems architect/developer within the development side of the IT department at the Swedish Pensions Agency. The ad spoke about an agile and forward-looking agency basing its development on Java and open-source tools I liked. At that time we were about 80 developers and testers maintaining and developing systems, some of which had already been around a good ten years, integrating with even older systems with their roots in the 1980s or 1990s.

We were all divided into three silos of about the same size, without much direct communication, each managing a related set of systems. Communication and coordination were mainly handled through people being included in smaller or larger projects. Releases to production were mandated to happen every third month (synchronized across all systems), with a good deal of manual end-to-end testing leading up to each one. Any release in between was to include only really critical bug fixes. And yes, we did a fair amount of those.

In my silo, "case management", we were further divided into people (most of us) working in the ever (it seemed) ongoing "program for case management" with a six-month release cadence of new functionality, and the maintenance team (the rest of us). In the maintenance team we got a load of new functionality (and code) to maintain every six months. Most of what we did was fixing bugs in someone else's code and then spending a disproportionate amount of time merging the fix into the branches representing the release due in a month's time, the one four months away and sometimes also the one due seven months into the future, since development had already begun there.

We used daily meetings (standing!), but no one would call that "agile" today, and I don't think anyone really did back then either. I think the "agile" in the ad was a more correct description of the state of mind or spirit within the organisation, which came as a very positive surprise to me. There were already a great number of small experiments (or "pilots" as they were called) running in the organisation, and whenever someone presented a new idea the answer was almost always to try it out as a "pilot". Moving to Jira for internal organisation of work was one of those "pilots" that stuck.

What happened

At the beginning of 2015 we were all pretty fed up with the current situation. One of our neighboring silos, "external web", started up an official pilot to try out agile delivery with self-organized, cross-functional teams and a six-week planning horizon. At the same time they had started moving towards a brand new architecture with microservices and continuous deployment pipelines. Despite maintaining a large monolithic system with about half a million lines of code, and having neither ambitions nor opportunities to change the architecture, we still wanted to find another way of working in the "case management" silo as well.

With the support of first-line management we organized a first workshop aiming to define what "awesome" would look like with respect to development organisation and delivery process within our silo. The result described cross-functional teams, each focused on one or two subdomains, responsible for both developing new features and maintenance tasks, working in a continuous delivery manner where changes were deployed to production weekly, or more frequently when the need arose. Many would say going to production weekly isn't continuous delivery. I think the important thing is to be able to do it more frequently; actually doing it is a business decision, and the business thought, and still thinks, that once a week is great for the internal user base of case management clerks.

In cooperation with first-line management we formed a small team to meet once or twice a week to discuss, develop and coordinate our abilities to meet this description of "awesome case management". We defined roles and teams. We engaged with parties outside our silo to find the boundaries for our new processes and to create ways to cooperate. We enhanced our technical abilities.

In order to do deployments to production every week, instead of just four times a year, we had to do those deployments during office hours, which meant they had to be zero downtime from a business perspective. We also had to make the whole process, from build and test to creation of release notes, automated. Luckily the system already had a heavy set of automated unit and system level tests which made us confident that if all those passed we were good to go to production. With a small team of dedicated engineers, helped by the fact that the whole agency at the same time moved sources to Git/Bitbucket, we built Jenkins pipelines, integrated with Jira and Bitbucket, to manage building artifacts, running tests and creation of release tickets, including information on all changes going into each release.
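
To illustrate the release-notes step, here is a minimal Python sketch. It is not the actual Jenkins pipeline, only an example of the idea of collecting "all changes going into each release". It assumes that commit subjects mention Jira-style issue keys; the project key CASE, the function names and the sample commits are all made up for the example, and no real Jira or Bitbucket API is called.

```python
import re
from collections import OrderedDict

# Assumption: commit subjects mention Jira-style issue keys such as "CASE-123".
# The pattern is deliberately simple and would also match strings like "UTF-8",
# so a real pipeline would validate keys against the issue tracker.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def collect_changes(commit_subjects):
    """Group commit subjects by the issue keys they mention, in first-seen order.

    Returns a tuple (changes, unreferenced) where changes maps an issue key to
    the list of commit subjects mentioning it, and unreferenced lists the
    subjects that mention no issue key at all.
    """
    changes = OrderedDict()
    unreferenced = []
    for subject in commit_subjects:
        keys = ISSUE_KEY.findall(subject)
        if not keys:
            unreferenced.append(subject)
            continue
        for key in keys:
            changes.setdefault(key, []).append(subject)
    return changes, unreferenced


def render_release_notes(version, commit_subjects):
    """Render plain-text release notes for the given (hypothetical) version."""
    changes, unreferenced = collect_changes(commit_subjects)
    lines = [f"Release {version}", ""]
    for key, subjects in changes.items():
        lines.append(f"{key}:")
        lines.extend(f"  - {s}" for s in subjects)
    if unreferenced:
        lines.append("No issue reference:")
        lines.extend(f"  - {s}" for s in unreferenced)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up commit subjects, as they might come from the version control log.
    sample = [
        "CASE-101 Fix rounding in pension calculation",
        "CASE-102 Add export of case history",
        "Bump build plugin version",
        "CASE-101 Add regression test for rounding",
    ]
    print(render_release_notes("2016.12", sample))
```

Listing commits without an issue reference separately, instead of dropping them, keeps the notes honest about everything that goes into a release, which matters when passing tests are the only gate before production.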

After working out the organisational, process-related and technical issues with everyone involved, we turned our silo over to the new team structure and ways of working in March 2016. After a bit over a year of discussions and preparations we stopped moving people to work and started moving work to people (teams). And we didn't look back. Of course it wasn't perfect, but with continuous improvement processes on several levels we were able to improve as needed. The area where we had the most problems was the prioritization and scheduling of larger activities, all coming from differently sized projects still run in parallel across the agency. This was a bit outside our reach, since it crossed several silos. We needed something more.

Luckily, with two silos having turned their organisation and way of working around in similar ways, there was a pretty good case for the agency to continue forward and find new, more agile, ways for the whole development organisation. SAFe is a framework for prioritizing and coordinating work across many teams in an agile and lean way, and since our new practices heavily resembled the team and delivery practices in SAFe, it came naturally for the agency to decide to embark on that route. Of course there was a lot of education and discussion to be done, especially with upper management, but since the director general of the agency was a firm proponent, decisions were made in due time and a first SAFe Agile Release Train (ART) was started up as a pilot in late 2019.

What it has become

After about a year of trying out SAFe practices and organisation structures, as well as changing to full-time remote work during the pandemic, the organisation decided we had gathered enough experience to have the whole development organisation go full-on SAFe. About one and a half years later we are running five ARTs and a few supporting groups for tooling and very specialized services. Each ART is organized around a part of the agency's rather broad business domain and employs both business and IT specialists in its teams and management groups. SAFe's lean portfolio management helps us organize and prioritize all the large or cross-ART initiatives, and the time of projects fighting over individuals for full- or part-time work is over for good.

With the pandemic no longer directing how society works, we have been able to hold both on-site PI planning events and frequent "teamdays" at the office. Our experiences over the past two years have proven to us that SAFe practices work both in an on-site and in a fully remote setting. Our ARTs and teams now operate with a mix of working remotely and being at the office. This gives everyone a great deal of personal freedom and also lays the foundation for teams being able to function despite being split over offices in different parts of Sweden.

Also the infrastructure side has continued to develop. With a modern cloud platform and CI/CD-tooling, including a formalized microservice archetype, it's now possible to spin up new services in a few days and have them deployed to production whenever the business sees fit, several times a day if needed.

Conclusion

It has been a really interesting, sometimes frustrating, but overall joyful journey for eight years. We have moved from projects throwing software "over the wall" to teams taking full responsibility for their systems and services all the way to production, from endless discussions over priorities and attempts to maximize utilization to pull-based planning and a focus on throughput, and from excessive merges and an in-reality six-month release cadence to releasing new features as they are ready. And we still want more. Each day we are looking for ways to make our services better and our organisation a bit more efficient and capable. If you stay tuned, chances are you may hear of some of that on this blog.

If you happen to be living in Sweden (and speak Swedish) and fancy coming to work with us, check out our current job openings. Over the years our IT organisation has grown, and it continues to grow. Our society is in the middle of the digital transformation and IT is at the heart of our agency's business. I think we will see many interesting and challenging opportunities going forward.

Saturday, November 5, 2022

Discovering the fediverse

Over the past week, while on autumn leave, the recent events over at Twitter, and the reaction to them by people I follow in the Java/Kotlin/JVM space, have sparked a new interest in social media for me. I've been on Twitter since late 2010 but never tweeted much. Most of the time I've been there to keep up to date with new versions of software I'm using at work or at home and to find links to published conference talks recommended by people I follow. For long periods I haven't been logged in at all, just because it hasn't been interesting enough, I suppose. I also suppose it is partly on me since I haven't engaged enough, but I also think the advertisements and the algorithm deciding what tweets to push have made me feel not really at home. In addition, whenever I go even slightly outside the Java/Kotlin/JVM bubble into tweets on the climate crisis or local politics, I immediately end up in the middle of the dumbed-down, angry and hate-filled threads that social media in general seems to be full of today.

Last week I started to see tweets about people setting up accounts on something called "Mastodon". "Just in case", they said, with no further explanation. However, the starting point for me this week was Martin Fowler's writing on his own explorations into "Mastodon" and the "Fediverse". I'm not going to reiterate his writing here; instead I urge you to go and read it yourself. It sparked my interest and gave me some useful information for my own discovery.

Another good source of information I found is Per Axbom's "A Brief Mastodon Guide for Social Media Worriers" and other posts linked from it. It really opened up the previously hidden "Fediverse" to me, hidden not because it is in any way secret, but because I didn't have a clue about its existence. It turns out that a network of community-driven, connected (federated) servers running open-source software and offering decentralized social media, including micro-blogging, live streaming, and video and photo sharing, has been growing since about 2016 (as far as I understand). They all use the ActivityPub protocol to exchange data as per their users' needs, much like e-mail servers have been doing since the Internet was born.

Even though the fediverse is large and diverse, with many different types of social media as (surely not exhaustively) mentioned above, I have myself so far only explored Mastodon for micro-blogging. At the beginning of the week it was a network of about 3,100 community-run servers hosting about 500,000 users. Less than a week later, as I write this, it has grown by another 200 servers and 180,000 users. It seems there was an upsurge in usage in April as well, when Elon Musk's purchase of Twitter first became official, and now once again when his takeover is a fact. All this puts great pressure on both servers and administrators, who have to cope with more traffic and more moderating duties. Remember that most of them are just volunteers hosting and moderating in their spare time. Some of the servers are really big and seem to attract most of the new users. However, since they are all federated together, you can follow anyone in the whole fediverse regardless of which server you are on. (That is not entirely true, since server admins tend to exclude servers known for spamming or other unethical content from federation. However, I consider that a good thing!) We should hope for more servers being started by communities, organisations and corporations wanting to be part of the larger community. It seems it would be better for everyone if the load were spread horizontally.

For us as new users it boils down to finding a server matching our interests and values and then setting up an account or asking for an invite. I used the official moderated server list at joinmastodon.org/servers to find a suitable one, but I'm sure there are other ways to do it. My interest in programming and open source led me to Fosstodon.org, which seems to have a really great set of server rules. When I had my first look at the server it was open to registration of new users, but about a day later, when I felt ready to dip my toe in and join, it had changed to "Request an invite". A bit disappointed but nevertheless determined to give it a try, I filled in the form, including answering the question on why I wanted to join the server. To my surprise and happiness it was only about 30 minutes before I got my invite and could join the server for real. It turns out switching to "invite mode" is a way for server admins to keep spam accounts created by bots from sneaking in with the stream of new people currently joining Mastodon.

Now I have filled in my profile and thereby set up my new digital home at fosstodon.org/@se_thinking, and started to find people I want to follow, both on Fosstodon itself and on other servers. One really nice thing about Mastodon is that, in addition to curating your own home feed with people you follow and filters for things you don't want to see, you can also follow and interact with the local stream of all "toots" (messages) from all users on your local server. With a themed server like Fosstodon, where most users share a common interest, this seems to give you two interesting streams. You can also follow the federated stream of toots from all servers followed by any user on your server. On Fosstodon, which has almost 30,000 users, there is a good chance we collectively are following at least one user on a great number of other servers, which makes the federated stream massive. I haven't quite figured out how to use it, or whether I like it, yet.

So, am I to quit Twitter now? For the moment I haven't cut anything; I've just added Mastodon to my collection of social media. But I'm pretty sure I'll spend more time on Mastodon and less on any of the others. Perhaps I will close down my Twitter account in the future if I see it not adding any value.

Another interesting side effect of my activities this week is that I've rediscovered my old blog. And here I am, writing a new blog post for the first time in almost ten years. Is this a one-time wonder? I don't know, but my newly found ambition is to take up blogging again. I think it would be fun to blog about what I'm currently doing at work, such as Software Architecture Visualizations using Simon Brown's C4 model and Structurizr, Consumer Contract Testing with Pact and growing Development Efficiency within our IT organisation, among other things. Only time will tell if I succeed, but if you are interested in finding out, follow me on Mastodon (or on Twitter).