on deployments

You need to deploy software all the time. That's my believe, and here's why. Let's start at the beginning, though.

What is a deployment? For the discussion here it's basically the process of moving code from wherever it is to production. For software that means that hypothetical value turns into real value exactly at that point. It's when the Pizza gets delivered. All the activities before going to production are really pointless unless you actually roll something out.

Of course, production deployments are also a little scary. Deploying broken code might lead to the exact opposite of the intended effect and actually take value away from users, so exercising some caution is probably a good idea.

But a lot of organisations are outright scared when it comes to moving stuff to prod. Not concerned, not cautious - they dread doing it. And consequentially, a few things can happen, and all of them are not great – that's where I'm holding my strong opinion.

The first thing to go when you're scared shitless of deploying something is frequency. On some of the most critical things I've worked on, we deployed almost daily, and no one would be concerned about that. Moving fast allowed us to ship small increments with a boring regularity. We'd sometimes watch a bit more careful if a bigger change went live, but overall, it was just an eventless low-key stream of deployments that made sure we wouldn't sit too long on bug fixes, valuable improvements or anything else that we've been working on. If you stop doing regular deployments, you also stop deploying small things and rather move to deploying bigger change sets at the same time. This in turn also increases the risk of something in that change set breaking, so in an order to avoid stuff going wrong, you actually introduce more risk. So keep deployments very, very frequent. The smaller the units that get deployed, the more regularly code gets pushed out, the better.

The second thing that I've observed happening is the establishment of all sorts of approvals, checks or other red tape that makes sure a deployment has to go through a certain process before it can go live. And that can be as low-key as only a select group of people being able to actually approve a deployment or selected folks having to manually start a job to run the deployment itself. What that does is introduce a barrier to actually moving fast. Now you need to build something and convince the gatekeepers that something is safe. The thing here is - if something is not safe, or not tested, or doesn't adhere to the quality standards of the team, how did it make it way into the codebase in the first place? This is fixing the wrong problem - the right problem here is to make sure that all code that gets merged to mainline is fit to be released to production at any point in time. If you cannot trust that whatever is floating around your mainline can move to production, then that's a great problem to solve. Additional approvals are not.

The third thing is that deployments become a thing in itself. Deployments should ideally be happening regularly, triggered by anyone working on a team to ensure value makes its way to your users as fast as possible. The slower you make that process, the more gates and barriers you introduce, the more the boring deployment will turn into an event itself. When working at Adidas, I was entirely clueless when the last deployment happened, simply because we had them all the time. The antithesis to that is having deployments as a celebration. That comes with a fixed timeline, a bunch of performative tasks that add little to no value and, as a cherry on top, alignment meetings to align on a deployment. All in the name of making sure we're not doing anything reckless, but with the purpose of making it really hard, slow and rare to have things move to production.

What I feel is super important to speak about is the difference between Releasing something, and Deploying something. Deploying, for me, refers to the process of making technical changes available on the production or live environment. It's a purely technical thing, that doesn't necessarily come with any changes for the end users. I'm sure Google releases a new version of their search page all the time, I just never notice – it has little impact for me. Releases, on the other hand, are something that is very user visible - might be a new feature, an iteration on an existing one or any other user visible change. I don't like to mix the two up - while usually deployments contain the necessary changes to release something new, there's tools that allow the controlled rollout of new features independent of the technical deployment - things like feature flags, that allow the selective management of feature availability independent of the code that's deployed.

Coming back to my strong opinion, the goal needs to be that everyone in your team can trigger a deployment pretty much any time they feel a change is completed and merged to the mainline. You should have a way of ensuring, automatically, that new code doesn't mess things up. You should have a way to make sure a new deployment doesn't blow things up - again, automatically. Your team should know how to roll something back if a deployment didn't work (that happens, and shouldn't be a big deal).

There's so much more value in enabling all of the above - and with that a direct line between feature development and value creation for your users than making it super hard, awkward and expensive to getting code shipped in the first place.

Don't get good at solving the wrong problem, solve the right ones. And then: ship it.

write it down

Have an idea? Write it down. Made a plan how to tackle something? Write it down. Disagree with something? Write it down.

Writing is brilliant in that it does two things at the same time: It makes you express something in a form that is easy to consume for others, and it comes with a built-in commitment that what you mean at a given point in time moves out of your brain, into a rather fixed state. Writing is the single best thing you can do to enable true collaboration.

One of the highlights of working at Shopify was the writing culture. There was a text document for almost anything - and people would work in those documents. Comment, Redo, Share, Quote, Decide, Approve. At one point, it became second nature to simply start writing in a google doc and share as the document developed into a more presentable form. You moved from idea to thoughts, and then to a first draft. All within the same space.

It makes a ton of sense to work like that if you think about it. There's speaking or discussing work, referencing work and then there's the actual work. Writing things down is actual work, it costs time, it's tedious and it forces you to decide on many things - words are rather specific. Having a writing culture creates a shorter path between initial ideas and the progress on the way to a solution.

More importantly, there's the aspect of enabling asynchronous collaboration. You can read a document whenever you want to. The author can be offline, on vacation or simply refusing to speak to you - in a transparent organisation, artefacts are accessible and available to most people. That enables not only the consumption of one single document - super valuable. It also allows for the discovery of arbitrary documents that might help to gather more context for past and present decisions.

The alternative to having a transparent record of past activities is the move of actually having a call with someone that shares context for an hour. That also works, but it costs a lot of time, and whenever that person is not available - good luck.

Lastly, I personally feel that clear writing can only be achieved if your thinking is clear. I admit, my thoughts here aren't always as clear as I'd like them to be, but the general impression I'm having is that having to write something down helps me in structuring, sorting and clarifying my own thoughts. That usually leads to not only better writing, but also ultimately, better decision making.

Speaking of tools, I'd argue that whatever tool you're using, it should allow for some basic collaboration. Leaving comments inline, basic revisions and an overall ability to annotate content are crucial in creating documents that aren't just static repositories of information, but spaces for active collaboration, exchange and discussions. And ultimately, decision making. Google Docs, Confluence, Notion - all of those tools fit the bill. But whatever you decide on, just make sure you're actively using it. Saves a ton of meetings if you just write stuff down.

on problems, ideas and solutions

There's probably nothing cheaper than an idea.

Truth be told, I'm not a particular fan of ideas. They are one of the most necessary things you want to have in your engineering team, but ideas need to be carefully managed. They should operate in a space somewhere between problems and actual solutions. They're glueing two spaces together. But let's talk about problems first.

Whether it's something wrong in your codebase or something wrong with your product - there's a high chance that you have stuff that can be improved. Finding valuable things to focus attention to is a delicate and rewarding activity, and ideally it leads to some shared understanding of what problems your team ideally is focusing on. Personally, I found it rather valuable to spend time on discussing the problems the team observes, sharing knowledge about how we perceive impact and importance of certain aspects. A shared understanding of a problem space guides ideas - which can be a good thing.

What's an idea? An initial spark that might lead to a solution for some kind of a problem. "We might be better off looking at a NoSQL database for our object cache" is an idea. It's not a refined solution, but it's an approach on how to (potentially) tackle an ideally understood problem. And the fact that ideas are not yet bound by the real-world details that have to inform and influence the final solution is what makes them powerful - ideas are where you want to think big. It's also what makes ideas nothing more than rough directional aids - they are, or at least can be, far out there, making them not immediately applicable.

Where the value lies is in converting ideas into applicable solutions. Execution matters is a two-word combo you probably heard before, and it's true. Ideas are cheap, the magic is in executing. And in order to able to do so, you need to shape a solution. And then build that solution.

When you're high above the clouds in your ideation phase, it's easy to skim over real concerns or cut corners that shouldn't be cut when going live. Like wearing brand new sneakers for the first time, there's a moment when you have to commit to actually confronting the new nice thing with the constraints of the real world - and this is where the complicated decisions will have to be made. Deciding for NoSQL is simple, choosing a specific product, implementing that, weighing the differences and so on, this is hard. This is decisions that matter going forward, and they are both more impactful and less forgiving than dreaming about the next castle in the sky.

Ideas are nothing but glue - they are cheap, discardable, and anyone can have them. Value is in solutions to problems that actually exist. Don't discuss ideas. Discuss problems, discuss solutions, but see ideas for what they are. Glue for more important things. It's about building, not dreaming.

fast building

People will, at one point in your career, talk to you (or with you) about the broken window phenomenon. That is an observation that, if windows are left visible broken in part of a town, the surroundings will usually start to deteriorate at a faster rate than if someone took the time to fix the original windows at some point. It's usually brought up to make sure that some flaw is fixed before it leads to the recognition that mediocrity might, in the end, be acceptable. Which would, in turn, lead to more mediocre things.

I wholeheartedly agree with that sentiment. From a end product perspective. But there's also an angle here to consider when actually building the product itself. You want to be absurdly fast, at least theoretically, when building something. And now I want you to have an honest look at yourself - how long does your CI pipeline run? Is that (whatever that number is) the best you can really do?

Everything on this blog is naturally an opinion piece, but I guess this is more of an opinion piece than the others. And here it comes: Spend some time to make your CI builds really crazy fast. Like at most 2-3 minutes. Longer than 5 minutes is dubious, longer than 10 minutes is weird and longer than 20 minutes is just outright abuse of infrastructure. Let me remind you here that it's at least the year 2024 when you're reading this, and whatever you're compiling and building is probably not more complex than the Linux kernel - and that thing takes less than a minute to build on modern hardware. Adjust your goals accordingly.

If you've read more than 2 other posts on here you'll also realize that I'm mostly focusing on value, so let's get to the point on why it's so imperative to have fast builds: You don't want to have people waiting for machines. It's bad enough that we have people waiting on people - and that's harder to avoid at times. But there's no point to having people wait for machines. If you want to merge something, you should be able to do so in a few minutes, and if you want to deploy something, you should be able to do so in a few minutes as well. Make things fast. It removes friction, it removes idle times, it removes context switches. All of that stuff doesn't add value, is annoying as fuck and can easily be optimized away.


Well, I don't know too much about every stack in the world, but from a common sense point of view, start with only doing the bare minimum in every CI run. For a backend project that might be compiling, building an image and pushing that somewhere. Do you need Sonarqube, Linting and super slow tests for every PR? Probably not. I usually try to find a subset of tests that make sure that the most critical flows are covered, while deferring longer running tests to nightly cadences. Again, no one in their right mind is challenging the importance of automated tests, but your task is to weigh two things against each other: Is it more important to be able to regularly work fast with as little friction as possible, risking a broken build or some broken functionality every once in a while - or do you want to always play it safe?

Make it easy to build forward, make it easy to rollback. Don't make people wait on machines.

microservices and monoliths

Monorepos, Microservices, Shared Libraries and other things to get really excited about. Or not, depending on quite many things.

Let's talk about the dynamics between teams, what they build and how they deploy - and how choosing the right or wrong technology for that might help or hold you back.

Starting with a simple example - a single person building some kind of app or service. Most of us would probably start off with one single repository and a single service or app, since there's not much added value in spreading things across, especially if you're working on something all by yourself. Having everything in one place just keeps things simple.

Now, two things can happen that we should consider here. The first is that your app is experiencing some kind of crazy growth and you'll have to make sure it's able to scale really well. The second thing, and usually a consequence of the first one, is that your team grows, and you have to make sure that your project is able to handle a bunch of engineers working on the same thing. I'd also that both are good problems to have, and ones that can be solved in various ways.

The zeitgeist way of solving both of those challenges is to split things apart. You've probably heard of microservices at this point, and it's one of the many methods you can use to decompose one bigger thing - a monolith - into smaller units. If you're more working on client side projects, a usual way of dividing a bigger code base into smaller parts is to extract reusable (or pseudo-reusable) units into libraries or other forms of potentially shareable artefacts.

Now it's important to remind ourselves that splitting up a big unit of anything into smaller units is in itself introducing complexity, first of all. While previously, there was one service to deploy, it's now two or more. And while a change could previously done with a simple change in one repository, it might be spread across multiple places, introducing more work and cognitive load. That's not to say that a change like this is always bad - it's absolutely not, but it's usually not free. It's complexity that is useful to introduce if you're solving another problem by doing that. As previously discussed, that can either be an organizational problem - scaling up your codebase to allow for more folks working on it at the same time. Or it can be a technological issue - being able to scale parts of your application independently, allowing parts of your functionality to be reused outside of their original scope or other ways in which cutting out specific parts might be beneficial from an engineering, architectural or operational perspective.

Here comes the problem - the solutions aren't one size fits all, and their baseline cost is seriously high. Let's speak about microservices, specifically. If you're scaling from one to three services, breaking one monolith apart, you need to be aware that you now have three services to maintain, to run, to evolve, to observe and to regularly patch to make sure it doesn't expose any vulnerabilities. It's just a lot of work that you previously didn't have to do. So you need to make sure that the underlying problem you're addressing is actually big enough to make that worthwhile. How do you determine if it's worthwhile?

Start solving problems once you have them. If you haven't run into any organizational or technical issues yet, and it's not just because you didn't look closely - chances are it's a case of premature optimization. On more than one occasion I actually merged a microservice architecture into a monolith, simply because it removed the cost of a more distributed approach. And if the team is small enough, that's usually a good idea.

Another angle to consider is that of a deployable unit. If you have a bunch of microservices, but they are tightly coupled, chances are, you are not really looking at independent units of anything in the first place. There's the term of a distributed monolith for cases like this, and if you're dealing with something that would fit that description, it might be worth considering merging a bunch of your services into one bigger piece. Find something that is usually developed and deployed together - a good sign that stuff belongs together. As an example to make it more tangible - you probably don't care when the folks over at AWS deploy a new version of S3, simply because it's well abstracted, stable and you're not depending on specific changes in there for your application to evolve. If you have a category and a product service in your system, and a simple field change needs an aligned deployment, you might want to consider if those services are truly independent, or more part of a distributed monolith. Look at your deployable unit, and make sure it's easy as possible to develop code inside that deployable unit.

One thing I like to remind myself of: The one time cost of splitting up a grown monolith into smaller pieces once the problems of a monolith really start to manifest themselves are probably smaller than the accumulated costs of solving the problems of prematurely splitting up components without having the problems solved by that measure. Solve the problems you have, when you have them.

building and assembling

we software engineers build systems. Those can be small systems, big systems and anything in between. The most commonly associated activity that comes to mind when speaking about building systems is probably writing code. And debugging code, and testing code and deploying code. Of course, that's pretty much spot on - would be very pointless to learn all that coding stuff if you didn't need it. But I feel there's an important distinction to be made that we might not be doing often enough.

When building a system, I'm trying to be clear on what components I'm creating - that's the stuff I'm really building, and what part of the system comes to life by plugging things together. That can either be components that already exist, like a database or any other existing thing, or it can be something that has to be built specifically for the system that's, well, being built.

There's two hacks I'm aware of to actually build more, faster. The first is to be very critical of what to build in the first place - you'll create more value by focusing on what actually adds value, and not doing the rest. The second hack is to only build what actually, positively has to built. You don't get to software engineer faster by writing code faster, but by writing less code.

As a general rule of thumb, I try to avoid solving any problems coming my way by writing code. It's kind of a last resort. In order of preference, I usually like to first solve whatever problem hits me by using something that already exists. Need a fancy dashboard for business? Use metabase. Need a database? Postgres. Stream processing? Kafka. It's 2024, and quite frankly, the amount of good things that have been invented already is just staggering. Using great solutions is you standing on the shoulders of a giant.

My second approach is to reassemble whatever is in front of me. More often than never, existing systems already contain most, or all, capabilities needed, even if requirements change. This then comes down to reassembling existing components, repurposing logic or finding ways to extend functionality in surgical ways - without going full rewrite. Less code is still better than a lot of code.

If both approaches fail, writing new code is what needs to be done. But, like any software engineer, there's two people I think about a lot. That's me, and future me. And what that means is that I'm very mindful about making any new code easy to use, and reuse, in any current or future system. And that doesn't come down a lot to what programming language, technology or stack is being used, it's an exercise in interface design and coupling. What does that mean?

You want to create, and expose, interfaces that are as generic as possible without being overly abstract. Think of placing an order with some e-commerce service. A system for order placement should expose one method, one for placing an order, and not much else. And that's a good flight level for more than one reason. Firstly, everyone in the domain understands what that thing probably does. Secondly, you're making it easy to use, and reuse. Thirdly, there's low potential for leaking too many implementation details - and those leaks are making it really hard to use systems somewhere outside their first place of residence.

Having blocks like this make it really easy to design a system that is as much defined by what's written in code, specifically and inside of distinct units of functionality, as it is defined by how those units of functionality are wired together. Systems change, requirements change, and nothing is better than having a system that can almost be reconfigured to work with a changing environment.

This view of the world also makes me rather cautious of folks that output code like there's no tomorrow. Being able to write code, and potentially also fast, is of course not per se a bad thing - but you want to be selective on when to do that. Writing code that allows for seamless assembly is what you want, not just lines and lines of a cobbled together solution to a specific problem with little to no shape, thought or potential to grow. And probably that's why I don't call engineers coders.

I call them engineers.

on collaboration

there's few terms that mean so much and so little at the same time. Given my last post was a little biased on the "get shit done" side of things, I felt it was good to write a bit about collaboration. I don't mind collaboration, I think it's absolute key to getting meaningful things done - it's just very important to think carefuly about what kind of collaboration you want, need and can facilitate. And what kinds you probably don't need.

First of, and I've written that before, in my experience, groups are incredibly good at collaborating once the joint goal is clear - if everyone wants to achieve the same thing, the likelihood that folks will find ways of working together to achieve that goal is rather high.

Things get slightly more hairy once you need to establish collaboration between folks that might not share the same goals - short term value creation vs. long term clean architecture, as an example. While the superficial goal - getting something delivered - might be identical, the secondary goals are wildly diverging, even incompatible. How do you facilitate effective collaboration in a setting like this?

Truth be told, I actually don't know the answer to this one. But I can share some things that worked well in the past for me, and some truths that i took away for me.

First of, be clear on why you actually need folks to work together. Is it to increase the speed of something, parallelising work? This is often the case when sharing e.g. an engineering task between multiple people. Or is it because you need to make sure a decision is made in a balanced and as-informed-as-possible way? Or is it just common practice in your organisation that important decisions are usually not explored and made by individuals? All of those things are different modes of collaboration, and all work slightly different - and require different guardrails to make them effective.

First, let's speak about parallelizing work. You want to bake a cake, but to make it faster you hire two bakers. Now, entertaining that example, they would probably break up the big task into smaller goals and distribute them among themselves, leading to some speed up. Hint: they won't be double as fast as one baker. I generally feel that this is the easiest form of collaboration, and one where there's not too many things that can go wrong once the initial complexity has been resolved - how to split and distribute the work. Given competent individuals, the actual execution should be rather eventless. In settings like this, I'm trying to ensure that each person has the space and autonomy to be impactful, while ensuring they get the support from their peers should they get stuck. Coming back to the beginning, since the goal is pretty clear, groups of people are usually rather good in cases like this to collaborate effectively and find structures and self-organize in a way that's beneficial for the group and the outcome. It's easy.

What if the task is to find out what cake to bake? Now, that's more tricky. Way more tricky, in fact. You could also say it's very easy, as long as everyone agrees - and that's the fallacy here: is there such a thing as group decision making? There might be, but it's tricky.

Imagine you're putting the baker in a room with two people who previously ate cake and are now somewhat experts when it comes to cake. They have an animated discussion about what cake to make, and at one point, they vote. Against the advise of the baker they opt for a cherry cake, which the baker is not able to bake at this time. Perfectly good decision making that leads to a bonkers result. But is it better if the baker just bakes whatever he feels like baking? Probably also not, there's value in having a decision and considering input. So it's something in between, somewhere between a person calling the shots and a democracy?

Personally, when I'm not clear on how decisions are made in a given organization, group or situation, I tend to ask "who's deciding if we can't agree". And there's always someone. For real, there never was not someone. Make sure you know who that someone is, and clarify what the roles of everyone in the room are. People need to know whether they're only consulted, whether they need to make a decision themselves or whether they're just consuming oxygen in a particular circumstance. Most importantly, it needs to be clear who is bearing the accountability for any given decision. Clear roles, clear accountability. Fast and good decision making requires quite some organizational clarity. That might be hard to establish, but a lack thereof just means you'll make less decisions, you'll make poorer decisions and you'll have a good amount of disagreeing groups - simply because it's not clear who's deciding for whom.

While it seems hard at first to delegate specific decisions to specific folks, it's harder to not create this clarity in the long run. At the end of the day, groups never really make decisions, only individuals do. Empower them.

call to action

Forget consensus. Scratch making compromises. Fuck alignment. Don't attend that one call where twenty people with no skin in the game don't say anything meaningful, anyway. Build a solution in half the time it would have taken to figure out the right group of stakeholders.

Make it easy to rollback. Make it easier to deploy. Scratch your Role Based Access Management. Have one role for everyone. Let go of privileged access, be completely transparent.

Listen to what is said, not to who says it. You have two pedals. Concerns are the brake. Action is the accelerator. Guess which one is making you fast. Do something or don't do something, but don't half ass anything.

Stop working on useless things. Stop predicting the future. Stop optimising for problems you don't have yet. Solve the problem you have. Do not pass go.

Talk about the most important thing. If you do not know what the most important thing is, finding out what the most important thing is is the most important thing. Spend all your energy on that. Be in a hurry to create clarity. Do not defer important calls. Without focus, all the action is worthless.

Ideas are cheap, solutions are expensive. Don't talk about ideas, talk about solutions. Talk about how to get rubber on the road, not how to build castles in the sky.

Accept disagreement. Encourage disagreement. Find disagreement. It means that a decision has been made. It means decisions are made. Progress is a sharp tool, and it occasionally cuts through the fabric of a team. It'll heal.

Give people a chance to prove you wrong. Prove people wrong. Confidence needs space to grow. Ownership needs owners. Both needs autonomy, not control. If you can never be disappointed, how can you be delighted. Let folks think outside the box. Trust is the loose coupling of organisations. If you trust the right people, magic will happen. If you trust the wrong people, don't trust them again. Be honest. Don't lie. Don't work with people that lie.

The wrong action is better than no action. Optimise for as few wrong actions as possible, not as few actions. Being cautious doesn't add any value. Being cautious sounds smart. Building things is smart.

Find people that get things done. Help them get things done. They'll help you get things done, as well. Be sneaky if necessary. Always be helpful.

Doing the wrong things right is worse than doing the right things wrong. Help to make things better. Stop wrong things.

Hindsight is 20/20. Progress comes with change, and change can be a risky thing. Things breaking means that things are changing, and that's good. Help that fewer things break, not that fewer things are changed. Never judge people for honest work. Don't point fingers, don't say names. Be part of the solution.

Be the group, and win in the group. No one is special. Your job title doesn't matter. Getting meaningful things done does. Speaking about the work is not work. The work is the work.

Let's get something done.

solve the problems you have

Let me take you along for some kind of thought experiment. What if you completely ignore any kind of architectural - or system design decision making in your next project and only do one thing: focus on the next problem you actually have, and solve that. So instead of planning how your piece of software should be looking like when you're done, just let the shape and architecture of the final thing be up to chance.

This of course sounds slightly radical. Normally, we're trying to predict how the future will look like, and make choices regarding architecture, the name and the interactions between our components ahead of time. But we get that wrong just so often, I'm wondering what would be happening if we just try to stop doing this crystal-ball magic altogether.

Just imagine.

You're starting a project, and for every decision along the way, you just choose the easiest, the lowest hanging fruit, as a solution. Need a HTTP Server? Google what the most common HTTP Server for the stack you're using is and roll with it. Need persistence? Start with files or a common database. Need more functionality? Add it. Let the path you're taking be dictated by where the lowest fruits are hanging.

This of course sounds dangerously reckless. But is it? If I'd lay the situations in my career where I had to work with a mess that was caused by proactively trying to address problems you didn't have in the first place next to situations where someone forgot to solve a real problem they're having – the former one would have a super strong lead. I've worked with Microservices in Organisations that would struggle to work on a single project, I've dealt with the fanciest NoSQL-Databases in situations where Postgres wouldn't have reached more than 1% CPU utilisation. Yet in both cases, the cost of introducing either of those solutions was measurably really higher than just doing the more intuitive thing (RDBMS and Monolith, to be specific).

I've got a theory that explains both why Ruby on Rails is not popular and why we don't focus on the real, as opposed to the imagined, problems. We'd realise that we're mostly dealing with solved problems, and that there's only a very limited level of excitement building a CRUD-Service in 2024. But since we enjoy reinventing the wheel, and since overcomplicating is a very direct path to endless job security, we do just that - building things that are more complicated than they have to be, instead of just solving whatever problem is at hand.

Of course, there's situations where you still have to scale and didn't prepare for it – great! Solve that problem once it's a problem you're actually having, and not one that you just hallucinated. Because then it's really just a problem that you actually have, which is the only valid reason for starting to solving something.

Real problems. Boring solutions.

don’t add that button

We very recently bought a car. And I was surprised how many models are on the market that still use manual shifting. I know how to drive them, probably quite well even, but you couldn't get me to buy one of those. The machine is just much better at shifting the gears up and down than a human. And also, it's not an activity I enjoy doing. Which brings me to the point of this post.

Every system that reaches some level of complexity has some operational dimension, at some point. Whether that's only some regular database maintenance or more involved tasks depends entirely on the system, but let's pretend for a second we're looking at a system that deals with loading product data from a source and providing said data to other services using something like HTTP and JSON. Of all the services I've worked with, this one I've probably seen the most.

As you go and build this system it evolves from just being a database that gets populated by running a job every once in a while to something more complex. You figure that doing a request to your database for every single GET might make things slow, so you add some object caching. That's fun, and it's also really helping with your performance. As you go, you add more bells, more whistles. There's nothing wrong with that, but there's one thing to be very vigilant about: the first button.

In our hypothetical example, imagine you find out about an edge case where the object cache does not always get purged correctly when new data wanders into the system. It's an edge case, doesn't happen too often and really, there's 5 more important items on your to do list. So you add a button. Or a how to in your internal wiki. Some means of manually resolving that problematic situation. And that's the first button. Don't build the first button. Why?

You do not want to build manual controls for anything that the machine can do without input from a human. Purging a cache doesn't require any input from a human. There's no parameters. It's simply "fix the situation". Once you start to build controls for things that the machine should have been doing in the first place by itself, you actually build a new kind of solution – one that contains one or many humans as orchestrators. And that's the worst kind of solution, since you're doing two things. Firstly, you're introducing a really unstable, non-deterministic and not always available element into your system. That's generally not a good idea. The second problem is: it just doesn't scale. If it was only about one button, fair. But people need to understand what conditions are problematic, how to detect them, and then go into a system to resolve something. That's a lot of training and contextual knowledge that needs to be shared.

Build systems that run themselves. In our example here, the very moment when you detect that there's a problem with cache invalidation is the right moment to fix that cache invalidation. This can be fixed, it just needs the right investment. Might take a little longer, but fixing this problem right there is the right solution – not having someone push random buttons.

Build robust systems, not fancy buttons.