strong things and weak things

A little while ago a coworker pointed out that he doesn’t understand how I come to decisions and what my criteria are. There are probably a million answers to that, but since the topic was system design, it got me thinking. What are my guardrails, my thought processes when it comes to designing systems of any size?

I went through a lot of specific examples in my head and reflected on the decisions I made or the beliefs I held in those situations. It was an interesting exercise, and it led me to learn two things about myself. The first is that I design systems mostly based on a very small set of convictions. The second is that those convictions changed over time, at least to some extent. You could also call those convictions my “system design core beliefs”. There will probably be more than one post on those. Today it’s about strong systems, and weak systems.

Systems, at least the more interesting ones, consist of multiple components. Sometimes two, sometimes a hundred. Each of those components has a different reason for why it exists. Some are databases, others are queues, some are backends, some are CDNs – they are different, and through the virtue of how they are built and ultimately connected, they form a system.

One of the mental models I apply to group components inside a system is to think of weak things and strong things. Let’s dive into strong things.

You want to build your system around some central ideas. That helps you both to have the right conversations early on – and it helps to create tons of clarity. It’s as much a vehicle for driving organisational progress as it is a technical driver. A central idea might be to store all data in one big Postgres, or a central idea might be that most inter-service communication is happening through Kafka.

Whatever you need to execute on your central ideas needs to be robust, strong, reliable, resilient. When all of your business data resides in a Postgres, this is where you don’t want to let the intern set up the Postgres server. You want to make sure that those systems are as stable as they can possibly be. They are central, and being cheap on the foundation of your design is just not a particularly smart choice.

On the other hand, your weak things are everything that is non-critical. Things that are allowed to fail intermittently. If your site isn’t reachable for 3 minutes it’s not awesome, but it happens, and your business will likely not even notice the interruption. One of the fundamental freedoms of weak things is that they are just far less critical. They are the “break things” in “move fast and break things”. Everything that is not part of the central ideas is a weak thing.

Good systems are very intentional about what their central ideas are, and they make sure that two things are true at the same time. You’re strict about the strong things: governance, great observability, on-call rotations, painful decision making processes and a rather slow pace of change – you name it. But you’re also really liberal about the weak things – they are the components where you can move fast, iterate and make progress.
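
To make the split a bit more tangible, here is a minimal sketch of how it could be written down. Everything in it is an assumption for illustration: the component names, the Policy fields and the concrete values are made up, and a real setup would likely track far more than three attributes.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    STRONG = "strong"
    WEAK = "weak"


@dataclass(frozen=True)
class Component:
    name: str
    central_idea: bool  # True if one of the system's central ideas depends on it


@dataclass(frozen=True)
class Policy:
    # Hypothetical attributes; pick whatever your organisation actually governs.
    on_call: bool
    change_review: str
    deploy_pace: str


# Strict for strong things, liberal for weak things.
POLICIES = {
    Strength.STRONG: Policy(on_call=True, change_review="formal", deploy_pace="slow"),
    Strength.WEAK: Policy(on_call=False, change_review="lightweight", deploy_pace="fast"),
}


def classify(component: Component) -> Strength:
    # Everything that is not part of a central idea is a weak thing.
    return Strength.STRONG if component.central_idea else Strength.WEAK


def policy_for(component: Component) -> Policy:
    return POLICIES[classify(component)]


if __name__ == "__main__":
    # Illustrative components only.
    components = [
        Component("postgres", central_idea=True),
        Component("kafka", central_idea=True),
        Component("marketing-site", central_idea=False),
    ]
    for c in components:
        print(c.name, classify(c).value, policy_for(c))
```

The point of the sketch is that the classification is a single, explicit decision per component, and the rules follow from that decision rather than being negotiated case by case.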

So when you’re looking at what you’ve built, ask yourself: what was your central idea?

the thing about deleting code

Refactorings, clean ups, consolidations – doesn’t matter what you call it, but there’s something calming about finding ways to touch code without the intent to functionally change or improve it. It’s like cleaning the workshop. And much like cleaning the workshop, there’s an art to focussing disproportionately on one thing: actually deleting code.

Deleting code is the real-world equivalent of taking stuff to the trash bin. People who do not regularly take stuff to the trash bin are either dead or widely considered hoarders. Both are concerning.

Actually going there and throwing stuff out is both incredibly satisfying and not exactly easy to get started with. But not doing it is even harder in the long run, so the best time to pick up the habit is probably now. What is a good place to start? Let me explain my thought and action process.

I consider myself occasionally to be a rather simple person: I’d take every opportunity to get rid of something that I personally just don’t like. Code is very easy to either like or not like: very obfuscated, poorly documented hot mess of nested if-statements? That’s a promising candidate for something that just needs to go away. There’s often no point in trying to save something that’s beyond saving. To keep it brief, there’s a three-step process that kicks off then:

First, you extract whatever is useful from the can of shit. That might be a lot, or not much at all, but start over in a fresh file or module or library and just copy what you need. You’ll probably feel the exciting spark of not having a single useful test helping you in that, and that’s ok – no one knows that the current version worked in the first place, so you’re probably fine. That copying is the most liberating part, and it gives you the opportunity to start fresh.

Now that you’ve got a fresh implementation of whatever was in FreakShow.java, it’s time to point everything to your new thing. Since you’ve ideally not just moved code from A to B and changed tabs to spaces, you’ll likely spend quite some time here to update invocations, mocks and the general usage of your new component. That’s fine, and it generally can be a rewarding activity.

Phase 3 is the march to the trash bin: You just get rid of the old stuff. Delete it, move it away. As a little guide for myself, I’m trying to halve the code – if you’re just beginning that process in your code base, you’ll probably be able to reach that point quite often. More mature code bases might not benefit that much from frequent deletion interventions, but your mileage will vary.
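
Before the march to the trash bin, it can help to check that nothing still points at the old thing. The following is a rough sketch of such a check, not a real tool: the function names are made up, it only does a whole-word text search, and it assumes Java sources where the file name matches the class name. A compiler or an IDE will do this far more reliably.

```python
import re
from pathlib import Path


def find_references(root: Path, old_name: str, suffixes=(".java",)) -> list[tuple[Path, int]]:
    """Return (file, line number) pairs that still mention old_name as a whole word."""
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Skip the old file itself; it is the one about to be deleted.
        if path.stem == old_name:
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno))
    return hits


def safe_to_delete(root: Path, old_name: str) -> bool:
    """True when no other source file under root mentions old_name anymore."""
    return not find_references(root, old_name)
```

Calling safe_to_delete(Path("src"), "FreakShow") would then tell you whether phase 2 is really done. The "src" directory is just an example path.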

It’s much more liberating for your codebase, for your skills and for the overall quality of what you’re potentially delivering to create a habit of starting new in places that need this level of attention. It’s a super effective muscle, and one that comes with a lot of fun when properly used.

Happy Deleting!

the thing about excitement

Every team emits or radiates some kind of energy. Some teams are lethargic, others are friendly. There are cold teams, and teams that aren’t teams but more a collection of ICs. Pretty much all states that social groups can have, you find in teams as well – since they’re pretty much social groups to begin with. Groups, in my humble experience, have their very own structures on what behaviours they appreciate, and therefore encourage, and what behaviours they choose to ignore, or at least not especially respond to. The culture of a team has everything to do with the sum of all behaviours that are encouraged and, similarly, discouraged.

What are behaviours? It’s whether you start with documentation or start with code. It’s whether you set up a meeting or a repository. It’s whether you openly address conflict or give it time and space to grow behind someone else’s back. All of these are behaviours that shape how a team functions, and also how well the people within this team can do their job.

Now, behaviours and actions are somewhere on a spectrum from very super fucking easy to really, really hard to do. Giving direct feedback to someone that you’re struggling with the way they interact with their peers in a meeting is hard. It’s not easy. Just not doing that and bitching behind their backs is relatively easy. Constructively resolving disagreements is hard, ignoring or avoiding conflicts and eventually just not collaborating is rather easy – at least in the moment. Doing the proper fix, and not just throwing more code on the big ball of mud is much harder. I could go on just trying to make the simple point that quite often, the thing that takes more effort is the right thing to do.

Teams can be graded on how often they choose to do the right thing, instead of the easy one. One of the indicators I look at to understand where a team is on that spectrum is incidents. Both how often they occur, and how teams respond to them. Ideally, incidents are both rare and, in essence, addressed in a structured way. Think playbooks. I’ve personally been part of too many teams where incidents felt like the most – positively – exciting times during work hours. Suddenly, all hands showed up on deck, demonstrating both their commitment to the cause as well as their incredibly deep knowledge of the system. While that certainly helps in any case to resolve the incident of the day, it’s wrong. It always leaves me with the feeling that this is the wrong kind of excitement. Mature teams don’t celebrate incidents. They do everything they can to prevent them. And if they happen, the dominating mood isn’t one of excitement, it’s one of professional tension paired with a high level of structure and focus.

But why do I think you should not be excited? Well, think of a firefighter. If your house was on fire, and the fire brigade rolled in – and you could see that the folks are actually having a good time, how would you respond? It’s probably natural to feel some form of excitement, but the hard thing is to regulate yourself, realise the urgency and impact of a situation, and then act accordingly. Being organised, calm and structured in a moment that would otherwise invite a chaotic storm of ideas is hard, but it’s just the right thing to do. There are too many post mortems out there where teams tried to fix stuff and only made it worse. Too much excitement, too little thinking. I’d like to throw in the fact that we have a word just for that in the German language: “Verschlimmbessern”, for when you actually make matters worse when trying to improve something.

So while I enjoy spending time with a bunch of folk getting very excited about a broken system and the subsequent opportunity to show off their raw skill – almost every other way to be excited is better than that one. Stop, collect yourself, and solve the problem like a grown-up.

the Tetris approach to technical debt

When dealing with software systems in any of the larger organisations I’ve worked in, legacy things are a big topic. Whether that’s a big legacy system that is basically immutable, or just really complicated components of a system that is actually still kind of in development. There are some attributes that those systems, in my view, share. They were built before the current team started to work on the current thing – they’re coming from a different time. That also means that there’s only a loose sense of ownership towards that legacy thing – it might be good or bad, but it’s certainly not mine. Another one that makes me designate something as a piece of legacy technology is that the assumptions that led to the thing being built, bought or not abandoned have changed significantly since its inception. So even though a system might still be absolutely relevant day to day, no one in their right mind would go and build the same thing again. A typical case of this is a system that has outgrown its original requirements. Imagine starting a small business and using Excel for bookkeeping and tracking financials. Once you scale to a few millions in revenue and have more employees than fingers, you’ll realise that you probably wouldn’t have chosen Excel if you knew then what you know now. Hindsight, you know.

With that lengthy explanation out of the way, here’s finally my observation. Good engineering teams are awesome at building new things. Great engineering teams are awesome at throwing stuff away. Let me explain.

You know Tetris. Blocks flying down, solid lines disappear. The more consistently you design the layout to have solid lines, the longer you’ll be able to keep playing the game. The less you’re focused on managing the existing blocks, the shorter the game will be.

Good engineering teams are doing a good job of designing and building new components, often next to existing systems or interfacing with some legacy stuff. That’s already bold and far from trivial. Interfacing not only across technical systems, but also entirely different decision realms is hard.

Great engineering teams do the same, but looking not only forward, but also sideways, and even backwards. Building new things can be a huge lever in helping to replace systems that are past their expiration date. Being smart in allowing newer systems to take over functions of older stuff, being bold in deciding that some unmaintained hot mess needs to go – those are the decisions that will eventually compound and allow an environment to be still able to innovate years down the road.

Of course, the world is complex and not all teams have the leadership, agency, skill or resources to drive this continuous process of renewing the system landscape you were hired to take care of. And that’s fine, reality hits hard.

But the game might be harder without trying to make the blocks go away.

the thing about risk

Here’s an observation: The less risk there is in doing something, the more time you spend talking about it. This is about decision making in organisations, and it’s something that just struck me today.

I’ve been in situations where taking risks was just part of the routine – with the occasional bad outcome that indeed something non-awesome happened. Any action is associated with some level of risk. No action at all is also associated with a risk, at least if you’re a business. Why this is important for me today is that I realised that I’ve spent a significant portion of my day building bridges to stakeholders. Bridges that would allow them to green light the most insignificant change. The sole reason they haven’t so far is: risk.

Now, those people aren’t stupid, there’s probably something to be concerned about. There always is. The question is: at what point does the notion of actions being risky stop adding value – in the sense that it informs a discussion, helps to steer decisions and so on – and simply turn into a song everyone has heard one too many times?

Risks are incredibly convenient, as are concerns. They’re the rational lipstick on the pig that is both indecisiveness and inaction. However, pointing out risks in itself doesn’t provide value, and as such should only ever be endorsed in combination with a “however” that demonstrates a path to the desired goal by circumventing or smartly managing a concern.

And here’s where organisations and cultures differ greatly. My personal, steaming hot take is that organisations that have learned that risk-aversion in itself can steer decisions will, over time, unlearn how to realistically rate risks. On the other side of the spectrum, organisations that have a low tolerance for risks blocking progress learn, over time, to estimate risks more accurately. This is because they actually collect evidence of bad outcomes, something that is of critical importance for really understanding risks.

Think of it as an abstract space in which to make decisions. Kind of like a circle, if you need something more visual. Every time you are limiting your decision space by giving in to potential, unrealised risk, you are making that space smaller. You’re delimiting it on the outside, without any real reason for doing so. Now, doing that once is harmless, but what your organisation will adapt to is that new shape, not the previous one. There’s almost no elasticity. And once you start to make that space smaller, something odd happens. If you’re trying to do something that would have been fine 2 years ago, it will be seen as too risky in the present. Not because anything bad happened, not because the real world has moved so much – simply because the organisation changed its risk tolerance.

In organisations that are open to risk – and are conscious that anything can always go wrong – the space in which to make decisions is bigger, and also it’s modelled based on evidence, where available. Healthy organisations know that it’s safe to take a risk – as there is experienced resilience that bad outcomes can and will be dealt with. Healthy cultures don’t give risks and concerns a front row seat in decision making. This is not a text advocating recklessness, it’s advocating being reasonably bold. Healthy organisations know that the biggest risk comes from inaction, not the wrong actions.

focus

Ok, let’s talk about focus.

I learned a parenting hack a few weeks ago. One that really helped me to smoothen some previously tense or potentially tense situations. Whenever I need my toddlers to do something – like put on clothes, brush teeth, some part of the routine, and they’d be opposed to it (which is not crazy rare), I started saying something. That something is: “ok, we’ll do $something_fun soon, but the very next thing we’ll quickly do is X”. For some reason, that connects super well to their brains, and it’s remarkable to be reminded of something super important: focus. We’re focusing on a distant goal, while being clear that also the tangible, short-term actions have to be taken.

There’s probably a list somewhere that contains 20 signs that you’ve been doing engineering leadership for way too long, and finding analogies for everyday challenges in absolutely normal situations probably ranks highly there. But now that we’re here, let’s focus on focus a bit more.

Of all the things that feel like a magic trick in my professional life, answering for a group or an individual the question “what’s the most important thing for you to be working on” is my number one. Dysfunctional groups aren’t dysfunctional because it’s a ton of fun to be in a dysfunctional group (it’s not), it’s usually happening because people passionately and with dedication pursue different, and in a good number of cases, incompatible goals. The problem in those cases is not how software is built, or how the rituals are organised, it’s that the most important question has not been asked or answered: What is the most important thing to focus on?

Teams are remarkably adaptable, at least if there is some healthy fabric that keeps the substance alive. I stopped counting the situations in which this ounce of clarity transformed a hopelessly lost group into a delivering powerhouse. And make no mistake, there’s something self-sustaining there. Once a group recognises that it is able to make progress towards that mythical most important thing, it often gets faster. There’s joy in recognising that you’re having an effect, and that actions lead to tangible outcomes. It’s common sense that’s wildly uncommon, unfortunately.

There’s a place for ambiguity, for dealing with situations where there’s no clear guidelines on how to make the best decision or how to move forward. But doing two things because you can’t decide on one is probably the worst thing you can do. Just decide, switch on that laser beam and focus on getting the next thing out of the door.

This is, of course, a simplification of the real world. Most teams face a perpetual dilemma of having to work on the technical foundations, spending time on incidents, rituals and also finding space to do some actual feature development. This is where engineering management needs to provide the space to focus on what’s really relevant, while not focusing on a bunch of interesting stuff that’s of little value. If we know what the most important thing is, everything else is just

Not that important.

500 words

I used to have a habit of wrapping up my day by writing 500 words. As it happens with habits, I dropped that one and got a bunch of others in its place. Looking back, I feel the 500 words helped me to reflect on the matters of the day, or to conclude or continue some thoughts on the more long-running insights.

The fantastic thing about writing is that it can be a purely unidirectional process – from thought to words, and that’s pretty much it. There’s freedom in just writing, without applying too much polishing, massaging or tuning to the final result. It’s probably the same difference between authentic conversation, that is just happening in the moment, and a rehearsed speech or presentation. Both have their time and place.

One thing that “corporate” did to me is to make me careful – careful in how to phrase ideas, which words to use, which words to avoid and so on. That’s a good thing, being professional is not a bad thing, and neither are healthy filters. When it comes to my own writing, I found and find that limiting. But it’s super hard to shut down a routine and an inner janitor that is carefully checking every message during working hours just to have some more freedom when writing outside of those. Well, here I am trying, and probably oversharing a little in the process.

The best piece of advice I ever got in regards to writing was to just write. Not to do reviews in the process, not to do editing after every sentence. I’ll take it a little further, and I won’t fix anything but typos in this. Let’s see where it goes, let’s see where it takes me. Incidentally, it really is super comparable to being “in the zone” when writing code. Not every line of code in itself has to be art, what counts is that the final result does what it’s supposed to do.

For code, that’s probably something like solving a problem or implementing a function or whatever. For written things it’s the gist, the meaning, that has to be transported. And maybe it’s an overly pragmatic and limited viewpoint, but not every word matters in that regard – as long as the message makes it from a to b.

Writing starts to suck, at least my own, when my thoughts take a detour on the meta level. That is, when it’s no longer about the content or the message, but more about the style of writing or some other self-filtering that’s getting ready to self-apply. And while that can be useful, on a certain level, for the most part, it’s just very much limiting. So please excuse the occasional slip-up as I’m trying to work around my inner North Korean thought police.

If you’re wondering what I’ll be writing about – I do have the same question. I guess we’ll find out along the journey, and I can’t wait to see where it takes me (and you, whoever that is.)

But I can guarantee it’s gonna be around 500 words each time.

Reboot

There's never a better time for a change than now. In that sense I decided it's about time to clean up my blog posts, only leaving some of the more recent ones around. The old blog content was certainly fun – but I have to be mindful it's also from an entirely different episode in my personal and professional life. I don't think it's relevant anymore.

Off to new beginnings.