How I Think About Software Design

tl;dr - Given that a piece of software is generally designed to be functional indefinitely, I believe we should design it for ease of continued use and modification via a) explicitness in code over clever implicitness, b) high traceability of errors, c) ensuring it is well structured for modifiability within a repo or across services, and d) high visibility of every possible metric related to the service in use, for introspection. This guards in a general way, as best we can, against issues like team churn, documentation rot, inertia from technical debt and future architectural requirements, because it keeps optionality at the forefront as much as possible while not sacrificing external value.


One of my favourite characteristics of this industry is the sheer volume of publicly available discussion around techniques, methodologies and paradigms for how to design the software we write. There’s a plethora of publications and articles available to us, from websites such as Hacker News and Stack Overflow to personal blogs, tweets and wikis, covering everything from low-level questions such as which sorting algorithm to use to high-level concepts such as high-availability data store design or how to architect your ever-increasing list of services. More than that, there is plenty of coverage on how to run your software team or organisation, how to hire, and how to ensure velocity while not accruing crippling technical debt. The meta-characteristic of this world is that people inevitably have preferences or tend to fall on one side or another, with every publication on the new or rehashed ‘one true way’ or ‘wrong but useful’ idea met with naysayers and advocates of an alternative. This idea of an emergent property arising from the variety of ideas available to us led me to think about other meta-level characteristics of writing software. Given the reality of all these different ideas, are there any meta-level considerations at the time of writing software that map to ‘effective’ emergent properties of a piece of software or website in production use, and if so, could they be followed to create ‘effective’ software design in any situation?

That label is incredibly subjective so I want to clarify what I mean by ‘effective’. It is multi-faceted and somewhat depends on your perspective. On one hand it’s externally facing - to a user or a business paying for or using the product, the internal structure of the product doesn’t matter at any single point in time, as long as it delivers value at that time. The situation where the internal structure of a product does matter externally is when a team needs to make a change in order to improve onboarding, fix a bug or add a feature, which affects both current and potential future users. On the other hand the term is internally facing and related to some function of problem scope, expected return on investment and lifetime of the product. You have very different considerations in terms of code quality if you can truly guarantee you will throw away the code than if there is even a chance it will survive indefinitely, and the same goes for scope and ROI (a trade-off similar to the Iron Triangle). In all cases you want to make sure your metrics of internal product health, or ‘effectiveness’, are good enough and stay stable as changes occur - because they will.

I want to highlight what I consider to be one of the few universal truths - all you can guarantee is change. It doesn’t matter which programming language you choose, which framework you choose or what database you use; at some point down the line one or all of your previous choices will seem sub-optimal at best, ridiculously short-sighted at worst. This situation is unavoidable, so maintaining a suitable level of optionality should be a consideration from the start.

To summarise, to me, ‘effective’ products / software are those which not only deliver business value or user value as mentioned, but also have internal characteristics that reflect that this universal truth has been taken into account, and thus are ‘effective’ in the long term for both external and internal users.

This is a tricky and perhaps paradoxical line to walk - every decision you make causes all the other branches to collapse and thus reduces optionality, but at the same time you need to make decisions in order to further the development of the software or product! Hence maintaining a suitable level of optionality is the best we can hope for; too much either way and you will fail to be effective on both internal and external goals.

How do we actually do this? What considerations can we implement that don’t significantly jeopardise the output of the product but guarantee some level of optionality going forward? Here are some of the key points I’ve landed on:

1) Write your code / tests / documentation / setup scripts like someone else will read them later that day and judge you: This is an important one - as soon as you commit code and it reaches master, it is legacy code, there’s no avoiding it. Therefore you need to make sure it is the most understandable code it can be - to me this means small things like explicit naming conventions, taking the path of least surprise for all actions, writing comments, and avoiding clever implicit actions with side effects that are not immediately obvious. Sacrifice succinctness for cognitive ease. Likewise, get a second opinion, either by asking directly or through Pull Request review.

2) High Traceability of Errors: errors will happen, production errors doubly so, so it’s important to ensure that when errors do occur, investigating them is as easy as possible. This can be achieved through good traceability in error stack traces, the ability to recreate the conditions around the error (one reason I love Fullstory.com) and avoiding issues such as the time-travelling debugging that can sometimes happen with async systems.

3) Well Structured For Modifiability: this is related to the idea that you will inevitably need to retire code, add new features or at some point change something in the product, and the need to make that process easy. Whether the service lives in a monorepo or is composed of multiple services / microservices, you want clear and obvious interfaces between different actions or parts of an action, so that variant testing, retirement or bug fixing can be tested in isolation, and you don’t have to jump through N hoops in order to remove an unneeded piece of functionality.

4) Telemetry/Metrics: it’s important that everyone who has the ability to fix / modify / improve / debug a piece of software or product has access to all the relevant metrics or analytics events concerned with that software or product. This is important both for errors and for usage of the features you have added - if they are not being used, remove them, because maintaining them will constrain your options during future development. I would argue for a structured approach to collecting as much information as possible from the start, with a plan in place to assess noise (too many alerts, unwieldy numbers of user events), rather than starting with the minimum and missing key information. Introspection is key, hence we should have the tools necessary to get high-resolution answers by default.
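
To make points 1, 2 and 4 a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `Order` type, the `OrderLookupError` exception, the `load_order` function, the metric names and the in-process `metrics` counter are my assumptions, not a prescribed implementation. A real service would likely export its metrics to a proper telemetry system rather than hold them in memory.

```python
import logging
from collections import Counter
from dataclasses import dataclass
from typing import Mapping

logger = logging.getLogger("orders")

# Hypothetical in-process metric store, standing in for a real telemetry client.
metrics: Counter = Counter()


class OrderLookupError(Exception):
    """Raised when an order cannot be loaded; carries the order id for tracing."""

    def __init__(self, order_id: str, reason: str) -> None:
        super().__init__(f"could not load order {order_id!r}: {reason}")
        self.order_id = order_id


@dataclass(frozen=True)
class Order:
    order_id: str
    total_pence: int


def load_order(orders_by_id: Mapping[str, Order], order_id: str) -> Order:
    """Look up one order. The data source is an explicit argument, not hidden state."""
    metrics["order_lookup.attempted"] += 1
    try:
        order = orders_by_id[order_id]
    except KeyError as missing_key:
        metrics["order_lookup.failed"] += 1
        # Attach the order id to the log record so the failure can be traced later.
        logger.warning("order lookup failed", extra={"order_id": order_id})
        # 'raise ... from' keeps the original KeyError in the traceback chain.
        raise OrderLookupError(order_id, "unknown order id") from missing_key
    metrics["order_lookup.succeeded"] += 1
    return order
```

The names are deliberately long and boring, the error says which order failed and why while keeping the original cause attached, and every call is counted whether it succeeds or not. Because the order store is passed in rather than reached for implicitly, the function can also be tested or replaced in isolation, which touches on point 3.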

In conclusion, a disclaimer - this is clearly just my opinion, holds nothing new, and is honestly an almost paradoxical approach to software design. However, I believe there are some benefits to considering the universal truth mentioned, and how we can factor that into how we design software, helping us balance decision making with maintaining useful levels of optionality. At the very least it gives us one more tool in our arsenal to combat the ever-growing complexity of software as it continues to eat the world.


Lastly, as a footnote, a thought I had while writing this up - what is software? Is it encoded mathematics? Is it art? Is it simply a process that could be carried out by a human but performed much, much faster to make it viable? The reality is that it is likely a mix of all three, the ratio varying depending on what you are looking at. The last of these is what interests me the most - for example, we could simulate browsing the web using only humans and fax machines, but it would be so many orders of magnitude slower that it would be unviable. The point is that any analogies used to deal with human process change or process management are potentially mappable to software design or managing change in software design. Potentially.
