
Reboot DevOps (Part: II)

In Part I we discussed how DevOps isn't about solving application problems with infrastructure, but about being able to deploy to our targets in a sustainable way. We spoke about how increased confidence enables personnel to release more often, because solid application packages hedge the risks.

Magnifying glass source: Wikimedia commons

Today we will speak about how monitoring and observability can increase our confidence to enable us to release more often.

Monitoring an application means surfacing the metrics that show whether a system is operable, not operable, or in an exhausted state. On a basic level, this may mean CPU usage, memory usage, network throughput, errors, and exceptions. What monitoring seeks to tell us is whether a system (or a service within it) is working at any one point in time. Being able to visualise this data in production builds confidence: knowing whether a system is working at any point after a release is better than finding out from external sources.
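To make the three states concrete, here is a minimal sketch in Python of how a single metrics sample could be mapped to "operable", "not operable", or "exhausted". The Sample fields, the threshold values, and the classify function are all assumptions made up for illustration; real thresholds depend entirely on the system being monitored.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One point-in-time reading of a service (hypothetical fields)."""
    cpu_percent: float       # 0-100
    memory_percent: float    # 0-100
    errors_per_minute: int
    responding: bool         # did the health check answer at all?


# Hypothetical thresholds, chosen only for this example.
EXHAUSTED_CPU = 90.0
EXHAUSTED_MEMORY = 90.0
MAX_ERRORS_PER_MINUTE = 50


def classify(sample: Sample) -> str:
    """Map a metrics sample to one of the three states."""
    if not sample.responding or sample.errors_per_minute > MAX_ERRORS_PER_MINUTE:
        return "not operable"
    if sample.cpu_percent >= EXHAUSTED_CPU or sample.memory_percent >= EXHAUSTED_MEMORY:
        return "exhausted"
    return "operable"


if __name__ == "__main__":
    print(classify(Sample(35.0, 40.0, 0, True)))
    print(classify(Sample(97.0, 60.0, 2, True)))
    print(classify(Sample(10.0, 10.0, 0, False)))
```

In practice the samples would come from whatever monitoring agent is already in place; the point is only that "working or not" becomes an explicit, visible answer rather than a guess.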

Monitoring is something done to a system; observability is something that the system is. You have to actively make the system observable. Where monitoring is seeing the metrics of the system, observability is raising the right metrics to the surface. This may mean a few things: does your centralised message brokering platform provide the ability to log? Are logs formatted consistently across your services? Which business actions are important enough to warrant making observable?
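As one way to approach consistent log formatting, the sketch below uses Python's standard logging module to emit every log line as JSON with the same keys, so logs from different services can be searched together. The JsonFormatter class, the build_logger helper, the "orders" service name, and the "action" field are hypothetical names chosen for this example, not part of any particular platform.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object with a fixed set of keys."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
            # "action" is a hypothetical field naming the business action.
            "action": getattr(record, "action", None),
        }
        return json.dumps(payload)


def build_logger(service: str) -> logging.Logger:
    """Create a logger for one service that writes JSON lines to stderr."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("orders")
    # The "extra" dict attaches the business action to the log record.
    log.info("order placed", extra={"action": "order_placed"})
```

If every service shares a formatter like this, the question "which business actions are observable?" turns into a simple search over the "action" field.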

Observability is about asking the right questions, and by asking the right questions you will learn what your desired metrics are.

For example:

Why does my CPU usage go up between 2pm and 3pm even though my orders go down?

To answer the question above, we need to replicate the state, as well as the control and data flow, of our application in production at that specific time. If we can replicate this, we can debug issues more easily and then fix them. However, it's not only about fixing issues, it's about fixing potential issues. This is where confidence is really built.
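A question like the CPU one can only be asked if a technical metric and a business metric are recorded side by side. The sketch below, in Python, flags the hours where CPU usage is above its average while orders are below theirs. The diverging_hours function is a hypothetical helper, and the hourly numbers are made up purely for illustration.

```python
from statistics import mean


def diverging_hours(cpu_by_hour: dict, orders_by_hour: dict) -> list:
    """Return hours where CPU is above its mean while orders are below theirs."""
    # Only compare hours for which both metrics were recorded.
    hours = sorted(cpu_by_hour.keys() & orders_by_hour.keys())
    if not hours:
        return []
    cpu_mean = mean(cpu_by_hour[h] for h in hours)
    orders_mean = mean(orders_by_hour[h] for h in hours)
    return [
        h for h in hours
        if cpu_by_hour[h] > cpu_mean and orders_by_hour[h] < orders_mean
    ]


if __name__ == "__main__":
    # Made-up sample data: hour of day -> average CPU % and order count.
    cpu = {12: 40.0, 13: 42.0, 14: 85.0, 15: 45.0}
    orders = {12: 120, 13: 110, 14: 60, 15: 115}
    print(diverging_hours(cpu, orders))  # prints [14]
```

A flagged hour does not explain anything by itself; it tells us where to look, and which state and data flow we would need to replicate.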

Finally, monitoring and observability build confidence because we are able to take the uncontrolled aspects of a system and raise them to the surface. This enables us to debug and replicate issues as they are in production, taking the guesswork out of fixing them.

Thank you for reading! I hope to post part III in the next few weeks.

