A calm place

Or the importance of operational excellence

When I joined my current team last year, there was a lot of excitement going on. First, the schedule was determined by people outside of the team. Almost everything being built was based on ad-hoc, seemingly urgent support requests. The team was busy rushing features and bug fixes out to get through the backlog, which only kept growing. Second, the on-call rotation was shared across the whole team of almost 20 engineers. The weekly rotation hit you roughly twice a year. It meant sleepless nights, waking up multiple times. But it was over soon, and your motivation to fix things dropped to zero – you merely hoped the situation would improve by your next turn.

It was time to challenge the status quo. That meant persistently asking why we woke people up just to snooze an alert for a few hours. Why we had so little ownership. Why the motivation to fix the root cause was so low.

I challenged myself to measure my contribution by my very own, subjectively measured “team calmness index”. It started with giving support. Standing behind the bold decision to alert only during office hours in many cases. Removing the large team-wide responsibilities and making them specific to a sub-group. And also finding time to fix some of the underlying system’s errors. Within weeks, my subjective “team calmness index” improved a lot. Without a change in technology, without a change in skills. It only took the right management support, focus and ownership.

It has become a calm place, as the numbers show.

Number of incidents per week, with a 3-month moving average.