The show man

Or why the worst managers succeed.

There’s this kind of team in each company that everyone knows. Not because it’s a successful team. But because that team is famous for big escalations, production problems and an architecture that evolved badly with no way out of the mess.

And then there’s the manager of this team. Highly successful.

Why? Simple. He is the one who gets recognized by the customers.

He’s constantly visiting. Fighting fire. His weeks are turbulent. Full of de-escalations, workarounds, meetings. Resulting in an action plan and a promise to do better. He leaves for the weekend with a big thank you from the customer. In the end, he was the only one who was visible to the customer that week. And he’s the only one who gets mentioned in the customer reports that the leadership team sees.

AWS Well-Architected Framework applied – Cost Optimization

The AWS Well-Architected Framework is a set of recommendations by AWS, summarized in an 80-page PDF document. Booooring. True. So I’m taking a different approach. This is a hands-on, developer-focused way of thinking about it.

Context

The 80-pager talks about 5 pillars: (1) Operational Excellence, (2) Security, (3) Reliability, (4) Performance and (5) Cost Optimization. Those are very high-level aspects. Following those guidelines, however, results in a cost-efficient, scalable and reliable cloud environment.

I just joined a new team. A team that was not using any deployment automation and that manages many systems built before the cloud became a real thing. So there’s clearly some trickiness to managing this infrastructure, but the initial impact of cleaning up the current state is high.

In my first two weeks I’ve been focusing on the pillars cost optimization (5th pillar) and security (2nd pillar). Today, I’m only going to talk about cost optimization.

(5) Cost Optimization

Let’s start with “why?”. Why should a “normal software engineer” bother? In the end, you might just be an employee of a multi-billion-dollar company. It’s simple. It’s to help increase the core financial metric of the company (be it UFCF, profitability, margins), and therefore your bonus (which likely depends on one of those metrics). The less you and your team spend for the same value, the higher the contribution. While it might be a minor contribution to the overall pot of a multi-billion-dollar company, those targets often get pushed down. In my specific case they were pushed down to our team: Our team’s AWS spending was around 60K USD / month, resulting in over 700,000 USD per year. Think of it this way: How many employees could we add to our team instead of spending that much? (Or how many first-class flights around the world could you take, if you want a more fun comparison?)

In short, I believe it’s everyone’s responsibility as well-paid employees to be cost-conscious.

As specific actions, I’ve started to automate parts of the AWS infrastructure by managing IAM roles, IAM users and a handful of other resources through AWS CloudFormation templates. At the same time, I also started to auto-curate some parts around cost and security. Specifically, there are now daily scripts that terminate EC2 machines 30 days after they have been stopped and delete detached volumes. That 1-2 day activity resulted in cost avoidance of 3000 USD / month (5% of the spend), with measures now in place to avoid this kind of cost for good.
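The daily cleanup described above could look roughly like the following. This is a minimal sketch, not the actual scripts: the function names, the `dry_run` switch and the 30-day constant are made up for illustration, and it assumes the stop time can be read from the `StateTransitionReason` text that EC2 returns for stopped instances (for example "User initiated (2019-03-01 08:15:00 GMT)"). The selection logic is kept free of AWS calls so it can be checked against sample data first.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: stopped machines get a 30-day grace period.
STOPPED_GRACE = timedelta(days=30)

# EC2 only exposes the stop time inside a free-text field such as
# "User initiated (2019-03-01 08:15:00 GMT)", so it has to be parsed.
_STOP_TIME = re.compile(r"\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT\)")


def stopped_since(instance):
    """Return the UTC stop time of an instance dict, or None if unknown."""
    match = _STOP_TIME.search(instance.get("StateTransitionReason", ""))
    if not match:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def instances_to_terminate(instances, now):
    """IDs of instances that have been stopped for at least the grace period."""
    expired = []
    for instance in instances:
        if instance.get("State", {}).get("Name") != "stopped":
            continue
        since = stopped_since(instance)
        # Instances without a parseable stop time are left alone.
        if since is not None and now - since >= STOPPED_GRACE:
            expired.append(instance["InstanceId"])
    return expired


def detached_volumes(volumes):
    """IDs of EBS volumes that are not attached to any instance."""
    return [
        volume["VolumeId"]
        for volume in volumes
        if volume.get("State") == "available" and not volume.get("Attachments")
    ]


def run_cleanup(dry_run=True):
    """Hypothetical daily entry point; only touches AWS when called."""
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2")
    now = datetime.now(timezone.utc)
    instances = [
        instance
        for page in ec2.get_paginator("describe_instances").paginate()
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]
    volumes = [
        volume
        for page in ec2.get_paginator("describe_volumes").paginate()
        for volume in page["Volumes"]
    ]
    doomed_instances = instances_to_terminate(instances, now)
    doomed_volumes = detached_volumes(volumes)
    if not dry_run:
        if doomed_instances:
            ec2.terminate_instances(InstanceIds=doomed_instances)
        for volume_id in doomed_volumes:
            ec2.delete_volume(VolumeId=volume_id)
    return doomed_instances, doomed_volumes
```

Running it with `dry_run=True` for the first few days and reading the returned lists is probably the safer way to start, since both terminating an instance and deleting a volume are irreversible.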

So, now, what can everyone do and how can everyone contribute? Let’s start with specifics that help change your mindset and give an easy introduction to the provided tools:

  • Have a look at the AWS Trusted Advisor’s Cost Optimization section. It’s fairly basic. But as a specific example, it showed me that our team had 10 TB of detached EBS volumes.
  • And simply the AWS Cost Explorer from the AWS Billing Dashboard.
  • Way better is Cloudability, a 3rd party tool that lets you analyze AWS costs and offers optimizations. The easiest start is probably to set up a weekly report from Cloudability. This helps to raise awareness. Nothing else. Just slowly helping with becoming cost-conscious. Then there are the simple and advanced reports, insights and optimizations, which are well-documented on the Cloudability pages.
  • One of my next targets will likely be Rightsizing. For example, our team still has 25,000 IOPS (guaranteed I/O operations on disk) provisioned in a test environment, resulting in 1750 USD / month (in addition to disk space). This might be the right choice for the testing needs, but if not, let’s simply not buy guaranteed I/O.
  • And then there are reserved instances. Cloudability offers very good insight and a risk assessment of which and how many instance hours a team should buy. Also, in larger companies, reserved instances can be bought on the global account, which distributes the risk among all accounts in the organization.
  • Once you’ve done the basics, look at what’s available out there. From auto-scaling to auto-spotting to leveraging more AWS managed services.
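If you want the weekly awareness report without a 3rd party tool, the data behind the AWS Cost Explorer can also be pulled through its API. The following is a rough sketch under a few assumptions: the function names are invented for illustration, it uses the `get_cost_and_usage` call of the boto3 `ce` client grouped by service, and it ignores pagination (`NextPageToken`) for brevity. The summarizing part works on the `ResultsByTime` list of the response, so it can be tried on sample data.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal


def cost_by_service(results_by_time):
    """Sum the daily UnblendedCost per service, most expensive first."""
    totals = defaultdict(Decimal)
    for day in results_by_time:
        for group in day.get("Groups", []):
            service = group["Keys"][0]
            # Cost Explorer returns amounts as strings, e.g. "123.45".
            totals[service] += Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def format_report(ranked, top=5):
    """Render the top services as plain text, e.g. for a weekly mail."""
    return "\n".join(
        f"{amount:>12.2f} USD  {service}" for service, amount in ranked[:top]
    )


def fetch_last_week(today=None):
    """Hypothetical helper: fetch the last 7 days of cost, grouped by service."""
    import boto3  # imported lazily so the helpers above work without it

    today = today or date.today()
    ce = boto3.client("ce")
    response = ce.get_cost_and_usage(
        TimePeriod={
            "Start": (today - timedelta(days=7)).isoformat(),
            "End": today.isoformat(),  # the end date is exclusive
        },
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    return response["ResultsByTime"]
```

Something like `print(format_report(cost_by_service(fetch_last_week())))` in a weekly job would be enough to get the top cost drivers in front of the team.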

I doubt you’ll end up flying first-class around the world. But at least you avoid someone asking you to row across the ocean, or even to pay back part of the unnecessary spend you caused.

Two kinds of batteries

There are two kinds of batteries. Those that come fully charged and can be used once. And those that come half-charged and are meant to be used many times.

I was surprised those very same approaches exist for trust.

My world view assumed that trust starts neutral (or half-charged). You do good things together, and trust builds up. And it goes down when things go bad. Once you have charged the trust battery close to 100%, projects succeed, no matter what. If it’s down to 30% or less, projects start to fail.

Recently I learned from a close co-worker that his world view assumes full trust at the start. That is, 100% charged. Every time the other person screws up, it goes down a bit. Never up.

It’s useful to me to understand that this other world view exists. It’s also useful for those who hold it to understand that most others think differently.