It Wasn’t My Fault!


Well, the global AWS outage happened just four days after I sent a newsletter about COEs and how “nobody gets blamed.”

Great timing, right?

I wish I could’ve been in the weekly global ops meeting to feel the temperature in the room. That’s the one where teams present their recent issues and learnings. I can only imagine how lively that one must’ve been.

Turns out the culprit was a DNS resolution failure for the Amazon DynamoDB endpoint in the us-east-1 region.

And while that sounds region-specific, it actually affected a bunch of global services - like IAM - because they depend on control-plane endpoints in that region.
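If you're curious what that failure mode looks like from the client side, here's a tiny illustrative sketch - a hypothetical health check, not anything AWS actually runs internally - that tries to resolve the regional DynamoDB endpoint and reports when the DNS lookup itself fails:

```python
import socket

# Hypothetical client-side check: can we even resolve the regional
# DynamoDB endpoint? When DNS breaks, the failure shows up at this
# layer, before any API call ever gets a chance to succeed or fail.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

try:
    addresses = socket.getaddrinfo(ENDPOINT, 443, proto=socket.IPPROTO_TCP)
    print(f"{ENDPOINT} resolves to {len(addresses)} address(es)")
except socket.gaierror as exc:
    # A DNS resolution failure lands here - no endpoint, no requests.
    print(f"DNS lookup failed for {ENDPOINT}: {exc}")
```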

I’ve talked before about availability zones and regional redundancy, but it looks like there was no escape from this one. Unless you’re running a multi-cloud app. But that’s too much for me to even think about right now.

Can’t wait to read the COE and see what “actually” happened once it’s published.

For the record - I had nothing to do with it!

I’m still recovering from ACL surgery, and between PT, doctor visits, and my wife’s surgery, I’ve had my own kind of incident response to deal with.

That said, I did come across a great post from one of the folks at Cline about their new CLI. The author built it so they could run multiple agents directly from the terminal - pretty cool if you’re into automation or agent frameworks. You can read it here:

👉 Cline CLI: My Undying Love of Cline Core

Maybe one day I’ll have an agent that can handle my on-call rotation…

Cheers!

Evgeny Urubkov (@codevev)

