Well, the global AWS outage happened just four days after I sent a newsletter about COEs and how “nobody gets blamed.” Great timing, right? I wish I could’ve been in the weekly global ops meeting to see the temperature in the room. That’s the one where teams present their recent issues and learnings. I can only imagine how lively that one must’ve been. Turns out the culprit was a DNS failure in the Amazon DynamoDB endpoint in the us-east-1 region. And while that sounds region-specific, it...
2 months ago • 1 min read
Someone pushes a new feature to prod the same day you go on-call. Hours later, your phone goes off - not a gentle buzz, but a full-blown siren that could wake up the entire neighborhood. You open the alert, and it’s for a feature you didn’t even touch. Maybe it’s unhandled NPEs, maybe something else. Doesn’t matter. You’re the one on-call, so it’s your problem now. When Things Break: In those moments, it’s usually faster to just debug and fix it - even without full context. I’m pretty good at...
2 months ago • 2 min read
About eight years ago, when I was still a QA, Microsoft Azure “lost” our primary database. Without it, we were basically out of business - it was the main source of truth for, well, almost everything. I don’t remember exactly what the database held anymore, but I do remember the chaos that day. And the stress. A lot of it. Today, I saw a tweet about how the Korean government had all its data in a single location, with no backups. It reminded me: we all know this lesson, but we keep relearning...
3 months ago • 3 min read
When I was first learning how to code, I spent tens - maybe hundreds - of hours glued to online courses. Video after video. Tutorial after tutorial. Once I “made it,” though, I found myself more drawn to books. Maybe it was because my employer stopped paying for Pluralsight. Maybe because I suddenly had an O’Reilly subscription. Or maybe because I realized video pacing never matches what I need - either way too slow (wasting hours) or too fast (forcing me to rewind endlessly, or drop the...
3 months ago • 1 min read
While recovering from ACL surgery (yes, that’s why there was no newsletter last week), I started re-reading Great at Work. I’d read it about a year ago, but almost nothing stuck. Maybe that explains why I haven’t exactly been "great at work." Chapter 2, "Redesigning Your Work," hit me at the right moment. Then a friend sent me this article: Altoids by the Fistful. The gist? We used to love building stuff—fun weekend projects, scrappy hacks, things that made us feel alive as developers. Now?...
3 months ago • 2 min read
Nothing too exciting happened in the last couple of weeks — I’m in the middle of a refactor that I need wrapped up before my surgery next week. The work is around the eligibility logic for our ITL (Integration Testing to the Left) product. Originally, the logic lived where ITL runs were created and was triggered for each code review. That worked fine, but the algorithm always picked the pipeline itself, which wasn’t always the best choice. We added customization on the ITL website, which...
4 months ago • 1 min read
My friend and I have tried all kinds of hacks to bring personalized images into Oneiras.com — training LoRAs, doing face replacements, you name it. The results were never quite good enough, and the cost + infrastructure overhead made it feel like a distraction. But last week, Google released the nano-banana model, and it finally does what I’ve wanted for months. Combine that with a three-day weekend, and it’s live: you can now upload multiple images to your profile and generate dream visuals...
4 months ago • 2 min read
The internal product we built last year, and are still improving, has lived under a domain like this: https://stage.frontend.servicename.orgname.aws.dev. Needless to say, it’s long, ugly, and impossible to remember. So during a backlog grooming session, I raised a story to add a more manageable domain and redirect the long one to something like this instead: https://servicename.aws.dev. As with any backlog grooming session, we agreed that we need to do it but couldn’t decide whether it’s a 1 or...
4 months ago • 3 min read
It was 5 PM on a Friday, and our intern had already dropped off his laptop, since it was his last day. Two days earlier, he’d started cleaning up the AWS resources he created in our test account. What none of us realized at the time was that his cleanup would block every single deployment in our pipeline for two full days. The culprit? CloudFormation dependencies between stacks. The Setup: In our setup, whenever an SQS queue is created, we automatically generate CloudWatch alarms for...
4 months ago • 2 min read