Digging Out From Under Amazon Prime Video's Microservices vs. Monolith Debate Fallout
Net API Notes for 2023/05/15, Issue 215
In March, the Amazon Prime Video Technology blog released an article entitled "Scaling up the Prime Video Audio/Video Monitoring Service and Reducing Costs by 90%". The title is a bit click-baity, but if you pay attention to the space, you've seen this kind of self-congratulatory post a fair amount: we had a problem, we fixed a problem, and aren't we clever? In this particular case, the problem was distributed systems overhead; more specifically, they created many 'microservices' running on serverless infrastructure. The piece outlines that the architecture had scaling problems and came with an unsustainable cost.
Which is fine - we all live and learn. The Prime Video Team is yet another team, like those at Segment or Istio, where they tried a thing, decided what they arrived at wasn't right, and changed. The claims of returning to a monolith are a bit overblown; it was more like a team moved from one distributed systems arrangement to another, better tuned one, but kudos to them - the only unchanging software architecture is a dead architecture.
More than a month later, however, something interesting happened.
In this edition of Net API Notes, I will unpack what happened and highlight what's useful to your work. This one is gonna get a little link-happy, so let's get into it.
Mining What's Useful in the Microservices vs. Monoliths Rhetoric
DHH Co-Opts The Narrative for Clicks
David Heinemeier Hansson, or DHH as he is commonly referred to online, published an inflammatory article in May called, "Even Amazon can't make sense of serverless or microservices". He brays that microservices were the "biggest siren song" for needlessly complicating your system (and serverless made it worse). To DHH, microservices are like diversity, equity, and inclusion (DEI) efforts: an "intellectual contagion that just refuses to die". Now that the rest of the world can see microservices for the shambling mess that they are, DHH is happy to pause for applause, having led the effort to "beat back the zombie onslaught".
And then, being the astute reader of rooms that he is, he ends by encouraging readers to "keep your rhetorical shotgun locked and loaded".
The Amazon story came and went with barely a ripple in the industry's collective consciousness. Then DHH dug it up a month later as a pretext for a victory lap, and every technology thinkfluencer and industry pundit-kin jumped to deliver a hot-take (including yours truly). DHH's (and, by extension, 37signals') playbook, whether it is extolling Ruby on Rails, promoting one of their books, or reinventing container orchestration, goes something like this:
- Find something that big companies - with all the competing objectives, resourcing, and edge cases - struggle with
- Use provocative, emotionally charged language to point out how absurd the household name is; if anyone objects, claim you are "speaking truth to power" or "fighting for the little guy"
- Introduce an alternative that appears simple and approachable while ignoring 80%-90% of the issues related to size and complexity that their particular context doesn't have to contend with
- Profit!
Years ago, I read REWORK, along with much of their other authored work. The book was DHH's "perfect playbook for anyone who's ever dreamed of doing it [entrepreneurship] on their own". It still has a special spot on my bookshelf, next to Merrill Chapman's In Search of Stupidity. Working several enterprise jobs has given me a greater empathy for companies and their employees caught in the giant hairball. But even before that, I could recognize REWORK as the one-note song it was: if you are like us, doing the same thing, in the same area, with the same scruples, you too can approximate the same success!
Over the decades, social media has turned anyone seeking an online audience into a controversy-seeking shark. Inflaming passions - even over something as abstract and nuanced as service architecture - churns engagement. By pouring blood in the water, DHH not only started a commenting frenzy among API folks; he threw further shade on Amazon - a company he's repatriating his data from, in a project that has (SURPRISE!) produced a feature-limited, rudimentary container management tool that you too can use!
Learning Lessons Amidst the Hustle
Given the self-serving nature of DHH's piece, there's not much helpful advice there that can be applied to other situations. However, there is useful information if we're willing to dig and extrapolate.
Adrian Cockcroft, of Netflix architecture fame, said what many of us were thinking with his reaction entitled, "So Many Bad Takes - What Is There to Learn from the Prime Video Microservices to Monolith Story". In addition to a wonderfully concise re-articulation of what happened, Adrian reshares a valuable approach to figuring out architectural challenges:
"The Prime Video team had followed a path I call Serverless First, where the first try at building something is put together with Step Functions and Lambda calls. They state in the blog that this was quick to build, which is the point. When you are exploring how to construct something, building a prototype in a few days or weeks is a good approach. Then they tried to scale it to cope with high traffic and discovered that some of the state transitions in their step functions were too frequent, and they had some overly chatty calls between AWS lambda functions and S3. They were able to re-use most of their working code by combining it into a single long running microservice that is horizontally scaled using ECS, and which is invoked via a lambda function."
The sentiment that architecture should change was shared by Amazon's Werner Vogels in his piece, "Monoliths are not Dinosaurs":
- With every order of magnitude of growth, you should revisit your architecture and determine whether it can still support the next order level of growth.
- There is no one-size-fits-all.
In ye olde before-times of January 2020, Google's Kelsey Hightower published a provocative piece entitled, "Monoliths are the Future". After being referred to in DHH's piece, Kelsey clarified his stance on Twitter.
But, as I've seen over and over in the time I've written this newsletter, "write modular code" is easier said than done. The challenge is where to draw the boundaries: what belongs in this service versus that service. One of the best heuristics for deciding on a bounded context is one I first heard from architect Nick Tune:
"A bounded context is a bet on the things that will change together."
If you have to coordinate changes across multiple teams that own different services, there should be some reflection on whether those things should be separated. Likewise, if an aspect of a service remains stable and unchanging while another aspect requires frequent updates, there may be some benefit to separating those items. Above all, we should remember that these are bets - decisions made with imperfect information and worthy of change when we learn more.
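One rough way to check how a bet like that is playing out is to count how often pieces of the system actually change together. The Python sketch below is illustrative only: the module names and the change history are made up, and the scoring (changes touching both modules divided by changes touching either) is just one plausible measure, not anything Nick Tune or the Prime Video team prescribes.

```python
from collections import Counter
from itertools import combinations


def co_change_ratios(changes):
    """Score each pair of modules by how often they change together.

    `changes` is an iterable of sets, one per change (commit, release,
    ticket), naming the modules that change touched. The score for a pair
    is (changes touching both) / (changes touching either).
    """
    touched = Counter()   # changes touching each module
    together = Counter()  # changes touching each pair of modules
    for changed in changes:
        modules = sorted(set(changed))
        touched.update(modules)
        together.update(combinations(modules, 2))
    return {
        (a, b): count / (touched[a] + touched[b] - count)
        for (a, b), count in together.items()
    }


# Hypothetical history: each entry lists the modules one change touched.
history = [
    {"orders", "billing"},
    {"orders", "billing"},
    {"orders", "billing", "catalog"},
    {"catalog"},
    {"catalog"},
]

ratios = co_change_ratios(history)
for pair, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(pair, round(ratio, 2))
```

In this made-up history, billing and orders change together every time, which hints that they may belong in the same bounded context, while catalog mostly changes on its own. A real analysis would pull the change sets from version control and treat the numbers as a conversation starter, not a verdict.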
Authors of the Team Topologies book, Matthew Skelton and Manuel Pais, have spoken out about the "debate" in the past. To them, developers are missing the point when trying to decide between Monoliths or Microservices. Instead, they urge software creators to focus on team cognitive load.
As I've talked about before, what Amazon engineering does will be copied by others. Ultimately, however, we need to make sure our architectures aren't trying to solve problems they aren't capable of solving. Software architect Uwe Friedrichsen has a great two-part series about breaking up monoliths (part 1, part 2). He had this to say:
"I still see companies way too often trying to "solve" their org and process based speed and quality issues by introducing new technology (here: microservices). but the timeless truth still holds true: technology does not fix people-related problems."
If you still need more, I'd co-sign with Adrian and recommend Sam Newman's Building Microservices: Designing Fine-Grained Systems. First published in 2015, the book received a fantastic second-edition update in 2021. It does an excellent job of defining when microservices are useful (and just as importantly, when they are not).
Milestones
- GitHub now blocks all token and API key leaks for all repos. "Since its beta release, software developers who enabled it successfully averted around 17,000 accidental exposures of sensitive information". Dang.
- Speaking of GitHub, the lawsuit over Copilot is still happening after a judge refused to toss out key charges. I previously covered using Copilot to create OpenAPI descriptions.
- Travis Spencer, and the fine folks over at Curity, received investment from GRO, a European investor. The infusion will fund continued expansion and growth in the API and digital identity space.
- There's a new set of OWASP API Top 10 security enhancements. Security Boulevard has a recap of the changes in the 2023rc.
- In issue 209, I talked about how Twitter's new API terms would kill their developer ecosystem. Recently came word that WordPress was dropping social sharing due to the API price hike. I'd think this was a big deal, but apparently, the powers that be are more concerned with assigning the NPR account "to another company" out of spite.
Wrapping Up
As always, thank you to the Patrons and Substack subscribers. Because of their support, this newsletter is free of paywalls, advertising, or information selling. A few people covering the caffeine ensure the rest benefit - Thank you!
That's all for now. Till next time,
Matthew (@matthew and matthewreinbold.com)