Why ActivityPub Is An Exciting, Emerging Possibility for Decentralized Architectures
Net API Notes for 2022/12/13 - Issue 207
A month ago, in another newsletter, I wrote about the chaos at Twitter. I wasn't the only one so upset by the state of things that I left for an alternative. Millions have joined Mastodon, a Twitter-like social media experience built on ActivityPub. Other platforms, like Tumblr and Flickr, are also looking to add ActivityPub support. There's even renewed interest in an ActivityPub-enabled distributed ecosystem of sites and services - which proponents call THE FEDIVERSE.
What is ActivityPub, and what lessons should net API developers take from its sudden ascendance? I'll cover that and more in this edition of the Net API Notes.
WHAT IS ACTIVITYPUB?
ActivityPub is a decentralized social networking protocol built on existing, tried-and-true HTTP behavior. It was developed by the World Wide Web Consortium (W3C) and is a standardized way for online communities to share content.
The decentralized bit is key. Rather than relying on a single, centralized platform, people can run their own servers, and ActivityPub provides the messaging between them. This arrangement is called federation, a broad term for a group made up of smaller sub-groups, each of which retains a measure of autonomy within the larger whole.
Email, for example, is a federated system. Different servers probably handle my corporate email and your workplace's email. However, because those servers adhere to a standard protocol, communication is possible between them.
THE HISTORY OF ACTIVITYPUB
ActivityPub is not the first federated protocol. Before ActivityPub, there were several other protocols and technologies that attempted decentralized social networking (Laconica, OpenMicroBlogging, OStatus by StatusNet, Friendica, GNU Social, Identi.ca, and Pump.io's ActivityPump [there's a whole history of the web in the naming conventions on display in that canonical list] - for more depth, see this Mastodon thread). However, those efforts did not gain widespread adoption due to technical incompatibilities and a lack of cultural "critical mass".
Several parties developed ActivityPub in conjunction with the World Wide Web Consortium (W3C) and they released it in 2018. Since then, ActivityPub has been adopted by numerous platforms. Mastodon is the most often cited. However, other services like Pixelfed (image sharing), Bookwyrm (book cataloging, similar to Goodreads), and PeerTube (video hosting) have emerged. Despite catering to different needs, each of these services can exchange messages not just with different users, but different implementations (as this tutorial shows, a Mastodon user can comment on a Pixelfed image from their Mastodon account).
HOW DOES ACTIVITYPUB WORK?
Typically, I've found W3C documentation dry, even a bit tedious. However, the W3C overview by Christine Lemmer-Webber, Jessica Tallon, Erin Shepherd, Amy Guy, and Evan Prodromou is fantastic. Anyone interested in ActivityPub's interworking should check that out. For everyone else, I'll summarize below.
The ActivityPub protocol is based on the ActivityStreams 2.0 data format, which specifies a standardized way of representing actions and activities on a social platform. Suppose I create a new post (or, colloquially, a 'toot' on Mastodon). Each ActivityPub actor, like me, has both an inbox (for receiving things) and an outbox (for sending things). My inbox and my outbox each have their own URL that I (or a client application) can make HTTP requests against. My new post is represented as a JSON-LD object and placed into my outbox.
The actions performed on these boxes will seem very 'webby' to folks here that work with net APIs:
- You can POST to someone's inbox to send them a message (server-to-server / federation only... this is federation!)
- You can GET from your inbox to read your latest messages (client-to-server; this is like reading your social network stream)
- You can POST to your outbox to send messages to the world (client-to-server)
- You can GET from someone's outbox to see what messages they've posted (or at least the ones you're authorized to see). (client-to-server and/or server-to-server)
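As a rough way to keep those four operations straight, here is a minimal Python sketch that maps an HTTP method and a box to the interaction described in the list above. The `INTERACTIONS` table and the `describe` function are names made up for this illustration; they are not part of any ActivityPub library.

```python
# Hypothetical lookup table summarizing the four inbox/outbox operations.
# Keys are (HTTP method, box); values are (channel, what the call means).
INTERACTIONS = {
    ("POST", "inbox"): ("server-to-server", "deliver a message to an actor"),
    ("GET", "inbox"): ("client-to-server", "read your latest messages"),
    ("POST", "outbox"): ("client-to-server", "send a message to the world"),
    ("GET", "outbox"): (
        "client-to-server and/or server-to-server",
        "see what an actor has posted",
    ),
}


def describe(method: str, box: str) -> str:
    """Return a one-line description of an inbox/outbox operation."""
    key = (method.upper(), box.lower())
    if key not in INTERACTIONS:
        raise ValueError(f"not one of the four listed operations: {method} {box}")
    channel, meaning = INTERACTIONS[key]
    return f"{key[0]} {key[1]}: {meaning} ({channel})"


print(describe("post", "inbox"))
```

Anything outside those four combinations (a DELETE on an inbox, say) raises an error here, which mirrors how small the surface area of the protocol is.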
What does one of these messages look like? Well, it is JSON-LD, which means it is JSON, which means it is grokkable text:
{
"@context": "https://www.w3.org/ns/activitystreams",
"type": "Note",
"to": ["https://some.example/an_example_actor/"],
"attributedTo": "https://mastodon.social/@matthewreinbold",
"content": "Hey, ActivityPub is cool."
}
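When a client posts a bare object like this Note to an outbox, the ActivityPub spec has the server wrap it in a Create activity and copy the addressing fields onto that activity. Here is a minimal Python sketch of that wrapping step. The `wrap_in_create` helper is hypothetical, and taking the actor from `attributedTo` is a simplification: a real server would use the authenticated owner of the outbox and would also assign `id` values, which this sketch leaves out.

```python
import json


def wrap_in_create(note: dict) -> dict:
    """Wrap a bare object in a Create activity (illustrative only)."""
    create = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        # Assumption: the author named on the note is the actor posting it.
        "actor": note["attributedTo"],
        # The context lives on the outer activity, so drop it from the object.
        "object": {k: v for k, v in note.items() if k != "@context"},
    }
    # Addressing fields present on the object are copied to the activity.
    for field in ("to", "bto", "cc", "bcc", "audience"):
        if field in note:
            create[field] = note[field]
    return create


note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "to": ["https://some.example/an_example_actor/"],
    "attributedTo": "https://mastodon.social/@matthewreinbold",
    "content": "Hey, ActivityPub is cool.",
}

activity = wrap_in_create(note)
print(json.dumps(activity, indent=2))
```

The Create activity, not the bare Note, is what gets delivered to other servers' inboxes.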
The server of the instance I'm on sees a new message in my outbox and broadcasts the activity to the other servers that are subscribed to my updates. Sidekiq asynchronous jobs perform these updates and, with the recent influx of folks, can sometimes be a performance bottleneck (more below, under 'scaling'). Once those messages are delivered, the receiving servers place my message in their subscribers' inboxes.
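To make the broadcast step concrete, here is a small Python sketch that groups a follower list by the server each follower lives on. It assumes one delivery per remote server rather than one per follower, which is how the job counts in the scaling section of this issue are framed. The follower URLs and the `group_by_server` helper are invented for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_server(follower_urls):
    """Group follower actor URLs by the host that serves them."""
    groups = defaultdict(list)
    for url in follower_urls:
        groups[urlparse(url).netloc].append(url)
    return dict(groups)


# Made-up followers spread across two made-up servers.
followers = [
    "https://mastodon.example/users/alice",
    "https://mastodon.example/users/bob",
    "https://pixelfed.example/users/carol",
]

deliveries = group_by_server(followers)
print(len(followers), "followers on", len(deliveries), "servers")
```

Under that assumption, the cost of a post tracks the number of distinct servers your followers are spread across, not the raw follower count.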
Staying with Mastodon for a moment, ActivityPub is only one of several protocols in use. Verification of site ownership is made possible by the nearly twenty-year-old XHTML Friends Network (XFN) protocol. Also, everyone's outbox is available as good ol' RSS: find a user's profile, like:
https://mastodon.social/@matthewreinbold
And add '.rss' to it. Voilà! There's the command-line web us Gen-Xers get all misty-eyed over.
https://mastodon.social/@matthewreinbold.rss
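If you want to do that in a script, the transformation is a one-liner. This Python sketch only builds the feed URL from a profile URL, following the convention shown above; `rss_feed_url` is a hypothetical helper, and it does not fetch or parse the feed.

```python
def rss_feed_url(profile_url: str) -> str:
    """Turn a Mastodon profile URL into its RSS feed URL by appending '.rss'."""
    # Strip any trailing slash so we don't end up with '/.rss'.
    return profile_url.rstrip("/") + ".rss"


print(rss_feed_url("https://mastodon.social/@matthewreinbold"))
```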
From one Mastodon server (or "instance"), a person can follow and be followed by anyone else on any other Mastodon server anywhere else in the world. Returning to our email analogy, this is just as you can send an email from one server to anyone else on any server in the world. ActivityPub conveys many types of content, including text, pictures, and videos, but also concepts such as "likes," replies, and polls.
BUT WAIT, THERE'S AN API!
While, in theory, it would be possible to roll your own JSON-LD messages to POST directly to your outbox, services like Mastodon have a "conventional" REST-ish API.
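For a sense of what that looks like, here is a minimal Python sketch that builds (but does not send) a request to Mastodon's status-posting endpoint, `POST /api/v1/statuses`, with a bearer token. The instance name and token are placeholders, and `build_status_request` is a name invented for this example; check your own instance's documentation before relying on it.

```python
import urllib.parse
import urllib.request


def build_status_request(instance: str, token: str, text: str) -> urllib.request.Request:
    """Build a form-encoded POST that would publish `text` as a new status."""
    body = urllib.parse.urlencode({"status": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{instance}/api/v1/statuses",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder instance and token; substitute your own.
req = build_status_request(
    "mastodon.example", "YOUR_ACCESS_TOKEN", "Hey, ActivityPub is cool."
)
print(req.get_method(), req.full_url)

# To actually send it (needs a real instance and token):
# urllib.request.urlopen(req, timeout=10)
```

Note that this is the instance's own API, not ActivityPub itself; the server turns the status into ActivityPub activities behind the scenes.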
I've been using the Mastodon API since 2018 for my Quote of the Day bot (#QOTD). For the most part, things ran without a hitch. However, when the great migration started happening, I saw 500 API timeout errors in my bot's log files. This brings us to:
SCALING ACTIVITYPUB (AND, SUBSEQUENTLY, THE FEDIVERSE)
One thing noticeably absent from Mastodon (or, really, the larger Fediverse) is celebrities with their GINORMOUS follower counts. Which (for now) is probably a good thing. In November, Aral Balkan wrote about how 'every toot is a potential denial of service attack'.
From Aral's post (remember that Sidekiq is a background job processor for Ruby, and it is what Mastodon uses to run its asynchronous work):
For example, let’s look at your birthday post … besides requiring thousands of Sidekiq jobs to spread your post through all their servers (you have 23K followers, let’s assume 3K different servers), as soon as you create the post 3K Sidekiq jobs are created. At your current plan you have 12 Sidekiq threads, so to process 3K jobs it will take a while because it can only deal with 12 at a time.
Then, for each reply you receive to that post, 3K jobs are created, so your followers can see that reply without leaving their server or looking at your profile. Then you reply to the reply you got, another 3K jobs are created and so on.
If you replied to the 100 replies you got on that post in 10 minutes (and assuming my 3K servers math is right). You created 300K jobs in Sidekiq. That’s why you get those queues.
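The arithmetic in that quote is easy to reproduce. This Python sketch uses the quoted figures (3K servers, 12 Sidekiq threads, 100 replies) and assumes, as the quote does, one job per server per post. The function names are invented here, and the "rounds" figure is only a count of 12-job batches, not a time estimate.

```python
import math


def delivery_jobs(posts: int, servers: int) -> int:
    """One delivery job per post per remote server."""
    return posts * servers


def processing_rounds(jobs: int, threads: int) -> int:
    """Number of batches needed when only `threads` jobs run at a time."""
    return math.ceil(jobs / threads)


SERVERS = 3_000  # "let's assume 3K different servers"
THREADS = 12     # "you have 12 Sidekiq threads"

one_post = delivery_jobs(1, SERVERS)
hundred_replies = delivery_jobs(100, SERVERS)

print(one_post, "jobs,", processing_rounds(one_post, THREADS), "rounds")
print(hundred_replies, "jobs,", processing_rounds(hundred_replies, THREADS), "rounds")
```

A single post is 3,000 jobs in 250 rounds; the 100 replies come to 300,000 jobs in 25,000 rounds, which matches the 300K figure in the quote.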
The solution, in part, is to have popular folks run their own instance. However, as folks are learning, there are ops considerations (Postgres tuning, DB_POOL counts, object storage, etc.).
Then, there are good, old-fashioned DDoS-style attacks (in this case, one instance calling GET to inboxes thousands of times per second, causing the pull queues to skyrocket). Thankfully, the solution, in this case, was to block (or "un-federate") the particular instance. But admins having to discover and circulate this kind of coordinated activity is a problem.
TO RECAP: BUT REALLY, WHAT IS ACTIVITYPUB?
Message queues are enjoying a moment in the enterprise architecture spotlight, and for good reasons. They:
- Provide a decoupling mechanism for distributed systems
- Increase reliability and fault tolerance with their queues
- Allow for more granular scaling
- Enable "fan-out" patterns (one producer and many consumers)
ActivityPub is a web-based protocol for sharing information across servers. In that way, it serves much the same purpose as WebSub (formerly known as PubSubHubbub), albeit with a narrower expectation of JSON-LD objects defined in the ActivityStreams model. External vocabularies can be used to express additional detail not covered by the Activity Vocabulary. However, custom extensions to the object model do risk breaking interoperability, which could be an issue going forward.
Those of us of a certain age (~cough~) might remember the heady days of the "open web". The period of the late aughts was an exciting time of interoperable protocols, experimentation, and hyperbolic potential.
Nostalgia is a powerful drug, and, of course, we papered over a lot of the problems also present in that era - security, privacy, and accessibility, to name a few. However, one billionaire's incessant hot air on the fowl site seems to have rekindled some of those "open web" sparks. And if Tumblr and Flickr implement ActivityPub (with their millions of active accounts), there's not just smoke - but FIRE.
Not only will we enter a very interesting era of social networking, we'll also see a rise of accompanying tools and frameworks for managing ActivityPub. Yes, ActivityPub is a protocol. But it is poised to be the foundation for distributed, social computing for the next decade. And I can't wait to see what folks build on top of it.
MILESTONES
- Fred Brooks, the author of 'The Mythical Man-Month', has passed away. The book popularized several concepts, not the least of which was Conway's Law.
- Monzo introduces Argo to help with microservice rollbacks at scale.
- GitHub announced calendar-based versioning for its REST API.
- After 25 years, Amazon's Werner Vogels published the 1998 distributed computing manifesto in its entirety. This document, authored in Amazon's early days, transformed its architecture and paved the way for AWS's success.
- 5.4 million Twitter users' data has been leaked online. The hackers exploited an API vulnerability this past January, but now the entire dataset has been leaked.
- Stoplight has a great story on how the Italian government uses Spectral to provide API consistency across agencies.
- Last.fm recently celebrated its 20th anniversary. The longevity is to be praised. Can you name another service that supports the same API after two decades?
WRAPPING UP
I mentioned Mastodon several times in this piece. Several fine folks managing instances are now receiving their first invoices. If you are on an instance, please consider donating to your server host; the bandwidth and compute costs that were previously borne by VC and advertising dollars are now coming from volunteers. I'm a $10-a-month sponsor to Mastodon. If you are a user, please also consider donating.
Also, consider hosting your own instance! There are tutorials on getting a containerized image up and running (here's one for Linode and another for Digital Ocean). Unfortunately, I'm probably past the point where I should be running my own ops; with luck, many of the fully managed hosting providers, like masto.host, will be back to accepting new signups soon. Until they open back up, people like me should continue to financially support the instances we're on.
That's all for now. Till next time,
Matthew (@matthewreinbold and matthewreinbold.com)
Update 2022-12-14
Two contrary points of view that I should have included, had I seen them at the time of publishing:
- Ops is hard. This thread describes just how hard. Expecting enthusiastic community builders to lean in to maintain technical infrastructure for the long haul, in Mastodon’s current configuration, may be unrealistic.
- Open source people looking to pitch in are discovering some stumbling blocks: mainly, where conversations happen is opaque. Some good lessons here for other projects looking to grow their communities.
Update 2023-04-23
New piece from the Verge: “Tumblr is working with ActivityPub, as are Flipboard, Medium, Mozilla, and even Meta. There’s now an official WordPress plug-in for ActivityPub, which will enable the protocol for something like half the internet all at once.”