By Matthew Reinbold in Newsletter — Apr 21, 2023

Cutting the Gordian Knot of Consumer API Testing

Net API Notes for 2023/04/21 - Issue 213 - Consumer Test Environments

In the beginning, the software development team pushed an API to production and called it good. On the second day, consumers attempted to integrate with the API and said, "Hang on, wait a minute!"

At first blush, testing an API seems straightforward. However, if API producers aren't careful, their good-faith efforts to support consumer testing can devolve into a mess of environmental dependencies. In this edition of my Net API Notes, I will outline a common corner that API teams paint themselves into and suggest an alternative. To the note!

Alexander the Great cuts the Gordian Knot by Jean-Simon Berthélemy (1743–1811)

Avoid API Lifecycle Entanglement With Your API Environments

Standard Disclaimer

My standard disclaimer applies; I'm about to toss out a whole host of labels and definitions. Your names or definitions may differ. Or you may slice the activities differently than what I propose here. If so, that's great! Stick with the model that makes sense, is used in your context, and helps you get your work done.

My intention with the following list is not to cast shade on other fully-formed models but to plant a seed of inquiry with those looking to learn more. OK! With that out of the way, let's draw and define!

APIs Advance Through Lifecycle-Defined Environments

When we think about or graph the relationship between API producer, consumer, and the gateway between them, we almost exclusively depict the production environment.

The consumer's code makes a request to the gateway.
Assuming there is no problem with things like authentication and rate limit checks, the gateway directs the request to the API producer's code.
The API producer's code processes the request and returns a response to the gateway.
The gateway returns the response to the consumer's code.
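To make those four steps concrete, here is a minimal Python sketch of the production flow. Everything in it is an assumption for illustration: the Request and Response shapes, the Gateway class, the fixed per-key quota, and the producer_handler stand-in are hypothetical names, not any real gateway's API.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    api_key: str
    path: str


@dataclass
class Response:
    status: int
    body: str


def producer_handler(request: Request) -> Response:
    """Hypothetical stand-in for the API producer's code."""
    return Response(200, f"handled {request.path}")


@dataclass
class Gateway:
    valid_keys: set
    rate_limit: int  # assumed fixed quota of calls per key
    calls: dict = field(default_factory=dict)

    def handle(self, request: Request) -> Response:
        # The consumer's request arrives at the gateway.
        # Authentication check: unknown keys never reach the producer.
        if request.api_key not in self.valid_keys:
            return Response(401, "unauthorized")
        # Rate limit check: stop once the key has used its quota.
        used = self.calls.get(request.api_key, 0)
        if used >= self.rate_limit:
            return Response(429, "rate limit exceeded")
        self.calls[request.api_key] = used + 1
        # The gateway directs the request to the producer's code,
        # which processes it and hands a response back.
        response = producer_handler(request)
        # The gateway returns that response to the consumer.
        return response


gateway = Gateway(valid_keys={"consumer-key"}, rate_limit=1)
first = gateway.handle(Request("consumer-key", "/accounts"))
second = gateway.handle(Request("consumer-key", "/accounts"))
```

With a quota of one, the first call reaches the producer and the second is turned away at the gateway.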

This representation leaves out the development environments that come before this production ideal. Developers pushing directly to production from their local machine is rare (or at least should be in all but the most trivial situations). Modern software development defines a life cycle of activities (an SDLC, which stands for Software Development Life Cycle). Executing these activities necessitates dedicated environments to create a predictable, repeatable process with consistent results.

Those environments may include, but are not limited to:

The Local environment (not pictured) is where developers write, debug, and unit test their code at the most basic level. The local environment is usually set up on individual developers' machines configured with the necessary tools, libraries, and frameworks.
The Development environment is where multiple developers' code is merged and checked to confirm that a developing feature in the current iteration/sprint is working as expected. This environment helps ensure the required dependencies are referenced, installed, and resolved correctly, the correct folder permissions are in place, and integration tests are automatically run with every code push.
The QA (or Testing) environment is dedicated to testing the application thoroughly. It often includes a variety of testing types, such as functional, load, and stress testing. This environment should also perform many automated checks – everything from open-source scanning to vulnerability detection. The testing environment should resemble the production environment as closely as possible to identify any potential issues accurately.
The UAT (User Acceptance Testing) environment is specifically dedicated to allowing end-users or software intermediaries to test the software and ensure it meets their requirements before advancing to production. The UAT environment resembles the production environment, replicating its configurations and infrastructure; however, referenced components – like database dependencies – may not include caching or other scaling configurations.
The Staging (or Pre-Production) environment is used for final testing before deployment. It is a replica of the production environment, including permissions, configurations, and service dependencies, going so far as to incorporate regular data refreshes. This environment exists to minimize the risk of production deployment-related issues.
The Production environment is the live environment where the application is deployed and made available to end users. It emphasizes stability and security with strict access control, monitoring, and disaster recovery systems in place. The production environment may also have comprehensive operational documentation requirements.
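One way to picture that list is as an ordered pipeline that a build advances through one stage at a time. The sketch below is only an illustration of that ordering; the Environment enum and the next_environment helper are hypothetical names, and your own SDLC may have more, fewer, or differently named stages.

```python
from enum import IntEnum
from typing import Optional


class Environment(IntEnum):
    """Environments in the order a build advances through them."""
    LOCAL = 0
    DEVELOPMENT = 1
    QA = 2
    UAT = 3
    STAGING = 4
    PRODUCTION = 5


def next_environment(current: Environment) -> Optional[Environment]:
    """Return the stage a build is promoted to next, or None after production."""
    if current is Environment.PRODUCTION:
        return None
    return Environment(current + 1)


promotion_path = [env.name for env in Environment]
```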

API Consumers Want To Test, Too

But the consumer looking to integrate with the API must also test. Creating new accounts, updating balances, or deleting payment transactions are examples of unsafe operations that a client wouldn't want to try on a 'live' system for the first time. Consumers need a testing environment, too.

And here's where opportunistic thinking can get the producing team in trouble. The API producers may look at the array of environments already existing for their own lifecycle and assume that fulfilling the consumer need is as easy as providing access to the appropriate 'lower' environment. The consumer wants to test and, what luck, one of the existing environments is already named "test"!

Except-

Shared Development Environments are a Form of Coupling

Any immediate practicality possibly gained from repurposing lower environments soon begins to run into trouble. Doing so inadvertently couples API producer and API consumer lifecycles through the shared environments. While an org might have chosen APIs as the architectural solution because they wanted increased developer autonomy and lower communication overhead, using environments in this way has the opposite effect.

Let's run through some common scenarios.

Development Environment Uptime Is Not Guaranteed

A big reason that an SDLC spreads activities across various environments is so that we can provide appropriately sized (and thus cost-effective) resources for a given exercise. Even just ensuring an environment is provisioned correctly, patched, hydrated, and secured takes effort. Because of this, the development environment is often the least stable; there when the team needs it but an afterthought at any other moment.

That is, until the API consumer is trying to build their experience. When the development environment is down, the consumer is "blocked". To properly support the consumer, the producers now face a difficult trade-off: either coordinate with the consumer on when they should expect the environment to be available or guarantee uptime for their lower environment. Neither one of these options is ideal. But the headaches are only getting started.

Changing Code In Lower Environments Now Breaks Others

To build, the consumer expects stability. However, the producers' development environment exists to facilitate new… um… development. So, if the producers have gotta build, and consumers require stability, the unfortunate answer is creating another development environment (Dev 1.1). The consumer continues the development phase of their SDLC on Dev 1.0 while the producer wrings new features into shape on Dev 1.1.

Consumers Will Want to Use Their Data, Not Others'

Till now, we've only talked about a single client. In all likelihood, an API would have multiple consumers. And while they may be content sharing the producers' QA environment, what they may not appreciate is trying to debug whether the behavior they see is due to a bug or due to others interacting with their data unbeknownst to them: "I just created a new transaction this morning, but now it's gone. Did it not save? Or did someone else delete it as part of their testing?"
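The "did it not save, or did someone delete it?" confusion is easy to reproduce. Below is a minimal sketch, assuming a hypothetical in-memory SharedQaStore standing in for a QA environment's data and two made-up consumers; it is not how any particular system stores transactions.

```python
class SharedQaStore:
    """Hypothetical stand-in for the data behind a shared QA environment."""

    def __init__(self):
        self.transactions = {}

    def create(self, tx_id: str, amount: float) -> None:
        self.transactions[tx_id] = amount

    def delete(self, tx_id: str) -> None:
        self.transactions.pop(tx_id, None)

    def get(self, tx_id: str):
        return self.transactions.get(tx_id)


store = SharedQaStore()

# Consumer A creates a transaction in the morning.
store.create("tx-1001", 25.00)

# Consumer B, testing their own delete flow, clears out what they find.
for tx_id in list(store.transactions):
    store.delete(tx_id)

# Consumer A comes back and the transaction is gone.
# From their side, a failed save and a stranger's delete look identical.
consumer_a_sees = store.get("tx-1001")
```

Nothing in what consumer A gets back distinguishes a bug from another client's test run, which is exactly the debugging problem described above.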

Already accustomed to solving environmental coupling problems with more environments, an API producer may assume that the most logical approach is to give each client their own instance of QA. But wait! There's more!

User Acceptance Testing (UAT) Bottlenecks All API Clients Together

Unlike much of what can be done with automated testing, UAT represents a hard gate. At this point in a lifecycle somebody is entrusted with either deciding "yes, this meets our needs" or "no, there is a problem". A release can be delayed if that decision maker isn't available.

In some organizations, it is understood that all consumers need to attest to a release before it can proceed to production. If just one of those clients disappears for multiple weeks around the New Year, for example, all work stops. Not only has the producer's lifecycle become enmeshed with a consumer's lifecycle; now all clients have become conjoined, only able to move as fast as the slowest among them. Yikes!
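Reduced to code, an "everyone must attest" gate is a single all() over the consumers. The sketch below assumes a hypothetical attestations mapping of consumer name to sign-off status; the names and helper functions are illustrative only.

```python
def release_can_proceed(attestations: dict) -> bool:
    """The release moves to production only if every consumer has signed off."""
    return all(attestations.values())


def blocking_consumers(attestations: dict) -> list:
    """Names of the consumers the release is still waiting on."""
    return sorted(name for name, signed_off in attestations.items() if not signed_off)


# Hypothetical consumers; one is out of office over the New Year.
attestations = {
    "mobile-app": True,
    "partner-portal": True,
    "billing-batch": False,
}

can_ship = release_can_proceed(attestations)
waiting_on = blocking_consumers(attestations)
```

One False anywhere in the mapping holds back the release for every other client, no matter how quickly they signed off.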

Unraveling the Giant Environmental Hairball Is Possible

A better way is possible. But first, we need to establish a few truths to break the tight coupling caused by how we've used environments in the above examples.

Use an Explicit API Design-First SDLC Step to Avoid API UAT Bottleneck

Often teams will adopt whatever SDLC has been established within the organization. And that SDLC may encompass all types of software creation beyond just APIs. There may be times, like in creating a dashboard or mobile app, when UAT requires significant surveying and UX lab observation.

There are times when getting rich feedback from end users requires the soup-to-nuts experience; there are few other ways to test whether the business requirements are fulfilled. But with APIs, the API Design-First approach advocates designing the API's contract before writing any code. Just as using an eraser on the drafting table is easier than using a sledgehammer on the construction site, it is easier to iterate to acquire user acceptance on a contract than on running code in a dedicated environment.

API development should have an additional design acceptance phase in its SDLC before code creation. And then API producers should build to that contract. If they can do that, the producers can avoid bottlenecking consumers over a dedicated UAT environment.
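"Build to that contract" can be checked mechanically once the contract is agreed. The following is a rough sketch, assuming a toy contract expressed as a dictionary of field names and Python types. In practice the accepted contract would more likely be an API description document (OpenAPI, for example), and ACCOUNT_CONTRACT and contract_violations are hypothetical names.

```python
# Hypothetical contract agreed with consumers during design acceptance.
ACCOUNT_CONTRACT = {"id": str, "balance": float, "currency": str}


def contract_violations(payload: dict, contract: dict) -> list:
    """List every way a response payload departs from the agreed contract."""
    problems = []
    for name, expected_type in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    return problems


# A response that honors the contract, and one that drifted from it.
good = contract_violations(
    {"id": "acct-1", "balance": 10.0, "currency": "USD"}, ACCOUNT_CONTRACT
)
drifted = contract_violations({"id": "acct-1", "balance": "10"}, ACCOUNT_CONTRACT)
```

A check like this can run in the producer's own pipeline, so consumers accept the design once rather than re-testing every build in a shared UAT environment.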

Remove the Consumer from the Producers' SDLC

API Design-First is excellent, but a greater degree of pain is caused by linking the Consumer and Producer SDLCs. Therefore, it is essential to break this coupling. Consumers can still proceed along with their SDLC process. The difference is that their lifecycle references the finished API product, not in-progress work in the lower environments.

Think about it this way: I consume various ingredients if I make a cake. One of those ingredients is flour. The flour that I use for my cake recipe is a finished product. I don't select the wheat from the field, go into the mill's environment, and run quality checks against the flour about to be bagged. I still go through my process to make a cake, but I'm doing so with the final ingredients, not the works-in-progress. The act of making the flour is some other team's job and I trust them to do it with quality, efficiently, and independently from my work.

As an API consumer, there are still very valid reasons why I would need to run unsafe operations against something other than a production environment. In those situations, API producers should provide a consumer test environment (or CTE) for calls not meant for production. Some places may call the CTE a client sandbox (however, be aware that sandbox can be a politicized word in some contexts). Like the staging environment, the CTE should be as close to the production environment as possible. Producers should simultaneously push to the CTE when they push to prod.
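Here is a small sketch of what "push to the CTE when you push to prod" might look like from both sides. The Target class, the release function, the version string, and the example.com base URLs are all assumptions made for illustration, not a real deployment tool or real endpoints.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    deployed_version: str = ""


def release(version: str, targets: list) -> None:
    """Producer side: ship the same finished version to every target in one step."""
    for target in targets:
        target.deployed_version = version


# Hypothetical base URLs a consumer might configure.
BASE_URLS = {
    "production": "https://api.example.com",
    "consumer-test": "https://cte.api.example.com",
}


def base_url(unsafe_trial_call: bool) -> str:
    """Consumer side: trial runs of unsafe operations go to the CTE, not prod."""
    return BASE_URLS["consumer-test" if unsafe_trial_call else "production"]


production = Target("production")
cte = Target("consumer-test")
release("2.4.0", [production, cte])
```

Because both targets always receive the same release, the consumer tests against the finished product and never sees the producers' work in progress.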

In Conclusion

I'm not just talking hypotheticals. In my experience overseeing a company's API efforts, we saw this play out multiple times among teams - even internal ones. The producers would hit upon something valuable, clients would look to build upon it but need somewhere to make their trial calls, and then - in a moment of weakness - the producers would grant client access to their lower environments. The next thing you know, VPs are finger-pointing as to who threw their forecasting out the window.

That situation could be a lot better. Take my advice and avoid granting access to your lower environments. Make user acceptance a dedicated step as part of the design. It may not be easy, at least at first. But, with some forethought, you can avoid environmental coupling in your API lifecycle.

Milestones

A few interesting notes this time around:

The OpenAPI registry got a nice facelift, thanks to Mike Ralphson.
As I forecast in a previous issue, the Fediverse continues to grow. Mastodon has now surpassed 10 million accounts.
According to the last Salt Security report, API attacker activity is up 400 percent over the previous six months.
OpenAI announced the availability of their ChatGPT API.
Postman Flows - a way of orchestrating data across multiple API calls - has moved into general availability.
Akamai is acquiring API security startup Neosec for "several dozen millions of dollars".
After a hiatus, the APIs You Won't Hate email newsletter is back. This pairs nicely with the recently completed "Surviving Other People's APIs" from carbon-neutral bushwhacker Phil Sturgeon.
Reddit will begin charging for access to its API. Well, developers who want to build apps and bots to help people use Reddit, and researchers studying Reddit have a pass. But companies that "crawl" Reddit and "don't return any of that value" will need to pay. The move seems related to Reddit data regularly being used as training data for Large Language Models (or LLMs) like ChatGPT.
Twitter, which continues to cause rubbernecking even after my previous note, announced its new API pricing: $42,000 per month. That went over how you would expect. Unsurprisingly, Microsoft said it wasn't paying. Upon hearing the news Elon threatened to sue. <exasperated sigh>
Speaking of pricing, it was brought to my attention this week that SwaggerHub no longer has an individual plan. The next lowest tier is $78.75 monthly, or $945 a year (a 20% savings!). I'm all for paying for good tools, but come on. Really?
Finally, Gordon Moore, legendary Intel Co-Founder and the author of Moore's Law, passed away at 94.

Wrapping Up

The last newsletter made me nervous. While I was proud of the result, I am always concerned that there's apathy toward deep, nuanced writing in an age of TikToks and Instagram stories. A thank you to the long-time readers and a welcome to the new subscribers who continue to validate that long-form writing has a place. Seeing my email client fill up in response is some of the best encouragement a guy could have.

As always, thank you to the Patrons and Substack subscribers for ensuring this newsletter remains paywall and advertising free for the benefit of all readers.

That's all for now. Till next time,

Matthew (@matthew and matthewreinbold.com)