Surviving API Breakage: Lessons from Modernizing Legacy Systems

Net API Notes for 2024/08/27, Issue 243

During the month of August, I'm letting guest authors get behind the wheel. Next to drive is a special guest, Indu Alagarsamy.

Indu Alagarsamy is currently a Principal Engineer at the New York Times. Her fifteen-year career has spanned the healthcare, finance, and biotech industries. She leads modernization initiatives, specializing in designing distributed systems and event-driven architectures. Indu is also a highly sought-after conference speaker.

Indu is the organizer of the Domain-Driven Design SoCal Meetup. She writes about overcoming software complexity with Systems Thinking, Domain-Driven Design, and Service Blueprints at DomainAnalysis.io.

I'll be back doing editorial direction in September. But for now, I want to express my deepest thanks to Indu for the copious amount of effort she put into this piece on breaking changes and her willingness to share her story with this audience. Onto the piece!

Surviving API Breakage: Lessons from Modernizing Legacy Systems

I have been working on an interesting application modernization project at the NYTimes. This project involves moving a line of business from a decades-old monolith to a modern SaaS application. The new modernized system comprises:

  • The SaaS application that takes care of the commodity capabilities that the monolith used to do
  • A collection of services that provide custom capabilities that are unique to the NYTimes domain. 
  • A collection of integration services that communicate between the SaaS application and the existing custom capability services. 

In this article, I will discuss the importance of backward compatibility and the pain consumers go through when it is broken. I will share what I learned about backward compatibility and provide guidelines for API developers to deal with breaking changes with their consumers in mind. Any medium to large business with multiple services in production should be thinking about backward compatibility as a first-class citizen of their release process.

But first, let me define some terms.

My definitions 

Before I talk about my frustrations as a consumer of the SaaS API, let's get some basic understanding of the terminology so we can be on the same page:

  • When I use the word API in this article, I mean an interface or a contract between two entities. It defines the expectations, the operations that can be performed, and how the data is exchanged. REST is a common example. 
  • When I talk about backward compatibility in this article, I mean that API clients continue to work when they transition to newer versions of the API. Newer versions should avoid breaking contracts and unexpected changes in behavior; for the client, it just works.

When the SaaS vendor broke our application

Our applications initially used the SaaS Vendor’s REST API Version 1 with a route convention of  /v1/domain-object/{key}, which returned the details of a domain object. This domain object used to have a collection of items. Each item had an attribute, which was an identifier of sorts; let’s call it the item-number. Without getting into too many details about this specific API, the item-number served an important purpose.   

We were in the middle of developing our event-driven solution, using this item-number attribute as an important identifying piece of information in our message schema for events. The idea was that the event consumer, upon receiving the event, could use the item number to retrieve the item.  

Calling the Version 2 API, /v2/domain-object/{key}, returned the domain object and the collections, but it no longer had this item-number attribute, which was a surprise. V1 has the attribute. V2 doesn't. We found this out the hard way around November of 2023. Rolling back V2 would also have been an expensive change because several integration services had already switched to V2. Naturally, we were at a loss. We raised a support ticket with the Vendor, and their initial responses were along these lines: the attribute was no longer supported, why were we using it, and could we use a completely different approach instead? The Vendor was within their rights to introduce breaking changes in a major version; however, not knowing that something we relied on was being deprecated was hard. This kind of change is not only inconvenient but disruptive for API consumers.

Needless to say, this was all very painful. Thankfully, we have a great relationship with the Vendor, and they agreed to take this on as an enhancement request and re-added the attribute in a subsequent release in February 2024. Luckily, we were still in development. Nevertheless, it added time to our delivery schedule, as we hadn't planned for this.

Another critical point is the importance of releasing the updated SDK with every new API release if you're also in the habit of maintaining SDKs. The SDK in this context means the programmatic wrapper around the Web API. Releasing the API without releasing the SDK at the same time means that your consumers using the SDK cannot upgrade to the latest version. This can significantly impact their work and should be avoided. When our Vendor fixed the API, they didn't release the SDKs at the same time. A different team that relied on the SDK was blocked. Eventually, the Vendor released a newer version of the SDK with the fix.

How I approached Backward Compatibility as an API developer

As a developer, I was acutely aware of the importance of maintaining backward compatibility. Before I joined the NYTimes, I was a developer at Particular Software, the makers of NServiceBus API. In this context, the NServiceBus API is a programmatic API, not a Web API, but the same principles of backward compatibility apply. NServiceBus API provides code abstraction on top of messaging technologies, simplifying the development of  complex distributed .NET applications.  I was one of the developers on the team at Particular, and I worked on the NServiceBus API during the very early days of version 3.

In addition to developing NServiceBus, I  regularly spoke  with customers to see how they use the API, provide design suggestions on their distributed architecture, and bring information back to the team on how we could improve the API. I knew how critically embedded the APIs were in our clients’ mission-critical business applications, so breaking backward compatibility wasn't an option when new releases were shipped. 

Manually testing backward compatibility

Wearing the tester hat, I rigorously tested the functionality in various scenarios before we released V4.

For example, I had an application that was using the V3 version of the API and publishing an event. I created a consumer for that event using V4 of the API to ensure that both applications could still send and receive messages.  

I then flipped the test. I had an application that was using V4 to publish the event. I created a consumer for that event that was using the V3 version of the API. I ensured that they were able to send and receive messages. 

I made a matrix of these combinations based on ALL the API features. In addition to the Publish-Subscribe scenario I mentioned above, NServiceBus offers several features for application developers, such as the ability to send and receive messages, process managers (Sagas), message encryption, and the DataBus for sending large bodies of data. My goal was to test every significant part of the API so that we could identify issues before publicly releasing the version. That way, we could be confident in our backward compatibility, and our consumers would not be impacted in any way.
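As a rough illustration of what such a matrix looks like, the sketch below enumerates every cross-version pairing for each feature. It is not the original test plan: the version labels and feature names are simply lifted from the prose above, and `build_matrix` is a hypothetical helper.

```python
from itertools import product

# Hypothetical labels, taken from the scenarios described in the text.
VERSIONS = ["V3", "V4"]
FEATURES = ["send-receive", "publish-subscribe", "saga", "encryption", "databus"]


def build_matrix(versions, features):
    """Return every (sender_version, receiver_version, feature) combination
    in which the two endpoints run different versions."""
    return [
        (sender, receiver, feature)
        for sender, receiver in product(versions, repeat=2)
        if sender != receiver
        for feature in features
    ]


if __name__ == "__main__":
    for sender, receiver, feature in build_matrix(VERSIONS, FEATURES):
        print(f"{feature}: {sender} sender -> {receiver} receiver")
```

Each row is one scenario to exercise by hand or, later, one automated test case.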

Scaling testing through automation

We knew that having me test my matrix manually every time would not scale. My friend Simon, who also worked with me then, implemented my matrix as a series of automated tests. This was our wire compatibility test suite. We then included it in the build pipeline so that the tests ran as part of every PR. That way, we could catch problems early instead of finding a slew of them right before releasing a major version, causing delays, or, worse yet, missing them during testing and hearing about them from customers.

This thorough testing ensured that we maintained backward compatibility on the wire, which was crucial for our consumers.  

Guidelines for API Developers

  1. Use SemVer:

SemVer is a standard that provides rules and guidelines for assigning and incrementing your version when you release software. While SemVer is primarily for libraries, some concepts can also apply to REST APIs.

Let's say you're releasing your first version for consumption. A version number that follows the SemVer guidelines would be 1.0.0. In this example, 1 is the major version. 0 is the minor version, and 0 is the patch. 

  • If you fix a few backward-compatible bugs, you will release your next version as 1.0.1. 
  • Let's say you introduce a new feature or functionality that is backward-compatible. You would release your next release as 1.1.0. 
  • If your new feature or functionality breaks backward compatibility, you release a major version, i.e., 2.0.0
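The three rules above can be sketched as a small function. This is a minimal illustration that assumes plain MAJOR.MINOR.PATCH strings with no pre-release or build metadata; the function name and the "patch", "minor", and "major" labels are made up for the example.

```python
def next_version(current: str, change: str) -> str:
    """Return the next SemVer version string for a given kind of change.

    `change` is one of "patch", "minor", or "major" (labels assumed here).
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "major":
        # Breaking change: bump major, reset minor and patch.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backward-compatible feature: bump minor, reset patch.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backward-compatible bug fix: bump patch only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, `next_version("1.0.0", "patch")` gives "1.0.1", and `next_version("1.1.0", "major")` gives "2.0.0".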

Using a versioning strategy keeps API consumers informed so that they can upgrade at their own pace. They may be in a development cycle working towards their target release date when the Vendor introduces a new major version. If the consumer decides to upgrade, they will incur additional costs, especially if the major version breaks backward compatibility, and may have to adjust their project scope and timeline. Or they may prefer to wait until they are ready to switch to the new version.

For REST APIs, there are several versioning strategies. One common approach is to have the version in the route or the path. For example:

  • API/v1/domain-object/{key}
  • API/v2/domain-object/{key}
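To make the path-based approach concrete, here is a small sketch of a dispatcher that serves two major versions side by side, so consumers can move from one to the other when they are ready. The handlers, payloads, and the `dispatch` function are all hypothetical; a real service would rely on its web framework's routing instead.

```python
import re


# Hypothetical handlers; the payloads are invented for illustration.
def get_domain_object_v1(key):
    return {"key": key, "items": [{"item-number": "A-1"}]}


def get_domain_object_v2(key):
    return {"key": key, "items": [{"name": "first item"}]}


HANDLERS = {1: get_domain_object_v1, 2: get_domain_object_v2}
ROUTE = re.compile(r"^/v(\d+)/domain-object/([^/]+)$")


def dispatch(path):
    """Pick the handler that matches the major version in the path."""
    match = ROUTE.match(path)
    if match is None:
        raise LookupError(f"no route for {path}")
    version, key = int(match.group(1)), match.group(2)
    handler = HANDLERS.get(version)
    if handler is None:
        raise LookupError(f"unsupported API version: v{version}")
    return handler(key)
```

Keeping the old handler registered while the new one rolls out is what lets consumers stay on v1 until their own schedule allows the switch.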

Marwen Abid's article Four REST API Versioning Strategies outlines other versioning strategies for REST APIs. 

Trade-Offs: When you follow SemVer strictly, if you break backward compatibility for a feature, whether by intent or by mistake, you must release a major version. Before adopting SemVer, it is important to ensure that all members of the product development teams are on board. For example, Marketing folks may prefer that significant new features be released as major version increments; however, following SemVer, you could end up incrementing the major version because of a breaking change rather than a new feature. Having everyone on the same page benefits the internal teams, you, and your API consumers.

  2. Deprecate and give advance warning before you break backward compatibility:

If we follow SemVer, we can break backward compatibility as long as we bump the major version. However, does that mean you should catch your consumers off guard? If you do have to break backward compatibility, there are proper ways of doing it. In fact, here is what the SemVer guidelines suggest:

  • First, update the documentation to let users know about the upcoming change.
  • Release a minor version that warns the users that the feature is being deprecated first.
  • Release the major version that removes or breaks the existing functionality. 

Per SemVer guidelines:

“Before you completely remove the functionality in a new major release there should be at least one minor release that contains the deprecation so that users can smoothly transition to the new API.”

How you implement SemVer guidelines depends on your tech stack and whether it’s a library vs REST.  If you’re using a tech stack such as .NET, Java, etc, always look in your tech stack to see what is available for you. 

  • In .NET, the Obsolete Attribute provides a clear warning to the consumer during compile time of the functionality that is going away, along with a clear direction of what consumers must use instead. 
  • In Java, the @Deprecated annotation also provides a clear warning to the consumer during compile time.
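For stacks without a compile-time attribute, a similar effect can be approximated at runtime. The sketch below uses Python's standard `warnings` module; the `deprecated` decorator and the `fetch_item` and `get_item` functions are hypothetical names, and unlike Obsolete or @Deprecated the warning only appears when the function is actually called.

```python
import functools
import warnings


def deprecated(replacement: str, removed_in: str):
    """Mark a function as deprecated, naming its replacement and removal version."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in "
                f"{removed_in}; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def get_item(key):
    return {"key": key}


@deprecated(replacement="get_item", removed_in="2.0.0")
def fetch_item(key):
    # The old entry point keeps working during the deprecation window.
    return get_item(key)
```

The message carries the same three things the guidelines ask for: what is going away, what to use instead, and when.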

In my modernization project example, which involved REST APIs,  there are a couple of ways the Vendor could have informed us when they were about to introduce a new major version that was going to break backward compatibility: 

  • In the Response header, you can define your own custom header that you clearly document specifically for this purpose. I like Clearbit's approach. They have an "X-API-Version" header to indicate the version and an "X-API-Warn" header to warn their consumers of potential problems. When their API clients encounter this header, they automatically print a warning in the server logs. Zapier made this approach even better by including more information, "X-API-Deprecation-Date" and "X-API-Deprecation-Info," to provide essential details.
  • In the Response object, you can define your own convention for the response body to warn consumers. The downside of putting this information in the response object is that the schema needs to be documented, and if the schema changes, consumers are forced to update how they handle the responses as well.
  • In the SDK, you can release a minor version that gives deprecation warnings to the consumer. The mechanisms for doing this will vary based on the tech stack. The good thing about Web APIs is that you can capture metrics on the number of users actively using a deprecated feature to help decide if you need to defer its removal.
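On the consumer side, the header-based approach might be handled along these lines. This is a sketch under assumptions: the header names are the ones mentioned above, but their exact format is not specified here, and `collect_deprecation_notices` is a hypothetical helper rather than part of any vendor's client.

```python
import logging

logger = logging.getLogger("api-client")

# Header names as described in the text; their exact semantics are assumed.
WARNING_HEADERS = ("X-API-Warn", "X-API-Deprecation-Date", "X-API-Deprecation-Info")


def collect_deprecation_notices(headers):
    """Return the deprecation-related headers present in a response,
    logging each one as a warning. Header lookup is case-insensitive."""
    lowered = {name.lower(): value for name, value in headers.items()}
    notices = {}
    for name in WARNING_HEADERS:
        value = lowered.get(name.lower())
        if value:
            notices[name] = value
            logger.warning("%s: %s", name, value)
    return notices
```

A client wrapper could call this with the headers of every response, so the warning shows up in logs long before the major version that removes the feature.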

Regardless of the framework you use to deprecate old functionality or REST APIs, as a favor to your consumer:

  • Inform them in the docs.
  • Inform them using the framework provided approach such as Obsolete, Deprecated, or using HTTP Headers, as to why this thing is going away. 
  • In the warning, provide clear instructions or a link to the documentation on the expected usage.
  • In the warning, state when the functionality is going away, i.e., the next major release.

Following the SemVer guidelines, your consumers will be aware of this upcoming change and know what to expect. This will help them make the needed changes when their schedule permits based on where they are in their development cycle. Best of all, they won't be surprised when the next major version drops!

  3. Include automated tests for wire compatibility in the build pipeline:

If the APIs you develop are libraries that deal with networking, serialization, files, or any other form of persistence, then you definitely have to pay close attention to being wire compatible. Two versions are wire compatible if an application that uses one version can still communicate with another application that uses the other version. For example:

  • Application A -> uses Version 1 of your API -> communicates with Application B -> which uses Version 2 of your API
  • Application A -> uses Version 2 of your API -> communicates with Application B -> which uses Version 1 of your API

You must ensure wire compatibility works in both directions, backward and forward. By doing this, you're allowing the consumers of your API to use a rolling deployment instead of a big-bang deployment.
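A two-direction check can be sketched with a pair of toy serializers. This is only an illustration of the idea, not how NServiceBus implements its suite: the message shape, the "currency" field, its "USD" default, and all function names are assumptions made for the example.

```python
import json


def serialize_v1(order_id):
    return json.dumps({"order_id": order_id})


def deserialize_v1(payload):
    data = json.loads(payload)
    # Ignore fields this version does not know about.
    return {"order_id": data["order_id"]}


def serialize_v2(order_id, currency):
    return json.dumps({"order_id": order_id, "currency": currency})


def deserialize_v2(payload):
    data = json.loads(payload)
    # Default the field that older senders do not provide.
    return {"order_id": data["order_id"], "currency": data.get("currency", "USD")}


def check_wire_compatibility():
    """Exchange one message in each direction and return True if both arrive intact."""
    old_to_new = deserialize_v2(serialize_v1("o-1"))
    new_to_old = deserialize_v1(serialize_v2("o-1", "EUR"))
    return old_to_new["order_id"] == "o-1" and new_to_old["order_id"] == "o-1"
```

The two habits that make this pass are the reader ignoring unknown fields and the newer reader supplying defaults for fields that older senders omit.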

NServiceBus is an example of ensuring wire compatibility for consumers by integrating these tests into the build pipeline.

  4. Include automated tests for identifying breaking changes in the build pipeline:

Snapshot testing can be quite a useful way to detect breaking changes to your API. Incorporating Snapshot testing into the build process can help catch things much earlier in the development cycle as an early warning system, especially when the API contract is broken unintentionally. 

What is Snapshot Testing? It's a testing technique that compares how a certain functionality currently behaves to the results from a preserved baseline version. While it's also used for UI testing, the following examples will walk you through how it's useful in testing API responses, specifically when the responses are complex object models. It can definitely increase developer productivity and catch compatibility problems. 

Let's say you're working on an integration with Stripe (Stripe is not the Vendor that broke backward compatibility in the story at the start of this article). You might want to test the different business scenarios that involve cancellation, which use the Cancel API. As part of your integration test, you call Stripe's Cancel API. Take a look at the response object. The screenshot is just a partial view of the Subscription object. Because the response is a large, complex object, even the subset of attributes you care about may still be a lot of properties to assert on. And this is just one operation. In reality, you're going to have a lot more interactions.

Here's where Snapshot testing comes in handy. Simon's Verify is an example of the implementation of snapshot testing written in C# for .NET APIs. Instead of your unit test having pages and pages of properties, your unit test using Verify would look like this: 

var result = service.Cancel("enter the correct key here");
await Verifier.Verify(result);

The first time you run the test, it will bring up the diff tool, showing what was received in the .received.txt file. The .verified.txt file will be empty; copy the results into it and save. Subsequent runs of the test will compare the response to the data in the .verified.txt file. If there are any differences, the test will fail, highlighting which property or value is the culprit.
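The received/verified workflow described above can be sketched in a few lines of Python. This is a simplified stand-in for the idea, not Verify's API: the `verify` function, the JSON serialization, and the file naming are assumptions that mimic the behavior described in the text, and there is no diff tool integration.

```python
import json
from pathlib import Path


def verify(name, result, directory):
    """Compare `result` with the approved snapshot for `name`.

    Returns True when they match. Otherwise writes <name>.received.txt
    next to the (possibly missing) <name>.verified.txt and returns False.
    """
    directory = Path(directory)
    # Sort keys so the snapshot does not depend on dictionary ordering.
    received = json.dumps(result, indent=2, sort_keys=True)
    verified_path = directory / f"{name}.verified.txt"
    received_path = directory / f"{name}.received.txt"

    if verified_path.exists() and verified_path.read_text() == received:
        # Clean up any stale received file from an earlier failing run.
        received_path.unlink(missing_ok=True)
        return True

    received_path.write_text(received)
    return False
```

A test would assert that `verify(...)` returns True; approving a change means reviewing the received file and renaming it to the verified file.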

For more on how to use this, listen to the Podcast hosted by Dan Clarke, in which Simon explains Snapshot testing using Verify.

By integrating Snapshot testing into your tests and making it part of the build pipeline, you can catch broken backward compatibility, whether you are a vendor publishing the API or an API consumer.

If you're using Postman, here's an article that walks you through Snapshot testing using Postman.

  5. Write Great Release Notes that have instructions for upgrades:

When I worked at Particular Software, part of my engineering work involved reviewing the release notes and clarifying to consumers how they could update to the released version. Many mission-critical software systems are built based on NServiceBus' messaging API. Since major versions could potentially have breaking changes, every major release has an upgrade guide to make things easier for consumers.  

The Upgrade Guide includes clear guidelines for what is being removed or renamed, a code snippet showing what the old version looked like, and instructions for achieving the same functionality with the new major version. I am proud of my contributions to the upgrade guides for major versions 3, 4, 5, and 6. 

The following is an example of the Upgrade guide, which shows one of the changes introduced in NServiceBus version 9. It clearly shows that sendOptions.RequiredImmediateDispatch() is now sendOptions.IsImmediateDispatchSet(). Consumers can copy the code to help with their upgrade process. 

  6. Beta test with a few customers so you can get early feedback and adjust:

As you build new features or new versions of your API, have a small set of consumers try the upgrade with you as soon as it is viable, even before it is release candidate material.

Get on a Zoom, Google Meet, or Teams call with your clients and, in a screen share session, help them upgrade to the latest version. Observing your clients as they upgrade, and seeing whether things are evident from your release notes and upgrade guide, can teach you a lot. I highly recommend this approach based on my own experience. After observing how clients went about the upgrade, I was able to provide helpful information to the development team. In some cases we changed the API; in others we simply clarified the documentation and upgrade guides. By doing this, you help ensure that when you do release your major version, it is a seamless experience. And guess what? The easier it is to upgrade to the major versions you release, the longer your consumers will be with you.

Have your consumers in mind every step of the way

The APIs you write for your consumers are being used for something much larger, and they become part of a more extensive ecosystem, a system that provides some usefulness to humans. The more frequently consumers catch up with your latest releases and upgrade their systems, the more integrated they are with your services. Breaking backward compatibility can break that confidence. However, by ensuring backward compatibility, you pave the way for a smoother and more confident transition for your consumers. The more painful it is to upgrade versions, the less frequently consumers will upgrade. The less frequently they upgrade, the harder each upgrade becomes, because the changes accumulate. And the less frequently they upgrade, the more irrelevant your API becomes.

Take stock of your API and pipeline and see what changes you need to make to ensure that backward compatibility is not a second-class citizen in your development process. Make the changes. Your consumers will thank you. And even if they don't, it's the right thing to do. 

Thanks! 

Thank you for reading this far! I would like to thank the following people:

  • Simon Cropp for reviewing this article. Simon has and continues to make significant contributions to open source and is a treasure to the .NET community. 
  • Chris Richardson, the author of Microservices Patterns. He shares his wealth of knowledge on designing good microservices and helps the software community.
  • Matthew Reinbold for inviting me as a guest to contribute to the newsletter for the month of August 2024 and being extremely patient with me  :)  

And thank you, Indu! If you'd like to read more of Indu's writing on APIs, DDD, and more, sign up for her email newsletter at DomainAnalysis.io . ~ Matthew

Milestones

This is already a pretty long note, so let's end with another API-adjacent, Invincible-inspired, meme.

Wrapping Up

Would you like to help support Net API Notes? Signing up to ensure you don't miss an edition is the easiest way to get started. Mentioning Net API Notes to your peers is another. Finally, if you can afford to, you can upgrade your signup to a paid subscription for as little as $8 a month - funds that help defray costs associated with the ongoing API job tracking project, pay for tools to improve the overall writing quality, and support specialized deep dives. Thanks to all of those who helped out, in whatever capacity, in the past.

That's all for now. Till next time,

Matthew [@matthew (Fediverse), matthewreinbold.com (Website)]
