Net API Notes for 2021/07/07 - Issue 168 - GitHub's Copilot

Net API Notes is a regular, hand-curated digest of impactful news and analysis for busy API practitioners. Are you reading this on the web and not subscribed yet? Sign up today and be the first to get ad-free, actionable info delivered weekly to your inbox.

An extended weekend and a trip to the beach mean that this newsletter is out a day later than intended. However, that worked in my favor, since new wrinkles in this first story, on GitHub Copilot, continue to emerge.

NOTES

GITHUB COPILOT AI IS GENERATING AND LEAKING API KEYS

STRAT / DESIGN / DOC / DEV & TEST / DEPLOY / SECURITY / MONITOR / DISCOVERY / CHANGE MANAGEMENT

Last week, Microsoft, in association with OpenAI, unveiled a new project called GitHub Copilot. Copilot is an assistant that suggests how to complete everyday programming tasks. The machine-learning model was trained on all public GitHub code, regardless of license. That has more than a few people upset. More on that in a moment.

First, I want to highlight how many of the early Copilot examples show code being generated to call third-party APIs. Some, like Gartner's Mark O'Neill, celebrated how this will be a boon for new developers using APIs. That may well be true, eventually. The problem is that this rosy future has to get past a thorny reality: many public repos contain API keys. And since machine learning echoes the past to create a future, Copilot shares the keys it finds with others.

GitHub has acknowledged the problem and said they are working on a fix. I'd love to be a fly on the wall for those discussions. Some repos, as you can imagine, exhibit poor security practices, like embedded keys. However, if Copilot was trained on every public repo, I would imagine there are also a fair number of tutorial, test, and experimental repos: code attempting to express or accomplish something other than security practices. Should sample code demonstrating sorting efficiency also put significant effort into securing the scaffolding? And how will Copilot determine which code reflects good security practice and which code should not be emulated?

Of course, GitHub and OpenAI will need to solve these problems while also dealing with numerous copyright protests. Machine learning researchers have a history of ignoring licenses for Internet media (photos, text) when creating their training sets. However, OpenAI researchers may have underestimated how intensely the open source community cares about its code licenses. GitHub PR knew they had a problem shortly after launch, as subtle edits to their FAQ attempt to (re)define copyright. The least charitable characterization of Copilot is that it is laundering open-source code into commercial works.

I expect to watch this continue to develop. In the meantime, when dealing with API keys and cloud-hosted repositories:

  • Do not embed API keys directly in code
  • Do not store API keys in files inside your application's source tree
  • Create application and API key restrictions
  • Delete unneeded API keys to minimize exposure to attacks
  • Regenerate your API keys regularly (with automation)
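As a minimal sketch of the first two points, the Python below reads a key from the process environment at runtime instead of from source code or a file in the repository. The variable name EXAMPLE_API_KEY and both helper functions are hypothetical, chosen only for illustration; a secrets manager would be a reasonable alternative to environment variables.

```python
import os


def load_api_key(env_var: str = "EXAMPLE_API_KEY") -> str:
    """Return the API key held in an environment variable.

    EXAMPLE_API_KEY is a hypothetical name; use whatever your
    deployment tooling injects at runtime.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail loudly rather than falling back to a hard-coded default.
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


def auth_headers(api_key: str) -> dict:
    """Build request headers; the Bearer scheme is an assumption about the API."""
    return {"Authorization": f"Bearer {api_key}"}
```

Because the key never appears in the source tree, there is nothing for a public repo (or a model trained on it) to pick up, and rotating the key becomes a configuration change rather than a code change.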

RATE LIMITING GRAPHQL APIS BY COMPUTING QUERY COMPLEXITY

STRAT / DESIGN / DOC / DEV & TEST / DEPLOY / SECURITY / MONITOR / DISCOVERY / CHANGE MANAGEMENT

Guilherme Vieira posted "Rate Limiting GraphQL APIs by Calculating Query Complexity" on the Shopify engineering blog. The importance of query complexity is something I've seen trip up RPC and REST-ish API developers who are getting started with GraphQL. In those former, request-based patterns, rate limiting is well understood and part of any API management and gateway package.

Query-based patterns, like GraphQL, add a wrinkle. Yes, you may still have a client that exhausts its allowed number of calls. With GraphQL, however, a client might make only one request, yet the query is so computationally intensive that the single request becomes functionally equivalent to a thundering herd hitting the infrastructure.

Shopify has taken that challenge and made it a feature. Vieira goes into great detail on how to think about query complexity. If you're working with GraphQL, definitely check this one out.
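To make the idea concrete, here is a rough Python sketch of cost-based limiting. The cost model is a simplified assumption for illustration, not Shopify's actual formula: scalar fields cost 0, object fields cost 1, and a paginated field multiplies the cost of its children by the requested page size. The Selection class, query_cost, within_budget, and the MAX_COST value are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

MAX_COST = 1000  # hypothetical per-request budget


@dataclass
class Selection:
    """One field in a query; `first` is the requested page size (0 if not a list)."""
    name: str
    children: List["Selection"] = field(default_factory=list)
    first: int = 0


def query_cost(sel: Selection) -> int:
    """Estimate cost before executing: scalars 0, objects 1, lists scale by page size."""
    if not sel.children:
        return 0  # scalar leaf
    per_object = 1 + sum(query_cost(child) for child in sel.children)
    return sel.first * per_object if sel.first > 0 else per_object


def within_budget(sel: Selection, budget: int = MAX_COST) -> bool:
    """Reject a query whose estimated cost exceeds the budget."""
    return query_cost(sel) <= budget


# products(first: 50) { title variants(first: 10) { price } }
modest = Selection("products", first=50, children=[
    Selection("title"),
    Selection("variants", first=10, children=[Selection("price")]),
])

# The same shape with page sizes of 250 is still a single request.
heavy = Selection("products", first=250, children=[
    Selection("title"),
    Selection("variants", first=250, children=[Selection("price")]),
])
```

Under this toy model the modest query costs 550 and passes, while the heavy one costs 62,750 and is rejected, even though both count as exactly one call to a request-based limiter.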

EIGHT (!?) UNEXPECTED CHALLENGES OF RUNNING AN API-AS-A-PRODUCT

STRAT / DESIGN / DOC / DEV & TEST / DEPLOY / SECURITY / MONITOR / DISCOVERY / CHANGE MANAGEMENT

Over on the Nordic APIs blog, Bill Doerrfeld interviews Alan Glickenhouse and Ed Freyfogle about the unexpected challenges of API product management.

Out of the eight, I want to highlight "needing a product manager". Even at this point in our API journey, I still see APIs treated as a technical solution. Yes, APIs are an architectural pattern. However, we can't overlook their impact as an organizational driver. That requires stakeholder conversations, story grooming, and roadmap guidance. Typically, those tasks are performed by product managers.

For the other seven insights, check out the entire piece.

And, if this product-centric subject matter resonates with you, check out this piece on the value of API platforms by Scott Middleton.

MILESTONES

WRAPPING UP

Last time, I mentioned a new post over on my blog about the importance of software environments in creating behaviors. I followed that up by discussing the importance of template defaults (and why they should be consciously defined). Do you have experience in this area? I'd love to discuss more.

Also, check out NetAPI.events if you're looking for API-centric get-togethers. If something is missing, let me know, and I'd be glad to add it.

Finally, thank you to my Patreon supporters. Your help ensures that this newsletter is free of advertising, information selling, or paywalls. Because of your support, we all win!

Till next time, Matthew

@libel_vox and matthewreinbold.com

While I work at Postman, your friendly neighborhood API platform, the opinions presented above are mine.
