Net API Notes for 2019/03/07 - The 'Post-API' Era of Academic Research
Surprising to no one who reads this newsletter, we live in a world influenced by algorithms. Access to those algorithms, however, is changing. In an open source world, knowledgeable professionals could inspect the programmatic work and assess its impact. But, as the Stephen O'Grady note from last week mentioned, Software-as-a-Service has obscured the source code behind a cloud. Increasingly, computational researchers have had to turn to APIs to study how ubiquitous services impact our lives.
Deen Freelon, an associate professor at UNC, wrote a paper entitled "Computational Research in the Post-API Age." In it he states:
"On April 4, 2018, the post-API age reached a milestone. On that day, Facebook closed access to its Pages API, which had allowed researchers to extract all posts, comments, and associated metadata from public Facebook pages. This decision followed the company’s April 2015 closure of its public search API, which provided searchable access to all public posts within a rolling two-week window. The closure of the Pages API eliminated all terms of service (TOS)-compliant access to Facebook content. Let me underscore the magnitude of this shift: there is currently no way to independently extract content from Facebook without violating its TOS."
It isn't just Facebook. Late last year, changes to Twitter API access made it harder and more expensive for academics to access historical tweets. This limited study into everything from how information about diseases like Zika spreads, to how social movements like Black Lives Matter work, to how social media can be used to promote democracy. YouTube, facing a rash of negative publicity in light of revelations about the behavior of its recommendation algorithm, is restricting API access commonly used by academics.
Post-Cambridge Analytica, it makes sense that these popular services would take a harder line on access to data. A free-for-all is not what we want. However, academics seeking to uncover hidden biases and detrimental side effects are increasingly faced with either abandoning their work or violating TOSs.
Deen's advice to computational researchers:
"First, they should learn how to scrape the web; and second, they should understand the potential consequences of violating platforms’ TOS by doing so."
Unfortunately, even that isn't enough. ProPublica, an investigative newsroom that has written at length about problems with Facebook's ad targeting, had its screen scraping foiled by changes to Facebook's design. It appeared deliberate on Facebook's part. Moreover, while Facebook has a researcher API, in beta, ProPublica was denied access.
I think it's fair to say that we want a future where companies:
- Continue to innovate new products and services
- Are responsible with algorithmic intent applied in our names
- Allow academic, journalistic, and regulatory access (most likely through APIs) for the public good
That last point seems to be missing (or, at a minimum, haphazardly applied). In the same way that restaurants undergo regular safety and health inspections, we need similar regimes for algorithmic impact. If there are places that have this or do this already, I'd love to hear about what's working.
NOTES
Not a ton of notes this week given the length of the editorial (above). However, I did want to highlight a post written by Kirsten Westeinde on the Shopify engineering blog. It is entitled "Deconstructing the Monolith: Designing Software that Maximizes Developer Productivity." Rather than adopt either a microservice or a monolith architecture, the post describes the creation of a modular monolith.
I found the post to be an honest evaluation of what made sense within one company. I'd hesitate to recommend modular monoliths to other folks; the approach has the potential to net the worst behaviors of both worlds. However, it seems to work for Shopify, and I applaud them for sharing this valid alternative for consideration.
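For anyone who hasn't run into the term, here is a rough sketch of the general idea, as I understand it: everything ships as one deployable process, but each component keeps its own state private and is reachable only through a small public interface. This is not Shopify's code or their approach in any detail. The component names (Billing, Orders), the methods, and the build_app function are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


# --- billing component (hypothetical) ---
class Billing:
    """Public interface of the billing component."""

    def __init__(self):
        # Private state: other components never reach in here directly.
        self._charges = {}

    def charge(self, order_id, amount_cents):
        """Record a charge; reject non-positive amounts."""
        if amount_cents <= 0:
            return False
        self._charges[order_id] = amount_cents
        return True

    def total_charged(self):
        return sum(self._charges.values())


# --- orders component (hypothetical) ---
@dataclass
class Order:
    order_id: str
    amount_cents: int
    paid: bool = False


class Orders:
    """Public interface of the orders component."""

    def __init__(self, billing):
        # The dependency is explicit and goes through Billing's public methods.
        self._billing = billing
        self._orders = {}

    def place(self, order_id, amount_cents):
        order = Order(order_id, amount_cents)
        order.paid = self._billing.charge(order_id, amount_cents)
        self._orders[order_id] = order
        return order


def build_app():
    """Wire the components together in one process (the 'monolith' part)."""
    billing = Billing()
    return Orders(billing), billing


if __name__ == "__main__":
    orders, billing = build_app()
    order = orders.place("A1", 1500)
    print(order.paid, billing.total_charged())
```

The calls between components are plain in-process method calls, not network hops; the discipline comes from only ever calling across the boundary through the public interface.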
MILESTONES
- Remember Oracle v. Google? It goes on (and on and on). Most recently, 78 computer scientists got together to encourage the Supreme Court to preserve the right to implement each other's interfaces. There are some notable names among the folks listed.
- OAuth.tools has launched in Beta! OAuth.tools is a site that allows people to learn OAuth & OpenID Connect interactively. It allows visitors to run various flows, view requests/responses, and examine tokens. It also should work with any OAuth server. If you have feedback, pass it along to Travis Spencer.
- The popular PHP CMS, Drupal, has a pretty significant security flaw in its RESTful web service. If you're running Drupal, make sure you patch.
- Last but not least, everyone's favorite spandex-clad, European cyclist, Phil Sturgeon, has joined Stoplight.io. I look forward to what that team cooks up.
WRAPPING UP
In the last note, I mentioned how casual conflation of 'specification' and 'API description,' particularly by a vendor, drives me crazy. This set off what was, to me, a surprising amount of conversation. A reasonable question was "Why?" For the answer to that question, along with a comic as to why a hot dog is not a sandwich (really), I wrote a blog post.
Speaking of hot dogs, I'm looking for one. I'm hiring a Principal Data Analyst at Capital One. Our distributed systems architecture, driven by our 9000 developers, is growing at a steady clip. I'm looking for someone able to perform complex quantitative and qualitative analysis across our ecosystem. More importantly, this person must have a passion for telling compelling stories with data.
Is that you? Shoot me an email with a little about yourself and what you're looking to do, professionally, and let's see if there is a match.
Also, check out Webapi.events. Some great 2019 events can be found there. If something is missing, just let me know; I'd be happy to add it.
Finally, as always, thank you to my Patreon sponsors. Encouragement comes in many forms and is much appreciated.
Till then, Matthew
@libel_vox and https://matthewreinbold.com