PostgreSQL foreign data wrappers are a great way to expose APIs to SQL. And SQL is a great way to reason over the data. Credit: Thinkstock In the original proposal for the World Wide Web, Tim Berners-Lee wrote: A generic tool could perhaps be made to allow any database which uses a commercial DBMS to be displayed as a hypertext view. We did get those hypertext views, in the form of APIs, but not in a generic way. Database-backed web applications grew APIs that yielded diverse XML and then JSON outputs. Programming languages grew libraries to help developers consume those outputs. Learning to use each individual API was challenging, joining outputs from several APIs even more so. In 2009 I was building a system to combine calendar information from many sources. It asked the same question of all of them: What events are scheduled for a given place and time? To do that it had to use a half-dozen APIs, each requiring a different way to make a request and unpack the response. So when I learned about Project Astoria I was an immediate fan. Astoria was the generic hypertext view of databases that we needed. With Astoria layered on top, every database could automatically provide a default API. If the half-dozen systems I was querying for events supported something like Astoria, none would have needed to invent bespoke APIs and all would be queryable in the same consistent way. The idea matured as Open Data, aka OData, an OASIS standard since 2014. In principle any database-backed web app could now sport an “OData head” that would provide a default API, requiring no code to be written by developers of the app, and no new request/response protocols to be learned by developers using the API. In practice that mostly hasn’t happened. More than ever, software construction requires developers to compose solutions using a growing proliferation of APIs. Often there’s a library to wrap each API in your programming language of choice, so you’re spared the effort of making raw REST calls and parsing the results. But each wrapper has its own way of representing results, so when composing a multi-API solution you have to normalize those representations. Since combining results happens in a language-specific way, your solution is tied to that language. And if that language is JavaScript or Python or Java or C# then it is arguably not the most universal and powerful way to query (or update) a database. What is the best way? It’s been hiding in plain sight all along: SQL. Battle-hardened for decades, and evolved beyond the pure relational model, SQL has restablished itself as the preeminent interface to data. And it’s positioned to become the API unifier that we need more than ever. Foreign data wrappers for APIs Steampipe (steampipe.io) is an open-source tool that fetches data from diverse APIs and uses it to populate tables in a database. The database is Postgres, which is, nowadays, a platform on which to build all kinds of database-like systems by creating extensions that deeply customize the core. One class of Postgres extension, the foreign data wrapper (FDW), creates tables from external data. Steampipe embeds an instance of Postgres that loads an API-oriented foreign data wrapper. The FDW in turn communicates with a growing family of plug-ins that consume APIs and feed the data through the FDW into Postgres tables. To make these abstractions concrete, ask yourself how you would solve the following problem. You operate public AWS services, and you’d like to know if any of their endpoints show up as vulnerable in Shodan, a service that scans public endpoints. Your solution probably looks something like this: Learn how to use the AWS API that finds your endpoints Learn how to use the Shodan API the checks your endpoints Learn how to combine those two APIs to answer the question Here’s the Steampipe solution. ``` select a.instance_id, s.ports, s.vulns, a.security_groups from aws_ec2_instance a left join shodan_host s on a.public_ip_address = s.ip where a.public_ip_address is not null; ``` ``` +---------------------+----------+--------------------+-------------------------------------------------------------+ | instance_id | ports | vulns | security_groups | +---------------------+----------+--------------------+-------------------------------------------------------------+ | i-0dc60dd191cb84239 | <null> | <null> | [{"GroupId":"sg-042fe79169eb42818","GroupName":"lockdown"}] | | i-042a51a815773780d | [80,22] | <null> | [{"GroupId":"sg-042042bac705630f4","GroupName":"bastion"}] | | i-00cf426db9b8a58b6 | [22] | <null> | [{"GroupId":"sg-0423f79169eb42818","GroupName":"default"}] | | i-0e97f373db42dfa3f | [22,111] | ["CVE-2018-15919"] | [{"GroupId":"sg-0423f79169eb42818","GroupName":"default"}] | +---------------------+----------+--------------------+-------------------------------------------------------------+ ``` The two tables joined here are provided by Steampipe plug-ins for AWS and Shodan. The first maps the sprawling catalog of AWS APIs to (currently) 269 tables; the second provides a dozen Shodan tables. You configure these plug-ins to authenticate to the APIs with the same credentials you’d need if using the APIs directly. But you don’t need to know anything else about underlying REST calls, or libraries wrapped around them. The solution is made from tables that work the same way within and across APIs. You inspect them (aws_ec2_instance, shodan_host) to discover the names of their columns, and you join them in the time-honored SQL way. A plug-in for every API Clearly this two-API solution depends on the existence of plug-ins to map both APIs to tables. If both services implemented OData that wouldn’t be necessary. The APIs would automatically be queryable, albeit arguably not joinable with the elegance that SQL affords. But these two services, like most, don’t present a unified interface to their APIs. So that unification has to be layered on top of them. Steampipe’s plug-in SDK smooths the way for plug-in authors by abstracting connection management, retry logic, caching, and of course the mapping of API results to tables. Steampipe plug-ins are written in Go. They leverage the comprehensive catalog of Go libraries that wrap APIs. But only plug-in authors need to know that. As a developer working with Steampipe you only see tables, and you only write SQL. Nowadays, because SQL has evolved, that includes features like Common Table Expressions (aka CTEs or WITH clauses) and JSON columns. But it’s still just SQL. Can such plug-ins feasibly be built for every API? Well, Steampipe launched in early 2021 with a handful of plug-ins, today there are more than 60, and the number is increasing quickly. So far most have been written by the core team but external contributions are growing. Thanks to the plug-in SDK, which does the heavy lifting, it’s straightforward to build a plug-in that maps an API to a set of tables. Standing on the shoulders of Postgres By embedding Postgres, Steampipe inherits all of its capabilities. So, for example, you can join API-sourced foreign tables with native Postgres tables. And while Steampipe’s primary benefit is live querying of APIs, you can create materialized views to persist that data and write Postgres functions to operate on it. You can even load other Postgres extensions and use them with Steampipe tables. Postgres’s built-in tablefunc extension, for example, can do crosstabs, in SQL, with spreadsheet data from Steampipe’s Google Sheets plug-in. Another benefit of embedding Postgres: Any Postgres-compatible API client can connect to Steampipe. That includes command-line tools like psql and GUI-based ones like Tableau, Power BI, Metabase, and Superset that bring visualization and interactivity to live API data. Postgres may never be as widely embedded as the ubiquitous SQLite but it’s more capable, and increasingly it’s used to power an ecosystem of interoperable tools for working with data. Steampipe extends Postgres to deliver unified access to APIs, and a common SQL environment in which to reason about the data they provide. Related content news Google Cloud adds graph processing to Spanner, SQL support to Bigtable The enhancements to cloud databases are expected to help in the development of AI-based and real-time applications. By Anirban Ghoshal Aug 01, 2024 6 mins Databases SQL news Google Cloud Spanner gets dual-region configuration option Google says the dual-region configuration option will help enterprises comply with data residency regulations in countries with limited cloud infrastructure. By Anirban Ghoshal Jul 23, 2024 4 mins Google Cloud Platform Databases SQL analysis How to choose the right database for your application From performance to programmability, the right database makes all the difference. Here are 13 key questions to guide your selection. By Martin Heller Jul 15, 2024 12 mins NoSQL Databases Databases SQL feature Why you should use SQLite Learn why this compact embedded relational database shines for many desktop, mobile, and edge computing applications By Serdar Yegulalp May 22, 2024 8 mins MySQL Databases SQL Resources Videos