Use Powerpipe to visualize and validate data in your own Postgres, SQLite, DuckDB, or MySQL database.

In Steampipe unbundled we showed how its plugins, which originally worked only with the foreign data wrapper loaded into Steampipe’s batteries-included Postgres, are now also available as stand-alone distributions that you can load into your own instances of Postgres or SQLite. Now Steampipe itself is unbundled: its dashboard server and benchmark runner have migrated to a new open-source project, Powerpipe. When you start Powerpipe it defaults to a local instance of Steampipe, so existing dashboards and benchmarks work as always. The mods that drive them source their data from the same plugins, using the same SQL queries. But you can use Powerpipe’s --database argument to specify a Postgres, SQLite, DuckDB, or MySQL connection string.

Why unbundle Steampipe in this way? Unix philosophy holds that a software tool should do one thing well and combine readily with other tools that abide by the same principle. Steampipe began that way, but as the dashboard and benchmark layers appeared it became clear that refactoring was in order. For devops engineers who use the tool to visualize and assess cloud infrastructure, it makes sense to decouple those layers into components with well-defined interfaces.

But Powerpipe isn’t just for them. Among the set of new mods released with Powerpipe, there’s one that visualizes biomedical data from RNAcentral by way of a public Postgres endpoint. To view those dashboards you install the mod, then start Powerpipe like so:

powerpipe server --database postgres://reader:NWDMCE5xdipIjRrp@hh-pgsql-public.ebi.ac.uk:5432/pfmegrnargs

The dashboards are now browsable at http://localhost:9033. You can of course connect any Postgres client to that endpoint. What Powerpipe brings to the party is an as-code approach to visualizing and validating data.
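To give a feel for that as-code approach, here is a minimal sketch of a dashboard mod. The mod block is standard scaffolding; the rnc_database table and its columns are assumptions about the RNAcentral schema, used only for illustration.

```hcl
# mod.pp: declares the mod that contains the dashboards.
mod "example" {
  title = "Example dashboards"
}

# A dashboard with one chart. The table and column names here
# (rnc_database, descr, num_sequences) are illustrative assumptions.
dashboard "row_counts" {
  title = "Sequences per data source"

  chart {
    type  = "column"
    width = 6
    sql   = <<EOQ
select
  descr as label,
  num_sequences as value
from rnc_database
order by num_sequences desc
limit 10
EOQ
  }
}
```

Because the widget is just HCL plus SQL, editing either the query or the chart attributes while powerpipe server is running updates the dashboard in place.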
The widgets that drive dashboards and benchmarks are written in a declarative language, HCL, which complements the declarative SQL queries that fill those widgets with data. All your HCL and SQL code lives in repos, under version control, open to the same kind of collaboration that you expect and enjoy for all other code artifacts. Working with that code is a developer-friendly experience in another way too: Powerpipe watches your files and reacts instantly when you change either the HCL that configures dashboard widgets or the SQL that populates them.

SQL-powered relationship graphs

The suite of dashboard components includes everything you’d expect, including tables, charts, infocards, and input widgets. It also includes some things you might not expect, notably the nodes and edges that form relationship graphs. Here’s an example that uses a select widget to choose an InfoWorld author, then builds a graph that relates the chosen author to articles written and to the (perhaps multiple) categories associated with each article.

Here’s how the nodes and edges are defined. Two of the nodes query for categories and authors, and the third uses the RSS plugin to query the selected author’s RSS feed. Then two edges connect the nodes: one relates article links to authors, the other relates the same links to categories. It’s all just SQL, leveraged in an unconventional way.
-- enumerate categories
node {
  category = category.category
  sql      = <<EOQ
select
  category as id,
  category as title
from infoworld_categories()
EOQ
}

-- enumerate authors
node {
  category = category.author
  args     = [self.input.authors.value]
  sql      = <<EOQ
select
  author as id,
  author as title
from infoworld_authors()
where author = $1
EOQ
}

-- list articles by author
node {
  category = category.post
  args     = [self.input.authors.value]
  sql      = <<EOQ
with feed_links as materialized (
  select
    author,
    'https://www.infoworld.com/author/' || author || '/index.rss' as feed_link
  from infoworld_authors()
)
select
  link as id,
  title as title,
  author,
  jsonb_build_object(
    'link', link,
    'author', author,
    'published', published
  ) as properties
from rss_item r
join feed_links f using (feed_link)
where author = $1
order by published desc
EOQ
}

-- relate articles to authors
edge {
  args = [self.input.authors.value]
  sql  = <<EOQ
with feed_links as materialized (
  select
    author,
    'https://www.infoworld.com/author/' || author || '/index.rss' as feed_link
  from infoworld_authors()
)
select
  link as to_id,
  author as from_id
from rss_item r
join feed_links f using (feed_link)
where author = $1
EOQ
}

-- relate articles to categories
edge {
  sql = <<EOQ
select
  link as to_id,
  category as from_id
from infoworld_category_urls()
EOQ
}

More typical uses of these relationship graphs help devops engineers understand how the elements of their cloud infrastructures fit together, as for example in a graph that shows how the components of a Kubernetes deployment (deployments, replicasets, pods, containers, services, and nodes) relate to one another. Each entity in the graph is hyperlinked to another graph that drills into the entity and details its surrounding infrastructure.

These visualizations, which are also available for AWS, Azure, and GCP, are an excellent way to explore and understand your cloud architectures. And the queries you write to do that are reusable.
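The node and edge blocks above refer to self.input.authors.value, so they live inside a dashboard that also declares the select widget. Here is a sketch of that scaffolding; the exact block names and attributes are assumptions about the mod’s layout, not copied from it.

```hcl
# Illustrative scaffolding around the node and edge blocks shown above.
dashboard "author_graph" {
  title = "InfoWorld author graph"

  # The select widget; its chosen value is available to the
  # nodes and edges as self.input.authors.value.
  input "authors" {
    title = "Author"
    type  = "select"
    sql   = <<EOQ
select
  author as value,
  author as label
from infoworld_authors()
EOQ
  }

  graph {
    title = "Author, articles, and categories"

    # The node and edge blocks defined above go here.
  }
}
```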
You can flow the same data into dashboard charts and tables.

SQL-powered benchmarks and controls

Powerpipe is also the engine that powers suites of compliance benchmarks, again for AWS, Azure, GCP, Kubernetes, and others. Here’s a benchmark that validates the image URLs embedded in the extensions field of InfoWorld RSS feeds.

And here’s how it’s defined. The benchmark includes one control that runs a multistep query to select feed URLs for authors, drill into each item’s media tag, then check the HTTP response code for each URL.

control "image_urls" {
  title = "Image URLs"
  sql   = <<EOT
with authors as (
  select
    author,
    'https://www.infoworld.com/author/' || author || '/index.rss' as feed_link
  from infoworld_authors()
  limit 1
),
image_urls as materialized (
  select
    a.author,
    r.extensions -> 'media' -> 'content' -> 0 -> 'attrs' ->> 'url' as url
  from authors a
  join rss_item r using (feed_link)
),
response_codes as (
  select
    i.author,
    n.url,
    n.response_status_code
  from net_http_request n
  join image_urls i using (url)
)
select
  url as resource,
  case when response_status_code = 200 then 'ok' else 'alarm' end as status,
  case when response_status_code = 200 then 'image found' else 'image not found' end as reason,
  substring(url from 1 for 150),
  author
from response_codes
EOT
}

benchmark "feeds" {
  title    = "Check InfoWorld feeds"
  children = [
    control.image_urls
  ]
}

Notable points here:

- The feed_link column returned from the first CTE (common table expression) joins with its counterpart in the RSS plugin to fetch the feed for each author.
- Because the back end is Steampipe, which is Postgres-based, the Postgres JSONB operators are available to drill into the media tag and extract the URL.
- Each url then joins with the corresponding column of the Net plugin (an HTTP client wrapped as a database table!) to check the response code.
- A control is just a SQL query that returns the required columns status, reason, and resource.
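Since a control is just a query returning status, reason, and resource, one written against a native table in your own database can be very small. This sketch assumes a hypothetical users table with id and email columns:

```hcl
# A minimal control over native data. The users table and its
# id and email columns are illustrative assumptions.
control "users_have_email" {
  title = "Every user has an email address"
  sql   = <<EOT
select
  id as resource,
  case when email is not null then 'ok' else 'alarm' end as status,
  case when email is not null
    then 'email present'
    else 'email missing'
  end as reason
from users
EOT
}
```

You would then point Powerpipe at the database that holds the table, for example with something like powerpipe control run users_have_email --database followed by your connection string.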
You can think of a control as a unit test for data, with Powerpipe as the test runner. These controls typically support standard compliance suites: CIS, FedRAMP, GDPR, HIPAA, NIST, PCI, SOC 2, and more. Mods like AWS Compliance offer broad and deep support for these, building on the equally broad and deep API coverage provided by Steampipe plugins like the one for AWS. But you can build benchmark mods to validate any kind of data, wherever it lives: in ephemeral tables populated by cloud services via plugins, or in your own databases as native tables.

Dashboards and benchmarks as code

Whether you’re visualizing data with interactive tables, charts, and graphs, or validating data using controls, the model is the same. You use SQL queries to acquire the data and HCL widgets to display it, with live editing in both cases. The code lives in packages called mods that you can install, create, and remix. Like the Steampipe product from which it was decoupled, Powerpipe is a single binary that you can run locally, in a cloud VM, or in a CI/CD pipeline. And like Steampipe, it’s available in hosted form at Turbot Pipes, where you can collaborate with your team and share snapshots of dashboards and benchmarks. The existing suites of Powerpipe dashboards and benchmarks focus on what devops folk need most: a standard query language, with live access to cloud APIs, embedded in HCL wrappers that live in repositories along with the rest of your managed code. That’s the sweet spot, but with the unbundling of Steampipe you can now use the same technologies more broadly.