A Hybrid Analytic Schema Using Snowflake DB and Looker — Chapter 1

Flexible and 200x faster queries with no funny business

Jeff Shukis
4 min readMar 15, 2021

Abstract

My goal is to introduce you to a novel analytic database schema that solves a problem commonly found as analytic database schemas evolve and scale. First I’ll describe the use case, then I’ll outline a typical analytic schema evolution, showing how problems arise. Then I’ll introduce a new style of Hybrid database schema using Snowflake DB, with a corresponding meta-data model in Looker, showing 200x better query performance without sacrificing flexibility.

Why are we so bad at analytic schemas?

Analytics — the use of math to answer business questions — has long been a favorite topic of mine. From ETLT to analytic databases to BI tools and visualizations, I’m a fan. And so are LOTS of other people these days; when was the last time you saw a resume not sprinkled with “big data” and “analytics” keywords?

But holy cow does market demand exceed the number of knowledgeable workers, the result of which is a plethora of square pegs mashed into round holes and, ultimately, less than successful projects. Perhaps the most cringe-worth mistake I see is the “all toolchain and no schema” anti-pattern where, at the start of the project, the team selects an enormous and resume-enhancing set of fully buzzword-compliant tools, and quickly starts coding — because “agile”, because the tools can do “anything”, and because “data lake” means no need to design a good reporting schema.

But the too-common result is a reporting database schema whose enormous flaws are “iteratively addressed” until the result is a still-enormously-flawed schema with a pile of partially effective band-aid improvements, all so baked into the project that there’s just no time for a redo. Such schemas live on for years!

A Step Back — How Schemas Go Wrong

BI tools, whether Tableau, Looker, PowerBI or other, work best with well-structured database schemas. Those schema also tend to be easier for business users to understand. The venerable Star Schema is the classic example — you have read Kimball, right? These days, however, our data sets often live in the “Data Lake” where schemas are fluid, quirky, or — perhaps worst of all — entirely optional. The traditional approach is to “ETL” this ropey source data into an orderly Star Schema, which makes the BI analyst happy but is murder on your ETL team given the murky Data Lake provenance. When that proves too rigid and too brittle the data team often “pivots” to a more abstracted and much finer grained schema. Finer-grained schemas increase flexibility and can reduce fragility, but sacrifice understandability and ruin performance.

True story: I recently saw a BI query that consumed almost a Terabyte of database RAM to run a report that analyzed under a million business events… which had been decomposed into around half a billion database rows in an analytic schema. But oh boy was that schema flexible!

Modern computing significantly insulates us from the horrors of a sub-par reporting schema, at least for a while. The customer mentioned above actually had a Terabyte of database RAM to throw at this report, for example. Analytic databases today have dramatically more CPU, RAM and I/O than they did even a few years ago, so a schema that is even two orders of magnitude too complex may still be considered “acceptable” as measured by query response time, at least initially. But as the data grows, it soon uses up all of our big cloud database resources, and once again we are fighting about performance and understandability versus flexibility.

Good News — It’s Solvable

But this is a happy story about not always having to make the tradeoff between flexibility and performance. The tale takes the form of a technical proof of concept that shows Snowflake and Looker, plus a novel hybrid analytic schema, used to deliver some surprisingly good results. Good results as in 200x faster queries.

Because the Internet favors brevity, I have broken up the proof of concept into six parts:

Full disclosure: I work for neither Snowflake nor Looker, and neither company knows anything about this article as of the publication date — and that is on purpose.

A Quick Preview of Results:

Standard Schema: 370 second queries, but lots of flexibility!
Snowflake Hybrid Schema: Equally flexible, still no indexes, no materialized views, and no funny business, but ~2 second queries.

Next Chapter

Chapter Two: The Contact Center Analytics Use Case

--

--