A Hybrid Analytic Schema Using Snowflake DB and Looker — Chapter 6

Adding Looker for dynamic code-generated BI

At this point in building our contact center analytics solution, we have a dynamically updated database schema in Snowflake and are ready to bring in our chosen BI tool. And we aren't going to use just any BI tool: we are going to use Looker. Among the Looker features that are valuable to us are:

  1. SQL generation against a metadata model, with query push-down to the database, a combination that is a great fit with Snowflake and the other cloud-scale analytic databases. Contrast this with Tableau, which until early 2020 preferred to do all the joins itself, all the time, and pushed little to none of the SQL down to the database. Tableau is AWESOME in general, and generates more user loyalty than any other BI tool, and for good reason, but it is not a great fit for this dynamic schema use case.
  2. A nice logical model layer, in which you, the model creator, can turn an obscure machine-readable set of database tables into a human-readable logical schema for easier reporting. It takes me back to the good parts of OBIEE, but is dramatically easier to use.
  3. Configuration that is done almost entirely by writing human-readable config files and checking them in to Git. Seriously, Looker development is based on source code control, which means that you can generate and automate almost anything you want. This becomes very important later in our example.

When building with Looker, the analyst first goes into Development Mode and creates a new "Project" with "Model" and "View" config files, defined using LookML, a deceptively simple text-based markup language. The exact process in Looker is outside our scope. What the analyst is doing is both "teaching" Looker what the physical schema looks like, including where the joins are, and also creating a more human-readable logical reporting schema on top of the physical schema, with the ability to rename and move columns for better understandability (and we like understandability). The important part is that these Model and View LookML files are human readable, and are managed entirely via source code control in Git. You make a LookML change, check it in, and then deploy it. As an example, below is a .model file for our V2 "Scorecarding" schema. Note that this model file has been stripped down to its basics, possibly so far that it would no longer run well as-is, but the stripped-down version illustrates the important concepts and is easier to understand than the complete version.

# This Model file is fact_score.model

# Referencing a Looker named DB connection object
connection: "my_named_Snowflake_connection"

# Include all the .view files from our Project.
# Views, among other things, define mappings to physical tables.
include: "/views/*.view"

# Define an explore, which starts with a description of our Snowflake schema and its joins.
# fact_score is the central fact table in our star. The name fact_score references an object
# of that name in a .view file that, among other things, points to a physical table in Snowflake.
# The join directives describe joins to our dimension tables, each dimension also being
# described in a .view file.
explore: fact_score {
  from: fact_score
  label: "Scorecard Details"

  join: date {
    sql_on: ${fact_score.call_init_date_key} = ${date.date_key} ;;
    relationship: many_to_one
    type: left_outer
  }

  join: time {
    sql_on: ${fact_score.call_init_time_key} = ${time.time_key} ;;
    relationship: many_to_one
    type: left_outer
  }

  join: agent {
    sql_on: ${fact_score.agent_key} = ${agent.agent_key} ;;
    relationship: many_to_one
    type: left_outer
  }

  join: caller {
    sql_on: ${fact_score.caller_key} = ${caller.caller_key} ;;
    relationship: many_to_one
    type: left_outer
  }

  join: scorecard_question {
    sql_on: ${fact_score.scorecard_question_key} = ${scorecard_question.scorecard_question_key} ;;
    relationship: many_to_one
    type: left_outer
  }
}

The above LookML .model file defines the joins between a central logical fact table, fact_score, and five dimension tables. Names like "fact_score", "date", and "caller" are actually references to Looker objects defined in .view files that are part of the Project. In our case we have view files like fact_score.view and caller.view defined in the Project and included via the "include" directive. The Model file defines joins, but says nothing about the columns. To define columns, we need those .view files. Below is a .view file for our fact table, in this case stripped way down to its basics for simplicity:

# Why another include? It will make sense later
include: "/generated_files/*.view"

view: fact_score {
  sql_table_name: "mydb"."myschema"."FACT_SCORE" ;;
  view_label: "Measures"
  extends: [category_measures, metric_measures, metadata_measures]

  dimension: call_init_date_key {
    type: number
    hidden: yes
    sql: ${TABLE}."CALL_INIT_DATE_KEY" ;;
  }

  dimension: call_init_time_key {
    type: number
    hidden: yes
    sql: ${TABLE}."CALL_INIT_TIME_KEY" ;;
  }

  dimension: agent_key {
    type: number
    hidden: yes
    sql: ${TABLE}."AGENT_KEY" ;;
  }

  dimension: caller_key {
    type: number
    hidden: yes
    sql: ${TABLE}."CALLER_KEY" ;;
  }

  dimension: call_key {
    type: number
    hidden: yes
    sql: ${TABLE}."CALL_GUID" ;;
  }

  dimension: scorecard_question_key {
    type: number
    hidden: yes
    sql: ${TABLE}."SCORECARD_QUESTION_KEY" ;;
  }

  measure: question_score {
    type: average
    label: "Score"
    value_format_name: percent_1
    description: "Average score earned for this question. Presented as a percentage."
    sql: ${TABLE}."QUESTION_SCORE" ;;
  }
}

In this .view file you can see some dimension columns and a measure column defined. This isn't a Looker tutorial, of course, but as you can see, LookML is just text, and can be as easily generated as typed, which is what we will do soon. But first we need just one more Looker concept: extends. Near the top of the above file are two directives that need explaining. Here are the first few lines of that .view file again:

include: "/generated_files/*.view"

view: fact_score {
  sql_table_name: "mydb"."myschema"."FACT_SCORE" ;;
  view_label: "Measures"
  extends: [category_measures, metric_measures, metadata_measures]

The extends directive says that this view "extends" three other .view definitions, which in Looker terms means that the dimension and measure columns from those three views get appended to this view. Each of those three views is defined in a .view file in the /generated_files folder of the Project; those files are brought into scope by the include directive at the top. And, spoiler alert, the LookML in those files is auto-generated by our code, which means that our Looker schema is automatically kept up to date as our JSON schema changes. For completeness, here is one of the three generated .view files:

# This file is automatically generated by our code.
# DO NOT manually edit this file in any way. You can utilize the view defined
# here by creating a Looker view that extends this one.

view: categories {
  extension: required

  # Dimensions

  dimension: AskProbingQuestions {
    type: string
    group_label: "Categories"
    group_item_label: "AskProbingQuestions"
    sql: CASE ${TABLE}.CALL_JSON:categoriesMap.AskProbingQuestions WHEN '1' THEN 'Yes' WHEN '0' THEN 'No' ELSE NULL END ;;
  }

  dimension: Complaints {
    type: string
    group_label: "Categories"
    group_item_label: "Complaints"
    sql: CASE ${TABLE}.CALL_JSON:categoriesMap.Complaints WHEN '1' THEN 'Yes' WHEN '0' THEN 'No' ELSE NULL END ;;
  }

  dimension: EmpathyOnDissatisfaction {
    type: string
    group_label: "Categories"
    group_item_label: "EmpathyOnDissatisfaction"
    sql: CASE ${TABLE}.CALL_JSON:categoriesMap.EmpathyOnDissatisfaction WHEN '1' THEN 'Yes' WHEN '0' THEN 'No' ELSE NULL END ;;
  }

  dimension: GreetCaller {
    type: string
    group_label: "Categories"
    group_item_label: "GreetCaller"
    sql: CASE ${TABLE}.CALL_JSON:categoriesMap.GreetCaller WHEN '1' THEN 'Yes' WHEN '0' THEN 'No' ELSE NULL END ;;
  }

  dimension: RecordingDisclosure {
    type: string
    group_label: "Categories"
    group_item_label: "RecordingDisclosure"
    sql: CASE ${TABLE}.CALL_JSON:categoriesMap.RecordingDisclosure WHEN '1' THEN 'Yes' WHEN '0' THEN 'No' ELSE NULL END ;;
  }

  dimension: TakeOwnership {
    type: string
    group_label: "Categories"
    group_item_label: "TakeOwnership"
    sql: CASE ${TABLE}.CALL_JSON:categoriesMap.TakeOwnership WHEN '1' THEN 'Yes' WHEN '0' THEN 'No' ELSE NULL END ;;
  }

  dimension: ValueProposition {
    type: string
    group_label: "Categories"
    group_item_label: "ValueProposition"
    sql: CASE ${TABLE}.CALL_JSON:categoriesMap.ValueProposition WHEN '1' THEN 'Yes' WHEN '0' THEN 'No' ELSE NULL END ;;
  }
}

view: category_measures {
  extension: required

  # Measures

  measure: AskProbingQuestions_avg {
    type: average
    group_label: "Categories"
    group_item_label: "AskProbingQuestions AVG"
    sql: ${TABLE}."CALL_JSON":"categoriesMap"."AskProbingQuestions"::int ;;
  }

  measure: Complaints_avg {
    type: average
    group_label: "Categories"
    group_item_label: "Complaints AVG"
    sql: ${TABLE}."CALL_JSON":"categoriesMap"."Complaints"::int ;;
  }

  measure: GreetCaller_avg {
    type: average
    group_label: "Categories"
    group_item_label: "GreetCaller AVG"
    description: "Average of the Category GreetCaller across calls, where TRUE = 1 and FALSE = 0"
    sql: ${TABLE}."CALL_JSON":"categoriesMap"."GreetCaller"::int ;;
  }

  measure: RecordingDisclosure_avg {
    type: average
    group_label: "Categories"
    group_item_label: "RecordingDisclosure AVG"
    description: "Average of the Category RecordingDisclosure across calls, where TRUE = 1 and FALSE = 0"
    sql: ${TABLE}."CALL_JSON":"categoriesMap"."RecordingDisclosure"::int ;;
  }

  measure: TakeOwnership_avg {
    type: average
    group_label: "Categories"
    group_item_label: "TakeOwnership AVG"
    description: "Average of the Category TakeOwnership across calls, where TRUE = 1 and FALSE = 0"
    sql: ${TABLE}."CALL_JSON":"categoriesMap"."TakeOwnership"::int ;;
  }

  measure: ValueProposition_avg {
    type: average
    group_label: "Categories"
    group_item_label: "ValueProposition AVG"
    description: "Average of the Category ValueProposition across calls, where TRUE = 1 and FALSE = 0"
    sql: ${TABLE}."CALL_JSON":"categoriesMap"."ValueProposition"::int ;;
  }
}

This particular LookML .view file contains entries for each of our Categories. And remember that we know about each and every Category in Snowflake because of our JSON inspection code that auto-generates our SQL views. So it is a rather easy step to imagine auto-creating this LookML by iterating over the view definition in Snowflake. The code that generates these .view LookML files is fairly simple and could be written in any programming language. The two versions that I created happen to use Pentaho Data Integration (aka Kettle) and Python, respectively. The Kettle job refreshes the database views by running my custom equivalent of Craig's stored procedure three times (once for each generated view), reads the DDL of the views created, generates the three LookML files, checks them into a Git development branch, and then (assuming no conflicts) commits and pushes to Looker, which makes the new model live on the Looker server. The code could be run periodically on a schedule, but in a production system it is event based, triggered whenever the Snowflake views change.
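To make that concrete, here is a minimal Python sketch of the generation step for the Categories dimensions, under some assumptions of my own: it uses the snowflake-connector-python and GitPython packages, reads the column list of a hypothetical generated view named CATEGORIES_VW from INFORMATION_SCHEMA rather than parsing the view DDL, and writes a hypothetical generated_files/categories.view file before pushing a development branch that Looker deploys from. The real Kettle and Python versions also handle the other two generated views, the measures as well as the dimensions, and proper error handling.

# A minimal sketch, not the production generator. Assumes snowflake-connector-python
# and GitPython. The view name CATEGORIES_VW, the repo path, and the branch name are
# all hypothetical placeholders.
import snowflake.connector
from git import Repo

# LookML emitted for each Category column, matching the pattern shown above.
DIMENSION_TEMPLATE = """  dimension: {name} {{
    type: string
    group_label: "Categories"
    group_item_label: "{name}"
    sql: CASE ${{TABLE}}.CALL_JSON:categoriesMap.{name} WHEN '1' THEN 'Yes' WHEN '0' THEN 'No' ELSE NULL END ;;
  }}

"""

def category_columns(conn) -> list[str]:
    """Read the column names of the generated Snowflake view, one per Category."""
    cur = conn.cursor()
    cur.execute(
        "SELECT column_name FROM mydb.information_schema.columns "
        "WHERE table_schema = 'MYSCHEMA' AND table_name = 'CATEGORIES_VW' "
        "ORDER BY ordinal_position"
    )
    return [row[0] for row in cur.fetchall()]

def render_lookml(columns: list[str]) -> str:
    """Build the full categories.view file as a single string."""
    header = (
        "# This file is automatically generated by our code.\n"
        "# DO NOT manually edit this file.\n\n"
        "view: categories {\n"
        "  extension: required\n\n"
    )
    body = "".join(DIMENSION_TEMPLATE.format(name=c) for c in columns)
    return header + body + "}\n"

def publish(lookml: str, repo_path: str = "/looker/scorecard_project") -> None:
    """Write the file into the Looker project repo, commit, and push the dev branch."""
    rel_path = "generated_files/categories.view"
    with open(f"{repo_path}/{rel_path}", "w") as f:
        f.write(lookml)
    repo = Repo(repo_path)
    repo.index.add([rel_path])
    repo.index.commit("Regenerate categories.view from Snowflake view columns")
    repo.remote("origin").push("dev-generated")  # Looker deploys from this branch

if __name__ == "__main__":
    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="...", warehouse="my_wh"
    )
    publish(render_lookml(category_columns(conn)))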

Stepping back a bit to summarize: when the JSON data structure changes through the addition, removal, or alteration of attributes, process A updates the Snowflake views, and then process B updates the Looker metadata model. Combined, the two processes give us automatic maintenance of a highly dynamic schema from end to end.
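As a rough illustration of how the two processes can be tied together, here is a small orchestration sketch. It stands in for process A with a call to a hypothetical stored procedure (a placeholder for the view-refresh logic described above), reuses the category_columns, render_lookml, and publish functions from the previous sketch as process B, and only regenerates the LookML when the projected column set actually changes. A production deployment would react to an event from the load pipeline rather than polling on a timer.

# A minimal end-to-end orchestration sketch, not production code. The stored procedure
# name and connection details are hypothetical; process B reuses the functions from
# the previous sketch.
import time
import snowflake.connector

def refresh_snowflake_views(conn) -> None:
    """Process A: rebuild the generated SQL views from the current JSON attribute set,
    here via a hypothetical stored procedure standing in for the refresh logic above."""
    conn.cursor().execute("CALL mydb.myschema.BUILD_DYNAMIC_VIEWS()")

if __name__ == "__main__":
    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="...", warehouse="my_wh"
    )
    last_columns = None
    while True:
        refresh_snowflake_views(conn)        # process A
        columns = category_columns(conn)     # column list of the generated view
        if columns != last_columns:          # the attribute set actually changed
            publish(render_lookml(columns))  # process B: regenerate and push the LookML
            last_columns = columns
        time.sleep(600)  # simple polling; production is event-driven off the load pipeline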

At this point our analysts can update their Looker reports as needed to incorporate the new or changed data elements. After all, adding an Agent Quality Category, for example, is a relatively infrequent business event accompanied by lots of business discussion, which means plenty of time to make small report changes. But if we want to get really fancy, we can use code generation to maintain the reports as well, a neat trick but not demonstrated here. It is unlikely that we will be able to generate all or even most of our reports, given that reports encapsulate business requirements more than they mirror a mere database schema, but even the theoretical ability to generate at least some reports is nonetheless quite exciting.

In review, here is what we have done:

  1. Leveraged Snowflake to create a database schema that does not need to change when our source data structure changes by the addition or subtraction of data elements.
  2. Automatically maintained database views in that schema that project the dynamic JSON schema as nice, simple rows and columns for reporting, without sacrificing query performance.
  3. Automatically maintained a Looker model to match the database schema.
  4. If desired and where possible, automatically maintained Looker reports to match the Looker model.

And most importantly, we have delivered a very understandable star schema AND we have maintained one database row per call (1:1), in spite of having hundreds or thousands of potentially dynamic data elements per call, ensuring very good query performance without the need for exotic hardware. Maintaining one row per call means that our queries still run in roughly two seconds, even with, say, a thousand dynamic attributes.

Note that both halves of this solution are required for it to be an appealing one. A very flexible, dynamic, and highly performant schema that nonetheless required tons of manual work in the BI layer whenever the attribute set changed would be highly annoying. Taken together, though, the two halves are quite exciting.