A Hybrid Analytic Schema Using Snowflake DB and Looker — Chapter 2

Jeff Shukis
4 min readMar 15, 2021

The Contact Center Analytics Use Case

This is Chapter 2 in the series. Chapter 1, which introduced the topic, is here.

To demonstrate our “Hybrid” analytic schema we could select almost any use case in almost any industry — data is data after all. But the Contact Center is an area I know well, so I’ll use it as our example use case.

Contact centers are those places you reach out to in order to get support, place an order, or — far more likely — complain about something gone wrong. Contact Centers these days handle texts, chats, emails, and web inquiries in addition to phone calls, and contact center analytic schemas have grown to handle all of these forms of contact, usually by re-labeling “calls” to something like “interactions” or “touches”, but that generalization makes the database schema a bit difficult to understand for those outside of the industry, so we will stay simple. Our example analytic schema will analyze “Calls” in a “Contact Center”. For each call, we’ll include:

  • Metadata like the date/time of the call, lots of agent information, and even more caller information
  • Any number of “Scorecards and Questions” used to evaluate agent performance
  • Facts generated by queries, classification algorithms, and/or added manually by humans to further enrich the calls with delicious reporting dimensions — let’s call them “Metrics” and “Call Categories” since those are familiar concept in the Contact Center.

The end result is that each call includes dozens to a few thousand reportable attributes, with the exact set varying over time and even from call to call — an example of data that is sparse and very dynamic.

Example Call Data

I like to understand a complex data problem by first understanding its most basic data elements. Along that line of thought, here is an example of data from one (entirely fabricated) call — in JSON format. If some of the field names looks a bit familiar, then you are probably an Amazon Connect user, but the concepts are universal, even if the field names are specific. Note that real calls in real systems often have 10x or even 100x more data than in this simplified example:

Example data for one call:{
"call_id": "e8fb71d2-4ecd-40d5-a02f-6cd101c544ae",
"metadata": {
"call_length": 516956,
"AWSAccountId": "6818483976764",
"AWSContactTraceRecordFormatVersion": "2020-03-10",
"Agent": {
"ARN": "arn:aws:connect:us-east-1:6858483776764:instance/491e99fe-a350-45da-7026-edf09bd02c44/agent/546bc8dc-51e5-4275-b187-0b94f5bc3ddc",
"AfterContactWorkDuration": 41,
"AfterContactWorkEndTimestamp": "2018-05-19T18:43:58Z",
"AfterContactWorkStartTimestamp": "2018-05-19T18:43:58Z",
"AgentInteractionDuration": 516,
"ConnectedToAgentTimestamp": "2018-05-19T18:34:40Z",
"CustomerHoldDuration": 28.6,
"LongestHoldDuration": 28.6,
"NumberOfHolds": 1,
"Username": "Jeff Shukis",
"AgentTenure": "17",
"AgentManager": "Franz Kafka",
"ContactCenter": "Prague"
},
"AgentConnectionAttempts": 1,
"Attributes": {},
"Channel": "VOICE",
"ConnectedToSystemTimestamp": "2020-03-10T18:30:17Z",
"ContactId": "bd609dee-fbc0-8056-922b-91dcd0e2395a",
"CustomerEndpoint": {
"Address": "+5556726349",
"Type": "TELEPHONE_NUMBER"
},
"CallerName": "Chris Smythe",
"DisconnectTimestamp": "2020-03-10T18:43:17Z",
"InitiationMethod": "INBOUND",
"InitiationTimestamp": "2020-03-10T18:30:17Z",
"InstanceARN": "arn:aws:connect:us-east-1:353569984171:instance/da44dd83-a350-45da-8056-edf09bd02c44",
"LastUpdateTimestamp": "2020-03-10T18:44:00Z",
"Queue": {
"ARN": "arn:aws:connect:us-east-1:6858483776764:instance/da44dd83-a350-45da-8056-edf09bd02c44/queue/883f69f2-02a7-46af-89e0-db45904b75a1",
"DequeueTimestamp": "2020-03-10T18:34:40Z",
"Duration": 261,
"EnqueueTimestamp": "2020-03-10T18:30:19Z",
"Name": "RetentnionQueue"
},
"Recording": {
"Location": "connect-68584237176764/CallRecordings/2020/03/10/da44dd83-03a5-4aa1-a9b5-3f35f41110bd_20200310T17:16_UTC.wav",
"Status": "AVAILABLE",
"Type": "AUDIO"
},
"SystemEndpoint": {
"Address": "+5557666612",
"Type": "TELEPHONE_NUMBER"
}
},

"Scorecards": {
"Script Adherence": {
"greeting": 100,
"identification": 100,
"discovery": 90,
"verification": 65,
"wrap Up": 50
},
"Objection Handling": {
"price objection observed": 1,
"price objection handled": 0
},
"Lead Scoring": {
"initial lead score": 15
},
"Compliance": {
"required disclosures": 100,
"affirmative acknowledgement": 100
}
},
"Categories": {
"GreetCaller": 1,
"AskProbingQuestions": 0,
"TakeOwnership": 1,
"ValueProposition": 1,
"RecordingDisclosure": 1,
"Cancellation": 0,
"Complaints": 0,
"EmpathyOnDissatisfaction": 1
},
"metrics": {
"agent_intra_call_change_in_talk_rate": 1,
"caller_talk_rate": 135,
"average_streak": 66.5,
"agent_median_streak": 78,
"caller_talk_ratio": 0.292138,
"caller_intra_call_change_in_talk_rate": 0.287735,
"agent_to_caller_talk_rate_ratio": 1.182248,
"silence_incidents": 22,
"caller_median_streak": 82,
"agent_talk_ratio": 0.707862,
"silence_ratio": 0,
"agent_talk_rate": 160,
"caller_overtalk_incidents": 5,
"overtalk_ratio": 0,
"agent_overtalk_ratio": 0.108712,
"caller_overtalk_ratio": 0,
"agent_overtalk_incidents": 3,
"overtalk_incidents": 8,
"agent_intra_call_change_in_pitch": 0.470889,
"agent_intra_call_change_in_relative_voice_volume_energy": 0.547736,
"caller_intra_call_change_in_relative_voice_volume_energy": 0.646344,
"caller_relative_voice_volume_energy": 1.517835,
"agent_relative_voice_volume_energy": 3.350217,
"caller_intra_call_change_in_pitch": 0.442852,
"call_sentiment": 0.730396,
"caller_intra_call_change_in_sentiment": 0.408571,
"agent_sentiment": 0.730396,
"call_change_in_sentiment": 0.436955,
"caller_sentiment": 0.850829,
"agent_intra_call_change_in_sentiment": 0.465339
}
}

So that’s our per-call data data structure. In addition, our system will also include an audio recording of the call and a text transcript. But our analytics will be performed using the data elements above.

Just to reiterate, a given call may have exactly the fields shown in the example above, or fewer fields, or there may be additional fields. And as in many domains, the set of fields tends to change over time, especially the business-defined ones as opposed to the infrastructural ones.

Reporting and Analytic Requirements

In a real project we’d spent quite a bit of time defining our reporting requirements — and in fact those requirements would largely drive our reporting schema. For this series of articles, however, we’ll just stipulate that we plan to use BI and dashboard tools to report on various metrics and their trends over time, sliced and diced by various customer attributes, agent and contact center attributes, and the computed facts and attributes in the call — all of the data defined in the JSON — and that all of the attributes are equally valuable. This is not helpful in real life, but it is helpful to keep this part of the article shorter.

Next Chapter

Now let’s design an analytic and reporting database schema.

Chapter Three: Star schema goodness for basic reporting

--

--