Tableau & Snowflake: Flattening the Curve at Human Scale

Posted by

If the mission is to “Help people see and understand data” and to “help every organization to be data-driven” then we need to remember the people aspect of the mission for data.  An overwhelming amount of COVID-19 numbers, statistics and percentages have been coming at us from so many authorities, news agencies and fellow data geeks. It’s all too much to make sense of and most of the analysis has lost the human scale.

This morning as I was sipping on my homebrew coffee, and I was thinking about how to change a compound growth rate from a percentage into a human understandable metric.  We need to apply the Rule of 70 for population growth, if you simply take 0.70 and divide by the growth rate percentage by day you can get the number of days it will take cases to double. With the power of Tableau and Snowflake you can slice by Country and State with mere clicks of the mouse to get growth rates relative to your situation.  I am not a medical expert, but I wanted to just make this all more digestible and impactful to the people I love, work with, connect with, interact with and those in my local, state and global community.

(Corrected 26-Mar-2020 8:45PM PST to use 0.70 applying the Rule of 70 instead of 1 in the numerator, using 1 did not account for compounding.)

This one chart attempts to take the growth rate % and turn the analysis into the days cases are doubling.  The higher the number, the longer it takes to double and we flatten the curve.

Click photo to go to live Tableau Public dashboard

What Data Set Are You Using?
I am using this data set from John Hopkins University and Starchema that goes by day, country and state/province:  I will try to update my numbers daily here and on Tableau Public.

Getting and querying information from the Snowflake Data Exchange is dead simple and built on a very powerful premise. After 3 clicks of the mouse I have data that is ready for analysis and always in sync with Starschema’s latest updates. This means I have instant access to a single-source of truth on COVID-19 cases as reported by public health authorities around the world.

The COVID-19 Epidemiological Data currently includes:

  • Global COVID-19 Cases by Country/State/County
  • Daily WHO situation reports
  • U.S. testing data (positive, negative, pending)
  • U.S. hospitalization, recovery, intensive care data
  • U.S. policy data: state actions, policy changes

Thank you Kelly for inspiring work on writing up how to do this here. If you use Tableau Public to create your own dashboard and publish it to Tableau Public, be sure to tag it so we can find it:

  • #COVID19
  • #Snowflake
  • #Starschema
  • #PoweredByTheDataExchange

Thank You
Massive thank yous are in order for the #datafam and the inspiring folks at Starschema, Tableau and Snowflake that want to do something with the medium they love amongst the context we are living.  So glad to have met so many fellow data geeks over the years.

Thank you Starschema for taking the John Hopkins University Data set and pushing it into the Snowflake Data Exchange on a daily basis: GitHub for a more in-depth description of the data included:

Thank you Tableau for the toolkit you have built, Tableau Public and free software for anyone that wants to analyze this data set:

Thank you Snowflake to offering this data globally in the Data Exchange and free trial accounts for those that want to analyze this data set with any analytical tool they can bring to bear:

The Workbook & Calculations
The interactive workbook is here:

Click photo to go to live Tableau Public dashboard
Taking it to a different level with a Hans Rosling Inspired Quadrant over Time GIF

Running Days Since Case ‘N’
I wanted to level each country in the starting blocks to when they crossed ‘N’ cases. I did not want to be tied to the calendar as so many countries are in different phases of this journey.  I first added an index() calculation to Columns to get to number of days. Then created a parameter for N Cases input from 100 to 1000 by 100. The only complex calculation is for is the Day I am looking at greater than N Cases?  I then added this to the filter shelf and only kept true values.

//Is Day > N Cases
SUM([Cases]) > [N Cases]

Sum of Cases & Difference in Cases
I added the country to color and only selected the ‘United States’, ‘Italy’, and ‘United Kingdom’ due to my own affected friends and family.  You can play with filters and N Cases to your heart’s content to see your scenarios with the Dashboards on Tableau Public (see above for link). I then added SUM(Cases) and another copy of SUM(Cases) to the Rows shelf.  That second copy was for creating a Difference From table calculation for getting the daily case shift.

Running N Days % Change
My initial goal was to get to the daily change rate and be able to smooth it by a factor of ‘N’ running days.  Getting to this equation is just a few steps in Tableau. First I added yet another copy of SUM(Cases) to the Rows shelf. Then I opened the add table calculation dialogue and made it a Moving Calculation for the previous 3 Days, for Specific Dimensions Date.  Further I added a secondary calculation, selected Percent Difference From for Specific Dimensions Date. This gives me the static equation for a running 3 days.

I then added this calculation to my Measures by dragging and dropping it in from the Rows shelf.  I Created a Parameter for N Days Running and edited the new calculation I just drug to the Measures. Once in edit mode for the new calculation, I called it Running N Days % Change and changed the -3 to use the parameter for N Days Running.

//Running N Days % Change
(ZN(WINDOW_AVG(SUM([Cases]), -[N Days Running], 0))
LOOKUP(ZN(WINDOW_AVG(SUM([Cases]), -[N Days Running], 0)), -1))
ABS(LOOKUP(ZN(WINDOW_AVG(SUM([Cases]), -[N Days Running], 0)), -1))

The Title
In order to not mislead anyone on my graph I added in the N Cases and N Days Running values to the title.

The Days for Doubling Calculation
Now the goal was to turn the Running N Days % Change into a human scale metric.  Once I took 0.70 applying the Rule of 70 in population growth and divided by the growth rate percentage to get the number of days cases take to double at the current growth rate I knew I had something digestible (yes that’s the whole equation).  I used -0.70 to put the worst performing countries up and to the right in my graph as all the other visualized metrics follow the same pattern.

I instantly called on one of my colleagues working with the CA Government to show them that Social Distancing & Working from Home does help flatten the curve, you could see it right there in the data!

Bonus Calculation: Date of N Cases
Thank you to my colleague Kelly (Tableau Public Profile) for this nesting LOD Logic is brilliant. The calculation finds the date N Cases was reached for the Country. The inner LOD returns the total cases confirmed for a date for a country the outer LOD returns the minimum (first) date the country case count for confirmed cases passes the N Cases parameter.

//Date of N Cases (Country)
{FIXED [Country Region]: 
    MIN(IIF([Case Type]='Confirmed' 
    AND {FIXED [Country Region], [Date]: 
        SUM(IIF([Case Type]='Confirmed',[Cases],0))
        }>=[N Cases],[Date], null))

You too can make a difference and do more than social distancing, working from home and sheltering in place with all of these free tools and data.  Just remember who your audience is, why you are doing this and make your metrics at the human scale.  Happy Querying!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s