Snowflake: Dynamic IN() Clause & Session Variables

Posted by

Have you ever needed to program a dynamic IN() clause in Snowflake and found the process daunting in SQL? It’s fairly easy in JavaScript stored procedures, but it can be daunting in the SQL editor. I found myself needing to write and understand dynamic in lists to populate custom sql clauses from Sigma this week. I usually use the Snowsight UI to build my custom SQL then move it into Sigma as my process. I was planning on sending a comma separated list of values from a Sigma dataset parameter, but when it came time to simulate the parameter value in Snowflake with a session variable I found the task more daunting that anticipated. Well I just had to crack this SQL nut. TLDR: Gist.

Let’s Get Started with Overflow Session Variables
We are going to use “snowflake_sample_data.tpcds_sf10tcl.customer” so everyone with Snowflake can follow along, execute the SQL and understand the concept. I first grabbed a random list of 10 single quoted comma separated emails from the customer dimension and the full list is 296 bytes long. When we try to assign that list to a session variable we get a more than 255 character warning. Not an issue we can break up the list into multiple parts and concatenate it together to the full list. Please be careful with extra commas if you are typing these in.

-- 02.01 IN(List)
-- Let's use email address from customer, grab 10
select distinct char(39) || c_email_address || char(39) || char(44) from snowflake_sample_data.tpcds_sf10tcl.customer limit 10;

-- Check in list length
select length($$
'Jennifer.Hernandez@rGg8bM.edu',
'Marvin.Sullivan@TY7ObD.org',
'Brian.Murray@yiV6ibbUzb.edu',
'Michael.Morales@kA.com',
'Theresa.Turner@XK0Y42DaHK.edu',
'Rosie.Wagner@FaLJjg6Ql4RMT.edu',
'Janine.Jones@IniTA.org',
'Kay.Pearson@gSPx.edu',
'Charlene.Miller@tqBZQrBth.org',
'Tony.Burch@bsDfZOQ3XVnVGPdn.com'
$$) as char_length;

-- 255 Char Max in Session String Variable
set emails_in = $$'Jennifer.Hernandez@rGg8bM.edu','Marvin.Sullivan@TY7ObD.org','Brian.Murray@yiV6ibbUzb.edu',
'Michael.Morales@kA.com','Theresa.Turner@XK0Y42DaHK.edu','Rosie.Wagner@FaLJjg6Ql4RMT.edu','Janine.Jones@IniTA.org',
'Kay.Pearson@gSPx.edu','Charlene.Miller@tqBZQrBth.org','Tony.Burch@bsDfZOQ3XVnVGPdn.com'$$;

-- Break it up
set emails_in_pt1 = $$'Jennifer.Hernandez@rGg8bM.edu','Marvin.Sullivan@TY7ObD.org','Brian.Murray@yiV6ibbUzb.edu'$$;
set emails_in_pt2 = $$,'Michael.Morales@kA.com','Theresa.Turner@XK0Y42DaHK.edu','Rosie.Wagner@FaLJjg6Ql4RMT.edu'$$;
set emails_in_pt3 = $$,'Janine.Jones@IniTA.org','Kay.Pearson@gSPx.edu','Charlene.Miller@tqBZQrBth.org'$$;
set emails_in_pt4 = $$,'Tony.Burch@bsDfZOQ3XVnVGPdn.com'$$;;

-- Append together
select $emails_in_pt1 || $emails_in_pt2 || $emails_in_pt3 || $emails_in_pt4 as emails_in;

-- Check your work
show variables;

Let’s use our Fancy New List in SQL
We are going to grab the first 3 emails and make a quick in clause and test it out. After putting our variable “$emails_in” on the IN() clause and execute the SQL, we get a no results error! We first tried to escape the list with double dollar signs and it still errors out with Snowflake getting the IN($emails_in) as a literal. OK, we can force it work by typing the list explicitly, but what happed to the original query? Let’s go look at query history and that Filter[1] node on the query profile. Now we can see that’s the issue Snowflake gets the list as “​(​UPPER​(​CUSTOMER​.​C_EMAIL_ADDRESS​)​​)​ = ”’JENNIFER​.​HERNANDEZ@RGG8BM​.​EDU”,”MARVIN​.​SULLIVAN@TY7OBD​.​ORG”,”BRIAN​.​MURRAY@YIV6IBBUZB​.​EDU”'”, that’s too much extra string syntax.

-- 02.02 Single Quoted List
-- OK let's play with the first three to get it to work, then work the full list later
set emails_in = $$'Jennifer.Hernandez@rGg8bM.edu','Marvin.Sullivan@TY7ObD.org','Brian.Murray@yiV6ibbUzb.edu'$$;
select $emails_in;
-- Simple test query
---- !Query Produced No Results!
select c_customer_sk, c_first_name || ' ' || c_last_name as full_name, c_email_address
from snowflake_sample_data.tpcds_sf10tcl.customer
where upper(c_email_address) in(upper($emails_in));

-- Make it work type in
---- !Query Produced No Results!
select c_customer_sk, c_first_name || ' ' || c_last_name as full_name, c_email_address
from snowflake_sample_data.tpcds_sf10tcl.customer
where upper(c_email_address)
in(upper($$ $emails_in $$));

-- Make it work type in mk2
select c_customer_sk, c_first_name || ' ' || c_last_name as full_name, c_email_address
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_email_address in('Jennifer.Hernandez@rGg8bM.edu','Marvin.Sullivan@TY7ObD.org','Brian.Murray@yiV6ibbUzb.edu');

OK, Let’s Make it Work Dynamically
We know JSON payloads can behave as an array (list of values) and we can use advanced techniques in Snowflake to make arrays look like tables. So let’s break out the JSON and get cracking on this problem. We are going to append brackets to make this list of values string look like exactly like JSON payload to Snowflake. Now that our string looks like a JSON payload, we can leverage table-flatten-input-parse_json() to get a list values we can use in SQL. We can now use our JSON payload in the in clause by selecting the value from the JOSN list with: n(select upper(value) from table(flatten(input => parse_json($emails_in)))). We are using the upper function to ensure our strings match on the comparison. Now we can put it all together with a concatenated payload of all of emails with double quotes, not single, to make our string appear as JSON. With our final CTE (takes 4 mins to run on a small) we can now see the lifetime store sales stats for our 10 selected customer emails. This is a powerful technique we can apply in other use cases needing user input for dynamic in clauses.

-- 02.03 JSON Array List IN()
-- Hmmm...what can we do?  Hammer Time w/ JSON Array!
set emails_in = $$["Jennifer.Hernandez@rGg8bM.edu","Marvin.Sullivan@TY7ObD.org","Curt.Loves.Cubes@yiV6ibbUzb.edu"]$$;

-- Table flatten method to pull list of values from JSON
select value from table(flatten(input => parse_json($emails_in)));

-- Let's use it in our Customers query
select c_customer_sk, c_first_name || ' ' || c_last_name as full_name, c_email_address
from snowflake_sample_data.tpcds_sf10tcl.customer
where upper(c_email_address) in(select upper(value) from table(flatten(input => parse_json($emails_in))));

-- Put it all together
set emails_in_pt1 = $$"Jennifer.Hernandez@rGg8bM.edu","Marvin.Sullivan@TY7ObD.org","Brian.Murray@yiV6ibbUzb.edu"$$;
set emails_in_pt2 = $$,"Michael.Morales@kA.com","Theresa.Turner@XK0Y42DaHK.edu","Rosie.Wagner@FaLJjg6Ql4RMT.edu"$$;
set emails_in_pt3 = $$,"Janine.Jones@IniTA.org","Kay.Pearson@gSPx.edu","Charlene.Miller@tqBZQrBth.org"$$;
set emails_in_pt4 = $$,"Tony.Burch@bsDfZOQ3XVnVGPdn.com"$$;

-- Lifetime sales for emails in()  takes 4 mins on a small
with
customers as (
    select c_customer_sk, c_first_name || ' ' || c_last_name as full_name, c_email_address
    from snowflake_sample_data.tpcds_sf10tcl.customer
    where upper(c_email_address) in(select upper(value) from table(flatten(input => parse_json(
    '[' || $emails_in_pt1 || $emails_in_pt2 || $emails_in_pt3 || $emails_in_pt4 || ']'))))
)
select customers.*, sum(ss_quantity), sum(ss_sales_price), 
    sum(ss_quantity * ss_sales_price) as sales, to_decimal(sum(sales) over (),38,2) as total_sales
from snowflake_sample_data.tpcds_sf10tcl.store_sales ss, customers
where ss.ss_customer_sk in (select distinct c_customer_sk from customers)
group by all;

Conclusion
We got a dynamic IN() list to work as a JSON array with some really powerful table-flatten-json SQL. Snowflake can do wonders with JSON and arrays in general as showcased here. Now that we have this technique we can use this in conjunction with the BI layer to replace parameters from very dynamic SQL with a long list of items from a multi-value control. More on that later, but for now happy (dynamic in clause) querying!

One comment

Leave a comment