Snowflake: Long Running Queries Seek & Destroy

So, you might have some long running queries in your Snowflake account? Let's say someone is running a runaway cartesian join because they used this WHERE clause "and (created_on >= '2020-08-03 00:00:00' and created_on < '2020-08-31 00:00:00') or (created_on >= '2020-02-03 00:00:00' and created_on < '2020-03-02 00:00:00')" instead of this one "and ((created_on >= '2020-08-03 00:00:00' and created_on < '2020-08-31 00:00:00') or (created_on >= '2020-02-03 00:00:00' and created_on < '2020-03-02 00:00:00'))"…maybe that was me on SnowHouse yesterday…order of operations matters, people. I never thought 8th grade Algebra II would come in handy in life. Now let's go back to 8th grade and harness our lessons from Metallica: time to get searching, seek and destroy!

Searching Long Running Queries
Thankfully someone at Snowflake added a table function to the INFORMATION_SCHEMA that exposes these metrics on running queries: INFORMATION_SCHEMA.QUERY_HISTORY (https://docs.snowflake.com/en/sql-reference/functions/query_history.html). We will wrap it in our own table function so we can call it with less verbose syntax and reuse it with X seconds as a threshold.

Because the INFORMATION_SCHEMA also has QUERY_HISTORY_BY_USER and QUERY_HISTORY_BY_WAREHOUSE functions, you can easily scope this code down from the account level to specific users or warehouses. That is usually more useful in production.
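For example, a quick ad hoc check scoped to your own user might look like this (a sketch, assuming your role can see the relevant query history):

```sql
-- Sketch: ad hoc check of your own currently running queries,
-- calling QUERY_HISTORY_BY_USER directly instead of a wrapper function
select query_id, user_name, warehouse_name, start_time
from table(information_schema.query_history_by_user(USER_NAME => current_user()))
where execution_status = 'RUNNING';
```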

-- Get queries running longer than X seconds
-- Created by David A Spezia July 2019
-- To generate a long running test query, run this in SnowSQL (in one or more sessions):
-- select distinct * from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."CATALOG_SALES";
 
-- all queries in account version
create or replace function get_long_running_queries ( threshold_in_seconds number )
    returns table (
      query_id varchar,
      user_name varchar,
      warehouse_name varchar,
      start_time TIMESTAMP_LTZ(3),
      total_elapsed_time_in_seconds number)
    as  'select 
            query_id,
            user_name,
            warehouse_name,
            start_time,
            datediff(second, start_time, current_timestamp) as total_elapsed_time_in_seconds
        from
            table(SNOWFLAKE.INFORMATION_SCHEMA.QUERY_HISTORY(
                END_TIME_RANGE_START => dateadd(second, -threshold_in_seconds, current_timestamp)))
        where 
            EXECUTION_STATUS = ''RUNNING''
            and start_time < dateadd(second, -threshold_in_seconds, current_timestamp())';

-- use the function
-- in production use more than 100 second thresholds
select * from table(get_long_running_queries(100));

User Name Version

-- all queries for user version
create or replace function get_long_running_queries_user ( threshold_in_seconds number, user_name string )
    returns table (
      query_id varchar,
      user_name varchar,
      warehouse_name varchar,
      start_time TIMESTAMP_LTZ(3),
      total_elapsed_time_in_seconds number)
    as  'select 
            query_id,
            user_name,
            warehouse_name,
            start_time,
            datediff(second, start_time, current_timestamp) as total_elapsed_time_in_seconds
        from
            table(SNOWFLAKE.INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(
                USER_NAME => user_name,
                END_TIME_RANGE_START => dateadd(second, -threshold_in_seconds, current_timestamp)))
                
        where 
            EXECUTION_STATUS = ''RUNNING''
            and start_time < dateadd(second, -threshold_in_seconds, current_timestamp())';

-- use the function
-- in production use more than 100 second thresholds
select * from table(get_long_running_queries_user(100, current_user()));

Warehouse Name Version

-- all queries for warehouse version
create or replace function get_long_running_queries_warehouse ( threshold_in_seconds number, warehouse_name string )
    returns table (
      query_id varchar,
      user_name varchar,
      warehouse_name varchar,
      start_time TIMESTAMP_LTZ(3),
      total_elapsed_time_in_seconds number)
    as  'select 
            query_id,
            user_name,
            warehouse_name,
            start_time,
            datediff(second, start_time, current_timestamp) as total_elapsed_time_in_seconds
        from
            table(SNOWFLAKE.INFORMATION_SCHEMA.QUERY_HISTORY_BY_WAREHOUSE(
                WAREHOUSE_NAME => warehouse_name,
                END_TIME_RANGE_START => dateadd(second, -threshold_in_seconds, current_timestamp)))
        where 
            EXECUTION_STATUS = ''RUNNING''
            and start_time < dateadd(second, -threshold_in_seconds, current_timestamp())';

-- use the function
-- in production use more than 100 second thresholds
select * from table(get_long_running_queries_warehouse(100, upper('load_wh')));

Seek and Destroy Long Running Queries
Now we can use the get_long_running_queries() function as the baseline for our seek and destroy procedure. Be careful: with the 'all' scope this can terminate every running query in the Snowflake account if you are playing around.
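Before automating anything, you can cancel a single offender by hand: feed a query_id from the search function into SYSTEM$CANCEL_QUERY (a sketch; the query id below is a placeholder, not a real one):

```sql
-- Sketch: cancel one offending query by hand
-- (the query_id here is a placeholder; substitute one returned by
-- get_long_running_queries)
select SYSTEM$CANCEL_QUERY('01b2c3d4-0000-1234-0000-000000000001');
```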

-- seek and destroy all long running queries for account, user or warehouse scope
create or replace procedure seek_and_destroy_long_running_queries ( THRESHOLD_IN_SECONDS double, TYPE string, NAME string )
    RETURNS STRING
    LANGUAGE JAVASCRIPT
    EXECUTE AS CALLER
    AS
    $$
    //Variables declared up front because I am old school and don't declare on the fly like some cowboy
    var result="";
    var rowCount=0;
    
    //Build SQL for the requested scope; note NAME is concatenated into the SQL text,
    //so only pass trusted values
    if (TYPE == 'all') {
        var sql00 = `select * from table(get_long_running_queries(` + THRESHOLD_IN_SECONDS + `));`;
    } else if (TYPE == 'user') {
        var sql00 = `select * from table(get_long_running_queries_user(` + THRESHOLD_IN_SECONDS + `,'` + NAME + `'));`;
    } else if (TYPE == 'warehouse') {
        var sql00 = `select * from table(get_long_running_queries_warehouse(` + THRESHOLD_IN_SECONDS + `,'` + NAME + `'));`;
    } else {
        return "Failed: TYPE must be 'all', 'user' or 'warehouse'";
    }

    //Execute SQL
    try {
        var stmt00 = snowflake.createStatement( { sqlText: sql00 } );
        var rs00 = stmt00.execute();
        rowCount = (stmt00.getRowCount()>0) ? stmt00.getRowCount() : 0;
        //Step through each query_id returned and destroy the rebel spy
        while (rs00.next()) {
            var sql01 = `select SYSTEM$CANCEL_QUERY('` + rs00.getColumnValue(1) + `');`;
            var stmt01 = snowflake.createStatement( { sqlText: sql01 } );
            stmt01.execute();
        }
        
        result = "Succeeded! Queries Destroyed(" + rowCount + ")";
    }
    
    catch (err)  {
        result =  "Failed: Code: " + err.code + "\n  State: " + err.state;
        result += "\n  Message: " + err.message;
        result += "\nStack Trace:\n" + err.stackTraceTxt; 
    }
    
    return result;
    $$
;

-- use the procedure
-- in production use more than 100 second thresholds
-- type as 'all' , 'user' , 'warehouse'
call seek_and_destroy_long_running_queries(100,'all','all');
call seek_and_destroy_long_running_queries(100,'user',current_user());
call seek_and_destroy_long_running_queries(100,'warehouse',upper('load_wh'));

Conclusion
Now you have some tools to identify and terminate…er…Seek and Destroy long running queries in your Snowflake accounts. This can all be automated with a task running this code, but it is often easier to set an account-level timeout with the STATEMENT_TIMEOUT_IN_SECONDS parameter (link). Happy Seek and Destroy Querying!
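As a sketch of those two options (the task name, warehouse, schedule, and threshold values here are assumptions, not from the original post):

```sql
-- Sketch: run the destroyer every 5 minutes via a task
-- (task name, warehouse, and schedule are hypothetical)
create or replace task destroy_long_running_queries_task
    warehouse = 'ADMIN_WH'
    schedule = '5 MINUTE'
as
    call seek_and_destroy_long_running_queries(600, 'all', 'all');

alter task destroy_long_running_queries_task resume;

-- Or simply cap statement runtime account-wide (value in seconds)
alter account set STATEMENT_TIMEOUT_IN_SECONDS = 3600;
```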
