Snowflake: 10 Things Every Snowflake Admin Should be Doing to Optimize Credits

Snowflake is architecturally different from almost every traditional database system and cloud data warehouse.  Snowflake has completely separated compute from storage, and both tiers of the platform are real-time elastic.  The need for advanced resource planning, combing over workloads with a fine-toothed comb, and denying new workloads onto the system for fear of disk and CPU limitations goes away with Snowflake.  As a cloud-native data warehouse, Snowflake can instantly scale to meet planned, ad-hoc, or surprise growth.  Instead of paying for a fixed, limited amount of storage and compute, your storage and compute grow and shrink in line with your needs.

By taking advantage of a core tenet of the cloud, elasticity, compute can be dynamically scaled to workloads throughout the day as concurrency or raw compute demand fluctuates.  Storage will grow and shrink over time for databases, tables, and metadata.  There are a few optimizations every Snowflake account administrator should be doing, along with more advanced methods to consider as your Snowflake footprint and sophistication grow.  With compute and storage separated and completely elastic, these resources should be monitored for usage, surprise growth, and efficiency.

With the immense power of a completely elastic data platform comes great responsibility for the Snowflake account administrator.  Snowflake is unlimited by default, but some account-level and resource-level restrictions can be put in place to defend against rogue users or suboptimal use of resources and credits.  You can proactively control compute at the account, warehouse, and ultimately user level through resource monitors.  Users, databases, tables, queries, and workloads can be monitored through the ACCOUNT_USAGE schema shared with all Snowflake accounts.

Alright, either the above prose worked or you skipped straight here (TL;DR); either way, you are at the start of your journey.  What should we be doing as responsible Snowflake admins?

1 – Auto Resume
Make sure all warehouses are set to auto-resume.  If you are going to implement auto-suspend and proper timeout limits, this is a must; otherwise users will not be able to query the system and your pager will go off.

SHOW WAREHOUSES;
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_resume" = FALSE;

2 – Auto Suspend
Make sure all warehouses are set to auto-suspend. This way, when they are not processing queries, your compute footprint will shrink and so will your credit burn.

SHOW WAREHOUSES;
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" IS NULL;

3 – Long Timeouts
All warehouses should have an auto-suspend timeout appropriate for their workload.
– For task, loading, and ETL/ELT warehouses, set to immediate suspension.
– For BI and SELECT query warehouses, set suspension to 10 minutes to keep data caches warm for end users.
– For DevOps, DataOps, and Data Science warehouses, set suspension to 5 minutes, as a warm cache is not as important for ad-hoc and highly unique queries.

SHOW WAREHOUSES;
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_suspend" > 600;  --Your Threshold in Seconds

--SQL Pro, just do all 3
SHOW WAREHOUSES;
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "auto_resume" = FALSE OR
"auto_suspend" IS NULL OR
"auto_suspend" > 600;

4 – Account Statement Timeouts
Set statement timeouts at the warehouse, account, session, and user level to match your data strategy's tolerance for long-running queries. The two relevant parameters are:
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS
STATEMENT_TIMEOUT_IN_SECONDS

ALTER WAREHOUSE LOAD_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;
SHOW PARAMETERS IN WAREHOUSE LOAD_WH;
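
The same parameters can also be set at the account, user, and session level; a minimal sketch (the user name is a placeholder):

ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;            -- account-wide default
ALTER ACCOUNT SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 600;
ALTER USER REPORTING_SVC SET STATEMENT_TIMEOUT_IN_SECONDS = 1800;  -- hypothetical user
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 900;              -- current session only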

5 – Warehouse Credit Usage Greater Than 7 Day Average
Now this is a handy one that came from a direct interaction with a customer of mine who resized a warehouse to a larger size for a task but did not put it back the way he found it. We made this query for him to run every morning to check that each warehouse's credit usage is not out of whack with its 7-day average.

SELECT WAREHOUSE_NAME, DATE(START_TIME) AS DATE, 
SUM(CREDITS_USED) AS CREDITS_USED,
AVG(SUM(CREDITS_USED)) OVER (PARTITION BY WAREHOUSE_NAME ORDER BY DATE ROWS 7 PRECEDING) AS CREDITS_USED_7_DAY_AVG,
(TO_NUMERIC(SUM(CREDITS_USED)/CREDITS_USED_7_DAY_AVG*100,10,2)-100)::STRING || '%' AS VARIANCE_TO_7_DAY_AVERAGE
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY"
GROUP BY DATE, WAREHOUSE_NAME
ORDER BY DATE DESC;

6 – Warehouses Approaching Cloud Service Billing Threshold
I am in love with CTEs; learn common table expressions and they will change your life and make your SQL far more readable. I am particularly fond of this one, as writing it was both fun and useful. This query looks at warehouses where cloud services spend is a high percentage of the workload. At the account level, Snowflake only bills cloud services credits that exceed 10% of the daily compute credit usage. Cloud services handle metadata operations such as BI tool discovery queries, heartbeat queries, SHOW commands, cache usage, and several other service-optimizing features. So if you use 100 compute credits in a day but also burn 15 credits in cloud services (unlikely), you will be charged for the 5 credits over the 10% allowance, for a total of 105 credits billed that day. Just look at those 10 free credits you were able to use!

WITH
cloudServices AS (SELECT 
     WAREHOUSE_NAME, MONTH(START_TIME) AS MONTH,
     SUM(CREDITS_USED_CLOUD_SERVICES) AS CLOUD_SERVICES_CREDITS, 
     COUNT(*) AS NO_QUERYS 
     FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
     GROUP BY WAREHOUSE_NAME,MONTH
     ORDER BY WAREHOUSE_NAME,NO_QUERYS DESC),
warehouseMetering AS (SELECT
     WAREHOUSE_NAME, MONTH(START_TIME) AS MONTH,
     SUM(CREDITS_USED) AS CREDITS_FOR_MONTH
     FROM "SNOWFLAKE"."ACCOUNT_USAGE"."WAREHOUSE_METERING_HISTORY"
     GROUP BY WAREHOUSE_NAME,MONTH
     ORDER BY WAREHOUSE_NAME,CREDITS_FOR_MONTH DESC)
SELECT *, TO_NUMERIC(CLOUD_SERVICES_CREDITS/NULLIF(CREDITS_FOR_MONTH,0)*100,10,2) AS PERCT_CLOUD 
FROM cloudServices
JOIN warehouseMetering USING(WAREHOUSE_NAME,MONTH)
ORDER BY PERCT_CLOUD DESC;
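
To see how the 10% allowance nets out for the whole account per day, here is a minimal sketch, assuming the ACCOUNT_USAGE.METERING_DAILY_HISTORY view (and its CREDITS_USED_COMPUTE, CREDITS_USED_CLOUD_SERVICES, and CREDITS_BILLED columns) is available in your account:

SELECT USAGE_DATE,
     SUM(CREDITS_USED_COMPUTE) AS COMPUTE_CREDITS,
     SUM(CREDITS_USED_CLOUD_SERVICES) AS CLOUD_SERVICES_CREDITS,
     SUM(CREDITS_USED_CLOUD_SERVICES) - SUM(CREDITS_USED_COMPUTE) * 0.1 AS CREDITS_OVER_10_PCT_ALLOWANCE,
     SUM(CREDITS_BILLED) AS CREDITS_BILLED
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."METERING_DAILY_HISTORY"
GROUP BY USAGE_DATE
ORDER BY USAGE_DATE DESC;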

7 – Unused Tables
Gasp, you might have unused tables that are candidates to be dropped. Just make sure no one is querying these tables; I am including three ways to look at this, and you should check all three methods as a mandatory step before deletion. Good thing you have Time Travel set up and can UNDROP just in case you make an error. These queries are specific to the current database context, and I am too lazy to do it for you, which gives you an opportunity to write something that UNIONs these results across all of your databases (a starter sketch follows the queries below). The ACCOUNT_USAGE.TABLES view is not sufficient here, as its LAST_ALTERED reflects DDL changes, not DML. Also, be mindful of tables used only in view DDL.

--DML from the Information Schema to identify Table sizes and Last Updated Timestamps
SELECT TABLE_CATALOG || '.' || TABLE_SCHEMA || '.' || TABLE_NAME AS TABLE_PATH, 
    TABLE_NAME, TABLE_SCHEMA AS SCHEMA,
    TABLE_CATALOG AS DATABASE, BYTES,
    TO_NUMBER(BYTES / POWER(1024,3),10,2) AS GB, 
    LAST_ALTERED AS LAST_USE,
    DATEDIFF('Day',LAST_USE,CURRENT_DATE) AS DAYS_SINCE_LAST_USE
FROM INFORMATION_SCHEMA.TABLES
WHERE DAYS_SINCE_LAST_USE > 90 --Use your Days Threshold
ORDER BY BYTES DESC;

-- Last DML on Object
SELECT (SYSTEM$LAST_CHANGE_COMMIT_TIME(
'DATABASE.SCHEMA.TABLE_NAME')/1000)::TIMESTAMP_NTZ;

-- Queries on Object in Last NN Days
SELECT COUNT(*) FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE CONTAINS(UPPER(QUERY_TEXT),'TABLE_NAME')
AND DATEDIFF('Day',START_TIME,CURRENT_DATE) < 90;

-- Last Query on Object in Last NN Days
SET TABLE_NAME = 'MY_TABLE';

SELECT START_TIME, QUERY_ID, QUERY_TEXT
FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE CONTAINS(UPPER(QUERY_TEXT),$TABLE_NAME) 
AND QUERY_ID = (
     SELECT TOP 1 QUERY_ID 
     FROM "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
     WHERE CONTAINS(UPPER(QUERY_TEXT),$TABLE_NAME)
     ORDER BY START_TIME DESC)
AND DATEDIFF('Day',START_TIME,CURRENT_DATE) < 90;

-- Object Used in a View Definition in Last NN Days
SELECT * FROM INFORMATION_SCHEMA.VIEWS
WHERE CONTAINS(VIEW_DEFINITION,'TABLE_NAME')
AND DATEDIFF('Day',LAST_ALTERED,CURRENT_DATE) < 90;
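
As a starting point for the cross-database check mentioned above, a minimal sketch that UNIONs the INFORMATION_SCHEMA query across two hypothetical databases (SALES_DB and MARKETING_DB are placeholders; add one branch per database):

SELECT 'SALES_DB' AS DATABASE_NAME, TABLE_SCHEMA, TABLE_NAME, BYTES, LAST_ALTERED
FROM SALES_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
AND DATEDIFF('Day', LAST_ALTERED, CURRENT_DATE) > 90  --Use your Days Threshold
UNION ALL
SELECT 'MARKETING_DB', TABLE_SCHEMA, TABLE_NAME, BYTES, LAST_ALTERED
FROM MARKETING_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
AND DATEDIFF('Day', LAST_ALTERED, CURRENT_DATE) > 90
ORDER BY BYTES DESC;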

8 – Dormant Users
It’s just a good idea to purge dormant users, and users who have never logged into Snowflake, from the system.

--Never Logged In Users
SHOW USERS;
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "last_success_login" IS NULL 
AND DATEDIFF('Day',"created_on",CURRENT_DATE) > 30;

--Stale Users
SHOW USERS;
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE DATEDIFF('Day',"last_success_login",CURRENT_DATE) > 30;

9 – Find Warehouses Without Resource Monitors
Resource Monitors are a great way to proactively control workload budgets and unexpected resource spikes. This can help with both users and service account usage in Snowflake. First you should have separated loading, ELT, BI, Reporting, Data Science and other workloads by warehouse in Snowflake. Accounts and warehouses can have total, yearly, monthly, weekly, and daily credit quotas.

The following queries will identify all warehouses without a resource monitor:

SHOW WAREHOUSES;
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "resource_monitor" = 'null';

10 – Apply Resource Monitors
You can use the UI or SQL to apply your resource monitor policy. When monitors are in place, they can first notify and then suspend the warehouse or account. It is a good idea to have a monitor at least notify, if not suspend, on all warehouses. When thresholds are hit, account admins get an email and/or in-UI notification of the breach, depending on the notification preferences each admin has set.

CREATE RESOURCE MONITOR "CURTLOVESCUBES_RM" WITH CREDIT_QUOTA = 150 
 TRIGGERS 
 ON 75 PERCENT DO NOTIFY 
 ON 90 PERCENT DO NOTIFY;
ALTER WAREHOUSE "CURT_WH" SET RESOURCE_MONITOR = "CURTLOVESCUBES_RM";

Conclusion
Snowflake’s architecture is different and powerful, but with that power comes the temptation to use too much compute, or the wrong compute, for the task at hand. Do some of this basic monitoring and resource optimization to avoid the common pitfall of burning more credits than you need. Happy (efficient) Querying!
