Snowflake: List files from Stage with Prefix or Starts With

Posted by

Have you ever wondered how to get back a list of files with a prefix from a Snowflake Stage?  I did, and then I had a customer ask me last week. I hacked and hacked for a few hours trying to get this answer, searched the web and came up Null.

Snowflake does have a LIST command as documented here: https://docs.snowflake.net/manuals/sql-reference/sql/list.html  It works well, but sometimes you need better performance from large cloud buckets or to match a desired prefix on your files.

I tried 42 different Pattern = ‘*XXXX*’ parameters.  I kept coming up with a pattern matching a contains function or nothing returned.

Then I had an idea, just put my desired prefix directly in the path…

-- SnowSQL for List Prefix 
-- Create stage from public facing S3 bucket
CREATE OR REPLACE STAGE TABLEAU_JSON
URL = 's3://snowflake-workshop-lab/VHOL_Analytics_Tableau/'; 

-- List with Prefix
LIST @TABLEAU_JSON;                      -- All Files
LIST @TABLEAU_JSON pattern = '.*.nat.*'; -- I suspect this is a contains
LIST @TABLEAU_JSON pattern = '.*.viz.*'; -- It is a contains
LIST @TABLEAU_JSON pattern = '.*.a.*';   -- 100% contains function
LIST @TABLEAU_JSON pattern = 'nat.*';    -- Fail
LIST @TABLEAU_JSON/nat;                  -- OMG Yes! prefix is path, path is prefix…
LIST @TABLEAU_JSON/Tab;                  -- Confirmed

In good news you should see an update to Snowflake’s amazing documentation reflecting this syntax example soon.

Note the path and prefix input string is case sensitive. @TABLEAU_JSON/tab will return no files, where @TABLEAU_JSON/Tab will return two files from the above sample code. 

This prefix method also works with directory constructs in your cloud buckets.  I advise most clients loading large amounts of data to snowflake to create cloud buckets with the following pattern.

-- Best Practice Staging Pattern
@STAGE/[SchemaName]/[TableName]/[Year]/[Month]/[Day]/<<[Hour]>>

With this method you can COPY INTO or LIST by Table, Table+Year or target Months and specific Days for loading.  The following command would address all Days and Hours of files under the Month specified.

-- List files for all Days and Hours under a Month
LIST @STAGE/Schema1/table1/2020/02/

Happy SnowSQL coding out there!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s