Have you ever wondered how to get back a list of files with a prefix from a Snowflake Stage? I did, and then I had a customer ask me last week. I hacked and hacked for a few hours trying to get this answer, searched the web and came up Null.
Snowflake does have a LIST command as documented here: https://docs.snowflake.net/manuals/sql-reference/sql/list.html It works well, but sometimes you need better performance from large cloud buckets or to match a desired prefix on your files.
I tried 42 different Pattern = ‘*XXXX*’ parameters. I kept coming up with a pattern matching a contains function or nothing returned.
Then I had an idea, just put my desired prefix directly in the path…
-- SnowSQL for List Prefix
-- Create stage from public facing S3 bucket
CREATE OR REPLACE STAGE TABLEAU_JSON
URL = 's3://snowflake-workshop-lab/VHOL_Analytics_Tableau/';
-- List with Prefix
LIST @TABLEAU_JSON; -- All Files
LIST @TABLEAU_JSON pattern = '.*.nat.*'; -- I suspect this is a contains
LIST @TABLEAU_JSON pattern = '.*.viz.*'; -- It is a contains
LIST @TABLEAU_JSON pattern = '.*.a.*'; -- 100% contains function
LIST @TABLEAU_JSON pattern = 'nat.*'; -- Fail
LIST @TABLEAU_JSON/nat; -- OMG Yes! prefix is path, path is prefix…
LIST @TABLEAU_JSON/Tab; -- Confirmed
In good news you should see an update to Snowflake’s amazing documentation reflecting this syntax example soon.
Note the path and prefix input string is case sensitive. @TABLEAU_JSON/tab will return no files, where @TABLEAU_JSON/Tab will return two files from the above sample code.
This prefix method also works with directory constructs in your cloud buckets. I advise most clients loading large amounts of data to snowflake to create cloud buckets with the following pattern.
-- Best Practice Staging Pattern
@STAGE/[SchemaName]/[TableName]/[Year]/[Month]/[Day]/<<[Hour]>>
With this method you can COPY INTO or LIST by Table, Table+Year or target Months and specific Days for loading. The following command would address all Days and Hours of files under the Month specified.
-- List files for all Days and Hours under a Month
LIST @STAGE/Schema1/table1/2020/02/
Happy SnowSQL coding out there!