Snowflake: Dynamic Unload Path (Copy Into Location)

Posted by

Sometimes the need arises to unload data from Snowflake tables and objects into a stage (S3 Bucket, Azure Container or GCS) to support downstream processes.  These processes are typically better served by using a SQL client or integration over Python, .Net, Java, etc to directly query Snowflake. However, sometimes we need to support file dependent processes, messaging services or just support legacy integrations requiring files to be unloaded.  This is easily done in Snowflake with a statement like “Copy Into @BLOBLocation from Object”. Please see the Snowflake documentation for copy into location and documentation for data unloading for a grounding on the topic, and syntax we are investigating here.  The @BLOBLocation variable is a string literal and would more fit for purpose if we could dynamically create the path or filenames.

I was working on a project for one of my large clients in the Bay, and this need for dynamic naming of unloaded files came up.  I spent some time creating a custom demo for the client by dynamically naming the unloaded directory with a JavaScript Stored Procedure in Snowflake.  I have seen this question asked a few times in forums and around the web. Time to document how to achieve this here and some other paths you might think work, but unfortunately do not.

Create Unload Objects
I am going to use a database most Snowflake accounts have as the account creation process from Snowflake adds these for posterity.  I will use DEMO.PUBLIC for the schema, create a stage named UNLOAD and a JSON file format for use during unloading the data. We will be playing with the STORE table from SNOWFLAKE_SAMPLE_DATA as all Snowflake accounts have this data shared with them, but you can use any table just change the FROM clauses in the SQL.

--Context
USE ROLE ACCOUNTADMIN;
USE DATABASE DEMO;
USE SCHEMA PUBLIC;
USE WAREHOUSE LOAD_WH;

--1,500 Record Sample Data Table
SELECT COUNT(*) FROM "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."STORE";

--Create Stage
CREATE OR REPLACE STAGE UNLOAD;
LIST @UNLOAD;

--Create File Format JSON
CREATE or REPLACE FILE FORMAT JSON
    TYPE = 'JSON'
    COMPRESSION = 'AUTO'
    ENABLE_OCTAL = FALSE
    ALLOW_DUPLICATE = TRUE
    STRIP_OUTER_ARRAY = FALSE
    STRIP_NULL_VALUES = FALSE
    IGNORE_UTF8_ERRORS = FALSE;

Testing the Unload Command
I like to code out and test my SQL before committing it to a JavaScript Stored Procedure.  Here we are unloading to the @UNLOAD stage, with the format of JSON. The list command will show the files unload, and the select $1 from will query on the document directly on BLOB.

--Unload JSON to Stage (COPY INTO OBJECT_CONSTRUCT(*))
COPY INTO @UNLOAD FROM (SELECT OBJECT_CONSTRUCT(*) FROM "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."STORE")
    FILE_FORMAT=(FORMAT_NAME='JSON') MAX_FILE_SIZE=1024 OVERWRITE=TRUE;
LIST @UNLOAD;
SELECT $1 FROM @UNLOAD (FILE_FORMAT => JSON) LIMIT 1;

Can’t You Just Use Identifier()?
Spoiler, no.  I did try it out, and a few other dynamic string methods directly in the SQL.  I kept getting reference errors even with UDFs and session variables. I got fancy with it, still nothing.  We will come back to the cascading replace and UDF_NOWASSTRING() functions. The return Date.now().toString() (line 15) will prove most useful later.

--Session Variable Dynamic Pathing
SET SUFFIX = (SELECT LEFT(REPLACE(REPLACE(REPLACE(REPLACE(CURRENT_TIMESTAMP::STRING,' '),':'),'-'),'.'),17));
SELECT $SUFFIX;
SET PATH = '@UNLOAD/' || $SUFFIX || '/';
SELECT $PATH;
COPY INTO IDENTIFIER($PATH) FROM (SELECT OBJECT_CONSTRUCT(*) FROM "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."STORE")
    FILE_FORMAT=(FORMAT_NAME='JSON') MAX_FILE_SIZE=1024 OVERWRITE=TRUE;

--JS UDF Now() as a String
CREATE OR REPLACE FUNCTION UDF_NOWASSTRING()
    RETURNS STRING
    LANGUAGE JAVASCRIPT
    AS
    $$
    return Date.now().toString();
    $$;
SELECT UDF_NOWASSTRING();
SELECT UDF_NOWASSTRING()::TIMESTAMP_LTZ(3);
SELECT TO_TIMESTAMP(SELECT UDF_NOWASSTRING()); 
SELECT LEFT(REPLACE(REPLACE(REPLACE(REPLACE(TO_TIMESTAMP(UDF_NOWASSTRING())::STRING,' '),':'),'-'),'.'),14);

Using JavaScript Stored Procedures
With JSSPs completely dynamic SQL can be written and executed.  Please see the Snowflake documentation for more information and a grounding in Snowflake JSSPs.  We combine OBJECT_CONSTRUCT() with COPY INTO LOCATION to get a JSON payload.  With using the code var dynamicpath = Date.now().toString(); var sql00 = `copy into @unload/` + dynamicpath + `\ (line 19) we can use a string representation of the Java epoch milliseconds in the path.  We also leveraged a ternary operator `? :` (line 32) for IF ELSE in one line avoiding 0 row unloads breaking our code. 

--Put it Into a Stored Procedure
CREATE OR REPLACE PROCEDURE SP_UNLOAD_DYNAMIC()
    RETURNS STRING
    LANGUAGE JAVASCRIPT
    EXECUTE AS CALLER
    AS
    $$
    //Unload JSON to Stage
    //COPY INTO OBJECT CONSTRUCT(*)
    //With Dynamic Pathing with JS MS TS for different unload naming convention
    //Copy into does not load file with same name but different timestamps
    
    //Result Place Holder
    var result="";
    
    //Build SQL
    var dynamicpath = Date.now().toString();
    
    var sql00 = `copy into @unload/` + dynamicpath + `/ from (select object_construct(*) from snowflake_sample_data.tpcds_sf10tcl.store)
        file_format = (format_name='json') max_file_size=1024 overwrite=true;`;
    
    var sql01 = `select count(*) from @unload/` + dynamicpath + `/ (file_format => json);`;

    //Execute SQL
    try {
        var stmt00 = snowflake.createStatement( { sqlText: sql00 } );
        stmt00.execute();
        
        var stmt01 = snowflake.createStatement( { sqlText: sql01 } );
        var rs01 = stmt01.execute();
        rs01.next();
        var rowCount = (stmt01.getRowCount()>0) ? rs01.getColumnValue(1) : 0;
         
        result = "Succeeded! Rows Unloaded(" + rowCount + ")";
    }
    
    catch (err)  {
        result =  "Failed: Code: " + err.code + "\n  State: " + err.state;
        result += "\n  Message: " + err.message;
        result += "\nStack Trace:\n" + err.stackTraceTxt; 
    }
    
    return result;
    $$;

Now testing and calling the Stored Procedure is a breeze.

--Call SP
CALL SP_UNLOAD_DYNAMIC();

--Inspect Results
LIST @UNLOAD;
--Clean Up After Yourself
REMOVE @UNLOAD;

JavaScript Stored Procedure Taking a Dynamic Input
You might not always want the Java epoch in the Path.  I wanted to reuse my session variable and UDF from early experiments when calling the JSSP.  Easily coded into the JSSP for making things fully dynamic and callable as you see fit with constants, session variables or any constructed string.

--SP for Feeding in Dynamic Path Variable
CREATE OR REPLACE PROCEDURE SP_UNLOAD_DYNAMIC_PATH(PATH STRING)
    RETURNS STRING
    LANGUAGE JAVASCRIPT
    EXECUTE AS CALLER
    AS
    $$
    //Unload JSON to Stage
    //COPY INTO OBJECT CONSTRUCT(*)
    //With my path input variable unload naming convention
    //Copy into does not load file with same name but different timestamps
    
    //Result Place Holder
    var result="";
    
    //Build SQL
    var sql00 = `copy into @unload/` + PATH + `/ from (select object_construct(*) from snowflake_sample_data.tpcds_sf10tcl.store)
        file_format = (format_name='json') max_file_size=1024 overwrite=true;`;
    
    var sql01 = `select count(*) from @unload/` + PATH + `/ (file_format => json);`;

    //Execute SQL
    try {
        var stmt00 = snowflake.createStatement( { sqlText: sql00 } );
        stmt00.execute();
        
        var stmt01 = snowflake.createStatement( { sqlText: sql01 } );
        var rs01 = stmt01.execute();
        rs01.next();
        var rowCount = (stmt01.getRowCount()>0) ? rs01.getColumnValue(1) : 0;
         
        result = "Succeeded! Rows Unloaded(" + rowCount + ")";
    }
    
    catch (err)  {
        result =  "Failed: Code: " + err.code + "\n  State: " + err.state;
        result += "\n  Message: " + err.message;
        result += "\nStack Trace:\n" + err.stackTraceTxt; 
    }
    
    return result;
    $$;
    
--Call SP 3 Ways
--String Constant
CALL SP_UNLOAD_DYNAMIC_PATH('CurtLovesCubes');
LIST @UNLOAD;
--Variable from UDF_NOWASSTRING()
SET PATH = (SELECT UDF_NOWASSTRING());
CALL SP_UNLOAD_DYNAMIC_PATH($PATH);
LIST @UNLOAD;
--Pro Variable from Current_TimeStamp
SET PATH = (SELECT LEFT(REPLACE(REPLACE(REPLACE(REPLACE(CURRENT_TIMESTAMP::STRING,' '),':'),'-'),'.'),17));
CALL SP_UNLOAD_DYNAMIC_PATH($PATH);
LIST @UNLOAD;

--Clean Up After Yourself
REMOVE @UNLOAD;

Conclusion
Now you can go out and create dynamically named subdirectories to your heart’s content when unloading data.  This logic can also be used to create dynamic names for files other than data_0_0_0.ext.compression too. Remember to clean up your unloaded files after use with a remove command to avoid storage growth.  Happy Querying!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s