Save Time and Compute Costs with Snowflake Streams
by Samuel Lockshin on Feb 16, 2023
Yes Energy® continuously collects energy market data. Knowing when Yes Energy publishes and updates this data helps you optimize data pipelines and analytic processes, and the Snowflake Streams functionality in the DataSignals Cloud (DSC) product now makes that possible.
Snowflake Streams provide change data capture (CDC) by recording data manipulation language (DML) transactions (inserts, updates, and deletes) to database objects. With the recent release of underlying tables on DSC, you can create streams on either the existing eligible secure views or tables to:
- Save time and compute resources
- Track and capture “point in time” data
- Maintain full price history
- Optimize your data pipeline
If You’re New to Snowflake
Snowflake’s Data Cloud is powered by an advanced data platform provided as a self-managed service, enabling data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings.
The Snowflake data platform isn’t built on any existing database technology or “big data” software platforms such as Hadoop. Instead, Snowflake combines a completely new SQL query engine with an innovative architecture natively designed for the cloud. To the user, Snowflake provides all of the functionality of an enterprise analytic database, along with many additional special features and unique capabilities. We’re excited to share the recent release of underlying tables on DSC, allowing you to create streams on either the existing eligible secure views or the tables – a complimentary feature for DSC customers.
DataSignals Cloud Improvement
Streams allow for smarter data consumption by enabling your DataSignals Cloud® (DSC) processes to query only data that has changed since the last time these processes checked (Figure 1). This saves time and Snowflake compute resources. You can set up a consumer object, such as a table, to consume the stream content (see our stream documentation for an example). You can use the ROW_ID column to track changes to a specific row over time, and you can also add functionality of your own, like the example INGESTIONDATE column (Figure 1). This column captures when the data entered the consumer object. It serves as a proxy for the publication time of the data (how close a proxy depends on how quickly and how often you consume data out of the stream), and it captures the relative timing of changes to that row. For more information on the stream-specific metadata columns in this example table, see Snowflake’s Stream documentation.
Figure 1. Example table that stores stream content with an additional INGESTIONDATE timestamp column to capture when the stream content enters the table. With Snowflake streams, your processes can track and/or consume only records that have changed since the last time these processes checked.
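The consumption pattern described above can be sketched in plain Python. This is an in-memory model for illustration only, not Snowflake code and not the sample code from the Yes Energy documentation. The ToyStream class, the consume function, and the sample row IDs and values are all hypothetical. It mirrors one real behavior: in Snowflake, a stream's offset advances when its contents are consumed in a DML statement, so the next read returns only newer changes.

```python
from datetime import datetime, timezone


class ToyStream:
    """In-memory stand-in for a stream (hypothetical, not a Snowflake API).

    It remembers how far the consumer has read, so each read returns
    only the changes recorded since the previous read.
    """

    def __init__(self):
        self._changes = []  # every change recorded on the source object
        self._offset = 0    # index of the first change not yet consumed

    def record(self, row_id, action, value):
        """Record one change to the source object."""
        self._changes.append({"ROW_ID": row_id, "ACTION": action, "VALUE": value})

    def read_and_advance(self):
        """Return the changes since the last read, then move the offset."""
        pending = self._changes[self._offset:]
        self._offset = len(self._changes)
        return pending


def consume(stream, consumer_table, now=None):
    """Copy pending changes into the consumer table, stamping INGESTIONDATE.

    Every row in the batch gets the same timestamp: the moment the batch
    entered the consumer table, not the moment the source row changed.
    """
    ingestion_date = now or datetime.now(timezone.utc)
    pending = stream.read_and_advance()
    for change in pending:
        consumer_table.append({**change, "INGESTIONDATE": ingestion_date})
    return len(pending)


# Hypothetical usage: two changes arrive, then the consumer checks twice.
stream = ToyStream()
stream.record("row-1", "INSERT", 25.10)
stream.record("row-2", "INSERT", 31.75)

table = []
first = consume(stream, table)   # picks up both changes
second = consume(stream, table)  # nothing new since the last check
```

Because the second check finds nothing pending, it does no work on rows that were already consumed, which is where the time and compute savings come from.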
In addition to data pipeline optimization, streams also provide a way to track and capture data revisions of non-forecast time series not currently vintaged by Yes Energy (e.g., prices). This is often called "point in time" data (Figure 2).
Figure 2. Example of capturing data revisions for the same time interval across two records. Streams capture non-forecast data revisions. The bottom record is the (hypothetical) revision of the top record. Streams therefore help you maintain the full price history.
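Once revisions are stored this way, a point-in-time lookup becomes a simple filter. The sketch below is a minimal Python illustration, assuming each captured version of a row is kept with the interval it applies to and the time it was ingested. The value_as_of function, the DATETIME and VALUE field names, and the prices are hypothetical and are not taken from Figure 2. It also assumes the history holds only the new version of each revised row.

```python
from datetime import datetime


def value_as_of(history, interval, as_of):
    """Return the value for `interval` as it was known at time `as_of`.

    `history` is a list of dicts, one per captured version of a row.
    Returns None if no version had been ingested by `as_of`.
    """
    known = [
        row for row in history
        if row["DATETIME"] == interval and row["INGESTIONDATE"] <= as_of
    ]
    if not known:
        return None
    # The most recently ingested version is the one a user would have seen.
    latest = max(known, key=lambda row: row["INGESTIONDATE"])
    return latest["VALUE"]


# Hypothetical data: one price interval, published and later revised.
interval = datetime(2023, 2, 1, 14, 0)
history = [
    {"DATETIME": interval, "VALUE": 42.10,
     "INGESTIONDATE": datetime(2023, 2, 1, 15, 5)},
    {"DATETIME": interval, "VALUE": 43.25,
     "INGESTIONDATE": datetime(2023, 2, 3, 9, 30)},
]

original = value_as_of(history, interval, datetime(2023, 2, 2))  # before the revision
revised = value_as_of(history, interval, datetime(2023, 2, 4))   # after the revision
```

Keeping both records, rather than overwriting the first, is what lets the same interval answer differently depending on the as-of time.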
Streams also grant more visibility into Yes Energy’s data collection processes. For example, you can use streams to infer how often a data type is published, which lets you further optimize your pipelines by checking for new data only around the publication time.
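One way to estimate that publication frequency is to look at the gaps between the distinct ingestion timestamps in a consumer table. The following Python sketch assumes you have already collected those timestamps. The function name and the sample times are hypothetical, and the estimate can be no finer than how often you consume the stream.

```python
from datetime import datetime, timedelta
from statistics import median


def estimate_publication_interval(ingestion_times):
    """Estimate the typical gap between publications.

    Takes the ingestion timestamps of consumed rows, collapses rows that
    arrived in the same batch, and returns the median gap between
    consecutive batches. Returns None with fewer than two batches.
    """
    distinct = sorted(set(ingestion_times))
    if len(distinct) < 2:
        return None
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(distinct, distinct[1:])
    ]
    # The median is less sensitive than the mean to one late or missed batch.
    return timedelta(seconds=median(gaps))


# Hypothetical ingestion timestamps taken from a consumer table.
observed = [
    datetime(2023, 2, 1, 0, 5),
    datetime(2023, 2, 1, 0, 5),  # second row from the same batch
    datetime(2023, 2, 1, 1, 5),
    datetime(2023, 2, 1, 2, 5),
    datetime(2023, 2, 1, 3, 6),  # slightly late batch
]
typical_gap = estimate_publication_interval(observed)
```

With an estimate like this, a pipeline could schedule its checks shortly after each expected publication instead of polling continuously.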
Stream functionality relates to the broader topic of data collection volume and frequency by Yes Energy. Since Yes Energy understands the value of knowing when data is published, we’re constantly researching how to provide more of this information in our data products – for example, by exposing more data publication timestamps. We’re leaders in adopting new technology for data delivery so you can focus your attention on what matters most.
How Snowflake Can Benefit You
Save time and compute resources with Snowflake Streams, a complimentary feature for Yes Energy’s DataSignals Cloud customers.
Yes Energy provides documentation and sample code for implementing streams. Snowflake’s Streams documentation is linked in our documentation. Streams are available for major objects on AWS us-east-1 and Azure US East 2 regions.
To see how Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings, schedule a demo.
At Yes Energy, we understand the complexity and unique challenges of nodal power markets. It’s why we’re committed to helping you Win the Day Ahead™ by delivering superior data how and when you want it.
At Yes Energy, we excel at running data at scale – critical when you’re working with terabytes of data. Because you need answers within hours, not days – and sometimes faster.