Creating Energized Data
This is the second article in a four-part series on Energized Data, written by Yes Energy’s Director of Data Products, Sonya Gustafson. In this series, Sonya explains what Energized Data is, how to create it, analyze it, and empower it within your organization.
As a reminder, Energized Data is a standard of best practices by which data is not only collected and engineered but also integrated into systems and processes throughout the organization. It requires a deeper understanding of the data and use case - leveraging technical skills as well as domain expertise. You can amplify the use of Energized Data by collaborating across the organization to understand and execute data plans.
Collecting and Engineering Data
In this article, we’ll cover the first attribute associated with Energized Data - collection and engineering. I’ve seen a number of funny internet memes that compare data engineers to magicians, wizards, and ninjas, and I’d be lying if I said I didn’t think those were hidden elements of success. But more honestly, I believe Yes Energy is successful because we have a team of experts. Some of our team are incredibly technical, with advanced database, SQL, and Python skills; others have years of experience with power markets. We connect the dots with expertise in product and project management. Here are more specifics associated with the best practices Yes Energy applies to its data.
Yes Energy follows an Agile framework methodology for our data development processes. This allows us to move quickly but also ensures we’re always considering the end-user and the use case. Before we collect data, we identify what problems it may solve, both for our customers and ourselves. This allows us to consider not only what we collect, but how quickly it needs to be loaded, any transformations that can occur within the collection, and the downstream data work that needs to be considered when we store the data. By identifying the user needs early, we can avoid situations where we may need to collect more than once as a result of scope changes. And we can avoid loading data that may not have a fully formed end goal.
That’s not to say that an occasionally greedy approach to data collection won’t work - gathering as much data as possible may yield results as well. But Energized Data requires consideration of a long-term data strategy. Specifically, will you be able to maintain this data collection? For example, in one hour our collection system processed over 18,000 collection events. This loaded 5.7 million rows of data through 390 unique load mechanisms. We obviously don’t say no to data! Each successful automated collection requires monitors in place to ensure we’re doing right by that data (and our customers) before it lands in our systems. Haphazard data collection can create a maintenance nightmare. Be thoughtful with data collections.
Maintaining Data Collection
This leads us to the next best practice for data collections: maintaining the system and process. These are the questions we ask ourselves before productionizing data:
- Do we expect the data source to change?
- How will we monitor for said changes?
- Will the data change and how will we capture the revisions?
- Are there versions of data that should be prioritized?
- If this collection creates 2,500 errors, how will we fix them as quickly as the data is needed for analysis?
- Are we looking for (and potentially cleaning around) gaps, outliers, and delays?
We’ve found it helps to have a team dedicated to monitoring sources and markets. They provide early warning before data breaks so we aren’t waiting until the last minute to fix problems. We also have embraced data operations and grown both manual and automated quality assurance functions to keep up with the multitude of ways data can evolve over time. The old mantra of garbage in, garbage out is very true here – so quality must be a priority for Energized Data.
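To make the last question in the list above concrete, here is a minimal sketch of automated checks for gaps, outliers, and delays. It is not Yes Energy’s monitoring code; the function names, the median-based outlier rule, and the thresholds are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def find_gaps(timestamps, expected_step):
    """Return (start, end) pairs where consecutive timestamps are
    further apart than the expected publishing interval."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > expected_step]


def find_outliers(values, threshold=5.0):
    """Return indices of values whose distance from the median exceeds
    `threshold` times the median absolute deviation (MAD).
    The threshold of 5.0 is an illustrative assumption."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values) if abs(v - med) / mad > threshold]


def is_delayed(latest_timestamp, now, max_lag):
    """True when the newest record is older than the allowed lag."""
    return now - latest_timestamp > max_lag


# Hypothetical five-minute series with one missing interval and one spike.
base = datetime(2024, 1, 1)
stamps = [base + timedelta(minutes=m) for m in (0, 5, 10, 20)]
prices = [30, 31, 29, 30, 900, 31]

gaps = find_gaps(stamps, timedelta(minutes=5))
spikes = find_outliers(prices)
late = is_delayed(stamps[-1], base + timedelta(hours=1), timedelta(minutes=15))
```

A median-based rule is used here because a single extreme value barely moves the median, whereas it can distort a mean and standard deviation enough to hide itself. Whether a flagged value is an error or a real market event is still a judgment call for someone who knows the market.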
Making Data Useable
Often, you need to do more than just collect data to make it useful. After quality, our data engineers prioritize creating useful information from these rows of collected data. Data engineering can sometimes be simple – clean, standardized time series data is immensely valuable and rarely the format you’ll see in raw data sources. And standardized data is more than just sequential data labeled with a timestamp –
- Can our customers quickly use ten-second data in a model that relies on hourly data?
- How will we accurately aggregate that gappy semi-five-minute series into a daily average?
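One way to approach both questions is to roll irregular data up to hourly means first, then average the hours. The sketch below assumes that approach; the function names and the `min_hours` coverage threshold are hypothetical, and `daily_mean` assumes all points passed in belong to a single day.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_means(points):
    """Group (timestamp, value) pairs by the hour they fall in and
    average each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


def daily_mean(points, min_hours=18):
    """Average the hourly means for one day of points. Returns None when
    fewer than `min_hours` hours have any data, so a sparse day is not
    silently reported as if it were complete."""
    hours = hourly_means(points)
    if len(hours) < min_hours:
        return None
    return sum(hours.values()) / len(hours)


# Hypothetical input: one hour of ten-second data, then a single
# reading in the following hour.
base = datetime(2024, 1, 1)
points = [(base + timedelta(seconds=10 * i), 1.0) for i in range(360)]
points.append((base + timedelta(hours=1), 3.0))

by_hour = hourly_means(points)
relaxed = daily_mean(points, min_hours=2)
strict = daily_mean(points)
```

Averaging the hourly means gives each hour equal weight, so an hour with 360 readings does not drown out an hour with one. A plain average of the raw points would land very close to 1.0 here, while the hour-weighted result is 2.0. Which answer is right depends on the use case, which is why it helps to decide before the data is stored.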
Comparing Data
Another value-add in data engineering to consider is how the data relates to other data. A row of data in San Diego, California, probably doesn’t need to be compared to a row of data in Portland, Maine, but it should be compared to a row of data in Los Angeles. Geospatial relationships and geographic information allow users to quickly build sets of relevant data.
Reduce data complexity by using hierarchical structures to formulate data relationships. Like the data type example from the prior paragraph, we’ve learned relationship mapping helps when our customers need to solve a problem in ERCOT (the Texas power market) that they already solved in NYISO (the New York power market). The customer’s time-to-insight is significantly reduced if they don’t have to “get on the ground” to learn the ins and outs of the local geography.
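The San Diego and Los Angeles comparison above can be sketched with a simple distance filter. This is an illustration only: the coordinates are approximate city-center values, the 500 km radius is arbitrary, and the `nearby` helper is hypothetical. The haversine formula itself is a standard way to estimate great-circle distance on a sphere.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, treating the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby(origin, locations, max_km):
    """Return names of locations within `max_km` of `origin`,
    where `origin` is a (lat, lon) tuple."""
    return [
        name
        for name, (lat, lon) in locations.items()
        if haversine_km(origin[0], origin[1], lat, lon) <= max_km
    ]


# Approximate coordinates, for illustration only.
san_diego = (32.7157, -117.1611)
others = {
    "Los Angeles": (34.0522, -118.2437),
    "Portland, Maine": (43.6591, -70.2568),
}

relevant = nearby(san_diego, others, max_km=500)
```

Distance is only one kind of relationship. In power markets, two nearby points can sit in different zones or on opposite sides of a constraint, so a hierarchy of market, zone, and location is often a better grouping key than raw proximity.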
Converting Data
We also perform repeated transformations early. One example here: many weather data providers report data in Celsius and the easiest way to collect this data would be to throw the data into a weather data table in its originally published format. However, the bulk of our data is centered around US markets and the default unit of temperature measurement here is Fahrenheit. Storing the data in Celsius could lead to inaccurate data analysis and confusion. Furthermore, if every downstream user of the data is ultimately converting this to Fahrenheit – why not do that early on to create more efficiency?
Yes Energy converts it once so everyone can save time. This is also true for unit conversions (like $/MWh to $/MW-month) and other common calculations. In the next part of this series, the pipeline discussion, I’ll provide more insight into these conversions – including how to evaluate when it makes sense to store the conversion versus having it easily accessible and available on the fly.
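As a rough sketch of what “convert it once” can look like, the helpers below apply the two conversions mentioned above. The function names are hypothetical, and the $/MWh to $/MW-month conversion assumes a flat price that applies to every hour of the calendar month, ignoring daylight saving shifts. Real products may define the month’s hours differently.

```python
import calendar


def celsius_to_fahrenheit(celsius):
    """Standard temperature conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def hours_in_month(year, month):
    """Calendar hours in a month (assumption: no daylight saving adjustment)."""
    days = calendar.monthrange(year, month)[1]
    return days * 24


def per_mwh_to_per_mw_month(price_per_mwh, year, month):
    """Convert $/MWh to $/MW-month, assuming the price applies to
    every hour of the month."""
    return price_per_mwh * hours_in_month(year, month)


boiling_f = celsius_to_fahrenheit(100)
feb_2024 = per_mwh_to_per_mw_month(10.0, 2024, 2)  # leap-year February
```

Keeping the conversion in one shared function, applied at load time, means every downstream user sees the same number, and any change to the rule happens in exactly one place.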
Telling a Story with Data
It’s important to layer in domain expertise throughout this process. Some of this is natural if you’re following best Agile practices for the data engineering side of the business. We bring in customers and context experts to help create the development stories. We say “Yes” to our customers here at Yes Energy and a big part of that is asking a lot of questions –
- What data do you want to use alongside Yes Energy data?
- What’s the business problem you are trying to solve?
- How is our roadmap aligned with your roadmap?
Some of our best practices were learned through mistakes – we’ve gotten deep into data engineering projects and realized at the end phase that our entire calculation was incorrect because we missed a component early on in the design. Leveraging our team’s domain expertise earlier would have allowed us to put these requirements in the initial project spec.
This gets us to the final step in the data side of Energized Data – embrace flexibility and be ready to pivot. We specifically hire curious people who want to roll up their sleeves and solve extremely complex data problems. It requires confidence within the team to create the space and flexibility to solve problems in entirely new ways. These nimble, creative teams create faster, better results.
The next step in Energized Data builds off of this – finding the right tools for the job, which requires that same level of flexibility. But we’ll save that for our next blog post, Analyzing Energized Data.
These are best practices we’ve applied to Yes Energy’s data. Do you want to use this data to energize your business? Arrange a consultation with our team.