Exploratory Data Analysis: using Pandas Profiling for Easier & More Precise Data Science Projects
by Gaby Flores on May 01, 2020
Exploratory Data Analysis (EDA) is an approach to data analysis that can shed more light on your data set before you move on to the rest of your data science project. Adding EDA to your process can make your analysis easier and more precise, and it can suggest new hypotheses to test. EDA can take many forms, but it often includes analysis of datatypes, statistical summaries, histograms, correlation plots, and missing data.
The data scientists here at Yes Energy who use Python perform EDA with pandas. Pandas is a package that lets you create dataframes and analyze them. Methods such as describe, info, corr, and isnull give you a quick first look at your data. You can also use histograms and other charts to visualize relationships and distributions in the data as part of your EDA.
We’ve taken things further by relying on pandas_profiling for our basic EDA. Pandas_profiling returns a single report with everything mentioned above, and more!
In this blog post we’ll walk you through an EDA example using pandas_profiling on PJM DART Spreads, Wind Generation and Load.
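As a rough illustration of the pandas calls named above, here is a minimal sketch. The column names and values are made up for the example and are not real PJM data.

```python
import io

import numpy as np
import pandas as pd

# Made-up hourly values for illustration only; not real PJM data.
data = pd.DataFrame({
    "dart_spread": [1.5, -0.8, 3.2, np.nan, 0.4, -2.1],
    "wind_gen_mw": [4200.0, 3900.0, 5100.0, 4800.0, np.nan, 3500.0],
    "load_mw": [88000.0, 91000.0, 86500.0, 87000.0, 93000.0, 95000.0],
})

# count, mean, std, min, quartiles, and max for each numeric column
summary = data.describe()

# info() prints dtypes and non-null counts; capture it in a buffer here
buffer = io.StringIO()
data.info(buf=buffer)
info_text = buffer.getvalue()

# Pairwise Pearson correlation between columns
correlations = data.corr()

# Number of missing values in each column
missing_counts = data.isnull().sum()

print(summary)
print(info_text)
print(correlations)
print(missing_counts)
```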
Start your analysis with a simple installation of the library:
pip install pandas-profiling
Once installed, you can run a profile against any dataframe. Here’s an example of the report returned for our dataset (we’ve named this dataframe “data”), which we’re preparing for some machine learning.
import pandas_profiling
pandas_profiling.ProfileReport(data)
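If you want to keep the report as a file to share, one possible approach is sketched below. The helper name build_profile and the output file name are hypothetical. The title and minimal options and the to_file method are available in recent pandas-profiling releases, so check the version you have installed. Newer releases of the package are published under the name ydata-profiling.

```python
import pandas as pd


def build_profile(df: pd.DataFrame, title: str = "EDA report"):
    """Hypothetical helper: build a profile report for a dataframe."""
    # Imported inside the function so a script still loads when the
    # package is not installed; the import fails only when this is called.
    from pandas_profiling import ProfileReport

    # minimal=True skips the more expensive computations on large data.
    return ProfileReport(df, title=title, minimal=True)


# Usage, assuming `data` is your dataframe:
# profile = build_profile(data, title="PJM EDA")
# profile.to_file("eda_report.html")  # writes a standalone HTML report
```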

In the first section of the report we can learn about the size of our data and see how severe the missing data is within the dataframe. We also get a summary of the variable types. The Warnings section is useful for identifying data concerns and areas where you might want to do additional processing and engineering. These can include constant columns, columns that may need conversion to a date type, and highly correlated variables.
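To make the overview concrete, here is a pandas-only sketch that approximates a few of those numbers. It is not how pandas_profiling computes its report; the function name overview, the column names, and the values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Approximate a few overview figures with plain pandas."""
    n_rows, n_cols = df.shape
    total_cells = n_rows * n_cols
    missing_cells = int(df.isnull().sum().sum())
    return {
        "rows": n_rows,
        "columns": n_cols,
        "missing_cells": missing_cells,
        "missing_pct": 100.0 * missing_cells / total_cells if total_cells else 0.0,
        # How many columns there are of each dtype
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        # Columns with at most one distinct non-null value
        "constant_columns": [
            col for col in df.columns if df[col].nunique(dropna=True) <= 1
        ],
    }


# Made-up values for illustration only.
sample = pd.DataFrame({
    "wind_gen_mw": [4200.0, np.nan, 5100.0, 4800.0],
    "load_mw": [88000.0, 91000.0, 86500.0, 87000.0],
    "iso": ["PJM", "PJM", "PJM", "PJM"],
})

result = overview(sample)
print(result)
```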

The next section of the report includes valuable statistics on each variable. The example above includes statistics on wind data. It allows us to understand not only the mean, min, and max, but also the skewness of the data and, via histograms, its shape.
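The same per-variable statistics can be sketched directly with pandas and NumPy. The series below is made up for illustration, with one large value added to produce a right-skewed shape.

```python
import numpy as np
import pandas as pd

# Made-up wind values; the single large value skews the distribution right.
wind = pd.Series([100.0, 120.0, 110.0, 130.0, 900.0], name="wind_gen_mw")

stats = {
    "mean": wind.mean(),
    "min": wind.min(),
    "max": wind.max(),
    "skewness": wind.skew(),  # positive when the right tail is longer
}

# Histogram as raw counts per bin, with four equal-width bins
counts, bin_edges = np.histogram(wind.dropna(), bins=4)

print(stats)
print(counts, bin_edges)
```

A positive skewness together with most of the counts in the first bin indicates a long right tail.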

The next section will include correlations on the various items in the dataframe. Here we can see correlations between schedules and actual data. We can also see relationships between wind and capacity as well as offline capacity and scheduled capacity.
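As a sketch of how highly correlated pairs could be listed by hand, the snippet below computes a correlation matrix and keeps the pairs at or above a cutoff. The function name, the 0.9 threshold, the column names, and the values are assumptions for illustration, not the rule pandas_profiling applies.

```python
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return (column_a, column_b, r) for pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Made-up values for illustration only.
frame = pd.DataFrame({
    "scheduled_mw": [10.0, 20.0, 30.0, 40.0, 50.0],
    "actual_mw": [11.0, 19.0, 31.0, 42.0, 49.0],
    "unrelated": [5.0, 1.0, 4.0, 2.0, 3.0],
})

pairs = high_correlation_pairs(frame)
print(pairs)
```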
The other sections of the profile report include deeper analysis around missing data and snapshots of the first and last rows.
Once you’ve run a profile of the data, it’s much easier to identify where you might need to review inputs to improve your models.
With over 1400 datatypes in DataSignals, the use cases for this type of EDA are truly endless. But here are a few examples of other data science projects that you could apply pandas_profiling to:
- Reviewing data drivers associated with DART Spreads
- Analyzing indicators of Real-Time pricing events
- Identifying variables that result in stronger constraint binding intervals
- Any data prep for an econometric model
If you want to see an example of pandas_profiling in action, let us know. We’re happy to share more examples of this using our DataSignals API!