Pivot tables are a powerful and accessible tool that lets business users analyze data through a drag-and-drop interface. By dragging fields from a dataset into the pivot table configuration, you can create hundreds of different table and chart configurations.
This lets analysts easily analyze and present data, and lets end users consume and drill into it, without either group needing to write SQL or complex Excel formulas.
In this article we are going to break down how a pivot table is structured and how to use a pivot table to analyze your data with a real data example from Kaggle on Avocado Prices.
A pivot table has a relatively simple structure made up of four components: rows, columns, values, and filters.
Similar to any data table, rows in pivot tables are used to display values horizontally across columns. If you add a field as a row to your pivot table, it will create row labels in new rows for every unique value in that field from your dataset.
These row labels then serve as the basis for the pivot calculations. In the example below, Year was added as a row to the pivot table. The pivot table automatically placed each unique Year into a row label, so that any values added will be grouped together and displayed horizontally in the corresponding row.
Again, similar to any data table, columns in pivot tables are used to display values vertically across rows. If you add a field as a column to your pivot table, it will create column labels in new columns for every unique value in that field from your dataset.
These column labels then serve as the basis for the pivot calculations. In the example below, Year was added as a column to the pivot table. The pivot table automatically placed each unique Year into a column label, so that any values added will be grouped together and displayed vertically in the corresponding column.
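To make the row and column mechanics concrete, here is a minimal sketch in plain Python of what a pivot table does behind the scenes. The records are made up for illustration, and the field names (`year`, `type`, `total_bags`) and the `pivot_sum` helper are our own assumptions, not part of any spreadsheet tool.

```python
from collections import defaultdict

# Made-up sample records; the field names only loosely mirror the
# Avocado Prices data and are assumptions for this sketch.
records = [
    {"year": 2016, "type": "conventional", "total_bags": 100},
    {"year": 2016, "type": "organic", "total_bags": 20},
    {"year": 2017, "type": "conventional", "total_bags": 120},
    {"year": 2017, "type": "organic", "total_bags": 30},
    {"year": 2017, "type": "organic", "total_bags": 10},
]

def pivot_sum(rows, row_field, col_field, value_field):
    """Sum value_field for every (row label, column label) pair."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in rows:
        # Each unique value of row_field becomes a row label and each
        # unique value of col_field becomes a column label.
        table[rec[row_field]][rec[col_field]] += rec[value_field]
    return {label: dict(cols) for label, cols in table.items()}

# Year as row labels, Type as column labels.
year_as_rows = pivot_sum(records, "year", "type", "total_bags")

# Swapping the two fields puts Year across the top instead.
year_as_columns = pivot_sum(records, "type", "year", "total_bags")

print(year_as_rows)
print(year_as_columns)
```

Swapping the row and column fields changes only the layout, not the numbers: the same grouped sums appear in both tables.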
Values are the fields that the pivot table aggregates, and the results of those aggregate functions are the data the table displays. Typically, numeric fields are used as values so that calculations such as sum, average, max, min, and standard deviation can be performed. Non-numeric fields can also be used as values, in which case the pivot table counts them, but the more common use case involves numeric fields.
If you have rows and/or columns added to your pivot table, then the values will be calculated as aggregates according to the rows and columns used. In the example above where we have Year as our row labels, any values will be calculated as aggregates within that Year. In this example, if we add a field from the Avocado Prices data, titled Total Bags, as our value to be summed, then the pivot table will automatically calculate the sum of Total Bags per Year.
Most common options to summarize values: Sum, Count, Average, Max, Min, StdDev, Var
Most common options to display values as: Percent of Row Total, Percent of Column Total, Percent of Grand Total
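As a rough illustration of these options, the sketch below groups made-up Total Bags figures by Year, applies several of the summary functions listed above, and then displays each year's sum as a percent of the grand total. The data and the `AGGREGATES` mapping are assumptions for this example; the standard deviation and variance shown are Python's sample versions from the `statistics` module.

```python
from collections import defaultdict
from statistics import mean, stdev, variance

# Made-up (year, total_bags) pairs for illustration only.
records = [(2015, 100.0), (2015, 300.0), (2016, 200.0), (2016, 400.0)]

# Group the values under each row label (Year).
groups = defaultdict(list)
for year, bags in records:
    groups[year].append(bags)

# Map pivot-table summary names to Python functions.
# stdev and variance are sample statistics and need 2+ values per group.
AGGREGATES = {
    "Sum": sum,
    "Count": len,
    "Average": mean,
    "Max": max,
    "Min": min,
    "StdDev": stdev,
    "Var": variance,
}

summary = {
    year: {name: fn(values) for name, fn in AGGREGATES.items()}
    for year, values in groups.items()
}

# "Display values as": each year's sum as a percent of the grand total.
grand_total = sum(bags for _, bags in records)
pct_of_grand_total = {
    year: 100 * sum(values) / grand_total for year, values in groups.items()
}

print(summary[2015]["Sum"], summary[2015]["Average"])  # 400.0 200.0
print(pct_of_grand_total)  # {2015: 40.0, 2016: 60.0}
```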
Simply put, filters limit the data displayed in the pivot table. Think of filters as the IF portion of Excel formulas, where you specify the criteria that must be met. The benefit of using filters in a pivot table is that you can point and click to limit the data displayed, whereas in Excel or SQL you would have to write lengthy, complex, and error-prone formulas or queries.
Continuing with our example above of Total Bags per Year, we can apply a filter to limit the data to only display bags of Organic Avocados. By dragging the Type field into the filters area and then de-selecting Conventional, we can easily display the Total Organic Bags of Avocados Sold per Year.
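For comparison, the same filter-then-aggregate idea can be sketched in a few lines of Python. The records are invented, and the `sum_by` helper and its `keep` parameter are hypothetical names introduced only for this example.

```python
from collections import defaultdict

# Made-up records; field names are assumptions for this sketch.
records = [
    {"year": 2016, "type": "conventional", "total_bags": 100.0},
    {"year": 2016, "type": "organic", "total_bags": 20.0},
    {"year": 2017, "type": "conventional", "total_bags": 120.0},
    {"year": 2017, "type": "organic", "total_bags": 30.0},
    {"year": 2017, "type": "organic", "total_bags": 10.0},
]

def sum_by(rows, group_field, value_field, keep=lambda rec: True):
    """Sum value_field per group_field, skipping records the filter rejects."""
    totals = defaultdict(float)
    for rec in rows:
        if keep(rec):  # the filter is applied before any aggregation
            totals[rec[group_field]] += rec[value_field]
    return dict(totals)

# Equivalent of de-selecting Conventional in the Type filter.
organic_bags_per_year = sum_by(
    records, "year", "total_bags", keep=lambda rec: rec["type"] == "organic"
)

print(organic_bags_per_year)  # {2016: 20.0, 2017: 40.0}
```

The filter runs before the aggregation, which is why the yearly totals shrink rather than whole years disappearing.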
Let’s assume you are an analyst who has been asked by your company to answer some questions regarding their avocado sales data. You’ve been asked to answer the following:
SQL is a great option for answering these questions, but it assumes that you have the technical skills to write queries and the database infrastructure in place to run them. Another option is to write Excel formulas, but as mentioned, this can lead to long, complex formulas that may contain errors.
If this is the case, we recommend leaning on pivot tables. Each of these questions can be easily answered with a few simple clicks.
To answer this question, perform the following steps:
With a few steps you can now view the total volume per month over the past 12 months as well as the month-over-month difference. At this point, you can easily spot outliers, identify trends, and determine the story from the data to present to your stakeholders.
For example, you can quickly discern from this data that total avocado sales volume has been relatively flat for the past 12 months despite a dip from August 2017 through November of 2017. This may prompt you to look at past years’ data to determine if this is a consistent trend that may be seasonal.
As a next step, once you determine the insights to present, you can choose to present the data in a chart if that helps better tell the story vs. a table.
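If you want to double-check the pivot table's monthly numbers, the roll-up it performs can be sketched in plain Python: group dates into months, sum the volume, then subtract each month from the next. The sales figures below are invented for illustration and are not taken from the Avocado Prices data.

```python
from collections import defaultdict
from datetime import date

# Made-up (date, total_volume) observations for illustration only.
sales = [
    (date(2017, 10, 1), 90.0),
    (date(2017, 10, 15), 110.0),
    (date(2017, 11, 5), 150.0),
    (date(2017, 12, 3), 120.0),
    (date(2017, 12, 24), 100.0),
]

# Group by (year, month), as a pivot table does when dates are rows
# grouped by month.
monthly_volume = defaultdict(float)
for day, volume in sales:
    monthly_volume[(day.year, day.month)] += volume

# Month-over-month difference: each month minus the month before it.
months = sorted(monthly_volume)
mom_difference = {
    current: monthly_volume[current] - monthly_volume[previous]
    for previous, current in zip(months, months[1:])
}

print(dict(monthly_volume))
print(mom_difference)  # {(2017, 11): -50.0, (2017, 12): 70.0}
```

The first month has no difference because there is nothing earlier to compare it with.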
To answer this question, perform the following steps:
(For display purposes, I have filtered the table down to the top 5 regions with the highest average price and the bottom 5 regions with the lowest average price.)
From this pivot table, you can quickly determine the top regions with the highest average price and the bottom regions with the lowest average price. You’re now set up to explore more interesting analyses by combining other data sources to answer other potential questions, such as “how does the average price compare to the cost of living in that region?”, “what are our margins in each region?”, and “is there enough demand to shift production from the bottom regions to the top regions?”.
Adding Year as a column to this table also surfaces other interesting insights that may not have been immediately clear from solely answering the question asked. The pivot table makes it clear that the average price in 2018 is over 10% lower than the previous year. This may prompt further analysis that will benefit your stakeholders.
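The region-by-year layout described above can also be approximated in code. This is only a sketch with made-up prices for three regions, so the numbers will not match the article's table; it averages price per (region, year) cell and then ranks regions by their overall average.

```python
from collections import defaultdict
from statistics import mean

# Made-up (region, year, average_price) observations for illustration.
observations = [
    ("Albany", 2017, 1.50), ("Albany", 2018, 1.25),
    ("Boston", 2017, 1.75), ("Boston", 2018, 1.25),
    ("Houston", 2017, 1.00), ("Houston", 2018, 0.75),
]

by_cell = defaultdict(list)    # region as row label, year as column label
by_region = defaultdict(list)  # row totals across all years
for region, year, price in observations:
    by_cell[(region, year)].append(price)
    by_region[region].append(price)

avg_by_region_year = {cell: mean(prices) for cell, prices in by_cell.items()}
avg_by_region = {region: mean(prices) for region, prices in by_region.items()}

# Rank regions from highest to lowest overall average price.
ranked = sorted(avg_by_region, key=avg_by_region.get, reverse=True)

print(ranked)  # ['Boston', 'Albany', 'Houston']
print(avg_by_region_year[("Albany", 2018)])  # 1.25
```

Slicing `ranked` from either end gives the top and bottom regions, which mirrors the top 5 and bottom 5 filter applied to the table.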
To answer this question, perform the following steps:
Clearly, conventional avocados accounted for the majority of volume sold in Q4 2017. Now you may be prompted to explore whether this is a new trend or whether conventional avocados have always been the primary driver of sales. Or perhaps the next step is comparing margins and average sales price.
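Assuming the question concerns each avocado type's share of volume in Q4 2017, the calculation combines a date filter with a "percent of grand total" display. The sketch below uses invented volumes, so the resulting split is illustrative only.

```python
from collections import defaultdict
from datetime import date

# Made-up (date, type, total_volume) observations for illustration.
sales = [
    (date(2017, 9, 24), "conventional", 500.0),  # outside Q4, filtered out
    (date(2017, 10, 8), "conventional", 600.0),
    (date(2017, 11, 12), "organic", 50.0),
    (date(2017, 12, 17), "conventional", 300.0),
    (date(2017, 12, 31), "organic", 50.0),
]

# Filter: keep only dates inside Q4 2017.
q4_start, q4_end = date(2017, 10, 1), date(2017, 12, 31)

# Rows: avocado type. Values: sum of volume.
volume_by_type = defaultdict(float)
for day, kind, volume in sales:
    if q4_start <= day <= q4_end:
        volume_by_type[kind] += volume

# Display as percent of grand total.
q4_total = sum(volume_by_type.values())
share_by_type = {
    kind: 100 * volume / q4_total for kind, volume in volume_by_type.items()
}

print(share_by_type)  # {'conventional': 90.0, 'organic': 10.0}
```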
Pivot tables are an excellent tool at your disposal to quickly analyze data to answer questions, spot trends and outliers, and arm end users with powerful insights.
As seen in the examples, pivot tables are also a great means to explore your data and continue to ask more questions that may be helpful to answer. At Superchart, we harness the power of pivot tables for ad-hoc reporting, sales and marketing analyses, people analytics, and much more.
If you're looking for more great tips and tricks, check out these helpful resources:
Also, if you're looking to embed or show the results of a pivot table (or another Airtable visualization or chart on Google Sheets data), be sure to give Superchart a try for free.