This tutorial shows how to use the XLSTAT intelligent pivot table feature, which has substantial advantages compared to the classic Excel pivot tables.
Dataset to create an intelligent pivot table
An Excel sheet (zipped file) with both the data and the results can be downloaded by clicking here. The data were collected during the 1994 Census by the U.S. Census Bureau (http://www.census.gov).
This dataset has been used several times by statisticians to evaluate the predictive performance of new algorithms. Each record contains 8 descriptors about an individual, like age, occupation, education, sex, etc. The number of records has been limited to 8000. The weight variable (allowing each individual to represent a certain percentage of the population) is not used in the example below.
Goal of this tutorial
The goal here is to quickly build a pivot table and a contribution chart that help the user understand which factors, and which combinations of factors, most influence whether an individual's revenue is above or below $50K (the corresponding variable is in column J). XLSTAT makes this quick and easy.
Generating an intelligent pivot table
Once XLSTAT is open, select the XLSTAT / Describing data / Pivot command, or click on the corresponding button of the Describing data menu (see below).
Once you have clicked on the button, the Pivot dialog box appears.
Select the data on the Excel sheet. As the first row contains the labels and the following rows contain the data, you can use XLSTAT's quickest selection mode: select the columns directly by clicking on the corresponding column letters.
Select the Labels included option, because the first row contains the names of the variables.
Note that the explanatory and response variables can be either qualitative or quantitative variables.
As the variable to explain is a qualitative variable, select qualitative for the type of variable. Then select the target modality to be used in the pivot table. In our case, we focus on the ">50K" case.
Then click on OK so that XLSTAT-Pivot can start the computations. The Pivot algorithm is based on classification trees and the CHAID algorithm.
The next dialog box displays the options for creating the optimal pivot tables. Select the variables that you want to use in the pivot tables. The contribution of each variable to the model is displayed next to the variable name (the higher the contribution, the more information the variable brings to explain the variability of the response variable).
Once you are satisfied with the selection (in this example we did not change any of the default options), click on Continue.
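To give an intuition of how a CHAID-style method can rank explanatory variables, here is a minimal Python sketch. It computes the Pearson chi-square statistic between each qualitative variable and the response, which is the kind of association measure CHAID relies on. This is only an illustration: the toy data and the names chi_square, marital and sex are hypothetical, the category merging and adjusted p-values of a full CHAID are omitted, and the tutorial does not state how XLSTAT actually computes its contributions.

```python
from collections import Counter

def chi_square(values, outcomes):
    """Pearson chi-square statistic of the category-by-outcome contingency table."""
    n = len(values)
    observed = Counter(zip(values, outcomes))  # cell counts (0 if a cell is absent)
    row_totals = Counter(values)
    col_totals = Counter(outcomes)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n   # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

# Hypothetical toy data (not the census file used in this tutorial)
income = [">50K", ">50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K", ">50K"]
predictors = {
    "marital": ["Married", "Married", "Married", "Single",
                "Single", "Single", "Single", "Married"],
    "sex": ["M", "F", "M", "F", "M", "F", "M", "F"],
}

# Score every explanatory variable, then rank from strongest to weakest association
scores = {name: chi_square(values, income) for name, values in predictors.items()}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranking:
    print(f"{name}: chi-square = {score:.2f}")
```

In this toy example the marital variable separates the two revenue groups perfectly, so it is ranked first, while sex carries no information about the response.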
Interpreting an intelligent pivot table
A new sheet is displayed with a histogram of the contributions of the variables, and a dynamic pivot table.
The chart confirms that the variables that have the highest effect on the revenue are the Marital status followed by the number of years of education.
The dynamic pivot table can display up to 4 values for each combination of categories:
- Target average: For a qualitative response variable, the percentage of cases in which the target category is present; for a continuous response variable, the average of the response computed on the sub-population corresponding to the combination;
- Target size: For a qualitative response variable, the count of occurrences of the target category;
- Population size %: Percentage of the overall population corresponding to the combination;
- Population size: Population size corresponding to the combination.
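The four values listed above can be reproduced by hand for a qualitative response variable. The following Python sketch does so on a few made-up records; the function name pivot_cells, the dictionary keys and the data are assumptions chosen for illustration, not part of XLSTAT.

```python
from collections import defaultdict

# Hypothetical toy records: (marital status, education, revenue)
records = [
    ("Married", "Bachelors", ">50K"),
    ("Married", "Bachelors", "<=50K"),
    ("Married", "HS-grad", "<=50K"),
    ("Never-married", "Bachelors", "<=50K"),
    ("Never-married", "HS-grad", "<=50K"),
    ("Married", "Bachelors", ">50K"),
    ("Never-married", "Bachelors", ">50K"),
    ("Married", "HS-grad", ">50K"),
]

def pivot_cells(records, target=">50K"):
    """Return the four pivot values for every combination of categories."""
    population = defaultdict(int)  # Population size per combination
    hits = defaultdict(int)        # Target size per combination
    for *categories, outcome in records:
        key = tuple(categories)
        population[key] += 1
        if outcome == target:
            hits[key] += 1
    total = len(records)
    return {
        key: {
            "target_average": 100.0 * hits[key] / size,   # % of target category
            "target_size": hits[key],                     # count of target category
            "population_size_pct": 100.0 * size / total,  # % of overall population
            "population_size": size,                      # count in the combination
        }
        for key, size in population.items()
    }

cells = pivot_cells(records)
for key, values in sorted(cells.items()):
    print(key, values)
```

For the combination (Married, Bachelors), the sketch finds 3 individuals out of 8 (37.5% of the population), 2 of whom belong to the target category.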
The pivot table is as follows:
We can now analyze the dynamic pivot table to identify the combinations that most influence whether people earn more than $50K.
Note that once you have a pivot table, it might be interesting to run a correspondence analysis to see how the categories of the various explanatory variables are related to each other. To build the input table, keep only the "Target size" values.
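As an illustration of what such an input table looks like, the Python sketch below cross-tabulates the "Target size" counts for two explanatory variables. The records and the function name target_size_table are hypothetical; in practice the table would be taken from the pivot table sheet itself.

```python
from collections import Counter

# Hypothetical toy records: (marital status, education, revenue)
records = [
    ("Married", "Bachelors", ">50K"),
    ("Married", "Bachelors", "<=50K"),
    ("Married", "HS-grad", "<=50K"),
    ("Never-married", "Bachelors", "<=50K"),
    ("Never-married", "HS-grad", "<=50K"),
    ("Married", "Bachelors", ">50K"),
    ("Never-married", "Bachelors", ">50K"),
    ("Married", "HS-grad", ">50K"),
]

def target_size_table(records, target=">50K"):
    """Cross-tabulate the count of target cases: rows x columns."""
    counts = Counter((row, col) for row, col, outcome in records
                     if outcome == target)
    rows = sorted({row for row, _, _ in records})
    cols = sorted({col for _, col, _ in records})
    # Combinations without any target case get a count of 0
    table = [[counts[(row, col)] for col in cols] for row in rows]
    return rows, cols, table

rows, cols, table = target_size_table(records)
print("columns:", cols)
for label, line in zip(rows, table):
    print(label, line)
```

Each cell holds only the number of individuals in the target category, which is the kind of contingency table a correspondence analysis takes as input.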