Filtering observations and variables in PCA charts

This tutorial shows how to improve the readability of Principal Component Analysis (PCA) charts by hiding the variables or observations that are poorly represented on the displayed axes.

Dataset for running a Principal Component Analysis

The data come from the US Census Bureau and describe the changes in the population of the 50 US states and the District of Columbia (51 observations) between 2000 and 2001. The initial dataset has been transformed into rates per 1,000 inhabitants, and the analysis focuses on the 2001 data.

Goal of this tutorial

Our goal is to analyze the correlations between the variables and to find out whether the population changes in some states differ markedly from those in other states. A general tutorial on how to run a principal component analysis with XLSTAT is available here.

We focus on the charts displayed in a PCA. XLSTAT offers an option to filter the observations and variables displayed on the charts according to their squared cosines (cos²). The squared cosine measures the quality of the representation of an observation or variable on the obtained maps: it lies between 0 and 1, and the closer it is to 1, the better the point is represented.
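As a rough illustration of what the filter measures: the squared cosine of observation i on axis k is its squared factor score on that axis divided by its squared distance to the centre of gravity, which equals the sum of its squared scores over all axes. The sketch below assumes the factor scores are already available as a list of rows; the function name `squared_cosines` and the scores are hypothetical and do not come from XLSTAT.

```python
def squared_cosines(scores):
    """Return the cos² table for a table of factor scores.

    scores[i][k] is the coordinate of observation i on axis k, given for
    *all* axes, so that sum_k scores[i][k] ** 2 is the squared distance of
    observation i to the centre of gravity.
    """
    table = []
    for row in scores:
        dist2 = sum(f * f for f in row)
        if dist2 == 0:
            # A point located exactly at the centre has no direction.
            table.append([0.0] * len(row))
        else:
            table.append([f * f / dist2 for f in row])
    return table


# Hypothetical scores for three observations on three axes.
scores = [
    [3.0, 4.0, 0.0],  # lies entirely in the PC1-PC2 plane
    [0.0, 1.0, 2.0],  # mostly along PC3
    [1.0, 1.0, 1.0],  # spread evenly over the three axes
]
for row in squared_cosines(scores):
    print([round(c, 3) for c in row])
```

Because every axis is included, each row sums to 1; on a chart showing only two axes, a point's quality is the sum of its two corresponding cos² values.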

Setting up a Principal Component Analysis

Once XLSTAT-Pro is activated, select the XLSTAT / Analyzing data / Principal components analysis command, or click on the corresponding button of the Analyzing Data toolbar (see below).

XLSTAT Analyzing Data menu / PCA

The Principal Component Analysis dialog box will appear.

Select the data on the Excel sheet. The Data format is set to Observations/variables because the input table has one row per observation (state) and one column per variable.

We use the Pearson correlation matrix as the PCA type. It is based on the classical (Pearson) correlation coefficient, so all variables are standardized and weigh equally in the analysis.

XLSTAT principal component analysis dialog box general tab
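Running a PCA on the Pearson correlation matrix amounts to standardizing each variable (zero mean, unit variance) before extracting the axes. The NumPy sketch below shows that computation under a few assumptions: observations are in rows and variables in columns, the data are random placeholders rather than the Census data, and the function name `pearson_pca` is hypothetical. XLSTAT's own implementation may differ, for instance in the signs of the axes, which are arbitrary.

```python
import numpy as np


def pearson_pca(X):
    """PCA on the Pearson correlation matrix (rows = observations)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable to zero mean and unit (population) variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The correlation matrix of X equals Z.T @ Z / n.
    R = Z.T @ Z / Z.shape[0]
    # eigh handles symmetric matrices and returns ascending eigenvalues,
    # so reorder everything from the largest eigenvalue to the smallest.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Factor scores of the observations on each axis.
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores, R


# Placeholder data: 51 observations, 4 variables on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(51, 4)) * [1.0, 10.0, 100.0, 1000.0]
eigvals, eigvecs, scores, R = pearson_pca(X)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
print(f"{eigvals.sum():.6f}")  # 4.000000 (trace of R = number of variables)
```

Standardizing first is what keeps the variable measured in thousands from dominating the first axis.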

In the Charts tab, we want to display only the observations and variables whose squared cosines, summed over the two plotted axes, are greater than 0.5. To do so, activate the filter option, select >cos², and enter the value 0.5.

XLSTAT Principal Component Analysis dialog box variables charts tab

XLSTAT Principal Component Analysis dialog box observations charts tab

XLSTAT Principal Component Analysis dialog box biplots charts tab
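The rule behind this setting can be sketched as follows: a point is kept on a chart only if the sum of its cos² values over the two plotted axes is greater than the threshold. The sketch assumes the cos² table is already computed (rows = points, columns = axes); the function name `keep_points`, the labels and the values are illustrative, not XLSTAT output.

```python
def keep_points(labels, cos2, axes=(0, 1), threshold=0.5):
    """Return the labels whose cos² summed over `axes` exceeds `threshold`.

    cos2[i][k] is the squared cosine of point i on axis k (0-based).
    """
    kept = []
    for label, row in zip(labels, cos2):
        quality = sum(row[k] for k in axes)
        if quality > threshold:  # strictly "greater than", as in ">cos²"
            kept.append(label)
    return kept


labels = ["A", "B", "C"]
cos2 = [
    [0.36, 0.64, 0.00],
    [0.00, 0.20, 0.80],
    [0.30, 0.25, 0.45],
]
print(keep_points(labels, cos2))                # ['A', 'C']
print(keep_points(labels, cos2, axes=(0, 2)))   # ['B', 'C']
```

Point B is hidden on the PC1-PC2 chart but kept on the PC1-PC3 chart, which is why it can be worth looking at other axis pairs before concluding that a point is an outlier or irrelevant.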

The computations begin once you have clicked on OK. You are asked to confirm the number of rows and columns.

Next, confirm the axes for which you want to display charts. In this example, the first two factors represent only 67.72% of the total variability; to avoid misinterpreting the results, we recommend examining the third axis as well. To do so, run the analysis again and select an axis pair that includes PC3 (for example PC1 and PC3).

XLSTAT Principal Component Analysis axis selection PC1 and PC2
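The percentage quoted above is the share of the total variability (the sum of the eigenvalues) carried by the axes shown. A minimal sketch of that arithmetic, using made-up eigenvalues rather than the ones from this dataset; the function name `variability_percentages` is hypothetical.

```python
from itertools import accumulate


def variability_percentages(eigenvalues):
    """Percentage of total variability carried by each axis, plus cumulative %."""
    total = sum(eigenvalues)
    pct = [100.0 * ev / total for ev in eigenvalues]
    return pct, list(accumulate(pct))


# Hypothetical eigenvalues of a 4-variable correlation matrix (they sum to 4).
eigenvalues = [2.0, 1.0, 0.6, 0.4]
pct, cum = variability_percentages(eigenvalues)
print([round(p, 2) for p in pct])  # [50.0, 25.0, 15.0, 10.0]
print([round(c, 2) for c in cum])  # [50.0, 75.0, 90.0, 100.0]

# Share of the variability displayed on each candidate plane.
for a, b in [(0, 1), (0, 2), (1, 2)]:
    print(f"PC{a + 1}-PC{b + 1}: {pct[a] + pct[b]:.2f}%")
```

In a PCA on the Pearson correlation matrix the eigenvalues always sum to the number of variables, so the total used as the denominator is simply that count.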

Filtered charts for Principal Component Analysis

All the classical results are displayed. For a detailed tutorial on the interpretation of these results, see here.

We are interested in the squared cosines tables and the PCA charts.

Regarding the variables, two of them (Federal/Civilian mode... and Net Int Migration) have squared cosines summing to less than 0.5 on the first two factors. These variables are poorly represented on the PC1-PC2 plane, so they do not appear on the PCA correlation chart.

squared cosines table

The obtained PCA correlation plot is as follows.

squared cosines chart
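For variables, the squared cosines can be read directly from the correlation circle: in a PCA on the Pearson correlation matrix, the coordinate of a variable on an axis is its correlation with the corresponding component, and its cos² is the square of that coordinate. The NumPy sketch below illustrates this on placeholder data; the function name `variable_cos2`, the variable names, and the data are hypothetical and only mimic the kind of filtering shown above.

```python
import numpy as np


def variable_cos2(X):
    """cos² of each variable on each axis of a correlation-matrix PCA.

    The coordinate of variable j on axis k is eigvec[j, k] * sqrt(eigval[k]),
    i.e. its correlation with the k-th component; cos² is its square.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Clip tiny negative eigenvalues caused by rounding before the sqrt.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return loadings ** 2


# Placeholder data: three variables sharing a common factor, one pure noise.
rng = np.random.default_rng(1)
base = rng.normal(size=(51, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(51, 3)), rng.normal(size=(51, 1))])

cos2 = variable_cos2(X)
quality = cos2[:, :2].sum(axis=1)  # representation on the PC1-PC2 plane
names = ["v1", "v2", "v3", "noise"]
for name, q in zip(names, quality):
    print(f"{name}: {q:.2f}", "kept" if q > 0.5 else "hidden")
```

Summed over all axes, each variable's cos² equals 1 (the diagonal of the correlation matrix), which is why a variable with a low value on the first plane is necessarily well represented on some other axes.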

Regarding the observations, some states are poorly represented on the first two factors.

squared cosines table

For example, the District of Columbia and Hawaii do not appear on the observations chart because the sum of their squared cosines on the first two factors is smaller than 0.5.

squared cosines plot

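This simple filter hides poorly represented observations and variables, making PCA charts easier to read. The filtered points are only hidden from the charts; they are still part of the analysis and its result tables.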
