This tutorial will show you how to set up and interpret a Canonical Correlation Analysis in Excel using the XLSTAT statistical software.
Not sure if this is the right multivariate data analysis tool you need? Check out this guide.
Dataset for canonical correlation analysis
An Excel sheet with both the data and the results can be downloaded by clicking here.
The data used in this tutorial are measurements taken on middle-aged men in a health fitness club (Dr. A. C. Linnerud, NC State University).
There are two sets of data about the men:
- The physiological data: weight, waist and pulse.
- The exercises the men did: chins, sit-ups and jumps.
Setting up the canonical correlation analysis
Go to the Multiblock Data Analysis menu, and select the function Canonical Correlation Analysis.
In the General tab, select the two datasets. Y1 corresponds to the physiological data stored in columns B to D. Y2 corresponds to the exercise data in columns E to G.
The columns have labels, so leave the option Column labels ticked. You can add the Observation labels by ticking the corresponding option and selecting column A.
In the Options tab, check that both datasets will be centered and reduced (that is, standardized to zero mean and unit variance).
In the Outputs tab, select all the options.
Also choose to display the only plot available in the Charts tab.
Click OK once these selections are made.
When prompted, choose to display the plot for Factor 1 and Factor 2.
Note that together these two factors explain 99.22% of the variance.
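For readers who want to see what the "centered and reduced" option in the Options tab amounts to, here is a minimal Python sketch. It assumes that scaling uses the sample standard deviation (n - 1); XLSTAT may use a different denominator, which would not change the correlations. The function name and the numbers are hypothetical and are not taken from the tutorial's data.

```python
from statistics import mean, stdev


def standardize(values):
    """Center a column on its mean and scale it by its sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: n - 1 denominator
    return [(v - m) / s for v in values]


# Made-up illustrative values, not the tutorial's measurements.
weights = [80.0, 85.0, 90.0, 95.0, 100.0]
z = standardize(weights)

# After standardization the column has mean 0 and standard deviation 1.
print([round(v, 3) for v in z])
print(round(mean(z), 6), round(stdev(z), 6))
```

Standardizing both tables puts variables measured in different units (kilograms, centimeters, repetitions) on the same scale before the analysis.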
Results of the canonical correlation analysis
The first result after the descriptive statistics is the correlation matrix.
Note the strong correlation between weight and waist (0.870) in the first table, and between sit-ups and jumps (0.669) and sit-ups and chins (0.696) in the second table. The correlations between the two tables are rather small, except those of waist with sit-ups (-0.646) and with chins (-0.552).
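The entries of a correlation matrix like the one described above are Pearson correlation coefficients. The following Python sketch shows how such a matrix could be computed by hand. The helper names and the two short columns are hypothetical and only illustrate the calculation; they are not the tutorial's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up illustrative values, not the tutorial's measurements.
columns = {
    "Waist": [31.0, 33.0, 35.0, 37.0, 39.0],
    "Situps": [250.0, 200.0, 210.0, 120.0, 100.0],
}
corr = correlation_matrix(columns)
for name, row in corr.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

A negative coefficient, as in this made-up example, means that larger values of one variable tend to go with smaller values of the other.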
The eigenvalues show that the first factor alone explains 93% of the variability.
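The percentages of variability can be reproduced from the canonical correlations. The Python sketch below assumes that the eigenvalues are the squared canonical correlations and that each factor's share is its eigenvalue divided by the sum of all eigenvalues. The three canonical correlations are assumed, rounded values commonly reported for these fitness data, not numbers copied from this tutorial; replace them with the values from your own XLSTAT output.

```python
# Assumed, rounded canonical correlations for F1, F2 and F3 (not from this tutorial).
canonical_correlations = [0.7956, 0.2006, 0.0726]

# Assumption: eigenvalue of each factor = squared canonical correlation.
eigenvalues = [r ** 2 for r in canonical_correlations]
total = sum(eigenvalues)

# Share of variability carried by each factor, in percent.
variability = [100.0 * ev / total for ev in eigenvalues]

# Cumulative share, factor by factor.
cumulative = []
running = 0.0
for share in variability:
    running += share
    cumulative.append(running)

for k, (share, cum) in enumerate(zip(variability, cumulative), start=1):
    print(f"F{k}: {share:.2f}% (cumulative {cum:.2f}%)")
```

Under these assumptions the first factor carries roughly 93% of the variability and the first two factors together roughly 99.22%, which is consistent with the figures quoted in this tutorial.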
The Wilks’ lambda test checks whether the canonical variables are linked to the initial tables. Here, F1 is somewhat linked, although not significantly, while F2 and F3 are poorly related to the initial tables.
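The Wilks' lambda statistic itself is simple to compute from the canonical correlations: for the test that starts at factor k, it is the product of (1 - r²) over factor k and all later factors. The Python sketch below shows only this statistic; the conversion to an F value and a p-value, which XLSTAT performs, is not reproduced here. The canonical correlations are assumed, rounded values, not numbers copied from this tutorial, and the function name is hypothetical.

```python
from math import prod


def wilks_lambdas(canonical_correlations):
    """Wilks' lambda for the tests starting at factor 1, factor 2, and so on."""
    squared = [r * r for r in canonical_correlations]
    return [prod(1.0 - v for v in squared[k:]) for k in range(len(squared))]


# Assumed, rounded canonical correlations for F1, F2 and F3 (not from this tutorial).
canonical_correlations = [0.7956, 0.2006, 0.0726]
lambdas = wilks_lambdas(canonical_correlations)

for k, value in enumerate(lambdas, start=1):
    print(f"Test starting at F{k}: lambda = {value:.4f}")
```

A lambda close to 1 means the remaining canonical correlations are close to zero, which is why the tests that start at F2 and F3 show almost no link between the tables.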
The canonical correlation on factor 1 shows that the two tables Y1 and Y2 are correlated. Note that this value is greater than any of the correlations between individual variables of the two tables.
The redundancy coefficients show that a small proportion of the variability of the input variables is predicted by the canonical variables.
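As a rough illustration of how a redundancy coefficient can be obtained, the Python sketch below assumes the usual Stewart-Love definition: the mean of the squared correlations between a table's variables and its own canonical variable, multiplied by the squared canonical correlation. Whether XLSTAT uses exactly this formula is an assumption. The function name, the loadings and the canonical correlation are hypothetical values chosen for the example.

```python
def redundancy(loadings, canonical_correlation):
    """Share of a table's variance predicted by the other table's canonical variable.

    loadings: correlations between the table's variables and its own canonical variable.
    """
    mean_squared_loading = sum(l * l for l in loadings) / len(loadings)
    return mean_squared_loading * canonical_correlation ** 2


# Hypothetical loadings of three variables on factor 1 and a hypothetical
# canonical correlation; these are not results from the tutorial.
loadings_f1 = [-0.6, -0.9, 0.3]
r_f1 = 0.8

value = redundancy(loadings_f1, r_f1)
print(f"Redundancy on F1: {value:.4f}")
```

The coefficient is small whenever either the loadings or the canonical correlation are small, so a high canonical correlation alone does not guarantee that one table predicts much of the other.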
The correlations between input variables and canonical variables (also called structure correlation coefficients, or canonical factor loadings) show how the canonical variables are related to the input variables.
We can see that the variables waist and weight are correlated with each other and negatively correlated with factors 1 and 2. They are also anti-correlated with the exercises sit-ups and chins. This means that people with a higher weight and a larger waist don’t do as many sit-ups and chins as the others.