Your data analysis solution

Canonical Correlation analysis in Excel tutorial


This tutorial will show you how to set up and interpret a Canonical Correlation Analysis in Excel using the XLSTAT statistical software.
Not sure if this is the right multivariate data analysis tool you need? Check out this guide.

Dataset for canonical correlation analysis

An Excel sheet with both the data and the results can be downloaded by clicking here.

The data used in this tutorial are measurements done on middle-aged men in a health fitness club (Dr. A. C. Linnerud, NC State University).

There are two sets of data about the men:

  1. The physiological data:
    • Weight
    • Waist
    • Pulse
  2. The exercises the men did:
    • Chins
    • Situps
    • Jumps

Set up of the canonical correlation analysis

Go to the Multiblock Data Analysis menu, and select the function Canonical Correlation Analysis.

XLSTAT Multiblock Menu Canonical Correlation Analysis

In the General tab, determine the two datasets. Y1 corresponds to the physiological data stored in the columns B to D. Y2 corresponds to the exercise data which are in the columns E to G.

The columns have a label so leave the option Column labels ticked. You can add the Observation labels by ticking the corresponding option and selecting the column A.

canonical correlation analysis software

In the Options tab, verify that both datasets will be centered and reduced.

canonical correlation software package

For the Outputs select them all.

canonical correlation package

Choose as well to display the unique plot available in the Charts tab.

canonical correlation in excel

Press OK once these selections are done.

When prompted, opt for the display of the plot with Factor 1 and Factor 2.

Notice that the explained variance is 99.22%.

canonical correlation software

Results of the canonical correlation analysis

The first result after the descriptive statistics is the correlation matrix.

Note the strong correlation between the weight and the waist (0.870) in the first table, and sit-up’s and jumps (0.669) and sit-up’s and chins (0.696) in the second table. The correlations between both tables are rather small except Waist and sit-up’s (-0.646) and chins (-0.552).

canonical correlation

The Eigenvalues show that the first factor alone explains 93% of the variability.

canonical correlation eigenvalues

The Wilks’ lambda test allows to test whether the canonical variables are linked to the initial tables or not. We see here that while F1 is somehow linked although not significantly, F2 and F3 and poorly related to the initial tables.

canonical correlation wilks lambda

The canonical correlations on factor 1 show that the two tables Y1 and Y2 are correlated. Note that this value is greater than the correlations between the two tables.

The redundancy coefficients show that a small proportion of the variability of the input variables is predicted by the canonical variables.

canonical correlation redundancy

The correlations between input variables and canonical variables (also called Structure correlation coefficients, or Canonical factor loadings) allow understanding how the canonical variables are related to the input variables.

We can see that the variables waist and weight are correlated and correlated negatively with factor 1 and 2. They are also anti-correlated with the exercises sit-ups and chins. This means that people with a higher weight and a larger waist don’t do as many sit-up’s and chins as the other persons.

canonical correlation plot
seconds ago
a minute ago
minutes ago
an hour ago
hours ago
a day ago
days ago
Invalid characters found