This tutorial will help you set up and interpret a Correspondence Analysis (CA) on raw data in Excel using the XLSTAT software.
Not sure if this is the right multivariate data analysis tool you need? Check out this guide.
Dataset for running a Correspondence Analysis from a raw data table
An Excel sheet with both the data and the results can be downloaded by clicking here. The data list the foreign soccer players in the Premier League together with their nationality. We want to study how the foreign players are distributed among the English clubs.
Setting up a Correspondence Analysis from a raw data table
Once XLSTAT is open, select the Analyzing data / Correspondence analysis command, or click on the corresponding button of the Analyzing Data toolbar (see below).
Once you have clicked on the button, the Correspondence analysis dialog box appears.
In the field Observations/variables table, select the columns Club and Region on the Excel sheet.
The data are in an Observations/variables format, so tick the corresponding option.
As the names of the columns are included, the Variable labels option should be selected as well.
Choose the Sheet option for the output.
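To make the Observations/variables format concrete, here is a minimal Python sketch of how such a raw table (one line per player, with a Club column and a Region column) can be turned into a contingency table of counts. The function name `contingency_table` and the records are made up for illustration. They are not taken from the tutorial dataset, and this is not how XLSTAT computes the table internally.

```python
from collections import Counter

def contingency_table(records):
    """Cross-tabulate (club, region) pairs into a table of counts."""
    counts = Counter(records)
    clubs = sorted({club for club, _ in records})
    regions = sorted({region for _, region in records})
    # One row per club, one column per region; missing pairs count as 0.
    table = [[counts[(club, region)] for region in regions] for club in clubs]
    return clubs, regions, table

# Hypothetical raw data: one (club, region) pair per player.
records = [
    ("Club A", "Africa"),
    ("Club A", "Africa"),
    ("Club A", "South America"),
    ("Club B", "Africa"),
    ("Club B", "Northern Europe"),
]

clubs, regions, table = contingency_table(records)
for club, row in zip(clubs, table):
    print(club, dict(zip(regions, row)))
```

Each cell of the resulting table holds the number of players of a given region in a given club, which is the input a Correspondence Analysis works on.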
On the Options tab, tick Test of independence and leave the significance level at 5%.
In the Outputs section, select the following options:
- Contingency table
- Principal coordinates
- Standard coordinates
- Squared cosines
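For readers who want to see how the outputs listed above relate to each other, the following sketch computes principal coordinates, standard coordinates and squared cosines from a contingency table with the usual SVD formulation of Correspondence Analysis. It assumes NumPy is available, and the function name `correspondence_analysis` and the small table are invented for illustration. XLSTAT's own implementation may differ in details such as axis signs.

```python
import numpy as np

def correspondence_analysis(table):
    """Correspondence Analysis of a contingency table via SVD (a sketch)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                 # maximum number of non-trivial factors
    U, sv, Vt = U[:, :k], sv[:k], Vt[:k, :]
    row_std = U / np.sqrt(r)[:, None]    # standard coordinates
    col_std = Vt.T / np.sqrt(c)[:, None]
    row_pri = row_std * sv               # principal coordinates
    col_pri = col_std * sv
    # Squared cosines: share of each point's squared distance to the
    # origin carried by each factor (assumes no point sits at the origin).
    row_cos2 = row_pri**2 / (row_pri**2).sum(axis=1, keepdims=True)
    col_cos2 = col_pri**2 / (col_pri**2).sum(axis=1, keepdims=True)
    return {
        "row_masses": r, "col_masses": c,
        "eigenvalues": sv**2,
        "row_standard": row_std, "col_standard": col_std,
        "row_principal": row_pri, "col_principal": col_pri,
        "row_cos2": row_cos2, "col_cos2": col_cos2,
    }

# Made-up 3 x 3 table of counts, only to exercise the function.
result = correspondence_analysis([[20, 5, 2], [3, 15, 4], [2, 6, 18]])
print(result["row_principal"])
print(result["row_cos2"])
```

The principal coordinates are the standard coordinates scaled by the singular values, which is why symmetric plots (both sets in principal coordinates) and asymmetric plots (one set in standard coordinates) look different.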
Go to the last tab, Charts, and enable the following:
- Symmetric plots
- Asymmetric plots
Click on OK.
As the model needs more than two factors, you have to choose which axes to plot. First click Select to keep the F1-F2 plot. Then change the Abscissa to F2, which changes the Ordinates to F3, and click Select again. This gives two plots: F1-F2 and F2-F3. Click Done.
Interpreting the results of a Correspondence Analysis
The first results are the contingency table and the test of independence between the rows and columns.
The p-value of 0.008 is below the 5% significance level, so the null hypothesis of independence between rows and columns should be rejected. This means that the players' regions of origin are not distributed independently of the English clubs.
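The test of independence can be sketched in plain Python as a chi-square test on the contingency table. The code below is an illustration under stated assumptions: the function names `chi2_sf` and `chi_square_independence` are made up, the table is invented rather than taken from the tutorial, every row and column total is assumed to be non-zero, and the p-value uses a simple series expansion that is adequate for illustration only.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable (series expansion)."""
    if x <= 0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    # Series for the lower incomplete gamma function.
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_independence(table):
    """Return (statistic, degrees of freedom, p-value) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, chi2_sf(statistic, dof)

# Invented 2 x 2 table, only to exercise the function.
table = [[10, 20], [20, 10]]
statistic, dof, p_value = chi_square_independence(table)
reject_independence = p_value < 0.05   # 5% significance level
print(statistic, dof, p_value, reject_independence)
```

Rejecting independence is what justifies looking at the maps: if rows and columns were independent, there would be no structure for the factors to display.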
Then come the symmetric plots. The first plot shows that clubs such as Aston Villa and Stoke City have more North American players than the rest of the teams. In the same way, Burnley has a lot of Northern European players.
Creating a 3-D plot for the Correspondence Analysis results
We will now draw a 3-dimensional plot to get a better representation of the points.
First, build a table containing the first 3 principal coordinates of the clubs and geographic areas, together with the sum of the squared cosines over those 3 factors.
The sum of the squared cosines over the 3 factors, obtained from the squared cosines table, gives an idea of how well each point is represented in the 3-D space.
Add a last column that tells whether each point is a row or a column of the contingency table. The rows are the clubs and the columns are the regions. Use a categorical variable with the values R and C to describe each point.
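The table described above can be sketched as follows. This is only an illustration: the function name `build_3d_table`, the field names and all the numbers are made up, and in practice the coordinates and squared cosines would be copied from the XLSTAT output sheet.

```python
def build_3d_table(labels, principal_coords, squared_cosines, kind):
    """Combine F1-F3 coordinates, the cos2 sum and an R/C tag per point."""
    rows = []
    for label, coords, cos2 in zip(labels, principal_coords, squared_cosines):
        rows.append({
            "label": label,
            "F1": coords[0],
            "F2": coords[1],
            "F3": coords[2],
            # Quality of representation in the 3-D space (between 0 and 1).
            "cos2_sum": sum(cos2[:3]),
            "type": kind,  # "R" for rows (clubs), "C" for columns (regions)
        })
    return rows

# Invented values for two clubs and one region, on four factors.
club_rows = build_3d_table(
    ["Club A", "Club B"],
    [[0.4, -0.2, 0.1, 0.05], [-0.3, 0.5, -0.2, 0.1]],
    [[0.5, 0.3, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]],
    "R",
)
region_rows = build_3d_table(
    ["Region X"],
    [[0.6, 0.1, -0.4, 0.2]],
    [[0.6, 0.05, 0.25, 0.1]],
    "C",
)
table_3d = club_rows + region_rows
for row in table_3d:
    print(row)
```

A point whose sum is close to 1 lies almost entirely in the first 3 factors, so its position in the 3-D plot can be trusted. A low sum means the point is mostly described by factors that the plot does not show.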
Select the full table and go to the menu Visualizing data and select the option XLSTAT-3DPlot.
When prompted, select Table as the format of your data.
You will need to specify the axes. To do so, right-click and select the appropriate variable in the drop-down list. For the 3 axes we use F1 and F2 horizontally and F3 vertically. You also need to set the range of the axes so as to have an orthonormal plot, that is, the same scale on every axis. For example, use -1.5 and 1.5 as limits for all the axes.
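One possible way to pick a common range for the three axes is to take the largest absolute coordinate and round it up to a convenient step. The helper below is a hypothetical sketch with invented points. The step of 0.5 is an arbitrary choice, not an XLSTAT setting.

```python
import math

def symmetric_limit(points, step=0.5):
    """Smallest multiple of `step` that covers every coordinate in absolute value."""
    largest = max(abs(value) for point in points for value in point)
    return math.ceil(largest / step) * step

# Invented (F1, F2, F3) coordinates.
points = [(0.4, -1.2, 0.3), (-0.7, 0.9, 1.1)]
limit = symmetric_limit(points)
axis_range = (-limit, limit)   # use the same range for F1, F2 and F3
print(axis_range)
```

Using the same symmetric range on all three axes keeps distances between points comparable in every direction.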
For the color and size of the dots you can use the sum of the squared cosines. Go to the Objects tab and modify the color and size sections.
Finally, we can add the labels by going to the Annotations tab and selecting "Column1" as the label.
Here is your 3-D representation.