This tutorial will help you set up and interpret a **Correspondence Analysis** (CA) on raw data in **Excel** using the XLSTAT software.

Not sure if this is the right multivariate data analysis tool you need? Check out this guide.

#### Included in

XLSTAT-Base

XLSTAT-Sensory

XLSTAT-Marketing

XLSTAT-Forecast

XLSTAT-Biomed

XLSTAT-Ecology

XLSTAT-Psy

XLSTAT-Quality

XLSTAT-Premium

## Dataset for running a Correspondence Analysis from a raw data table

An Excel sheet with both the data and the results can be downloaded by clicking here. The data list the foreign soccer players in the Premier League along with their nationality. We want to study how the foreign players are distributed across the English clubs.

## Setting up a Correspondence Analysis from a raw data table

Once XLSTAT is open, select the **Analyzing data / Correspondence analysis** command, or click on the corresponding button of the **Analyzing Data** toolbar (see below).

Once you have clicked on the button, the Correspondence analysis dialog box appears.

In the field **Observations/variables table**, select the columns **Club** and **Region** on the Excel sheet.

Because the data are in Observations/variables format, tick the corresponding option.

As the names of the columns are included, the **Variable labels** option should be selected as well.
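For readers who want to see what the Observations/variables format amounts to, here is a minimal Python sketch of how a raw list of (club, region) pairs can be cross-tabulated into the contingency table that the analysis works on. The club and region names and the counts are hypothetical placeholders, not the tutorial's dataset, and this is not XLSTAT's own code.

```python
from collections import Counter

# Hypothetical raw data: one (club, region) pair per foreign player.
players = [
    ("Club A", "Northern Europe"),
    ("Club A", "North America"),
    ("Club A", "North America"),
    ("Club B", "Northern Europe"),
    ("Club B", "Northern Europe"),
    ("Club B", "Africa"),
    ("Club C", "Africa"),
    ("Club C", "Africa"),
    ("Club C", "North America"),
]

# Count how many players fall into each (club, region) cell.
counts = Counter(players)
clubs = sorted({club for club, _ in players})
regions = sorted({region for _, region in players})

# Rows are clubs, columns are regions; pairs that never occur count as 0.
table = [[counts[(club, region)] for region in regions] for club in clubs]

# Print the contingency table with a header row.
print("Club".ljust(8) + "".join(region.ljust(17) for region in regions))
for club, row in zip(clubs, table):
    print(club.ljust(8) + "".join(str(n).ljust(17) for n in row))
```

Each row of the raw data contributes exactly one count, so the cells of the table add up to the number of players.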

Choose the **Sheet** option for the output.

On the **Options** tab, tick **Test of independence** and leave the significance level at 5%.

In the **Outputs** section, select the following options:

- Contingency table
- Eigenvalues
- Principal coordinates
- Standard coordinates
- Contributions
- Squared cosines
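To make the outputs listed above more concrete, the following sketch computes them with NumPy using the textbook singular value decomposition formulation of correspondence analysis. It is an illustration under assumptions: the contingency table is made up, and the code is not claimed to reproduce XLSTAT's internal implementation or its sign conventions.

```python
import numpy as np

# Hypothetical contingency table (rows: clubs, columns: regions).
N = np.array([[6, 1, 2],
              [1, 5, 3],
              [2, 2, 7],
              [4, 4, 1]], dtype=float)

P = N / N.sum()            # correspondence matrix (relative frequencies)
r = P.sum(axis=1)          # row masses
c = P.sum(axis=0)          # column masses

# Standardized residuals from independence, then their SVD.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Keep the non-trivial axes: at most min(rows, columns) - 1.
k = min(N.shape) - 1
U, sv, V = U[:, :k], sv[:k], Vt.T[:, :k]

eigenvalues = sv ** 2                              # inertia carried by each axis
inertia_pct = 100 * eigenvalues / eigenvalues.sum()

row_std = U / np.sqrt(r)[:, None]                  # standard coordinates
col_std = V / np.sqrt(c)[:, None]
row_pri = row_std * sv                             # principal coordinates
col_pri = col_std * sv

# Contributions of each point to each axis (each column sums to 1).
row_contrib = r[:, None] * row_pri ** 2 / eigenvalues
col_contrib = c[:, None] * col_pri ** 2 / eigenvalues

# Squared cosines: share of a point's squared distance to the origin
# explained by each axis (each row sums to 1 over all axes).
row_cos2 = row_pri ** 2 / (row_pri ** 2).sum(axis=1, keepdims=True)
col_cos2 = col_pri ** 2 / (col_pri ** 2).sum(axis=1, keepdims=True)

print("Eigenvalues:", np.round(eigenvalues, 4))
print("Inertia (%):", np.round(inertia_pct, 2))
```

The percentages of inertia help decide how many factors are worth plotting, and the squared cosines show which points are well represented on a given axis.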

Go to the last tab, **Charts**, and enable the following options:

- Symmetric plots
- Asymmetric plots
- Labels

Click on **OK**.

Because the model needs more than two factors, you must choose which axes to plot. First click **Select** to select the F1-F2 plot. Then change the **Abscissa** to F2, which changes the **Ordinates** to F3, and click **Select** again. This way we will have two plots: F1-F2 and F2-F3. Click **Done**.

## Interpreting the results of a Correspondence Analysis

The first result is the contingency table and then the test of independence between the rows and columns.

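The p-value of 0.008 is below the 5% significance level, so the null hypothesis of independence between the rows and the columns should be rejected. This means that clubs and nationalities are not independent: the distribution of nationalities differs from club to club.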
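The decision rule used here can be sketched in Python with SciPy's `chi2_contingency`, which runs a chi-square test of independence on a contingency table. The counts below are made up for illustration and will not give the tutorial's p-value. Assuming that XLSTAT's test of independence is this same chi-square test is also an assumption on our part.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table (rows: clubs, columns: regions).
observed = np.array([[6, 1, 2],
                     [1, 5, 3],
                     [2, 2, 7],
                     [4, 4, 1]])

# Chi-square test of independence between rows and columns.
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

alpha = 0.05  # 5% significance level, as in the dialog box
reject = p_value < alpha

print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.4f}")
print("Reject independence" if reject else "Cannot reject independence")
```

The degrees of freedom equal (rows - 1) x (columns - 1), and the null hypothesis is rejected whenever the p-value falls below the chosen significance level.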

Next come the symmetric plots. The first plot shows that clubs such as Aston Villa and Stoke City have more North American players than the other teams. Similarly, Burnley has many Northern European players.

## Creating a 3-D plot for the Correspondence Analysis results

We will now create a three-dimensional plot to get a better representation of the points.

First, build a table containing the first three principal coordinates for both the clubs and the geographic areas, along with the sum of the squared cosines over those three factors.

The sum of the squared cosines over the three factors, obtained from the squared cosines table, indicates how well each point is represented in the 3-D space.

Add a last column indicating whether each point comes from the rows or the columns. The rows are the clubs and the columns are the regions. Create a categorical variable with the values R and C to describe each point.
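The table described above can be assembled as in the following Python sketch. All labels, coordinates and squared cosines are hypothetical placeholders. In practice they would be copied from the principal coordinates and squared cosines tables of the results sheet.

```python
# Hypothetical principal coordinates: (label, F1, F2, F3).
row_points = [("Club A", 0.62, -0.15, 0.08), ("Club B", -0.40, 0.33, -0.21)]
col_points = [("Region X", 0.55, 0.10, -0.30), ("Region Y", -0.35, -0.28, 0.12)]

# Hypothetical squared cosines on F1, F2 and F3 for the same points.
cos2 = {
    "Club A": (0.70, 0.05, 0.02),
    "Club B": (0.40, 0.30, 0.10),
    "Region X": (0.55, 0.02, 0.20),
    "Region Y": (0.45, 0.30, 0.05),
}

# One line per point: label, F1, F2, F3, sum of squared cosines, R or C.
plot_table = []
for kind, points in (("R", row_points), ("C", col_points)):
    for label, f1, f2, f3 in points:
        quality = round(sum(cos2[label]), 3)  # representation quality in 3-D
        plot_table.append((label, f1, f2, f3, quality, kind))

for line in plot_table:
    print(line)
```

A sum close to 1 means the point lies almost entirely within the space spanned by the first three factors, while a low sum means its position in the 3-D plot should be read with caution.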

Select the full table and go to the menu **Visualizing data** and select the option **XLSTAT-3DPlot**.

When prompted, select **Table** as the format of your data.

You will need to specify the axes. To do so, right-click and select the appropriate variable from the drop-down list. For the three axes we use F1 and F2 horizontally and F3 vertically. You also need to set the range of each axis so that the plot is orthonormal. For example, use -1.5 and 1.5 as the limits for all the axes.

For the color and size of the dots, you can use the sum of the squared cosines. Go to the **Objects** tab and modify the color and size sections.

Finally, add the labels by going to the **Annotations** tab and selecting "Column1" as the label.

Here is your 3-D representation.
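For readers who want a comparable figure outside XLSTAT-3DPlot, a similar plot can be sketched with Matplotlib. This is an alternative illustration, not the XLSTAT procedure. The points are hypothetical, and the marker scaling factor of 200 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical table: (label, F1, F2, F3, sum of squared cosines, "R" or "C").
points = [
    ("Club A", 0.62, -0.15, 0.08, 0.77, "R"),
    ("Club B", -0.40, 0.33, -0.21, 0.80, "R"),
    ("Region X", 0.55, 0.10, -0.30, 0.77, "C"),
    ("Region Y", -0.35, -0.28, 0.12, 0.80, "C"),
]

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(projection="3d")

xs = [p[1] for p in points]        # F1, horizontal
ys = [p[2] for p in points]        # F2, horizontal
zs = [p[3] for p in points]        # F3, vertical
quality = [p[4] for p in points]   # drives both color and size

sc = ax.scatter(xs, ys, zs, c=quality, s=[200 * q for q in quality], cmap="viridis")

# Label every point with its name.
for label, x, y, z, _, _ in points:
    ax.text(x, y, z, label)

# Same limits on all axes and a cubic box give an orthonormal view.
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.set_zlim(-1.5, 1.5)
ax.set_box_aspect((1, 1, 1))

ax.set_xlabel("F1")
ax.set_ylabel("F2")
ax.set_zlabel("F3")
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("Sum of squared cosines (F1 to F3)")
```

Call `fig.savefig(...)` to write the figure to a file, or remove the `matplotlib.use("Agg")` line and call `plt.show()` to rotate it interactively.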