This tutorial will help you set up and interpret a Multiple Correspondence Analysis in Excel using the XLSTAT software.
Not sure whether this is the multivariate data analysis tool you need? Check out this guide.
What is Multiple Correspondence Analysis?
Multiple Correspondence Analysis (MCA) is a method for studying the associations between two or more qualitative variables.
Multiple Correspondence Analysis is to qualitative variables what Principal Component Analysis is to quantitative variables. It produces maps on which one can visually assess the distances between the categories of the qualitative variables and between the observations. For detailed information on the method, we recommend the recent book by Michael Greenacre and Jörg Blasius.
Dataset to run a Multiple Correspondence Analysis
An Excel sheet containing both the data and the results used in this tutorial can be downloaded by clicking here.
The data correspond to a survey conducted by a car dealer where 28 customers were asked five questions, one week after they had picked up their car after a mechanical repair. The questions were:
- Are you satisfied overall with the service? (Yes/No)
- Do you consider the problem is solved? (Yes/No/Don't know)
- How good was the welcome? (1 to 5)
- Is the quality/price ratio satisfactory? (Yes/No)
- Will you use our services again? (Yes/No/Don't know)
By running a Multiple Correspondence Analysis (MCA), we want to identify the relationships between the various possible answers to the questions.
Setting up the Multiple Correspondence Analysis dialog box
After opening XLSTAT, select the XLSTAT / Analyzing data / Multiple Correspondence Analysis command, or click on the corresponding button of the Analyzing data toolbar (see below).
Once you've clicked on the button, the Multiple Correspondence Analysis dialog box appears.
Here, the data format is Observations/Variables.
We select the data on the Excel sheet using the column selection method: click the header of each column you want to select (see the tutorial on how to select data for more information on this topic).
The Observations labels are selected in the corresponding field, and the Variable labels option is left activated because the first row of the table contains the names of the variables.
In the Options tab we activate the Supplementary data option and then go to the corresponding tab: the "Come back" variable is used as a supplementary variable because we don't want it to influence the computations; however, we want to know how the categories of this variable are positioned on the correspondence map.
The 1/p option is our filtering choice: the detailed results for factors whose eigenvalue is less than 1/p (where p is the number of active qualitative variables) will not be displayed.
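The 1/p rule can be sketched in a few lines of Python. This is only an illustration of the filter, not XLSTAT's code: the eigenvalues below are made up for the example, and p = 4 assumes the four active questions of the survey (the "Come back" variable being supplementary). Note that 1/p is also the average of the non-null eigenvalues, so the rule keeps the factors that carry at least an average share of inertia.

```python
# Hypothetical eigenvalues, for illustration only (NOT the tutorial's results).
eigenvalues = [0.55, 0.38, 0.30, 0.25, 0.20, 0.15, 0.10, 0.07]

p = 4              # assumed number of active qualitative variables
threshold = 1 / p  # 0.25

# Keep the factors whose eigenvalue is not less than 1/p.
kept = [(k + 1, ev) for k, ev in enumerate(eigenvalues) if ev >= threshold]

for axis, ev in kept:
    print(f"F{axis}: eigenvalue = {ev:.2f}")
```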
The following Outputs and Charts options have been activated.
The computations begin once you have clicked on OK. The results will then be displayed.
Interpreting the results of a Multiple Correspondence Analysis
The first results displayed are the tables used for the computations (full disjunctive table, Burt's table).
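To make these two tables concrete, here is a minimal sketch with pandas. The five respondents and the column names are invented for the example (they are not the tutorial's 28 customers), and XLSTAT may lay out its tables differently.

```python
import pandas as pd

# Made-up answers from 5 respondents (NOT the tutorial's dataset).
answers = pd.DataFrame({
    "Satisfied": ["Yes", "No", "Yes", "Yes", "No"],
    "Solved": ["Yes", "No", "Don't know", "Yes", "No"],
})

# Full disjunctive table: one 0/1 column per category of each variable.
Z = pd.get_dummies(answers, dtype=int)

# Burt's table: all pairwise cross-tabulations of the categories.
# Diagonal cells hold category counts, off-diagonal cells hold co-occurrences.
burt = Z.T @ Z

print(Z)
print(burt)
```

Each row of the disjunctive table sums to the number of variables, since every respondent picks exactly one category per question.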
The total inertia is equal to 2. It depends only on the number of variables and categories, not on the strength of the links between the variables. It therefore has no statistical interpretation.
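The value of 2 can be checked by hand. For an MCA on the full disjunctive table, the total inertia is J/p - 1, where J is the total number of categories and p the number of active variables, and there are at most J - p non-null eigenvalues. The short sketch below applies this to the four active questions. The variable names are our own labels, and we assume that every listed answer occurs at least once in the data.

```python
# Categories per active variable, read off the survey questions.
# The names are illustrative labels; "Come back" is supplementary and excluded.
categories = {"Satisfied": 2, "Solved": 3, "Welcome": 5, "Quality/price": 2}

p = len(categories)           # number of active variables
J = sum(categories.values())  # total number of categories

total_inertia = J / p - 1
n_eigenvalues = J - p

print(total_inertia)  # 2.0
print(n_eigenvalues)  # 8
```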
The next table shows the eight non-null eigenvalues and the corresponding percentages of inertia. However, unlike in CA (correspondence analysis performed on only two variables), the percentages of inertia are here pessimistic estimates of the quality of the representation, that is, of how close the representation is to reality.
Greenacre et al. (2005) suggested an adjusted inertia, which gives a better idea of the quality of the maps. We see here that while the usual computation gives only 46.6% for the first two axes, the method based on the adjusted inertia gives 87.3%.
The percentages displayed on the scree plot are based on the adjusted inertia.
Then, a table displays the coordinates of the categories in the factor space. The results that correspond to the supplementary variable are displayed in blue.
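As an illustration of how such an adjustment works, the sketch below implements one published form of Greenacre's adjusted inertia. Whether XLSTAT applies exactly this formula is an assumption on our part, the function name `adjusted_inertia` is hypothetical, and the eigenvalues are made up (they only mimic a case with 4 active variables and 12 categories), so the percentages printed are not those of the tutorial.

```python
import numpy as np

def adjusted_inertia(indicator_eigenvalues, n_vars, n_categories):
    """Greenacre-style adjustment (sketch; assumed, not XLSTAT's verified code)."""
    lam = np.asarray(indicator_eigenvalues, dtype=float)
    Q, J = n_vars, n_categories
    # Only axes whose eigenvalue exceeds the average 1/Q are retained.
    above = lam > 1.0 / Q
    adjusted = np.where(above, (Q / (Q - 1)) ** 2 * (lam - 1.0 / Q) ** 2, 0.0)
    # Inertia of the Burt table = sum of squared indicator eigenvalues.
    burt_inertia = np.sum(lam ** 2)
    # Adjusted total: average inertia of the off-diagonal blocks of Burt's table.
    adjusted_total = Q / (Q - 1) * (burt_inertia - (J - Q) / Q ** 2)
    return adjusted, 100 * adjusted / adjusted_total

# Hypothetical eigenvalues summing to 2 (NOT the tutorial's results).
eigenvalues = [0.55, 0.38, 0.30, 0.25, 0.20, 0.15, 0.10, 0.07]

adjusted, adjusted_pct = adjusted_inertia(eigenvalues, n_vars=4, n_categories=12)
raw_pct = 100 * np.asarray(eigenvalues) / np.sum(eigenvalues)

print("Raw % on F1+F2:     ", round(raw_pct[:2].sum(), 1))
print("Adjusted % on F1+F2:", round(adjusted_pct[:2].sum(), 1))
```

The adjustment removes the inertia that comes from the diagonal blocks of Burt's table, which carry no information about associations between variables. This is why the adjusted percentages on the first axes are higher than the raw ones.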
The coordinates of the observations are displayed further down.
The contributions, the test values and the squared cosines help in interpreting the results. Before concluding that two categories are close on the map, one should check that their contributions to the axes of the map, or their squared cosines, are high.
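For readers who want to see where contributions and squared cosines come from, here is a minimal from-scratch sketch with NumPy. It runs an MCA as a correspondence analysis of the full disjunctive table and uses the textbook definitions. The data are invented, the variable names are our own, and XLSTAT's scaling conventions may differ.

```python
import numpy as np
import pandas as pd

# Made-up answers from 8 respondents (NOT the tutorial's dataset).
answers = pd.DataFrame({
    "Satisfied": ["Yes", "Yes", "No", "No", "Yes", "No", "Yes", "Yes"],
    "Solved": ["Yes", "Yes", "No", "No", "Don't know", "No", "Yes", "Don't know"],
    "Price": ["Yes", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"],
})

dummies = pd.get_dummies(answers, dtype=float)  # full disjunctive table
Z = dummies.to_numpy()
n, J = Z.shape
Q = answers.shape[1]

P = Z / (n * Q)    # correspondence matrix
r = P.sum(axis=1)  # row masses (1/n each)
c = P.sum(axis=0)  # category masses

# Standardized residuals, then singular value decomposition.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sing, Vt = np.linalg.svd(S, full_matrices=False)

keep = sing > 1e-10  # drop the null axes
sing, Vt = sing[keep], Vt[keep]
eigenvalues = sing ** 2

# Principal coordinates of the categories (rows: categories, columns: axes).
coords = (Vt.T * sing) / np.sqrt(c)[:, None]

# Contribution (%) of each category to each axis: mass * coord^2 / eigenvalue.
contrib = 100 * c[:, None] * coords ** 2 / eigenvalues

# Squared cosine: share of a category's squared distance to the origin
# that is carried by each axis.
cos2 = coords ** 2 / (coords ** 2).sum(axis=1, keepdims=True)

# Quality of representation of each category on the F1/F2 map.
quality_f1_f2 = pd.Series(cos2[:, :2].sum(axis=1), index=dummies.columns)
print(quality_f1_f2.round(2))
```

A category with a low value in `quality_f1_f2` is poorly represented on the first two axes, so its apparent proximity to other categories on the map should not be interpreted.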
The following chart corresponds to the correspondence map where both the categories and the observations are displayed on the first two axes.
To better visualize the relative positions of the categories, we used XLSTAT-3DPlot to build a visualization in the F1/F2/F3 space.
From these charts we confirm that a customer will come back only if they are satisfied with the intervention, the welcome and the price. We also notice that there seems to be a link between an unsatisfactory repair and a poor welcome. This should be investigated further: did the customer describe the problem imprecisely because of a poor welcome, or did the customer call back to report that the problem was still there and then get a poor reception from the representative?
The following video shows you how to run this tutorial.