CATA Check-All-That-Apply analysis tutorial in Excel tutorial
This tutorial will help you set up and interpret a CATA analysis in Excel using the XLSTAT statistical software.
Dataset for running a CATA analysis in XLSTAT
For this tutorial, we use data provided by Ares et al. (2014). They correspond to the assessment of 6 products (5 regular and 1 ideal) by 119 consumers over 15 attributes. Data is recorded in a binary format (0: attribute not checked; 1: attribute checked). Moreover, every product (except the ideal one) is rated overall (0-10) by each consumer.
Data is in a vertical format, which means that we have one row per combination of consumer and product.
Goal of this tutorial
This tutorial aims to conduct a CATA (Check-All-That-Apply) analysis to characterize products tested by consumers.
Setting up a CATA analysis in XLSTAT
-
To conduct a CATA analysis, click on Sensory / CATA data / CATA data analysis.
-
In the General tab, first make sure you select the Vertical data format.
-
In the CATA data field, select the attribute table.
-
Then select the Consumer, Sample and liking columns in the Assessors, Products and Preference data fields, respectively.
-
Capture the ideal product identifier in the Ideal product field.
Important: The dataset must be balanced (One assessor for each product).
In the Options(1) tab, select the Chi-square distance for the Correspondence Analysis, CATA data validation and Independence of attributes.
-
Click the OK button.
-
A dialogue box appears to select and validate the axes to be displayed on the graphical representation of the correspondence analysis, just click the Done button.
Interpreting the results of a CATA analysis in XLSTAT - First part
The first two tables and graphs relate to the validation of CATA data. First of all, a detection of the assessors who checked much more or less than the others is performed. In our case, most of the judges checked between 20% and 35% of the time, but some of them have a particular behaviour. For example, assessor 27 checked only 7% of the time! A similar attribute analysis is then performed to detect over- or under-used attributes.
Then, an analysis combining the two previous ones is carried out. It indicates the percentage of checks per assessor and per attribute. This analysis makes it possible to determine if the attributes are checked in a consensual way or not. The Juicy attribute is subject to contrasts, with some assessors checking it more than 80% of the time and others less than 20%.
For a given attribute, Cochran’s Q test allows to testing of the effect of an explanatory variable (Products) on whether the consumers feel the attribute or not. A low p-value beyond a significance threshold indicates that products significantly differ from each other. If the p-value is significant, the user may be interested in examining multiple pairwise comparisons, represented by small letters inside table cells:
-
two products sharing the same letter(s) do not differ significantly.
-
two products having no letter in common differ significantly.
We can see that all of the attributes except two related to smelling (odourless and intense odour) are associated to significant p-values at 0.05. For example, if we consider the crispy attribute, product 257 is the most checked. However, it is not significantly crispier than the 548 (check the letters). Products 992 and 366 are the least crispy and do not differ significantly from each other.
For each of the products, an attribute independence test is performed to determine if these attributes are not redundant. Thus, we can see that for product 106, the attributes Juicy and Firm are redundant.
The following contingency table is the sum of attribute tables across assessors. It is used to construct a correspondence analysis (CA).
The independence between the rows and columns is tested (this result is currently only available for the classic CA (using the Chi-square distance)). As the p-value is lower than the significance level (0.05) we conclude that it is very likely that real differences exist between the products in terms of their sensory profiles.
The table of the eigenvalues and the corresponding plot allow to verify the quality of the analysis. The quality of the analysis is good (92.17% of explained total inertia on the first two dimensions).
According to the map of the analysis, an ideal product should be relatively tasty, juicy, crispy, firm and sweet and have an apple flavour.
On the other hand, it should not be relatively too sour, bitter, astringent, grainy, soft, mealy, or tasteless. Product 548 seems to be the closest to the ideal product whereas product 106 is far away because of its relative bitterness, sourness and astringency. Products 366 and 992 are also relatively far from the ideal product.
More information about correspondence analysis is available here.
Then, a correlation matrix including attributes (tetrachoric correlation) and liking scores (biserial correlation, last row) is displayed. We see some strong correlations. The negative correlation between sweet and sour indicates that when people tick sour, they do not tick sweet, and vice versa. Liking scores seem to be positively –although weakly- correlated to attributes that were linked to the ideal product in the correspondence analysis (juicy, tasty, apple flavour).
Principal Coordinates Analysis (PCoA) is applied to the correlation coefficients and results are visualized in a two dimensional map. The scree plot indicates that the two first dimensions are sufficient to interpret relationships between attributes. Here again, we see that liking is associated to the attributes juicy, tasty and apple flavour.
More information about principal coordinate analysis is available here.
Interpreting the results of a CATA analysis in XLSTAT - Second part
When liking data is available, the next results are related to the penalty analysis.
A first analysis based on incongruence in which the attribute is missing in the real but not the ideal product allows to identify the must-have attributes. A summary table indicates the frequencies with which P(No)|(Yes) and P(Yes)|(Yes) occur for each attribute. The graphical representation that follows shows these frequencies as well as the percentage of records for these occurrences.
Mean drops in liking between the two situations are then presented for each attribute and their significances tested. For example, the firm attribute implies an increase of 1.5 Liking points between the tested products and the ideal product. This increase is significant at 0.05 (p < 0.0001).
Note: In the case where there is no ideal product, this analysis is substituted by an analysis of presence and absence of the attributes.