This tutorial will help you evaluate the quality of a sensory panel in Excel using the XLSTAT statistical software.
Dataset for sensory panel analysis
An Excel workbook with both the data and the results can be downloaded by clicking here.
The data used in this tutorial correspond to the evaluation of 14 different ski shoes by 15 skiers (hereunder referred to as assessors) with experience in sensory tests for the clothing industry. 6 descriptors have been used by the 15 assessors to rate the shoes.
Setting up a panel analysis
Once XLSTAT is activated, select the XLSTAT- Sensory data analysis / Panel analysis command (see below), or click the corresponding button of the XLSTAT-Sensory data analysis toolbar.
Once you have clicked on the button, the dialog box appears.
Select the data on the Excel sheet. As a session factor is available here, we activate the option and select the corresponding column.
Several models are possible depending on whether you select a session, you want a model with or without interactions between the factors, and whether you consider that the assessor and session (repetition) are random or fixed effects. Random factors are considered as random variables with mean 0 and a given variance. This means that you consider that, once the product effect is taken into account, the other effects are purely due to randomization. This is only valid if you can consider that there is no structural difference between assessors or between sessions. These assumptions can be checked in the analyses that follow.
Interpreting the results of a sensory panel analysis
After you have clicked on the OK button, the computations start. As many charts are created it can take some time before you can move in the results sheet. We recommend that you do not click in Excel until the arrow cursor is back.
The first table corresponds to basic summary statistics for the various input variables. You can use the information on the minimum and maximum to make sure there are no absurd values for the descriptors.
The first step of the analysis consists in running on ANOVA on the whole dataset for each descriptor one after the other, in order to identify descriptors for which there is no product effect. For each descriptor, the table of Type III SS of the ANOVA is displayed for the selected model. If there is no product effect for a descriptor, meaning if the p-value is higher than a given threshold, that descriptor can be removed from the analysis, as long as the corresponding option has been checked in the Options tab of the dialog box. The table below corresponds to the table for the Smoothness.
Then, a summary table allows comparing the p-values of the product effect for the different descriptors. The analyses that follow will only be conducted for the descriptors that allow discriminating the products, meaning for all descriptors for which the p-value is lower than 0.05 as we entered that threshold in the dialog box. In our particular case, there is a product effect for all descriptors so all the descriptors remain the next steps.
The second step consists of a graphical analysis. For the 6 descriptors, box plots and strip plots are displayed. We can thus see how, for each descriptor, different assessors use the rating scale to evaluate the different products. On the box plot for Smoothness we can see that assessors 9 and 15, while having a similar mean, use the rating scale differently. We can also see that assessors 3,4,5,6 and 7, while using similar ranges of rating tend to rate differently in terms of position. Of course, such plots do not tell you anything about the agreement between assessors: you could see a case where, while the box plots look very much the same, the product corresponding to the minimum for an assessor (minimum and maximum values are displayed with blue dots on the box plots) might correspond to the maximum of the other assessor.
We now want to check whether the assessors agree for the different descriptors, and how the descriptors bring different rating possibilities (are they correlated or not). The third step starts with the restructuring of the data table, in order to have a table containing one row per product and one column per pair of assessor and descriptor - if there are several sessions, the table contains averages - followed by a PCA on this same table. The PCA is performed on standardized data.
The chart displayed below corresponds to the same PCA correlation plot replicated for each descriptor, highlighting in red the dots the 15 (assessor,descriptor) pairs corresponding to the descriptor mentioned in the title. This allows checking in one step the extent to which assessors agree or not for all descriptors, once the effect of position and scale is removed (because the PCA is performed on standardized data).
The following chart gives for each pair of (assessor,descriptor) the % of variance carried by the PCA plot. In dark grey you can see the % carried by the first axis and in light grey the % of variance carried by the second axis. We see that for smoothness, there are different groups of assessors, with assessors (8,9,10) that are more related to the second axis but still badly represented. We can also confirm that Elasticity and Closing are close and carried by the second axis.
To study more precisely the relationship between descriptors, an MFA (multiple factor analysis) plot of the descriptors is displayed. The MFA is based on a table in which there are as many subtables as there are descriptors, where each subtable contains the averages for each product (rows) by each judge (columns).
During the fourth step, an ANOVA is performed for each assessor separately, and for each of the 6 descriptors, in order to check whether there is a product effect or not, to verify for each assessor if he/she is able to distinguish the products using the available descriptors. A table is displayed for each assessor to show if there is a product effect or not for the various descriptors. The p-values are displayed in bold font if they are lower than the threshold defined in the Options tab of the dialog box. The p-values displayed in bold correspond to descriptors for which the assessor was able to differentiate the products. The table below corresponds to that table for assessor 1. We can see that this assessor was able to differentiate the products using Feet feeling and Elasticity.
A summary table is then used to count for each assessor the number of descriptors for which he was able to differentiate the products. The corresponding percentage is displayed. This percentage is a simple measure of the discriminating power of assessors. The percentages are then displayed on a bar chart.
For the fifth step, a global table initially presents ratings (averaged over the sessions if available) for each assessor in rows, and each pair (product,descriptor) in columns. It is followed by a series of tables and charts to compare, product by product, assessors (averaged over the possible repetitions) for the set of descriptors. These charts can be used to identify strong trends and possible atypical ratings for some assessors. The red line corresponds to the average value over all assessors for the product of interest and the blue line the assessor selected in the list at the top left of the chart. In the example below we can see that assessor 10 rated product 7 below the average for Smoothness and Touch, and close to the average for the other descriptors.
The sixth step allows identifying atypical assessors through the measure for each product of the Euclidean distance of each assessor to an average for all assessors in the space of all the descriptors. A table showing these distances for each product and the minimum and maximum computed over all assessors, allows identifying assessors that are close to or far from the consensus. The following chart allows visualizing these distances. The lower the distance, the closer the assessor to the consensus (the centroid). Value 0 corresponds to the average over all assessors. If for a given product, all assessors would give the same rating for all descriptors, the Min and Max would be 0 for that product. If an assessor would give exactly the value corresponding to the average obtained over the other descriptors, we would have the minimum equal to zero for that product. In the example below, we see that assessor 4 does not agree with the other assessors except for product 10 where his rating is closer to the average.
As a “session” variable was selected, the seventh step checks if for some assessors there is a session effect, typically an order effect. This is assessed using a Wilcoxon signed rank test as there are only two sessions (in the case of 3 or more sessions, a Friedman test is used). The test is calculated on all products, descriptor by descriptor. We can see in the table below that for 4 descriptors out of 6, there is a session effect for assessor 1, based on the non parametric test. We can also see that for Feet feeling there is a session effect for 9 assessors out of 15.
Then, for each assessor and each descriptor, we calculate which is the maximum observed range between sessions across products. You can see the product that corresponds to the maximum range by leaving the mouse over the red triangle that is displayed in each cell. For example we see that there is a high range for assessor 15 on Smoothness and that it corresponds to product 8. In our particular case we see here that there are high ranges for most pairs of (assessor,descriptor). This makes the validity of this survey questionable.
As for each triple (assessor,product,descriptor) there exists at least one rating, the eighth step consists of a classification of the assessors. The classification is first performed on the raw data, then on the standardized data to eliminate possible effects of scale and position.
Finally a table preformatted for Generalized Procrustean Analysis (GPA) is displayed in case you want to run a GPA.