What is Reliability analysis?Reliability analysis allows to study the properties of the scales of measurement and the elements that constitute them. The reliability analysis procedure provides several results to evaluate the internal consistency (ability of the items to measure the same phenomenon or the same dimensions of a scale) and also provides information on the relationships between the different elements composing the scale.
Reliability analysis is often used to check if questions (items) in a set of questions (test or questionnaire) are consistent with each other.
High reliability suggests strong relationships between the measures/items within the measurement procedure.
There are two families of internal consistency estimation methods:
- The split-half reliability, which is based on the assumption that the measurement procedure can be divided into two halves (two parts test) having the best correlation score (Guttman index L4 and Spearman-Brown).
- The internal consistency itself (internal model), based on the scores between each measure/item and the sum of all the others (Cronbach’s Alpha, Guttman indices L1 and L6) which assumes a good homogeneity among the items.
In this tutorial, we will focus on the computation and interpretation of indices relative to the internal model, while mentioning indices from the split-half reliability.
Dataset for running a reliability analysis in ExcelAn Excel sheet with both the data and the results can be downloaded by clicking on the button below:
Download the data
In this tutorial, we will use data from the Personality Tests website (the original file can be obtained at http://personality-testing.info/_rawdata/BIG5.zip).
They correspond to the test of the Big Five which measures five main personality traits. In this tutorial, only the first 2000 observations will be analyzed; and in order to estimate the internal consistency, only the personality trait corresponding to neuroticism (representing emotional stability) has been retained in the analysis (unifactorial model).
This personality trait is evaluated by means of 10 items (questions) for which each person expresses his degree of agreement or disagreement with a statement (Likert scale). We refer to the set of items as the test or the questionnaire.
Setting up a Reliability analysis in Excel using XLSTATOnce XLSTAT is activated, select the XLSTAT / Describing data / Reliability analysis command (see below).
After clicking on the button, the dialog box for the Reliability analysis appears.
You can then select the data on the Excel sheet with the Observations / Items field.
The Variable labels option is enabled because the first row of data contains the variable names.
The Type of reliability selected is Internal Model, which means that we will study the contribution of each item assuming a single independent test.
This type of reliability also assumes the equality of the true scores of each item measured (Tau-equivalence hypothesis) so that the different estimators of the internal coherence of the test have a minimal bias (best lower bound estimator).
In the Outputs tab, activate the Guttman statistics and Display the best split-half options, which will display the results for the Guttman indices and the partition that corresponds to the calculated Guttman L4 (the one that maximizes the correlation of the scores between the two halves).
In the Options tab, choose the method for calculating the best split corresponding to the maximum of Guttman L4.
Since the number of items in the questionnaire is low, we choose the Enumeration method, which performs an exhaustive search among all the possible partitions.
Otherwise, the Fast method will be more appropriate and will provide a value of Guttman L4 in an acceptable time by performing a reduced optimized search.
On the Charts tab, activate the Correlation maps option.
The computations begin once you have clicked on OK.
Interpreting the results of a Reliability analysis in Excel using XLSTAT
Cronbach's alpha coefficient, also known as α coefficient, is used to evaluate the internal consistency (and hence reliability) of the questions asked in this test (answers to questions on the same topic that need to be correlated). Its value generally lies between 0 and 1 and is considered as acceptable when it’s higher than 0.70.
When test items meet the assumptions of the tau-equivalent model, alpha approaches a better estimate of reliability.
It should be noted that Cronbach’s alpha can sometimes be negative due to a small number of questions in association with strongly negative correlations of some items compared to others.
To overcome this, the removal of these questions from the questionnaire as well as the addition of new ones are part of the solutions to this problem.
In the proposed study, the alpha coefficient obtained is 0.869, which indicates a good ability of the items of the questionnaire to evaluate the same latent factor in subjects, neuroticism in our case.
In addition, the standardized Cronbach’s alpha (score of each item with zero mean and unit variance) is almost identical to Cronbach's alpha.
This confirms the Tau-equivalence hypothesis (equality of mean and variance of the true scores for each item) of the questionnaire which is required in order to obtain an alpha coefficient as representative of reliability as possible (lowest bias).
Although it is more appropriate to use Cronbach's alpha in this case (Tau-equivalence), other reliability estimators such as the L1-L6 indices proposed by Guttman allow to obtain a much more accurate estimate of the internal consistency of a test. This is true under certain conditions, though.
Here is a summary of the conditions necessary for the use of these indices:
L1: An intermediate coefficient used in the calculation of other indices (conservative).
L2: Estimation of the inter-score correlation in the case of parallel measurements. It is more complex than Cronbach's alpha and better represents the true reliability of the test in case of composite reliability (multifactorial).
L3: Equivalent to Cronbach's alpha.
L4: Guttman’s split-half reliability. Useful when the Tau-equivalence hypothesis is not respected. This situation occurs very frequently in reliability analysis and could be seen in the case of our example as different answers according to the people to characterize the trait of neuroticism. The internal consistency of the test would therefore be difficult to estimate under these conditions.
To overcome this problem, the test is divided into two parts (ideally equal) with a maximum correlation. Each of these two fractions of the test is then presented to the same persons at the same time.
The Reliability analysis feature provides the partition corresponding to the maximum of Guttman L4.
L5: Recommended when a single item strongly covaries with others, which do not exhibit high covariance with each other.
L6: Recommended when inter-element correlations are small relative to the item vs items determination coefficients (becomes a better reliability estimator when the number of items becomes larger).
In our example we want to select questions that mainly measure neuroticism (emotional stability).
Remember that a reliable scale consists of questions proportionally measuring the most important amount of true outcome. Let’s consider the following table:
The results of the ten questions are given above. The last four columns of this table are the most interesting for us.
- The correlation between the question and the result of the total sum of the remaining questions.
- The multiple squared correlation between the question and all other questions.
- The internal coherence of the scale (Alpha and Guttman L6 coefficients) assuming the respective question deleted.
Questions 2 and 4 are much less consistent with the rest of the scale.
Their correlations with the total scale are respectively 0.451 and 0.342, while all other questions are correlated at least to a value of 0.55.
In the internal consistency indices columns (Alpha/Guttman L6), we can see that the reliability of the scale would be slightly increased (around 0.1) if either one of the two questions were removed. Therefore, we should probably keep these two in our scale.
An interesting chart in the reliability analysis is the correlation map, which allows for the identification of possible structures in the correlations, or to quickly identify elements with interesting correlations.
We see clearly on this map that the questions N2 to N4 are less correlated with the remaining items of the test.