# Repeated measures ANOVA with unbalanced data using mixed models

This tutorial explains how to set up and interpret a repeated measures analysis of variance (ANOVA) on unbalanced data using mixed models in Excel with XLSTAT.

## Dataset

The data correspond to the evaluation of a depression score measured at different times in 24 patients separated into two groups: a treatment group and a control group that did not undergo treatment. We have the following 4 variables:

• Subject: This qualitative variable contains the patient's identifier.
• Group: This qualitative variable indicates the group to which the patient belongs (1: control / 2: treatment).
• Time: This qualitative variable indicates when the score was measured. It has 4 modalities (0: before the beginning of the treatment, 1: 1 month after the beginning of the treatment, 3: 3 months after, and 6: 6 months after).
• dv: The quantitative dependent variable represents a score to assess the state of depression of the patient.

The data are unbalanced, i.e. the numbers of observations are not equal across the modalities of at least one of the factors. For example, subject 1 was not measured at time 6, whereas subject 2 was.
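A quick way to check for this kind of imbalance before running the analysis is to count the observations per time point and list the measurements missing for each subject. The sketch below assumes the data are stored in long format (one row per measurement); the records and the helper names `is_balanced` and `missing_times` are made up for illustration and are not the tutorial's dataset.

```python
from collections import Counter

# Hypothetical long-format records: (subject, group, time, dv).
# The values are invented for illustration only.
records = [
    (1, 1, 0, 30.0), (1, 1, 1, 28.0), (1, 1, 3, 27.0),                   # no time 6
    (2, 1, 0, 32.0), (2, 1, 1, 30.0), (2, 1, 3, 29.0), (2, 1, 6, 27.0),  # complete
    (3, 2, 0, 31.0), (3, 2, 1, 25.0), (3, 2, 6, 18.0),                   # no time 3
]


def is_balanced(rows):
    """Return (balanced, counts), where counts maps time -> number of rows."""
    counts = Counter(time for _, _, time, _ in rows)
    return len(set(counts.values())) == 1, dict(counts)


def missing_times(rows):
    """Map each incompletely measured subject to the time points it lacks."""
    all_times = sorted({time for _, _, time, _ in rows})
    seen = {}
    for subject, _, time, _ in rows:
        seen.setdefault(subject, set()).add(time)
    return {
        subject: [t for t in all_times if t not in times]
        for subject, times in seen.items()
        if len(times) < len(all_times)
    }


balanced, counts = is_balanced(records)
print("balanced:", balanced)
print("observations per time:", counts)
print("missing measurements:", missing_times(records))
```

A classical repeated measures ANOVA would have to drop the incomplete subjects, whereas the mixed model approach described in this tutorial can use every available measurement.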

Repeated measures ANOVA is based on the same model as a classical ANOVA. In our case, the equation of the model can be written:

$$Y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk}$$

where $a_i$ is the effect of group $i$, $b_j$ the effect of time $j$, $(ab)_{ij}$ their interaction and $e_{ijk}$ the error term. We therefore have two fixed factors (the variables "group" and "time") and an interaction term ("group * time"). The difference from a classical analysis of variance is that the errors $e_{ijk}$ can be correlated. Indeed, we cannot assume that measurements taken on the same subject at different times are independent.

In mixed models, the model equation is given by:

$$Y = X\beta + Z\gamma + \varepsilon$$

where Y is the quantitative variable to be explained, X collects the factors associated with the fixed effects (these are the classical variables of linear regression), β is a vector of coefficients associated with the fixed effects, Z is a matrix gathering the random effects (these are variables that cannot be assumed to be fixed), γ is a vector of coefficients associated with the random effects and ε is a vector gathering the errors associated with each observation.

In order to reproduce a repeated measures ANOVA with mixed models, one solution is to include the subject factor as a random variable (with a variance components covariance structure) and to declare an error covariance structure with the same properties.
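To see why a random subject effect accounts for the correlation between measurements on the same patient, it helps to write out the covariance that the mixed model implies. The derivation below is a sketch under stated assumptions: γ and ε are independent, both have a variance components structure, and the symbols $\sigma_s^2$ (subject variance), $\sigma^2$ (residual variance), $G$ and $R$ are notation introduced here rather than taken from XLSTAT.

$$
\operatorname{Var}(Y) = Z\,G\,Z^{\top} + R, \qquad G = \sigma_s^2 I, \qquad R = \sigma^2 I
$$

Writing $y_{st}$ for the score of subject $s$ at time $t$, this gives:

$$
\operatorname{Var}(y_{st}) = \sigma_s^2 + \sigma^2, \qquad
\operatorname{Cov}(y_{st}, y_{st'}) = \sigma_s^2 \quad (t \neq t'), \qquad
\operatorname{Cov}(y_{st}, y_{s't'}) = 0 \quad (s \neq s')
$$

Two measurements on the same subject therefore share the correlation $\sigma_s^2 / (\sigma_s^2 + \sigma^2)$ whatever the time gap between them, while measurements on different subjects remain uncorrelated. This is the compound symmetry pattern that a classical repeated measures ANOVA assumes.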

For mixed models with complex covariance structures and/or unbalanced data sets, as in our example, the test statistics for the fixed effects follow unknown distributions and are no longer exact F (Fisher) statistics. Assuming that the test statistics approximately follow an F distribution, XLSTAT implements the Satterthwaite approximation (Satterthwaite, 1946), which finds the appropriate linear combination of the random sources of variation in the error term to test the significance of each fixed effect of the model.
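The idea behind the approximation can be illustrated with the classical Satterthwaite formula for the effective degrees of freedom of a linear combination of independent mean squares. This is a minimal sketch of that textbook formula only; the function name `satterthwaite_df` and the numbers are illustrative, and the tutorial does not describe how XLSTAT carries out the computation internally for a general mixed model.

```python
def satterthwaite_df(mean_squares, dfs, coefs=None):
    """Effective degrees of freedom of sum(c_i * MS_i).

    Classical Satterthwaite formula:
        df = (sum c_i MS_i)^2 / sum((c_i MS_i)^2 / df_i)
    where each mean square MS_i has df_i degrees of freedom.
    """
    if coefs is None:
        coefs = [1.0] * len(mean_squares)
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / df for t, df in zip(terms, dfs))
    return numerator / denominator


# A single mean square keeps its own degrees of freedom.
print(satterthwaite_df([4.0], [10]))

# Hypothetical error term built from two mean squares (made-up values).
print(satterthwaite_df([2.0, 3.0], [5, 10]))
```

The result is generally not an integer. It is used as the denominator degrees of freedom of the approximate F distribution.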

## Setting up the repeated measures ANOVA using the mixed models

After opening XLSTAT, select the XLSTAT / Modeling data / Mixed Models command, or click on the corresponding button of the Modeling data toolbar. Once you've clicked on the button, the Mixed Models dialog box appears.

Select the data on the Excel sheet. The dependent variable corresponds to the variable to be explained (or variable to be modeled), which is in this case the score (dv). All the remaining variables are qualitative explanatory variables: select the 3 variables "subject", "time" and "group". The Variable Labels option is activated because the first row of each column contains the name of the variable.

In the Options tab, you can select the covariance matrix structure for the errors ε and the random effects γ (see the XLSTAT help for more details). Select the variance components structure for the random effects (the residuals have the same covariance structure by default). We choose the constraint a1 = 0, meaning that the model is built on the assumption that the control group has the standard effect on the score. Applying a constraint in ANOVA is essential for theoretical reasons, but it changes neither the results nor the quality of the analysis. In addition, we want to take interactions into account, so we must activate the Interactions option. We also enable the Satterthwaite's t-tests option to use the Satterthwaite approximation for the model error term.

Once you click OK, a new window for selecting the fixed effects and the random effects is displayed. We want to run a repeated measures ANOVA using "time", "group" and the interaction "time*group" as fixed effects and the "subject" variable as a random effect. Once you have clicked on the OK button, the computation starts. The results are then displayed.
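The constraint a1 = 0 amounts to using the first category as a reference level: its coefficient is fixed at zero and the other categories are expressed relative to it. The sketch below builds one row of the fixed-effects design matrix under that coding. It assumes that time 0 is treated as the reference level in the same way as group 1; the names `GROUPS`, `TIMES`, `column_names` and `design_row` are illustrative and not part of XLSTAT.

```python
GROUPS = [1, 2]        # 1: control (reference level, a1 = 0), 2: treatment
TIMES = [0, 1, 3, 6]   # 0: before treatment (assumed reference level)


def column_names():
    """Names of the fixed-effect columns; reference levels get no column."""
    names = ["intercept"]
    names += [f"group{g}" for g in GROUPS[1:]]
    names += [f"time{t}" for t in TIMES[1:]]
    names += [f"group{g}*time{t}" for g in GROUPS[1:] for t in TIMES[1:]]
    return names


def design_row(group, time):
    """Indicator-coded row of X for one observation."""
    row = [1]
    row += [int(group == g) for g in GROUPS[1:]]
    row += [int(time == t) for t in TIMES[1:]]
    row += [int(group == g and time == t) for g in GROUPS[1:] for t in TIMES[1:]]
    return row


print(column_names())
print(design_row(1, 0))  # control group at time 0: only the intercept is active
print(design_row(2, 3))  # treatment group at 3 months
```

Choosing a different reference category changes the individual coefficients but not the fitted values, which is why the constraint does not affect the results of the analysis.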

## Interpreting the results of an unbalanced repeated measures ANOVA using the mixed models

The first results displayed by XLSTAT are the goodness of fit coefficients. Model parameters are estimated using the restricted maximum likelihood (REML) method and therefore differ from those obtained with a classical ANOVA model. All of these indexes can be used to compare models with different covariance structures.

The following table gives the parameters of the covariance matrices of the random effect and of the errors (variance components structures). The p-values of the Z-tests (Pr > Z) indicate good reliability of the standard error estimates for the random parameters of the model. Note, however, that the standard errors of the parameters are high, no doubt because of the small sample size.

The Type III table of fixed effects lets us check the significance of the fixed effects using Fisher's F-test. We see here that all the fixed effects are significant, and that the elapsed time has the largest effect. The Satterthwaite approximation was used to calculate the denominator degrees of freedom as well as the F statistic, in order to obtain a more accurate p-value (Pr > F) and thereby reduce the Type I error on the tested hypothesis (equality of the least squares means).

In unbalanced cases, the test for the highest-order term (here the "group*time" interaction) is the same for all types, whereas the hypotheses tested differ between types for the lower-order terms ("time" and "group"). In our example, the Type I and Type II hypotheses depend on the number of observations (experimental units) at each factor-level combination, which makes them hard to interpret.

In contrast, the Type III hypothesis, whether the data are balanced or not, always tests the fixed effect under consideration while controlling for the effects of the other factors (via orthogonal decomposition), which makes it well suited to unbalanced datasets. Thus, we can conclude that the treatment (group variable) and the elapsed time have an effect on the patients' level of depression.

The following table summarizes the model parameters with their standard deviations and confidence intervals. The parameters are interpreted in the same way as in a classical analysis of variance. Here, the statistics of the Student's t-tests on the non-nullity of the regression coefficients do not exactly follow a Student's distribution. However, under the null hypothesis they approximately follow a Student's t distribution whose degrees of freedom are given by the Satterthwaite approximation to the effective degrees of freedom of the error term. Looking at the model parameters, we can see that times 1, 3 and 6 have a negative impact on the depression score: patients are less depressed as time passes. Being in the treatment group also has a negative impact on the depression score.
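Because the model includes an interaction, each parameter is read relative to the reference levels: the "time" parameters describe the change from time 0 in the control group, and the "group" parameter describes the treatment-versus-control difference at time 0. The sketch below shows how the parameters combine into a predicted mean for each group and time. The coefficient values are hypothetical and are NOT the estimates produced by XLSTAT for this dataset; `params` and `cell_mean` are illustrative names.

```python
# Hypothetical estimates (made up for illustration, not the tutorial's results).
# Reference levels: group 1 (control) and time 0, whose effects are fixed at 0.
params = {
    "intercept": 30.0,
    "group2": -1.0,
    "time1": -2.0, "time3": -3.0, "time6": -4.0,
    "group2*time1": -3.0, "group2*time3": -5.0, "group2*time6": -6.0,
}


def cell_mean(group, time, p=params):
    """Predicted mean score for a group/time cell (fixed effects only)."""
    mean = p["intercept"]
    mean += p.get(f"group{group}", 0.0)                # 0 for the reference group
    mean += p.get(f"time{time}", 0.0)                  # 0 for the reference time
    mean += p.get(f"group{group}*time{time}", 0.0)     # 0 if either is a reference
    return mean


for time in (0, 1, 3, 6):
    control, treated = cell_mean(1, time), cell_mean(2, time)
    print(f"time {time}: control={control}, treatment={treated}, "
          f"difference={treated - control}")
```

With these made-up values, the treatment-versus-control difference at 6 months is the sum of the group parameter and the corresponding interaction parameter, not the group parameter alone.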

Thus, we were able to perform a repeated measures ANOVA with mixed models even though our dataset does not meet the condition required by the classical approach (a strictly identical number of repetitions per subject). In addition, Satterthwaite's approximation for the model error term makes the computed statistical tests (Type I, II and III tests and t-tests) more reliable.