This tutorial will help you set up and interpret a **Redundancy Analysis** or **RDA** in **Excel** using the XLSTAT statistical software.

## What is Redundancy Analysis?

**Redundancy Analysis (RDA)** can be thought of as a **multivariate approach to linear models**, that is, a linear model with many dependent variables.

RDA works by building a Principal Component Analysis (PCA) on the **response variables** (Y matrix) under the constraint that the produced axes (the *canonical axes*) are also linear combinations of the **explanatory variables** (X matrix).

RDA is often used in ecology to explain a matrix of site / species abundances by a set of environmental variables that often capture gradients (ranges of temperature, soil depth or soil nutrient content, for example).

In an RDA context, relationships between the Y and X variables are assumed to be linear.
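The two steps described above (a linear regression of Y on X, followed by a PCA of the fitted values) can be sketched with NumPy. This is only an illustration of the general technique, not XLSTAT's implementation: the function name `rda`, the demo data and the `n - 1` scaling of the eigenvalues are assumptions made for this example.

```python
import numpy as np

def rda(Y, X):
    """Illustrative RDA: regress centered Y on centered X, then run a PCA on the fitted values."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)

    # Step 1: multivariate linear regression of the responses on the explanatory variables.
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ B

    # Step 2: PCA of the fitted values via SVD; rows of Vt are the canonical axes.
    U, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
    keep = s > 1e-8 * s[0]            # drop numerically null axes
    axes = Vt[keep].T

    return {
        "eigenvalues": s[keep] ** 2 / (n - 1),       # constrained inertia per axis (assumed scaling)
        "species_scores": axes,                       # response variables on the canonical axes
        "site_scores": Y_fit @ axes,                  # linear combinations of the X variables
        "total_inertia": (Yc ** 2).sum() / (n - 1),   # total variance of the responses
    }

# Hypothetical data: 20 sites, 2 explanatory variables, 5 response variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
Y_demo = X_demo @ rng.normal(size=(2, 5)) + rng.normal(size=(20, 5))

result = rda(Y_demo, X_demo)
constrained_share = result["eigenvalues"].sum() / result["total_inertia"]
print(f"Constrained share of total inertia: {constrained_share:.2f}")
```

Because the site scores are computed from the fitted values, each canonical axis is by construction a linear combination of the explanatory variables, and there can be at most as many canonical axes as there are independent columns in X.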

## What is the difference between Redundancy Analysis and Canonical Correspondence Analysis?

Canonical Correspondence Analysis (CCA) is a method related to Redundancy Analysis. While in **RDA** we assume that the **relationships between the Y and X matrices are linear**, in **CCA** we assume that they are **unimodal**. In ecology, CCA is closer to the concept of **species niche**, which requires environmental gradients to be sampled over their entire range. RDA may be better suited to cases where only smaller parts of environmental gradients are captured.

## What are conditioning variables and what is partial RDA?

Similar to Canonical Correspondence Analysis (CCA), RDA includes the possibility of **removing the effect of undesired constraining X variables** in order to focus on the effects of interest. Undesired variables include block effects or any other environmental constraint that may hide the effects of the explanatory variables relevant to the question under investigation. This produces what we call a **partial RDA** (or **partial CCA** in the case of CCA). Undesired variables are called **conditioning variables**.

XLSTAT allows for the selection of conditioning variables both in RDA and CCA under the **General tab** in the features dialog boxes (activate Partial CCA or Partial RDA).

## Dataset for this tutorial on Redundancy Analysis in Excel

An Excel sheet with both the data and the results can be downloaded by clicking on the button below:

Download the data

The data correspond to the abundances of 30 plant species (in columns) at 20 sites (in rows) in a dune environment. Species names combine the first four letters of the genus and the first four letters of the species name in Latin. Sites are also described by two environmental variables in columns:

- *Soil thickness*: quantitative variable, depth of the A1 soil horizon.
- *Management*: qualitative variable with levels BF (Biological farming), HF (Hobby farming), NM (Nature Conservation Management) and SF (Standard Farming).
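To make the data layout concrete, here is a small sketch of the species naming rule and of how a qualitative variable such as Management can be turned into 0/1 indicator columns before entering a linear method. The helper names `species_code` and `one_hot` and the three example sites are hypothetical, the exact capitalization of the codes in the dataset may differ, and XLSTAT handles the coding of qualitative variables internally.

```python
def species_code(latin_name):
    """Concatenate the first four letters of the genus and of the species name."""
    genus, species = latin_name.split()[:2]
    return genus[:4] + species[:4]

def one_hot(values, levels):
    """Code a qualitative variable as one 0/1 indicator column per level."""
    return [[1 if value == level else 0 for level in levels] for value in values]

MANAGEMENT_LEVELS = ["BF", "HF", "NM", "SF"]

codes = [species_code(name) for name in ["Lolium perenne", "Elymus repens"]]
print(codes)  # ['Lolipere', 'Elymrepe']

# Three hypothetical sites, for illustration only.
indicators = one_hot(["SF", "BF", "NM"], MANAGEMENT_LEVELS)
print(indicators)  # [[0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
```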

## Goal of this tutorial on Redundancy Analysis

The goal of this tutorial is to investigate the linear effects of management type and A1 soil horizon depth on dune plant communities using Redundancy Analysis.

## Setting up a Redundancy Analysis in XLSTAT

After opening XLSTAT, select the **Advanced Features / Multiblock Data Analysis / Redundancy Analysis** command.

Select the Site / Species matrix in the **Response Variables** field. Under **Explanatory Variables**, select Soil Thickness in the **Quantitative** field and Management type in the **Qualitative** field.

In the **Options** tab, activate the **Permutation test** option, and set the **number of permutations** to 1000. If your response variables or your quantitative explanatory variables are not on the same scale, consider activating the **Reduce** option for the corresponding matrix.

In the **Charts** tab, deactivate the **Observations** option under **Display**.

Click on **OK**. The computations are launched and the results are displayed in a new spreadsheet.

## Interpreting the outputs of a Redundancy Analysis

A first dialog box pops up, allowing you to select the axes to represent in the RDA charts. With axes F1 & F2, we are able to represent 80% of the **constrained inertia** (explained further below). Select those two axes and click **Done**.

After descriptive statistics on both the response and the explanatory variables, the **Inertia** summary **table** appears.

Here we are investigating the inertia, or variability, of the response variables. It is split into:

- **Constrained inertia**, which is the part that is explained by the explanatory variables matrix.
- **Unconstrained inertia**, which is the remaining part.

Then a **permutation test** tests the null hypothesis that the response and explanatory variables are not linearly related. Here the p-value is far lower than the significance level alpha. We may thus reject the null hypothesis while taking a very small risk of being wrong. This is an important step as it supports the reliability of subsequent RDA results.

Then the details of the constrained inertia carried by each RDA axis are provided in a table.

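Axis F1 carries 45% of the constrained inertia, which is 18% of the total inertia. Together, as we have seen, axes F1 & F2 carry 80% of the constrained inertia, which corresponds to 32% of the total inertia. Both pairs of figures imply that the constrained inertia amounts to about 40% of the total inertia (18 / 45 = 32 / 80 = 0.40).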
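The inertia split and the permutation test described above can be sketched as follows. This is a minimal illustration under stated assumptions: the test statistic used here is the constrained share of the total inertia, rows of Y are shuffled to break any link with X, and the function names are hypothetical. The tutorial does not specify the exact statistic or permutation scheme used by XLSTAT.

```python
import numpy as np

def constrained_fraction(Y, X):
    """Share of the total inertia of Y that is explained by a linear fit on X."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ B
    return (fitted ** 2).sum() / (Yc ** 2).sum()

def permutation_test(Y, X, n_permutations=1000, seed=0):
    """Permutation p-value for H0: Y and X are not linearly related."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = constrained_fraction(Y, X)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Shuffling the rows of Y destroys the site-by-site pairing with X.
        shuffled = Y[rng.permutation(Y.shape[0])]
        if constrained_fraction(shuffled, X) >= observed:
            at_least_as_large += 1
    # The observed arrangement counts as one of the permutations.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical data with a strong linear signal.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 2))
Y_demo = X_demo @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(20, 4))
observed, p_value = permutation_test(Y_demo, X_demo, n_permutations=200)
print(f"constrained share = {observed:.2f}, p-value = {p_value:.4f}")
```

With 1000 permutations, as set in the **Options** tab, the smallest p-value this scheme can return is 1 / 1001.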

An equivalent table with the **unconstrained inertias** is displayed afterwards.

The **standardized canonical coefficients** make it possible to assess the strength of the effect of each explanatory variable on every axis.

**Observation scores** are the coordinates of observations (sites) on the RDA chart.

**Contributions (Observations)** are the extent to which every observation contributed to the construction of every axis.

**Squared cosines (Observations)** are the representation quality of each observation on each axis.

Observations 15, 16 and 20 seem to have contributed strongly to the construction of axis F1 and are well represented on it.
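For readers who want to see how such quantities relate to the scores, here is a sketch using the usual PCA-style definitions with equal observation weights. These definitions are an assumption for illustration: XLSTAT's exact normalization is not given in this tutorial, and the score values below are made up.

```python
import numpy as np

def contributions_and_cos2(scores):
    """Contributions (per axis) and squared cosines (per observation) from a score matrix.

    `scores` has one row per observation and one column per axis; for meaningful
    squared cosines it should contain all axes, not only the displayed ones.
    """
    squared = np.asarray(scores, dtype=float) ** 2
    # Contribution: share of an axis's inertia due to each observation (columns sum to 1).
    contributions = squared / squared.sum(axis=0, keepdims=True)
    # Squared cosine: share of an observation's squared distance to the origin
    # carried by each axis (rows sum to 1).
    cos2 = squared / squared.sum(axis=1, keepdims=True)
    return contributions, cos2

# Made-up scores for three observations on two axes.
demo_scores = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
contributions, cos2 = contributions_and_cos2(demo_scores)
print(contributions)  # first observation: 0.9 of axis 1 and 0.8 of axis 2
print(cos2)           # first observation: 0.36 on axis 1 and 0.64 on axis 2
```

An observation can contribute strongly to an axis without being well represented on it, and vice versa, which is why both tables are reported.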

Scores and squared cosines are also shown for the response variables (species in this example).

Likewise, species squared cosines reflect how well each species is represented on each axis. Species such as *Elymus repens*, *Poa pratensis*, *Ranunculus flammula* and others seem to be well represented on axis F1.

Scores for the explanatory variables are also provided.

At the bottom of the report, we see the **RDA chart**, also called an RDA triplot when it includes observations (which have been removed here for better readability).

On a given RDA axis, only observations and response variables with high squared cosines should be interpreted.

Axis one seems to be well related to both explanatory variables, with Nature Conservation Management and greater soil depth on the left and Biological and Hobby farming on the right.

Nature conservation management thus seems to be linked to a deeper A1 soil horizon. These environments are characterized by *Eleocharis palustris*, *Ranunculus flammula*, *Salix repens* and more.

On the other hand, thinner soils are linked to biological farming and hobby farming, with a high relative abundance of species such as *Lolium perenne* and *Poa pratensis*.

Axis two is also related to soil depth as well as to the Standard Farming category, characterized by *Alopecurus geniculatus*.