# Which multivariate data analysis method to choose?

2017-05-04

## Choosing an appropriate multivariate data analysis technique

Here we define multivariate (or multidimensional) datasets as data tables containing more than 2 variables (usually stored in columns) measured on more than 2 statistical units (individuals, patients, sites…), usually stored in rows. Multidimensional data analysis techniques are used to extract interesting information from large datasets that can hardly be read in their raw format. These tools are often referred to as data mining tools.

The grid below will help you choose an appropriate data mining method according to the type of question you want to investigate (exploratory or decisional) and the structure of your data. The list is not exhaustive, but it contains the most commonly used methods, all of which are available in XLSTAT.

We divided the questions into two types:

• Exploratory questions investigate multivariate datasets without any particular hypothesis to validate. Exploratory multivariate data analysis tools often involve reducing the dimensionality of large datasets, which makes data exploration more convenient.
• Decisional questions involve testing the relationship between two sets of variables (correlation), or explaining one variable or a set of variables by another set (causality).
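To make "reduction of the dimensionality" concrete, here is a minimal NumPy sketch of Principal Component Analysis on a single table of quantitative variables. It is not XLSTAT's implementation: the function name `pca` is our own, the data are synthetic, and it assumes correlation-matrix PCA (each variable standardized first), which is one common choice among several.

```python
import numpy as np


def pca(X, n_components=2):
    """Minimal PCA sketch: rows are statistical units, columns are quantitative variables."""
    X = np.asarray(X, dtype=float)
    # Center each variable so the components describe variance around the mean.
    Xc = X - X.mean(axis=0)
    # Standardize to unit variance (correlation-matrix PCA); assumes no constant column.
    Xs = Xc / Xc.std(axis=0, ddof=1)
    # SVD of the standardized table; singular values come back in descending order.
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    # Eigenvalues of the correlation matrix are s**2 / (n - 1).
    eigenvalues = s**2 / (X.shape[0] - 1)
    explained = eigenvalues / eigenvalues.sum()  # share of total variance per component
    # Coordinates of each unit on the first n_components principal components.
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained


rng = np.random.default_rng(0)
# Hypothetical table: 20 units x 5 variables driven by 2 hidden signals plus small noise.
latent = rng.normal(size=(20, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + 0.1 * rng.normal(size=(20, 5))

scores, explained = pca(X, n_components=2)
print(scores.shape)  # (20, 2)
print(explained.round(3))
```

The 5-column table is summarized by 2 score columns; `explained` shows how much of the total variance each component keeps, which is how one judges whether the reduced view is a faithful summary.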

| Question | Number of tables | Data description | Tool | Remarks |
|---|---|---|---|---|
| Exploratory | 1 | Quantitative variables only | Principal Component Analysis (PCA) | Considers all the variance in the data; components do not necessarily reflect real phenomena |
| Exploratory | 1 | Quantitative variables only | Factor Analysis (FA) | Considers only the covariance between variables; latent factors are assumed to reflect real phenomena |
| Exploratory | 1 | Proximity matrix | Multidimensional Scaling (MDS) / Principal Coordinate Analysis (PCoA) | |
| Exploratory | 1 | Contingency table (2 qualitative variables) | Correspondence Analysis (CA) | |
| Exploratory | 1 | Qualitative variables only | Multiple Correspondence Analysis (MCA) | |
| Exploratory | ≥2 | Qualitative variables tables and/or quantitative variables tables | Multiple Factor Analysis (MFA) | |
| Exploratory | ≥2 | Quantitative variables tables | Generalized Procrustes Analysis (GPA) | Could include an inferential part: the consensus test |
| Exploratory (clustering) | 1 | Quantitative variables only | Clustering tools (AHC, k-means…) | Classical clustering methods can be applied indirectly to a qualitative variables table, using the row scores on the dimensions of a Multiple Correspondence Analysis |
| Decisional (causality) | 1 | One dependent variable and several quantitative and/or qualitative explanatory variables | Statistical modelling tools (regression, ANCOVA…) | |
| Decisional (correlation) or exploratory | 2 | Two quantitative variables tables | Canonical correlation analysis | Linear relationships between the two tables |
| Decisional (causality) or exploratory | 2 | One contingency table Y (often a site-species data matrix) and one explanatory table X of quantitative and/or qualitative variables | Canonical correspondence analysis | Unimodal relationships between X and Y; can be used to depict species niches along environmental gradients |
| Decisional (causality) | 2 | One dependent table Y of quantitative variables and one explanatory table X of quantitative and/or qualitative variables | Redundancy Analysis (RDA) | Linear relationships between X and Y |
| Decisional (causality) | 2 | One dependent table Y of quantitative variables and one explanatory table X of quantitative and/or qualitative variables | Partial Least Squares regression (PLS) | Especially used for prediction |
| Decisional (causality) | ≥2 | Several tables of manifest variables, each table representing a latent variable | Partial Least Squares Structural Equation Modelling (PLS-PM) | |
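The grid above can be read as a lookup on three keys: question type, number of tables and data description. The sketch below encodes it that way. The helper `suggest_methods` and the short data labels (such as `"quantitative Y + explanatory X"`) are hypothetical simplifications chosen for illustration; this is not an XLSTAT feature.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Method:
    tool: str
    questions: FrozenSet[str]  # question types the tool addresses
    min_tables: int            # smallest number of input tables
    max_tables: Optional[int]  # None encodes the grid's open-ended "≥2"
    data: FrozenSet[str]       # simplified data-description labels (hypothetical)


def m(tool, questions, lo, hi, data):
    """Constructor helper so the grid rows below stay readable."""
    return Method(tool, frozenset(questions), lo, hi, frozenset(data))


# Rows of the decision grid, in the same order as the grid above.
GRID = [
    m("Principal Component Analysis (PCA)", {"exploratory"}, 1, 1, {"quantitative"}),
    m("Factor Analysis (FA)", {"exploratory"}, 1, 1, {"quantitative"}),
    m("Multidimensional Scaling (MDS) / PCoA", {"exploratory"}, 1, 1, {"proximity matrix"}),
    m("Correspondence Analysis (CA)", {"exploratory"}, 1, 1, {"contingency table"}),
    m("Multiple Correspondence Analysis (MCA)", {"exploratory"}, 1, 1, {"qualitative"}),
    m("Multiple Factor Analysis (MFA)", {"exploratory"}, 2, None,
      {"quantitative", "qualitative", "mixed"}),
    m("Generalized Procrustes Analysis (GPA)", {"exploratory"}, 2, None, {"quantitative"}),
    m("Clustering tools (AHC, k-means)", {"clustering"}, 1, 1, {"quantitative"}),
    m("Statistical modelling tools", {"causality"}, 1, 1,
      {"one dependent + explanatory variables"}),
    m("Canonical correlation analysis", {"correlation", "exploratory"}, 2, 2, {"quantitative"}),
    m("Canonical correspondence analysis", {"causality", "exploratory"}, 2, 2,
      {"contingency Y + explanatory X"}),
    m("Redundancy Analysis (RDA)", {"causality"}, 2, 2, {"quantitative Y + explanatory X"}),
    m("Partial Least Squares regression (PLS)", {"causality"}, 2, 2,
      {"quantitative Y + explanatory X"}),
    m("PLS Structural Equation Modelling (PLS-PM)", {"causality"}, 2, None,
      {"latent-variable blocks"}),
]


def suggest_methods(question: str, n_tables: int, data: str) -> List[str]:
    """Return every tool whose row matches the question type, table count and data label."""
    return [
        row.tool
        for row in GRID
        if question in row.questions
        and row.min_tables <= n_tables
        and (row.max_tables is None or n_tables <= row.max_tables)
        and data in row.data
    ]


print(suggest_methods("exploratory", 2, "quantitative"))
# ['Multiple Factor Analysis (MFA)', 'Generalized Procrustes Analysis (GPA)', 'Canonical correlation analysis']
```

Returning a list rather than a single answer mirrors the grid itself: several rows can match the same situation (PCA and FA, or RDA and PLS), and the Remarks column is what separates them.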
