Which multivariate data analysis method to choose?
Choosing an appropriate multivariate data analysis technique
Here we define multivariate (or multidimensional) datasets as data tables containing more than two variables (usually stored in columns) measured on more than two statistical units (individuals, patients, sites…), which are usually stored in rows. Multivariate data analysis techniques are used to extract useful information from large datasets that are hard to read in their raw format. These tools are often referred to as data mining tools.
The table below will guide you through the choice of an appropriate data mining method according to the type of question you want to investigate with your data (exploratory or decisional) and the structure of your data. The list is non-exhaustive, but it contains the most commonly used methods, all of which are available in XLSTAT.
We divided the questions into two types:

Exploratory questions allow the investigation of multivariate datasets without any particular hypothesis to validate. Exploratory multivariate data analysis tools often involve reducing the dimensionality of large datasets, which makes data exploration more convenient.

Decisional questions involve testing the relationship between two sets of variables (correlation), or explaining a variable or a set of variables by another set of variables (causality).
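To make the idea of dimensionality reduction concrete, here is a minimal Python sketch of the first step of a Principal Component Analysis on a small table of quantitative variables (rows are individuals, columns are variables). It is an illustration under simplifying assumptions, not how XLSTAT computes a PCA: it only centres the variables (a covariance-based PCA, without standardization), extracts only the first component, and uses power iteration, a simple iterative method swapped in here for the full eigendecomposition or SVD that statistical software typically relies on. The function names and the example data are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(data: Matrix) -> Matrix:
    """Subtract each column mean so that every variable is centred on 0."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance_matrix(centered: Matrix) -> Matrix:
    """Sample covariance matrix (p x p) of already-centred variables."""
    n, p = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_eigenpair(cov: Matrix, iterations: int = 500) -> Tuple[float, List[float]]:
    """Power iteration (assumption: used here instead of a full eigendecomposition).
    Returns the largest eigenvalue and its unit-length eigenvector."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def first_principal_component(data: Matrix) -> Tuple[List[float], List[float], float]:
    """Row scores on PC1, the PC1 eigenvector coefficients, and PC1's share of total variance."""
    centered = center_columns(data)
    cov = covariance_matrix(centered)
    eigenvalue, loadings = leading_eigenpair(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace = sum of all eigenvalues
    scores = [sum(x * w for x, w in zip(row, loadings)) for row in centered]
    return scores, loadings, eigenvalue / total_variance


# Hypothetical example: 6 individuals described by 3 correlated quantitative variables.
data = [
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 1.4],
    [4.0, 7.9, 2.1],
    [5.0, 10.1, 2.4],
    [6.0, 12.0, 3.1],
    [7.0, 13.8, 3.5],
]
scores, loadings, explained = first_principal_component(data)
print("PC1 scores:", [round(s, 2) for s in scores])
print("Share of variance on PC1:", round(explained, 3))
```

In this made-up table the three variables are almost proportional, so a single component summarizes nearly all of their variance: three columns are reduced to one score per individual. When variables are measured on different scales, they are usually standardized first (a correlation-based PCA).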
| Question | Number of tables | Data description | Tool | Remarks |
|---|---|---|---|---|
| Exploratory | 1 | Quantitative variables only | Principal Component Analysis (PCA) | Considers all the variance in the data; components do not necessarily reflect real phenomena |
| Exploratory | 1 | Quantitative variables only | Factor analysis (FA) | Considers only the variance shared between variables (covariance); latent factors are meant to reflect real underlying phenomena |
| Exploratory | 1 | Proximity matrix | Multidimensional scaling (MDS) / Principal Coordinate Analysis (PCoA) | |
| Exploratory | 1 | Contingency table (2 qualitative variables) | Correspondence Analysis (CA) | |
| Exploratory | 1 | Qualitative variables only | Multiple Correspondence Analysis (MCA) | |
| Exploratory | 1 | Quantitative and qualitative variables | Factorial analysis of mixed data (PCAmix) | Unlike MFA, the dataset is not structured in groups |
| Exploratory | ≥2 | Tables of qualitative variables and/or tables of quantitative variables and/or frequency tables | Multiple Factor Analysis (MFA) | |
| Exploratory | ≥2 | Tables of quantitative variables | Generalized Procrustes Analysis (GPA) | Can include an inferential part: the consensus test |
| Exploratory (clustering) | 1 | Quantitative variables only | Clustering tools (AHC, k-means...) | Classical clustering methods can be applied indirectly to a table of qualitative variables, using the row scores on the dimensions of a Multiple Correspondence Analysis |
| Decisional (causality) | 1 | One dependent variable and several quantitative and/or qualitative explanatory variables | Statistical modelling tools (regression, ANCOVA…) | |
| Decisional (correlation) or exploratory | 2 | Two tables of quantitative variables | Canonical correlation analysis | Linear relationships between the two tables |
| Decisional (causality) or exploratory | 2 | One contingency table Y (often a site-species data matrix) and one table X of quantitative and/or qualitative explanatory variables | Canonical correspondence analysis | Unimodal relationships between X and Y; can be used to depict species niches along environmental gradients |
| Decisional (causality) | 2 | One table Y of dependent quantitative variables and one table X of quantitative and/or qualitative explanatory variables | Redundancy analysis (RDA) | Linear relationships between X and Y |
| Decisional (causality) | 2 | One table Y of dependent quantitative variables and one table X of quantitative and/or qualitative explanatory variables | Partial Least Squares regression (PLS) | Especially used for prediction |
| Decisional (causality) | ≥2 | Several tables of manifest variables, each table representing a latent variable | Partial Least Squares Structural Equation Modelling (PLS-PM) | |
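As an illustration of the clustering row of the table, the following Python sketch implements the classical k-means algorithm (alternating an assignment step and a centroid-update step) on a small table of quantitative variables. It is not XLSTAT's implementation: the deterministic farthest-point initialization, the function names and the example data are assumptions chosen to keep the sketch short and reproducible. In line with the remark in the table, the same function could be run on row scores from a Multiple Correspondence Analysis to cluster individuals described by qualitative variables.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: List[Point], k: int) -> List[List[float]]:
    """Assumed deterministic seeding: start from the first row, then repeatedly
    add the row that is farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Return a cluster label for each row and the final centroids."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no row changed cluster: the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its member rows.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical example: 6 individuals, 2 quantitative variables, 2 obvious groups.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (8.0, 8.2), (8.1, 7.9), (7.9, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because k-means only reaches a local optimum that depends on the starting centroids, practical implementations usually repeat the algorithm from several initializations and keep the best partition.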