Comparison of Supervised Machine Learning Algorithms

The following two grids compare the main Supervised Machine Learning algorithms available in XLSTAT: one covers classification tasks (qualitative Y) and the other covers regression tasks (quantitative Y). For a short introduction to the principles of Supervised Machine Learning, check out this article.

Algorithms are compared according to the following criteria:

Can the algorithm work with more variables than observations?
Does it easily adapt to non-linear relationships between the predictors and the outcome?
Can it be used for explanatory purposes, that is, to describe the relative impact of each predictor on the outcome?
Can it automatically detect and learn interactions among predictors?
What are the main hyperparameters to tune?
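The first criterion can be made concrete with a small numerical check. Ordinary least squares needs to invert the matrix X'X, which is singular whenever there are more variables (p) than observations (n); a ridge penalty adds lambda to its diagonal and makes it invertible again. The sketch below is plain Python under stated assumptions: the toy matrix X and the value lambda = 0.1 are hypothetical, and the helper functions are written for illustration only, not taken from XLSTAT.

```python
# Sketch: with more variables (p) than observations (n), X'X is singular,
# so ordinary least squares has no unique solution; adding a ridge
# penalty lambda * I makes the matrix invertible again.


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    # Plain matrix product of two matrices stored as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def det(m, tol=1e-12):
    # Determinant via Gaussian elimination with partial pivoting;
    # pivots smaller than tol are treated as zero (singular matrix).
    m = [row[:] for row in m]
    n = len(m)
    result = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < tol:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            result = -result
        result *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    return result


# Hypothetical toy data: 2 observations (rows), 3 predictors (columns).
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

xtx = matmul(transpose(X), X)  # 3 x 3, but rank at most 2
lam = 0.1  # illustrative ridge penalty, chosen arbitrarily
xtx_ridge = [[xtx[i][j] + (lam if i == j else 0.0) for j in range(3)]
             for i in range(3)]

print("det(X'X)            =", det(xtx))        # zero: no unique OLS solution
print("det(X'X + lambda*I) =", det(xtx_ridge))  # positive: ridge solution exists
```

Any lambda > 0 works here, because adding lambda to the diagonal shifts every eigenvalue of X'X (all of which are >= 0) up by lambda.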

Classification algorithms

Algorithm	Works with more variables than observations?	Adapts to non-linear situations?	Explanatory intelligibility	Automatically learns relevant interactions among predictors?	Main Hyperparameters	XLSTAT menu	Remarks
Logistic Regression	No	-	+++	No	none	Modeling data	Good option for explanatory intelligibility (provides log-odds coefficients and p-values)
Penalized regression (Ridge, Lasso, Elastic Net)	Yes	-	++	No	lambda, alpha	XLSTAT-R, glmnet	Select Binomial or Multinomial family
Linear Discriminant Analysis	No	-	+	No	none	Analyzing data / Discriminant Analysis	Activate Equality of Covariance Matrices in the Options tab
Quadratic Discriminant Analysis	No	+	+	No	none	Analyzing data / Discriminant Analysis	Deactivate Equality of Covariance Matrices in the Options tab
Partial Least Squares Discriminant Analysis (PLS-DA)	Yes	-	+	No	number of components	Modeling data	Typically used with few observations & many variables (chemometrics)
Generalized Additive Models	No	++	+	No	Method, add extra penalty	XLSTAT-R, gam
Naive Bayes	Yes	-	-	No	Smoothing parameter	Machine Learning	Fast computations on large data sets
Support Vector Machines (SVM)	Yes	++ (RBF kernel recommended for non-linear situations)	-	No	C, kernel and kernel-specific hyperparameters	Machine Learning	Computationally intensive on large data sets
K Nearest Neighbors (KNN)	Yes	++	-	No	Number of neighbors	Machine Learning
Classification trees (C&RT)	Yes	++	++	Yes	CP	Machine Learning	Binary splits at each node
Classification trees (CHAID)	Yes	++	++	Yes	CP	Machine Learning	Multiple splits at each node
Classification Random Forests	Yes	++	+	Yes	CP, mtry	Machine Learning	Better predictive performance compared to classification trees
Neural networks	Yes	++	-	Yes	Network architecture, error function, activation functions	XLSTAT-R, neuralnet	Requires advanced expertise
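To illustrate why KNN is rated "++" for non-linear situations and what its "Number of neighbors" hyperparameter controls, here is a minimal pure-Python sketch of KNN classification by majority vote. It assumes Euclidean distance and uses hypothetical XOR-like toy data that no single straight line can separate; it is not XLSTAT's implementation, and in practice predictors would usually be standardized first.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the class of point x by majority vote among the k training
    points closest to it (Euclidean distance)."""
    # Sort training points by distance to x and keep the k nearest.
    neighbors = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical XOR-like toy data: class "A" near (0, 0) and (1, 1),
# class "B" near (0, 1) and (1, 0). No single straight line separates them.
train_X = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 0.9),
           (0.0, 1.0), (0.1, 0.9), (1.0, 0.0), (0.9, 0.1)]
train_y = ["A", "A", "A", "A", "B", "B", "B", "B"]

k = 3  # the "number of neighbors" hyperparameter; odd k avoids two-class ties
for query in [(0.05, 0.05), (0.95, 0.05), (0.95, 0.95), (0.05, 0.95)]:
    print(query, "->", knn_predict(train_X, train_y, query, k))
```

A small k follows local structure closely but is sensitive to noise; a larger k gives smoother class boundaries. This trade-off is why the number of neighbors is the main value to tune.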

Regression algorithms

Algorithm	Works with more variables than observations?	Adapts to non-linear situations?	Explanatory intelligibility	Automatically learns relevant interactions among predictors?	Main Hyperparameters in XLSTAT	XLSTAT menu	Remarks
Linear regression	No	-	+++	No	none	Modeling data	Good option for explanatory intelligibility (slope coefficients and p-values)
Penalized regression (Ridge, Lasso, Elastic Net)	Yes	-	++	No	lambda, alpha	XLSTAT-R, glmnet	Select Gaussian family
Quantile Regression	Yes	-	+	No	none	Modeling data
Generalized Additive Models	No	++	+	No	Method, add extra penalty	XLSTAT-R, gam
Partial Least Squares (PLS)	Yes	-	+	No	number of components	Modeling data	Typically used with few observations & many variables (chemometrics)
Principal Component Regression (PCR)	Yes	-	+	No	Standardize variables	Modeling data
K Nearest Neighbors (KNN)	Yes	++	-	No	number of neighbors	Machine Learning
Regression trees (C&RT)	Yes	++	++	Yes	Minimum parent size, minimum son size, maximum depth, CP	Machine Learning	Binary splits at each node
Regression trees (CHAID)	Yes	++	++	Yes	Minimum parent size, minimum son size, maximum depth, CP	Machine Learning	Multiple splits at each node
Random Forests	Yes	++	+	Yes	CP, mtry	Machine Learning	Better predictive performance compared to regression trees
Neural Network	Yes	++	-	Yes	Network architecture, error function, activation functions	XLSTAT-R, neuralnet	Requires advanced expertise
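The "Binary splits at each node" remark and the "minimum son size" hyperparameter of C&RT regression trees can be illustrated with a simplified sketch of how one split is chosen: try every threshold between consecutive sorted values of a quantitative predictor and keep the one minimizing the summed squared error of the two child nodes. This is a sketch under assumptions (one predictor, one split, no CP pruning, hypothetical data), not XLSTAT's algorithm; the names `best_split` and `min_son_size` are illustrative.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_son_size=1):
    """Best binary split of quantitative y on one quantitative predictor x.

    Tries every threshold halfway between consecutive distinct sorted x
    values and keeps the one minimizing SSE(left) + SSE(right). Each child
    node must contain at least min_son_size observations.
    Returns (threshold, total_sse), or None if no admissible split exists.
    """
    if min_son_size < 1:
        raise ValueError("min_son_size must be at least 1")
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = None
    for i in range(min_son_size, n - min_son_size + 1):
        # Observations with equal x values cannot be separated.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best


# Hypothetical data with a step (non-linear) relationship between x and y.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]

print(best_split(x, y))                  # threshold 3.5 separates the two plateaus
print(best_split(x, y, min_son_size=4))  # None: 6 obs cannot fill two nodes of 4
```

A complete tree repeats this search over every predictor inside each child node. Splitting on different predictors along a branch is what lets trees capture interactions, and the CP (complexity parameter) typically discards splits whose improvement in fit is too small.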
