A guide to choosing a descriptive statistics tool for your situation
Describing data is an essential part of statistical analysis: it aims to provide a complete picture of the data before moving on to more advanced methods. The statistical methods used for this purpose are called descriptive statistics. They include both numerical tools (e.g. mean, mode, variance) and graphical tools (e.g. histogram, box plot), which allow us to summarize a data set and extract important information such as central tendency and dispersion. We can also use them to describe the association between several variables.
To choose the right descriptive statistics tool, we need to consider the type and the number of variables we have, as well as the objective of the analysis. Based on these three criteria, we have built a grid that will help you decide which tool to use in your situation.
The first column of the grid refers to data types:
- Quantitative: containing variables that describe quantities of the objects of interest. The values are numbers. The weight of an infant is an example of a quantitative variable.
- Qualitative: containing variables that describe qualities of the objects of interest. The values are called categories, also referred to as levels or modalities. The gender of an infant is an example of a qualitative variable; its possible values are the categories male and female.
- Mixed: containing both types of variables.
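The distinction between data types determines which summaries make sense. As a rough illustration outside XLSTAT, the sketch below uses Python's standard library on made-up values (the infant weights and genders are hypothetical, chosen only to mirror the examples above): arithmetic summaries apply to the quantitative variable, while the qualitative variable is summarized by counting categories.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample: infant weights in kg (quantitative)
# and infant gender (qualitative).
weights_kg = [3.2, 3.5, 2.9, 4.1, 3.3]
genders = ["female", "male", "female", "female", "male"]

# Quantitative variable: arithmetic on the values is meaningful.
weight_summary = {"mean": mean(weights_kg), "median": median(weights_kg)}

# Qualitative variable: count how often each category occurs.
gender_counts = Counter(genders)
most_frequent_category, _ = gender_counts.most_common(1)[0]

print(weight_summary)
print(dict(gender_counts), most_frequent_category)
```

Computing a mean of the gender labels would be meaningless, which is why the grid proposes different tools for each data type.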
The second column indicates the number of variables. The proposed tools describe either a single variable (univariate analysis) or the relationships between two variables (bivariate analysis) or several variables. The grid also includes a column with an example for each situation.
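To make the univariate/bivariate distinction concrete, here is a minimal Python sketch, not an XLSTAT feature. The soil lead and plant biomass values are invented for illustration, and `pearson_r` is a hypothetical helper written here from the textbook definition of the Pearson correlation coefficient.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical measurements on five plots.
soil_pb = [10, 20, 30, 40, 50]          # soil lead content
biomass = [9.0, 7.5, 6.0, 4.5, 3.0]     # plant biomass

# Univariate analysis: describe one variable on its own.
biomass_mean = mean(biomass)

# Bivariate analysis: describe how two variables vary together.
r = pearson_r(soil_pb, biomass)

print(biomass_mean, r)
```

In this made-up data set biomass decreases linearly as lead content rises, so the coefficient is at the negative end of its range; real data would rarely be this clean.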
Please note that the list below is not exhaustive. However, it contains the most commonly used descriptive statistics, all available in XLSTAT.
|Data type||Number of variables||Objective||Example||Numerical tool||Graphical tool|
|Quantitative||One variable (univariate analysis)||Estimate a frequency distribution||How many people per age class attended this event? (Here the investigated variable is age in quantitative form.)||Frequency table||Histogram|
|Quantitative||One variable (univariate analysis)||Measure the central tendency of one sample||What is the average grade in a classroom?||Mean, median, mode||Box plot|
|Quantitative||One variable (univariate analysis)||Measure the dispersion of one sample||How widely or narrowly are the grades dispersed around the mean grade in a classroom?||Range, standard deviation, variance, coefficient of variation, quartiles||Box plot|
|Quantitative||One variable (univariate analysis)||Characterize the shape of a distribution||Is the employee wage distribution in a company symmetric?||Skewness and kurtosis coefficients||Histogram|
|Quantitative||One variable (univariate analysis)||Visually check whether a sample follows a given distribution||What is the theoretical percentage of students who obtained a grade above a given threshold?||||Probability plot|
|Quantitative||One variable (univariate analysis)||Measure the position of a value within a sample||What data point can be used to split the sample into 95% of low values and 5% of high values?||Quantiles or percentiles||Box plot|
|Quantitative||One variable (univariate analysis)||Detect extreme values||Is a height of 184 cm an extreme value in this group of students?||||Box plot|
|Quantitative||Two variables (bivariate analysis)||Describe the association between two variables||Does plant biomass increase or decrease with soil Pb content?||Correlation coefficients|||
|Quantitative||Several variables||Describe the association between multiple variables||How have life expectancy, fertility rate and population size evolved over the last 10 years in this country?||Correlation coefficients||(up to 3 variables to describe over time); (up to 3 variables to describe)|
|Quantitative||Several variables||Describe the association between three variables under specific conditions||How can the proportions of three ice cream ingredients be visualized across several ice cream samples?||||Ternary diagram|
|Quantitative||Two matrices of several variables||Describe the association between two matrices||Does the evaluation of a series of products differ from one panel to another?||RV coefficient|||
|Qualitative||One variable (univariate analysis)||Compute the frequencies of different categories||How many clients said they were satisfied with the service and how many said they were not?||Frequency table|||
|Qualitative||One variable (univariate analysis)||Detect the most frequent category||Which is the most frequent hair color in this country?||Mode||Bar chart|
|Qualitative||Two variables (bivariate analysis)||Measure the association between two variables||Does the presence of a trace element change according to the presence of another trace element?||Contingency table (or cross-tab)||3D graph of contingency table; stacked or clustered bars|
|Mixed (quantitative & qualitative)||||Describe the relationship between a binary and a continuous variable||Is the concentration of a molecule in rats linked to the rats' sex (M/F)?||Biserial correlation||Box plot|
|Mixed (quantitative & qualitative)||||Describe the relationship between a categorical and a continuous variable||Does sepal length differ between three flower species?||Univariate descriptive statistics for the quantitative variable within each category of the qualitative variable||Box plot|
|Mixed (quantitative & qualitative)||||Describe the relationship between one categorical and two quantitative variables||Does the amount of money spent on this commercial website change according to the age class and the salary of the customers?||||Scatter plot|
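Two rows of the grid lend themselves to a short worked example: descriptive statistics within each category (mixed data) and the detection of extreme values. The Python sketch below is illustrative only. The sepal lengths and student heights are invented, `iqr_outliers` is a hypothetical helper, and the 1.5 × IQR fence it applies is one common box plot convention, assumed here rather than taken from XLSTAT; other software may compute quartiles or fences differently.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev

# Hypothetical sepal lengths (cm) for three flower species.
observations = [
    ("setosa", 5.1), ("setosa", 4.9), ("setosa", 5.0),
    ("versicolor", 6.4), ("versicolor", 5.9), ("versicolor", 6.0),
    ("virginica", 6.3), ("virginica", 7.1), ("virginica", 6.5),
]

# Mixed data: group the quantitative values by category.
groups = defaultdict(list)
for species, length in observations:
    groups[species].append(length)

# Univariate descriptive statistics within each category.
summary = {
    species: {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }
    for species, values in groups.items()
}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical student heights in cm.
heights_cm = [160, 162, 165, 167, 168, 170, 171, 173, 184]

for species, stats in summary.items():
    print(species, stats)
print("extreme values:", iqr_outliers(heights_cm))
```

With these made-up heights the quartiles are 165 and 171, so the upper fence sits at 180 and the 184 cm student is flagged. Whether such a value is a genuine anomaly remains a judgment call that the rule alone cannot settle.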