Which descriptive statistics tool should you choose?
This article will help you choose the right descriptive statistics tool for your data. Each tool is available in Excel using the XLSTAT software.
The purpose of descriptive statistics
Describing data is an essential part of statistical analysis aiming to provide a complete picture of the data before moving to exploratory analysis or predictive modeling. The type of statistical methods used for this purpose are called descriptive statistics. They include both numerical (e.g. central tendency measures such as mean, mode, median or measures of variability) and graphical tools (e.g. histogram, box plot, scatter plot…) which give a summary of the dataset and extract important information such as central tendencies and variability. Moreover, we can use descriptive statistics to explore the association between two or several variables (bivariate or multivariate analysis).
For example, let’s say we have a data table which represents the results of a survey on the amount of money people spend on online shopping on a monthly average basis. Rows correspond to respondents and columns to the amount of money spent as well as the age group they belong to. Our goal is to extract important information from the survey and detect potential differences between the age groups. For this, we can simply summarize the results per group using common descriptive statistics, such as:
The mean and the median, that reflect the central tendency.
The standard deviation, the variance, and the variation coefficient, that reflect the dispersion.
In another example, using qualitative data, we consider a survey on commuting. Rows correspond to respondents and columns to the mode of transportation as well as to the city they live in. Our goal is to describe transportation preferences when commuting to work per city using: - The mode, reflecting the most frequent mode of commuting (the most frequent category).
The frequencies, reflecting how many times each mode of commuting appears as an answer.
The relative frequencies (percentages), which is the frequency divided by the total number of answers.
Bar charts and stacked bars, that graphically illustrate the relative frequencies by category.
A guide to choose a descriptive statistics tool according to the situation
In order to choose the right descriptive statistics tool, we need to consider the types and the number of variables we have as well as the objective of the study. Based on these three criteria we have generated a grid that will help you decide which tool to use according to your situation.
The first column of the grid refers to data types:
Quantitative dataset: containing variables that describe quantities of the objects of interest. The values are numbers. The weight of an infant is an example of a quantitative variable.
Qualitative dataset: containing variables that describe qualities of the objects of interest (categorical or nominal data). These values are called categories, also referred as levels or modalities. The gender of an infant is an example of a qualitative variable. The possible values are the categories male and female. Qualitative variables are referred as nominal or categorical.
Mixed dataset: containing both types of variables.
The second column indicates the number of variables. The proposed tools can handle either the description of one (univariate analysis) or the description of the relationships between two (bivariate analysis) or several variables. The grid provides intuitive example for each situation as well as a link of a tutorial explaining how to apply each XLSTAT tool using a demo file.
Descriptive Statistics grid
Please note that the list below is not exhaustive. However, it contains the most commonly used descriptive statistics, all available in Excel using the XLSTAT add-on.
|Data description||Objective of the analysis||Example||Numerical tool||Graphical tool|
|Quantitative||One variable (univariate analysis)||Estimate a frequency distribution||How many people per age class attended this event? (here the investigated variable is age in a quantitative form)||Frequency table|
|Measure the central tendency of one sample||What is the average grade in a classroom?||Mean, median, mode||Box plot Scattergram Strip plot|
|Measure the dispersion of one sample||How widely or narrowly are the grade scores dispersed around the mean score in a classroom?||Range, standard deviation, variance, coefficient of variation , quartiles||Box plot Scattergram Strip plot|
|Characterize the shape of a distribution||Is the employee wage distribution in a company symmetric?||Skewness and kurtosis coefficients||Histogram|
|Visually control wether a sample follows a given distribution||What is the theorical percentage of students who obtained a better note than a given threshold||Probability plot|
|Measure the position of a value within a sample||What data point can be used to split the sample into 95% of low values and 5% of high values?||Quantiles or Percentiles||Box plot|
|Detect extreme values||Is the height of 184cm an extreme value in this group of students?||Box plot|
|Two variables (bivariate analysis)||Describe the association between two variables||Does plant biomass increase or decrease with soil Pb content?||Correlation coefficients||Correlation Map Scatterplot|
|Several variables||Describe the association between multiple variables||What is the evolution of the life expectancy, the fertility rate and the size of population over the last 10 years in this country?||Correlation coefficients||Motion charts (up to 3 variables to describe over time) Scatterplot or 3D Scatterplot (up to 3 variables to describe)|
|Describe the association between three variables under specific conditions||How to visualize the proportions of three ice cream ingredients in several ice scream samples?||Ternary diagram|
|Two matrices of several variables||Describe the association between two matrices||Does the evaluation of a series of products differ from a panel to another?||RV coefficient|
|Qualitative||One variable (univariate analysis)||Compute the frequencies of different categories||How many clients said they are satisfied by the service and how many said they were not?||Frequency table|
|Detect the most frequent category||Which is the most frequent hair color in this country?||Mode||Bar chart Pie chart|
|Two variables (bivariate analysis)||Measure the association between two variables||Does the presence of a trace element change according to the presence of another trace element?||Cross-tabulations and Contingency tables||3D graph of contingency table Stacked or clustered bars|
|Mixed (quantitative & qualitative)||Two variables (bivariate analysis)||Describe the relationship between a binary and a continuous variable||Is the concentration of a molecule in rats linked to the rats' sex (M/F)?||Biserial correlation|
|Describe the relationship between a categorical and a continuous variable||Does sepal length differ between three flower species?||Univariate descriptive statistics for the quantitative variable within each category of the qualitative variable||Boxplot|
|Several variables (multivariate analysis)||Describe the relationship between one categorical and two quantitative variables||Does the amount of money spent on this commercial website change according to the age class and the salary of the customers?||Scatterplot (with groups)|
How to run descriptive statistics in XLSTAT?
In XLSTAT, you will find a large variety of descriptive statistics tools in the Describing data menu. The most popular feature is Descriptive Statistics. All you have to do is select your data on the Excel sheet, then set up the dialog box and click OK. It's simple and quick. If you do not have XLSTAT, download for free our 14-Day version.
Many outputs are available for both quantitative and qualitative data. Check the one you wish to display in your output sheet.
Outputs for quantitative data
Statistics: Min./max. value, 1st quartile, median, 3rd quartile, range, sum, mean, geometric mean, harmonic mean, kurtosis (Pearson), skewness (Pearson), kurtosis, skewness, CV (standard deviation/mean), sample variance, estimated variance, standard deviation of a sample, estimated standard deviation, mean absolute deviation, standard deviation of the mean.
Graphs: box plots, scattergrams, strip plots, Q-Q plots, p-p plots, stem and leaf plots. It is possible group together the various box plots, scattergrams and strip plots on the same chart, sort them by mean and color by group to compare them.
Outputs for qualitative data
Statistics: No. of categories, mode, mode frequency, mode weight, % mode, relative frequency of the mode, frequency, weight of the category, percentage of the category, relative frequency of the category
Graphs: Bar charts, pie charts, double pie charts, doughnuts, stacked bars, multiple bars
XLSTAT has developed a series of statistics tutorials that will provide you with a theorical background on inferential statistical, data modeling, clustering, multivariate data analysis and more. These guides will also help you in choosing an appropriate statistical method to investigate the question you are asking.
Source: Introductory Statistics: Exploring the World Through Data: Robert Gould and Collen Ryan**
Was this article useful?