Your data analysis solution

Mean, median, standard deviation & more in Excel

2017-05-24
This tutorial shows how to compute and interpret the mean, the median, the standard deviation and other descriptive statistics for quantitative data in Excel using the XLSTAT software.        

Dataset for describing quantitative data

An Excel sheet with both the data and the results can be downloaded by clicking on the button below:

Download the data

The data represent the results of a survey on the amount of money people spend in online shopping on a monthly average basis. Rows correspond to respondents and columns to the amount of money spent as well as the age group they belong to.

Our goal here is to summarize the results per age group using common descriptive statistics, such as:
  1. The mean and the median, that reflect the central tendency
  2. The standard deviation, the variance and the variation coefficient, that reflect the dispersion.
This will allow us to extract important information from the survey and detect potential differences between the groups.

Setting up the dialog box for descriptive statistics

1. Once XLSTAT is open, select the XLSTAT / Describing data / Descriptive statistics command as shown below.
XLSTAT function for computing descriptive statistics

2. The Descriptive Statistics dialog box appears.

XLSTAT dialog box for descriptive statistics (general tab)
3. In the General tab, select the column corresponding to the money spend on online shopping in the Quantitative data field.

Then select the column corresponding to the age class in the Subsamples field.

We also want to display Variable-Category labels in the output. These include the variable name as a prefix and the category name as a suffix.  

Finally, select the Sheet option in order to display the results on a new sheet and the Sample labels to consider the first row of the data table as labels.

4. In the Options tab, activate the following options:

XLSTAT dialog box for descriptive statistics (options tab)

In this tutorial, we focus on descriptive statistics so we do not activate Charts.

Since we are only dealing with one single variable we do not choose the Normalize or Rescale option. That is more interesting in the case of several variables measured on different scales.

5. In the Outputs tab, click the All button to select all the statistics for quantitative data. 

XLSTAT dialog box for descriptive statistics (outputs tab)

Interpreting descriptive statistics for quantitative data

The results are displayed on the new sheet named Desc. A full set of descriptive statistics is displayed for all respondents (column B) as well as per age class (columns C-E).

The most commonly used basic statistics are the measures of the central tendency (e.g. mean, median) that give information around the center of the data, and the measures of dispersion (e.g. standard deviation, range, variance, variation coefficient), that describe data variability.

Definitions and formulas can be found in the XLSTAT Help menu (click on Help button in the dialog box). 

Descriptive statistics output in XLSTAT

a. Interpreting the central tendency

The mean shows that respondents globally spend 156 USD on average per month in online shopping.

The median value is 139 USD. This means that half of the respondents spend more than 139 USD and the other half less than 139 USD. The data is likely to be skewed to the right since the median is slightly lower than the mean value.

Looking across the three age groups, we observe that consumers aged between 30 and 45 spend on average more than the two other groups (mean and median). One interpretation could be that people of this age category have more money to spend than younger categories while they are keener on technology than the older classes.

b. Interpreting the dispersion

Although young consumers seem to spend more on average than the oldest ones (>45), they have a higher sample standard deviation (SD = 62). This means there is a wider range of amounts spent by category (30-45) compared to category (>45). Similarly, we may say that older people tend to spend similar amounts of money (low SD). Different explanations can be given. For example, we might say that young buyers include more diverse socio-professional categories (students, professionals, unemployed) than older ones.

These conclusions can be also drawn based on the variance. The benefit of using the SD instead of the variance is that SD uses the units of the original scale.

Another measure of spread is the variation coefficient. This measure is particularly interesting to use when comparing samples with different units because it is unitless. The higher the variation coefficient, the greater the level of dispersion around the average.

Standard deviation (n) or standard deviation (n-1)?

In this tutorial, we are using the 150 respondents as a sample in order to study the whole population. This explains our choice to use the sample SD [Standard Deviation (n-1)]  for the interpretation. In the ideal but impossible case where we could collect data for all people who shop online (i.e. online shoppers population), the population SD [Standard Deviation (n)] or population variance [variance (n)] would be more appropriate. 

What’s next: Describing a quantitative variable with box plots

Looking for a quick and easy way to graphically describe quantitative variables? Box plots are the way to go. This tool provides five basic statistics (minimum, first quartile, median, third quartile, and maximum) at a glance, either for the overall data either by group. Check out our tutorials on box plots and notched box plots here.

The following video talks about quantitative descriptive statistics, with illustration using Excel and XLSTAT
 

1c26995d494fb3061dd0ae8571ffc0a4@xlstat.desk-mail.com
https://cdn.desk.com/
false
desk
Loading
seconds ago
a minute ago
minutes ago
an hour ago
hours ago
a day ago
days ago
about
false
Invalid characters found
/customer/portal/articles/autocomplete
9283