## Dataset for describing quantitative data

An Excel sheet with both the data and the results can be downloaded by clicking on the button below:

Download the data

Our goal here is to summarize the results per age group using common descriptive statistics, such as:

- The
**mean**and the**median**, that reflect the central tendency - The
**standard deviation**, the**variance**and the**variation coefficient**, that reflect the dispersion.

## Setting up the dialog box for descriptive statistics

1. Once XLSTAT is open, select the XLSTAT / **Describing data / Descriptive statistics** **command as shown below.
**

2. The

**Descriptive Statistics**dialog box appears.

3. In the

**General tab**, select the column corresponding to the money spend on online shopping in the

**Quantitative data**field.

Then select the column corresponding to the age class in the Subsamples field.

We also want to display **Variable-Category labels** in the output. These include the variable name as a prefix and the category name as a suffix.

Finally, select the **Sheet **option in order to display the results on a new sheet and the **Sample labels** to consider the first row of the data table as labels.

4. In the **Options** tab, activate the following options:

In this tutorial, we focus on descriptive statistics so we do not activate Charts.

Since we are only dealing with one single variable we do not choose the** ****Normalize or Rescale** option. That is more interesting in the case of several variables measured on different scales.

5. In the **Outputs **tab, click the **All**** **button to select all the statistics for quantitative data.

## Interpreting descriptive statistics for quantitative data

The results are displayed on the new sheet named **Desc.** A full set of descriptive statistics is displayed for all respondents (column B) as well as per age class (columns C-E).

The most commonly used basic statistics are the measures of the central tendency (e.g. mean, median) that give information around the center of the data, and the measures of dispersion (e.g. standard deviation, range, variance, variation coefficient), that describe data variability.

Definitions and formulas can be found in the XLSTAT **Help menu** (click on Help button in the dialog box).

### a. Interpreting the central tendency

The **mean** shows that respondents globally spend 156 USD on** average** per month in online shopping.

The **median** value is 139 USD. This means that half of the respondents spend more than 139 USD and the other half less than 139 USD. The data is likely to be skewed to the right since the median is slightly lower than the mean value.

Looking across the three age groups, we observe that consumers aged between 30 and 45 spend on average more than the two other groups (mean and median). One interpretation could be that people of this age category have more money to spend than younger categories while they are keener on technology than the older classes.

### b. Interpreting the dispersion

Although young consumers seem to spend more on average than the oldest ones (>45), they have a higher **sample standard deviation** (SD = 62). This means there is a wider range of amounts spent by category (30-45) compared to category (>45). Similarly, we may say that older people tend to spend similar amounts of money (low SD). Different explanations can be given. For example, we might say that young buyers include more diverse socio-professional categories (students, professionals, unemployed) than older ones.

These conclusions can be also drawn based on the variance. The benefit of using the SD instead of the variance is that SD uses the units of the original scale.

Another measure of spread is the variation coefficient. This measure is particularly interesting to use when comparing samples with different units because it is unitless. The higher the **variation coefficient**, the greater the level of dispersion around the mean.

### Standard deviation (n) or standard deviation (n-1)?

In this tutorial, we are using the 150 respondents as a sample in order to study the whole population. This explains our choice to use the **sample SD [Standard Deviation (n-1)]** ** **for the interpretation. In the ideal but impossible case where we could collect data for all people who shop online (i.e. online shoppers population), the **population SD [Standard Deviation (n)]** or **population variance [variance (n)]** would be more appropriate.

## What’s next: Describing a quantitative variable with box plots

Looking for a quick and easy way to graphically describe quantitative variables? Box plots are the way to go. This tool provides five basic statistics (minimum, first quartile, median, third quartile, and maximum) at a glance, either for the overall data either by group. Check out our tutorials on** box plots **and **notched box plots** **here**.