Your data analysis solution

Variables characterization tutorial in Excel

2016-05-06

This tutorial shows how to quickly characterize variables using other variables in Excel using the XLSTAT software.

Dataset for running a variables characterization

An Excel sheet with both the data and results used in this tutorial can be downloaded by clicking here.

The data have been collected for the medical supervision of people from Angers (France). More accurately, doctors of the teaching hospital have studied the snoring condition among 100 patients between 2000 and 2002.

Goal of this tutorial on variables characterization

Using variables characterization, the goal is to study the possible links between the various variables at our disposal, namely: Age, Weight, Size, Gender, Tobacco and Alcohol consumption and of course Snoring.

In this tutorial, two introductory studies are conducted, focusing first on the dependance between the two categorical variables Gender and Tobacco before evaluating the correlation between the continuous variables Size and Weight.

Next, leaving this classical framework, our interest will be turned on the characterization of the  category  1 of the variable Snoring, first with the categories of the variables Gender and Tobacco and then with the continuous variables Age, Weight, Alcohol and Size.

In the latest study, we want to discriminate snoring people with a set of descriptors. An easy way to achieve it is to compare the parameter values of the variables in the whole population (the initial sample) and in the subpopulation (the group with the category). When the variable is quantitative, means are compared whereas when it’s qualitative, proportions are.

Setting up the variable characterization

After opening  XLSTAT, select the XLSTAT / Describing data / Variables characterization (see below).

 menu

Once you’ve clicked the button, the Variables characterization dialog box appears.

Then you can select the data on the Excel sheet.

First study: Dependence between  Tobacco consumption and Gender

The Element to characterize corresponds to the categorical variable Tobacco.The Characterizing Element corresponds to the categorical variable Gender.

As we selected the column title for the variables, we leave the option Variable labels activated. The other options have been left at their default value.

 dialog box

The computations begin once you have clicked on OK. The results will then be displayed.

Interpreting the results of this first study

Here are the  results displayed in XLSTAT :

 results 1

The p-value is 0,004 (<=0,05). Thus, the null hypothesis of independence can be rejected and a significant dependence  between Tobacco consumption and Gender exists in this sample.

Second study: Correlation between the continuous variables Size and Weight

In this second part, the relation between Size and Weight in the sample is briefly investigated.

The Element to characterize corresponds to the categorical variable Weight.

The Characterizing Element corresponds to the categorical variable Size.

As we selected the column title for the variables, we leave the option Variable labels activated. The other options have been left at their default value.

Interpreting the results of this second study 

XLSTAT displays the following results :

 results 1

The p-value is inferior to 0,0001 (correlation coefficient is equal to 0,927). Thus a strong correlation exists between the Weight and the Size in this sample.

Third study: Characterization of the category Snoring=1 with the categories of the variables Gender and Tobacco and then with the continuous variables Weight, Size, Age and Alcohol.

This final step consists in identifying the best categories and continuous variables to  characterize people who snore.

The Element to characterize corresponds to the category: Snoring=1.

The characterizing elements first correspond to the categories: Gender=0, Gender=1, Tobacco=0, Tobacco=1 and then to the continuous variables : Weight, Size, Age and Alcohol.

As we selected the column title for the variables, we leave the option Variable labels activated. The other options have been left at their default value.

Interpreting the results of the third study 

The first table corresponds to the characterizing categories :

 results 1

The best category to characterize people who snore is the male condition (Gender=0). Paradoxically, in this sample and among the people who snore, the  non-smokers are over-represented.

The second table corresponds to the characterizing continuous variables:

 results 1

Older people (than the mean in the sample) and people who drink more alcohol (than the mean in the sample) are overexposed to the snoring condition.

Conclusion for the Variables characterization

In this sample, the typical snorer is a 56 years old male who drinks about 4 drinks a day. 

1c26995d494fb3061dd0ae8571ffc0a4@xlstat.desk-mail.com
https://cdn.desk.com/
false
desk
Loading
seconds ago
a minute ago
minutes ago
an hour ago
hours ago
a day ago
days ago
about
false
Invalid characters found
/customer/portal/articles/autocomplete
9283