This tutorial shows how to easily draw scatter plots with confidence ellipses in Excel using the XLSTAT software.
Dataset for creating scatter plots with confidence ellipses in Excel
In this tutorial we use a data table where rows correspond to customers of a commercial website and columns include the number of connections of each customer to the website’s Facebook page, the money they spent on the website, as well as the age class to which they belong (15 – 30; 30 – 45; > 45).
An Excel sheet with both the data and the results can be downloaded by clicking here.
Goal of this tutorial
The aim of this tutorial is to use the XLSTAT scatter plot function with the 95% confidence ellipse option to explore customer profiles of a commercial website.
Setting up a scatter plot with confidence ellipses in XLSTAT
Click on the XLSTAT menu / Visualizing data / Scatter plots.
In the General tab, we will assign the Nb of Facebook connections variable to the scatter plots’ X axis and the money spent to the Y axis. Select the data accordingly. Furthermore, we will ask XLSTAT to color the scatter plot’s points according to the age class variable. We will thus activate the Groups option and select the age class column in the corresponding field.
In the Options tab, make sure you select the Legend and Confidence ellipses options.
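For readers curious about what such an ellipse represents, here is a minimal sketch of one common construction: a 95% ellipse built from the mean and covariance of the points, assuming the data are roughly bivariate normal. The tutorial does not state how XLSTAT computes its ellipses, so this is an illustration and not XLSTAT's method. The function names (covariance_2d, confidence_ellipse, ellipse_points) are hypothetical, and only the Python standard library is used.

```python
import math

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
# With 2 degrees of freedom the quantile has the closed form -2 * ln(1 - p).
CHI2_95_2DF = -2.0 * math.log(1.0 - 0.95)  # about 5.99


def covariance_2d(xs, ys):
    """Return the mean point and the sample covariance terms (sxx, sxy, syy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def confidence_ellipse(xs, ys, scale=CHI2_95_2DF):
    """Return center, (major, minor) semi-axis lengths and rotation angle in radians.

    Assumption: a normal-theory data ellipse, not necessarily what XLSTAT draws.
    """
    center, (sxx, sxy, syy) = covariance_2d(xs, ys)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + radius
    lam_minor = max(half_trace - radius, 0.0)
    # Direction of the major axis, measured from the X axis
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    semi_axes = (math.sqrt(scale * lam_major), math.sqrt(scale * lam_minor))
    return center, semi_axes, angle


def ellipse_points(center, semi_axes, angle, n=100):
    """Return n (x, y) points on the ellipse outline, ready to be plotted."""
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        px = semi_axes[0] * math.cos(t)
        py = semi_axes[1] * math.sin(t)
        points.append((
            center[0] + px * math.cos(angle) - py * math.sin(angle),
            center[1] + px * math.sin(angle) + py * math.cos(angle),
        ))
    return points
```

Calling confidence_ellipse once per age class, with the Facebook connections as xs and the money spent as ys, would give one ellipse per group under these assumptions.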
Interpreting a scatter plot with confidence ellipses in XLSTAT
Globally, we see a positive relationship between the number of Facebook connections and the money spent on the website. This conclusion is quite trivial.
Things become more interesting when we compare age classes with each other.
- Customers older than 45 (violet points at the bottom-left corner of the chart) seem to connect less to Facebook and to spend less money than the two other age classes. The ellipse associated with this age class does not overlap with the two other ellipses. We may say that this group is relatively different from the other two with regard to the money spent and the number of Facebook connections.
- Customers aged 30 – 45 (green points) are those who spend the most money. They also connect to Facebook far more than customers aged > 45, but a bit less than younger customers.
- The youngest class (15 – 30; blue points) is characterized by the highest numbers of Facebook connections and relatively high amounts of money spent (although less than the intermediate class). As their confidence ellipses overlap, the young and intermediate age classes are relatively similar to each other.
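The comparison of ellipses above is visual. A rough numeric counterpart, sketched here under the assumption that each ellipse is a 95% normal-theory ellipse based on the group's mean and covariance (the tutorial does not specify XLSTAT's computation), is to check whether the centroid of one group falls inside the ellipse of another using the squared Mahalanobis distance. This is not a formal test of overlap, and the function names are hypothetical.

```python
import math

# 95% quantile of the chi-square distribution with 2 degrees of freedom
CHI2_95_2DF = -2.0 * math.log(1.0 - 0.95)


def mean_and_cov(points):
    """Return the centroid and sample covariance terms (sxx, sxy, syy) of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance of a point from a center, for a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be non-zero (points not all on one line)
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def centroid_inside_ellipse(group_a, group_b, scale=CHI2_95_2DF):
    """True if the centroid of group_a lies inside the assumed 95% ellipse of group_b."""
    center_a, _ = mean_and_cov(group_a)
    center_b, cov_b = mean_and_cov(group_b)
    return squared_mahalanobis(center_a, center_b, cov_b) <= scale
```

With the tutorial's data, group_a and group_b would be the (connections, money spent) pairs of two age classes. A result of False in both directions is in line with clearly separated ellipses, although overlapping ellipses do not guarantee True.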
Many interpretations may be proposed. For example, we may say that the youngest customers love spending time on Facebook but do not have as much money to spend. Customers belonging to the oldest class are less keen on social networks and prefer spending their money in “real” shops. Customers in the intermediate age class have more money to spend than younger people and grew up with internet technology, which may explain the fact that they connect to Facebook almost as often as the youngest customers.
Going further: Increasing the number of dimensions with PCA
Imagine performing this kind of exploratory analysis with a far higher number of variables. Principal Component Analysis (PCA) is a very popular tool that reduces the dimensionality of your data table so that you can interpret patterns on 2-dimensional graphs.
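To give an idea of what this reduction involves, here is a minimal sketch of PCA on the covariance matrix. It extracts the first two components by power iteration with deflation and projects each row onto them. It is an illustration using only the Python standard library, not XLSTAT's implementation, and the function names are hypothetical. In practice variables are often standardized first, and a numerical library would normally handle the eigendecomposition.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so that every variable is centered on zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance_matrix(centered):
    """Sample covariance matrix (p x p) of already centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(matrix, v):
    """Multiply a square matrix by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the largest eigenvalue and its eigenvector.

    Assumes the fixed starting vector is not orthogonal to that eigenvector.
    """
    v = [1.0 / (j + 1) for j in range(len(matrix))]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


def pca_2d(rows):
    """Return the first two principal axes and the 2-D scores of each row."""
    centered = center_columns(rows)
    cov = covariance_matrix(centered)
    p = len(cov)
    components = []
    for _ in range(2):
        lam, v = leading_eigenpair(cov)
        components.append(v)
        # Deflation: remove the variance already explained by this component
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(r[j] * c[j] for j in range(p)) for c in components]
              for r in centered]
    return components, scores
```

The two score columns can then be used as the X and Y variables of a scatter plot, with groups and ellipses drawn as in the rest of this tutorial.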