What is a statistical test?
A statistical test is a way to evaluate the evidence the data provide against a hypothesis. This hypothesis is called the null hypothesis and is often referred to as H0. Under H0, data are generated by random processes. In other words, the controlled processes (the experimental manipulations, for example) do not affect the data. Usually, H0 is a statement of equality (equality between averages, between variances, or between a correlation coefficient and zero, for example).
H0 is usually opposed to a hypothesis called the alternative hypothesis, referred to as H1 or Ha. Most of the time, the alternative hypothesis is the one the user would like to demonstrate. It involves a statement of difference (difference between averages for example).
If the data do not provide enough evidence against H0, H0 is not rejected. If instead the data show strong evidence against H0, H0 is rejected and Ha is considered true, with a quantified (low) risk of being wrong. A statistical test therefore leads to one of two outcomes: rejecting H0 or not rejecting H0.
Suppose you're comparing two varieties of apples and you're wondering whether the average size of apples from variety 1 differs from the average size of apples from variety 2. Here's how we would write down the null and alternative hypotheses:
- H0: average size of apples from variety 1 = average size of apples from variety 2.
- Ha: average size of apples from variety 1 ≠ average size of apples from variety 2.
After looking at the above charts, a statistical test can be used to answer the question: how different is my data from equivalent data under the null hypothesis? In other words, how different is my data from equivalent data under a hypothesis where apple size does not change according to variety? This looks like our original question, more or less.
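One way to make this question concrete is a permutation test, sketched below using only the Python standard library. It is an illustration under stated assumptions, not the method used in this tutorial: the apple diameters are made-up values, and the function name `permutation_test` and its parameters are hypothetical. The idea is that if variety has no effect (H0), the variety labels are interchangeable, so shuffling them shows what differences in averages chance alone would produce.

```python
import random
from statistics import mean


def permutation_test(sample_1, sample_2, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference between two averages."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(sample_1) - mean(sample_2))
    pooled = list(sample_1) + list(sample_2)
    n1 = len(sample_1)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the variety labels carry no information, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Adding 1 to both counts keeps the estimated p-value strictly above 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical apple diameters in cm (invented for illustration only).
variety_1 = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3, 7.2, 6.7]
variety_2 = [7.6, 7.9, 7.5, 8.1, 7.7, 7.4, 8.0, 7.8]

p_value = permutation_test(variety_1, variety_2)
print(f"Estimated p-value: {p_value:.4f}")
```

A small result means that very few label shuffles produce a difference as large as the observed one, which counts as evidence against H0.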
Other examples of null hypotheses versus alternative hypotheses
- H0: the insulin rate of patients receiving a placebo is equal to the insulin rate of patients receiving a medication.
- Ha: the insulin rate of patients receiving a placebo is different from the insulin rate of patients receiving a medication.
- H0: the presence of attribute A does not affect consumer preference toward this product.
- Ha: the presence of attribute A affects consumer preference toward this product.
- H0: there is no trend in this time series.
- Ha: there is a trend in this time series.
- H0: corn fields treated with fertilizers A, B, C or D produce equivalent yields.
- Ha: at least one fertilizer induces a difference in corn yield.
How to interpret the output of a statistical test: the significance level alpha and the p-value
When setting up a study, a risk threshold above which H0 should not be rejected must be specified. This threshold is referred to as the significance level alpha and should lie between 0 and 1. Low values of alpha are more conservative. The choice of alpha should depend on how dangerous it is to reject H0 while it is true. For example, in a study aiming to demonstrate the benefits of a medical treatment, alpha should be low. On the other hand, when screening the effects of many attributes on the appreciation of a product, alpha can be more moderate. Very often, alpha is set at 0.05, 0.01 or 0.001.
The statistical test produces a number called the p-value (which is also bounded between 0 and 1). The p-value is the probability of obtaining data at least as extreme as the observed data, assuming the null hypothesis is true.
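The definition of the p-value can be illustrated with a case simple enough to compute exactly. The sketch below assumes a hypothetical preference study in which 60 out of 100 consumers prefer a product with attribute A, and H0 states that each consumer is equally likely to prefer either version. The function name and the numbers are invented for illustration; only the standard library is used.

```python
from math import comb


def binomial_two_sided_p_value(successes, trials):
    """Exact two-sided p-value for H0: probability of success = 0.5."""
    expected = trials / 2
    observed_gap = abs(successes - expected)
    # Under H0 each count k has probability C(trials, k) / 2**trials.
    # Sum the probabilities of all counts at least as far from the
    # expected value as the observed count.
    extreme_outcomes = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_outcomes / 2 ** trials


# Hypothetical data: 60 of 100 consumers prefer the product with attribute A.
p_value = binomial_two_sided_p_value(60, 100)
print(f"p-value: {p_value:.4f}")
```

Here "more extreme" covers both directions (60 or more, and 40 or fewer), because the alternative hypothesis is a statement of difference rather than a statement about one direction.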
More practically, the p-value should be compared to alpha:
- If p-value < alpha, we reject H0 in favor of Ha. The risk of being wrong, that is, of rejecting H0 while it is true, is bounded by alpha. The p-value itself is not the probability that H0 is true.
- If p-value > alpha, we do not reject H0, but this does not necessarily imply that we should accept it. It either means that H0 is true, or that H0 is false but our experiment and statistical test were not “strong” enough to lead to a p-value lower than alpha.
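The comparison above can be written as a small helper. This is a minimal sketch: the function name `decide` and the wording of the returned messages are hypothetical, and the default alpha of 0.05 is just one of the common choices mentioned earlier.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value to the significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must lie between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # Not rejecting H0 is not the same as accepting it.
    return "do not reject H0"


print(decide(0.003))             # small p-value, default alpha
print(decide(0.03, alpha=0.01))  # same kind of result, stricter alpha
```

The same p-value can lead to different decisions depending on alpha, which is why alpha should be fixed when the study is set up and not after seeing the results.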
What is statistical power and in what case can we accept H0?
Statistically speaking, the ability of an experiment or a test to lead to a rejection of the null hypothesis when it is false is called statistical power. The power of an experiment increases with alpha, with the precision of the measurements and with the number of repetitions. Power also changes according to the type of statistical test being used (see the last section of this tutorial). Power may be computed before or after an experiment. It equals 1 minus the risk of being wrong when accepting H0 (also called risk beta). So the higher the power, the lower the risk of being wrong when accepting H0 (when p-value > alpha, of course).
In summary, if p-value > alpha AND if statistical power is high enough (usually higher than 0.95), then we may accept H0 with a risk of being wrong equal to beta = 1 – Power.
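To see how power responds to alpha, measurement precision and the number of repetitions, the sketch below uses a normal approximation for a two-sided comparison of two averages. It rests on assumptions not stated in this tutorial: normally distributed data, a common standard deviation `sigma` treated as known, equal group sizes, and a true difference `delta` between the averages. The function name and all numeric values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test."""
    std_normal = NormalDist()
    # Critical value: H0 is rejected when |z| exceeds this threshold.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard errors of the difference.
    shift = delta / (sigma * sqrt(2 / n_per_group))
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


baseline = two_sample_power(delta=0.5, sigma=1.0, n_per_group=64)
print(f"baseline power:        {baseline:.3f}")
print(f"more repetitions:      {two_sample_power(0.5, 1.0, 128):.3f}")
print(f"more precise measures: {two_sample_power(0.5, 0.5, 64):.3f}")
print(f"larger alpha:          {two_sample_power(0.5, 1.0, 64, alpha=0.10):.3f}")
print(f"risk beta at baseline: {1 - baseline:.3f}")
```

Each of the three changes raises power relative to the baseline, and the last line shows the corresponding risk beta as 1 minus the power.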
Different types of statistical tests
A statistical test can be:
So what test should we choose?
Here is a grid which will help you choose an appropriate test according to your question.