Introduction

Articles in scientific journals routinely report the results of studies. Based on these results, important decisions are made. For example:

New drugs are often tested against standard drugs to determine if the new drug is more effective; or
Several methods of manufacturing may be compared to select the best technique for fabricating a component; or
Evidence may be examined to determine if there is a possible link between one activity and a result (e.g. asbestos and lung cancer).

In most of these studies the results are summarised by a statistical test, and the decision about whether a result is significant is based on a p-value. It is important that fitness professionals understand what a p-value actually means and how much confidence its value deserves.

The Courage of Your Convictions

The reader of the article must, like a juror on a criminal case, decide whether the evidence is strong enough to believe. Assuming that the study was designed according to good scientific practice and principles, the strength of the evidence is summarised by the p-value. Therefore, it is important for the reader to know what the p-value is telling them.

How P-Values Work

To describe how the p-value works, a common statistical test can be used as an example: the Student’s t-test for independent groups.

For this test, subjects are randomly assigned to one of two groups. A treatment is given to the subjects in one group (the intervention group), and the other group acts as a control (the control group, which receives no treatment or a standard treatment). For this example, suppose group one is given a new headache medicine and group two is given the standard headache medicine. Time to relief is measured for both groups. The outcome measurement is assumed to be a continuous variable that is normally distributed, and the population variance of the measure is assumed to be the same for both groups.
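The random assignment step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the subject labels (S01 to S24), the function name `randomly_assign` and the seed are hypothetical, and the group size of 12 simply matches the worked example.

```python
import random

def randomly_assign(subject_ids, seed=None):
    """Shuffle subjects and split them into two equal-sized groups.

    Assumes an even number of subjects; 'seed' only makes the run reproducible.
    """
    if len(subject_ids) % 2 != 0:
        raise ValueError("this sketch assumes an even number of subjects")
    rng = random.Random(seed)
    shuffled = list(subject_ids)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# 24 hypothetical subjects, 12 per group
subjects = [f"S{i:02d}" for i in range(1, 25)]
group_one, group_two = randomly_assign(subjects, seed=1)
print("Group one (new medicine):", group_one)
print("Group two (standard medicine):", group_two)
```

Shuffling the whole list and cutting it in half guarantees equal group sizes, which a coin toss per subject would not.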

For this example, the sample mean for group one is 10 and the sample mean for group two is 12. The sample standard deviation is 1.8 for group one and 1.9 for group two. The sample size for both groups is 12. Entering these data into a statistical program will produce a t-statistic and a p-value. In this example the Statistical Data Analysis programme reports t = -2.65 with 22 degrees of freedom (12 + 12 - 2) and a p-value of 0.0147. The t-statistic is negative because the mean for group one is smaller than the mean for group two. This means that you have evidence that the mean time to relief for group one was significantly different from that for group two.
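The same numbers can be reproduced from the summary statistics alone. The Python sketch below is an illustration, not the program used above: it pools the two sample variances (the equal-variance assumption stated earlier), computes t, and finds the two-sided p-value by numerically integrating the t-distribution density with Simpson’s rule, so that only the standard library is needed. The function names are hypothetical. If SciPy is installed, `scipy.stats.ttest_ind_from_stats` performs the same equal-variance test.

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t-test for independent groups from summary statistics,
    assuming equal population variances (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / std_err
    return t, df

def t_density(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """Two-sided p-value: 1 minus the area between -|t| and +|t|.
    The area from 0 to |t| is found with Simpson's rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_density(0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_0_to_a = total * h / 3
    return 1 - 2 * area_0_to_a

# Summary statistics from the headache-medicine example
t, df = pooled_t_test(10, 1.8, 12, 12, 1.9, 12)
p = two_sided_p(t, df)
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}")
```

With these inputs it prints `t = -2.65, df = 22, p = 0.0147`, in line with the values reported above.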

P-Values and the Null Hypothesis

To interpret this p-value, you must first know how the test was structured. In the case of this two-sided t-test, the hypotheses are:

H₀: μ₁ = μ₂ (Null hypothesis: the population means of the two groups are equal)
Hₐ: μ₁ ≠ μ₂ (Alternative hypothesis: the population means of the two groups are not equal)

The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one given by this data if the null hypothesis is true. A low p-value therefore points to rejection of the null hypothesis: the observed result would be unlikely if the population means really were equal. Since p = 0.015, if the population means were equal as hypothesised (under the null), there would be about a 15 in 1000 chance of obtaining a test statistic as extreme as or more extreme than t = -2.65 (that is, |t| ≥ 2.65) using data from this population. If you agree that there is enough evidence to reject the null hypothesis, you conclude that there is significant evidence to support the alternative hypothesis.
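The “15 in 1000” interpretation can be checked by simulation. The sketch below generates many studies in which the null hypothesis is true (both groups of 12 are drawn from the same normal population) and counts how often |t| reaches 2.65 or more. The population mean of 11, standard deviation of 1.85, seed and number of repetitions are arbitrary assumptions; when the null hypothesis holds, the distribution of t does not depend on the mean or standard deviation chosen.

```python
import math
import random

def sample_variance(xs):
    """Sample variance with the n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t(sample1, sample2):
    """Student's t-statistic for two independent samples (pooled variance)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sample_variance(sample1)
                  + (n2 - 1) * sample_variance(sample2)) / df
    diff = sum(sample1) / n1 - sum(sample2) / n2
    return diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def simulated_p_value(observed_t, n=12, mean=11.0, sd=1.85, reps=20_000, seed=42):
    """Estimate P(|T| >= |observed_t|) when H0 is true: both groups come
    from the same normal population (mean and sd are arbitrary here)."""
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(reps):
        group1 = [rng.gauss(mean, sd) for _ in range(n)]
        group2 = [rng.gauss(mean, sd) for _ in range(n)]
        if abs(pooled_t(group1, group2)) >= abs(observed_t):
            as_extreme += 1
    return as_extreme / reps

p_sim = simulated_p_value(-2.65)
print(f"Share of simulated null studies with |t| >= 2.65: {p_sim:.4f}")
```

The printed share should land near 0.015, give or take simulation noise: roughly 15 in every 1000 studies of two identical medicines would produce a difference at least this extreme by chance alone.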

The researcher decides what significance level to use, that is, the cut-off point that determines significance. The most commonly used significance level is 0.05. When the significance level is set at 0.05, any test resulting in a p-value below 0.05 is significant. In this example p = 0.015 is below 0.05, so you would reject the null hypothesis in favour of the alternative hypothesis.
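Because the cut-off is the researcher’s choice, the same p-value can lead to different decisions. A minimal sketch, using a hypothetical helper `decide` and the illustrative levels 0.10 and 0.01 alongside the usual 0.05:

```python
def decide(p_value, alpha=0.05):
    """Decision for a test, given a significance level (alpha) chosen by the researcher."""
    if p_value < alpha:
        return "reject H0 (significant)"
    return "do not reject H0 (not significant)"

p = 0.0147  # p-value reported for the headache-medicine example
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p, alpha)}")
```

With p = 0.0147 the result is significant at the 0.10 and 0.05 levels but not at the stricter 0.01 level.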

Since you are comparing only two groups, you can look at the sample means to see which is larger. The sample mean of group one is smaller, so you conclude that medicine one acted significantly faster, on average, than medicine two. Using standard journal format, this would be reported in an article with a phrase akin to: “The mean time to relief for group one was significantly smaller than for group two (two-sided t-test, t(22) = -2.65, p = 0.015).”
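The reporting step can be sketched as a small function that reads the direction off the two sample means and formats the result in the journal style quoted above. The function name `report` and its wording are hypothetical; it assumes a two-sided test and a 0.05 significance level by default.

```python
def report(mean1, mean2, t, df, p, alpha=0.05):
    """Journal-style sentence for a two-sided, two-group t-test.

    With only two groups, the direction comes from comparing the sample means."""
    if p >= alpha:
        return (f"There was no significant difference in mean time to relief "
                f"(two-sided t-test, t({df}) = {t:.2f}, p = {p:.3f}).")
    direction = "smaller" if mean1 < mean2 else "larger"
    return (f"The mean time to relief for group one was significantly {direction} "
            f"than for group two (two-sided t-test, t({df}) = {t:.2f}, p = {p:.3f}).")

sentence = report(10, 12, -2.65, 22, 0.0147)
print(sentence)
```

For the example values this prints the same sentence as the quoted report, with the p-value rounded to 0.015.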

Yes, No or Maybe

P-values do not simply provide you with a Yes or No answer; they provide a sense of the strength of the evidence against the null hypothesis. The lower the p-value, the stronger the evidence. Once you know how to read p-values, you can interpret journal articles more critically and decide for yourself whether you agree with the conclusions of the author(s).

Definitions

Normal Distribution: Data can be ‘distributed’ (spread out) in different ways. It can be spread out more on the left, more on the right, or it can be all jumbled up. In many cases the data tend to cluster around a central value with no bias left or right, forming a symmetric, bell-shaped curve that is close to a ‘normal distribution’. For a fun, simple and colourful explanation, see: http://www.mathsisfun.com/data/standard-normal-distribution.html.
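A quick simulation illustrates the “no bias left or right” idea. It draws hypothetical normally distributed values (mean 10 and standard deviation 1.8, borrowed from group one purely for illustration) and checks that the mean and median almost coincide and that about half the values fall below the mean.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical data: 10,000 values from a normal distribution
# centred on 10 with standard deviation 1.8 (chosen for illustration only).
data = [rng.gauss(10, 1.8) for _ in range(10_000)]

mean = sum(data) / len(data)
median = statistics.median(data)
share_below = sum(x < mean for x in data) / len(data)

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"share of values below the mean = {share_below:.2f}")  # close to 0.50
```

For skewed data (spread out more on one side) the mean and median would drift apart, and the share below the mean would move away from one half.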