The terms sensitivity and specificity are easily confused, but they refer to very different things. Although no medical test is perfect, many of a test’s attributes can be measured so that the best and most appropriate test can be chosen for a given question.
Besides sensitivity and specificity, other attributes of a test used in clinical medicine include its predictive values and its accuracy.
For this article, the term ‘diagnostic tests’ will include everything medical professionals do to diagnose disease. This includes assessing symptoms and signs, as well as what is conventionally referred to as tests, such as laboratory investigations and X-rays.
- A positive test result means the test says the disease is present. For example, a patient’s x-ray appears to show lung cancer.
- A true-positive test result means not only that the test says the disease is present, but that the disease really is present. The patient’s symptoms, medical history, and biopsy results prove they have cancer. The x-ray was a true positive.
- A negative test result means the test says the disease is not present. For example, a patient checks their blood pressure and it reads in the normal range, telling them that they do not have hypertension (high blood pressure).
- A true-negative test result means not only that the test says the disease is absent, but that the disease really is absent. The patient wears a 24-hour blood pressure monitor and it does not pick up a high blood pressure, nor does the doctor when they check the patient’s blood pressure repeatedly over the years. The patient does not have hypertension, and the test result was a true negative.
The gold standard is the best single test (or combination of tests) currently considered the preferred method of diagnosing a particular disease (X). All other methods of diagnosing X, including any new test, need to be compared against this ‘gold standard’. The gold standard differs from disease to disease. The gold standard for X may be considered outdated or inadequate, but any new test designed to replace it must initially be validated against it. If the new test is indeed better, there are ways to prove that, after which the new test may become the gold standard.
Validity is the extent to which a test measures what it is supposed to measure; in other words, it is the accuracy of the test. Validity is measured by sensitivity and specificity. These terms, as well as other jargon, are best illustrated using the conventional two-by-two (2 x 2) table (Table 1), which summarises the information obtained by comparing a new diagnostic test with the gold standard.
- In cell ‘a’ we enter those in whom the test in question correctly diagnosed the disease (as determined by the gold standard). In other words, the test is positive, as is the ‘gold standard’. These are the true positives (TP).
- In cell ‘b’ we enter those who have positive results for the test in question but do not have disease according to the ‘gold standard’ test. The newer test has wrongly diagnosed the disease: these are false positives (FP).
- In cell ‘c’ we enter those who have disease on the ‘gold standard’ test but have negative results with the test in question. The test has wrongly labelled a diseased person as ‘normal’. These are false negatives (FN).
- In cell ‘d’ we enter those who have no disease as determined by the ‘gold standard’ test and are also negative with the newer test. These are true negatives (TN).
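The four cells described above can be tallied directly from paired results. The following is a minimal sketch, assuming each subject has a boolean result from the test in question and from the gold standard; the function name `two_by_two` and the eight-subject data set are hypothetical and exist only for illustration.

```python
from collections import Counter


def two_by_two(test_results, gold_results):
    """Count cells a, b, c and d of the 2 x 2 table from paired boolean results."""
    if len(test_results) != len(gold_results):
        raise ValueError("both lists need exactly one entry per subject")
    cells = Counter()
    for test_pos, gold_pos in zip(test_results, gold_results):
        if test_pos and gold_pos:
            cells["a"] += 1  # true positive (TP)
        elif test_pos:
            cells["b"] += 1  # false positive (FP)
        elif gold_pos:
            cells["c"] += 1  # false negative (FN)
        else:
            cells["d"] += 1  # true negative (TN)
    return {name: cells[name] for name in "abcd"}


# Hypothetical results for eight subjects (True = positive).
test = [True, True, False, False, True, False, False, True]
gold = [True, False, True, False, True, False, False, True]

table = two_by_two(test, gold)
print(table)  # {'a': 3, 'b': 1, 'c': 1, 'd': 3}
```

Every subject lands in exactly one cell, so the four counts always add up to the number of subjects tested.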
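Sensitivity (Positive in Disease)
The ability of a test to correctly classify an individual as diseased is called the test’s sensitivity.
= a / (a + c)
= a (true positives) / [a + c (true positives + false negatives)]
= Probability of testing positive when the disease is present.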
Specificity (Negative in Health)
The ability of a test to correctly classify an individual as disease-free is called the test’s specificity (Table 2).
= d / (b + d)
= d (true negatives) / [b + d (false positives + true negatives)]
= Probability of testing negative when the disease is absent.
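As a worked illustration of the two formulas, the sketch below computes sensitivity and specificity from the cell counts of a 2 x 2 table. The counts are hypothetical and were chosen only to give round numbers; the function names are likewise illustrative.

```python
def sensitivity(a, c):
    """Proportion of gold-standard positives that the test also calls positive."""
    if a + c == 0:
        raise ValueError("no diseased subjects: sensitivity is undefined")
    return a / (a + c)


def specificity(b, d):
    """Proportion of gold-standard negatives that the test also calls negative."""
    if b + d == 0:
        raise ValueError("no disease-free subjects: specificity is undefined")
    return d / (b + d)


# Hypothetical cell counts: a = TP, b = FP, c = FN, d = TN.
a, b, c, d = 90, 30, 10, 870

print(f"sensitivity = {sensitivity(a, c):.3f}")  # 90 / (90 + 10)
print(f"specificity = {specificity(b, d):.3f}")  # 870 / (30 + 870)
```

Note that sensitivity uses only the diseased column (cells a and c) and specificity only the disease-free column (cells b and d), so neither depends on how common the disease is in the sample.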
For a given test, sensitivity and specificity usually trade off against each other: moving the cut-off to raise sensitivity generally lowers specificity, and vice versa.
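The relationship between the two measures can be seen by sliding the cut-off of a single test. The sketch below is an illustration only: it assumes a hypothetical marker for which higher values suggest disease, and the measurements for the diseased and healthy groups are invented.

```python
# Hypothetical marker values; a result counts as positive when value >= cutoff.
diseased = [4.1, 5.0, 5.6, 6.2, 7.3]
healthy = [2.0, 3.1, 3.9, 4.5, 5.2]


def sens_spec(cutoff):
    """Return (sensitivity, specificity) of the marker at the given cut-off."""
    true_pos = sum(value >= cutoff for value in diseased)
    true_neg = sum(value < cutoff for value in healthy)
    return true_pos / len(diseased), true_neg / len(healthy)


for cutoff in (3.0, 4.0, 5.0, 6.0):
    sens, spec = sens_spec(cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.1f}, specificity {spec:.1f}")
# cut-off 3.0: sensitivity 1.0, specificity 0.2
# cut-off 4.0: sensitivity 1.0, specificity 0.6
# cut-off 5.0: sensitivity 0.8, specificity 0.8
# cut-off 6.0: sensitivity 0.4, specificity 1.0
```

In this invented data set, raising the cut-off never increases sensitivity and never decreases specificity, because the two groups overlap between 4.1 and 5.2. The size of the trade-off in practice depends on how much the diseased and healthy distributions overlap.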
Positive Predictive Value (PPV)
PPV is the percentage of patients with a positive test who actually have the disease. In a 2 x 2 table (Table 1), cell ‘a’ holds the true positives and cell ‘b’ the false positives. In real life, we do the new test first and do not have the results of the gold standard available, so we want to know how far a positive result can be trusted. PPV tells us this: what proportion of test positives are true positives. The higher this number (the closer to 100%), the more closely a positive result on the new test agrees with the gold standard.
= a / (a + b)
= a (true positives) / [a + b (true positives + false positives)]
= Probability of the patient having the disease when the test is positive.
Negative Predictive Value (NPV)
NPV is the percentage of patients with a negative test who do not have the disease. In a 2 x 2 table (Table 1), cell ‘d’ holds the true negatives and cell ‘c’ the false negatives. NPV tells us what proportion of test negatives are true negatives. The higher this number (the closer to 100%), the more closely a negative result on the new test agrees with the gold standard.
= d / (c + d)
= d (true negatives) / [c + d (false negatives + true negatives)]
= Probability of the patient not having the disease when the test is negative.
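The two predictive values read across the rows of the 2 x 2 table rather than down its columns. A minimal sketch with hypothetical cell counts (the function names are illustrative):

```python
def positive_predictive_value(a, b):
    """Proportion of test positives that are true positives."""
    if a + b == 0:
        raise ValueError("no positive test results: PPV is undefined")
    return a / (a + b)


def negative_predictive_value(c, d):
    """Proportion of test negatives that are true negatives."""
    if c + d == 0:
        raise ValueError("no negative test results: NPV is undefined")
    return d / (c + d)


# Hypothetical cell counts: a = TP, b = FP, c = FN, d = TN.
a, b, c, d = 90, 30, 10, 870

print(f"PPV = {positive_predictive_value(a, b):.3f}")  # 90 / (90 + 30)
print(f"NPV = {negative_predictive_value(c, d):.3f}")  # 870 / (10 + 870)
```

With these invented counts, one in four positive results is a false positive (PPV = 0.75), even though nine out of ten diseased subjects are detected.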
Positive and negative predictive values depend directly on the prevalence of the disease in the population (Figure 1). Assuming all other factors remain constant, PPV increases and NPV decreases as prevalence increases.
Everything discussed so far assumes that sensitivity and specificity do not change as one deals with different groups of people. Sensitivity and specificity can, however, change if the population tested is dramatically different from the population to be treated, especially if the spectrum of disease is different. In more severe disease, medical professionals are more likely to be able to make a diagnosis, and thus sensitivity goes up.
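The dependence on prevalence follows from Bayes' theorem. The sketch below holds sensitivity and specificity fixed at a hypothetical 90% each and varies only the prevalence; the function name and all numbers are assumptions made for illustration.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sens * prevalence
    false_neg = (1 - sens) * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test (90% sensitive, 90% specific) at three prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
# prevalence 0.01: PPV 0.083, NPV 0.999
# prevalence 0.10: PPV 0.500, NPV 0.988
# prevalence 0.50: PPV 0.900, NPV 0.900
```

At 1% prevalence, fewer than one in ten positive results from this hypothetical test would be a true positive, which is why the same test can perform very differently in a screening population and in a specialist clinic.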
What if the new test is actually better than the gold standard? There is no shortcut to the process of comparing it to the existing gold standard. The new (presumably better) test will detect more disease than the ‘gold standard’. In the 2 x 2 table, the subjects labelled as ‘diseased’ by the new test (but still ‘normal’ on the ‘gold standard’) will go in cell ‘b’ (false positives). If, on follow-up, a significant number of these patients actually develop disease (gold standard positive), then the new test is in fact detecting disease earlier than, and is better than, the gold standard. In some instances, there may be other strategies available to determine straight away whether the new test is in fact better.
Why Highly Sensitive Tests Are Important
Tests with a high sensitivity are often used to screen for disease. Screening tests tend to cast a wide net in order to pick up all cases of a disease and not miss anyone, but they often include some accidental positive results in people who do not actually have the disease.
Why Highly Specific Tests Are Important
Tests with a high specificity are used to confirm the results of sensitive, but less specific screening tests. People who came up positive on a very sensitive screening test may come up negative on a specific confirmatory test, which means the medical professional can reassure them they do not actually have the disease.
For example, if 100 people were to be tested for HIV, one would start by testing them with ELISA, a test that is very sensitive (picks up almost all cases) but not very specific (may show HIV even in some people who do not have the virus, a false-positive result). Then, all the people who had positive ELISA tests would be retested with a Western blot, which is so specific that it would almost never falsely show HIV. The people with false-positive results could then be reassured that they almost certainly do not have HIV, because the very specific Western blot can accurately tell the difference.
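The effect of following a sensitive screening test with a specific confirmatory test can be sketched with expected counts. All figures below are hypothetical: they are not the published performance of ELISA or Western blot, and the calculation assumes that the two tests err independently of each other given a person's true disease status.

```python
def apply_test(diseased, healthy, sens, spec):
    """Expected numbers of diseased and healthy people who test positive."""
    return diseased * sens, healthy * (1 - spec)


# Hypothetical population and prevalence.
population = 10_000
prevalence = 0.01
diseased = population * prevalence
healthy = population - diseased

# Stage 1: a very sensitive, less specific screening test (assumed values).
tp1, fp1 = apply_test(diseased, healthy, sens=0.99, spec=0.95)

# Stage 2: a very specific confirmatory test, given only to stage-1 positives.
tp2, fp2 = apply_test(tp1, fp1, sens=0.95, spec=0.999)

ppv1 = tp1 / (tp1 + fp1)
ppv2 = tp2 / (tp2 + fp2)
print(f"after screening:    {tp1:.1f} true and {fp1:.1f} false positives, PPV {ppv1:.3f}")
print(f"after confirmation: {tp2:.1f} true and {fp2:.1f} false positives, PPV {ppv2:.3f}")
```

Under these assumptions, most positives after screening alone are false positives, whereas almost all positives that survive the confirmatory test are true positives. The price is the small number of true cases that the confirmatory test misses.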
Tests can be very specific without being sensitive, which is why it is important to take both factors into consideration. For example, if one were to screen 100 at-risk people for HIV with a Western blot, everyone who tested positive would be a true positive, but some people who actually had HIV would test negative – their disease would be missed, a false-negative test result.
Tests Need to Be both Sensitive and Specific
A test that is completely sensitive without being specific is not helpful, and neither is a test that is specific but not sensitive.
For instance, suppose a medical professional is considering using a new test to measure hematocrit (the percentage of blood volume that is taken up by red blood cells). The normal value of hematocrit is about 46% in men and about 38% in women. If the hematocrit is too low, that means the person has anaemia.
Suppose the XYZ Company has devised a test for anaemia that they say is 100% sensitive. Every single person with anaemia will be found with this test. The test is such that, if the person’s hematocrit is under 40%, the test labels them as anaemic.
It is true that this test will detect everyone who truly is anaemic. People with an anaemic-range hematocrit of 25% will be picked up, since they fall below 40%. But many, many people with normal hematocrits will be lumped in with the anaemic people, because the cut-off value is so high that it includes many normal values (38%-40%). Obviously this test will not be of much help, because although it finds everyone who is anaemic (it is 100% sensitive), it also finds a lot of people who are not. There are too many false positives. Not all those people need treatment.
Let us say that a competing company, the ABC Company, then devises a test for anaemia that they advertise as 100% specific. In this test, there won’t be any false positives. The test labels everyone with a hematocrit of 5% or below as anaemic. If the person tests positive in ABC’s test, unlike with XYZ’s, a doctor can be absolutely certain they are anaemic.
Unfortunately, that cut-off is so low that it will miss many people with milder degrees of anaemia. In fact, it will not pick up many cases at all, since people with hematocrits of 5% are extremely rare: that level is so anaemic that they would be near death. People with the dangerously low hematocrit of 10% will not be picked up with this test. It may be 100% specific, but it is also essentially useless.
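The two extreme cut-offs can be compared on a small sample. This is a sketch under stated assumptions: the ten hematocrit values and their true anaemia status are invented, and only the two cut-offs (under 40% for XYZ, 5% or below for ABC) come from the example above.

```python
# Hypothetical subjects: (hematocrit in %, truly anaemic according to a reference).
people = [
    (10, True), (25, True), (30, True), (33, True),
    (38, False), (39, False), (42, False), (46, False), (48, False), (50, False),
]


def evaluate(is_positive):
    """Return (sensitivity, specificity) of a rule mapping hematocrit to positive/negative."""
    tp = sum(1 for hct, anaemic in people if anaemic and is_positive(hct))
    fn = sum(1 for hct, anaemic in people if anaemic and not is_positive(hct))
    fp = sum(1 for hct, anaemic in people if not anaemic and is_positive(hct))
    tn = sum(1 for hct, anaemic in people if not anaemic and not is_positive(hct))
    return tp / (tp + fn), tn / (tn + fp)


xyz = evaluate(lambda hct: hct < 40)   # XYZ rule: positive if hematocrit under 40%
abc = evaluate(lambda hct: hct <= 5)   # ABC rule: positive if hematocrit 5% or below

print(f"XYZ: sensitivity {xyz[0]:.2f}, specificity {xyz[1]:.2f}")
print(f"ABC: sensitivity {abc[0]:.2f}, specificity {abc[1]:.2f}")
# XYZ: sensitivity 1.00, specificity 0.67
# ABC: sensitivity 0.00, specificity 1.00
```

In this invented sample the XYZ rule finds every anaemic subject but also flags the two healthy subjects at 38% and 39%, while the ABC rule produces no false positives only because it produces no positives at all.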
An Ideal Combination
Tests, then, should be both sensitive and specific as much as possible. Neither one is less important than the other.
Parikh, R., Mathai, A., Parikh, S., Sekhar, G.C. & Thomas, R. (2008) Understanding and Using Sensitivity, Specificity and Predictive Values. Indian Journal of Ophthalmology. 56(1), pp.45-50.
Simon, D. & Boring III, J.R. (1990) Sensitivity, Specificity, and Predictive Value. In: Walker, H.K., Hall, W.D. & Hurst, J.W. (eds) Clinical Methods: The History, Physical, and Laboratory Examinations. 3rd ed. London: Butterworths.