2. Quantifying disease in populations
What is a case?
One approach, therefore, is to use measures that take into account the quantitative nature of disease. For example, the distribution of blood pressures in a population can be summarised by its mean and standard deviation. For practical reasons, however, it is often helpful to dichotomise the diagnostic continuum into "cases" and "non-cases". In defining the cut off point for such a division, four options may be considered:
Statistical - "Normal" may be defined as being within two standard deviations of the age specific mean, as in conventional laboratory practice. This is acceptable as a simple guide to the limits of what is common, but it must not be given any other importance because it fixes the frequency of "abnormal" values of every variable at around 5% in every population. More importantly, what is usual is not necessarily good.
Clinical - Clinical importance may be defined by the level of a variable above which symptoms and complications become more frequent. Thus, in a study of hip osteoarthritis, cases were defined as subjects with a joint space of less than 2 mm on x ray, as this level of narrowing was associated with a clear increase in symptoms.
Prognostic - Some clinical findings such as high systolic blood pressure or poor glucose tolerance may be symptomless and yet carry an adverse prognosis. Sometimes, as with glucose tolerance, there is a threshold value below which level and prognosis are unrelated. "Prognostically abnormal" is then definable by this level.
Operational - For some disorders, none of the above approaches is satisfactory. In men of 50, a systolic pressure of 150 mm Hg is common (that is, "statistically normal"), and it is clinically normal in the sense of being without symptoms. It carries an adverse prognosis, with a risk of fatal heart attack about twice that associated with a low blood pressure, but there is no threshold below which differences in blood pressure have no influence on risk. Nevertheless, practical people require a case definition, even if somewhat arbitrary, as a basis for decisions. An operational definition might be based on a threshold for treatment. This will take into account symptoms and prognosis but will not be determined consistently by either. A person may be symptom free yet benefit from treatment or alternatively may have an increased risk that cannot be remedied.
Each of these four approaches to case definition is suitable for a different purpose, so an investigator may need to define the purposes before cases can be defined.
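The "around 5%" figure in the statistical definition can be checked with a minimal sketch. It assumes the variable follows a normal distribution, which the text does not state, and the function name is purely illustrative.

```python
import math

def fraction_outside(k_sd: float) -> float:
    """Fraction of a normal distribution lying more than k_sd standard
    deviations from the mean, both tails combined (assumes normality)."""
    return math.erfc(k_sd / math.sqrt(2))

share = fraction_outside(2.0)
print(f"{share:.1%} of values fall outside mean +/- 2 SD")
```

Because the cut off is set by the spread of the distribution itself, the share labelled "abnormal" comes out the same whatever the variable or population, which is why this definition says nothing about what is healthy.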
Whatever approach is adopted, the case definition should as far as possible be precise and unambiguous. A standard textbook of cardiology proposes these electrocardiographic criteria for left bundle branch block: "The duration of QRS commonly measures 0.12 to 0.16 seconds... V5 or V6 exhibits a large widened R wave..." (our italics). As a basis for epidemiological comparisons this is potentially disastrous, because each investigator could interpret the italicised words differently. By contrast, the epidemiological "Minnesota Code" defines it like this: "QRS duration ≥0.12 seconds in any one or more limb leads and R peak duration ≥0.06 seconds in any one or more of leads I, II, aVL, V5, or V6; each criterion to be met in a majority of technically adequate beats." If different studies are to be compared, case definitions must be rigorously standardised and free from ambiguity. Conventional clinical descriptions do not meet this requirement.
It is also essential to define and standardise the methods of measuring the chosen criteria. An important feature in diagnosing rheumatoid arthritis, for example, is early morning stiffness of the fingers; but two interviewers may emerge with different prevalence estimates if one takes an ordinary clinical history whereas the other uses a standard questionnaire. Cases in a survey are defined not by theoretical criteria, but in terms of response to specific investigative techniques. These, too, need to be defined, standardised, and reported adequately. As a result, epidemiological case definitions are narrower and more rigid than clinical ones. This loss of flexibility has to be accepted as the price of standardisation.
Measures of disease frequency
When the population at risk is roughly constant, incidence is measured as:
Number of new cases / (Population at risk x time during which cases were ascertained)
Sometimes measurement of incidence is complicated by changes in the population at risk during the period when cases are ascertained, for example, through births, deaths, or migrations. This difficulty is overcome by relating the numbers of new cases to the person years at risk, calculated by adding together the periods during which each individual member of the population is at risk during the measurement period. Thus incidence is defined as:
Number of new cases / Total person years at risk
It should be noted that once a person is classified as a case, he or she is no longer liable to become a new case, and therefore should not contribute further person years at risk. Sometimes the same pathological event happens more than once to the same individual. In the course of a study, a patient may have several episodes of myocardial infarction. In these circumstances the definition of incidence is usually restricted to the first event, although sometimes (for example in the study of infectious diseases) it is more appropriate to count all episodes. When ambiguity is possible reports should state whether incidence refers only to first diagnosis or to all episodes, as this may influence interpretation. For example, gonorrhoea notification rates in England and Wales increased dramatically during the 1960s, but no one knows to what extent this was due to more people getting infected or to the same people getting infected more often.
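The person years calculation can be sketched as follows, under the first event convention described above. The record layout (`FollowUp`) and the five person cohort are invented for illustration and are not data from any study.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FollowUp:
    """Hypothetical follow-up record for one person."""
    years_observed: float                         # time spent in the population during the study
    years_to_first_event: Optional[float] = None  # None if the person never became a case

def became_case(p: FollowUp) -> bool:
    # Only events that happen while the person is under observation count.
    return p.years_to_first_event is not None and p.years_to_first_event <= p.years_observed

def person_years_at_risk(p: FollowUp) -> float:
    # A person stops contributing time at risk once classified as a case.
    return p.years_to_first_event if became_case(p) else p.years_observed

def incidence_rate(cohort: List[FollowUp]) -> float:
    new_cases = sum(1 for p in cohort if became_case(p))
    total_person_years = sum(person_years_at_risk(p) for p in cohort)
    return new_cases / total_person_years

cohort = [
    FollowUp(5.0),        # at risk for the whole 5 years
    FollowUp(5.0, 2.0),   # first event after 2 years
    FollowUp(3.0),        # left the population (death, migration) after 3 years
    FollowUp(5.0, 4.0),   # first event after 4 years
    FollowUp(1.0),        # entered late, observed for 1 year
]

rate = incidence_rate(cohort)
print(f"{rate * 1000:.1f} new cases per 1000 person years")
```

Counting all episodes instead of first events would need a different rule: the person would keep contributing time at risk after each episode, and every episode would be added to the numerator.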
Even in a chronic disease, the manifestations are often intermittent. In consequence, a "point" prevalence, based on a single examination, at one point in time, tends to underestimate the condition's total frequency. If repeated or continuous assessments of the same individuals are possible, a better measure is the period prevalence defined as the proportion of a population that are cases at any time within a stated period. Thus, the 12 month period prevalence of low back pain in a sample of British women aged 30-39 was found to be 33.6%.
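The difference between point and period prevalence can be illustrated with a small sketch. Each person is represented by a list of symptomatic episodes given as (start month, end month) pairs; this representation and the five person population are assumptions made for the example.

```python
from typing import List, Tuple

Episodes = List[Tuple[int, int]]  # (start_month, end_month), inclusive

def is_case_at(episodes: Episodes, month: int) -> bool:
    """True if the person is symptomatic at the given month."""
    return any(start <= month <= end for start, end in episodes)

def is_case_during(episodes: Episodes, first: int, last: int) -> bool:
    """True if any episode overlaps the period first..last."""
    return any(start <= last and end >= first for start, end in episodes)

def point_prevalence(population: List[Episodes], month: int) -> float:
    return sum(is_case_at(e, month) for e in population) / len(population)

def period_prevalence(population: List[Episodes], first: int, last: int) -> float:
    return sum(is_case_during(e, first, last) for e in population) / len(population)

population = [
    [(1, 2)],             # one early episode
    [(5, 7)],             # one mid-year episode
    [],                   # never a case
    [(3, 3), (10, 11)],   # two short episodes
    [],                   # never a case
]

print("Point prevalence at month 6:", point_prevalence(population, 6))
print("12 month period prevalence:", period_prevalence(population, 1, 12))
```

With intermittent episodes, the single examination at month 6 finds one case in five, whereas three of the five people are cases at some time during the year.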
Interrelation of incidence, prevalence, and mortality
If recovery and death rates are low, then chronicity is high and even a low incidence will produce a high prevalence:
Prevalence = incidence x average duration
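A worked example of this relation, with invented figures: two conditions share the same incidence but differ in average duration. The relation is used here as an approximation that assumes a roughly steady state and a low prevalence.

```python
def approximate_prevalence(incidence_per_person_year: float,
                           average_duration_years: float) -> float:
    """Prevalence = incidence x average duration (steady state approximation)."""
    return incidence_per_person_year * average_duration_years

incidence = 0.002  # hypothetical: 2 new cases per 1000 person years

short_illness = approximate_prevalence(incidence, 0.5)   # lasts 6 months on average
chronic_illness = approximate_prevalence(incidence, 20)  # lasts 20 years on average

print(f"Short illness:   {short_illness * 1000:.0f} per 1000")
print(f"Chronic illness: {chronic_illness * 1000:.0f} per 1000")
```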
In studies of aetiology, incidence is the most appropriate measure of disease frequency. Mortality is a satisfactory proxy for incidence if survival is not related to the risk factors under investigation. However, patterns of mortality can be misleading if survival is variable. A recent decline in mortality from testicular cancer is attributable to improved cure rates from better treatment, and does not reflect a fall in incidence.
Prevalence is often used as an alternative to incidence in the study of rarer chronic diseases such as multiple sclerosis, where it would be difficult to accumulate large numbers of incident cases. Again, however, care is needed in interpretation. Differences in prevalence between different parts of the world may result from differences in survival and recovery as well as in incidence.
Crude and specific rates
Figure: Mortality from lung cancer in men in England and Wales, 1950-89, by five year age groups
It is often helpful to break down results for the whole population to give rates specific for age and sex, but it is frustrating if results are given for 35-44 years in one report, 30-49 in another, and 31 to 40 in another. When feasible, decade classes should be 5-14, 15-24, and so on, and quinquennia should be 5-9, 10-14, and so on. Overlapping classes (5-10, 10-15) should be avoided.
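The recommended classes can be generated mechanically, which helps to keep reports comparable. The sketch below is one possible way to do it; the function names are illustrative, and placing children under 5 in a separate "0-4" class for the decade scheme is an assumption, since the text does not say how they should be handled.

```python
def decade_class(age: int) -> str:
    """Decade classes 5-14, 15-24, and so on (ages under 5 assumed to form 0-4)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 5:
        return "0-4"
    lower = 5 + 10 * ((age - 5) // 10)
    return f"{lower}-{lower + 9}"

def quinquennium(age: int) -> str:
    """Five year classes 0-4, 5-9, 10-14, and so on."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = 5 * (age // 5)
    return f"{lower}-{lower + 4}"

for age in (14, 15, 35, 44):
    print(age, decade_class(age), quinquennium(age))
```

Each age falls into exactly one class, so the ambiguity of overlapping classes such as 5-10 and 10-15 cannot arise.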
Extensions and alternatives to incidence and prevalence
Some health outcomes do not lend themselves to description by an incidence or prevalence, because of difficulties in defining the population at risk. For these outcomes, special rates are defined with a quasi population at risk as denominator.
Sometimes the population at risk can be satisfactorily defined, but it cannot be enumerated. For example, a cancer registry might collect information about the occupations of registered cancer cases, but not have data on the number of people in each occupation within its catchment area. Thus, the incidence of different cancers by occupation could not be calculated. An alternative in these circumstances would be to derive the proportion of different types of cancer in each occupational group. However, care is needed in the interpretation of proportions. A high proportion of prostatic cancers in farmers could reflect a high incidence of the disease, but it could also occur if farmers had an unusually low incidence of other types of cancer. Incidence and prevalence are preferable to proportions if they can be adequately measured.
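The pitfall with proportions can be made concrete with a minimal sketch. The registry counts below are invented, and the two occupational groups are assumed, for the sake of argument, to be of equal size, which a real registry without denominators could not know.

```python
from typing import Dict

def proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each cancer type among all registered cancers in a group."""
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}

# Hypothetical registrations over the same period.
farmers = {"prostate": 30, "lung": 20, "other": 50}
clerks = {"prostate": 30, "lung": 60, "other": 110}

print("Farmers, prostate share:", proportions(farmers)["prostate"])
print("Clerks, prostate share: ", proportions(clerks)["prostate"])
```

Both groups have the same number of prostatic cancers, so with equal sized groups their incidence would be identical. The share among farmers is nevertheless twice as high, solely because farmers have fewer cancers of other types.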