Coffee beans, courtesy Elvis John Ferrao on Flickr.
A prospective cohort study, or longitudinal study, is an epidemiological study that defines two or more groups of people with various exposures (e.g., hormone replacement therapy, coffee drinking), and then follows this cohort to measure any differences in outcomes (disease) between the groups in order to infer a causal association (see figure below). Ideally, researchers ascertain a breadth of exposures and characteristics to discover associations and to improve the validity of such discoveries. If an exposure is rare in the general population, a "special exposure cohort" can be used to follow a uniquely exposed group, such as vegetarian Seventh-day Adventists, and compare the special group's outcomes to those of a similar non-exposed group or the general population. There are a few major prospective studies in nutritional epidemiology that warrant some attention.
Source: Wikipedia. Note that the investigator ascertains the exposures (black/white) prior to the unknown outcome.
One of the most influential diet studies is the Nurses' Health Study. This study is technically composed of two phases, NHS I and NHS II. NHS I began in 1976 to identify potential long-term complications of the oral contraceptives that many women had begun to take. It was later expanded to include diet and quality of life data. NHS II began in 1989 and recruited younger nurses for the purpose of collecting data on oral contraception, diet, and lifestyle factors that began earlier in life. Major findings from this study include: smoking has a strong positive association with cardiovascular disease that diminishes after smoking cessation, obesity increases the risks of several chronic diseases, and a Mediterranean-type diet appears protective. However, the spurious idea that hormone replacement therapy would prevent coronary heart disease in all post-menopausal women was also produced by this study.
The Health Professionals Follow-Up Study (HPFS) began in 1986 as the male complement to the NHS, and it produced the coffee-prostate cancer study above. It comprises roughly 51,000 health practitioners who are not medical doctors; over half of them are dentists and the vast majority are white. Here is an example of the long-form survey sent to participants. Both the NHS and HPFS recruited motivated healthcare practitioners because this population is expected to report disease outcomes accurately and has the occupational commitment to maintain follow-up. In fact, the NHS has retained a 90% response rate.
The European Prospective Investigation into Cancer and Nutrition, or EPIC, is a European equivalent. This study has recruited over half a million people from ten European countries, and studies the general population rather than healthcare practitioners. Here's a neat infographic depicting the reported diet of "health conscious" and general population groups; it's a nice example of how types of people do not just aggregate around one food choice or the other, but rather around a whole pattern of eating.
Cohort studies can be prohibitively expensive and are generally restricted to relatively common diseases or outcomes. But they offer substantial benefits over other types of observational studies for establishing a causal association. If putative exposures and outcomes are measured at the same time, such as in cross-sectional studies, one cannot say with absolute certainty which one preceded the other. Cohort studies are better able to determine this information, or more technically, to establish the direction of causality. Additionally, cohort studies often use real-time medical records, physical examination, or biological tests, and sometimes all three, to provide valid measurements of the exposures rather than relying on subjective recall. However, unlike randomized controlled trials, the exposure status is chosen by the subjects and not the researchers.
In a prospective cohort study, the investigator ascertains the exposure status of the subjects and then groups them accordingly. If science were easy, then these populations would just so happen to be the same with the exception of the exposure of interest. But science can be cruel, and there are myriad reasons why individuals "choose" different exposures, which biases the results. In our coffee study, it is possible that men who were developing lethal prostate cancer avoided coffee due to subclinical symptoms related to the impending prostate cancer diagnosis, which would bias cancer-prone individuals away from coffee exposure. This is called self-selection bias and can only be avoided by assigning exposure. The investigators attempted to correct for this reverse causation by doing a sub-analysis with urinary symptoms to ensure that these types of symptoms were not associated with lower coffee consumption. But such a bias could still have occurred from an unknown non-urinary symptomology or "drive" for cancer-prone men to drink less coffee. As such, the causal association between exposure and outcome from a cohort study is only inferred (e.g., "heavy coffee consumption protects against lethal prostate cancer"), and can technically only be interpreted as "people who choose, or are otherwise driven by unknown factors, to consume large amounts of coffee tend to have a lower risk of lethal prostate cancer." And then there's the issue of what we failed to measure.
The second major problem with the validity of cohort studies is the effect of confounding variables. Because the groups are not randomized, the population with the exposure of interest may also have another exposure that associates with the outcome. The classic example is the apparent positive association between coffee consumption and lung cancer. This association is entirely explained by the fact that coffee drinkers also tend to smoke. In the coffee and prostate cancer study, the investigators made a Herculean effort to control for confounding by adding numerous potential confounders into their risk model. These included race, BMI, smoking, multivitamin use, PSA test history, and many more, for a total of seventeen variables. While adjusting for more and more confounders does enhance the validity of the association, remember that this type of manipulation is limited to "prostate cancer risk factors previously identified in this cohort and in other studies." We cannot know what we have not measured, and it is always possible that there is at least one unknown variable that confounds our association of interest.
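To make the coffee and lung cancer example concrete, here is a minimal sketch in Python. The counts are entirely made up for illustration and do not come from the HPFS or any real cohort. I have rigged them so that coffee has no effect on lung cancer within smokers or within non-smokers, but coffee drinkers are far more likely to smoke. The sketch compares the crude risk ratio with the stratum-specific ratios and a Mantel-Haenszel pooled ratio, which is one standard way to adjust for a single measured confounder (it is not the multivariable risk model the investigators used).

```python
# Hypothetical cohort counts, invented purely for illustration.
# Each stratum holds:
#   (cases among coffee drinkers, number of coffee drinkers,
#    cases among non-drinkers,    number of non-drinkers)
strata = {
    "smokers": (80, 800, 20, 200),
    "non-smokers": (2, 200, 8, 800),
}


def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def crude_risk_ratio(strata):
    """Ignore the confounder: pool all strata into one 2x2 table."""
    totals = [sum(column) for column in zip(*strata.values())]
    return risk_ratio(*totals)


def mantel_haenszel_risk_ratio(strata):
    """Pooled risk ratio across strata using Mantel-Haenszel weights."""
    numerator = 0.0
    denominator = 0.0
    for cases_exp, n_exp, cases_unexp, n_unexp in strata.values():
        stratum_total = n_exp + n_unexp
        numerator += cases_exp * n_unexp / stratum_total
        denominator += cases_unexp * n_exp / stratum_total
    return numerator / denominator


crude_rr = crude_risk_ratio(strata)
stratum_rrs = {name: risk_ratio(*counts) for name, counts in strata.items()}
adjusted_rr = mantel_haenszel_risk_ratio(strata)

print(f"Crude risk ratio (smoking ignored): {crude_rr:.2f}")
for name, rr in stratum_rrs.items():
    print(f"Risk ratio among {name}: {rr:.2f}")
print(f"Mantel-Haenszel adjusted risk ratio: {adjusted_rr:.2f}")
```

With these invented numbers, the crude comparison makes coffee drinkers look almost three times as likely to develop lung cancer, while the ratio within each smoking stratum, and the pooled ratio, is 1. The catch is the same one described above: stratifying only works for confounders that were actually measured.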
For what it's worth, I have a soft spot for cohort studies. The idea of a "natural experiment" is somehow quaint and very appealing. They offer a lot in terms of validity over other types of observational studies, but they always have important flaws, namely selection bias and potential confounding variables. At the risk of pessimism, good science requires that we highlight the flaws of each experiment. Read the headlines (and preferably the whole article!) with a critical eye. Take note of the study design, and always ask how it fits into the greater scheme of the evidence. As the authors concluded, "it is premature to recommend that men increase coffee intake to reduce advanced prostate cancer risk based on this single study." Given the nature of this study, I will remain skeptical that coffee is therapeutic, although I am more confident that it is harmless. But keep in mind that my opinion is heavily biased, as I've invested too much into my habit to stop any time soon.