Satoshi Village: the blog of Daniel Himmelstein

The Cancer Research UK Reassessment of our Lung Cancer versus Elevation Study

We would like to thank Cancer Research UK for their cancer-related advocacy and their coverage of our recent publication. However, we find several aspects of their interpretation troubling.

First, we find it unwise to discount our study because it analyzed counties rather than individuals. As Professor Pearce explains, much of our current understanding of cancer risk arose initially from ecological studies:

Historically, the key area in which epidemiologists have been able to “add value” has been through this population focus, although this lesson has been forgotten by many modern epidemiologists. For example, many of the recent discoveries on the causes of cancer (including dietary factors and colon cancer, hepatitis B and liver cancer, aflatoxins and liver cancer, human papilloma virus and cervical cancer) have their origins, directly or indirectly, in the systematic international comparisons of cancer incidence conducted in the 1950s and 1960s. These suggested hypotheses concerning the possible causes of the international patterns, which were investigated in more depth in further studies. In some instances these hypotheses were consistent with biological knowledge at the time, but in other instances they were new and striking, and might not have been proposed, or investigated further, if the population level analyses had not been done.

Compared to these influential studies of the past, our study benefitted from smaller population units (counties, not states as Ms. Osgun asserts), greater data availability, greater adjustment for potential confounding factors, and modern statistical techniques. Nonetheless, we view our findings as a starting point for further epidemiological and experimental research. In the meantime, we view oxygen-driven tumorigenesis as the most plausible explanation based on established knowledge regarding the underlying biology.

The notion of oxygen as a carcinogen is neither radical nor novel, as evidenced by the 1978 Nature letter titled, “Are physiological oxygen concentrations mutagenic?” (Hint: Nature rarely publishes negative results). Our study alone cites over a dozen studies supporting oxygen’s direct role in carcinogenesis. The evidence spans the spectrum from molecular to cellular to organismal and to clinical. Inspired molecular oxygen results in intracellular formation of reactive oxygen species, which lead to mutations. Recent research found mice that were predisposed to cancer took twice as long to develop tumors when housed in low-oxygen chambers. Finally, oxygen toxicity in the human lung is well documented and childhood cancer increases following neonatal oxygen supplementation.

In addition to biological plausibility, the criticism identifies four other characteristics of strong epidemiological evidence. We believe our study satisfies these criteria:

  1. Accounting for other factors: For lung cancer, we included 11 factors in addition to elevation to adjust for important demographic and cancer-risk factors. These factors include established risk factors (such as smoking, radon, and pollution), lifestyle variables (such as obesity, education, income, and race), and demographics (such as percent male and whether the county is metropolitan). We didn’t include an age variable because our cancer rates were already age-adjusted to the 2000-census population. Furthermore, the inverse association persisted when we performed the analysis separately within each of the ten states, for males and females, and for those over and under 65 years of age. Since the causal factor appeared to affect every county, every state, both sexes, and all ages, pervasive environmental variables were the most likely confounders. Thus, we evaluated seven environmental variables such as UVB exposure, temperature, and precipitation. All seven alternative variables produced models over one billion times less likely than the elevation-including model.

  2. Large numbers: The probability of our observed association between lung cancer and elevation occurring by chance is less than one in a quadrillion. After filtering counties to only those with high-quality data, 253 counties remained for lung cancer. In the year 2000, these 253 counties had a combined population of 59,659,978, hardly a small number.

  3. Unbiased: We minimized potential biases from a variety of angles. Counties with high migration rates or high Native American composition were filtered out. Counties with populations below 10,000, whose measurements had greater margins of error, were excluded, and the remaining counties were weighted by population in the analysis. Finally, we deliberately chose the Western United States to minimize extraneous variation, which could cause bias, while capturing the variation in elevation of this mountainous region.

  4. Consistency: By definition, no breakthrough finding in science has been reproduced at the moment of its discovery. We encourage replication in different regions and with different study designs. Our entire dataset, analysis pipeline, results, and codebase are public to make this task easier for others.
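A minimal sketch of the population filter and weighting described in point 3, using made-up counties and a single predictor rather than the full multivariable model. The `County` class, the `weighted_slope` helper, and every number in the toy data are illustrative assumptions and are not taken from the study's codebase.

```python
from dataclasses import dataclass

# Hypothetical toy data: none of these counties or values come from the study.
@dataclass
class County:
    name: str
    population: int
    elevation_km: float
    lung_cancer_rate: float  # age-adjusted cases per 100,000

MIN_POPULATION = 10_000  # counties below this threshold are excluded


def weighted_slope(counties):
    """Return the population-weighted least-squares slope of
    lung cancer rate on elevation, plus the number of counties kept."""
    kept = [c for c in counties if c.population >= MIN_POPULATION]
    total = sum(c.population for c in kept)
    # Population-weighted means of predictor and outcome.
    mean_x = sum(c.population * c.elevation_km for c in kept) / total
    mean_y = sum(c.population * c.lung_cancer_rate for c in kept) / total
    # Weighted covariance and variance (normalization cancels in the ratio).
    cov = sum(
        c.population * (c.elevation_km - mean_x) * (c.lung_cancer_rate - mean_y)
        for c in kept
    )
    var = sum(c.population * (c.elevation_km - mean_x) ** 2 for c in kept)
    return cov / var, len(kept)


counties = [
    County("A", 500_000, 0.1, 59.3),
    County("B", 50_000, 1.0, 53.0),
    County("C", 20_000, 2.0, 46.0),
    County("D", 4_000, 2.5, 90.0),  # too small: noisy rate, filtered out
]

slope, n_kept = weighted_slope(counties)
print(f"kept {n_kept} counties; slope = {slope:.2f} cases per 100,000 per km")
```

In this toy example the small county D has an extreme rate that would distort an unweighted, unfiltered fit; the population threshold removes it, and the weights let the larger, better-measured counties dominate the estimate.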

The criticism mentions the lag time between exposure to cigarette smoke and lung cancer as a potential confounder and considers the recency of our smoking data as the “likely” culprit behind elevation’s association with lung cancer. Let us dismantle this argument using data. To assess the change in smoking, we collected county smoking prevalences for 1996 and 2012. For the counties included in our lung cancer analysis, county smoking prevalence in 1996 was tightly linked to smoking prevalence in 2012 (first panel below). In other words, county smoking prevalences are relatively stable over time, and had older smoking data been available to us, the effect on our findings would likely have been minimal. Next, we found that elevation was only slightly correlated with change in smoking rates over the 16-year period (second panel). For the lag time issue to present a serious omitted-variable bias, elevation and change in smoking would have to be strongly correlated. Finally, we evaluated the correlation between change in smoking and each of the 17 variables included in our lung cancer models. We found that six variables are more correlated with change in smoking than elevation is (third panel). In other words, even if the lag time issue were capable of creating a spurious association between elevation and lung cancer, any of the six variables we accounted for would prevent this outcome.

The lag time between smoking exposure and cancer is unlikely to confound the elevation association.
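The two checks behind the first and second panels can be sketched as follows. This is only an illustration on invented numbers: the `pearson` helper and the five toy counties are assumptions, and the published analysis may have used a different correlation measure or weighting.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical county-level values, NOT data from the study.
smoking_1996 = [0.15, 0.20, 0.25, 0.30, 0.22]  # smoking prevalence in 1996
smoking_2012 = [0.12, 0.16, 0.21, 0.25, 0.18]  # smoking prevalence in 2012
elevation_km = [0.2, 1.5, 0.6, 0.3, 2.0]

# Change in prevalence over the 16-year period.
change = [late - early for early, late in zip(smoking_1996, smoking_2012)]

# Check 1: is prevalence stable over time (1996 vs 2012)?
stability = pearson(smoking_1996, smoking_2012)

# Check 2: is elevation related to the change in smoking?
elevation_vs_change = pearson(elevation_km, change)

print(f"1996 vs 2012 prevalence: r = {stability:.3f}")
print(f"elevation vs change:     r = {elevation_vs_change:.3f}")
```

A high first correlation together with a near-zero second one is the pattern under which outdated smoking data would be unlikely to manufacture an elevation association.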

We would like to conclude by reiterating a few points from the paper. First, the elevation association was specific to lung cancer and did not extend to breast, prostate, or colorectal cancer, providing evidence for an inhaled carcinogen. Second, the magnitude of the association was large, second only to smoking and responsible for a 12.7% decrease in lung cancer incidence for a 1,000 meter rise. The likelihood of spurious associations, and especially ecological fallacy, decreases with effect size. Third, we took extensive measures to avoid confounding, such as the stratification and subgrouping analyses, which Ms. Osgun fails to address. Fourth, our publication is open access with a public peer review history, and we released our entire code, analysis, and results. We took these measures to allow independent verification and evidence-based, data-driven reanalysis rather than speculation.


Daniel Himmelstein & Kamen Simeonov