Finding Good Science
How to Critically Appraise Scientific Literature
BY CARINA DOYLE, MARK R. STENZEL, NANCY E. JOHNSON, AND JOANNA GREIG
Decision-making in occupational health needs to rely on the best available information. Good science must be the foundation for the industrial hygienist’s contributions in a range of areas, such as effective hazard communication, building a business case for the implementation of controls, prioritizing hazard control measures, and evaluating the magnitude of risk. This article offers tips on critically appraising scientific literature to improve the quality of information that influences your decisions.

The amount of relevant published information has increased steadily over the last 20 years. This trend is depicted in Figure 1, which shows the number of studies categorized under the subject “Occupational Health” in the U.S. National Library of Medicine, the world’s largest biomedical library. In most scientific journals, peer review processes consider whether the science is valid, original, and important. While some studies that pass through peer review may still lack scientific validity, most published research is critically appraised before publication. However, even the most rigorous peer review process at the highest-quality journal is unlikely to produce a paper with both perfect methodology and total applicability to IH professionals’ specific concerns.
Figure 1. Number of studies categorized under the Medical Subject Heading (MeSH) “Occupational Health” in the U.S. National Library of Medicine, 1990–2020.
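For readers who want to reproduce a trend like the one in Figure 1, the counts can be retrieved from PubMed programmatically. Below is a minimal Python sketch, assuming NCBI’s public E-utilities esearch endpoint; the MeSH query mirrors the search cited in the Resources section, the year range matches the figure, and the politeness delay reflects NCBI’s published rate guidance.

```python
# Sketch: count PubMed records under the MeSH heading "Occupational Health"
# per year, via NCBI's public E-utilities (esearch) endpoint.
import time
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def count_for_year(year: int) -> int:
    """Return the number of PubMed records matching the MeSH term in a year."""
    params = {
        "db": "pubmed",
        "term": f"occupational health[MeSH Terms] AND {year}[pdat]",
        "retmode": "json",
        "retmax": 0,  # we only need the count, not the record IDs
    }
    resp = requests.get(ESEARCH, params=params, timeout=30)
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

for year in range(1990, 2021):
    print(year, count_for_year(year))
    time.sleep(0.4)  # stay under NCBI's ~3 requests/second courtesy limit
```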
Scientific research often has limitations, and our interpretation of research findings must be informed by an understanding of and appreciation for these limitations. Assessing the scientific validity of a study, evaluating its practical relevance, and understanding its results are important skills to master to support evidence-informed decision-making.
EVIDENCE FROM STUDIES IN OCCUPATIONAL SETTINGS
Our knowledge of the effects of occupational hazards can be enhanced by studies conducted in occupational settings. Consider the example of formaldehyde, an agent with multiple adverse health endpoints that has undergone extensive study.
Through large cohort studies in occupational settings, including embalmers and users of formaldehyde resins or molding compounds, excesses of lymphohematopoietic malignancies associated with peak exposures (greater than 2 ppm) have been observed, as explained in a paper that appeared in the May 2009 issue of the Journal of the National Cancer Institute. From a toxicology perspective, it is not clear how formaldehyde exposure would be associated with excess lymphohematopoietic malignancies, because the body naturally produces significant quantities of formaldehyde. Research findings point to potentially important differences between the formaldehyde encountered in occupational settings, where workers are typically exposed to aqueous solutions of 30–50 percent formaldehyde stabilized with alcohol, and the formaldehyde that the body naturally produces. Could the alcohol, or the hemiacetal that forms through reaction of the formaldehyde and alcohol, contribute to the growth of tumors? Would studies on paraformaldehyde, a solid source of formaldehyde that does not contain the stabilizers, reveal similar adverse health effects?
It is now well recognized that lymphohematopoietic malignancies have been observed in workers exposed to formaldehyde. While uncertainty remains regarding the mechanism that leads to these malignancies, this example illustrates the benefits of evaluating determinants of injury and health in the workplace environment.
IDENTIFYING SUPPORTING EVIDENCE: RELEVANCE
Appraisal of research papers begins with identification of the hypothesis. Often found at the end of the introduction, the hypothesis may be presented as a series of questions posed by the investigators.
When searching for supporting evidence for your evaluation, one of the first steps is to consider whether the question addressed by the study matches the question you wish to answer. This seemingly straightforward assessment sometimes requires careful consideration. Questions addressed by published studies tend to have several components. One component is often a factor that might affect a health outcome, such as an exposure to a hazardous agent or an intervention. Other components typically include the population under study, the outcome being evaluated, and, when applicable, the reference or control group strategy.
There could be important differences between your target population and that which the researchers set out to study, such as differences in socioeconomic status or industry sector. A strategy for preventing workplace musculoskeletal disorders may be effective in one population but not in another.
Ensuring that the question addressed by the study matches yours prevents you from applying the information out of context. Inappropriate application of study findings can lead to decisions that are not supported by scientific research. Consider, for example, a study that evaluates the effects of a low-dose exposure to a hazardous agent. Concluding that the hazardous agent does not lead to adverse health outcomes may be erroneous when extrapolating from the low doses discussed in the study to the peak exposures observed in your workplace. Conversely, extrapolating findings from highly exposed populations may lead to undue concern when exposure levels encountered in your workplace are much lower.
A timeframe is often implicit in the study question, but it may be helpful to identify it explicitly when dissecting the question. Consider whether the timeframe after which the outcome measures (for example, the adverse health effects) were observed aligns with your question to determine whether the science appropriately supports your decision-making process.
IDENTIFYING SUPPORTING EVIDENCE: STUDY DESIGN
Reviews can be thought of as secondary research on individual studies. Critical appraisal of secondary research allows you to differentiate between a well-done systematic review and a subjective summary paper. A key element of systematic reviews is an assessment of heterogeneity between study results. Statistical tests for heterogeneity, although often limited in power, provide a means of evaluating whether the results from different studies are likely to reflect a single underlying effect. Where differences between study results are observed, readers should consider whether the variations are cause for concern. This is particularly important when assessing reviews that combine results of observational studies, which may be more prone to bias and confounding than experimental studies.
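The fixed-effect calculations behind a basic heterogeneity assessment are short enough to sketch. The following Python example computes Cochran’s Q and the I² statistic from study-level effect estimates; the input values are invented for illustration, not drawn from any study discussed here.

```python
# Sketch: Cochran's Q test and the I^2 statistic for between-study
# heterogeneity, given each study's effect estimate (e.g., log odds ratio)
# and its within-study variance. Illustrative values only.
import numpy as np
from scipy import stats

effects = np.array([0.05, 0.80, 0.20, 1.10])    # hypothetical log odds ratios
variances = np.array([0.03, 0.06, 0.04, 0.10])  # hypothetical variances

weights = 1.0 / variances                              # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)   # fixed-effect pooled estimate

q = np.sum(weights * (effects - pooled) ** 2)   # Cochran's Q
df = len(effects) - 1
p_value = stats.chi2.sf(q, df)                  # Q ~ chi-squared under homogeneity
i_squared = max(0.0, (q - df) / q) * 100        # % of variation beyond chance

print(f"Q = {q:.2f} (df = {df}), p = {p_value:.4f}, I^2 = {i_squared:.0f}%")
```

A small p-value and a high I² suggest the studies are not estimating a single underlying effect, which is exactly the “cause for concern” the paragraph above asks readers to weigh.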
A systematic review can be evaluated for potential publication bias, a distortion of the scientific record caused by the greater likelihood that studies with statistically significant findings will be published. Effects of publication bias may also be observed if multiple publications of a single study are included in the same review.
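One common, though imperfect, quantitative screen for publication bias is a test of funnel-plot asymmetry such as Egger’s regression test; it is a standard meta-analytic method, not one prescribed by the sources above. The Python sketch below uses hypothetical effect estimates and standard errors.

```python
# Sketch: Egger's regression test for funnel-plot asymmetry, a common
# (imperfect) check for publication bias. Values are hypothetical.
import numpy as np
import statsmodels.api as sm

effects = np.array([0.10, 0.35, 0.22, 0.60, 0.45])  # hypothetical log odds ratios
se = np.array([0.20, 0.30, 0.22, 0.35, 0.28])       # hypothetical standard errors

precision = 1.0 / se       # large studies have high precision
snd = effects / se         # standardized effect estimates

model = sm.OLS(snd, sm.add_constant(precision)).fit()
# An intercept far from zero suggests small studies report systematically
# different effects than large ones -- a pattern consistent with
# publication bias.
print(f"Egger intercept = {model.params[0]:.2f}, p = {model.pvalues[0]:.3f}")
```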
Primary research may be either experimental, where researchers are actively involved (as in experimental trials), or observational, where researchers are passively involved (as in cohort and case-control studies). For ethical reasons, studies that evaluate the effects of exposures to potentially harmful agents are generally observational studies. The basic study design can guide confidence in study conclusions. Some study designs are more prone to bias and confounding than others; see Figure 2 for a hierarchy of study designs adapted from the Oxford Centre for Evidence-Based Medicine. Where bias or confounding is suspected, our overall confidence in the reported results decreases, though the study may still be useful as part of our evolving understanding of the science.
Figure 2. Hierarchy of study types likely to provide the best evidence for questions about risk factors and causation, or prediction and prognosis. Adapted from the Oxford Centre for Evidence-Based Medicine.
ASSESSING METHODOLOGY
When reading a paper, a quick way to get to the nuts and bolts of the research is to check the tables and figures within the results section. With the trained eye of an industrial hygienist, you should begin to formulate questions for the authors and then look for answers in the introduction, methods, and discussion sections.
An evaluation of the methodological details within a scientific study can increase or decrease confidence in the quality of the data, analysis, and conclusions. Considerations should include a critical evaluation of participant selection, the setup and management of study groups, and the method of ascertaining the outcome measure (for example, disease or case status).
Participant selection. Consider whether the subjects in the study are representative of the target population. Bias may result if the sample of participants included in the study yields a different result than what would have been obtained if the entire population were enrolled. Consider whether those selected to participate in the study have higher or lower exposures, or are more or less sick, than those not participating.
A classic example of biased participant selection is the early vinyl chloride epidemiology commissioned by manufacturers. As explained in Deceit and Denial: The Deadly Politics of Industrial Pollution, the cohort under examination excluded some workers with the longest exposure to the monomer while including younger workers with marginal exposure. As a result, the manufacturers were able to tout an overall mortality rate 75 percent that of the general population (the “healthy worker effect”) and therefore “no excess” cancer deaths, despite increasing liver and brain cancer incidence with duration of exposure.
Establishing study groups. In comparative evaluations, the main difference between the groups throughout the course of the study should be the factor being tested or evaluated. For example, in a study that evaluates the effects of noise on hearing loss, the magnitude of the noise exposure should be the only difference between the exposed group and the reference group. Observational studies, which do not benefit from randomizing participants to groups, rely on how the groups are matched to mitigate the potential effects of confounding variables. Statistical adjustments are often used to control for confounding and approximate comparability between the groups.
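To make statistical adjustment concrete, here is a minimal Python sketch, assuming the statsmodels library and simulated data, that fits a logistic regression of a binary hearing-loss outcome on noise exposure while adjusting for age as a confounder. The exponentiated exposure coefficient is an age-adjusted odds ratio.

```python
# Sketch: adjusting for a confounder with logistic regression, as
# observational studies often do in place of randomization.
# Data are simulated: hearing loss (0/1) vs. noise exposure, adjusted for age.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 65, n)                 # confounder
noise = rng.binomial(1, 0.4, n)              # exposed vs. reference group
logit = -6 + 0.08 * age + 0.9 * noise        # built-in "true" effects
hearing_loss = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([noise, age]))  # columns: const, noise, age
fit = sm.Logit(hearing_loss, X).fit(disp=0)
# exp(coefficient) on the exposure term is the age-adjusted odds ratio
print(f"Adjusted OR for noise exposure: {np.exp(fit.params[1]):.2f}")
```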
A potential problem for studies that assess the effects of an occupational exposure is incorrect categorization of participants by exposure status. Where applicable, consider the source and the quality of participants’ work history, the process of identifying exposure groups or job exposure matrices, and the process of estimating exposures (for example, quantitative measurements, modeling techniques, professional judgment, or self-reported survey data). If quantitative measurements have been obtained, consider the sampling strategy, the number of measurements, the analytical methods used, and whether changes in methods over time may be important to the exposure assessment effort.
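A job-exposure matrix is, at its simplest, a lookup from job characteristics to an exposure category. The toy Python sketch below is purely hypothetical, but it illustrates why the details flagged above matter: the same job title can map to a different category in a different era, and unmatched work histories must be handled explicitly.

```python
# Sketch: a toy job-exposure matrix (JEM) assigning workers to exposure
# categories from job title and era. All entries are hypothetical.
JEM = {
    # (job title, decade) -> qualitative exposure category
    ("press operator", 1990): "high",
    ("press operator", 2000): "medium",   # controls improved over time
    ("lab technician", 1990): "medium",
    ("lab technician", 2000): "low",
    ("office clerk", 1990): "low",
    ("office clerk", 2000): "low",
}

def exposure_category(job: str, decade: int) -> str:
    """Look up the exposure category; flag histories the JEM cannot classify."""
    return JEM.get((job, decade), "unclassified")

work_history = [("press operator", 1990), ("lab technician", 2000)]
print([exposure_category(job, decade) for job, decade in work_history])
```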
Maintaining the study groups. If study conditions are likely to change, the groups may not remain comparable. In settings where exposures to occupational hazards may change over time, consider how participants’ exposures may be affected throughout the study’s follow-up period. Could these changes introduce confounding issues that affect interpretation?
When participants are followed over time to observe the effects of an exposure or treatment, the duration of the study must be sufficient to observe relevant health effects. Researchers may lose track of some participants during the follow-up period, possibly due to changes in contact information. If losses are more likely in one group than the other, bias may be introduced. For example, could there have been a greater loss of diseased participants in the exposed group compared with the non-exposed group?
Outcome ascertainment. To establish the reliability and validity of the outcome measures, the approach to ascertaining them should be evaluated for error and bias. To minimize error, the same method of determining the measurements—the same equipment, the same appraisers, and so on—should be employed for all study participants. Bias may be introduced during the outcome ascertainment step, resulting in a systematic distortion of the measurements. When bias is present, measurements may deviate toward favoring the study hypothesis. Subjective measures, such as self-reported feedback on a questionnaire or interview responses from participants with knowledge of their exposure status, are more prone to bias than objective measures, such as biomarkers of disease. When evaluating a study, consider whether the outcome measure is objective. The validity of a study is enhanced if the outcome was determined by an independent assessor who was not aware of which group the participants were in.
INTERPRETING RESULTS
The results section of a primary research manuscript is the core of the publication. All other sections are built around the collected and analyzed data. It can be helpful to scrutinize each table and figure: do they report raw data? If the data have been transformed, does the study explain how and why? What statistical tests or methods are applied?
Studies publicized in news outlets bring attention to the study team and sponsor, which can shape how the data and the study narrative are presented. The figures and tables themselves, rather than the text that describes them, may offer a clearer picture of the scientific findings.
If the findings appear to show an effect, the effect may be real or due to chance. Statistics are generally used in two ways to evaluate whether the results could be due to chance: hypothesis testing (p-values) and estimation (confidence intervals). Both quantify how compatible the observed results are with chance alone.
A p-value below the conventional 0.05 threshold means that, if there were truly no effect, results at least as extreme as those observed would occur less than 5 percent of the time. The 95 percent confidence interval (CI) expresses the range of values that is likely to contain the true effect. A tighter range is most often associated with a smaller standard deviation, a larger sample size, or both; a 99 percent CI would be wider than a 95 percent CI. If the interval contains the value that indicates no effect (that is, one for a ratio or zero for a difference), the association is not statistically significant. For example, an interval of 0.38–1.8 for an odds ratio or relative risk would indicate that the association is not significant. Conversely, when the interval does not contain the value that indicates no effect, the result is statistically significant at the corresponding level.
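These calculations take only a few lines of code. The Python sketch below computes an odds ratio and its 95 percent CI from a hypothetical 2x2 table using the standard log-odds-ratio (Woolf) approximation, then checks whether the interval spans 1, the “no effect” value for a ratio.

```python
# Sketch: an odds ratio and its 95% confidence interval from a hypothetical
# 2x2 table, then a check of whether the interval spans 1 (no effect).
import math

a, b = 18, 82   # exposed:   cases, non-cases (hypothetical counts)
c, d = 22, 78   # unexposed: cases, non-cases

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)   # Woolf's approximation
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
print("Statistically significant:", not (lo <= 1.0 <= hi))
```

With these counts the interval works out to roughly 0.39–1.56, which spans 1: like the 0.38–1.8 example above, the association would not be statistically significant.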
While we must carefully consider whether a study is relevant and valid before deciding to use its results, we must also balance that scrutiny against the need to adopt protective measures based on an evolving body of scientific information. The threshold of evidence you need to support a decision may vary. In the absence of a strong association between exposures and adverse health effects from a well-designed study, a preliminary indication of a possible association may be sufficient for you to err on the side of caution. The industrial hygienist responsible for activities such as hazard communication, and for working with the occupational physician to oversee the medical surveillance program, must stay informed as the science evolves. As the example of formaldehyde demonstrates, establishing the association between exposure and lymphohematopoietic malignancies required comprehensive investigations conducted over many years. Neither the IH nor the physician can go back in time and modify actions that later prove not to have been commensurate with the available risk information.
TOOLS FOR APPRAISAL
Having the tools to identify and critically appraise relevant research is key to effective decision-making. In general, a good-quality study is one designed to prevent the findings from being affected by error, confounding factors, or bias. These ever-present influences do not necessarily negate results or conclusions, but they should inform interpretation of the collected data. Fortunately, a keen eye for analytical processes and measurement techniques is already the industrial hygienist’s area of expertise. Attending to the issues commonly examined in study evaluations can help you gather evidence for increased or decreased confidence in a study’s conclusions.
CARINA DOYLE, MSc, CIH, completed graduate studies in evidence-based healthcare through the University of Oxford. After working in academia in the field of evidence-based medicine, she has since specialized in occupational and environmental health and safety. She currently practices as an industrial hygienist in western Canada. Carina is a registered specialist with AIHA in Exposure Decision Analysis and is chair of AIHA’s Occupational and Environmental Epidemiology Committee.
MARK R. STENZEL, FAIHA, CIH (1978-2018), is the founder and president of Exposure Assessment Applications, LLC, which specializes in human exposure and risk assessments. He worked for 29 years at two Fortune 500 companies in the chemical industry where he managed all aspects of comprehensive industrial hygiene programs focused on the protection of workers’ health. In 2018 he was awarded the AIHA Donald E. Cummings Memorial Award for Outstanding Contributions to the Knowledge and Practice of the Industrial Hygiene Profession. He is a member of AIHA’s Occupational and Environmental Epidemiology Committee.
NANCY E. JOHNSON, DrPH, MSPH, CIH, is an adjunct professor in the department of Environmental Health Sciences at Eastern Kentucky University. She is also secretary of AIHA's Occupational and Environmental Epidemiology Committee.
JOANNA GREIG, PhD, MHS, is an epidemiology consultant in Fairfax, Virginia, and a corresponding member of AIHA’s Occupational and Environmental Epidemiology Committee.
Send feedback to The Synergist.
RESOURCES
AIHA: Glossary of Occupational Hygiene Terms (2000).
Journal of Internal Medicine: “Outcomes Research: What Is It and Why Does It Matter?” (February 2003).
Journal of the National Cancer Institute: “Mortality from Lymphohematopoietic Malignancies Among Workers in Formaldehyde Industries: The National Cancer Institute Cohort” (May 2009).
Oxford Centre for Evidence-Based Medicine: “Glossary.”
Oxford Centre for Evidence-Based Medicine: “Levels of Evidence” (March 2009).
University of California Press: Deceit and Denial: The Deadly Politics of Industrial Pollution (2002).
U.S. National Library of Medicine: PubMed search, occupational health [MeSH Terms].