Assessing Validity in Systematic Reviews (Internal and External)


The validity of results and conclusions is a critical aspect of a systematic review. A systematic review that doesn’t answer a valid question or hasn’t used valid methods won’t produce a valid result. Its findings then can’t be generalized to a larger population, which leaves it with little impact or value in the literature.

So then, how can you be sure that your systematic review has an acceptable level of validity? Look at it from the perspective of both external validity and internal validity.

What is validity and why is it important for systematic reviews?

Validity for systematic reviews is how trustworthy the review’s conclusions are for a reader.

Systematic reviews compile different studies and present a summary of a range of findings.

It’s strength in numbers – and this strength is why they’re at the top of the evidence pyramid, the strongest form of evidence.

Many health organizations, for instance, use evidence from systematic reviews, especially Cochrane reviews, to draft practice guidelines. This is precisely why your systematic review must have trustworthy conclusions. These will give it impact and value, which is why you spent all that time on it.

Validity measures this trustworthiness. It depends on the strength of your review methodology. External validity and internal validity are the two main means of evaluation, so let’s look at each.

External validity in systematic reviews

External validity is how generalizable the results of a systematic review are. Can you generalize the results to populations not included in your systematic review? If “yes,” then you’ve achieved good external validity.

If you’re a doctor and read a systematic review that found a particular drug effective, you may wonder if you can use that drug to treat your patients. For example, this systematic review concluded antidepressants worked better than placebo in adults with major depressive disorder. But…

  • Can the results of this study also be applied to older patients with major depressive disorder?
  • How about for adolescents or certain cultures?
  • Is the treatment regimen self-manageable?

Various factors will impact the external validity. The main ones are…

Sample size

Sampling is key. The results of a systematic review with a larger sample size will typically be more generalizable than those with a smaller sample size.

This meta-analysis estimated how sample size affected treatment effects when different studies were pooled together. The authors found the treatment effects were 32% larger in studies with a smaller sample size vs. a larger one. In other words, trials with smaller sample sizes can produce more exaggerated results than larger trials, so their findings are less likely to reflect the greater population.

Using a smaller sample size for your systematic review will lower its generalizability (and thus, external validity). The simple takeaway is:

Include as many studies as possible.

This will improve the external validity of your work.

Participant characteristics

Let’s say the conclusions of your systematic review are restricted to a specific sex, age, geographic region, socioeconomic profile, etc. This limits generalizability to participants with a different set of characteristics.

For example, this review concluded that a mean of 27.22% of medical students in China had anxiety (albeit with a range of 8.54% to 88.30%). That’s a key finding from 21 studies.

But what about medical students from a different country?

Or, for that matter, what about Chinese students not studying medicine? Will a similar percentage of them suffer from anxiety?

These questions don’t decrease the value of the findings. The review provides work to build on. But technically, its external validity faces some limitations.

Study setting

Let’s say that your systematic review examined a particular risk factor for a disease in a specific setting.

Can you extrapolate those findings to other settings?

For example, this study evaluated different determinants of population health in urban settings. The authors found that income, education, air quality, occupation status, mobility, and smoking habits impacted morbidity and mortality in different urban settings.

Are the same findings valid in other urban settings in a different country? Are the findings adaptable to rural settings?


Comparators

What are you comparing your treatment of interest with in your systematic review?

If you compare a new treatment with a placebo, you may find a vast difference in treatment effects. But if you compare a new treatment with another active treatment, the difference in effects may be less prominent. See this systematic review and meta-analysis of treatments for hypertrophic scar and keloid. This review examined two treatments and a placebo to increase its external validity.

The comparator you choose for your systematic review should ideally be a close match to real-world practice. This is another way of upping its external validity.

Reporting external validity

Many systematic reviews report internal validity in detail yet overlook external validity. In fact, researchers don’t usually use the term “external validity” itself. Many authors use “generalizability,” “applicability,” “feasibility,” or “interchangeability.” These are essentially different terms for the same thing.

The PRISMA guidelines are (as of this writing) what your systematic reviews should follow. Read this article to learn about navigating PRISMA. But even PRISMA doesn’t insist on external validity as much as internal validity.

Authors usually don’t see the need to stress external validity in systematic reviews for all these reasons. Researchers have pointed this out and suggested the importance of reporting external validity.

Internal validity, which tends to receive greater attention, is just as critical for your systematic review’s overall validity and worth.

Internal validity in systematic reviews

As the name implies, internal validity looks at the inside of the study rather than the external factors. It’s about how strong the study methodology is, and in a systematic review, it’s largely defined by the extent of bias.

Internal validity is easier to measure and achieve than external validity. This owes to the extensive work that’s gone into measuring it. Many organizations, such as the Cochrane Collaboration and the Joanna Briggs Institute, have developed tools for assessing risk of bias (see below). A similar effort hasn’t gone into measuring external validity.

As a systematic reviewer, you must check the methodological quality of the studies in your systematic review and report the extent of different types of bias within them. This feeds directly into your own review’s internal validity.

Selection bias

Selection bias arises from how participants are selected for a trial and allocated to its groups.

If the baseline characteristics of two groups in a study are considerably different, selection bias is likely present.

For example, in a randomized controlled trial (RCT) of a new drug for heart failure, if one group has more diabetic patients than the other, then this group is likely to have lower treatment success.

Non-uniform allocation of participants between the two groups can distort the results.

Strong randomization can reduce selection bias. This is why RCTs are considered the gold standard in evidence generation.

To check selection bias in an RCT in your systematic review, search for words that describe how randomization was done. If the study describes a random number table, sequence generation for randomization, and/or allocation concealment before patients are assigned to the different groups, then the risk of selection bias is probably low.
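As a rough illustration of that keyword check, here is a minimal Python sketch. The pattern list and the function name `screen_methods_text` are assumptions made for this example, not part of any established tool, and a keyword match only flags text for a human reviewer to read. It doesn’t replace a proper risk-of-bias judgment.

```python
import re

# Hypothetical keyword patterns; adapt them to your own review protocol.
RANDOMIZATION_PATTERNS = {
    "sequence generation": (
        r"random number table|computer[- ]generated|"
        r"sequence generation|block randomi[sz]ation"
    ),
    "allocation concealment": (
        r"allocation concealment|sealed (opaque )?envelopes?|"
        r"central(ly)? randomi[sz]"
    ),
}


def screen_methods_text(text):
    """Return which randomization features a methods section mentions."""
    lowered = text.lower()
    return {
        feature: bool(re.search(pattern, lowered))
        for feature, pattern in RANDOMIZATION_PATTERNS.items()
    }


# Made-up methods text, for illustration only.
methods = (
    "Participants were assigned using a computer-generated sequence. "
    "Allocation was concealed in sealed opaque envelopes."
)
flags = screen_methods_text(methods)
print(flags)
```

A study where both flags come back `False` isn’t necessarily biased. It simply hasn’t reported its randomization in terms the patterns recognize, so it needs a closer read.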

This neurological RCT is a good example of strong randomization, despite a relatively small population (n=35).

Performance bias

Performance bias arises when the treatment groups in a study don’t receive a similar level of care. A varying level of care between the groups can bias the results. Researchers often blind or mask the study participants and caregivers to reduce performance bias. An RCT with no details about blinding or masking may well suffer from performance bias.

Blinding, however, isn’t always possible, so a non-blinded study may still have worth and still warrant inclusion in your review.

For example, a cancer drug trial may compare one drug given orally and another injected drug. Or a surgical technique trial may compare surgery with non-invasive treatment.

In both situations, blinding is not practical. The existing bias should be acknowledged in such cases.

Detection bias

Detection bias can occur if the outcome assessors are aware of the intervention the groups received. If an RCT mentions that the outcome assessors were blinded or masked, this suggests a low risk of detection bias.

Blinding of outcome assessors is especially important when an RCT measures subjective outcomes. This study, for instance, assessed postoperative pain after gallbladder surgery. The postoperative dressing was identical so that the patients, who reported their own pain, would be unaware of (blinded to) the treatment received.

Attrition bias

Attrition bias results from incomplete study data.

Over the course of an RCT, patients may be excluded from analysis or may not return for follow-up. Both result in attrition. All RCTs have some attrition, but if the attrition rate is considerably different between the study groups, the results become skewed.
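To make the comparison between groups concrete, here is a small Python sketch that computes the attrition rate in each arm and the gap between them. The participant counts are hypothetical, and the function names are chosen for this example. What counts as a “considerable” difference is a judgment your review protocol needs to define; the code doesn’t assume a threshold.

```python
def attrition_rate(randomized, analyzed):
    """Share of randomized participants missing from the final analysis."""
    if randomized <= 0:
        raise ValueError("randomized must be positive")
    return (randomized - analyzed) / randomized


def differential_attrition(arm_a, arm_b):
    """Absolute difference in attrition rate between two arms.

    Each arm is a (randomized, analyzed) tuple.
    """
    return abs(attrition_rate(*arm_a) - attrition_rate(*arm_b))


# Hypothetical trial: 100 randomized per arm, 90 and 70 analyzed.
treatment = (100, 90)
control = (100, 70)
gap = differential_attrition(treatment, control)

print(f"Treatment attrition: {attrition_rate(*treatment):.0%}")
print(f"Control attrition: {attrition_rate(*control):.0%}")
print(f"Differential attrition: {gap:.0%}")
```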

An intention-to-treat analysis reduces attrition bias, whereas a per-protocol analysis is usually more prone to it. If a study uses both analyses and finds the results are similar, the attrition bias is considered low.

For example, this RCT of a surgical procedure found that the intention-to-treat analysis and per-protocol analysis were similar. This suggests low attrition bias.
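The comparison of the two analyses can be sketched in Python as follows. All counts are hypothetical, and the intention-to-treat figures rest on one simplifying assumption: participants lost to follow-up are counted as not having a successful outcome. Real trials may handle missing data differently, so treat this as an illustration of the idea rather than a standard method.

```python
def risk(events, total):
    """Proportion of participants with the outcome."""
    return events / total


def risk_ratio(events_t, total_t, events_c, total_c):
    """Risk in the treatment arm divided by risk in the control arm."""
    return risk(events_t, total_t) / risk(events_c, total_c)


# Hypothetical counts of participants with a successful outcome.
# Intention-to-treat: everyone randomized stays in the denominator
# (assumption: dropouts are counted as not successful).
itt = risk_ratio(events_t=60, total_t=100, events_c=40, total_c=100)

# Per protocol: only participants who completed the assigned treatment.
pp = risk_ratio(events_t=60, total_t=80, events_c=40, total_c=70)

relative_gap = abs(itt - pp) / itt

print(f"ITT risk ratio: {itt:.2f}")
print(f"Per-protocol risk ratio: {pp:.2f}")
print(f"Relative gap: {relative_gap:.1%}")
```

The smaller the gap between the two estimates, the less the dropouts appear to have influenced the result. How small is small enough remains a reviewer’s judgment.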

If an RCT included in your systematic review hasn’t performed an intention-to-treat analysis, it’s more likely to suffer from attrition bias.

Reporting bias

When there are remarkable differences between reported and unreported findings in an RCT, that’s usually a case of reporting bias.

This bias can also arise when study authors report only statistically significant results, leaving out the non-significant ones. Many journals encourage authors to share their data sets to overcome this bias.

For an expert look at risk of bias in systematic reviews, see this article.

Calculating and reporting internal validity/bias

As bias can hurt your review’s internal validity, you must identify the different types of bias present in the studies you include.

Many tools now exist to help with this. Which tool you use depends on the nature of the studies in your review.
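Whichever tool you use, you’ll typically end up with a judgment per bias type for each study. As a minimal sketch of how those judgments might be tallied, the Python below counts them across the five bias types covered above. The study names, the judgments, and the “low”/“high”/“unclear” labels are all made up for illustration and aren’t taken from any specific tool.

```python
from collections import Counter

DOMAINS = ["selection", "performance", "detection", "attrition", "reporting"]

# Hypothetical judgments for three made-up studies.
judgments = {
    "Study A": {"selection": "low", "performance": "high", "detection": "low",
                "attrition": "low", "reporting": "unclear"},
    "Study B": {"selection": "low", "performance": "high", "detection": "unclear",
                "attrition": "high", "reporting": "low"},
    "Study C": {"selection": "unclear", "performance": "low", "detection": "low",
                "attrition": "low", "reporting": "low"},
}


def summarize_by_domain(judgments):
    """Count low/high/unclear judgments for each bias domain."""
    return {
        domain: Counter(study[domain] for study in judgments.values())
        for domain in DOMAINS
    }


summary = summarize_by_domain(judgments)
for domain in DOMAINS:
    counts = summary[domain]
    print(
        f"{domain:<12} low={counts['low']} "
        f"high={counts['high']} unclear={counts['unclear']}"
    )
```

A summary like this shows at a glance which bias types are most common across your included studies, which helps when you describe your review’s internal validity.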

