A problem called Sampling bias

Regardless of the methodology used or the discipline studied, researchers need to ensure that they are using representative samples that reflect the characteristics of the population they are studying. This article explores the concept of sampling bias, its types and causes, and best practices to mitigate its effects.

What is sampling bias?

Sampling bias refers to a situation in which certain individuals or groups in a population are more likely to be included in a sample than others, leading to a biased or unrepresentative sample. This can happen for a variety of reasons, such as non-random sampling methods, self-selection bias, or researcher bias.

In other words, sampling bias can undermine the validity and generalizability of research findings by skewing the sample in favor of certain characteristics or perspectives that may not be representative of the larger population.

Ideally, you would select all of your survey participants at random. In practice, however, random selection can be hard to achieve because of constraints such as cost and respondent availability. Even if your data collection is not randomized, it is crucial to be aware of the potential biases that could be present in your data.

Some examples of sampling bias include:

Volunteer bias: Participants who volunteer to participate in a study might have different characteristics than those who do not volunteer, leading to a non-representative sample.
Non-random sampling: If a researcher only selects participants from certain locations, or only selects participants with certain characteristics, it can lead to a biased sample.
Survivorship bias: This occurs when a sample only includes individuals who have survived or succeeded in a particular situation, leaving out those who did not survive or failed.
Convenience sampling: This type of sampling involves selecting participants who are easily accessible, such as those who happen to be nearby, or those who respond to an online survey, which may not represent the larger population.
Confirmation bias: Researchers might select – unconsciously or deliberately – participants who support their hypothesis or research question, leading to biased results.
Hawthorne effect: Participants may alter their behavior or responses when they know they are being studied or observed. Strictly speaking, this distorts what is measured rather than who is sampled, but it likewise leads to non-representative results.

If you are aware of these biases, you can account for them in the analysis, apply bias corrections, and better understand the population that your data actually represents.
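
As a rough illustration of the volunteer bias described above, the following Python sketch compares a simple random sample with a self-selected one. Everything in it is hypothetical: the population of satisfaction scores, the assumption that the chance of volunteering grows with the score, and the function name simulate_volunteer_bias are made up for this example and do not come from any real study.

```python
import random
import statistics


def simulate_volunteer_bias(population_size=100_000, sample_size=1_000, seed=0):
    """Compare a simple random sample with a self-selected (volunteer) sample."""
    rng = random.Random(seed)

    # Hypothetical population: satisfaction scores clipped to a 0-10 scale.
    population = [
        min(10.0, max(0.0, rng.gauss(5.0, 2.0))) for _ in range(population_size)
    ]

    # Simple random sample: every member has the same chance of selection.
    random_sample = rng.sample(population, sample_size)

    # Volunteer sample: ASSUMED response probability of score / 10,
    # so more satisfied people are more likely to take part.
    volunteers = [score for score in population if rng.random() < score / 10.0]
    volunteer_sample = rng.sample(volunteers, sample_size)

    return {
        "population_mean": statistics.fmean(population),
        "random_sample_mean": statistics.fmean(random_sample),
        "volunteer_sample_mean": statistics.fmean(volunteer_sample),
    }


result = simulate_volunteer_bias()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the random sample mean should land close to the population mean, while the volunteer sample mean drifts upward because satisfied people are over-represented.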

Types of sampling bias

Selection bias: occurs when the sample is not representative of the population.
Measurement bias: occurs when the data collected is inaccurate or incomplete.
Reporting bias: occurs when the respondents provide inaccurate or incomplete information.
Non-response bias: occurs when some members of the population do not respond to the survey, leading to an unrepresentative sample.

Causes of sampling bias

Convenience sampling: selecting a sample based on convenience rather than using a scientific method.
Self-selection bias: only those who volunteer to participate in the survey are included, which may not be representative of the population.
Sampling frame bias: when the sampling frame used to select the sample is not representative of the population.
Survivorship bias: when only members of the population who survived or persisted up to the time of the study can be included, leading to an unrepresentative sample. For example, if researchers can only survey people who are still alive, they receive no input from people who died before the study was conducted.
Sampling bias due to lack of knowledge: failing to recognize the sources of variability that can result in biased estimates.
Sampling bias due to errors in sample administration: using an inappropriate or poorly functioning sampling frame, or losing selected participants who refuse to take part in the study, either of which leads to a biased selection of the sample.

Sampling bias in clinical trials

Clinical trials test the effectiveness of a new treatment or medication on a particular population. They are an essential part of the drug development process and determine whether a treatment is safe and effective before its release to the general public. However, clinical trials are also prone to selection bias.

Selection bias occurs when the sample used for a study is not representative of the population it is meant to represent. In the case of clinical trials, selection bias can occur when participants are either selectively chosen to participate or are self-selected.

Let us say that a pharmaceutical company is conducting a clinical trial to test the efficacy of a new cancer medication. They decide to recruit participants for the study through advertisements in hospitals, clinics, and cancer support groups, as well as through online applications. However, the sample they collect may be biased toward those who are more motivated to participate in a trial or who have a certain type of cancer. This can make it difficult to generalize the results of the study to the larger population.

To minimize selection bias in clinical trials, researchers should define clear inclusion and exclusion criteria in advance and use random selection processes. This helps make the sample of participants more representative of the larger population and reduces bias in the data collected.

Problems due to sampling bias

Sampling bias is problematic because a statistic computed from the sample can be systematically erroneous: it can systematically over- or under-estimate the corresponding parameter in the population. Some degree of sampling bias almost always occurs in practice, as it is practically impossible to ensure perfect randomness in sampling.

If the degree of misrepresentation is small, then the sample can be treated as a reasonable approximation to a random sample. In addition, if the sample does not differ markedly from the population in the quantity being measured, then a biased sample can still yield a reasonable estimate.
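
The difference between a systematic error and ordinary random error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of ages, the biased sampling frame that cannot reach anyone over 65, and the function name mean_error_by_sample_size are all hypothetical.

```python
import random
import statistics


def mean_error_by_sample_size(sample_sizes=(50, 500, 2_000), repeats=100, seed=1):
    """Average estimation error of the mean age, per sample size.

    Returns {n: (error of random samples, error of biased-frame samples)}.
    """
    rng = random.Random(seed)

    # Hypothetical population: ages spread uniformly between 18 and 90.
    population = [rng.uniform(18, 90) for _ in range(50_000)]
    true_mean = statistics.fmean(population)

    # ASSUMED biased sampling frame: nobody older than 65 can be reached.
    reachable = [age for age in population if age <= 65]

    errors = {}
    for n in sample_sizes:
        random_errors = [
            statistics.fmean(rng.sample(population, n)) - true_mean
            for _ in range(repeats)
        ]
        biased_errors = [
            statistics.fmean(rng.sample(reachable, n)) - true_mean
            for _ in range(repeats)
        ]
        errors[n] = (statistics.fmean(random_errors), statistics.fmean(biased_errors))
    return errors


errors = mean_error_by_sample_size()
for n, (random_error, biased_error) in errors.items():
    print(f"n={n}: random {random_error:+.2f}, biased frame {biased_error:+.2f}")
```

In this setup the average error of the random samples should stay near zero at every sample size, whereas the biased frame should underestimate the mean age by roughly the same amount whether 50 or 2,000 people are sampled. Collecting more data from a biased frame does not remove the bias.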

While some individuals might deliberately use a biased sample to produce misleading results, more often, a biased sample simply reflects the difficulty of obtaining a truly representative sample, or the researchers' ignorance of the bias in their process of measurement or analysis.

Extrapolation: beyond the range

In statistics, drawing a conclusion about something beyond the range of the data is called extrapolation. Drawing a conclusion from a biased sample is one form of extrapolation: because the sampling method systematically excludes certain parts of the population under consideration, the inferences only apply to the sampled subpopulation.

Extrapolation also occurs if, for example, an inference based on a sample of university undergraduates is applied to older adults or to adults with only an eighth-grade education. Extrapolation is a common error in applying or interpreting statistics. Sometimes, because of the difficulty or impossibility of obtaining good data, extrapolation is the best we can do, but it always needs to be taken with at least a grain of salt, and often with a large dose of uncertainty.

From science into pseudoscience

As mentioned on Wikipedia, an example of how ignorance of a bias can exist is in the widespread use of a ratio (a.k.a. fold change) as a measure of the difference in biology. Because it is easier to achieve a large ratio with two small numbers with a given difference, and relatively more difficult to achieve a large ratio with two large numbers with a larger difference, large significant differences may be missed when comparing relatively large numeric measurements.
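
The arithmetic behind this point can be checked with a few lines of Python. The measurements and the fold-change cutoff of 2 below are hypothetical values chosen only for illustration; they are not drawn from any real dataset.

```python
def fold_change(baseline, treated):
    """Ratio of the treated value to the baseline value."""
    return treated / baseline


def difference(baseline, treated):
    """Absolute difference between the treated and baseline values."""
    return treated - baseline


# Hypothetical (baseline, treated) measurements in arbitrary units.
pairs = {
    "low_abundance": (2.0, 8.0),
    "high_abundance": (1000.0, 1500.0),
}

FOLD_CHANGE_CUTOFF = 2.0  # assumed cutoff, chosen only for illustration

summary = {}
for name, (baseline, treated) in pairs.items():
    fc = fold_change(baseline, treated)
    summary[name] = {
        "fold_change": fc,
        "difference": difference(baseline, treated),
        "flagged": fc >= FOLD_CHANGE_CUTOFF,
    }
    print(name, summary[name])
```

The low-abundance pair has a fold change of 4.0 with a difference of only 6.0 and is flagged, while the high-abundance pair has a difference of 500.0 but a fold change of 1.5 and is not. A filter based on the ratio alone would therefore miss the larger absolute change.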

Some have called this a ‘demarcation bias’ because the use of a ratio (division) instead of a difference (subtraction) removes the results of the analysis from science into pseudoscience.

Some samples use a biased statistical design, which nevertheless allows the estimation of parameters. The U.S. National Center for Health Statistics, for example, deliberately oversamples minority populations in many of its nationwide surveys in order to gain sufficient precision for estimates within these groups.

These surveys require the use of sample weights to produce proper estimates across all ethnic groups. If certain conditions are met (chiefly that the weights are calculated and used correctly) these samples permit accurate estimation of population parameters.
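
A minimal sketch of how sample weights undo deliberate oversampling is shown below. The strata, their sizes, and the outcome values are hypothetical, and for simplicity every respondent in a group is given that group's average outcome. The weights are simple design weights (population size divided by number sampled), which is one common choice and not necessarily the scheme used by any particular survey.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical strata: the minority group is oversampled on purpose
# (100 of 1,000 people versus 100 of 9,000 in the majority group).
strata = {
    "majority": {"population": 9_000, "sampled": 100, "outcome": 50.0},
    "minority": {"population": 1_000, "sampled": 100, "outcome": 70.0},
}

values, weights = [], []
for info in strata.values():
    # Design weight: how many people in the population each respondent represents.
    design_weight = info["population"] / info["sampled"]
    values.extend([info["outcome"]] * info["sampled"])
    weights.extend([design_weight] * info["sampled"])

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
population_mean = sum(
    s["population"] * s["outcome"] for s in strata.values()
) / sum(s["population"] for s in strata.values())

print(f"unweighted sample mean: {unweighted}")      # 60.0
print(f"weighted sample mean:   {weighted}")        # 52.0
print(f"true population mean:   {population_mean}")  # 52.0
```

The unweighted mean overstates the population value because the oversampled group counts for half of the sample but only a tenth of the population. Applying the weights recovers the population mean in this idealized example.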

Best Practices for Mitigating Sampling Bias

It is crucial to select an appropriate sampling method to ensure the resulting data accurately reflects the studied population.

Random Sampling Techniques: Using random sampling techniques gives every member of the population a known chance of being selected, which increases the probability that the sample is representative and makes it less likely to contain biases.
Sample Size Calculation: Sample size calculation should be done so that adequate power is available to test statistically meaningful hypotheses. A larger sample reduces random sampling error, but it does not by itself correct a biased selection process.
Trend Analysis: Seek alternative data sources and analyze the trends observed in them to reveal parts of the population that may be missing from the sample.
Checking for Bias: Occurrences of bias should be monitored to identify systematic exclusion or over-inclusion of specific data points.
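
The first two practices can be sketched with Python's standard library. This example uses the common margin-of-error formula for estimating a proportion, n = z^2 * p * (1 - p) / e^2, which is a simpler calculation than a full power analysis and ignores any finite population correction. The function names and the sampling frame of participant IDs are hypothetical.

```python
import math
import random
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95, proportion=0.5):
    """Sample size needed to estimate a proportion within a margin of error.

    proportion=0.5 is the most conservative assumption (largest n).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * proportion * (1 - proportion) / margin_of_error**2)


def draw_simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    return random.Random(seed).sample(frame, n)


n = required_sample_size(0.05)
print(n)  # 385 for a 5% margin of error at 95% confidence

# Hypothetical sampling frame listing every member of the population.
frame = [f"participant_{i}" for i in range(10_000)]
sample = draw_simple_random_sample(frame, n, seed=42)
print(sample[:5])
```

The random draw only protects against bias if the frame itself covers the whole population. A list that omits certain groups leads to the sampling frame bias described earlier, however carefully the draw is made.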

Mind the samples

Sampling bias is a significant consideration when conducting research. Regardless of the methodology used or the discipline studied, researchers need to ensure that they are using representative samples that reflect the characteristics of the population they are studying.

When creating research studies, it is crucial to pay close attention to the sample selection process, as well as the methodology used to collect data from the sample. Best practices such as random sampling techniques, sample size calculation, trend analysis, and checking for bias should be used to ensure that research results are valid and reliable, thus making them more likely to affect policy and practice.
