The author has declared that no competing interests exist.

I explore

Qualitative research is becoming an increasingly prominent way to conduct scientific research in business, management, and organization studies [

A general statement from inductive qualitative research about sample size is that the data collection and analysis should continue until the point at which no new codes or concepts emerge [

Most qualitative researchers who aim for theoretical saturation do not rely on probability sampling. Rather, the sampling procedure is purposive [

However, the minimum size of a purposive sample needed to reach theoretical saturation is difficult to estimate [

There are two reasons why the minimum size of a purposive sample deserves more attention. First, theoretical saturation seems to call for a “more is better” sampling approach, as this minimizes the chances of codes being missed. However, the coding process in qualitative research is laborious and time consuming. As such, researchers, especially those with scarce resources, want to avoid oversampling. Some scholars give tentative indications of sample sizes that often lie between 20 and 30 and are usually below 50 [

Second, most research argues that determining whether theoretical saturation has been reached remains at the discretion of the researcher, who uses her or his own judgment and experience [

In this paper I explore the sample size that is required to reach theoretical saturation in various scenarios and I use these insights to formulate guidelines about purposive sampling. Following a simulation approach, I assess experimentally the effects of different population parameters on the minimum sample size. I first generate a series of systematically varying hypothetical populations. For each population, I assess the minimum sample sizes required to reach theoretical saturation for three different sampling scenarios: “random chance,” which is based on probability sampling, “minimal information,” which yields at least one new code per sampling step, and “maximum information,” which yields the largest number of new codes per sampling step. The latter two are purposive sampling scenarios.

The results demonstrate that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, when the mean probability of observing codes is low, the minimal information and maximum information scenarios are much more efficient in reaching theoretical saturation than the random chance scenario. However, the purposive scenarios yield significantly fewer multiple observations per code that can be used to validate the findings.

By using simulations, this study adds to earlier studies that base their sample size estimates on empirical data [

Based on my analyses, I offer a set of guidelines that researchers can use to estimate whether theoretical saturation has been reached. These guidelines help to make more informed choices for sampling and add to the transparency of the research, but are by no means intended as mechanistic rules that reduce the flexibility of the researcher [

In section 2, I discuss the theoretical concepts about purposive sampling. Section 3 describes the simulation, and the results are presented in section 4. In section 5, I draw conclusions, discuss the limitations, and offer recommendations.

I base this section largely on the existing literature on purposive sampling. I also introduce some new ideas that are sometimes implied by the literature, but that were never conceptualized.

Concept | Definition | Symbol |
---|---|---|
Information source | The unit from which information is gathered | |
Population | The total set of information sources that are potentially relevant to answering the research question | |
Sub-population | A subset of information sources that are potentially relevant to answering the research question | |
Sampling step | The number of information sources sampled so far | |
Code | A unique piece of information in the population relevant to the research | _{k} |
Number of codes | The number of unique pieces of information relevant to the research in the population | |
Theoretical saturation | All codes are observed at least once. | |
Probability of reaching theoretical saturation | The probability that each code is observed at least once | _{n} |
Sampling steps to reach theoretical saturation | The number of sampling steps needed to observe each code at least once | _{s} |
Mean probability of observing codes | The mean probability that a code is observed at an information source | |
Repetitive codes | Codes that are observed more than once. | - |
Minimum number of repetitive codes | The minimum number of times that a code needs to be observed | |
Sampling strategy | How the researcher selects the information sources; commonly empirically based. | - |
Sampling scenario | Three theory based scenarios on how the sampling process proceeds: random chance, minimal information, maximal information | - |
Efficiency | The fewer sampling steps that a scenario requires to reach theoretical saturation, the more efficient it is | - |

A population is the “universe of units of analysis” from which a sample can be drawn [

From this population, one or multiple information sources are sampled as part of an iterative process that includes data collection, analysis, and interpretation. At each iteration the researcher has the opportunity to adjust the sampling procedure and to select a new information source to be sampled. I assume in this paper that at each iteration only one source is sampled; this assumption has no further consequences for the remainder of the paper. Moreover, I use the term “sampling steps” rather than iterations, as this excludes analysis and interpretation. Finally, contrary to formal quantitative sampling terminology, I count as sampling steps only observations that participated in the research, thus excluding non-response or the inability to access sources. This eases interpretation.

A population of information sources is usually not homogeneous. Multiple

First, if there are differences in the type of information source, sampling strategy, type of data, data collection, or methods of analysis, then there are sub-populations. The reason for this criterion is that different methods are needed. These different methods need to be accounted for [

Second, information sources should be interchangeable at the sub-population level. Within a sub-population, no single information source may be critical for reaching theoretical saturation. Hence, no single information source in a sub-population can contain information that is not found in other information sources in that sub-population. The reason for this criterion is that if a particular information source is critical for theoretical saturation, it should by definition be included in the research. Observing critical information is not guaranteed if the inclusion is dependent on a particular sampling strategy. A critical information source should then be treated as a separate sub-population of size one.

Third, if cases or groups are compared, it is important to treat these as sub-populations. For example, distinguishing between sub-populations is a condition for data triangulation, because the researcher effectively compares the results from one sub-population (for example interviews with managers) with the results from another (for example annual reports). Furthermore, comparative case studies [

The concept of sub-populations implies that theoretical saturation can be reached at the level of the overall population or at the level of the sub-population. Reaching theoretical saturation in all the sub-populations is not a condition for reaching theoretical saturation at the level of the population, since sub-populations can have an overlap in information. However, it is necessary to reach theoretical saturation in each sub-population in comparative research or when triangulating results, as this is the only way to make a valid comparison.

In most cases of inductive qualitative research, information is extracted from information sources, interpreted and translated into codes. I refer to codes here in the context of inductive qualitative data analysis, which means that they can be seen as “tags” or “labels” on unique pieces of information [

The population contains all the codes that can be potentially observed. At the start of a study, the codes in the population are unobserved and the exact number of codes in the population is unknown. Consulting information sources sampled from the population allows codes to become observed. Theoretical saturation is reached when each code in the population has been observed at least once.
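The saturation criterion can be made concrete with a small Python sketch. It assumes, purely for illustration, that each information source is represented as the set of codes it contains; the function name and the toy data are hypothetical and are not taken from the R code that accompanies this paper.

```python
def is_saturated(sampled_sources, population_codes):
    """Return True if every code in the population was observed at least once."""
    observed = set()
    for source_codes in sampled_sources:
        observed |= set(source_codes)
    return set(population_codes) <= observed


# Hypothetical toy population with four codes.
population_codes = {"A", "B", "C", "D"}
sample = [{"A", "B"}, {"B", "C"}]
print(is_saturated(sample, population_codes))            # False: "D" is still unobserved
print(is_saturated(sample + [{"D"}], population_codes))  # True
```

Such a check is only available in a simulated setting, because in an empirical study the full set of codes in the population is unknown.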

I let the number of sampling steps required to reach theoretical saturation depend on two population characteristics. First, the larger the

Purposive sampling allows the researcher to make an informed estimation about the probability of observing a given code at each sampling step, using (theoretical) prior information, like sampling frames [

Some researchers consider codes that are observed more than once as redundant, since they do not add new information to the data [

To guard against misinformation and to enhance the credibility of the research, it can be advisable to aim for a sample in which each code is observed multiple times (this also follows from the logic behind triangulation). One could argue that if a code, after a substantial number of sampling steps, is still observed only once while almost all other codes have a higher incidence, a critical examination of the code is warranted. In many cases, the researcher may already be suspicious of such a code during the analysis. A frequency of one does not mean that the code is wrong by definition; it is possible that the code is just rare or that the low frequency is just a coincidence. However, it is relatively easy to make an argumentative judgment about the plausibility of rare codes (for example based on theory).
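As a minimal sketch of how such a check could be supported in practice, the Python snippet below counts how often each code has been observed and lists the codes that fall below a chosen minimum. The representation of sources as sets of codes, the function names, and the default threshold of two are assumptions made for illustration only.

```python
from collections import Counter


def code_frequencies(sampled_sources):
    """Count in how many sampled sources each code was observed."""
    counts = Counter()
    for source_codes in sampled_sources:
        counts.update(set(source_codes))  # count a code at most once per source
    return counts


def codes_below_minimum(sampled_sources, minimum=2):
    """Return the codes observed fewer than `minimum` times, sorted by name."""
    counts = code_frequencies(sampled_sources)
    return sorted(code for code, n in counts.items() if n < minimum)


sample = [{"A", "B"}, {"A", "C"}, {"A", "B"}]
print(codes_below_minimum(sample))  # ['C']
```

A code that appears in this list is not necessarily wrong; the list only indicates which codes may deserve a critical examination.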

A sampling strategy describes how the researcher selects the information sources. The most elaborate inventory of sampling strategies comes from Patton [

I use the concepts described above to formulate three generic sampling scenarios. I refer to sampling scenarios to avoid confusion with the sampling strategies. The term “scenarios” signifies that they are based on theoretical notions, instead of empirical data or observed practices. The three sampling scenarios are based on the number of newly observed codes that a sampled information source adds. This criterion is motivated by the premise of purposive sampling: based on the expected information, the researcher makes an informed decision about the next information source to be sampled at each sampling step. This informed decision implies that the researcher can reasonably foresee whether, and perhaps how many, new codes will be observed at the next sampling step. The fewer sampling steps that a scenario requires to reach theoretical saturation, the more efficient it is.

The three scenarios that I identify are “random chance,” “minimal information,” and “maximal information.”

I use simulations as they allow me to assess the effects of the three scenarios for a series of hypothetical populations that vary systematically regarding (1) the number of codes in the population and (2) the mean probability of observing codes. The controlled setting allows me to assess the relative influence of each of these factors on the reaching of theoretical saturation. In an empirical setting, this would not be possible, because the researcher can generally not control the characteristics of the population under study, because the number of populations that can be studied is limited and because it is never entirely certain whether theoretical saturation has been reached [

To keep the paper readable for audiences with either a quantitative or qualitative background, I minimize the mathematical details in the main text as much as possible. The full technical details of the simulation are in

I denote the number of sampling steps to reach theoretical saturation by _{s}, and the number of codes in the population as

Using the R-program [

For each hypothetical population, I simulate the number of sampling steps necessary to reach theoretical saturation under the three scenarios from section 2.5.

All three scenarios operate in a similar manner. After generating a population, an information source is selected:

If the source has not been selected before, it is added to the sample. After each sampling step, the model evaluates if theoretical saturation is reached. If so, the process stops and the number of sampling steps _{s} is reported. Otherwise, the next sampling step takes place and a new information source is selected from the population.
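The sampling loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used for the paper: a population is assumed to be a list of sets of codes, each source is assumed to contain each code independently with a given probability, and the selection rules are derived from the verbal scenario definitions (random chance draws any unsampled source, minimal information draws a source that adds at least one new code, maximal information draws a source that adds the largest number of new codes). All function and parameter names are hypothetical.

```python
import random


def generate_population(n_sources, code_probs, rng):
    """Each source contains code k with probability code_probs[k] (assumed independent)."""
    return [
        {k for k, p in enumerate(code_probs) if rng.random() < p}
        for _ in range(n_sources)
    ]


def steps_to_saturation(population, scenario, rng):
    """Sample sources one at a time until every code present in the population is observed."""
    target = set().union(*population)
    observed = set()
    remaining = list(range(len(population)))
    steps = 0
    while observed != target:
        if scenario == "random":
            candidates = remaining
        else:
            # Purposive scenarios only consider sources that add at least one new code.
            candidates = [i for i in remaining if population[i] - observed]
        if scenario == "maximal":
            # Keep only the sources that add the largest number of new codes.
            best = max(len(population[i] - observed) for i in candidates)
            candidates = [i for i in candidates if len(population[i] - observed) == best]
        chosen = rng.choice(candidates)
        remaining.remove(chosen)  # a source is never sampled twice
        observed |= population[chosen]
        steps += 1
    return steps


rng = random.Random(42)
code_probs = [0.3] * 20  # hypothetical: 20 codes, each observed with probability 0.3
population = generate_population(500, code_probs, rng)
for scenario in ("random", "minimal", "maximal"):
    print(scenario, steps_to_saturation(population, scenario, rng))
```

The sketch treats saturation as observing every code that occurs in at least one source, so the loop always terminates. A single run per scenario is shown here; the outcome of one run depends on the random draws.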

As there are multiple combinations of information sources that allow reaching theoretical saturation per population, I apply each of the three sampling scenarios to each population 500 times. This produces a distribution for each scenario with values of the number of sampling steps to reach theoretical saturation. From this distribution, I report the value that leads to theoretical saturation in 95% of the 500 simulations of a population as main outcome. The value of 95% is in line with statistical conventions, and makes my results more robust.
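The repetition step can be illustrated with a short Python sketch for the random chance scenario. The population below is hypothetical, the helper names are illustrative, and the percentile is computed with a simple nearest-rank rule, which is an assumption and may differ from the exact computation in the paper's R code.

```python
import random


def random_chance_steps(population, rng):
    """Draw sources without replacement until all codes in the population are observed."""
    target = set().union(*population)
    observed = set()
    order = list(range(len(population)))
    rng.shuffle(order)
    for steps, i in enumerate(order, start=1):
        observed |= population[i]
        if observed == target:
            return steps
    return 0  # only reached for an empty population


def percentile_95(values):
    """Nearest-rank 95th percentile: smallest value covering at least 95% of the runs."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


rng = random.Random(1)
# Hypothetical population: 1000 sources, 10 codes, each present with probability 0.5.
population = [{k for k in range(10) if rng.random() < 0.5} for _ in range(1000)]
runs = [random_chance_steps(population, rng) for _ in range(500)]
print(percentile_95(runs))
```

Reporting the 95th percentile rather than the mean gives a sample size that would have been sufficient in 95% of the simulated sampling processes.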

Finally, for each code in a population, I calculate the mean number of occurrences over the 500 simulations. From this set of numbers I again take the mean, which I denote by

95^{th} percentile of the number of sampling steps required to reach theoretical saturation _{s}

Note that the y-axis is logarithmic. The solid black line indicates the calculated random chance’s value of

The blue dots represent random chance, the green diamonds represent minimal information, and the red triangles represent maximal information.

First, in line with the result above, the mean probability of observing codes has a greater influence than the number of codes on the mean number of observations per code in the random chance scenario. Second, the random chance scenario gives the largest number of repetitive codes (over 400) at low mean probabilities of observing codes. This is explained by the fact that this scenario requires the most sampling steps on average. However, for higher mean probabilities of observing codes, the random chance scenario yields about the same number of observations per code as minimal information, which is between 3 and 5. Finally, the maximal information scenario yields only between 1 and 3 observations per code. This low number of observations makes the use of repetitive codes for maximal information very limited.

Overall, the results show that there is a clear trade-off between the efficiency of the scenario and the number of repetitive codes. To increase the credibility of one’s research, it is possible to aim for a minimum number of observations of each code

The results for the purposive scenarios produced the same range of minimum sample sizes (below 50 information sources) as tentatively indicated in the literature. The simulations also uncovered mechanisms that give key insights into the estimation of the minimum size of a qualitative sample. The mean probability of observing codes is more important than the number of codes in the population for reaching theoretical saturation. Furthermore, when the probability of observing codes is low, the purposive scenarios are much more efficient than the random chance scenario. When this probability is high, the differences between scenarios are small. Finally, the more efficient a scenario is, the lower the mean number of observations per code, but only a few sampling steps are required to increase the minimum number of observations of all the codes.

This paper has two potential limitations that deserve discussion. First, critics could claim that the scenarios are mechanistic and do not represent real-world sampling procedures. I used ideal-typical scenarios that capture the full range of possible empirical sampling procedures. Researchers who view their research through the lens of these scenarios are likely to observe that their sampling procedure shares characteristics with at least one of the three scenarios or that their sampling procedure is a mixture of two scenarios. Future researchers can also simulate other scenarios that they conceive and even include different sampling strategies in their simulations, like snowball sampling or sampling for maximal variation [

Second, I simulated a broad range of scenarios for the purpose of this paper, but other simulations are also possible. For example, I simulated only one population per combination of mean probabilities of observing codes and the number of codes. This lack of variation could cast doubt on the robustness of my results. However, there was a large variation among the 1100 populations, as the number of codes was not important for the minimum sample size and because the variance around the mean probability of observing codes was not important. By letting the mean probability of observing codes vary between 0.09 and 0.91, I only considered a range of probabilities that is realistic in an empirical setting. I also did not vary the population sizes. Instead, I chose a large number that produced conservative estimates of the minimal sample size. It would be empirically interesting to vary the sample sizes in the simulations. For computational reasons and to reduce the complexity of this paper, I left this challenge for future researchers. Finally, I did not simulate different minimum observations per code, as the formula based on random chance gave sufficient insights into this issue.

Based on these insights, I formulate a set of guidelines for sampling in qualitative research. I am aware that such guidelines are contested by many qualitative researchers, but Tracy [

The guidelines for sampling in qualitative research are as follows:

The basis for distinguishing sub-populations.

Whether the sources are interchangeable in a sub-population.

Whether the sub-populations serve a comparative purpose or are used for other means.

The process of data collection, sampling, and analysis per sub-population.

Other criteria that are deemed important by the researcher.

The more detailed the researcher’s description of the population and sub-populations, the better. This is especially true when the researcher aims to use a maximal information scenario. However, as the researcher usually keeps an eye open for new developments, the delineation of the population and sub-populations can be updated at each sampling step.

The complexity and scope of the research question.

The existing theory and information available about the sub-population.

Other possible factors that are deemed to be of influence.

Because the influence of the number of codes on theoretical saturation is small, it is more important to give an order of magnitude than an exact number. The estimation can be adapted after each sampling step.

The likelihood of an information source actually containing codes (is required information rare in the population and what are the chances of non-response?).

The willingness and ability of the source (or its authors) to let the code be uncovered (are there strategic interests?).

The probability that the researcher is able to observe the code (based on the researcher’s prior research experience and familiarity with the topic).

Other criteria that are deemed important by the researcher.

Random chance is only appropriate if after a substantial number of sampling steps, the researcher still has little or no idea about the characteristics of the sub-population and where codes can be found. In that case, random chance serves as a fallback scenario. If theoretical saturation is reached under random chance, then it is also reached in the other two scenarios. With conservative estimates of the mean probability of observing codes, the minimum sample size is over 4000 information sources, while for higher means, the minimum sample size rapidly drops to below 100 at probabilities of around 0.3 and below 50 at probabilities of 0.4.

Choosing a minimal information scenario requires some argumentation. Most important is that the researcher makes it plausible that a new code will be observed at each sampling step. This is something that the researcher will experience as the research progresses. If at a sampling step an information source does not yield any new codes, the researcher can opt for increasing the number of sampling steps by one. Usually there is little need to aim deliberately for multiple observations per code, because the scenario delivers sufficient repetitive codes. Under low estimates of the mean probability of observing codes, the minimum sample size for minimal information is around 50, while for higher means the minimum sample size is below 25.

The researcher can only choose maximum information when there is already a full overview of all the information sources in the (sub-)population and how information-rich these sources are (e.g. how many codes they contain). However, as maximum information makes very strong assumptions, the choice needs proper argumentation. The benefit of the maximum information scenario is that even under low estimates of the mean probability of observing codes, the minimum sample size is only 20 information sources. For higher means, the minimum sample size drops below 10. However, unless there is already strong theory present, I advise aiming for multiple observations of each code to guard against misinformation.

It is unlikely that a scenario will be followed exactly; rather, the researcher will notice that the sampling procedure falls somewhere in between the scenarios. As such, the researcher can argue which scenario the sampling procedure resembles most. The researcher can use the results from the simulations above to assess whether theoretical saturation is likely to have been reached.
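For the random chance fallback scenario, a rough analytic check is also possible. The Python sketch below assumes that codes are observed independently of each other, that each code has a known, fixed probability of being observed at any sampled source, and that the population is large enough for draws to be treated as independent. Under these assumptions the probability that all codes have been observed after n sampling steps is the product over codes of 1 - (1 - p)^n. This is a simplified illustration and not necessarily the formula given in the supporting information; the function names and the example probabilities are hypothetical, and the sketch is not meant to reproduce the sample sizes reported above.

```python
def saturation_probability(code_probs, n_steps):
    """P(all codes observed at least once after n_steps), assuming independence."""
    prob = 1.0
    for p in code_probs:
        prob *= 1.0 - (1.0 - p) ** n_steps
    return prob


def minimum_steps(code_probs, confidence=0.95, max_steps=100_000):
    """Smallest number of steps whose saturation probability reaches `confidence`."""
    for n in range(1, max_steps + 1):
        if saturation_probability(code_probs, n) >= confidence:
            return n
    return None  # confidence level not reached within max_steps


# Hypothetical example: 10 codes, each observed with probability 0.5 per source.
print(minimum_steps([0.5] * 10))  # 8
```

Because every factor in the product must be close to one, a single rare code can dominate the required number of steps, which is consistent with the finding that the probability of observing codes matters more than the number of codes.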

Following these recommendations does not mean that the overall quality of the research is good. The recommendations can only help to improve the sampling, which is but one aspect of the entire process. In addition, in many instances, codes are not yet fixed at the start of the research. Rather, they become better known as the research progresses. I suggest that researchers reevaluate their assessment during each sampling step.

Keeping the analyses in mind, I recommend that researchers generally opt for a minimal information scenario, as it makes reasonable assumptions, it is efficient, and it yields sufficient codes. Whether saturation has been reached remains a matter of the researcher's argumentative judgment. These guidelines can aid the researcher in making this judgment and the readers in assessing it. Overall, the results and the guidelines offered in this paper can improve the quality and transparency of purposive sampling procedures. Therefore, I encourage fellow researchers to consider using these ideas and guidelines and to improve upon them where they see fit.

Mathematical details of the simulation. (DOCX)

Code for the simulations in R. (R)

The simulated data set used for this study. (CSV)

The author is grateful for feedback on this work from Allard van Mossel, Chris Eveleens, Marijn van Weele, Colette Bos and Maryse Chappin. An earlier version of this paper was presented at the Qualitative and Mixed Methods in Research Evaluation and Policy conference 2015 in London, and the 2016 Annual Meeting of the Academy of Management in Anaheim.