The authors have declared that no competing interests exist.

Estimating the size of key risk populations is essential for determining the resources needed to implement effective public health intervention programs. Several standard methods for population size estimation exist, but the statistical and practical assumptions required for their use may not be met when applied to HIV risk groups. We apply three approaches to estimate the number of people who inject drugs (PWID) in the Kohtla-Järve region of Estonia using data from a respondent-driven sampling (RDS) study: the standard “multiplier” estimate gives 654 people (95% CI 509–804), the “successive sampling” method gives estimates between 600 and 2500 people, and a network-based estimate that uses the RDS recruitment chain gives between 700 and 2800 people. We critically assess the strengths and weaknesses of these statistical approaches for estimating the size of hidden or hard-to-reach HIV risk groups.

Estimating the size of key HIV risk populations is difficult because these groups may be hidden, hard to reach, or socially stigmatized. People who inject drugs (PWID) often have a high prevalence of HIV infection, but because their drug use may be criminalized, PWID may be unwilling to participate in a public health research study or to report accurately on their risk behaviors. Understanding the course of the injection drug use epidemic and reducing HIV incidence among PWID depend on accurate estimates of the number of PWID, which are needed to design and implement harm reduction and prevention programs that reach a substantial proportion of the PWID population. Estimating the number of PWID is also essential for evaluating the coverage of these programs and for estimating changes in population-level characteristics such as HIV prevalence and risk behaviors.

Traditional sampling methods like capture-recapture and the multiplier method require independent random samples from the target population, which are difficult to achieve when the group of interest is hidden [

A newer class of methods raises the possibility that researchers can use network-structural information obtained from respondent-driven sampling (RDS), a method for recruiting research subjects through their social contacts in the target population social network [

RDS is typically used to estimate the average value of traits or outcomes in the population, such as HIV prevalence. Some authors have used sample averages (e.g. sample HIV prevalence) from RDS surveys as inputs to the multiplier method for estimating total population sizes [

The first HIV case in Estonia was diagnosed in 1988 [

Estonian syringe exchange programs (SEPs) serving PWID were launched in 1997. Since 2007, each participating injection drug user has received an average of 117 sterile syringes per year [

In this paper, we evaluate three complementary statistical approaches, each relying on different assumptions, for estimating the number of PWID in the Kohtla-Järve region, Estonia, using data from an RDS study of 600 PWID conducted in 2012. The first is the standard multiplier method, in which the number of PWID among antiretroviral treatment (ART) patients is divided by an estimate of the proportion of PWID who receive ART. The second is the successive sampling (SS) method, which uses the ordered sequence of recruited subjects’ network degrees. The third is the network-based method, which uses network-structural information from the RDS recruitment process. For this approach, we report point estimates under an idealized recruitment model and semi-parametric bounds that do not rely on strong assumptions about the recruitment process. Results from these three statistical approaches exhibit reasonable agreement. We discuss the significance of these findings in the context of the HIV epidemic in Estonia and assess the possibilities and limitations of using RDS data to estimate the size of hidden and hard-to-reach populations.

The 2012 RDS study of PWID in the Kohtla-Järve region (city of Kohtla-Järve and Jõhvi parish), Estonia, was carried out from May to July [

In this study,

The top left panel shows the recruitment tree of the 600 recruited subjects originating from six seeds. The top right panel shows the number of subjects recruited each day; gaps correspond to weekends. The bottom left panel shows the cumulative number of recruits on each day of the study. The bottom right panel shows a histogram of subjects’ reported degrees.

We first present the analysis from the multiplier method since it is a standard approach for population size estimation. The method requires two pieces of information: the number of people

We choose antiretroviral treatment (ART) as the trait and estimate the number of PWID among ART patients in Kohtla-Järve region. In Estonia, ART is available free of charge in five major hospitals [

While the exact number of PWID receiving ART at Ida-Viru central hospital is not readily available, we can approximate it under a simple random sampling assumption. A recent study of ART adherence recruited 318 patients. Among these 318 patients, 57 to 58 subjects resident in the Kohtla-Järve region reported currently receiving ART and injecting drugs during the last 4 weeks. The range arises because not all subjects in the ART adherence study were living in the Kohtla-Järve region: if we include neighboring municipalities, the number of PWID on ART is 57, and if we include all municipalities in Ida-Viru county, it is 58. Because we have no reason to prefer one of these values over the other, we use their average, 57.5, in the calculation. Using the finite population correction for the variance of the sample mean, we estimate the number of PWID among ART patients in the Kohtla-Järve region as 109 people (95% CI 91–128).
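The finite population correction step can be sketched as follows. This is a minimal illustration, not the authors' code: the total number of ART patients at the hospital is not stated in this text, so `n_art_total` below is a placeholder value, and the use of `n - 1` in the variance and of a normal-approximation interval are assumptions.

```python
import math

def subgroup_total_fpc(k, n, n_total, z=1.96):
    """Scale a sample proportion k/n up to a population of n_total units.

    Returns (estimate, lower, upper), where the variance of the sample
    proportion carries the finite population correction (1 - n / n_total).
    """
    p = k / n
    var_p = p * (1 - p) / (n - 1) * (1 - n / n_total)
    se_total = n_total * math.sqrt(var_p)
    estimate = n_total * p
    return estimate, estimate - z * se_total, estimate + z * se_total

# 57.5 PWID among the 318 sampled ART patients (values from the text).
# n_art_total is a PLACEHOLDER: the true hospital total is not given here.
n_art_total = 1000
est, lo, hi = subgroup_total_fpc(57.5, 318, n_art_total)
print(f"estimated PWID on ART: {est:.0f} ({lo:.0f}, {hi:.0f})")
```

The correction shrinks the interval as the sample covers more of the finite population; if every ART patient had been sampled, the interval would collapse to the observed count.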

Next, we use results from the RDS study to estimate the proportion of PWID who receive ART. Among the 600 participants, one completed the questionnaire only partially and 12 who reported being HIV+ had negative HIV test results; we assume that these people are not currently on ART in the Kohtla-Järve region. Of the 600 subjects, 123 reported being currently on ART in the Kohtla-Järve region, giving a raw proportion of 20.5% (95% CI 17.27%–23.73%). The RDS Analyst software was also used to weight observations by subjects’ network degree; the resulting RDS sequential sampling estimate of the proportion is 16.66% (95% CI 12.74%–19.27%).
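The raw proportion and its interval are consistent with a standard normal-approximation (Wald) interval for a binomial proportion. The short check below assumes that interval type; the function name `wald_ci` is ours, and the degree-weighted estimate from RDS Analyst is not reproduced here.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Sample proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

# 123 of 600 RDS participants reported being currently on ART.
p_raw, p_lo, p_hi = wald_ci(123, 600)
print(f"{p_raw:.2%} ({p_lo:.2%}, {p_hi:.2%})")  # 20.50% (17.27%, 23.73%)
```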

The estimated number of PWID is the ratio of the number of PWID receiving ART to the proportion of PWID on ART. We use the delta method to calculate the variance of the ratio and form a confidence interval. Using the raw (unweighted) proportion, the total number of PWID in the Kohtla-Järve region is estimated to be 532 people (95% CI 413–654); using the weighted proportion, the estimate is 654 people (95% CI 509–804).
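As a rough sketch of the ratio step, the code below applies a first-order delta-method variance to the quotient of the ART count and the ART proportion. It assumes the numerator and denominator are independent and backs the numerator's standard error out of the reported 91–128 interval; both are our assumptions, so the interval it prints approximates the published one rather than reproducing it exactly.

```python
import math

def ratio_delta_ci(m, se_m, p, se_p, z=1.96):
    """Estimate m / p with a first-order delta-method interval.

    Assumes m and p are estimated independently, so that
    Var(m / p) ~ (se_m / p)^2 + (m * se_p / p^2)^2.
    """
    estimate = m / p
    variance = (se_m / p) ** 2 + (m * se_p / p ** 2) ** 2
    half_width = z * math.sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width

m = 109                                  # PWID among ART patients (from the text)
se_m = (128 - 91) / (2 * 1.96)           # assumption: backed out of the 95% CI
p_raw = 123 / 600                        # raw proportion of PWID on ART
se_p = math.sqrt(p_raw * (1 - p_raw) / 600)

size_raw, size_lo, size_hi = ratio_delta_ci(m, se_m, p_raw, se_p)
size_weighted = m / 0.1666               # point estimate with the weighted proportion
print(f"raw: {size_raw:.0f} ({size_lo:.0f}, {size_hi:.0f}); weighted: {size_weighted:.0f}")
```

The second variance term grows quickly as the proportion shrinks, which is why the weighted estimate, built on a smaller proportion, carries a wider interval.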

The SS method estimates population size from a different perspective: it only requires the ordered sequence of network degrees and does not utilize information about network structure in the RDS recruitment chain [

We obtain posterior estimates under two degree conditions (imputed and raw degree) and two priors for population size (beta and uniform). The imputed degree replaces the raw degree with the degree fitted under a Conway-Maxwell-Poisson distribution. The beta prior models the proportion of the target population that is sampled. For the uniform prior, we set the maximum possible population size to 1500 or 2500. The posterior mean and the 5% and 95% quantiles are reported.

Degree | Prior size | Posterior Mean | 95% Posterior Quantile | Implied Prevalence |
---|---|---|---|---|
Imputed | Beta | 801 | (621, 1106) | 1.8% |
Imputed | Uniform[0,1500] | 1104 | (686, 1463) | 2.5% |
Imputed | Uniform[0,2500] | 1546 | (739, 2399) | 3.5% |
Raw | Beta | 918 | (600, 2002) | 2.1% |
Raw | Uniform[0,1500] | 1107 | (600, 1497) | 2.5% |
Raw | Uniform[0,2500] | 1320 | (600, 2489) | 3.0% |

The SS method assumes that the average degree of recruited subjects decreases over the course of the study [
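The declining-degree assumption can be checked informally by regressing reported degree on recruitment order. The sketch below is illustrative only: it simulates successive sampling, in which each new draw is taken from the not-yet-sampled members with probability proportional to degree, and then fits an ordinary least squares slope. The population degrees, the seed, and the function names `successive_sample` and `ols_slope` are assumptions for illustration and are not taken from the study data.

```python
import random

def successive_sample(degrees, n, rng):
    """Draw n units without replacement, each with probability proportional to degree."""
    remaining = list(degrees)
    drawn = []
    for _ in range(n):
        idx = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        drawn.append(remaining.pop(idx))
    return drawn

def ols_slope(y):
    """Least-squares slope of y regressed on its position 0, 1, ..., len(y) - 1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx

rng = random.Random(2012)
# Hypothetical population: 1000 people with degrees 1..50, 20 people per degree.
population = [d for d in range(1, 51) for _ in range(20)]
sample_degrees = successive_sample(population, 600, rng)
slope = ols_slope(sample_degrees)
print(f"slope of degree on recruitment order: {slope:.4f}")
```

Under this sampling model high-degree members tend to be drawn early, so the fitted slope is negative when the sample is a large fraction of the population. A flat or positive slope in real RDS data would suggest the assumption does not hold.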

Social or drug use connections between PWID in the Kohtla-Järve region form a network in which the degree of each subject is the number of PWID they know. Under statistical assumptions about homogeneity of link probabilities in the population social network, Crawford et al. [

We employ a standard vague prior for the population size $N$, $\pi(N) \propto N^{-1}$. A Beta prior distribution

The first two columns are prior mean

Prior Parameter (1) | Prior Parameter (2) | Point Estimate | 95% Posterior Quantile | Implied Prevalence | Bound Estimates: Posterior Quantile of Lower and Upper Bound |
---|---|---|---|---|---|
0.00393 | 3 | 2202 | (1851, 2713) | 4.9% | (700, 2805) |
0.00393 | 5 | 2218 | (1866, 2739) | 5.0% | (700, 2920) |
0.01617 | 3 | 2089 | (1791, 2504) | 4.7% | (700, 2635) |
0.01617 | 5 | 2027 | (1738, 2419) | 4.5% | (700, 2631) |
0.02841 | 3 | 2016 | (1716, 2439) | 4.5% | (700, 2581) |
0.02841 | 5 | 1937 | (1679, 2286) | 4.3% | (700, 2589) |
0.04065 | 3 | 2048 | (1733, 2466) | 4.6% | (700, 2644) |
0.04065 | 5 | 1911 | (1655, 2270) | 4.3% | (700, 2552) |

The dashed horizontal line marks the minimum possible population size (600); this is a lower bound for the PWID population size. For the multiplier method, results from the raw and weighted proportion of traits are presented. For the network-based method, results from

In this paper, we compare the multiplier method, the network-based method, and the SS method for estimating the size of the PWID population in the Kohtla-Järve region (city of Kohtla-Järve and Jõhvi parish), Estonia. The multiplier method is well-established in public health research, and has been used in published estimates of the number of PWID in different contexts and regions [

Our application of the multiplier method is subject to several limitations. First, patients may have received ART from other hospitals besides Ida-Viru central hospital, which could lead to under-estimation of the numerator

The network-based and SS methods rely on data gathered by RDS, which typically are used to estimate population-level characteristics of certain traits, such as HIV prevalence. However, Handcock et al. [

The SS method also has several limitations. First, it assumes that at each step in the recruitment process, the new recruit is drawn at random from all yet-unrecruited members of the target population, with probability proportional to their network degree. This assumption ignores the relationship between the underlying social network (on whose links the RDS recruitment process is supposed to operate) and the chain of recruitments. However, this sampling assumption may be warranted when recruitment happens independently of a network. For example, if new recruits are chosen by recruiters according to their “popularity” in the target population, and reported network degree is a surrogate measure of popularity, the assumption may hold. Second, the assumption that subjects with higher degree are more likely to be sampled (i.e. receiving a coupon and successfully redeeming it) earlier in RDS may not hold in practice. Third, like the network-based method, the SS estimate is sensitive to user-specified prior information about population size and the population degree distribution. In particular, estimates from the SS method appear to be sensitive to the maximum

The network-based method employs assumptions about the recruitment process and the population social network to estimate the hidden population size. The point estimates reported in

In this paper, we have used data from a large RDS study to estimate the number of PWID in the Kohtla-Järve region using the multiplier method, the network-based method, and the SS method. The credibility of the assumptions underlying each estimate must be critically assessed by researchers who use RDS to estimate population size. We are hopeful that these techniques, along with future refinements, can assist public health researchers and policymakers in obtaining information vital in determining the proper scale of effective education, treatment, and intervention campaigns for PWID and other risk populations.

Evidence supports the effectiveness of harm reduction programs such as syringe exchange programs in preventing the transmission of HIV among PWID, but PWID may not benefit when programs are not scaled to the size of the at-risk population [

Data description, statistical details, and comparison to the SS method.

(DOCX)

Illustration of

(TIF)

Linear regression on time-ordered network degree.

(TIF)

Subject ID, recruiter ID, date of recruitment, network degree, number of coupons, and ART status for each subject.

(CSV)

FWC was supported by NICHD DP2 OD022614, NIH/NCATS grant KL2 TR000140, NIMH grant P30MH062294, the Yale Center for Clinical Investigation, and the Yale Center for Interdisciplinary Research on AIDS. Additional support was provided by the Estonian Ministry of Education and Research under grant #TARTH15017I. The RDS study was funded by NIH/NIDA grant 1R01DA029888 to Heimer/Uuskula (Co-PIs).