The problem of sample selection complicates the process of drawing inference about populations. Selective sampling arises in many real world situations when agents such as doctors and customs officials search for targets with high values of a characteristic. We propose a new method for estimating population characteristics from these types of selected samples. We develop a model that captures key features of the agent's sampling decision. We use a generalized method of moments with instrumental variables and maximum likelihood to estimate the population prevalence of the characteristic of interest and the agents’ accuracy in identifying targets. We apply this method to tuberculosis (TB), which is the leading infectious disease cause of death worldwide. We use a national database of TB test data from South Africa to examine testing for multi-drug resistant TB (MDR-TB). Approximately one-quarter of MDR-TB cases were undiagnosed between 2004-2010. The official estimate of 2.5% is therefore too low and MDR-TB prevalence is as high as 3.5%. Signal-to-noise ratios are estimated to be between 0.5 and 1. Our approach is widely applicable because of the availability of routinely collected data and abundance of potential instruments. Using routinely collected data to monitor population prevalence can guide evidence-based policy making.
An Econometric Method for Estimating Population Parameters from Non-Random Samples: An Application to Clinical Case Finding