Tuesday, March 25, 2014

How do I combine NSDUH public-use file (PUF) data for analysis?

Because of the methodology changes in the 2002 National Survey on Drug Use and Health (NSDUH), the 2002 data constitute a new baseline for tracking trends in substance use and other measures. As noted in the 2002 to 2013 codebooks, it is not considered appropriate to compare the 2002 to 2013 estimates with 2001 and earlier NHSDA (National Household Survey on Drug Abuse) estimates to assess trends in substance use. Although the 1999 through 2004 data are part of the same sample design, beginning with the 2002 survey respondents were given a $30 incentive payment for participation, which increased response rates for several consecutive surveys.

Statistical disclosure limitation methods were implemented on the original data file in such a way that the NSDUH PUF continues to be representative of the civilian, noninstitutionalized population of the United States. Disclosure limitation methods include micro agglomeration, optimal probabilistic substitution, optimal probabilistic subsampling, and optimal sampling weight calibration. In addition, the variance estimation variables (VESTR and VEREP) were treated by coarsening, substitution, and scrambling. For the purpose of variance calculation, the sample design for NSDUH PUFs is a stratified single-stage cluster sample design with replacement sampling.

The 2002 through 2004 NSDUH PUFs are part of one sample design, while the 2005 through 2013 PUFs are part of another. For the 2005 through 2013 surveys, the samples for adjacent survey years overlapped by 50%. In the NSDUH PUF datasets, VESTR (variance estimation stratum) is coded from 20001 to 20060 for years 2002 through 2004 and from 30001 to 30060 for years 2005 through 2013. VEREP (variance estimation cluster replicate) is coded as 1 or 2. The degrees of freedom (df) are 60 for national estimates from each individual survey (see footnote 1).

When combining any years of data from 2005 through 2013, the df remain the same as for a single year (e.g., 60 for national estimates) because these years are part of the same sample design. The combined data can be used to obtain the standard error (SE) of estimates for individual years and/or the SE of difference estimates (e.g., a contrast of means) for comparisons between adjacent years. The df also remain 60 when combining any years of data from 2002 through 2004. When combining years from the two different sample designs (that is, at least one year from 2002 through 2004 and at least one year from 2005 through 2013), the df are 120 (i.e., the sum of the df for the two sample designs). For inferential estimates for an individual year from such a combined file, users must explicitly set the software's degrees-of-freedom option to override the default. Alternatively, users can subset the data to a single year within a procedure/method run, using an appropriate statement, so that the complex design is retained for the desired analysis.

When comparing estimates in two domains with different df in combined data (e.g., testing equality of the proportions of past month alcohol use for two individual survey years that have different sample designs), err on the conservative side and use the smaller df (see page A-2 in the 2012 NSDUH Statistical Inference Report). Note that the covariance between the estimates (e.g., proportions) in such a comparison is zero because the two designs are distinct.
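The df arithmetic above can be illustrated with a minimal Python sketch. It assumes the common rule of thumb that design-based df equal the number of variance estimation clusters minus the number of strata, which reproduces the values quoted here (60 and 120). The helper names (`design_df`, `full_design`, `se_of_difference`) are hypothetical and are not part of any NSDUH tool or survey package.

```python
import math

def design_df(records):
    """Rule-of-thumb df: distinct (VESTR, VEREP) clusters minus distinct strata."""
    clusters = set(records)
    strata = {vestr for vestr, _ in clusters}
    return len(clusters) - len(strata)

def full_design(first_stratum):
    """Hypothetical helper: 60 strata, each with replicates 1 and 2."""
    return [(first_stratum + i, rep) for i in range(60) for rep in (1, 2)]

def se_of_difference(se_a, se_b):
    """SE of a difference when the covariance is zero (two distinct designs)."""
    return math.hypot(se_a, se_b)

design_2002_2004 = full_design(20001)  # VESTR 20001..20060
design_2005_2013 = full_design(30001)  # VESTR 30001..30060

# Any pooling of years within one design keeps the same strata and replicates.
df_single_design = design_df(design_2005_2013)

# Pooling years across both designs stacks the two sets of strata.
df_both_designs = design_df(design_2002_2004 + design_2005_2013)

# Comparing one year from each design: use the smaller df (conservative).
df_comparison = min(design_df(design_2002_2004), design_df(design_2005_2013))

print(df_single_design, df_both_designs, df_comparison)  # 60 120 60
```

In practice these values would be passed to the df option of whatever survey software is in use; the sketch only shows where the numbers come from.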

Analysts can obtain all ratio-type estimates (including their standard errors, confidence intervals, and p-values) from an analysis run on combined data. Note, however, that the sums/totals in the cells and/or margins of the output from such a run are not always the intended estimates. If the analyst wants an annual estimate of a population total in addition to ratio-type estimates, the weight should be divided by the number of years that were pooled. Users should take care when reporting and interpreting results that use the survey year variable in an analysis of pooled data with adjusted weights.
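The weight adjustment can be sketched as follows. This is a toy illustration with made-up records, not NSDUH data; the field names (`year`, `weight`, `user`) and the function `annualized_weights` are hypothetical, and the actual analysis weight variable should be taken from the PUF codebook.

```python
def annualized_weights(rows, weight_key="weight"):
    """Return copies of rows with the weight divided by the number of pooled years."""
    n_years = len({row["year"] for row in rows})
    return [dict(row, annual_weight=row[weight_key] / n_years) for row in rows]

# Made-up pooled records from two survey years (illustration only).
rows = [
    {"year": 2012, "weight": 1000.0, "user": 1},
    {"year": 2012, "weight": 3000.0, "user": 0},
    {"year": 2013, "weight": 1500.0, "user": 1},
    {"year": 2013, "weight": 2500.0, "user": 0},
]
adjusted = annualized_weights(rows)

# Totals: the unadjusted weights sum over both years; the adjusted weights
# give an average annual total.
pooled_total = sum(r["weight"] * r["user"] for r in rows)
annual_total = sum(r["annual_weight"] * r["user"] for r in adjusted)

# Ratio-type estimates are unchanged, because the constant divisor cancels.
ratio_raw = pooled_total / sum(r["weight"] for r in rows)
ratio_adjusted = annual_total / sum(r["annual_weight"] for r in adjusted)

print(pooled_total, annual_total)  # 2500.0 1250.0
print(ratio_raw, ratio_adjusted)   # 0.3125 0.3125
```

The sketch covers point estimates only; standard errors still require design-based software that uses VESTR and VEREP.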



1 See Appendix A in the 2012 NSDUH Statistical Inference Report.
