National Survey on Drug Use and Health (NSDUH)1 employs a  multistage (stratified cluster) sample design for the selection of a  representative sample from non-institutional members of United States  households aged twelve and older. The NSDUH public-use file (PUF) includes the  variance estimation variables (which were derived from the complex sample  designs2): variance estimation stratum (VESTR), variance estimation  cluster replicates (VEREP) and final analysis weight (ANALWT_C). VEREP is  nested within the VESTR. It is therefore considered that the complex survey  method for NSDUH PUF is a single-stage stratified clustering design, where the  clusters are sampled with replacement (WR). There are no missing values in the  variance estimation variables and final analysis weight, VESTR, VEREP and  ANALWT_C. However, analysis variables can have missing values. 
SUDAAN, all survey procedures in SAS, Stata, R and the survey add-on module  in SPSS can handle data from complex sampling designs. The WR design is the  default design, except in SPSS and Taylor series linearization is also the  default method for variance estimation of them. Note that users should  read the help document (of her/his respective statistical package) regarding  how missing values are being handled if any exist in the analysis variables.
SAMPLE SYNTAX
Using analysis weights is important to get the point estimates right. Users  must consider the weighting, clustering, and stratification of the survey  design to produce correct standard errors (and degrees of freedom). The example  code provided below shows how to specify these variables correctly, using an  individual year of the NSDUH PUF, and also indicates how to calculate the  proportions, standard errors (SE), and confidence intervals of the risk of  smoking one or more packs of cigarettes per day by gender. This statistical  analysis plan (SAP), in turn, results in two subpopulation analyses of  proportions for each level of gender. The dependent or outcome variable is the risk  of smoking one or more packs of cigarettes per day and is determined using the  categorical variable, RSKPKCIG. Gender  is determined using the categorical variable, IRSEX. Both of the variables in the  NSDUH PUF file are numeric in downloadable SAS, Stata, SPSS, and R specific  datasets. RSKPKCIG is coded numeric as 1 to 4 for no risk, slight risk,  moderate risk, and great risk for valid values and as system missing for  invalid values. IRSEX is imputation revised gender for missing values and is  coded numeric as 1 for male and 2 for female.
For analysis of the NSDUH PUF file, one should consider three important  things before preparing program code in  a statistical software package:
  - How to correctly  specify the variance estimation variables including analysis weights;
 
  - The statistical procedure along with requested  statistics; and
 
  - The domains of analysis, if any.
 
Each of these three considerations is discussed below. [Note the following  conventions for wording in program syntax: upper case codes are  statements/procedures, upper case italics are option keywords in software  packages, and upper case bolded codes are variables from the input dataset.]
  Specify variance  estimation variables. For variance estimation, each sample program code  uses the Taylor linearization method for this example SAP. The WR design method  is the default with the Taylor method for all but SPSS software packages. The  stratification and clustering of the complex sample design in the NSDUH PUF are  described by specifying the variance estimation variables (and also the analysis  weight) via the statements in the analysis procedure program code for SUDAAN  and SAS software packages. For example, "NEST VESTR VEREP /MISSUNIT; WEIGHT ANALWT_C;" in SUDAAN and  "STRATA VESTR; CLUSTER VEREP; WEIGHT ANALWT_C;"in SAS. Note that the order of the variables in the NEST statement is  important. One should use the above block of statements (specific to NSDUH  complex design) in any survey procedure program code for any SAP. For example,  PROC SURVEYLOGISTIC in SAS and PROC LOGISTIC in standalone SUDAAN.
  The above can be implemented by the SVYSET command in Stata as "SVYSET VEREP[pweight  =ANALWT_C], STRATA(VESTR) SINGLEUNIT(centered)".  SPSS requires an analysis plan design file for the data file to perform a  complex survey analysis. The first block of code in the SPSS program syntax,  below, for the CSPLAN ANALYSIS procedure will create such an analysis plan  file.
  See the "How  are complex sampling variances computed by Taylor linearization method?"  FAQ on the use of MISSUNIT option in SUDAAN, SINGLEUNIT(centered) option in Stata, and the NOMCAR procedure option in SAS.
 
  Statistical procedures and  requested statistics. In the SAP, RSKPKCIG is the analysis variable with four  valid and missing values and IRSEX will be used to define the domains or  subpopulations. [In general, the goal is to get results in proportions (not in  percentages) of a (outcome or dependent) variable for multiple subpopulations.] Stata's SVY: PROPORTION produces estimates of proportions,  along with SEs, for character or numeric variables. SAS's  SURVEYMEANS always analyzes character variables as categorical. So, RSKPKCIG  can be declared as a character variable by specifying it in both the CLASS and  VAR statements to obtain the estimates of the proportions, along with SEs. R’s  svyby procedure with the factor (variable) function and FUN=svymean argument  produces the mean, SEs and confidence intervals, in which the factor function converts  a variable to a set of dummy variables, while SUDAAN's  DESCRIPT procedure and SPSS's CSDESCRIPTIVES method compute  means, along with SEs, for only continuous variables. The mean estimate of a  0/1 coding dummy variable is essentially a proportion estimate. Therefore, the four  dummy variables (RSK1, RSK2, RSK3, and RSK4) that were created for the RSKPKCIG  variable also contain missing values. SUDAAN’s DESCRIP and SPSS’s  CSDESCRIPTIVES procedures can be used for our desired analysis by obtaining  means (i.e., proportions), along with SEs, for the four dummy variables of  RSKPKCIG. [Note that using SVY: MEAN in Stata and SURVEYMEANS (no CLASS  statement) in SAS, the same analysis of means can be obtained for a list of  indicator variables of a categorical variable.]
 
  Domains of Analysis. The CLASS  and TABLES statements in SUDAAN, OVER command in Stata, DOMAIN statement in SAS  and SUBPOP TABLE statement in SPSS specify the multiple subpopulations/domains  for a variable (e.g., two for IRSEX) to which analyses are to be performed. For  example, the following SAS program code with "DOMAIN IRSEX;"  statement will output the analysis results for the variables in VAR statement  for two domains; one for the male (IRSEX=1) population and the other for the  female (IRSEX=2) population.
  
SUDAAN standalone syntax:
  [The input data file that is used for this example  is in the SAS Export format. SUDAAN recommends that an input dataset is sorted  by the variables in the NEST statement; otherwise, NOTSORTED option  must be specified in the PROC statement.]
  PROC DESCRIPT filetype=SASXPORT data="path\nsduh2011.xpt" NOTSORTED;
    NEST  VESTR  VEREP  / MISSUNIT;
    WEIGHT  ANALWT_C;
    CLASS IRSEX;
    TABLES IRSEX;
    VAR  RSK1  RSK2  RSK3  RSK4;
    PRINT mean semean lowmean upmean 
    /style=nchs meanfmt=f6.3 semeanfmt=f6.3 lowmeanfmt=f7.3 upmeanfmt=f7.3;
Stata specific code for the same analysis:
  use drivename:\path\nsduh2011.dta
    svyset VEREP[pweight=ANALWT_C], strata(VESTR)  singleunit(centered)
    save drivename:\path\nsduh2011svy.dta, replace
  [It is a good practice to save the survey setting  permanently in the data file. This allows for this saved data to be used for  any subsequent survey analysis.]
  use drivename:\path\nsduh2011svy.dta
    SVY: PROP RSKPKCIG, OVER(IRSEX)
SAS code for  this analysis:
  PROC SURVEYMEANS data=sasdata NOMCAR mean clm;
    STRATA VESRT;
    CLUSTER VEREP;
    WEIGHT ANALWT_C;
    DOMAIN IRSEX / DFADJ;
    CLASS RSKPKCIG;
    VAR RSKPKCIG;
    run;
The SPSS specific  code for the same analysis:
  [The example code assumes that the user's SPSS  software package has the complex survey module installed. Select the first  block of code below, copy, and then paste it into the SPSS Syntax Editor.  Replace the ‘folder-path' by providing the path location where you would like  to save the nsduh11.csplan xml file. Select and run this  modified syntax code in the Syntax Editor. CSPLAN will write this Analysis  Design into the nsduh11.csplan xml file. The second block of  code shows how to reference the nsduh11.csplan xml file in the CSDESCRIPTIVES and other Complex Survey  procedures in the current and future SPSS sessions.]
  CSPLAN ANALYSIS
    /PLAN FILE='folder-path\nsduh11.csplan'
    /PLANVARS ANALYSISWEIGHT= ANALWT_C
    /DESIGN STRATA= VESTR CLUSTER= VEREP
     /ESTIMATOR TYPE = WR.
  get file='path\nsduh2011.sav'.
  CSDESCRIPTIVES
    /PLAN FILE= 'folder-path\nsduh11.csplan'
    /SUMMARY VARIABLES= RSK1  RSK2  RSK3  RSK4
    /SUBPOP TABLE= IRSEX DISPLAY=LAYERED
    /MEAN
    /STATISTICS se cin(95)
    /MISSING SCOPE=ANALYSIS CLASSMISSING=EXCLUDE.
R specific sample code for the same analysis:
  load("folder-path/nsduh2011.rda")
    keepvars =  c("VESTR", "VEREP",   "ANALWT_C", "IRSEX", "RSKPKCIG" )
    nsduh11 = nsduh2011[, keepvars]             #make a data file with fewer  variables
  library(prettyR)   # requires to install the prettyR package
    nsduh11$RSKPKCIG<-  as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", nsduh11$RSKPKCIG))
    nsduh11$IRSEX <-  as.numeric(sub("^\\(0*([0-9]+)\\).+$", "\\1", nsduh11$IRSEX))
  library(survey)  #needs to install the survey package
    options( survey.lonely.psu =  "adjust" )
    desg <- svydesign(id =  ~VEREP , strata = ~VESTR , weights = ~ANALWT_C , data = nsduh11 , nest = TRUE )
  # calculate the means or  proportions of RSKPKCIG by the levels of IRSEX and their SEs
    out = svyby(~factor(RSKPKCIG),  ~IRSEX, design = desg , FUN=svymean, vartype=c("se","ci"),  na.rm = TRUE) 
    coef(out)      #extracting the means
    SE(out)         #extracting the SEs of means
    confint(out)  # 95% confidence intervals of mean
    print(out)     #all results
Note that the variance estimation variables  (except the sampling weight variable) do not affect the mean, proportion,  percent, and other first-order moment statistics. For example, in SUDAAN syntax  code, the design=WR option and entire "nest VESRT VEREP;" statement  in PROC DESCRIPT have no impact on mean/proportion estimates. However, the  variance estimation variables must be used (e.g., design=option and nest  statement as above) to produce the SE estimates of descriptive (e.g., SE of  mean/proportion) and inferential statistics (e.g., confidence intervals of  mean/proportion and p-value of testing hypothesis).
    1Prior to 2002, data were collected under the old title - National Household Survey on Drug Abuse (NHSDA)
    2For further details on the sampling design and weighting adjustment method, please see the 2011 NSDUH Methodological Resource Book