Thursday, March 12, 2015

How do I calculate variance of difference between totals for NSDUH data?

Before reading this document, users should review the "How do I calculate variance of totals for NSDUH data?" FAQ.

Sample syntax to calculate the SE of contrast between totals for controlled Domains of YEAR

The following programming code shows how to compute the SEs of the differences of two totals and the p-values for the tests of differences of two totals when total estimates are considered to be from controlled domains. In this example, we  will  test the differences between totals of DEPNDALC (alcohol dependence) for three levels of YEAR (2006 versus 2005 and 2007 versus 2005) using a combined file containing 2005-2007 data.

For controlled domains: SE of the contrast between totals1 (i.e., t2-t1) is:

SE(t2-t1) = sqrt(V(t1)+V(t2)-2*COV(t1,t2)),
where,
V(t1) = wsum1*wsum1*V(mean1) and
COV(t1,t2) = wsum1*wsum2*COV(mean1,mean2)

The covariance, COV(t1,t2), between two totals of t1 and t2 will be computed from the estimated covariance, COV(mean1,mean2), between the means or proportions. Note that these total estimates may take the form of total estimates from: (a) two different survey years (e.g., 2005 and 2006) within a combined file; (b) total estimates from sets of combined survey data (e.g., 2005-2006 annual totals and 2007-2008 annual totals) in a combined file where, for instance, ANALWT_C2 is an adjusted weight calculated by multiplying ANALWT_C by 0.5, or (c) total estimates for populations of interest within a single survey year. The following code is based on the data file of condition (a).

  • SUDAAN in conjunction with SAS code:

    proc descript data=work.nsduh5to7  design=WR  notsorted nomarg;
    nest vestr verep; weight analwt_c;
    var DEPNDALC;
    class YEAR/ nofreqs;
    diffvar YEAR=(6 5)/name="contrast Year6 vs 5";
    diffvar YEAR=(7 5)/name="contrast Year7 vs 5";
    output wsum total/replace filetype=sas filename=work.diff_total wsumfmt=f15.3 totalfmt=f15.4;
    run;

  • Previous sample syntax demonstrated how to calculate variance of the total for a controlled domain. The following pieces of code show how to compute the covariance between two totals. This is accomplished after extracting the covariance between means (from SUDAAN output work.cov1 in (1)) of respective domains.

    proc descript  data=work.nsduh5to7  design=WR  notsorted NOMARG;
    nest vestr verep; weight analwt_c;
    var DEPNDALC;
    class YEAR / nofreqs;
    tables YEAR;
    output / meancov=default filetype=sas filename=work.cov1 replace; * --------- (1);
    output wsum semean total /replace  filetype=sas filename=work.semean_domains
    wsumfmt=f15.3 semeanfmt=f12.10 totalfmt=f15.4;
    run;

    The following data step subsets the rows (i.e., rows when IDNUM=2) and columns (i.e., 3 B's and ROWNUM variables) to extract the variance-covariance matrix. As a result of the NOMARG option, there were 3 B's variables in work.cov101 because YEAR in the Tables statement has 3 levels.

    data work.cov1 (keep=rownum b01--b03);
    set work.cov101; /* by default adds 01 for meancov= option in (1) */
    if idnum = 2;   
    attrib _all_ label ='';
    run;

    The next data step will extract the covariance estimates between the first level and a successive level of YEAR. These estimates correspond to variable B01 for rownum=2 onwards.

    data work.covar (keep=covmean);
    set work.cov1;
    if rownum ne 1 then do; covmean=b01; end;
    if rownum ne 1 then output;
    run;

    data work.yr1(rename=(wsum=wsum1 total=total1) keep=YEAR wsum vmean1 total)
    work.yr2(rename=(wsum=wsum2 total=total2) keep=YEAR wsum vmean2 total)
    work.yr3(rename=(wsum=wsum2 total=total2 vmean3=vmean2) keep=YEAR wsum vmean3 total);

    set work.semean_domains;
    vmean1 = semean**2;
    vmean2 = semean**2;
    vmean3 = semean**2;
    if YEAR = 5 then output work.yr1;
    if YEAR = 6 then output work.yr2;
    if YEAR = 7 then output work.yr3;
    attrib _all_ label ='';
    run;

    data work.t12;
    merge work.yr1 work.yr2;
    data work.t13;
    merge work.yr1 work.yr3;
    data work.variances;
    merge work.t12 work.t13;
    by YEAR;
    run;

    In work.diff_total dataset, wsum is the sum of weights from two domains involved in the contrast and total is the contrast estimates between two respective totals.

    data work.contrast_totals(rename=(wsum=cntrwsum total=cntrtotal));
    length Cntr_Year $ 8;
    merge work.diff_total(keep=wsum total) work.variances work.covar;
    if year = 6 then do;
    Cntr_year = ‘y6 vs y5';
    secntrtotal = sqrt(wsum1**2*vmean1+wsum2**2*vmean2-2*wsum1*wsum2*covmean);
    end;
    if year = 7 then do;
    Cntr_year = ‘y7 vs y5';
    secntrtotal = sqrt(wsum1**2*vmean1+wsum2**2*vmean2/*-2*wsum1*wsum2*covmean*/);*?(2);
    end;
    if secntrtotal gt 0.0 then do;
    t_val = total/secntrtotal;
    p_value = 2*(1-probt(abs(t_val),60)); *df=120-60=60; *p-value computed for 2-sided test;
    end;
    run;

    Note that we deleted the covariance term from equation (2) above for nonconsecutive years (see footnote -20 on page-13 in the 2011 NSDUH Sample Design Report.)

      proc print data=work.contrast_totals noobs;
      var Cntr_Year cntrwsum cntrtotal secntrtotal t_val p_value;
      run;

    • Stata code for the same analysis:

    This assumes some knowledge of vector-matrix computation for the syntax.

    svy: mean DEPNDALC, over (YEAR)
    mat cov = e(V)
    svy: total DEPNDALC, over (YEAR)
    # extracts (numerator weighted) totals of DEPNDALC for each domain
    mat totals = e(b)
    # extracts denominator weighted sizes
    mat wsum = e(_N_subp)
    mat vmean = (cov[1,1],cov[2,2],cov[3,3])

    # compute contrast between totals and sum of the weights from two domains in contrast
    mat cntrtotal = (totals[1,2]-totals[1,1], totals[1,3]-totals[1,1])
    mat cntrwsum = (wsum[1,2]+wsum[1,1], wsum [1,3]+wsum [1,1])

    # compute the variances and covariances of totals
    mat vt1 = (wsum[1,1]^2*cov[1,1])
    mat vt2 = (wsum[1,2]^2*cov[2,2])
    mat vt3 = (wsum[1,3]^2*cov[3,3])
    mat cvt12 = (wsum[1,1]*wsum[1,2]*cov[1,2])
    mat cvt13 = (wsum[1,1]*wsum[1,3]*cov[1,3])

    # covariance-term from variance-formula of second contrast is omitted because of nonadjacent years
    mat secntrtotal = (sqrt(vt1[1,1]+vt2[1,1]-2.0*cvt12[1,1]), sqrt(vt1[1,1]+vt3[1,1]/*-2.0*cvt13[1,1]*/))
    mat t_val = (cntrtotal[1,1]/secntrtotal[1,1], cntrtotal[1,2]/secntrtotal[1,2])
    # p value for two-sided test (df=60=120-60)
    mat p_val = (2*ttail(60,t_val[1,1]), 2*ttail(60,t_val[1,2]))
    mat result = (cntrwsum', cntrtotal', secntrtotal', t_val', p_val')
    mat colnames result = CombWgtSize contrTotal SEcontrTotal t_stat pvalue
    mat rownames result = alcohol:Y6_vs_5 alcohol:Y7_vs_5
    # command below displays the pooled weighted size of two domains, difference between two totals, SE of difference, Student's t statistic, and the p-value of the test
    mat list result

    R does not currently compute the covariance matrix for the Taylor linearization method.



    1 See Sections 5 and 7 in 2012 NSDUH Statistical Inference Report.