Wednesday, April 10, 2013

How do I perform a homogeneity test of proportions or percentages in the R-DAS?

The R-DAS does not have the Comparison of Means analysis available. However, the Frequencies/Crosstabulation program has a Summary Statistics option that performs a test of independence (or no association) between two categorical variables using the Rao-Scott F statistics. These statistics take the complex design effect into account. The test of independence of a two-way contingency table is equivalent to the test of homogeneity of row (or column) percents (StataCorp 2011, pp. 141-142). The null hypothesis for the latter test is that row (or column) percents are equal for every category of the column (or row) variable.

For example, if your variable of interest, levels of alcohol consumption, is in the Column field, you can use the row percentage option and the resulting table output to assess homogeneity (or the lack of homogeneity) of the row (ethnic) groups across the levels of alcohol consumption. In other words, this test determines whether the distribution across the alcohol levels is the same for each of the ethnic groups.

The Rao-Scott F statistics are calculated from the contingency table for Row by Column variables. The test is significant at the x% level of significance if the p-value of the Rao-Scott F statistic is less than x%. In that case, the test concludes that there is association (or dependence) between the Row and Column variables.

The first screenshot shows RACE4 x ALCREC (recoded) table output for Total percent with the Summary Statistics box checked. From this table display of cell percents (i.e., total percents), confidence intervals and weighted cell frequencies, it is difficult to compare the prevalence of alcohol use in different racial groups. To interpret the table output with regard to the test of homogeneity, we need a display that makes the groups directly comparable.

The second screenshot shows the contingency table output for Row percent. A larger percentage of whites (63.7%) consumed alcohol within the last 30 days than of blacks, Hispanics, or the other group. Since we only changed the way the percentages are displayed, the Rao-Scott F statistic is identical for both screenshots.
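
The point that the percentage display does not affect the test can be illustrated with a small sketch. It rests on loud assumptions: the cell counts are invented, and the statistic is a plain, unweighted Pearson chi-square, not the design-adjusted Rao-Scott F statistic that the R-DAS reports. The function names are hypothetical.

```python
# Hypothetical cell counts, invented for illustration only.
# Rows: three groups; columns: two alcohol-use levels.
counts = [
    [30, 70],
    [45, 55],
    [50, 50],
]


def total_percents(table):
    """Each cell as a percentage of the grand total (Total percent display)."""
    grand = sum(sum(row) for row in table)
    return [[100.0 * cell / grand for cell in row] for row in table]


def row_percents(table):
    """Each cell as a percentage of its row total (Row percent display)."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]


def pearson_chi_square(table):
    """Unweighted Pearson chi-square for independence in a two-way table.

    This is NOT the Rao-Scott statistic; it ignores the survey design.
    """
    grand = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# The statistic is computed from the cell counts, so choosing a different
# percentage display changes the table, not the test.
print(total_percents(counts))
print(row_percents(counts))
print(pearson_chi_square(counts))
```

When every row has the same row percents (perfect homogeneity), the statistic in this sketch is zero, which is the sense in which a test of independence is also a test of homogeneity of row percents.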

Reference:

StataCorp. 2011. Stata Survey Data Reference Manual, Release 12. Statistical Software. College Station, TX: StataCorp LP.

Monday, April 8, 2013

Is there a way to compare multiple means using the MEANS analytic option in SDA?

Yes. You can compare multiple means ((k-1) comparisons per run) using the Comparison of Means Program in SDA. Note that if a dependent variable is coded as 0/1, then the mean of the dependent variable is the proportion of cases coded 1.

In the Means Program, the "Main statistic to display" box offers several dropdown options. The default setting displays the means of the dependent variable for the categories of the required Row variable. The Row variable categories define the domains for subpopulation analysis. Selecting the differences from Row category option allows you to choose a base category in the "If differences from a row or column, indicate base category" box. When you run the analysis with this selection, the result produced in each of the other row cells is the difference between that cell's mean and the base Row category cell mean. This selection, together with the z/t-statistic and p-value options, produces the (k-1) comparisons of means or proportions and the associated statistics from a single comparison of means run. This is a comparison of domain means for an outcome variable, where the domains are defined by the categories of a single Row variable.
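
As a rough sketch of what the differences-from-base display contains, the snippet below subtracts a base category mean from each of the other category means. The category labels and means are invented, and the function name is hypothetical. The z/t-statistics and p-values are not reproduced here, because they depend on the design-based standard errors that SDA computes.

```python
# Hypothetical domain means of a 0/1 dependent variable (i.e., proportions),
# invented for illustration only.
means = {"A": 0.5, "B": 0.25, "C": 0.75, "D": 0.5}


def differences_from_base(domain_means, base):
    """Return each non-base category mean minus the base category mean."""
    base_mean = domain_means[base]
    return {
        category: mean - base_mean
        for category, mean in domain_means.items()
        if category != base
    }


# One run with base category "A" yields k-1 = 3 differences.
diffs = differences_from_base(means, "A")
for category, diff in diffs.items():
    print(f"{category}-A: {diff:+.2f}")
```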

NOTES:

  1. In comparison of means testing, there are k(k-1)/2 pairwise differences of means or proportions among k domains (or subpopulations or subgroups). In a single table run, the Means program simultaneously tests the (k-1) differences between each of the other category means and the base category mean of a k-category classification variable. Choosing a different base category in each of (k-1) separate Means program runs therefore gives you all k(k-1)/2 distinct comparison tests. For a Row variable with k=4 levels (say, A, B, C and D), you will obtain 6 tests of difference of means (i.e., B-A, C-A, D-A for base reference A; C-B, D-B for base reference B; and D-C for base reference C) from 3 (=4-1) separate Means program runs, with A, B, and C used as base categories respectively.
  2. The computation in the R-DAS analysis system is equivalent to the Frequencies/Crosstabulation program module in SDA. It is not possible to perform a (k-1) means or proportions comparison using the R-DAS or the SDA Frequencies/Crosstabulation program, but it is possible to perform a test of homogeneity of row (or column) percents for a two-way table. There is a separate FAQ on this: "How do I perform a homogeneity of proportions or percentages using the Frequencies/Crosstab program in R-DAS?"
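
The counting in note 1 can be checked with a short sketch. The function name is hypothetical, and the code only enumerates which new comparisons each run contributes; it does not run any analysis.

```python
def comparisons_by_run(categories):
    """Map each base category to the new pairwise differences its run adds.

    Each run uses one base category; comparisons already covered by an
    earlier run are skipped, so only distinct pairs are listed.
    """
    runs = {}
    for i, base in enumerate(categories[:-1]):
        runs[base] = [f"{other}-{base}" for other in categories[i + 1:]]
    return runs


categories = ["A", "B", "C", "D"]
runs = comparisons_by_run(categories)
total = sum(len(pairs) for pairs in runs.values())

k = len(categories)
for base, pairs in runs.items():
    print(f"base {base}: {', '.join(pairs)}")
print(f"{len(runs)} runs, {total} distinct comparisons (k(k-1)/2 = {k * (k - 1) // 2})")
```

For k=4 this lists the same six differences as note 1, spread over three runs.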

Additional information on using the "Main statistic to display" option in the MEANS program can be found here: http://www.icpsr.umich.edu/SDAHELP/helpan.htm#mstats