Thursday, January 17, 2013

My results in R-DAS were blocked by the disclosure protection settings. How do I avoid having my output blocked?

Because of confidentiality concerns, we are unable to provide specific details about what is causing the disclosure protection settings to block output for a specific analytic run. However, we are able to provide solutions for several common reasons that analytic results are blocked.

When output is blocked, you may get one of these messages:

  • "The Row Total is equal to the value of one of the cells."
  • "To preserve confidentiality, tables cannot be displayed when the number of observations in any cell in the table is too low."

Definitions of the various blocked result messages are available in another FAQ.

Below are several examples of analytic requests where the results were blocked, and possible solutions for how to change your request to receive some analytic results.

Example 1: A user runs a crosstabulation where State is the column variable.

Possible solutions:

If interested in a single state, you might try placing the State variable in the Filter field to specify the one state for analysis. For example, entering STATE(1) in the filter field will give you results for just Alabama. Focusing your analysis on only one state might help you avoid a circumstance where a different state is causing your results to be blocked.

Another option is to use a broader geographic variable, such as Census Region or Division, to avoid the low record counts that can cause your results to be blocked.

Example 2: A user runs a crosstabulation where AGE is the column variable.

Possible solutions:

The AGE variable spans ages 12 to 103, so a crosstabulation on single years of age can produce cells with very few records. You could try using one of the categorized age variables in the data file instead.

Alternatively, you could utilize the temporary recode feature in R-DAS that allows you to recode a variable into fewer categories.

Help documentation on doing temporary recodes can be found at: http://www.icpsr.umich.edu/icpsrweb/content/SAMHDA/help/helpan.htm#recode
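To illustrate why collapsing categories helps, here is a minimal Python sketch of the idea behind a recode. It is not R-DAS recode syntax (see the help documentation linked above for that). The age bands, the function names, and the sample ages are all made up for illustration; the categorized age variables in the data file may use different groupings.

```python
from collections import Counter

# Hypothetical age bands chosen for illustration only.
AGE_BANDS = [(12, 17), (18, 25), (26, 34), (35, 49), (50, 64), (65, 103)]


def recode_age(age):
    """Return a label such as '18-25' for the band that contains `age`."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the 12-103 range")


def recoded_frequencies(ages):
    """Count records per age band instead of per single year of age."""
    return Counter(recode_age(age) for age in ages)


# Made-up ages: eight single-year categories collapse into five bands.
sample_ages = [12, 15, 19, 22, 40, 41, 67, 103]
print(recoded_frequencies(sample_ages))
```

With fewer, wider categories, each category holds more records, which makes a low-count cell less likely.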

Example 3: A user runs a three-way crosstabulation using the Row, Column, and Control fields. However, the results are blocked, and the user has no idea which variable or combination of variables contains the low record count.

Possible solutions:

Run frequencies for the variables in your analysis one at a time. One variable may stand out as having a value with a particularly low weighted frequency. It is possible that a variable has a value with such a small record count that the univariate frequency is blocked. If one variable does stand out as being the primary cause of the problem, then you could check to see if a similar variable exists with fewer categories, or you could do a temporary recode to create larger record counts.

If no single variable stands out as causing the problem, then try running crosstabs on two of your variables at a time. If any combination of values from the two variables has a particularly low weighted frequency, that combination may be the cause of the problem. If one combination does stand out, you could look for variables that are similar to the ones you chose but have fewer categories. Alternatively, you could do a temporary recode on one or both variables to create larger record counts for the categories/values that are the likely cause of the problem.
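The one-at-a-time and pairwise checks described above can be sketched as follows. This is a minimal Python illustration of the search strategy, not something that runs inside R-DAS. It assumes you have records available locally (for example, from a public-use file) as a list of dictionaries. The function name, the variable names, and the sample records are hypothetical, and the sketch uses unweighted counts of observed combinations only.

```python
from collections import Counter
from itertools import combinations


def smallest_cells(records, variables, top=3):
    """Return the `top` smallest one-way and two-way cells.

    Each result is (count, variable names, values). Counts are unweighted,
    and combinations that never occur in `records` are not listed.
    """
    cells = []
    # Step 1: frequencies for each variable on its own.
    for var in variables:
        counts = Counter(record[var] for record in records)
        cells += [(n, (var,), (value,)) for value, n in counts.items()]
    # Step 2: crosstabs for each pair of variables.
    for a, b in combinations(variables, 2):
        counts = Counter((record[a], record[b]) for record in records)
        cells += [(n, (a, b), values) for values, n in counts.items()]
    return sorted(cells, key=lambda cell: cell[0])[:top]


# Made-up records for illustration.
records = [
    {"sex": "M", "region": "NE"},
    {"sex": "M", "region": "NE"},
    {"sex": "F", "region": "NE"},
    {"sex": "F", "region": "S"},
    {"sex": "F", "region": "S"},
    {"sex": "M", "region": "S"},
]
for count, names, values in smallest_cells(records, ["sex", "region"]):
    print(count, names, values)
```

The cells listed first are the candidates to address with a variable that has fewer categories or with a temporary recode.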

My results in R-DAS were blocked by the disclosure protection settings. What do the various messages mean?

Below are descriptions of the most common messages that display when analytic results are blocked.

  1. The Row Total is equal to the value of one of the cells.

    This message refers to a built-in disclosure limitation protection for crosstab output. In the following 5 x 3 crosstabulation example, the sum of the 4th row (5) is equal to a single cell in that row. The whole table is suppressed when this happens.

    6 15 8
    9 17 8
    3 20 5
    0 5 0
    30 4 7

  2. To preserve confidentiality, tables cannot be displayed when the number of observations in any cell is too low.

    This error message states that at least one cell in the frequency table or crosstabulation does not meet the threshold established by CBHSQ/SAMHSA for protecting the confidentiality of respondents.

  3. To preserve confidentiality, analyses are not permitted to use the following variable(s): 'variable name'

    This message appears when one of the complex design variables (weight, strata, or cluster) is entered into one of the analysis fields (e.g., ROW). While the R-DAS system uses the complex sampling design variables to calculate accurate statistics, these variables are not available for direct analysis because of the potential disclosure risk involved.
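The first two messages describe conditions that can be checked mechanically on a table of counts. The following Python sketch is a literal reading of those two descriptions, not the actual R-DAS logic, which is not published. The function names are hypothetical, and the threshold value is a placeholder, because the real threshold established by CBHSQ/SAMHSA is not disclosed.

```python
def rows_with_total_equal_to_cell(table):
    """Return 1-based indexes of rows whose total equals one of their cells."""
    flagged = []
    for index, row in enumerate(table, start=1):
        if sum(row) in row:
            flagged.append(index)
    return flagged


def cells_below_threshold(table, threshold):
    """Return 1-based (row, column) positions of cells smaller than `threshold`."""
    return [
        (i, j)
        for i, row in enumerate(table, start=1)
        for j, cell in enumerate(row, start=1)
        if cell < threshold
    ]


# The 5 x 3 example table from message 1.
table = [
    [6, 15, 8],
    [9, 17, 8],
    [3, 20, 5],
    [0, 5, 0],
    [30, 4, 7],
]

print(rows_with_total_equal_to_cell(table))  # [4]

# The threshold of 4 is an arbitrary placeholder, not the real value.
print(cells_below_threshold(table, threshold=4))  # [(3, 1), (4, 1), (4, 3)]
```

In the example, only the 4th row has a total (5) that matches one of its own cells, which is the condition described in message 1.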

What is the Data Portal?

The Data Portal provides secure remote access to confidential data from the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA).

CBHSQ confidential data can only be accessed remotely through the Data Portal using special software. This virtual computing environment has been designed to provide authorized researchers access to confidential data for approved research projects. The Data Portal can only be accessed from approved computer location(s) and IP address(es) at the researcher's organization. Users are required to maintain the confidentiality of the data in the Data Portal. Researchers cannot transfer data into or out of the Data Portal.

The goal of the Data Portal is to maximize the use of CBHSQ data for important research and policy analyses, while conforming to Federal law and protecting identifiable data from disclosure.

What is the process for Data Portal approval and access?

The application process is described in detail in section 3 of the Data Portal Confidentiality Procedures Manual. An abbreviated description of the application process follows.

For each research project, the organization(s) must complete the Application for Access. Completed applications are to be submitted to SAMHDA at dataportal@icpsr.umich.edu. (The application does not need to be signed and does not need to include CVs.)

Once a complete application is submitted to SAMHDA, the Center for Behavioral Health Statistics and Quality (CBHSQ) will review the contents of the application for completeness. CBHSQ will verify that only eligible individuals will have access to the data.

CBHSQ can only approve a limited number of applications. If more completed applications are received than Data Portal resources can support, additional criteria for evaluating the applications will be used. The primary criteria for selection are:

  • The behavioral health impact of the proposed project and its potential contribution and alignment with Department of Health & Human Services and SAMHSA missions,
  • How well the research is aligned with the purpose [1] for which the data were collected, and
  • Whether the data requested are suitable for the proposed research project given data limitations (available sample size or survey content).

CBHSQ will also consider secondary evaluation criteria:

  • Available resources needed by CBHSQ to prepare the data file and the cost of site inspection.
  • The experience and capabilities of the research team.

Once the application has been approved, all individuals listed on the application must participate in confidentiality training. The project team will be notified about how this training will be conducted.

After the training is completed, the applicant submits the required paperwork:

  1. Confidential Data Use and Nondisclosure Agreement (CDUNA)
  2. Designation of Agent and Affidavit of Non-Disclosure Form
  3. Declaration of Nondisclosure (for federal employees only)

Approved applicants have six (6) months to complete the required confidentiality training and submit the required forms. Applications will be terminated for any applicant who fails to meet these requirements within six (6) months of application approval. Applicants with closed applications will need to reapply for Data Portal access during a future call for applications.

When the original signed CDUNA and affidavit(s) are received by CBHSQ and CBHSQ determines they are complete and final, the Principal Project Officer (PPO) and project team will be authorized to access the Data Portal. A copy of the signed and approved CDUNA will be sent to the PPO.

An email will be sent to each approved project team member listed on the application with information on how to access the custom dataset, which will contain only the variables that were requested and approved. Access to these data is allowed only for approved project members who have signed affidavits within the last year.

[1] The Data Portal provides access to Drug Abuse Warning Network (DAWN) and National Survey on Drug Use and Health (NSDUH) data sets. For descriptions of these data sets, see DAWN and NSDUH resources.

How can I get help with the Data Portal?

For questions and assistance with the Data Portal, please email dataportal@icpsr.umich.edu.

SAMHDA also operates a toll-free helpline (888-741-7242) Monday through Friday, 8:00 a.m. to 5:00 p.m. (EST). The local helpline number is (734) 615-9524. Staff try to respond to email and helpline questions within one business day. Answers to many questions can be found in the Data Portal Confidentiality Procedures Manual.