Wednesday, February 6, 2013

How do I find the number of cases with any mention of a specific drug in DAWN?

The Drug Abuse Warning Network (DAWN) public-use data file includes information on one or more substances contributing to an Emergency Departments (ED) visit. In other words, some cases report only a single substance and other cases report multiple substances (i.e. cocaine, simvastatin, and Benadryl all present in the same ED visit). Beginning with the 2009 DAWN public-use file, information is included for up to 22 drugs reported in the ED visit; an increase from the 16 drug mentions available in previous years of DAWN. Within the DAWN data, there is currently no automated way to search for a specific drug name across all drug mentions (up to 22) for each case in order to produce the total number of ED cases involving a particular substance. For instance, a particular case reports only a single substance (i.e. codeine) that is provided in variable, DRUGID_1, with 11 for codeine’s code. Then the remaining variables, DRUGID_2 to DRUGID_22, will have missing codes. Suppose another case reports four substances (i.e. ibuprofen, simvastatin, codeine, and warfarin). These substances are provided in the first four variables, DRUGID_1 to DRUGID_4, with 14 in DRUGID_1 for ibuprofen, with 468 in DRUGID_2 for simvastatin, with 11 in DRUDID_3 for codeine, with 21 in DRUGID_4 for warfarin, and the remaining DRUGID variables having missing codes.

Determining the number of DAWN ED cases that involve a particular drug is important for many types of analyses and reports.

Some information may be obtained via the Excel files available on the SAMHSA website. These tables provide weighted national estimates for a particular drug or category. Tables that provide weighted estimates for some metropolitan areas are also available.

For users looking for a specific drug that is not included in the SAMHSA tables or for those interested in more detailed statistical analysis, we provide the following programming code to create a new variable that specifically answers the question, "Is Drug _______ present at the time of ED visit, Yes/No?". This FAQ provides sample code for the following five formats: online analysis system (SDA), SAS, SPSS, Stata, and R. Some knowledge of either SDA, SAS, SPSS, Stata, or R is required.

NOTE: The sample SDA, SAS, SPSS, STATA, and R code below use the DRUGID variables and the DRUGID value/code that corresponds with codeine in the 2009 DAWN for illustration purposes. When creating your own "Is substance _____ present in the persons system" variable, you will want to find and replace information via the following steps to obtain the drug information you are specifically seeking.

  1. Determine the year of DAWN data for which you wish to create the customized variable.
  2. Decide which substance name/categorization variable is best suited for the information you are seeking. Select one category from DRUGID, SDLED_1, SDLED_2, SDLED_3, SDLED_4, SDLED_5, or SDLED_6.
  3. In the PDF or HTML codebook, find the numeric value that corresponds to the drug name/category you are investigating.
  4. Decide on a name for your newly computed variable (it should not be a variable name that already exists in the original DAWN data file).
  5. Replace the variable name, the drug ID/category code, and the final new variable name information in the sample code with the information you identified in the previous steps.

Example 1: SDA

The SDA system can be used to obtain the number of cases in which a particular drug was reported. The first step is to use the PDF codebook appendix to look up the drug id number for the substance you are interested in. For this example we have selected codeine (drug id number = 12) as the drug of interest.

In SDA, select the "Compute a new variable" option from the "Create Variables" dropdown. See Exhibit 1, below.

Exhibit 1

Once you are in the "Compute a new variable" function, the field "Expression to Define the New Variable" is shown. See Exhibit 2, below.

Exhibit 2

The code to enter into the "Expression to Define the New Variable" field for this example is (it assumes codeine):

IF (DRUGID_1 eq 11 OR DRUGID_2 eq 11 OR DRUGID_3 eq 11 OR DRUGID_4 eq 11 OR DRUGID_5 eq 11 OR DRUGID_6 eq 11 OR DRUGID_7 eq 11 OR DRUGID_8 eq 11 OR DRUGID_9 eq 11 OR DRUGID_10 eq 11 OR DRUGID_11 eq 11 OR DRUGID_12 eq 11 OR DRUGID_13 eq 11 OR DRUGID_14 eq 11 OR DRUGID_15 eq 11 OR DRUGID_16 eq 11 OR DRUGID_17 eq 11 OR DRUGID_18 eq 11 OR DRUGID_19 eq 11 OR DRUGID_20 eq 11 OR DRUGID_21 eq 11 OR DRUGID_22 eq 11)
CODEINE = 1
ELSE
CODEINE = 0

It is very important to specify "Yes" for "Include missing-data values in the computation?". Otherwise, nearly all cases would be missing for the new computed variable due to the high number of missing cases that exist by the time you get to DRUGID_22.

In the SDA Compute Program, specifying a variable label, value labels, and descriptive text (question text) is optional. These options are useful to add if you intend to paste SDA output results into a document (note: all tables and graphs produced by SDA can be copied and pasted into Excel or Word).

As with all other functions in SDA, nearly every field and option has help documentation that can be selected by clicking on the field of interest.

Example 2: SAS

data work.tmp;
set 'drivename:\filepath\filename';

Codeine=0;
if (drugid_1= 11 or drugid_2= 11 or drugid_3= 11 or drugid_4= 11 or drugid_5= 11
or drugid_6= 11 or drugid_7= 11 or drugid_8= 11 or drugid_9= 11 or drugid_10= 11
or drugid_11= 11 or drugid_12= 11 or drugid_13= 11 or drugid_14= 11 or drugid_15= 11
or drugid_16= 11 or drugid_17= 11 or drugid_18= 11 or drugid_19= 11 or drugid_20= 11
or drugid_21= 11 or drugid_22= 11) then Codeine=1;
run;

Example 3: SPSS

compute Codeine=0.
if (drugid_1= 11 or drugid_2= 11  or drugid_3= 11 or drugid_4= 11 or drugid_5= 11 or drugid_6= 11
or drugid_7= 11 or drugid_8= 11 or drugid_9= 11 or drugid_10= 11  or drugid_11= 11  or drugid_12= 11
or drugid_13= 11 or drugid_14= 11 or drugid_15= 11 or drugid_16= 11 or drugid_17= 11 or drugid_18= 11
or drugid_19= 11 or drugid_20= 11 or drugid_21= 11 or drugid_22= 11)   Codeine=1.

Example 4: Stata

*drop Codeine
gen Codeine=0
replace  Codeine=1 if drugid_1== 11 | drugid_2== 11 | drugid_3== 11 | drugid_4== 11 | drugid_5== 11 | drugid_6== 11  | drugid_7== 11 | drugid_8== 11 | drugid_9== 11 | drugid_10== 11 | drugid_11== 11 | drugid_12== 11 | drugid_13== 11  | drugid_14== 11 | drugid_15== 11 | drugid_16== 11 | drugid_17== 11 | drugid_18== 11 | drugid_19== 11 | drugid_20 == 11 | drugid_21== 11 | drugid_22== 11

Example 5: R

# get R format downloaded DAWN 2009 PUF file into R console

load("d:/~/Desktop/ICPSR_31921/DS0001/31921-0001-Data.rda")
# da31921.001.rda is created (by default) from above load command
dawn9 = da31921.0001  # make a copy of data file with a shorter name
rm(da31921.0001)    # removing the duplicate data

codeine01 = as.numeric(as.numeric(dawn9$DRUGID_1)==11)
codeine01[is.na(codeine01)] <- 0  # replacing NA by 0
codeine02 = as.numeric(as.numeric(dawn9$DRUGID_2)==11)
codeine02[is.na(codeine02)] <- 0
codeine03 = as.numeric(as.numeric(dawn9$DRUGID_3)==11)
codeine03[is.na(codeine03)] <- 0
codeine04 = as.numeric(as.numeric(dawn9$DRUGID_4)==11)
codeine04[is.na(codeine04)] <- 0
codeine05 = as.numeric(as.numeric(dawn9$DRUGID_5)==11)
codeine05[is.na(codeine05)] <- 0
codeine06 = as.numeric(as.numeric(dawn9$DRUGID_6)==11)
codeine06[is.na(codeine06)] <- 0
codeine07 = as.numeric(as.numeric(dawn9$DRUGID_7)==11)
codeine07[is.na(codeine07)] <- 0
codeine08 = as.numeric(as.numeric(dawn9$DRUGID_8)==11)
codeine08[is.na(codeine08)] <- 0
codeine09 = as.numeric(as.numeric(dawn9$DRUGID_9)==11)
codeine09[is.na(codeine09)] <- 0
codeine10 = as.numeric(as.numeric(dawn9$DRUGID_10)==11)
codeine10[is.na(codeine10)] <- 0
codeine11 = as.numeric(as.numeric(dawn9$DRUGID_11)==11)
codeine11[is.na(codeine11)] <- 0
codeine12 = as.numeric(as.numeric(dawn9$DRUGID_12)==11)
codeine12[is.na(codeine12)] <- 0
codeine13 = as.numeric(as.numeric(dawn9$DRUGID_13)==11)
codeine13[is.na(codeine13)] <- 0
codeine14 = as.numeric(as.numeric(dawn9$DRUGID_14)==11)
codeine14[is.na(codeine14)] <- 0
codeine15 = as.numeric(as.numeric(dawn9$DRUGID_15)==11)
codeine15[is.na(codeine15)] <- 0
codeine16 = as.numeric(as.numeric(dawn9$DRUGID_16)==11)
codeine16[is.na(codeine16)] <- 0
codeine17 = as.numeric(as.numeric(dawn9$DRUGID_17)==11)
codeine17[is.na(codeine17)] <- 0
codeine18 = as.numeric(as.numeric(dawn9$DRUGID_18)==11)
codeine18[is.na(codeine18)] <- 0
codeine19 = as.numeric(as.numeric(dawn9$DRUGID_19)==11)
codeine19[is.na(codeine19)] <- 0
codeine20 = as.numeric(as.numeric(dawn9$DRUGID_20)==11)
codeine20[is.na(codeine20)] <- 0
codeine21 = as.numeric(as.numeric(dawn9$DRUGID_21)==11)
codeine21[is.na(codeine21)] <- 0
codeine22 = as.numeric(as.numeric(dawn9$DRUGID_22)==11)
codeine22[is.na(codeine22)] <- 0

codeine = codeine01 + codeine02 + codeine03 + codeine04 + codeine05 + codeine06 + codeine07 + codeine08 + codeine09 + codeine10 + codeine11 + codeine12 + codeine13 + codeine14 + codeine15 + codeine16 + codeine17 + codeine18 + codeine19 + codeine20 + codeine21 + codeine22

rm(codeine01, codeine02, codeine03, codeine04, codeine05, codeine06, codeine07, codeine08, codeine09, codeine10, codeine11, codeine12, codeine13, codeine14, codeine15, codeine16, codeine17, codeine18, codeine19, codeine20, codeine21, codeine22)

table(codeine)  # to display the counts

How can I transfer output from SDA and R-DAS to a document, spreadsheet, or presentation?

You can copy and paste output, including tables and charts, from SDA and R-DAS into a document, spreadsheet, or presentation. To copy and paste, use your mouse to highlight the output and then click "copy". When pasting the output into a document, use the "paste special" option to retain the same display as in SDA and R-DAS. To transfer data only, paste the output using the standard paste option.

If you have a PDF file creator or print driver, you can also print the output to a PDF file.

Thursday, January 17, 2013

My results in R-DAS were blocked by the disclosure protection settings. How do I avoid having my output blocked?

Because of confidentiality concerns, we are unable to provide specific details about what is causing the disclosure protection settings to block output for a specific analytic run. However, we are able to provide solutions for several common reasons that analytic results are blocked.

When output is blocked, you may get one of these messages:

  • "The Row Total is equal to the value of one of the cells."
  • "To preserve confidentiality, tables cannot be displayed when the number of observations in any cell in the table is too low."

Definitions of the various blocked result messages are available in another FAQ.

Below are several examples of analytic requests where the results were blocked, and possible solutions for how to change your request to receive some analytic results.

Example 1: A user runs a crosstabulation where State is the column variable.

Possible solutions:

If interested in a single state, you might try placing the State variable in the Filter field to specify the one state for analysis. For example, entering STATE(1) in the filter field will give you results for just Alabama. Focusing your analysis on only one state might help you avoid a circumstance where a different state is causing your results to be blocked.

Another option would be to use a geographic variable like Census Region or Division in an attempt to avoid low record counts that can result in causing your results to be blocked.

Example 2: A user runs a crosstabulation where AGE is the column variable.

Possible Solutions:

The AGE variable spans an age range from 12 to 103 years old. You could try using one of the categorized age variables within the data file.

Alternatively, you could utilize the temporary recode feature in R-DAS that allows you to recode a variable into fewer categories.

Help documentation on doing temporary recodes can be found at: http://www.icpsr.umich.edu/icpsrweb/content/SAMHDA/help/helpan.htm#recode

Example 3: A user runs a three-way crosstabulation using the Row, Column, and Control fields. However, the results are blocked, and the user has no idea which variable or combination of variables contains the low record count.

Possible solutions:

Run frequencies for the variables in your analysis one at a time. One variable may stand out as having a value with a particularly low weighted frequency. It is possible that a variable has a value with such a small record count that the univariate frequency is blocked. If one variable does stand out as being the primary cause of the problem, then you could check to see if a similar variable exists with fewer categories, or you could do a temporary recode to create larger record counts.

If no single variable stands out as causing the problem, then try running crosstabs on two of your variables. If any cross combination of values from the two variables has a particularly low weighted frequency, then this can be an indicator that the combination is the cause of the problem. If one combination does stand out, you could find similar variables to the ones you chose, but have fewer categories. Again, you could do a temporary recode on one or more of your variables to create larger record counts for the categories/values of the two variables that are the possible cause of the problem.

My results in R-DAS were blocked by the disclosure protection settings. What do the various messages mean?

Below are descriptions of the most common messages that display when analytic results are blocked.

  1. The Row Total is equal to the value of one of the cells.

    This message refers to a built in disclosure limitation protection for specific crosstab output. In the following 5 X 3 crosstabulation example, the sum of the 4th row is equal to a single cell in that row. The whole table is suppressed when this happens.

    6 15 8
    9 17 8
    3 20 5
    0 5 0
    30 4 7

  2. To preserve confidentiality, tables cannot be displayed when the number of observations in any cell is too low.

    This error message states that at least one cell in the frequency of the table or crosstabulation does not meet the threshold established by CBHSQ/SAMHSA for protecting the confidentiality of respondents.

  3. To preserve confidentiality, analyses are not permitted to use the following variable(s): 'variable name'

    This message appears when one of the complex design variables (weight, strata, or cluster) is entered into one of the analysis fields (i.e. ROW). While the complex sampling design variables are used by the R-DAS system to calculate accurate statistics, the design variables are not available because of the potential disclosure risk involved.

What is the Data Portal?

The Data Portal provides secure remote access to confidential data from the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA).

CBHSQ confidential data can only be accessed remotely through the Data Portal using special software. This virtual computing environment has been designed to provide authorized researchers access to confidential data for approved research projects. The Data Portal can only be accessed from approved computer location(s) and IP address(es) at the researcher's organization. Users are required to maintain the confidentiality of the data in the Data Portal. Researchers cannot transfer data into or out of the Data Portal.

The goal of the Data Portal is to maximize the use of CBHSQ data for important research and policy analyses, while conforming to Federal law and protecting identifiable data from disclosure.

What is the process for Data Portal approval and access?

The application process is described in detail in section 3 of the Data Portal Confidentiality Procedures Manual. An abbreviated description of the application process follows.

For each research project, the organization(s) must complete the Application for Access. Completed applications are to be submitted to SAMHDA at dataportal@icpsr.umich.edu. (The application does not need to be signed and does not need to include CVs.)

Once a complete application is submitted to SAMHDA, the Center for Behavioral Health Statistics and Quality (CBHSQ) will review the contents of the application for completeness. CBHSQ will verify that only eligible individuals will have access to the data.

CBHSQ can only approve a limited number of applications. If more completed applications are received than Data Portal resources can support, additional criteria for evaluating the applications will be used. The primary criteria for selection are:

  • The behavioral health impact of the proposed project and its potential contribution and alignment with Department of Health & Human Services and SAMHSA missions,
  • How well the research is aligned with the purpose1 for which the data were collected, and
  • Whether the data requested is suitable for the proposed research project given data limitations (available sample size or survey content).

CBHSQ will also consider secondary evaluation criteria:

  • Available resources needed by CBHSQ to prepare the data file and the cost of site inspection.
  • The experience and capabilities of the research team.

Once the application has been approved, all individuals listed on the application must participate in confidentiality training. The project team will be notified about how this training will be conducted.

After the training is completed, the applicant submits the required paperwork:

  1. Confidential Data Use and Nondisclosure Agreement (CDUNA)
  2. Designation of Agent and Affidavit of Non-Disclosure Form
  3. Declaration of Nondisclosure (for federal employees only)

Approved applicants have six (6) months to complete the required confidentiality training and submit the required forms. Applications will be terminated for any applicant who fails to meet these requirements within six (6) months of application approval. Applicants with closed applications will need to reapply for Data Portal access during a future call for applications.

When the original signed CDUNA and affidavit(s) are received by CBHSQ and CBHSQ determines they are complete and final, the Principal Project Officer (PPO) and project team will be authorized to access the Data Portal. A copy of the signed and approved CDUNA will be sent to the PPO.

An email will be sent to each approved project team member listed on the application with information on how to access the custom dataset, which will contain only the variables that were requested and approved. Access to these data is allowed only for approved project members who have signed affidavits within the last year.

1The Data Portal provides access to Drug Abuse Warning Network (DAWN) and National Survey on Drug Use and Health (NSDUH) data sets. For descriptions of these data sets, see DAWN and NSDUH resources.

How can I get help with the Data Portal?

For questions and assistance with the Data Portal, please email dataportal@icpsr.umich.edu.

SAMHDA also operates a toll-free helpline (888-741-7242) Monday through Friday, 8:00 a.m. to 5:00 p.m. (EST). The local helpline number is (734) 615-9524. Staff try to respond to email and helpline questions within one business day. Answers to many questions can be found in the Data Portal Confidentiality Procedures Manual.