Friday, November 16, 2012

I need help logging into MyData Account and changing my email address or password.

Logging into Your MyData Account

The only time the SAMHDA website requires users to log in using a MyData account is for access to the R-DAS. When prompted to log in to the R-DAS on the Log into SAMHDA page, select the "Login via MyData" option to log in using your MyData account.

Changing Your Email Address or Password

Once you've logged in to the R-DAS, the upper right corner of the SAMHDA website will display a link with your name and a logout option. Clicking on your name will take you to the "Edit Settings" page. Select the "Change your password" link to change your password.

Recovering/Resetting Your Password

To recover your MyData password, select the "Request your password to be sent to you via email" link on the "Log into SAMHDA" page and follow the instructions on the next page. After you receive the email, we recommend that you change the password.

Locked Out?

For assistance with MyData, please contact ICPSR User Support at netmail@icpsr.umich.edu.

What is MyData?

MyData is one login option for access to the R-DAS. It is a user registration and authentication system for archives within ICPSR, like SAMHDA. The system creates a more secure environment for users and enables them to sign up for ICPSR services. MyData has the following user-friendly features:

  • Uses an email address as a login ID
  • Requires a password to authenticate to the website
  • Allows users to create an account and set preferences
  • Allows registered users to reset a forgotten password

Do I have to establish a MyData account?

Users who wish to analyze restricted-use data available in the R-DAS may either establish a MyData account or log in using their Facebook, Google, or LinkedIn account. When a user creates a new MyData account, s/he will be asked to complete a brief questionnaire about user preferences.

Friday, November 9, 2012

How do I produce correct estimates for NSDUH: 4- and 2-Year R-DAS data files?

To generate correct estimates for the NSDUH: 4- and 2-Year R-DAS data files, users must use the year pair indicator variable (YRPRIND) in one of the following ways:

  1. As a filter to subset the data file for a specific group of years such as:
    - For the 2-Year R-DAS data file use "YRPRIND(1)" for 2002-2003, "YRPRIND(2)" for 2004-2005, "YRPRIND(3)" for 2006-2007, "YRPRIND(4)" for 2008-2009, "YRPRIND(8)" for 2010-2011 or "YRPRIND(10)" for 2012-2013 in the filter field.
    - For the 4-Year R-DAS data file use "YRPRIND(5)" for 2002-2005 or "YRPRIND(6)" for 2006-2009 in the filter field.
  2. As a control variable. PLEASE NOTE: This option will produce separate results for each combined group of years (e.g., 2002-2005). The last table with total estimates is invalid and should be ignored.
  3. As a row variable in the row field. PLEASE NOTE: Under this option the column total estimates are invalid and should be ignored.
  4. As a column variable in the column field. PLEASE NOTE: Under this option the row total estimates are invalid and should be ignored.

The NSDUH R-DAS data files do not allow for the creation of single-year estimates because of the potential for disclosure of confidential information. The NSDUH R-DAS data files contain revised weights designed to produce estimates that are representative of the average population across combined two-year (2002-2003, 2004-2005, 2006-2007, 2008-2009, 2010-2011, and 2012-2013), four-year (2002-2005 and 2006-2009), eight-year (2002-2009), or ten-year (2002-2011) periods. There are no weights on the 4-year R-DAS that are representative of all eight years; thus all estimates of totals across the period 2002-2009 will be twice as large as they should be. Similarly, there are no weights on the 2-year R-DAS that are representative of all twelve years. To get the correct estimates of totals across the eight-year period, the 8-year R-DAS file should be used. Similarly, the 10-year R-DAS file should be used to get the correct estimates of totals across the ten-year period.
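As a rough arithmetic illustration of why pooled totals are inflated, the following R sketch sums weights with and without a YRPRIND filter. The data frame, its weight column, and every number in it are invented for illustration; the R-DAS itself does not permit listing individual cases, so this only mimics the logic.

```r
# Hypothetical records: YRPRIND 5 = 2002-2005, YRPRIND 6 = 2006-2009.
# Each weight is scaled so that a period's weights sum to that period's
# average population (all values invented for illustration).
rdas <- data.frame(
  YRPRIND = c(5, 5, 6, 6),
  weight  = c(600000, 650000, 640000, 660000)
)

# Filtering on one period gives a total for that period alone.
total_2002_2005 <- sum(rdas$weight[rdas$YRPRIND == 5])
total_2006_2009 <- sum(rdas$weight[rdas$YRPRIND == 6])

# Without a filter, the two period totals are simply added together.
total_unfiltered <- sum(rdas$weight)

# An average over all eight years would be near the mean of the two periods,
# so the unfiltered total is about twice that size.
ratio <- total_unfiltered / mean(c(total_2002_2005, total_2006_2009))
print(ratio)
```

In this toy case the unfiltered total is exactly double the eight-year average, which is the inflation the filter (or the 8-year file) avoids.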

Monday, October 15, 2012

How can I access the MTF series? I can no longer find MTF on the SAMHDA site.

The MTF series has been transferred from SAMHDA to the National Addiction & HIV Data Archive Program (NAHDAP) Archive. All MTF data and documentation files can be accessed through the NAHDAP and ICPSR General Archive websites. We apologize for any inconvenience this transition may have caused.

Monday, September 24, 2012

What is the Restricted-use Data Analysis System (R-DAS)?

The R-DAS is an online analysis system that allows researchers to produce frequencies and cross-tabulations using restricted-use data files. The R-DAS provides output in the form of tables and frequencies for viewing and exporting. Advanced statistical methods are not available in the R-DAS at this time.

The R-DAS does not permit listing of individual cases and does not provide unweighted frequencies in the R-DAS codebook, nor are users able to generate unweighted frequencies (no unweighted sample sizes are provided). These limitations have been put in place to reduce the potential for disclosing confidential information.

The R-DAS provides standard errors that take into account the complex survey design. All weighted totals and point estimates are rounded to the nearest thousand and all percents and associated statistics are rounded to one decimal place. If any cell in a table contains too few unweighted cases, then the entire table is suppressed.
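To give a sense of how much precision this rounding removes, here is a small R sketch that applies the same two rounding levels to made-up estimates. The input values are hypothetical, and R's round function is only a stand-in; the exact rounding method used inside the R-DAS is not documented here.

```r
# Hypothetical unrounded estimates (invented for illustration).
weighted_total <- 1234567.8
percent        <- 12.3456

# digits = -3 rounds to the nearest thousand; digits = 1 keeps one decimal place.
rounded_total   <- round(weighted_total, digits = -3)
rounded_percent <- round(percent, digits = 1)

print(rounded_total)
print(rounded_percent)
```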

The R-DAS does not currently allow for the creation of composite variables (i.e., the creation of new variables using other variables), but that capability is under development. The R-DAS does allow for temporary recoding of existing continuous and categorical variables. See the SDA 3.5 help documentation for assistance with how to Temporarily Recode a Variable.

R-DAS webinar

Watch the "Broadening Access to Substance Abuse and Mental Health Data with the R-DAS" webinar to learn about the National Survey on Drug Use and Health (NSDUH) data available through the R-DAS.

For more information on analyzing data with the R-DAS, consult the FAQ section on Help with the Restricted-use Data Analysis System (R-DAS).

Tuesday, September 18, 2012

How do I correctly calculate N-SSATS client counts and admissions using SDA?

The N-SSATS is a database of facilities, not clients or admissions. Thus, calculating client counts / admissions using one of the client counts / admissions variables might produce unexpected results. This note describes how to correctly calculate client counts and admissions for N-SSATS data.

The client counts and admissions variables are listed in the following sections of the codebook:

  • Section B: Reporting Client Counts
    • Hospital Inpatient Client Counts
    • Residential (Non-Hospital) Client Counts
    • Outpatient Client Counts
    • Totals and Percentages
      • (Excluding the variables that give percentages)
  • Section 7: Added variables

To correctly calculate client counts or admissions, do the following:

  1. Access N-SSATS within SDA.
  2. Go to the "Analysis" menu (top left) and select "Comparison of means".
  3. For the "Dependent" variable, specify one of the client counts or admissions variables. For example, specify T_METH for the total number of methadone clients on the reference date.
  4. Specify an appropriate "Row" variable, such as STFIPS for the state.
  5. Specify the "Column", "Control", and "Selection Filter(s)" variables as needed. For example, under "Selection Filter(s)", you might state "YEAR(2010), OTP(1)" to restrict analysis to the year 2010 and to facilities that operated a certified Opioid Treatment Program.
  6. Set the "Main statistic to display" to "Totals (numerator of means)". This is a key step.
  7. Under "Table Options" uncheck all the checkboxes.
  8. Make any other adjustments to the display options that are necessary. For example, if you don't want a chart, change "Type of Chart" to "(No Chart)".
  9. Click "Run the Table". The output table from this example gives the total number of methadone clients on the reference date by state in 2010 in facilities that operated a certified Opioid Treatment Program.
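The reason step 6 matters can be sketched outside SDA. The R example below uses a small invented facility-level data frame (the variable names follow the steps above, but the rows and values are hypothetical) and contrasts totals with means for the same selection.

```r
# Hypothetical facility-level records (one row per facility, values invented).
# STFIPS holds state FIPS codes: 26 = Michigan, 39 = Ohio.
facilities <- data.frame(
  STFIPS = c(26, 26, 39, 39, 39),
  YEAR   = c(2010, 2010, 2010, 2010, 2009),
  OTP    = c(1, 1, 1, 0, 1),
  T_METH = c(120, 60, 200, 0, 150)
)

# Equivalent of the selection filter "YEAR(2010), OTP(1)".
selected <- subset(facilities, YEAR == 2010 & OTP == 1)

# Totals: sum the client counts across facilities within each state.
totals <- aggregate(T_METH ~ STFIPS, data = selected, FUN = sum)

# Means: the average number of clients per facility, a different quantity.
means <- aggregate(T_METH ~ STFIPS, data = selected, FUN = mean)

print(totals)
print(means)
```

Because each row is a facility, only the summed version answers the question "how many clients were there in this state"; the mean describes a typical facility instead.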

Wednesday, February 15, 2012

Are earlier years of data available for the TEDS-Discharges series?

The earliest year available for TEDS-D is the 2006 file. The Substance Abuse and Mental Health Services Administration (SAMHSA) has no plans for making earlier years of TEDS-D available at this time.

When will the next year of data be available for NSDUH?

The NSDUH public-use file (PUF) is typically released for the preceding year by the end of November or beginning of December.

The NSDUH Restricted-use Data Analysis (R-DAS) file is typically available one to two months following the release of the PUF.

Who is the sponsor for NSDUH?

The Center for Behavioral Health Statistics and Quality (CBHSQ, formerly the Office of Applied Studies), Substance Abuse and Mental Health Services Administration (SAMHSA), Department of Health and Human Services (HHS).

How do I get the TEDS-A Concatenated file?

The TEDS-A concatenated data file is now available for download. This public-use file is provided in an ASCII rectangular format with SPSS and SAS data definition statements. An SPSS system file and an ASCII tab-delimited file are also available. Please note that, because of the size of the data file, the Stata and SAS Transport files that are normally available could not be produced.

If you are unable to download the file in one of the available formats, SAMHDA can provide you with the file. You can obtain the entire data set or select a subset of cases and/or variables. We can make the files available to you through a special Web link where you can access them directly. This is the quickest and easiest way for you to get the data you request. However, we can send you a CD of the files as well.

Please email us the following information at samhda-support@icpsr.umich.edu:

  1. Data File Delimiter (choose one of the following):

    None -- the preferred delimiter for analyzing the data using SAS, SPSS, or Stata.
    Blank -- one of two delimiters for using Excel, Access, or other similar software package.
    Comma -- another delimiter for using Excel, Access, or other similar software package.

  2. Type of setup file(s) (choose one or more of the following):

    SAS
    SPSS
    Stata
    DDI (XML)
    SDA (DDL)

  3. Cases: Please indicate whether you desire all cases or a subset of cases. One common example of a subset would be to request only those cases for a given state. You may also request more than one filter to define a subset (e.g., males in the 12-14 year old age group). Please refer to the TEDS codebook when considering filters. Please note that some variables, such as age, are categorized and therefore only allow for certain specific age ranges.

  4. Variables: Please indicate whether you desire all variables or a subset of variables. If you request a subset of variables, please refer to the TEDS codebook when choosing variables for the subset. We prefer that you make your requests using Variable Groups (e.g., CLIENT CHARACTERISTICS, SUBSTANCES OF ABUSE: ORIGINAL VARIABLES, etc.).

As soon as we get this information from you, we will begin to process your request. We will email you instructions on how to retrieve the file from the Web site.

Who is the sponsor for TEDS?

The Center for Behavioral Health Statistics and Quality (CBHSQ, formerly the Office of Applied Studies), Substance Abuse and Mental Health Services Administration (SAMHSA), Department of Health and Human Services (HHS).

When will the next year of data be available for MTF?

The MTF data files are released after the annual report is published, which is at the end of October in the year following the data collection. For example, the 2009 data files were released in October 2010.

Important note about using the RECODE or COMPUTE online analysis features for TEDS-A, Concatenated

This FAQ applies to data users who are creating new variables using the RECODE or COMPUTE functions in the SAMHDA online analysis system (SDA). Due to the large number of cases involved in the TEDS-A Concatenated data set, it may take up to 30 minutes for SDA to complete a RECODE or COMPUTE command. The system will use a new variable (created using RECODE or COMPUTE functions) in an analysis even before the variable is completely created; thus, users are requested to pay close attention to analytic results that use a newly created variable to make sure that enough time has elapsed and the new variable has been completely created. It is recommended that users wait for approximately 30 minutes before using a newly created variable in analyses.

When running either RECODE or COMPUTE, you may encounter one of the following: (1) the screen turns blank and the status indicator in your browser says "Done" or (2) the server "times out." If these problems occur, SDA will still continue to process your request, although it may take 30 minutes or more to complete.

Is the longitudinal panel data available for Monitoring the Future?

The information below comes directly from Monitoring the Future. Please refer to their Web site for more information.

  1. A subset of high school seniors are selected each year for follow-up, which is conducted in an alternating biennial fashion, with the first half of the subset receiving their first follow-up questionnaire one year after high school, and the second half receiving their follow-up two years after high school. They receive a series of six questionnaires within this arrangement, so the second half of the subset is 12 years past high school when they receive their last young adult "FU-12" questionnaire. Then, the follow-up procedure changes to 5-year intervals to cover middle adulthood.
  2. The questionnaires in the young adult follow-ups are directly comparable to the base year questionnaires, both in content and in numbers of questionnaire forms. The core drug use questions are included along with the same types of related attitude and behavioral items, many of which are unique to each form, so respondents receive the same questionnaire form throughout the base year and young adult follow-up series.
  3. All data for a particular individual are linked (or, in the case of form-specific items, capable of being linked) in the panel dataset. The sheer amount of information greatly increases the risk of breaching confidentiality. Thus, based on policies approved by our funding source and IRB, the panel data set cannot be made available to the public in totality and without modification.
  4. Special data requests can be made through the Web site email address. Once we get a request, information about policies and procedures is sent out. Requests are considered on a case-by-case basis, and may be fulfilled - at requestor's cost - typically by providing data analytic access.

Additional information about the design of the panel component and the procedures used in the study is included in our annual NIDA report, Volume II, and in more detail in the MTF "Occasional Papers." See, for example, "The Aims and Objectives of the Monitoring the Future Study and Progress Toward Fulfilling Them as of 2006" (pdf).

To make a request for this data and for further information, please contact MTF staff at: MTFinfo@isr.umich.edu

Is state-level data available for NSDUH?

Public-use NSDUH files do not include state-level identifiers. The Substance Abuse and Mental Health Services Administration (SAMHSA) does not publicly release certain data, including state-level identifiers. Federal law requires SAMHSA to protect the confidentiality of individual respondents. For this reason, state codes are not placed on NSDUH public-use files. SAMHSA does release tables at the state and sub-state levels. A limited number of estimates are also available for the 20 largest metropolitan areas.

SAMHSA has made access to restricted-use NSDUH data available through the Restricted-use Data Analysis System (R-DAS). Through the R-DAS, researchers can now work with state-level variables to produce frequencies and crosstabulations using the restricted-use NSDUH data files. Learn more about the R-DAS.

Finally, the Data Portal provides researchers with off-site access to individual record level data, including state and substate variables. For access to confidential CBHSQ data through the Data Portal, researchers are required to submit a research proposal for approval and sign a contract (data use agreement). For approved projects, a researcher downloads software onto their computer that allows them to access servers where the confidential data reside. The researcher can then log into the server remotely and open a virtual desktop to access the confidential data using SAS, SPSS, or Stata.

Friday, January 20, 2012

What is a codebook?

A codebook provides information on the structure, contents, and layout of a data file. Users are strongly encouraged to look at the codebook of a study before downloading the data file(s).

While codebooks vary widely in quality and amount of information given, a typical codebook includes:

  • Column locations and widths for each variable
  • Definitions of different record types
  • Response codes for each variable
  • Codes used to indicate nonresponse and missing data
  • Exact questions and skip patterns used in a survey
  • Other indications of the content and characteristics of each variable

Additionally, codebooks may also contain:

  • Frequencies of response
  • Survey objectives
  • Concept definitions
  • A description of the survey design and methodology
  • A copy of the survey questionnaire
  • Information on data collection, data processing, and data quality

The following example illustrates the main components of a typical SAMHDA codebook using the National Survey on Drug Use and Health (NSDUH), 2012 (ICPSR 34933):

Title Page
Bibliographic Citation
Data Collection Description
Codebook Body

The body of a codebook describes the content of the data file. The following elements are generally included for each variable in the data file:

  • Variable Name: Indicates the variable number or name assigned to each variable in the data collection.

  • Variable Column Location: Indicates the starting location and width of a variable. If the variable is a multiple-response type, the width referenced is that of a single response.

  • Variable Label: Indicates an abbreviated variable description (maximum of 40 characters) that can be used to identify the variable. In some cases, an expanded version of the variable name can be found in a variable description list.

  • Missing Data Code: Indicates the values and labels of missing data. If 9 is a missing value, then the codebook could note 9=Missing Data. Other examples of missing data labels include REFUSED, DON’T KNOW, BLANK (NO ANSWER), and LEGITIMATE SKIP. Some analysis software packages require that certain types of data that the user wants excluded from analysis be designated as "MISSING DATA" (i.e., inappropriate, unascertained, unascertainable, or ambiguous data categories). Although these codes are defined as missing data categories, this does not mean that the user should not or could not use them, if so desired.

  • Code Value: Indicates the code values occurring in the data for a variable.

  • Value Label: Indicates the textual definitions of the codes. Abbreviations commonly used in the code definitions are "DK" (Do Not Know), "NA" (Not Ascertained), and "INAP" (Inapplicable).

What is the preferred submission format for data files?

ASCII, SAS Transport, SPSS Portable, or Stata data files are accepted, with the non-ASCII files preferred.

Once I've deposited my data, I do not want changes made to my data collection without my permission. Will you contact me prior to altering the files?

For most data collections, the archive distributes data and documentation in essentially the same form in which they were received. When appropriate, documentation is converted to Portable Document Format (PDF), data files are converted to non-platform-specific formats, and variables are recoded to ensure respondents' anonymity. Our staff will generally contact you regarding any suggested changes after an initial assessment of your data collection. Regardless of changes made, the archive keeps several copies of all files in the form in which they were submitted.

Does your archive keep the original version of the files that I submit for archiving?

Yes. The archive keeps several copies of the original files as submitted by the data depositor, including copies stored offsite. If you are in need of these files, please contact deposit@icpsr.umich.edu.

I'm using SPSS on a Mac. I can't read ASCII data files using the SPSS setup file I downloaded. What do I do?

SAMHDA recommends using SPSS system files (.SAV), which are available for download, when using SPSS on a Mac.

I receive a "file not found" error when trying to use a setup file. What is wrong?

When an application generates a "file not found" error, it is sometimes caused by an incomplete filename being used in the setup statements; the specification for the filename may be missing the filename extension. This is often due to Windows not displaying filename extensions in either Windows Explorer or My Computer and subsequent dialog boxes. Adding the correct extension to the file specification in the setup statements should correct the problem.

To ensure that you are always presented with complete filenames, the default Windows filename display option for folders should be changed to show file extensions. Please reference the help documentation for your version of Windows to change the default filename display.
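The same problem can be checked from inside a statistics package. As a minimal sketch in R, the following creates a temporary folder with one placeholder file (the folder and the filename survey.dat are hypothetical) and shows that only the complete filename, extension included, is found.

```r
# Create a temporary folder containing one hypothetical data file.
demo_dir <- tempfile("setup_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, "survey.dat"))

# The name without its extension does not match any file.
without_ext <- file.exists(file.path(demo_dir, "survey"))

# The complete filename, including the extension, is found.
with_ext <- file.exists(file.path(demo_dir, "survey.dat"))

# list.files() reports complete filenames, extensions included.
found <- list.files(demo_dir)

print(c(without_ext = without_ext, with_ext = with_ext))
print(found)
```

Listing the folder contents this way is a quick means of confirming the exact filename to put in a setup file.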

Where do I get information about the names and meanings of variables?

All variable information may be found in the study's codebook(s). For survey data, the codebook usually includes the variable question text.

I downloaded data from your site, but my statistics application issues an error message that the file cannot be found.

Since the data files on SAMHDA are zipped with WinZip, they must be decompressed before they can be used with statistics applications. If the downloaded file is represented with a vice-grips folder icon, the computer has WinZip installed. In this case, extract the files from the file with WinZip. If the file is represented with a folder icon that has a zipper on the left-hand side of the folder, WinZip is not installed on your computer. In this case, files can be extracted by moving the nested folders and/or files to a decompressed folder. Please see the FAQ on decompressing downloaded files for further information on this topic.

If you chose to save the downloaded zipped files to removable media such as a CD or external hard drive, be aware that the files will still need to be decompressed before use.

The statistical application still does not see the data file. What else could it be?

Some statistical applications have a limit on the number of characters that can be used to specify a file location. The default folder hierarchy in which SAMHDA distributes its files comprises at least three levels. If this folder hierarchy is extracted to a folder that is already nested within several other folders, the length of the resulting drive, folder hierarchy, and filename specification could exceed that which is usable by the statistics application. In cases such as this, SAMHDA recommends that the lowest folder in the hierarchy be moved or copied to a location as high in the folder hierarchy as possible and that this new location be specified in the setup files.

What are RSS Feeds?

Really Simple Syndication (RSS) feeds are updated content sent to your computer via the Internet. RSS feeds enable a user to view content from multiple websites on one screen, with all site-specific navigation, advertisements, and branding removed.

For example, a user might visit multiple news sites in any given week to keep up-to-date on recent events. Without RSS, the user must go to each news site individually and find the desired information through each site's own navigation. With RSS, the user subscribes to each site through the browser's built-in subscription functions, or via a standalone news aggregator (also called a news reader) that may be downloaded and installed. The user can then view the aggregated information through a consistent display that automatically sorts the information according to his or her needs.

When the user opens his or her browser and goes to the RSS view, it automatically downloads the latest information from the subscribed sites. Most RSS readers also let the user search the articles (essentially searching across multiple sites), sort by a variety of fields (such as title, date, or author), filter/subset by date or source, and customize the amount of information that displays on screen.

RSS also presents an alternative to receiving notifications via email. Unlike email, RSS feeds don't get filled up with unwanted solicitations.

RSS readers come with the following Web browsers automatically:

  • Internet Explorer (Windows)
  • Firefox (Windows/Mac)
  • Safari (Windows/Mac)

In addition, some readers are available as desktop software that you can download and install on your computer. The reader you choose will have instructions for how to subscribe to RSS feeds.

Can I do Boolean searching on this website?

True Boolean searching is not available. Multiple terms entered into the search box are treated as if they were connected by a Boolean "and" operator. Boolean operators ("and," "or," and "not") are ignored.

The Search Results page indicates that my search is "sorted by relevance." What does that mean?

The search engine determines how closely search results match your search terms. By default, the most relevant studies will appear at the top of the list of search results. Depending on the search you are using (Data, Search/Compare Variables, or Reports & Publications), search results can be sorted by study, variable, and bibliography relevance. For more information on SAMHDA's search options, see the FAQ "How do I search SAMHDA's holdings for datasets or particular variables?"

The SAMHDA search engine also enables sorting of search results in several other ways:

  • Released/Updated: For studies, this date refers to the date on which SAMHDA first released (or updated) the data. Reports & Publications search results may be sorted by publication (pub.) date.

  • Title A-Z: Search results may be sorted alphabetically by study title (in the Data search) and author (in the Reports & Publications search). Variable search results may be sorted alphabetically by variable label, study title, and series title.

  • Time Period: Search results may be sorted by year of the study. The sort references the last year in the time period (when the time period is a range or group of ranges). Variable search results may be sorted by both newest and oldest time periods.

  • Most Cited in the ICPSR Bibliography (Bib.): This is the count of citations that the ICPSR search engine found that relate to the study.

  • Most Downloaded: This is an indication of unique downloads over the last six months for users logged into SAMHDA or ICPSR using a MyData, Google, Facebook, or LinkedIn account.

  • Variable Relevance: This option is available with the Data search. The sort is based on how well the study-level metadata matches the search query, as well as how well the variable-level metadata matches the query terms. When sorted by variable relevance, links to matching variables are displayed for each study in the search results.

  • Recent Additions: This option is available with the Reports & Publications search. Results are ordered by the date the citation was added to the SAMHDA bibliography.

What is a SAS CPORT file? How do I use it?

The CIMPORT procedure imports a CPORT transport file that was created (exported) by the CPORT procedure in SAS. PROC CIMPORT will extract SAS datasets and catalogs from the .STC file. CPORT files also include SAS libraries that can be used to apply value labels.

SAS 9.2 and later versions operating in a Windows environment support the opening of .STC files from either Windows Explorer or My Computer. Earlier versions of SAS, and SAS running on other operating systems, require the submission of edited versions of the following statements to the SAS processor. Libraries other than the WORK library can be specified if they are already defined.

proc cimport
   infile = '<drive:><\PATH>FILENAME.stc'
   library = WORK
   disk
;
run ;

How do I read data into R?

There is no such thing as an R system file similar to a Stata .dta or an SPSS .sav file. Instead, R reads data from a variety of formats (including files created in other statistical packages) directly into working memory. R generally lacks intuitive commands for data management, so users typically prefer to clean and prepare data with SAS, Stata, or SPSS. Once the data are ready, several functions are available for getting the data into R.

Reading Data Files in SPSS, Stata, and SAS formats

The foreign package can be used to read data stored as SPSS .sav files, Stata .dta files, or SAS XPORT libraries. If foreign is not already installed on your local computer, go to the Packages menu and choose Install package(s).

If prompted, choose the closest CRAN mirror. When the Packages dialog box appears, scroll down to choose foreign and then click OK.

To use the commands in foreign one must first attach the library using the library function. At the prompt, type

> library(foreign)

As an example of reading data from other formats, assume that there is an SPSS file called survey.sav saved in the directory C:\mydata. The read.spss function from the foreign library will read the file into R.

> dataSPSS<-read.spss("C:/mydata/survey.sav", to.data.frame=TRUE)

This creates a data object called dataSPSS that is ready for analysis. The to.data.frame argument, whose default value is FALSE, tells R to return the data as a data frame rather than as a list. Note that when specifying the pathname, R expects forward slashes (or doubled backslashes), whereas Windows displays paths with single backslashes. If it is necessary to read in several data files from the same directory, the amount of typing can be reduced by first setting the working directory and then using the relative pathname. For example,

> setwd("C:/mydata")
> dataSPSS<-read.spss("survey.sav", to.data.frame=TRUE)

Alternatively, if one prefers to search for the location of a data file, one can type

> dataSPSS<-read.spss(file.choose(), to.data.frame=TRUE)

This will open a dialog box that can be used to navigate to the appropriate folder.

R will assume that any value labels recorded in the SPSS file refer to factors (categorical variables) and will store the labels rather than the original number. For example, a variable named gender may be coded 0=male and 1=female, and the labels are saved in the .sav file. When R reads in the data from SPSS, the values of the variable will be "male" and "female" rather than "0" and "1". This is the default behavior, but it can be changed in the call to the read.spss function:

> dataSPSS<-read.spss(file.choose(), use.value.labels=FALSE)

Reading Stata files is equally straightforward using the read.dta function. Assuming there is a Stata data file survey.dta in the C:\mydata folder, the appropriate syntax is

> dataStata<-read.dta("C:/mydata/survey.dta")
or
> dataStata<-read.dta(file.choose())

The created object is automatically a data frame. The default is to convert value labels into factor levels ("male" and "female" rather than "0" and "1"), but this can be turned off.

> dataStata<-read.dta(file.choose(), convert.factors=FALSE)

Note that Stata sometimes changes how it stores data files from one version to the next, and the foreign package may lag a little behind. If the read.dta command returns an error, try saving the data in Stata using the saveold command. This will create a .dta file in the format of a previous version of Stata, which read.dta may be more likely to recognize.
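Both read.spss and read.dta convert value labels to factor levels by default. As a minimal sketch of what that conversion amounts to, the following builds the same kind of factor by hand from the gender coding used in the SPSS example; the vector of codes is invented and no data file is read.

```r
# Hypothetical numeric codes as stored in the original file: 0 = male, 1 = female.
gender_codes <- c(0, 1, 1, 0)

# Attaching the value labels turns the codes into a factor.
gender <- factor(gender_codes, levels = c(0, 1), labels = c("male", "female"))
print(gender)

# The original codes can be recovered from the factor's internal level index.
recovered <- c(0, 1)[as.integer(gender)]
print(recovered)
```

Keeping the numeric codes (use.value.labels=FALSE or convert.factors=FALSE) can be the simpler choice when the codes themselves are needed for arithmetic or recoding.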

R can also read SAS XPORT libraries using the read.xport function. The function takes only a single argument, the pathname:

> dataXPORT<-read.xport("C:/mydata/survey")

The function returns a data frame if there is a single dataset in the library or a list of data frames if there are multiple datasets.

Reading in ASCII files

R can also easily read in space-, tab-, and comma-delimited text files. The read.table function handles the first two cases; read.csv handles the other. Say there is an ASCII data file survey.dat in which white space separates the values for each variable. The following syntax reads in this data.

> dataTEXT<-read.table("C:/mydata/survey.dat", header=TRUE, sep="")

The header argument tells R that the first row includes variable names. Its default is FALSE. The sep argument, set here to an empty string, specifies that values are separated by any white space, which is the default. If the values are separated by tabs, the value of the sep argument is changed to "\t":

> dataTAB<-read.table("C:/mydata/survey.dat", header=TRUE, sep= "\t")
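
As a minimal sketch of how the sep argument changes parsing, the lines below read two small in-memory samples through read.table's text argument instead of a file. The variable names and values are made up for illustration; a real call would pass a pathname as in the examples above.

```r
# Made-up sample data; a real call would pass a pathname instead of text=.
space_text <- "id  age gender\n1   34  male\n2   27  female"
tab_text   <- "id\tage\tgender\n1\t34\tmale\n2\t27\tfemale"

# sep = "" (the default) treats any run of white space as one delimiter.
dataTEXT <- read.table(text = space_text, header = TRUE, sep = "")

# sep = "\t" splits on tab characters only.
dataTAB <- read.table(text = tab_text, header = TRUE, sep = "\t")

# Each call should yield a data frame with 2 rows and 3 columns.
print(dim(dataTEXT))
print(dim(dataTAB))
```

Checking dim() after reading is a quick way to confirm that the delimiter was interpreted as intended.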

The read.csv command is available for reading data files with comma-separated values.

> dataCOM<-read.csv("C:/mydata/survey.csv", header=TRUE)

The following are also equivalent:

> setwd("C:/mydata")
> dataCOM<-read.csv("survey.csv", header=TRUE)

and

> dataCOM<-read.csv(file.choose(), header=TRUE)

It is also possible to read fixed format ASCII files (those with pre-specified columns and no delimiters) using the read.fwf function. However, this task is tedious (as it is in any package). For ICPSR data, it is recommended to use the available setup files to read fixed format data into another package and then use the commands in R's foreign library.

Data in Excel Format

The easiest way to get Excel data into R is to save the spreadsheet as a comma-separated file and use R's read.csv function. The file type can be altered in Excel by changing the Save as type option to CSV (Comma Delimited).

How can I be notified of new resources/studies that are in my field of interest?

Really Simple Syndication (RSS) is used to notify SAMHDA website users of new resources or studies related to a topic of interest. RSS is a broad application that is used to aggregate content from different websites. (For more background information, please see the FAQ on RSS.)

To create your RSS notification, first create a search that matches your interest(s). An example of a simple search is a search on "residential treatment." An example of a more complex search is a search on "HIV" with the addition of a filter by subject for "attitudes."

Once your search has been completed, you will see the option for RSS above the search results and to the right of the "Study Search Results" header.

How do I subscribe to an RSS feed?

Subscribing to an RSS feed is like bookmarking a webpage. You won't receive an email notification for new resources. Instead, each time you visit that bookmark, you'll see a list of new resources that have been added since your last visit to the page.

To receive a general notification of anything new related to SAMHDA, click on the search box without entering any terms. This returns a list of all SAMHDA resources.

How do I decompress the files I download from your site?

Files distributed via the Internet were compressed using Windows Zip (WinZip) data compression software. Files compressed using WinZip have the .zip file name extension. Users who download compressed files will have to decompress the files before using them.

Please note that files downloaded prior to November 29, 2004, may have the Gzip compression format.

Windows

Users with Windows operating systems may need to download a zip program (e.g., WinZip, 7-Zip, PeaZip).

Macintosh

For Macintosh OSX users, decompression software is built into the operating system; you can open compressed files by double-clicking on the .zip file.

UNIX/Linux

Users in the UNIX/Linux environment can use the unzip command to decompress .zip files.

Once you have the appropriate software on your local machine, follow the instructions supplied by your software to decompress the zipped files.

What kind of documentation files do you provide?

SAMHDA data collections may contain the following documentation files:

  • Codebook: Information on the structure, contents, and layout of a data file. The codebook may also contain information on study design and methodology.

  • Data collection instrument: Original survey instrument or questionnaire.

  • User Guide: More detailed information about a particular collection, often provided by the principal investigator.

How do I find data referenced in a journal as being available in your archive?

You may search SAMHDA's holdings by title, principal investigator, or other information related to the data.

What are the consequences of violating the terms of use agreement for data distributed by SAMHDA?

Respondents who participate in surveys and other research instruments distributed by SAMHSA expect their responses to remain confidential. The data distributed by SAMHDA are for statistical analysis, and may not be used to identify specific individuals or organizations. SAMHSA takes steps to ensure that subjects cannot be identified. Data users are also obligated to act responsibly and not to violate the privacy of subjects intentionally or unintentionally.

If SAMHSA or ICPSR determines that the Terms of Use agreement has been violated, then possible sanctions could include:


  • Report of the violation to the Research Integrity Officer, Institutional Review Board, or Human Subjects Review Committee of the user's institution. A range of sanctions are available to institutions including revocation of tenure and termination.

  • If the confidentiality of human subjects has been violated, then report of the violation may be made to the Federal Office for Human Research Protections. This may result in an investigation of the user's institution, which can result in institution-wide sanctions including the suspension of all research grants.

  • Report of the violation of federal law to the United States Attorney General for possible prosecution.

  • Court awarded payments of damages to any individual(s)/organization(s) harmed by the breach of confidential data.

A data collection instrument is included in the documentation for a study. Can I use this instrument for my project?

Some instruments utilized as part of the data collection process for a project deposited with SAMHDA may contain whole or partial contents from copyrighted instruments. Reproductions of such instruments are provided as documentation for the analysis of the data in the associated collection. You cannot use a survey question from one of the SAMHDA studies in building your own survey without first researching its copyright status. The question may be part of a copyrighted instrument, and using it would be copyright infringement and/or plagiarism. Restrictions on "fair use" apply to all copyrighted content.

Circular 21 from the U.S. Copyright Office provides basic information on fair use and several important legislative provisions and other documents addressing reproduction of copyrighted materials by librarians and educators.

How do I use Excel to import tab-delimited ASCII data?

SAMHDA produces a tab-delimited ASCII data file (*.tsv) that can be used to import data into Excel. An example of a tab-delimited ASCII data file name is 34481-0001-Data.tsv, which can be downloaded for the National Survey on Drug Use and Health (NSDUH), 2011 study.

Warning: An error will occur if you attempt to read in a data file that exceeds Excel’s maximum row and column limits.

Prior to Excel 2007, a single spreadsheet could hold at most 65,536 rows and 256 columns. From Excel 2007 through Excel 2013, the limits are 1,048,576 rows by 16,384 columns.
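
Before attempting an import, the file's dimensions can be compared against these limits. The following is a minimal sketch in R; fits_in_excel is a hypothetical helper written for this illustration, and the first pair of dimensions (50,000 cases, 300 variables) is made up. The last call uses the TEDS-D, 2009 dimensions mentioned elsewhere in this FAQ (1,620,588 cases and 65 variables).

```r
# Worksheet limits quoted in the text above.
excel_limits <- list(
  pre2007  = c(rows = 65536,   cols = 256),
  from2007 = c(rows = 1048576, cols = 16384)
)

# fits_in_excel() is a hypothetical helper, not part of R or Excel.
# n_cases excludes the header row that holds the variable names,
# so one extra row is added before comparing against the limit.
fits_in_excel <- function(n_cases, n_vars, version = "from2007") {
  limit <- excel_limits[[version]]
  (n_cases + 1) <= limit[["rows"]] && n_vars <= limit[["cols"]]
}

print(fits_in_excel(50000, 300))                       # within the Excel 2007+ limits
print(fits_in_excel(50000, 300, version = "pre2007"))  # too many columns for older Excel
print(fits_in_excel(1620588, 65))                      # too many rows for either limit
```

The number of cases and variables for a given download can be taken from the file manifest or codebook.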

Instructions

  1. Download a tab-delimited ASCII data file (*.tsv) from the SAMHDA site.
  2. Most of the files downloaded from the SAMHDA site are compressed. You will have to decompress the files using decompression software (e.g., WinZip). More information about decompressing files can be found in the FAQ: How do I decompress the files I download from your site?
  3. Open the tab-delimited ASCII data file in Excel using the Open file dialog box. This will open Excel’s Text Import Wizard.
  4. In the Import Wizard, complete the following steps:
    1. Confirm that the button for Delimited is marked and the box for "Start import at row" is set to 1.
      Click on Next.
    2. Select Tab in the Delimiters option box.
      Click on Next.
    3. Leave all columns set to General. SAMHDA studies do not contain string or date variables.
      Click on Finish.
  5. Review the imported data file.

Row 1 will contain the names of the variables. Column A will be the CASEID variable. To confirm the import worked properly, scroll across and down to check the number of variables and cases imported. Compare these numbers against those provided by SAMHDA in the file manifest included in your download.

Why can't I print my PDF codebook?

Some versions of the Acrobat software can't properly manage PDFs created in older versions of the software. SAMHDA processors test documentation files to ensure compatibility with the latest version of the free Acrobat Reader. We suggest downloading the latest version of the free Acrobat Reader from Adobe's website.

How do I interpret a record from an ASCII data file?

Data files in SAMHDA are usually distributed as columnar ASCII files that consist of rows and columns of alphanumeric characters. Since ASCII data files are text files, they can be opened in any word processing program or Internet browser. However, the alphanumeric characters are not meaningful without the help of a codebook or setup files to identify the columns of the ASCII data file as particular variables.

This example illustrates how to interpret an ASCII data file for the Treatment Episode Data Set - Discharges (TEDS-D), 2009 (ICPSR 33621).

The data file consists of 1,620,588 cases or observations, which in this example are treatment discharges. Example 1 shows the first 10 lines of data in this file. The first observation, or line of data, is highlighted in yellow.

Example 1: The first case or line of data in the data file

The data file is a fixed format data file and is stored in a logical record length of 127. This means that each line is comprised of 127 characters. These 127 characters correspond to 65 variables or data items. In example 2, the first and last columns are highlighted. The first column is labeled with a “1” and contains a value of “0”; the last column is labeled with “127” and contains a value of “8.”

Example 2: Each record is the same length (127 characters long)

In order to know which columns comprise particular variables, it is necessary to refer to the TEDS-D, 2009 codebook. The following examples illustrate how to read the first five variables from this ASCII data file, beginning with the first record (row) and counting from left to right:

VARIABLE 1

CASEID-CASE IDENTIFICATION NUMBER: This variable is positioned in column locations 1 through 8 and contains the value "1" for the first record (highlighted in red). This value represents the first sequential case identification number and is used to uniquely identify a given record in the data file.

Example 3: Variable 1 in Columns 1-8

VARIABLE 2

YEAR-YEAR OF DISCHARGE: This variable is positioned in column locations 9 through 12 and represents the year of the client's discharge from substance abuse treatment. Each record in the data file has the value "2009."

Example 4: Variable 2 in Columns 9-12

VARIABLE 3

AGE-AGE (RECODED): This variable is positioned in column locations 13 through 14 and contains the value "6" for the first record. This value represents the age category of “25-29.”

Example 5: Variable 3 in Columns 13-14

VARIABLE 4

GENDER-SEX: This variable is positioned in column locations 15 through 16 and contains the value "1" for the first record. This code identifies the sex of this client as "MALE."

Example 6: Variable 4 in Columns 15-16

VARIABLE 5

RACE-RACE: This variable is positioned in column locations 17 through 18 and contains the value “4” for the first record. This code identifies this client as “BLACK OR AFRICAN AMERICAN.”

Example 7: Variable 5 in Columns 17-18
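
The same column-by-column reading can be sketched in R. The record below is not taken from the actual data file: it is a made-up 18-character string assembled from the values described above, and its zero and space padding is an assumption. Only the column locations come from the codebook excerpt.

```r
# Made-up record fragment built from the values described above
# (CASEID 1, YEAR 2009, AGE 6, GENDER 1, RACE 4); the padding is an assumption.
record <- "000000012009 6 1 4"

# Column locations as listed in the codebook excerpt above.
layout <- data.frame(
  name  = c("CASEID", "YEAR", "AGE", "GENDER", "RACE"),
  first = c(1, 9, 13, 15, 17),
  last  = c(8, 12, 14, 16, 18),
  stringsAsFactors = FALSE
)

# Cut the record at each column range and convert the pieces to integers.
fields <- substring(record, layout$first, layout$last)
values <- setNames(as.integer(fields), layout$name)
print(values)
```

For a whole file, R's read.fwf function accepts the equivalent field widths (here 8, 4, 2, 2, and 2) instead of start and end positions.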

Commercially available statistical software packages such as SAS, SPSS, and Stata may make it easier to interpret data files and to subset the variables and/or cases as needed.

How can I decompress a file when my zip program (e.g., WinZip, 7-Zip) says that the file name is insensible?

The total path length (not file name length) has to be less than 255 characters. SAMHDA file names can be lengthy. If the path to which you wish to extract your files is also lengthy, then the zip program will fail.

Extract your files to the root directory of your hard drive (i.e., extract the files to c:\ instead of c:\User\My Documents\Projects\SAMHDA Data\).
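
To check in advance whether a destination is short enough, the length of the full path can be counted. This is a rough sketch in R; path_fits is a hypothetical helper, the folder names are made up, and the 255-character limit is the figure quoted above.

```r
# path_fits() is a hypothetical helper; 255 is the limit quoted in the text above.
path_fits <- function(folder, file_name, limit = 255) {
  full_path <- paste(folder, file_name, sep = "\\")
  nchar(full_path) < limit
}

# Made-up, deeply nested destination folder for illustration.
long_folder <- paste0("C:\\Users\\me\\", strrep("nested folder\\", 16), "SAMHDA Data")
file_name   <- "34481-0001-Data.tsv"

print(path_fits("C:", file_name))         # short path: within the limit
print(path_fits(long_folder, file_name))  # long path: exceeds the limit
```

Note that a zip file may contain its own internal folders, which add to the total path length.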

How do I submit a citation for a publication I have written using your data?

If you have published work based on SAMHSA data, or if you know of data-related literature that is not in the SAMHDA bibliography, complete the online form to submit your citation or send the citation to bibliography@icpsr.umich.edu.

What is the distinction between a series and a study?

By study, we mean a one-time data collection, a single year, or a collection within a series.

For example, the National Survey on Drug Use and Health (NSDUH) has been conducted annually since 1990, and every 2-3 years prior to that (as far back as 1979). When we talk about all the NSDUH surveys, we refer to the series; when we talk about a particular year, we refer to a study.

When we divide SAMHDA data into "Series" and "Other Studies," we're making a distinction between surveys repeated over time, and surveys that were a one-time event.

What is faceted searching? How does it work?

The SAMHDA website provides enhanced searching using the Solr search platform. Solr offers the following advantages:

  • Faceted searching;
  • No limit on the number of results;
  • Date searching of multiple fields; and
  • Same rules for Data, Reports & Publications, and Variable searches.

Faceted searching offers:

  • Easy shifting between refining and expanding search results;
  • Reduced likelihood of hitting a "no results found" page as facets provide an indicator of the size of your result set; and
  • Seamless integration with keyword searching.

For more on SAMHDA’s search options, see the FAQ "How do I search SAMHDA's holdings for datasets or particular variables?"

How do I get permission to use the public-use data archived at SAMHDA?

You do not need to obtain permission to access, analyze, or publish findings based on SAMHSA public-use data. All users of SAMHSA data sets are required to agree to the Terms of Use when downloading or analyzing data online, which includes properly citing the data files.

Do I have to create a MyData account to download SAMHDA holdings?

Users do not need to have a MyData account to download public-use data on the SAMHDA website. All users of SAMHSA data sets are required to abide by the Terms of Use before the download can begin and properly cite the data files. A copy of the Terms of Use is also included with downloaded data files.

Restricted-use data are not available for download, but can be analyzed online in the R-DAS. To access the R-DAS, you must create a MyData account or log in using your Facebook, Google, or LinkedIn account.

How do I search SAMHDA's holdings for datasets or particular variables?

The SAMHDA website provides three ways to search SAMHDA holdings:

  • The Search/Compare Variables option searches question text, value, and variable labels for all SAMHDA studies.

  • The Data option searches for keywords or phrases within the descriptions of all the studies in SAMHDA's holdings. This search is available on each page of the SAMHDA site in the upper-right corner (outside of SDA).

  • The Reports & Publications option searches SAMHDA's bibliography that contains thousands of citations referencing data archived by SAMHDA.

SAMHDA's search utility is provided by ICPSR. SAMHDA’s holdings can also be searched on the ICPSR website. Note that searching on the ICPSR website may return studies that are not held or supported by SAMHDA, and that may be restricted to ICPSR member institutions.

Why do I get prompted for a username and password when trying to download a file on the SAMHDA site?

Most data archived in SAMHDA are in the public domain. You should be taken directly to the Terms of Use page after selecting a public-use data file for download or online analysis with SDA, without any request to log in. SAMHDA makes a select number of restricted-use studies available for analysis through the Restricted-use Data Analysis System (R-DAS). Login is required to access R-DAS studies. Users who wish to analyze restricted-use data available in the R-DAS may either establish a MyData account or log in using their Facebook, Google, or LinkedIn account.

If you are asked to log in, then either you are attempting to access an R-DAS study, you are attempting to download a study that is not part of the SAMHDA archive, or there is a problem with the website. In the last case, please notify the SAMHDA help desk immediately by emailing samhda-support@icpsr.umich.edu.

What are weights?

Many of the datasets archived in SAMHDA are designed to represent particular populations. For instance, the National Survey on Drug Use and Health (NSDUH) is designed to represent citizens of the United States, ages 12 and up. Since not every citizen can be surveyed, such studies utilize a stratified random sampling strategy. That means that the population is divided into broader groups (called sampling units) from which individual persons are randomly selected for participation. Depending on the focus of the research, more participants may be selected from particular sampling units than is necessary for proportional representation (a practice known as oversampling). This is done to ensure that the dataset will have an adequate number of cases to facilitate analysis of the subpopulation. Mathematical weights are used to adjust the findings to be proportionally accurate; thus, weights are adjustments used to ensure that findings accurately represent the population. Users should always consult the codebook to determine which weight, if any, is required.
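
The effect of weighting can be illustrated with a small sketch in R. Everything in it is made up: the ten respondents, the two groups, and the assumed population shares. Real analysis weights are supplied with the data and should be taken from the codebook rather than computed this way.

```r
# Made-up sample of 10 respondents: group B is oversampled (5 of 10 cases)
# although it is assumed to make up only 10% of the population.
sample_data <- data.frame(
  group   = rep(c("A", "B"), each = 5),
  outcome = c(0, 0, 0, 0, 1,   1, 1, 1, 0, 0),  # 1 = reports the behavior
  stringsAsFactors = FALSE
)

# Assumed population shares, and the shares actually observed in the sample.
pop_share    <- c(A = 0.90, B = 0.10)
sample_share <- vapply(names(pop_share),
                       function(g) mean(sample_data$group == g),
                       numeric(1))

# Each case's weight = population share / sample share of its group.
sample_data$weight <- unname(pop_share[sample_data$group] /
                             sample_share[sample_data$group])

unweighted <- mean(sample_data$outcome)
weighted   <- weighted.mean(sample_data$outcome, sample_data$weight)
print(c(unweighted = unweighted, weighted = weighted))  # 0.40 and 0.24
```

Because the oversampled group reports the behavior more often, the unweighted estimate overstates the population rate; the weights scale each group back to its assumed share of the population.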

The variables I want to examine have been removed or modified. How do I gain access to them?

Given the sensitive nature of the data archived in SAMHDA, great lengths are taken to ensure that respondent identity is protected. For this reason, variables that pose an identification (or disclosure) risk are modified or removed from public-use data files.

A select number of restricted-use data files can be analyzed using the Restricted-use Data Analysis System (R-DAS).

Access to confidential data may be made available through the Data Portal for approved researchers. Completion of an application process and project approval are required for access to the Data Portal. Data Portal application periods are announced through the SAMHDA email list.

What is SDA and why should I use it?

The Survey Documentation and Analysis (SDA) system allows users to conduct statistical analysis online. SDA was developed by the Computer-assisted Survey Methods Program (CSM) at the University of California at Berkeley. The SDA system is capable of performing a wide range of statistical analyses from bivariate cross-tabulation to multiple regression and analysis of variance. The system allows users to design and implement custom recodes, and generate subsets of data that can be extracted from SDA and imported into other statistical software for further analysis.

For more information on SDA, please consult the Online Analysis Using SDA page.

Additional information about SDA and its capabilities can be found in the SDA online documentation from the University of California at Berkeley.

Can I select multiple datasets for a download? What about multiple stat packages?

For every study in the archive, you can download all files or individual files.

Individual files can be downloaded from multiple locations:

To download files for all parts of a multiple-part study: Select the "Quick Download" option from the left column of a study home page to download the data and documentation files in a selected data format (e.g., SAS) for all parts of a multiple-part study. Note that you cannot create a subset of datasets for a single download. Once you have downloaded the entire study, you can then select individual files to extract from the zip file provided.

To download a study part (dataset) in multiple statistical packages: Select the "All Dataset Files" option from a study home page to download all data and documentation files associated with a dataset in all data formats. Once you have downloaded the entire dataset in all data formats, you can then select individual files to extract from the zip file provided.

I don't want to run my own statistics. Where can I find reports or pre-run tables?

The SAMHSA Publications Store and SAMHSA Data, Outcomes, and Quality (DOQ) page provide data tables and both detailed and short reports for SAMHSA data.

SAMHDA also offers the following options if you are looking for summary information or tabulated reports:

  • Quick Tables allow for production of custom tables and graphs for preselected core variables from select studies.

  • Interactive Maps allow for visual exploration of geographic data using a U.S. map and drop down menus. Learn more about Interactive Maps.

  • The "Reports and Related Sites" section on series home pages provides links to related reports and websites for studies within the series (when applicable).

Thursday, January 19, 2012

HELP! I'm trying to locate alcohol and drug treatment resources.

SAMHSA sponsors a treatment locator that will help you find treatment professionals in your geographical area.

What are the main components of the SDA interface?

The SDA interface is split into four main sections:

  1. Program Selection Menu. Select from programs to perform analysis, create or recode variables, download the dataset or a customized data subset, view the codebook, or view the Getting Started help file. From the Analysis menu, users may select cross-tabulation, comparison of means, correlation matrix, comparison of correlations, multiple regression, or logit/probit regression. Users may also choose to list values of individual cases.
  2. Variable Selection. The buttons within this section change depending on the type of analysis selected. When variables are selected from the Variable Tree, they are placed into the selection box. The user can then specify which analysis field the variable should go into (i.e., row or column for a cross-tabulation, independent or dependent for a regression, or used as a control or filter variable). Users can also obtain a frequency table and accompanying question text for that variable by selecting the View button.
  3. Variable Tree. All variables and variable labels are listed and organized into groups with headings and subheadings as they appear within the codebook. Click on the +/- boxes next to the heading to view all the variables within a selected group. When a variable of interest is located, select the variable and SDA will place it into the variable selection box.
  4. Analysis. This section displays the required and optional fields for the type of analysis you have selected from the Analysis menu.

For further information on SDA, please select Getting Started from the SDA menu.

What enhancements have been made to SDA?

SAMHDA uses Survey Documentation and Analysis (SDA) version 3.5. A log of updates included in versions of SDA released since 2009 follows.

SDA 3.5 (Released April 2011)

For the REGRESS and LOGIT programs, enhancements include:

  1. Computation of standard errors for complex samples using the jackknife repeated replication method.
  2. Ability to create multiple dummy variables for each value of a numeric variable in a single command.
  3. Option to request confidence intervals (90%, 95%, or 99%).
  4. Option to display the variance/covariance matrix of the regression coefficients.
  5. Ability to suppress the list of independent variables.

Content adapted from the SDA Manual (SDA 3.5).

SDA 3.4 (Released January 2010)

For the TABLES program, enhancements include:

  1. Corrections to the calculation of standard errors and confidence intervals.
  2. Addition of Rao-Scott F-tests.
  3. Ability to display weighted or unweighted N of cases.
  4. Option to set the number of decimals for all statistics.

For the MEANS program, enhancements include:

  1. Corrections to the calculation of the standard errors and confidence intervals.
  2. Option to display the p-value of each difference from the cells in a base row or column.
  3. Default reporting of the weighted N of cases in each cell for weighted analyses.
  4. Option to include charts in output.
  5. Optional diagnostic table for design variables.

Content adapted from the SDA Manual (SDA 3.4).

SDA 3.3 (Released June 2009)

  1. Disclosure Protection: SDA has the ability to suppress output that may compromise the confidentiality of survey respondents by applying disclosure protection rules to a data file. Analysis programs, including RECODE and COMPUTE, check for disclosure risk through use of disclosure rules. Disclosure rules may be specified to: a) prevent an analysis from being run; b) suppress the output after running an analysis; and c) suppress the unweighted number of cases from being reported in the output. The SDA 3.3 Documentation for Disclosure provides greater detail on the disclosure rules that may be specified. SAMHDA data files with disclosure protection rules in place are available in the Restricted-use Data Analysis System (R-DAS), which is based on the SDA platform.
  2. List Created Variables - View Button: The output from the listing of recoded and computed variables now includes a "View" button that provides access to descriptions of the variables. This feature can be accessed under the SDA Create Variables menu.
  3. Title: A title or label can be entered for each analysis request and will appear at the top of the HTML output produced by SDA analysis programs.
  4. Customized Subset: This procedure has also been revised in that recoded and computed variables may now be included in a subset. If pre-set selection filters have been defined by SAMHDA, these filters now apply to the interactive version of the subset procedure as well as to the analysis programs. A Comma Separated Values (CSV) file is available for output.

Content adapted from the SDA Manual (SDA 3.3).

For further information on SDA, please select the Getting Started button from the SDA menu or consult the Online Analysis Using SDA page.

How do I deposit data in the SAMHDA collection?

SAMHDA considers archiving data that meet the standards established with our funder, the Office of Applied Studies, Substance Abuse and Mental Health Services Administration. When making a determination to archive data, SAMHDA considers the quality of the research design and methods, the completeness of the data and documentation files, the scope of the study, and the significance of the study to the field.

Prior to a public release, SAMHDA processes the data and documentation files in order to ensure the quality of the public-use file(s), enhances their user friendliness, and resolves any confidentiality issues that may be present.

For New Depositors:

If you would like SAMHDA to consider archiving your data, please send us an inquiry at samhda-support@icpsr.umich.edu. Also, please visit ICPSR's Deposit Data site to view guidelines, instructions, and answers to frequently asked questions about the data deposit process.

Previous / Authorized Depositors:

Please access our Electronic Deposit Form.

What are Quick Tables?

Quick Tables is a streamlined data analysis tool that allows for production of custom tables and graphs for preselected core variables from select studies. The results can be copied and inserted into documents.

Currently, Quick Tables are available for the following series: NSDUH, TEDS-A, and TEDS-D. The Quick Tables page also provides links to DAWN Emergency Department data tables that are available on the SAMHSA website.

There are more than 500 citations that I want to export, but the system only allows for up to 500. How do I export the rest?

After completing a search for citations, you have the option of downloading your results into RIS, CSV, or EndNote XML. Currently the system allows you to export the first 500 results. Most searches will not yield more than 500 citations; however, when this happens you will need to export the citations in stages (i.e., 500 citations at a time). The export files can then be merged into one file.

To export results beyond the first 500 citations, you will need to alter the URL in the address bar. Below is an illustration of this process using the NSDUH series, which contains over 1,600 citations. To export all of the NSDUH citations, you will need to alter the URL three times after obtaining the initial 500 results (to start at rows 501, 1001, and 1501). The URL for the first 500 results is:

RIS

http://www.icpsr.umich.edu/icpsrweb/ICPSR/biblio/ris/resources?sortBy=1&seriesId=64&studyId=0&paging.rows=500&archive=SAMHDA

CSV

http://www.icpsr.umich.edu/icpsrweb/ICPSR/biblio/csv/resources?sortBy=1&seriesId=64&studyId=0&paging.rows=500&archive=SAMHDA

EndNote XML

http://www.icpsr.umich.edu/icpsrweb/ICPSR/biblio/rsxml/resources?sortBy=1&seriesId=64&studyId=0&paging.rows=500&archive=SAMHDA

To export the citations beginning with 501 up to 1000, add "&paging.startRow=501" if there's no such element in the URL, or replace "paging.startRow=1" with "paging.startRow=501". For the above example, this would look like:

RIS

http://www.icpsr.umich.edu/icpsrweb/ICPSR/biblio/ris/resources?sortBy=1&seriesId=64&studyId=0&paging.rows=500&archive=SAMHDA&paging.startRow=501

CSV

http://www.icpsr.umich.edu/icpsrweb/ICPSR/biblio/csv/resources?sortBy=1&seriesId=64&studyId=0&paging.rows=500&archive=SAMHDA&paging.startRow=501

EndNote XML

http://www.icpsr.umich.edu/icpsrweb/ICPSR/biblio/rsxml/resources?sortBy=1&seriesId=64&studyId=0&paging.rows=500&archive=SAMHDA&paging.startRow=501

Notice that the URL contains "seriesId=64". The number 64 is the series number for the NSDUH. To use a different series, replace the series number in the URL (e.g., TEDS-A is 56).
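
The staged URLs can also be generated rather than edited by hand. The sketch below is one way to do this in R; export_urls is a hypothetical helper that only builds the URL strings following the pattern shown above (it does not contact the server), and the total of 1,600 citations is an assumed round figure.

```r
# export_urls() is a hypothetical helper that only builds URL strings;
# it does not download anything.
export_urls <- function(format, series_id, total, page_size = 500) {
  base <- paste0(
    "http://www.icpsr.umich.edu/icpsrweb/ICPSR/biblio/", format,
    "/resources?sortBy=1&seriesId=", series_id,
    "&studyId=0&paging.rows=", page_size, "&archive=SAMHDA"
  )
  # Start rows 1, 501, 1001, ... up to the assumed total.
  start_rows <- seq(1, total, by = page_size)
  paste0(base, "&paging.startRow=", start_rows)
}

# Assumed total of 1,600 citations for series 64 (NSDUH), RIS format.
urls <- export_urls("ris", 64, total = 1600)
print(urls)
```

The format argument takes the path segment used in the URLs above ("ris", "csv", or "rsxml"). Each resulting URL can be pasted into the address bar in turn, and the exported files merged afterward.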

What kind of data formats does SAMHDA distribute? Do you have SPSS portable files? SAS transport? Stata?

SAMHDA distributes data files in the following formats: SAS, SPSS, Stata, ASCII, tab-delimited, and R. ASCII data files can be downloaded with their accompanying SAS, SPSS, or Stata setup files.

SAS

SAS transport files are generated by the SAS XPORT (for studies with older release dates) and SAS CPORT (for studies with newer release dates) procedures. Both types of files contain specially formatted SAS data sets, which have variable labels as well as data. SAMHDA's SAS CPORT files include SAS format catalogs with value labels. SAMHDA also supplies a SAS supplemental syntax file that provides missing value recodes.

SAS CPORT files should be imported into SAS with the SAS CIMPORT procedure. See the FAQ "What is a SAS CPORT file? How do I use it?" for additional information.

Since SAS has an engine that reads SAS XPORT files, they can be read by any SAS command that can read an ordinary SAS data set, such as the SAS set statement or the SAS FREQ procedure. SAS XPORT files can also be converted to standard SAS data sets with the SAS COPY procedure.

SPSS

SAMHDA distributes two types of SPSS data files: SPSS SAV files written by the SPSS save command and SPSS portable files written by the SPSS export command. Both types of data files include variable labels and usually include value labels and missing value definitions.

To load SPSS SAV files into SPSS use the SPSS get command.

To read SPSS portable files into SPSS use the SPSS import command.

Stata

Stata data files should be loaded into Stata with the Stata use command.

R

To load R .rda files into R, use the R load() function. For example, if your downloaded data file, 32722-0001-Data.rda, is located at d:/Downloads/ICPSR_32722/DS0001, use load("d:/Downloads/ICPSR_32722/DS0001/32722-0001-Data.rda").
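Tab-delimited files, which are also among the distributed formats, can be read without a statistical package. The following is a minimal Python sketch using only the standard library. It assumes the first row of the file holds the variable names and that the file is UTF-8 encoded; neither assumption is confirmed here, so check the codebook for your study. The function name read_tab_delimited is hypothetical.

```python
import csv


def read_tab_delimited(path):
    """Return the rows of a tab-delimited file as a list of dicts.

    Assumes the first row contains variable names. All values are returned
    as strings; convert them to numbers as needed for analysis.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Example call (the path is a placeholder for wherever your download was saved):
# rows = read_tab_delimited("path/to/your-data-file.tsv")
# print(len(rows), "cases read")
```

Because values arrive as text, missing value codes documented in the codebook still need to be recoded by the user.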

Using ASCII data and setup files

ICPSR has prepared tutorials on how to use setup files to import ASCII data:

  • ASCII Data File + SAS Setup Files : PDF PPT
  • ASCII Data File + SPSS Setup Files : PDF PPT
  • ASCII Data File + Stata Setup Files : PDF PPT

SAMHDA also offers the following help resources related to using specific data formats:

What are Interactive Maps and how do I use them?

Interactive Maps is an online analysis tool that allows for visual exploration of geographic data using a U.S. map and drop-down menus. Color-coding enables quick visual identification of a state by category (e.g., percent). Interactive Maps are available on mobile devices as well as desktop computers and can be easily printed.

Currently, maps allow for state-level geographic analysis of the Treatment Episode Data Set - Admissions (TEDS-A) series beginning with the year 2005. Maps for additional years of TEDS-A, other studies, and other geographic levels are planned for future release.

To begin using the maps, follow these steps:

  1. Select the year of data you wish to analyze.
  2. Select the substance of interest. Substances are those reported at admission and may have been reported as the primary, secondary, or tertiary substance of abuse.
  3. Click the "go" button.
  4. Move the cursor over a state to view the count and percentage of the substance at the time of treatment admission.

Several resources are available for assistance while using the maps:

  • A color-coded key beneath the map defines the varying degrees of response across the geographic level of analysis.
  • Zoom buttons located at the bottom-left of the map allow for zooming in on a geographic area. This is useful when trying to select one of the smaller states or the District of Columbia.
  • A map citation is provided for proper documentation of the map.
  • A hyperlink beneath the citation provides access to a tabular display of the data.
  • The Related Resources section gives quick access to the selected study and series home pages.

What are SAMHDA's Terms of Use?

All users of data archived at SAMHDA must agree to the Terms of Use prior to downloading or analyzing data online using SDA, the Simple Crosstab/Frequency tool, or the R-DAS. A copy of the Terms of Use is also included with downloaded data files.

Terms of Use

Please read the Terms of Use below. If you agree to them, click on the "I Agree" button to proceed. If you do not agree, you can click on the "I Do Not Agree" button and return to the home page.

These data are distributed under the following terms of use. By continuing past this point to the data retrieval process, you signify your agreement to comply with the requirements as stated below:

Privacy of RESEARCH SUBJECTS

Any intentional identification of a RESEARCH SUBJECT (whether an individual or an organization) or unauthorized disclosure of his or her confidential information violates the PROMISE OF CONFIDENTIALITY given to the providers of the information. Disclosure of confidential information may also be punishable under federal law. Therefore, users of data agree:
  • To use these datasets solely for research or statistical purposes and not for re-identification of specific RESEARCH SUBJECTS.
  • To make no use of the identity of any RESEARCH SUBJECT discovered inadvertently and to report any such discovery to CBHSQ and SAMHDA (samhda-support@icpsr.umich.edu).

Citing Data

You agree to reference the recommended bibliographic citation in any of your publications that use SAMHSA data. Authors of publications that use SAMHSA data are required to send citations of their published works to ICPSR for inclusion in a database of related publications (bibliography@icpsr.umich.edu).

Disclaimer

You acknowledge that SAMHSA and ICPSR will bear no responsibility for your use of the data or for your interpretations or inferences based upon such uses.

Violations

If SAMHSA or ICPSR determines that this terms of use agreement has been violated, then possible sanctions could include:

  • Report of the violation to the Research Integrity Officer, Institutional Review Board, or Human Subjects Review Committee of the user's institution. A range of sanctions are available to institutions including revocation of tenure and termination.
  • If the confidentiality of human subjects has been violated, then report of the violation may be made to the Federal Office for Human Research Protections. This may result in an investigation of the user's institution, which can result in institution-wide sanctions including the suspension of all research grants.
  • Report of the violation of federal law to the United States Attorney General for possible prosecution.
  • Court awarded payments of damages to any individual(s)/organization(s) harmed by the breach of confidential data.

Definitions


CBHSQ

Center for Behavioral Health Statistics and Quality

ICPSR

Inter-university Consortium for Political and Social Research

Promise of confidentiality

A promise to a respondent or research participant that the information the respondent provides will not be disseminated in identifiable form without the permission of the respondent; that the fact that the respondent participated in the study will not be disclosed; and that disseminated information will include no linkages to the identity of the respondent. Such a promise encompasses traditional notions of both confidentiality and anonymity. In most cases, federal law protects the confidentiality of the respondent's identity as referenced in the Promise of Confidentiality. Under this condition, names and other identifying information regarding respondents would be confidential.

Research subject

A person or organization that participates in a research study. A research subject may also be called a respondent. A respondent is generally a survey respondent or informant, experimental or observational subject, focus group participant, or any other person providing information to a study.

SAMHDA

Substance Abuse and Mental Health Data Archive

SAMHSA

Substance Abuse and Mental Health Services Administration

Why and how should I cite SAMHDA data?

Citing data files in publications based on those data is important for several reasons:

  • Other researchers may want to replicate research findings and need the bibliographic information provided in citations to identify and locate the referenced data;
  • Citations appearing in publication references are harvested by key electronic social sciences indexes, such as Web of Science, providing credit to the researchers; and
  • Data producers, funding agencies, and others can track citations to specific collections to determine types and levels of usage, thus measuring impact.

Where do I find the citation?

Citations for SAMHDA data can be found in the following locations:

  1. Study descriptions on study home pages
  2. File manifest
  3. PDF study description file

Both the file manifest and the PDF study description file are automatically included with every download. Thus, every download is accompanied by a copy of the standard citation that can be copied and pasted.

What do the citations look like?

Here is an example:

United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality. National Survey on Drug Use and Health, 2011. ICPSR34481-v3. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2014-05-19. http://doi.org/10.3886/ICPSR34481.v3.

Note that we also include a DOI (Digital Object Identifier) at the end of each citation. A DOI provides a persistent link to a published digital object, such as an article or study. This means that if you publish an article using ICPSR data and you include the DOI in the data citation, you make it easy for other researchers to get back to the original data.

How can I let SAMHDA know about my publication?

Users of SAMHSA data are required to send the bibliographic citation for each completed manuscript or thesis abstract to SAMHDA. To have your publication added to the SAMHDA bibliography, complete the online form or email your citation to bibliography@icpsr.umich.edu.

What is SAMHDA?

The Substance Abuse and Mental Health Data Archive (SAMHDA) is a public resource funded by the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse & Mental Health Services Administration (SAMHSA).

CBHSQ has primary responsibility for the collection, analysis, and dissemination of SAMHSA's behavioral health data. CBHSQ promotes the access and use of the nation's substance abuse and mental health research data through SAMHDA.

SAMHDA provides public-use data files, file documentation, and access to restricted-use data files to support a better understanding of this critical area of public health. The University of Michigan, Inter-University Consortium for Political and Social Research (ICPSR), is under contract to CBHSQ to disseminate data, and maintain the SAMHDA website and bibliography of publications.

What resources are available through SAMHDA?

Documentation for studies consists of one or more data files and codebooks, as well as setup files for SPSS, SAS, and Stata. SAMHDA also provides a detailed description file for each study.

Many studies can be analyzed using the online Survey Documentation and Analysis (SDA) system. The SDA system allows users to conduct statistical analyses online without having to download data files or setup files. For more information on the SDA system, please consult the Online Analysis Using SDA page.

The Simple Crosstab/Frequency tool can be used for any study that is available for online analysis in SDA. This tool can be accessed from the Analyze Online page and the Dataset(s) section of the study home page.

Some studies also have Quick Tables and Interactive Maps.

A select number of restricted-use data files can be analyzed using the Restricted-use Data Analysis System (R-DAS). The R-DAS allows researchers to produce frequencies and cross-tabulations, and export their results.

SAMHDA provides links to publications, including a searchable database of bibliographic citations for publications based on SAMHSA data.

How much do SAMHDA data sets cost?

SAMHDA is a public resource funded by the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse & Mental Health Services Administration (SAMHSA).

The SAMHDA website provides public-use data files, file documentation, and access to restricted-use data files at no cost to the user.

How does SAMHDA prepare public-use data files for release?

SAMHDA follows a series of steps for archiving each new SAMHSA data set. SAMHDA works with SAMHSA program staff to make any necessary corrections to the data and remedy any problems uncovered during data review.

Processing a study for public use requires that all variables, missing data codes, and coding schemes be standardized across elements of a study. This stage of processing may be lengthy depending on the data and completeness of materials received. All variables must be examined to ensure that each is identified and labeled. When variables are not thoroughly described, SAMHDA staff consult the documentation and/or questionnaires.

Each study is assessed to determine if any issues of respondent confidentiality exist, and checks are made for problems arising from either direct or indirect identifying variables. Direct identifiers may be blanked or deleted to safeguard privacy before releasing the data to the public. Reducing the disclosure risk introduced by indirect identifiers may involve recoding the data. For example, dates may be converted to time intervals; this allows for time lapse analyses without providing exact dates that might permit identification of respondents. Variables such as age and income may be converted to categories.
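The two kinds of recoding mentioned above can be illustrated with a short Python sketch. This is an illustration only, not SAMHDA's actual processing code: the function names and the age cut points are hypothetical, and real studies use the categories documented in their codebooks.

```python
from datetime import date


def days_between(start, end):
    """Replace two exact dates with the interval between them, in days."""
    return (end - start).days


def age_category(age, bounds=(18, 26, 35, 50)):
    """Replace an exact age with a category label.

    `bounds` holds hypothetical cut points chosen for this example.
    """
    lower = None
    for upper in bounds:
        if age < upper:
            return f"under {upper}" if lower is None else f"{lower}-{upper - 1}"
        lower = upper
    return f"{lower}+"


# The interval supports time-lapse analysis without revealing either date.
print(days_between(date(2010, 1, 15), date(2010, 3, 1)))  # 45
print(age_category(23))  # 18-25
```

The released file would carry only the interval and the category, not the original dates or exact age.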

The technical characteristics of the documentation are verified against the data to ensure that the data and documentation match. Information relating to the data collection as a whole is examined (e.g., number of cases, number of variables, number of data files, record length, data structure, and how multiple files are linked). User-defined missing data codes and weights are documented and inter-field consistency checks are performed. Value labels are added when they are not part of the files that were received.

After the initial processing is complete, further quality checks are made. For example, the observed frequencies are verified against the reported frequencies and checks are made for consistency of survey responses and skip patterns. Data files are also reformatted to the smallest possible size for optimum transfer speed over the Internet.
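A frequency check of the kind described above might look like the following Python sketch. It is a hypothetical illustration rather than SAMHDA's procedure: it assumes the reported frequencies are available as a mapping from each code to its count, and the function name frequency_mismatches is invented for this example.

```python
from collections import Counter


def frequency_mismatches(values, reported):
    """Compare observed code counts with reported counts.

    Returns {code: (observed, reported)} for every code whose counts differ,
    including codes that appear in only one of the two sources.
    """
    observed = Counter(values)
    codes = set(observed) | set(reported)
    return {
        code: (observed.get(code, 0), reported.get(code, 0))
        for code in codes
        if observed.get(code, 0) != reported.get(code, 0)
    }


# Code 2 is one case short and code 9 is undocumented in this made-up example.
print(frequency_mismatches([1, 1, 2, 9], {1: 2, 2: 2}))
```

An empty result means the observed and reported frequencies agree for that variable.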

Finally, public-use data files are released as SAS Transport (CPORT), SPSS System, Stata System, R system, ASCII Tab-delimited, and ASCII rectangular format with SAS, SPSS, and Stata data definition statements (setup files). Supplemental files containing optional commands are available for the SAS Transport and Stata System files. When possible, data sets and codebooks are prepared for compatibility with SAMHDA's public-use online analysis system (SDA).

How often does SAMHDA release new data files for download? Can I be notified?

New data files are released periodically throughout the year. Announcements about new releases are posted on the SAMHDA home page. You can also receive email notifications by subscribing to SAMHDA News.

How do I contact someone at SAMHDA for assistance?

Questions may be emailed to samhda-support@icpsr.umich.edu. SAMHDA also operates a toll-free helpline (888-741-7242) Monday through Friday, 8:00 a.m. to 5:00 p.m. (EST). The local helpline number is (734) 615-9524. Staff try to respond to email and helpline questions within one business day. Answers to many questions can be found in the SAMHDA help documentation.