Information on survey instruments, variable types, the interviewing process, item nonresponse, sample weights and design effects, data documentation, and how to access the data is available below. To learn more about the cohort, please see the NLSY97 Data Overview.
The term "survey instrument" is used to refer to the NLSY97 questionnaires that serve as the primary source of information on a given respondent. Each questionnaire is organized around a set of topical subjects, the titles of which usually appear on either the first page of each section of the questionnaire or as a header. The primary variables found within the main data set are derived directly from one or more survey instruments. The various survey instruments are described in detail in Interview Methods. To access the individual questionnaires themselves, go to Questionnaires in the Other Documentation section.
Several types of variables are present in the NLSY97 data. This section will help users understand variable descriptions or titles, symbols and rosters, and created variables. View Types of Variables: Raw, Symbols, Rosters and Created.
The NLSY97 sampling weights, which are constructed in each survey year, provide the researcher with an estimate of how many individuals in the United States are represented by each NLSY97 respondent. Individual case weights are assigned to produce group population estimates when used in tabulations. Users who need longitudinal weights for multiple survey years or for a specific set of respondent IDs can create custom weights on the NLSY97 Custom Weighting page.
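As a rough illustration of how case weights enter a tabulation, the sketch below sums weights within groups to obtain population estimates and weighted shares. The records, group labels, and weight values are hypothetical and are not NLSY97 data; the function names are introduced here for illustration only, and the scaling of the actual weight variables should be checked in the codebook before use.

```python
from collections import defaultdict

def weighted_totals(records):
    """Sum case weights within each group to estimate population counts."""
    totals = defaultdict(float)
    for group, weight in records:
        totals[group] += weight
    return dict(totals)

def weighted_shares(records):
    """Convert weighted group totals into shares of the weighted total."""
    totals = weighted_totals(records)
    grand_total = sum(totals.values())
    return {group: total / grand_total for group, total in totals.items()}

# Hypothetical (group, case weight) pairs; values are illustrative only.
records = [
    ("employed", 1500.0),
    ("not employed", 2500.0),
    ("employed", 2000.0),
    ("employed", 2000.0),
]

totals = weighted_totals(records)
shares = weighted_shares(records)
print(totals)
print(shares)
```

In this made-up sample, three of four respondents are in the "employed" group, but because the remaining respondent carries a larger weight, the weighted share differs from the unweighted one. This is the sense in which each weight stands in for the number of individuals a respondent represents.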
After the interviewer has finished surveying the respondent, the interviewer fills out an interviewer remarks section, providing objective and subjective details about the interview. Interviewers also provide demographic characteristics about themselves. In addition, information on the interviewer's contacts with the respondent (number of attempts at interviewing the respondent, for instance) is available. View Interview Remarks, Characteristics and Contacts.
The codebook provides information on item nonresponse, that is, questions that respondents declined to answer, answered with "don't know," or were skipped past. All missing data are clearly flagged in the NLSY97 data set with five negative values: (-1) refusal, (-2) don't know, (-3) invalid skip, (-4) valid skip, and (-5) noninterview. The created variable "Reason for Noninterview" (RNI) is available in each survey round and provides counts for the different reasons (unable to be located, refusal, deceased, etc.) a respondent is not interviewed. The extent of non-participation in each survey round is illustrated in Retention & Reasons for Noninterview. Also available in the database are timing variables that record how long respondents took to complete the total interview and its individual sections. In 2019, a separate timings data set was added that includes a more extensive set of timings for select variables and survey years. View Item Nonresponse and Interview Timings.
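The five missing-data codes listed above can be separated from valid responses before analysis. The following is a minimal sketch, assuming the variable being processed has no legitimate negative values; the sample responses are hypothetical, and `split_missing` is a helper name introduced here for illustration.

```python
from collections import Counter

# Missing-data codes as listed in the text above.
MISSING_CODES = {
    -1: "refusal",
    -2: "don't know",
    -3: "invalid skip",
    -4: "valid skip",
    -5: "noninterview",
}

def split_missing(values):
    """Return (valid values, Counter of missing-data reasons).

    Assumes the variable has no legitimate values in the range -1 to -5.
    """
    valid = []
    reasons = Counter()
    for value in values:
        if value in MISSING_CODES:
            reasons[MISSING_CODES[value]] += 1
        else:
            valid.append(value)
    return valid, reasons

# Hypothetical responses to a single item (not real NLSY97 data).
sample = [12, -4, 16, -1, -5, 14, -4, -2]
valid, reasons = split_missing(sample)
print(valid)
print(dict(reasons))
```

Keeping the reasons as separate counts, rather than collapsing every negative code into one "missing" category, preserves the distinction between a valid skip (the question did not apply) and a refusal or "don't know," which usually matters when assessing item nonresponse.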
Approximately 10% of completed interviews are validated every round. The validation reinterviews are conducted with randomly selected respondents to assess the quality of the interview data and to confirm that the interview was conducted properly. Validation data for many of the rounds are available to public users and offer opportunities for studying response variance, item reliability, and other methodological issues. View Interview Validation.
Variables present in the NLSY97 main file are documented via (1) the codebook; (2) accompanying supplemental documentation; and (3) error updates. This section describes these three components of the NLSY97 documentation and discusses the important types of information found within each. View NLSY97 Documentation.
Last Modified Date: April 24, 2020