NAEP Science: About the NAEP Science Assessment

The National Assessment of Educational Progress (NAEP) is a congressionally mandated project administered by the National Center for Education Statistics (NCES) within the U.S. Department of Education. It is the largest continuing and nationally representative assessment of what our nation's students know and can do in select subjects. NCES first administered NAEP in 1969 to measure student achievement nationally. The NAEP science assessment measures students' knowledge of three broad content areas (Physical Science, Life Science, and Earth and Space Sciences) and four science practices (Identifying Science Principles, Using Science Principles, Using Scientific Inquiry, and Using Technological Design). These four practices describe how students use their science knowledge by measuring what they are able to do with the science content. Results for grades 4, 8, and 12 are reported for the nation.

In 2019, the NAEP science assessments at grades 4, 8, and 12 transitioned from being paper-based assessments (PBA) to digitally based assessments (DBA). A multi-step process was used for the transition from PBA to DBA, which involved administering the assessments in both formats. Students were randomly assigned to take either the digitally based or paper-based assessment in 2019. The assessment results in this report are based on the combined performance of students who took the paper-based and digitally based assessments. The transition was designed and implemented with careful intent to preserve trend lines that show student performance over time. Thus, the results from the 2019 science assessments can be compared to results from previous years. Read more about the NAEP Digitally Based Science Assessment.

Survey Questionnaire Indices

NAEP Survey Questionnaires

As part of the NAEP science assessment, survey questionnaires are given to students, teachers, and school administrators at grades 4 and 8 and to students and school administrators only at grade 12. These questionnaires collect contextual information to provide a better understanding of educational experiences and factors that are related to students' learning, both in and outside of the classroom, and to allow for meaningful student group comparisons. Learn more about NAEP survey questionnaires.

The highlighted findings in this report demonstrate the range of information available from the NAEP science survey questionnaires. They do not provide a complete picture of students' learning experiences inside and outside of school. The NAEP science student, teacher, and school questionnaire data can be explored further using the NAEP Data Explorer. Explore the 2019 NAEP science student (grade 4, grade 8, grade 12), teacher (grade 4, grade 8), and school (grade 4, grade 8, grade 12) questionnaires.

NAEP survey questionnaire responses provide additional information for understanding NAEP performance results. Although comparisons in students' performance are made based on student, teacher, and school characteristics and educational experiences, these results cannot be used to establish a cause-and-effect relationship between the characteristics or experiences and student achievement. NAEP is not designed to identify the causes of performance differences. There are many factors that may influence average student achievement, including local educational policies and practices, the quality of teachers, and available resources. Such factors may change over time and vary among student groups; therefore, results must be interpreted with caution.

Development of NAEP Survey Questionnaire Indices

While some survey questions are analyzed and reported individually (for example, the number of books in students' homes), several questions on the same topic are combined into an index measuring a single underlying construct or concept. More information about the 2019 NAEP science indices and their corresponding questions can be found in the 2019 NAEP science student (grade 4, grade 8, grade 12) questionnaires.

The creation of 2019 indices involved the following four main steps:

1. Selection of constructs of interest. The selection of constructs to be measured through the survey questionnaires was guided in part by the National Assessment Governing Board framework for collection and reporting of contextual information. In addition, NCES reviewed relevant literature on key contextual factors linked to student achievement in science to identify the types of survey questions and constructs needed to examine these factors in the NAEP assessment.
2. Question development. Survey questions were drafted, reviewed, and revised. Throughout the development process, the survey questions were reviewed by external advisory groups that included survey experts, subject-area experts, teachers, educational researchers, and statisticians. As noted above, some questions were drafted and revised with the intent of analyzing and reporting them individually; others were drafted and revised with the intent of combining them into indices measuring constructs of interest.
3. Evaluation of questions. New and revised survey questions underwent pretesting, in which a small sample of participants (students, teachers, and school administrators) was interviewed to identify potential issues with their understanding of the questions and their ability to provide reliable and valid responses. Some questions were dropped or further revised based on the pretesting results. The questions were then pretested again among a larger group of participants, and the responses were analyzed. The overall distribution of responses was examined to evaluate whether participants were answering the questions as expected. Relationships between survey responses and student performance were also examined. A method known as factor analysis was used to examine the empirical relationships among questions to be included in the indices measuring constructs of interest. Factor analysis can show, based on relationships among responses to the questions, how strongly the questions "group together" as a measure of the same construct. The convergent and discriminant validity of each construct with respect to other constructs of interest was also examined. If the construct of interest had the expected pattern of relationships and nonrelationships, the construct validity of the factor as representing the intended index was supported.
4. Index scoring. Using the item response theory (IRT) partial credit scaling model, index scores were estimated from students' responses and transformed onto a scale ranging from 0 to 20. As a reporting aid, each index scale was divided into low, moderate, and high index score categories. The cut points for the index score categories correspond to the survey question response categories, and students were classified into a category based on their average responses to the survey questions in each index. For each index survey question, response categories were scored as numerical values (e.g., for an item with five response categories, category A was scored as 1, B as 2, C as 3, D as 4, and E as 5). In general, high average responses to individual questions correspond to high index score values, and low average responses to individual questions correspond to low index score values. As an example, for a set of index survey questions with five response categories (such as not at all, a little bit, somewhat, quite a bit, and very much), students with an average response of less than 3 (somewhat) would be classified as low on the index. Students with an average response of at least 3 (somewhat) but less than 4 (quite a bit) would be classified as moderate on the index. Finally, students with an average response of 4 (quite a bit) or greater would be classified as high on the index.
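
The classification rule from the index scoring step can be illustrated with a minimal Python sketch. It covers only the five-category example given above (cut points at 3 and 4 on the average response) and does not reproduce the IRT partial credit scaling used to estimate the 0 to 20 index scores. The names RESPONSE_SCORES and classify_index are hypothetical and chosen for illustration; they are not part of any NAEP tool.

```python
from statistics import mean

# Hypothetical numeric scoring for a five-category item, following the
# A=1 ... E=5 convention described in the index scoring step.
RESPONSE_SCORES = {
    "not at all": 1,
    "a little bit": 2,
    "somewhat": 3,
    "quite a bit": 4,
    "very much": 5,
}


def classify_index(responses, low_cut=3.0, high_cut=4.0):
    """Classify one student as 'low', 'moderate', or 'high' on an index.

    `responses` holds the student's answers to the questions in one index.
    The default cut points match the five-category example in the text;
    other response scales would need their own (assumed) cut points.
    """
    if not responses:
        raise ValueError("at least one response is required")
    average = mean(RESPONSE_SCORES[r] for r in responses)
    if average < low_cut:
        return "low"
    if average < high_cut:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Average of 3, 3, and 4 is about 3.33, which falls in the moderate band.
    print(classify_index(["somewhat", "somewhat", "quite a bit"]))
```

Keeping the cut points as parameters reflects the statement that they correspond to the response categories of each question set, so the same function could be reused for a different scale under that assumption.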