NAEP Science: About the NAEP Science Assessment

The National Assessment of Educational Progress (NAEP) is a congressionally mandated project administered by the National Center for Education Statistics (NCES) within the U.S. Department of Education. It is the largest continuing and nationally representative assessment of what our nation's students know and can do in select subjects. NCES first administered NAEP in 1969 to measure student achievement nationally. The NAEP science assessment measures students' knowledge of three broad content areas (Physical Science, Life Science, and Earth and Space Sciences) and four science practices (Identifying Science Principles, Using Science Principles, Using Scientific Inquiry, and Using Technological Design). The four practices describe how students use their science knowledge, so assessing them measures what students are able to do with the science content. Results for grades 4, 8, and 12 are reported for the nation.

In 2019, the NAEP science assessments at grades 4, 8, and 12 transitioned from paper-based assessments (PBA) to digitally based assessments (DBA). The transition followed a multi-step process that involved administering the assessment in both formats: in 2019, students were randomly assigned to take either the digitally based or the paper-based assessment. The results in this report are based on the combined performance of students who took either format. The transition was designed and implemented to preserve the trend lines that show student performance over time, so results from the 2019 science assessment can be compared to results from previous years. Read more about the NAEP Digitally Based Science Assessment.

Reporting the Results

NAEP science results are reported as overall average scores on a 0–300 scale at grades 4, 8, and 12. Results are also reported as average subscale scores, on a 0–300 scale, for each content area (Physical Science, Life Science, and Earth and Space Sciences). Because the content area subscales are developed independently, content area scores cannot be compared to one another or to the overall score. Science results are also reported as the percentages of students performing at or above three NAEP achievement levels: NAEP Basic, NAEP Proficient, and NAEP Advanced. Because NAEP scales and achievement levels are developed independently for each subject, results cannot be compared across subjects. Likewise, although average scores are reported on a 0–300 scale at all three grades, the scales were derived separately for each grade, so scores cannot be compared across grades. Read more about the NAEP scaling process in the Technical Documentation.
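
Because achievement-level results are reported as percentages "at or above" each level, they are cumulative: a student scoring at NAEP Advanced is also counted as at or above NAEP Proficient and at or above NAEP Basic. The Python sketch below illustrates that cumulative counting under stated assumptions. The cut scores and student scores are hypothetical placeholders, not the official NAEP science cut scores, and the function `percent_at_or_above` is illustrative only. Actual NAEP estimates are computed from sampling weights and plausible values rather than from individual student scores, which this sketch does not attempt to reproduce.

```python
# Hypothetical cut scores on the 0-300 scale, for illustration only.
# These are NOT the official NAEP science achievement-level cut scores.
HYPOTHETICAL_CUTS = {
    "NAEP Basic": 130,
    "NAEP Proficient": 170,
    "NAEP Advanced": 220,
}


def percent_at_or_above(scores, cuts, weights=None):
    """Return the (weighted) percentage of scores at or above each cut score.

    The percentages are cumulative: a score that clears the Advanced cut
    also clears the Proficient and Basic cuts, so it counts toward all three.
    """
    if weights is None:
        weights = [1.0] * len(scores)  # unweighted: every student counts equally
    total = sum(weights)
    return {
        level: 100.0 * sum(w for s, w in zip(scores, weights) if s >= cut) / total
        for level, cut in cuts.items()
    }


if __name__ == "__main__":
    # Hypothetical scale scores for eight students.
    scores = [120, 135, 150, 175, 190, 225, 240, 160]
    print(percent_at_or_above(scores, HYPOTHETICAL_CUTS))
```

With these hypothetical inputs the result is 87.5 percent at or above Basic, 50.0 percent at or above Proficient, and 25.0 percent at or above Advanced. The percentages can never increase from one level to the next higher level, which is why they should not be added together.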

Results are reported for students overall and for selected demographic groups, such as race/ethnicity, gender, and eligibility for the National School Lunch Program (NSLP). NSLP results have been reported since 2003, when the quality of the data on students' eligibility for the program improved. Under the Healthy, Hunger-Free Kids Act of 2010, schools can use a new universal meal service option, the "Community Eligibility Provision" (CEP). Through CEP, eligible schools can provide meal service to all students at no charge, regardless of economic status and without collecting eligibility data through household applications. CEP became available nationwide in the 2014–15 school year; as a result, the percentage of students categorized as eligible for NSLP has increased in comparison to 2013. Readers should therefore interpret NSLP trend results with caution. Because students' eligibility for NSLP may be underreported at grade 12, grade 12 NSLP results are not included in this report.

Read more about how student groups are defined and how to interpret NAEP results from the science assessment.

NAEP reports results using widely accepted statistical standards; findings are reported based on a statistical significance level set at .05, with appropriate adjustments for multiple comparisons. Only those differences that are found to be statistically significant are referred to as "higher" or "lower."
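
The text states only that NAEP adjusts for multiple comparisons. One widely used adjustment is the Benjamini-Hochberg false discovery rate (FDR) procedure, and the minimal Python sketch below assumes that procedure; consult the NAEP Technical Documentation for the adjustment actually applied. Given the p-values for a family of comparisons, the sketch flags which differences would be reported as "higher" or "lower" at the .05 level. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which comparisons are significant under the Benjamini-Hochberg
    false discovery rate procedure at level `alpha`.

    Returns a list of booleans in the same order as `p_values`.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) such that
    # p_(k) <= (k / m) * alpha. All comparisons ranked 1..k are significant.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        significant[idx] = rank <= max_rank
    return significant


if __name__ == "__main__":
    # Hypothetical p-values for four group comparisons.
    p = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(p))  # [True, False, False, False]
```

Without any adjustment, the comparisons with p = 0.03 and p = 0.04 would both fall below .05; after the adjustment, only the smallest p-value remains significant. This is the kind of correction that keeps the number of spurious "higher" or "lower" findings in check when many groups are compared at once.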

Comparisons of scores and percentages over time or between groups are based on statistical tests that consider both the size of the difference and the standard errors of the two statistics being compared. Standard errors indicate the margin of error around an estimate, and estimates based on smaller groups are likely to have larger standard errors. For example, a 2-point change in the average score for the nation may be statistically significant, while a 2-point change for a student group may not be, because the group's estimate has a larger standard error. The size of the standard errors may also be influenced by other factors, such as the degree to which the assessed students are representative of the entire population. Standard errors for the estimates presented in this report are available in the NAEP Data Explorer (NDE). For the 2019 analysis, an additional component was included in the standard error calculation to account for linking scores across the two delivery modes.
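
The comparison described above can be sketched as a test statistic that divides the difference between two estimates by the standard error of that difference. The Python sketch below assumes the two estimates are independent (so their variances add) and uses a normal approximation for the two-sided p-value; NAEP's actual procedures are documented in its Technical Documentation. Modeling the 2019 mode-linking component as an extra variance term (the hypothetical `linking_se` parameter) is an assumption made here for illustration, and all numbers in the example are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


def compare_estimates(est_a, se_a, est_b, se_b, linking_se=0.0):
    """Compare two independent estimates; return (z statistic, p-value).

    `linking_se` is a hypothetical extra error component (for example, for
    linking across delivery modes), added to the variance of the difference.
    """
    se_diff = math.sqrt(se_a**2 + se_b**2 + linking_se**2)
    z = (est_a - est_b) / se_diff
    return z, two_sided_p(z)


if __name__ == "__main__":
    # Hypothetical: the same 2-point change with small (national) and
    # larger (student group) standard errors.
    print(compare_estimates(152, 0.3, 150, 0.3))  # p well below .05
    print(compare_estimates(152, 1.5, 150, 1.5))  # p well above .05
```

The same 2-point difference gives z of about 4.7 with the smaller standard errors but only about 0.9 with the larger ones, mirroring the nation-versus-student-group example in the text. Any extra error component, such as a linking term, widens the standard error of the difference and makes a given difference harder to declare significant.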

Average scores and percentages of students are presented as whole numbers in the report, but the statistical comparison tests are based on unrounded numbers. In some cases, two scores or percentages have the same whole-number value but are statistically different from each other. For example, the percentage of fourth-grade students who were Asian/Pacific Islander was 5 percent in 2019, which was statistically different from 5 percent in 2009. The "Customize data tables" link at the bottom of the page provides data tables from the NDE. The tables give more precise values for the scores and percentages and show how the two estimates being compared differ from each other.
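
The rounding caveat can be illustrated with hypothetical numbers: two unrounded percentages such as 5.4 and 4.6 both display as 5 percent, yet with small enough standard errors the difference between them is statistically significant. The Python sketch below uses hypothetical values (not the actual Asian/Pacific Islander estimates), assumes independent estimates and a normal approximation, applies no multiple-comparison adjustment, and runs the test on the unrounded values, as the text describes. The function name `is_significant` is hypothetical.

```python
import math


def is_significant(est_a, se_a, est_b, se_b, alpha=0.05):
    """Two-sided test of the difference between two independent estimates,
    using a normal approximation. Works on unrounded values."""
    z = (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return p < alpha


if __name__ == "__main__":
    # Hypothetical unrounded percentages and standard errors.
    pct_2019, se_2019 = 5.4, 0.1
    pct_2009, se_2009 = 4.6, 0.1

    # Both display as the same whole number in a report...
    print(round(pct_2019), round(pct_2009))  # 5 5
    # ...but the test on the unrounded values finds a difference.
    print(is_significant(pct_2019, se_2019, pct_2009, se_2009))  # True
```

With larger standard errors (for example, 1.0 for each estimate) the same unrounded values would not differ significantly, so whether two equal-looking whole numbers differ depends on both the unrounded values and their standard errors.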

A scale score that is significantly higher or lower than in an earlier assessment year is reliable evidence that student performance was different. NAEP is not, however, designed to identify the causes of changes in student performance. Although NAEP compares students' performance across demographic characteristics and educational experiences, these comparisons cannot establish a cause-and-effect relationship between a characteristic or experience and achievement. Many factors may influence student achievement, including educational policies and practices, the quality of teachers, and available resources. Such factors may change over time and vary among student groups; therefore, results must be interpreted with caution.