About the NAEP Long-Term Trend Assessment

The National Assessment of Educational Progress (NAEP) has monitored student performance since the early 1970s through its long-term trend (LTT) assessments. Results from the 2022/2023 LTT assessments in reading and mathematics are based on nationally representative samples of 9- and 13-year-olds. Since its beginning in 1969, the primary mission of NAEP has been to measure academic progress by regularly administering various subject-area assessments to nationally representative samples of students. The existence of two national assessment programs—LTT and main NAEP—makes it possible to meet two major objectives: (1) to measure students' educational progress over a long period of time (LTT), and (2) to measure students' knowledge and skills based on the most current curricula and standards (main NAEP). It should be noted that results from the LTT assessments cannot be directly compared to those from the main NAEP assessments because the LTT assessments use different questions and because students are sampled by age rather than by grade. Learn more about the differences between the LTT and main NAEP assessments.

Several changes were made to the LTT assessment in 2004 to align it with current assessment practices and policies applicable to the main NAEP assessments. A bridge study was conducted to ensure that the trend line could be continued over time. The 2004 bridge study involved administering two assessments: one that replicated the assessment given in 1999 and earlier assessment years (a bridge assessment, or the original assessment format), and one that represented the new design (a modified assessment, or the revised assessment format). Results presented in this report for 1999 and earlier assessment years are from the original assessment format, and results for 2004 through 2023 are from the revised assessment format. In addition, results for both the original and revised assessment formats are presented for the 2004 LTT assessment. Read more information about the two assessment formats and changes made to the LTT assessment.

Reporting NAEP Long-Term Trend Results

NAEP has administered the long-term trend assessments periodically since the 1970s. Long-term trend reading and mathematics results are reported as average scores on a 0 to 500 scale and as percentages of students performing at or above NAEP long-term trend performance levels. Although the scale range is the same for both reading and mathematics, scores cannot be directly compared across subjects because the scales were developed independently of each other.

NAEP assessments are designed to best support certain types of inferences. In the case of long-term trend, subsequent to the baseline Item Response Theory scaling that established the cross-age scales, the assessment has been scaled within age. These within-age scalings involve jointly analyzing the data from the current and most recent NAEP long-term trend assessments. The separate within-age scalings are then linked to the cross-age scale that was originally established. This approach strengthens the evidence that the assessment provides to support within-age comparisons across time. Because the assessment was explicitly scaled in a cross-age manner only in the base year, cross-age comparisons are most strongly supported in that year rather than in subsequent assessment years. While within-age scales from subsequent years have been aligned to the initial cross-age scale, and cross-age comparisons may be reasonably well supported, the emphasis continues to be on within-age comparisons. It should be borne in mind, however, that NAEP is not a cohort or longitudinal design: the LTT assessments have not been given at intervals that coincide with the 4-year span between the assessed ages, and they have been given at different times of the school year for the three ages. As a result, inferences about the performance of cohorts of students over time should not be made based on NAEP LTT results. Read more about the NAEP scaling process in the NAEP Technical Documentation.
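To make the linking step concrete, the sketch below shows one simple form such an alignment can take: a mean/sigma linear transformation that places provisional scores from a joint calibration onto the existing 0 to 500 reporting scale so that the previous assessment reproduces its already-reported results. The simulated scores, the reported mean of 256, the standard deviation of 36, and the mean/sigma method itself are all illustrative assumptions, not NAEP's documented operational procedure.

```python
import numpy as np

# Hypothetical illustration of a mean/sigma linear linking (not NAEP's
# operational procedure). After a joint calibration of the current and most
# recent assessments, provisional scores are placed on the existing 0-500
# reporting scale by a linear transformation chosen so that the previous
# assessment reproduces its already-reported mean and standard deviation.

rng = np.random.default_rng(0)
theta_prev = rng.normal(0.00, 1.00, 5000)  # provisional scores, previous assessment
theta_curr = rng.normal(0.05, 1.02, 5000)  # provisional scores, current assessment

# Assumed published values for the previous assessment (made up for the example).
reported_mean, reported_sd = 256.0, 36.0

a = reported_sd / theta_prev.std(ddof=1)   # slope matches the spread
b = reported_mean - a * theta_prev.mean()  # intercept matches the mean

scores_curr = a * theta_curr + b           # current results on the reporting scale
print(round(scores_curr.mean(), 1))        # roughly 256 + 0.05 * 36, i.e., about 258
```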

Setting LTT Performance Levels

Results are also presented in terms of the percentages of students reaching performance levels. The long-term trend performance levels are distinct from the achievement levels that have been set for main NAEP assessments. To help interpret NAEP long-term trend results, the reading and mathematics scales were each divided into five successive levels of performance (150, 200, 250, 300, and 350). A "scale anchoring" process was used to define what it meant to score at each of these levels. Questions were identified that were more likely to be answered correctly by students performing at each level on the scale and less likely to be answered correctly by students performing at the next lower level. Students at a given level had to have at least a 65 to 80 percent probability of answering the question correctly; students at the next lower level had a much lower probability of answering it correctly. The difference in probabilities between adjacent levels had to exceed 30 percentage points. Content specialists for each subject examined these empirically selected question sets and used their professional judgment to characterize each level. The reading scale anchoring was conducted on the basis of the 1984 assessment, and the scale anchoring for mathematics trend reporting was based on the 1986 assessment.
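The selection rule lends itself to a compact sketch. The example below applies the criteria described above to a few hypothetical items, using the 65 percent lower bound and the 30 percentage point gap; the item names and probabilities are invented for illustration.

```python
# Illustrative sketch of the scale-anchoring selection rule described above.
# Item names and probabilities are hypothetical; the 65 percent cutoff is the
# lower bound of the 65 to 80 percent criterion mentioned in the text.

# For each item: probability of a correct answer for students at a level
# (e.g., 250) and for students at the next lower level (e.g., 200).
items = {
    "item_A": {"p_at_level": 0.72, "p_next_lower": 0.35},
    "item_B": {"p_at_level": 0.81, "p_next_lower": 0.60},  # gap too small
    "item_C": {"p_at_level": 0.58, "p_next_lower": 0.20},  # too hard at the level
}

def anchors_level(p_at_level, p_next_lower, min_p=0.65, min_gap=0.30):
    """An item anchors a level if students at the level are likely to answer it
    correctly and students one level down are much less likely to."""
    return p_at_level >= min_p and (p_at_level - p_next_lower) > min_gap

selected = [name for name, p in items.items()
            if anchors_level(p["p_at_level"], p["p_next_lower"])]
print(selected)  # ['item_A']
```

Items that survive this screen are then reviewed by content specialists, as described above, before a level is characterized.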

Interpreting Statistical Significance

NAEP reports results using widely accepted statistical standards; findings are reported based on a statistical significance level set at .05, with appropriate adjustments for multiple comparisons. Only those differences that are found to be statistically significant are referred to as "higher" or "lower."
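As an illustration of how such an adjustment works, the sketch below implements the Benjamini-Hochberg false discovery rate procedure, one widely used method for handling many simultaneous comparisons. The p-values are invented, and the procedure is offered as an example rather than as NAEP's exact operational adjustment.

```python
# Illustrative Benjamini-Hochberg false discovery rate adjustment. Shown as an
# example of a multiple-comparison adjustment, not necessarily the exact
# procedure NAEP applies; the p-values are made up.

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a parallel list of booleans: True where a comparison remains
    significant after the adjustment."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha ...
    threshold_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            threshold_rank = rank
    # ... and declare the k smallest p-values significant.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= threshold_rank:
            significant[i] = True
    return significant

print(benjamini_hochberg([0.003, 0.021, 0.040, 0.30]))  # [True, True, False, False]
```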

Comparisons of scores and percentages over time or between groups are based on statistical tests that consider both the size of the difference and the standard errors of the two statistics being compared. A standard error quantifies the uncertainty in an estimate, and estimates based on smaller groups tend to have larger standard errors. The size of the standard errors may also be influenced by other factors, such as the degree to which the assessed students are representative of the entire population. Standard errors for the estimates presented in this report are available in the NAEP Data Explorer (NDE).
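Such a test often takes the form of a two-sided z-test on the difference between two independent estimates. The sketch below shows that form with invented scores and standard errors; NAEP's operational procedures, including how the sampling design enters the standard errors, are described in the NAEP Technical Documentation.

```python
import math

# Minimal sketch of the kind of test described above: a two-sided z-test for
# the difference between two independent estimates. The scores and standard
# errors are made up for illustration.

def z_test(est1, se1, est2, se2, alpha=0.05):
    diff = est1 - est2
    se_diff = math.sqrt(se1**2 + se2**2)  # standard error of the difference
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return diff, z, p, p < alpha

# Example: average scores of 256 and 253 with standard errors of 0.9 and 1.0.
diff, z, p, significant = z_test(256.0, 0.9, 253.0, 1.0)
print(f"difference={diff:.1f}, z={z:.2f}, p={p:.3f}, significant={significant}")
```

With these invented numbers the 3-point difference is significant (z is about 2.2); with standard errors twice as large, the same difference would not be.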

Average scores and percentages of students are presented as whole numbers in this report; however, the statistical comparison tests are based on unrounded numbers. In some cases, two scores or percentages have the same whole-number value, yet the difference between their unrounded values is statistically significant. For example, averages of 255.6 and 256.4 both display as 256, but the unrounded difference of 0.8 points can be significant when the standard errors are small enough. The "Customize data tables" link at the bottom of the page provides data tables from the NDE; the tables offer more precise values for the scores and percentages, showing how two estimates that round to the same whole number differ from each other.

A scale score that is statistically significantly higher or lower than the score in an earlier assessment year is reliable evidence that student performance has changed. NAEP is not, however, designed to identify the causes of changes in student performance. Although students' performance is compared across demographic characteristics and educational experiences, such comparisons cannot be used to establish a cause-and-effect relationship between a characteristic or experience and achievement. Many factors may influence student achievement, including educational policies and practices and available resources. Such factors may change over time and vary among student groups.

NAEP Reporting Groups

Race/Ethnicity

Results by students' race/ethnicity are presented in this report based on information collected from two different sources:

Observed Race/Ethnicity. Students were assigned to a racial/ethnic category based on the assessment administrator's observation. A category for Hispanic students did not exist in 1971, but was included in subsequent years. The results for the 2004 original assessment format and all previous assessment years are based on observed race/ethnicity.

School-Reported Race/Ethnicity. Data about students' race/ethnicity from school records were collected in 2004 but were not collected for any of the previous NAEP long-term trend assessments. The results presented in this report for the 2004 revised assessment format and for 2008 and later assessment years are based on school-reported race/ethnicity.

Parents' Education Level

Students were asked to indicate the extent of schooling for each of their parents, choosing among the following options: did not finish high school, graduated from high school, had some education after high school, or graduated from college. The response indicating the highest level of education for either parent was selected for reporting. The questions were presented only to students at ages 13 and 17. (Results for parental education are not reported at age 9 because research has shown that students' reports of their parents' education level are less reliable at this age.) Although students in previous long-term trend assessments were asked about their parents' level of education, the wording of the question in the revised format of the reading assessment administered in 2004 and later years differed from that used previously. Consequently, reading results for the parents' education level variable are reported in this report only for 2004 and later assessment years. This is not the case for the long-term trend mathematics assessment, where results for this variable go back to 1978.

Grade Attended

The long-term trend assessments are administered to samples of students defined by age rather than by grade. Nine-year-olds are typically in fourth grade, 13-year-olds are typically in eighth grade, and 17-year-olds are typically in eleventh grade. Some students in each age group, however, are in a grade that is below or above the grade that is typical for their age. For example, some 13-year-olds are in the seventh or ninth grade rather than the eighth grade. Different factors may contribute to why students are in a lower or higher grade than is typical for their age. Such factors could include students having started school a year earlier or later than usual, having been held back a grade, or having skipped a grade.

See more information about the student groups that NAEP reports in the long-term trend assessments.