About the NAEP Mathematics Assessment

The National Assessment of Educational Progress (NAEP) mathematics assessment at grades 4 and 8 measures students’ knowledge and skills in mathematics and their ability to solve problems in mathematical and real-world contexts. Results are reported for the nation overall, for states and jurisdictions, and for districts participating in the Trial Urban District Assessment (TUDA). In 2017, the NAEP mathematics assessment was administered for the first time as a digitally based assessment (DBA) at grades 4 and 8; prior to 2017, paper-based assessments (PBA) were administered. A multi-step process was used for the transition from PBA to DBA in order to preserve trend lines that show student performance over time. The process involved administering the assessment in both the DBA and PBA formats to randomly equivalent groups of students in 2017. The results from the 2017 assessment can therefore be compared to those from previous years, showing how students’ performance in mathematics has changed over time.

Survey Questionnaire Indices

Development of NAEP Survey Questionnaire Indices

As part of the NAEP mathematics assessment, survey questionnaires are given to students, teachers, and school administrators. These questionnaires collect contextual information to provide a better understanding of educational experiences and factors that are related to students’ learning both in and outside of the classroom and to allow for meaningful student group comparisons.

While some survey questions are analyzed and reported individually (for example, the number of books in students’ homes), several questions on the same topic are combined into an index measuring a single underlying construct or concept. The creation of the 2017 indices involved the following four main steps:

  1. Selection of constructs of interest. The selection of constructs of interest to be measured through the survey questionnaires was guided in part by the National Assessment Governing Board framework for collection and reporting of contextual information. In addition, NCES reviewed relevant literature on key contextual factors linked to student achievement in mathematics to identify the types of survey questions and constructs needed to examine these factors in the NAEP assessment.
  2. Question development. Survey questions were drafted, reviewed, and revised. Throughout the development process, the survey questions were reviewed by external advisory groups that included survey experts, subject-area experts, teachers, educational researchers, and statisticians. As noted above, some questions were drafted and revised with the intent of analyzing and reporting them individually; others were drafted and revised with the intent of combining them into indices measuring constructs of interest.
  3. Evaluation of questions. New and revised survey questions underwent pilot testing, in which a small sample of participants (students, teachers, and school administrators) was interviewed to identify potential issues with their understanding of the questions and their ability to provide reliable and valid responses. Some questions were dropped or further revised based on the pilot test results. The questions were then field tested with a larger group of participants, and the responses were analyzed. The overall distribution of responses was examined to evaluate whether participants were answering the questions as expected. Relationships between survey responses and student performance were also examined. A method known as factor analysis was used to examine the empirical relationships among questions to be included in the indices measuring constructs of interest. Factor analysis can show, based on relationships among responses to the questions, how strongly the questions “group together” as a measure of the same construct.
  4. Index scoring. Using the item response theory (IRT) partial credit scaling model, index scores were estimated from students’ responses and transformed onto a scale ranging from 0 to 20. As a reporting aid, each index scale was divided into low, moderate, and high index score categories. The cut points for the index score categories were determined based on the average response to the set of survey questions in each index.
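
The index scoring step can be illustrated with a minimal Python sketch. It shows how the partial credit model assigns a probability to each response category of one survey question, followed by a linear rescaling to a 0 to 20 range and a three-way categorization. Only the partial credit model, the 0 to 20 scale, and the low, moderate, and high categories come from the description above. The function names, the step difficulty values, the assumed latent range of -4 to 4, the linear form of the transformation, and the cut points of 7 and 13 are all hypothetical and are not the values or procedures used by NAEP.

```python
import math


def pcm_category_probs(theta, step_difficulties):
    """Partial credit model: probability of each response category 0..m
    for a respondent with latent trait value `theta`."""
    # Category k has log-weight sum_{j<=k} (theta - delta_j); category 0 has 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the largest log-weight before exponentiating, for stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def to_reporting_scale(theta, theta_min=-4.0, theta_max=4.0):
    """Hypothetical linear map from the latent scale to 0-20, with clipping.
    The latent range of -4 to 4 is an assumption made for illustration."""
    clipped = min(max(theta, theta_min), theta_max)
    return 20.0 * (clipped - theta_min) / (theta_max - theta_min)


def categorize(score, low_cut=7.0, high_cut=13.0):
    """Assign a reporting category; the cut points here are placeholders."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "moderate"
    return "high"


# Example: a three-category question with made-up step difficulties.
probs = pcm_category_probs(theta=0.5, step_difficulties=[-1.0, 1.0])
index_score = to_reporting_scale(0.5)
category = categorize(index_score)
```

In practice, a respondent's latent value would be estimated from responses to all questions in the index rather than supplied directly, and the cut points would be derived from the average response to those questions, as described in step 4.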