The Assessment Design
Science Interactive Computer Tasks (ICTs) and Hands-On Tasks (HOTs) were administered as probe assessments in 2009 at grades 4, 8, and 12. The two assessments were given to separate, nationally representative samples; therefore, the results are not linked to each other or to the main operational science assessment.
As described in the science framework, the ICT assessment contains instruments uniquely suited to the computer. For grades 4, 8, and 12, each student received two sections of computer-based tasks. One section included a single extended task that required 40 minutes to complete; the other section contained two short tasks, each requiring 20 minutes to complete.
Both short and extended ICTs recorded students' typed responses to constructed-response questions and their answers to multiple-choice questions. In addition, the extended ICTs captured student actions during the tasks that provided in-depth information about student performance related to the inquiry process. For instance, the number of plant trays used by a student in conducting an experiment during the Mystery Plants task was captured by the system. Scoring for the typed response followed procedures similar to those used in scoring other NAEP assessments. Scoring of student actions for the extended ICTs was done automatically using computer algorithms.
All of the students in the ICT assessment also received a section of general survey questions; however, due to system limitations, it was not possible during the ICT sessions to collect information from students about science-related classroom learning activities (as is often done in other NAEP assessments).
For the HOT assessment, each student received a booklet containing two sections. Each section consisted of a hands-on task with related paper-and-pencil questions and materials students needed to complete the task. Students in all three grades were allowed 40 minutes to complete each of the two HOT sections. The booklets in the HOT assessment included two sets of student survey questions. The first consisted of general questions, as in the ICT assessment, while the second consisted of science-related questions about classroom learning activities. Students were given five minutes to complete each of the two sets of questions.
Overview of interactive computer tasks, by grade: 2009

Grade 4
- Here Comes the Sun: Predict the path of the sun and the number of daylight hours to determine the best planting location.
- Cracking Concrete: Predict the effect of the freeze/thaw cycle on a concrete sidewalk.
- Mystery Plants: Determine the optimum amount of light and nutrients for plant growth.

Grade 8
- Investigate attributes of two soil samples to determine the best site for building a playground.
- Bottling Honey: Investigate flow rates of four liquids to determine the best temperature for bottling honey.
- Planning a Park: Evaluate the impact of a planned recreation park on specific organisms.

Grade 12
- Investigate relationships between the luminosity and temperature of different stars.
- Investigate energy transfer between substances to determine the best metal for a cooking pot.
- The Phytoplankton Factor: Investigate ocean conditions that support phytoplankton growth.
Overview of hands-on tasks, by grade: 2009

Grade 4
- How Seeds Travel: Investigate the characteristics of seeds to determine how seeds are spread.
- Electrical Circuits: Design an electrical circuit to investigate the conductivity of objects.

Grade 8
- Investigate magnetic properties of metals to identify the metals and compare their magnetic strength.
- Investigate physical and chemical properties of cooking ingredients to identify the ingredients in a mixture.

Grade 12
- Maintaining Water Systems: Investigate water samples to determine the better site for a new town.
- Investigate pigments in unknown organisms to determine the identity of organisms.
The results for the 2009 science ICT and HOT probe assessments are based on administration procedures that allowed accommodations for students with disabilities (SD) and English language learners (ELL) selected to participate in the two assessments. Appropriate accommodations were determined by school officials. Read-aloud accommodations were provided for HOTs and short ICTs but were not provided for the extended ICTs. As a result, a small portion of students in the ICT assessment who required read-aloud accommodations were given only the two short ICTs for their grade level. Approximately 4 percent, 2 percent, and 1 percent of the students sampled at grades 4, 8, and 12, respectively, took two short ICTs.
Percentage of students with disabilities (SD) and/or English language learners (ELL) identified, excluded, and assessed in NAEP ICT and HOT assessments, as a percentage of all students, by grade: 2009
Sampling and Weighting
The target population for the ICT and HOT assessments consisted of 4th-, 8th-, and 12th-graders enrolled in public and private schools nationwide. The national samples were chosen using a multistage design that involved drawing students from the sampled public and private schools across the country. Within each grade, the results from the assessed public and private school students were combined to provide accurate estimates of student performance in the nation.
Each school that participated in the assessment, and each student assessed, represents a portion of the population of interest. Results are weighted so that inferences can be made from the student samples to the respective populations from which they were drawn. Sampling weights account for the disproportionate representation of some groups in the selected sample. Although American Indian/Alaska Native students were part of the sample, too few were assessed to permit reporting their results separately. In addition, participation rates for private schools fell below the 70 percent guideline, and therefore their results cannot be reported separately. The table below provides a summary of the school and student participation rates and sample sizes for the ICT and HOT assessments. The numbers reported include both public and private school students.
School and student participation rates in NAEP science ICT and HOT assessments by grade: 2009
National School Lunch Program
NAEP collects data on student eligibility for the National School Lunch Program (NSLP) as an indicator of low income. Under NSLP guidelines, children from families with incomes at or below 130 percent of the poverty level are eligible for free meals, and those from families with incomes between 130 and 185 percent of the poverty level are eligible for reduced-price meals. (For the period July 1, 2008, through June 30, 2009, 130 percent of the poverty level for a family of four was $27,560, and 185 percent was $39,220.) Some schools provide free meals to all students irrespective of individual eligibility, using their own funds to cover the costs for noneligible students. Under special provisions of the National School Lunch Act intended to reduce the administrative burden of determining student eligibility every year, schools can be reimbursed based on eligibility data for a single base year; such participating schools might have high percentages of eligible students and report all students as eligible for free lunch. Because students' eligibility for free or reduced-price school lunch may be underreported at grade 12, those results are not included on this website.
Reporting Results for ICTs and HOTs
As with all other NAEP assessments, student responses to constructed-response items were scored according to standard scoring procedures. For examples of how tasks were scored, see the interactive features for tasks at each grade contained on this website. The data from scoring were then analyzed to create the summaries of student performance shown on this website. In particular, percent correct statistics were calculated to summarize performance in the major findings across grades and tasks, and a process analysis was conducted to examine how students proceeded through selected tasks. Both statistical methods are described below, along with some cautionary notes about the data.
Item Percentage Correct
Item percentage correct (P+) is a question-level descriptive statistic that ranges from 0 to 100. For a multiple-choice or dichotomous constructed-response question, P+ is the percentage of examinees who received a correct score on the question. For a multi-level constructed-response question, P+ is calculated by summing weighted percentages of students attaining each score level, with the weights based on the number of levels for the question.
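As a rough illustration, the weighting described above can be sketched in Python, assuming each score level i on an item with maximum level k receives weight i/k (the score distributions below are hypothetical, not NAEP data):

```python
def p_plus(percent_at_level):
    """Item percentage correct (P+) for a multi-level item.

    percent_at_level[i] is the percentage of students who earned
    score level i; the entries should sum to roughly 100.
    """
    max_level = len(percent_at_level) - 1
    # Weight each level by the fraction of the maximum score it represents.
    return sum(i * pct for i, pct in enumerate(percent_at_level)) / max_level

# Dichotomous item: P+ reduces to the percentage answering correctly.
print(p_plus([35, 65]))          # 65.0

# Hypothetical 4-level item scored 0-3:
print(p_plus([20, 30, 30, 20]))  # 50.0
```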
Student Percent Correct Scores
Student performance across the three ICTs or two HOTs at each grade level was also summarized as a student percent correct score. This percentage was calculated by taking a student's total score across the multiple tasks in the assessment, dividing by the maximum possible score for the questions the student attempted, and multiplying by 100. For example, suppose a student attempted five questions in the first ICT task, four in the second task, and four in the third task, yielding a total score of 30. (Note that constructed-response items are "weighted" based on the number of score categories, e.g., a 4-category item has a weight of 3, with students earning 0, 1, 2, or 3 points of credit on the item.) In addition, suppose that the maximum possible score for the 13 items the student attempted is 45. Then the student's percent correct score would be 30 divided by 45, multiplied by 100, which equals 67. The denominator of the student percent correct score is the maximum possible score on the items the student attempted, not on all the items that appeared in the assessment. This method is used because NAEP assessments are intended to be non-speeded, implying that students should not be penalized for failing to reach particular questions because of time limitations.
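The worked example above reduces to a one-line calculation; the sketch below simply restates that arithmetic:

```python
def student_percent_correct(total_score, max_possible):
    """Percent correct over the items a student attempted.

    max_possible is the maximum score on attempted items only, so
    not-reached items do not penalize the student.
    """
    return total_score / max_possible * 100

# Worked example from the text: total score 30, maximum possible 45.
print(round(student_percent_correct(30, 45)))  # 67
```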
Following the description of question-level student performance on the ICTs and HOTs, a more in-depth depiction of student profiles was created for a sequence of items within a task using "process analysis." The process analysis grouped students into categories according to their response patterns on a pre-specified item sequence. This approach is intended to provide a more detailed view of how students engaged in various aspects of scientific inquiry and problem solving as they moved through the task. Note that only students who responded to all the items defined in a process sequence were included in the process analysis.
An example is provided below for a set of questions in the grade 8 short ICT task, Bottling Honey. Four of the six questions in the task were grouped into a three-step process. In the first step, students used the simulation to compare the flow rates of the four liquids at 20°C and determined which liquids had the same flow rate at 30°C. Students' responses to this question were categorized as "Complete or Essential" or "Partial, Inadequate, or Incorrect." Step 2 asked students to describe the steps of an investigation to determine which liquids flow more quickly at a higher temperature than at a lower temperature. Students' responses to this question were categorized as "Correct," "Partial," or "Incorrect." In step 3, students used the procedure they developed to collect data and draw conclusions about which liquids flow more quickly at a higher temperature than at a lower temperature. Students' responses to this question were categorized as "Complete," "Partial," or "Incorrect." Taken together, these three process steps yielded a total of 18 student profiles (2 × 3 × 3), signifying different combinations of responses. For example, one group of students provided complete responses for the liquid flow rate comparison and correctly designed the liquid flow investigation, but gave a conclusion to the liquid flow question at step 3 that was rated as incorrect.
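The count of 18 profiles follows directly from crossing the three category sets; a minimal sketch:

```python
from itertools import product

# Response categories for the three Bottling Honey process steps,
# as listed in the text.
step1 = ["Complete or Essential", "Partial, Inadequate, or Incorrect"]
step2 = ["Correct", "Partial", "Incorrect"]
step3 = ["Complete", "Partial", "Incorrect"]

# Every combination of one category per step is one student profile.
profiles = list(product(step1, step2, step3))
print(len(profiles))  # 18
```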
Treatment of Missing Responses
In computing item percentage correct statistics, missing responses at the end of a task are considered not-reached items and are treated as if they had not been presented to the respondent. Missing responses to items before the last observed response in a task are considered intentional omissions; an omission is treated as wrong for a multiple-choice item and scored in the lowest category for a constructed-response item. The extended ICT tasks, however, were designed so that students could not intentionally omit items: students had to select at least one option for a multiple-choice item, or type at least seven characters for any item requiring a typed response, before the testing system allowed them to move to the next question. Therefore, any missing response before the last observed response in an extended task is due to computer delivery system or administration errors, and such missing responses were treated as if the item had not been administered.
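Under these rules, classifying a task's missing responses comes down to locating the last observed response. A sketch, assuming None marks a missing item (the response vector is hypothetical):

```python
def classify_missing(responses):
    """Classify missing responses (None) in one task's response vector.

    Missing after the last observed response -> "not reached"
    (treated as not administered); missing before it -> "omitted"
    (scored wrong / lowest category). For extended ICTs, interior
    gaps instead reflect delivery or administration errors and are
    also treated as not administered.
    """
    answered = [i for i, r in enumerate(responses) if r is not None]
    last = answered[-1] if answered else -1
    return ["omitted" if i <= last else "not reached"
            for i, r in enumerate(responses) if r is None]

# Hypothetical 6-item task: the 3rd and 6th items are unanswered.
print(classify_missing(["A", "B", None, "C", "D", None]))
# ['omitted', 'not reached']
```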
Interpreting the Results
NAEP reports results using widely accepted statistical standards; findings are reported at a statistical significance level of .05, with appropriate adjustments for multiple comparisons. Only those differences found to be statistically significant are discussed as higher or lower. Statements about differences between groups in percentages should be interpreted with caution if at least one of the groups being compared is small in size and/or if "extreme" percentages are being compared. Because of the relatively small sample sizes in the ICT and HOT assessments, the number of students might not be sufficiently high to permit accurate estimation of subgroup performance results.