University of Oregon

Oregon Extended Assessment-Vertical Scaling Project

Goal:

  • Establish a computer-based, vertically scaled English language arts (combined reading and writing) and mathematics test with three levels of difficulty – low, mid, and high – that students take after completing 15 items for placement into these levels. This adaptive format ensures that students do not take items they have a low probability of answering correctly, or items that are excessively easy (e.g., items that waste time and fatigue students). All tests will be linked to the Common Core State Standards.
  • Ensure the test has the possibility of being administered with paper-pencil format and that this administration is comparable to the computer administered format.

General Assumptions

  • Test development will occur with (a) development of a test item blueprint linked to CCSS, (b) completion of a technical specifications document, (c) development of an item writing training document, (d) completion of prototype items, and (e) collection of items from field-based teachers that are then standardized with graphics. Assistance from ODE will be necessary in recruiting teachers from the field to assist in item writing.
  • Test linkage with CCSS will be formally analyzed by another group of teachers participating in a distributed item review study that collects information on alignment, bias (sensitivity), and perceived difficulty. Assistance from ODE will be necessary in recruiting teachers from the field to assist in alignment to standards.
  • We anticipate that it will take two years to develop vertical scales for grades 3-8, with the scales ready by 2016. Vertical scaling will only occur in grades 3-8 and only in English language arts and mathematics; Grade 11 does not have contiguous grade levels (before or after) to provide a vertical scale. For the same reason, the Science test will not be vertically scaled.

N.B. Although BRT can develop all of the items within the first year, there are not enough students to field test and scale the items and place them into equivalent alternate forms within one year. Analyses of previous years of extended assessment data show approximately 1,100 students per grade taking the reading and math measures in grades 3-5, 900 students per grade in grades 6-7, 700 in grade 8, and 500 in grade 11. Grade 12 students may participate in the Grade 11 assessment as part of the Essential Skills requirements. However, their test results will not be included in this project.
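The arithmetic behind this note can be sketched as follows. This is an illustration only, not BRT's planning model: it assumes every student takes exactly one field-test form, takes the 200-responses-per-item minimum, the 25 unique items per form, and the 210-item pool from figures stated later in this plan, and ignores linking items. The function and constant names are hypothetical.

```python
# Approximate students per grade taking the extended measures (from the note above).
STUDENTS_PER_GRADE = {"3-5": 1100, "6-7": 900, "8": 700, "11": 500}

RESPONSES_PER_ITEM = 200     # minimum responses per field-test item (from this plan)
UNIQUE_ITEMS_PER_FORM = 25   # unique items on each field-test form (from this plan)
ITEMS_NEEDED = 210           # item pool per grade after field-testing (from this plan)


def one_year_capacity(students):
    """Return (forms, unique_items) that one year of students can support.

    Assumption: each student takes one form, so a form needs
    RESPONSES_PER_ITEM students for every item on it to reach the minimum.
    """
    forms = students // RESPONSES_PER_ITEM
    return forms, forms * UNIQUE_ITEMS_PER_FORM


for band, students in STUDENTS_PER_GRADE.items():
    forms, items = one_year_capacity(students)
    print(f"grades {band}: {forms} forms, {items} unique items "
          f"(pool target {ITEMS_NEEDED})")
```

Under these assumptions, no grade band reaches the 210-item target in a single year, which is consistent with the two-year timeline.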

  • The 2014-2015 administration will be paper-pencil with scores entered into a secure ODE data entry website. The field tests will be fixed PDFs that teachers can access through BRT servers after successful completion of the online training and proficiency tests. This training will include a new module for training teachers on the access and administration of the new tests.
  • Field-testing will occur throughout the test window (February through May), requiring a waiver under which field-test results count toward participation but not toward Annual Measurable Objectives (AMO).
  • Each item appearing in field-test will have at least 200 student responses to be used in scaling the items both horizontally and vertically.
  • An unbalanced design will be used: some test forms will not have the same number of items as others because we are linking only to the next grade level (items overlap with the subsequent grade only, not with prior grades).
  • Each field-testing form will be composed of 35-40 items (25 unique items, 5 items vertically linked to the next grade, and 5-10 items horizontally linked to other ‘same grade’ forms).
  • After field-testing, there will be 210 items to be divided into three levels of difficulty (low-mid-high) with approximately 70 items per level, per grade. The final test (at each level) will be comprised of 40 score reporting items, with potential for three forms per level of difficulty and the opportunity to rotate items within forms in successive years.
  • The first 5 items will be exclusively informational. They will be used to determine pre-requisite skills, communication level, and the typical level of assistance a student might need during test administration. (Note: this is a departure from prior use of the LoI score, which established a level of support that an assessor could not exceed. Assessors will provide whatever level of support is needed for each item, as the test demands require.)
  • The next 15 items will be used to place the students into the low-mid-high difficulty levels. Placement will be based on distribution values from the Rasch scaling to create two cut-off score values that distinguish low from mid (e.g., one standard deviation below the mean of 0) and mid from high (e.g., one standard deviation above the mean of 0). This placement test will contribute to the total performance score used for reporting purposes. Students who do not earn any points will have testing discontinued.
  • We plan to empirically validate the placement test by reviewing the distributions. For example, students responding within the range of 1-5 items correct are likely to be placed into the low level of difficulty; students responding within the range of 6-10 items correct are likely to be placed into the middle level of difficulty; students responding within the range of 11-15 items correct are likely to take the high difficulty test.
Total Points    Distribution of Points             Likely Placement
                Low        Mid        High
0               0          0          0            None
1 to 3          0 to 1     0 to 1     0 to 1       Low
4 to 5          1 to 2     1 to 2     0 to 1       Low
6 to 8          2 to 3     2 to 3     1 to 2       Mid
9 to 10         3 to 4     2 to 3     2 to 3       Mid
11 to 13        4 to 5     3 to 4     3 to 4       High
14 to 15        4 to 5     4 to 5     4 to 5       High
  • The 25 score reporting items will be sampled from the 100 items within each stratum (low-mid-high difficulty); each year, new forms will be sampled with replacement to avoid practice effects, ensure generalizability, and ensure successive values are independent; repeated items will be used to anchor successive administrations (when they occur).
  • In summary, the test will be comprised of 40 items: (a) 5 pre-requisite skills items, (b) 15 items to determine placement into an appropriate level of difficulty, and (c) the final 25 items at the appropriate difficulty level, on which the score is based.
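The placement rule described above can be sketched in two ways: by raw score on the 15 placement items, using the example ranges from the table, and by Rasch ability estimate, using the example cut-offs of one standard deviation either side of a mean of 0. This is a minimal sketch, not the operational rule: the function names are hypothetical, the raw-score bands are the plan's illustrative ranges pending empirical validation, and the handling of values exactly at a cut-off is an assumption.

```python
def place_by_raw_score(total_correct):
    """Map a raw score on the 15 placement items to a difficulty level.

    Bands follow the plan's example: 1-5 Low, 6-10 Mid, 11-15 High.
    A score of 0 returns None, meaning testing is discontinued.
    """
    if not 0 <= total_correct <= 15:
        raise ValueError("placement score must be between 0 and 15")
    if total_correct == 0:
        return None
    if total_correct <= 5:
        return "Low"
    if total_correct <= 10:
        return "Mid"
    return "High"


def place_by_theta(theta, low_cut=-1.0, high_cut=1.0):
    """Map a Rasch ability estimate (in SD units, mean 0) to a level.

    Assumption: estimates exactly at a cut-off fall into Mid.
    """
    if theta < low_cut:
        return "Low"
    if theta > high_cut:
        return "High"
    return "Mid"


print(place_by_raw_score(7))   # Mid
print(place_by_theta(-1.4))    # Low
```

The two functions need not agree for every student; comparing them on field-test data is one way to carry out the empirical validation of the placement test.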
BRT Computer Distribution in Field Testing
  • BRT has developed a computer administration algorithm to ensure temporary (field-testing) forms have the maximum number of students taking the field test items. This algorithm makes the first ‘form’ available to the first n (e.g., 10) teachers and then assigns the next form to the next n (e.g., 10) teachers. This pattern continues through all 5 forms and then begins again (beginning with ‘form 1’). In this way, 20 waves of teachers take the test, ensuring that each ‘form’ has 200 students.
  • Form distribution will be nested within teachers so that the same form is administered to all students for any given teacher before moving to another form. To provide comparability in count, teachers will need to sign up for an order, specifying the number of students to be given the assessment in each grade level and subject area.
  • A planned total of 1,000 students will take the five forms (200 students per form), depending upon grade-level frequencies; a secure file sharing system automatically assigns each student an assessment form so that form frequencies are balanced.
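The rotation pattern described above (blocks of n teachers per form, cycling through the 5 forms) can be sketched as follows. This is not BRT's actual algorithm, which is not specified here; it assumes teachers are numbered by sign-up order starting at 0, and the function name is hypothetical.

```python
from collections import Counter


def form_for_teacher(signup_index, n_forms=5, block_size=10):
    """Return the form number (1..n_forms) for a teacher.

    signup_index is the teacher's zero-based position in sign-up order
    (an assumption). Teachers are grouped into consecutive blocks of
    block_size; each block gets the next form, wrapping after the last.
    """
    wave = signup_index // block_size
    return wave % n_forms + 1


# 20 waves of 10 teachers: each form is used by 4 waves (40 teachers).
teachers_per_form = Counter(form_for_teacher(i) for i in range(200))
print(sorted(teachers_per_form.items()))
```

Because every student of a given teacher receives that teacher's form, student counts per form depend on class sizes, which is why the plan asks teachers to specify student numbers when they sign up.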
4th – 8th Grade Unbalanced Design in English Language Arts (combined reading and writing) and Mathematics YEAR 1