SERA2:
Evaluating Generalization Boundaries of Repeated Reading
The goal of the SERA2 project is to develop and pilot processes and supports for using an integrated fractional factorial design to estimate generalizability boundaries of repeated reading effects on students’ reading fluency. The approach uses subject-matter theory of repeated reading to specify the causal estimates of interest and hypothesized effect moderators. The factorial combination of moderators forms a “response grid” of effects for examining generalization. Because estimating effects for each cell in a response grid is usually not feasible, we apply fractional factorial design principles to select integrated replication designs that introduce systematic variation across units, settings, and times. Observed intervention and moderator effects estimated from the replication studies are then used to predict unobserved effects in the grid for examining generalizability boundaries.
In this phase of the project, we are collaborating with the Expert Consensus Panel to finalize the “response grid” for planning a series of studies and to seek guidance on developing guidelines for piloting the intervention and control-group protocols to be carried out at study sites. Specifically, we will seek Consensus Panel members’ feedback on the studies’ research aims, the prioritized moderators and their measurement for evaluating generalizability boundaries, eligibility criteria for schools and students, and the appropriate control conditions for assessing repeated reading practices.
Details on the “Issues for Consideration” to be discussed at the Consensus Panel meeting are provided in Section 6.
Section 1: Consensus Panel Meeting Summary (May 2023)
During our 2-day meeting in Charlottesville, our primary activity was engaging in two rounds of a Nominal Group Technique to identify key moderator variables for repeated reading in the areas of Units (learner characteristics), Treatments (intervention elements/variations), Outcomes, and Settings (UTOS). The activity involved brainstorming potential moderators in pairs, taking turns nominating moderator variables in each UTOS category, discussing each nominated variable in terms of its influence on reading outcomes and its practical relevance, and rank-ordering the top five moderator variables in each category. At the end of the second round of the Nominal Group Technique, the top two-ranked moderator variables in each category were:
| Units | Treatment | Outcome | Setting |
|---|---|---|---|
| 1. Word reading/decoding 2. Oral language | 1. Difficulty of text 2. Modeling | 1. ORF system (using novel ORF vs. previously read passages) 2. Reading comprehension | 1. Group size 2. Intervention agent |
Section 2: Qualitative Synthesis
Following the consensus panel, we held six online focus-group interviews to broaden input on key moderator variables for repeated reading and inform the development of the effect grid and integrated research design. Three focus groups involved researchers with expertise in repeated reading (eight researchers in total), and three involved practitioners with experience implementing repeated reading for students with learning disabilities (eight practitioners in total). Focus-group interviews used open-ended questions about which variables participants saw as most influential and relevant for amplifying or dampening the effects of repeated reading (i.e., moderator variables) in each of the UTOS categories.
We then worked with an external researcher who qualitatively analyzed and summarized key themes and findings across the consensus panel (field notes and rankings were analyzed) and focus-group interviews (transcripts were analyzed). The analysis followed Miles and Huberman’s (1984) qualitative data analysis framework, which consists of three main stages: data reduction, data display, and conclusion drawing and verification. Key themes from the consensus panel, the research focus groups, and the practitioner focus groups are summarized below.
| | Consensus Panel (voting; documentation and field notes) | Researcher Focus Groups (transcripts) | Teacher (Practitioner) Focus Groups (transcripts) |
|---|---|---|---|
| Units | 1. Word reading/decoding level 2. Oral language level 3. Grade level (2-3, 4-6) | 1. Engaged and motivated 2. Decoding/word reading 3. Reason for disfluency (including grade and specific barriers) 4. Oral language | 1. Reason for disfluency (more focus on specific barriers) 2. Decoding/word reading 3. Oral language 4. Engaged and motivated |
| Treatment | 1. Difficulty of text 2. Modeling 3. Dosage | 1. Quality of intervention agent’s training/knowledge 2. Immediate corrective feedback 3. Modeling 4. Text selection (background knowledge and difficulty) | 1. Text selection (background knowledge and difficulty) 2. Immediate corrective and affirmative feedback 3. Student goal setting 4. Modeling |
| Outcome | 1. ORF system 2. Reading comprehension 3. Combined fluency and comprehension | 1. CBM (ORF, etc.) vs. CBM plus prosody rubric 2. Cold read vs. familiar passage 3. Difficulty of text | 1. Cold read vs. familiar passage or topic (percent of words overlapping) 2. Difficulty of text 3. Includes prosody and/or comprehension |
| Settings | 1. Group size 2. Intervention agent 3. School demographics | 1. Group size: one-on-one, small group, whole group | 1. Group size: one-on-one, small group, whole group |
The SERA2 Team then met and, based on the qualitative summary and considerations of feasibility and impact, proposed potential moderator variables (summarized in the following section) to prioritize in initial effect grids.
Section 3: Identifying Key Moderators
The SERA2 Team carefully considered the results of the qualitative synthesis of recommendations from the consensus panel and the six focus-group interviews to identify key moderator variables for potential inclusion in preliminary effect grids and to operationalize the targeted variables.
Units
Grade level
- Lower Elementary: 3rd and 4th grade
- Upper Elementary: 5th and 6th grade
- Note: Ideally, the grade-level gap would be larger to create a stronger contrast, but this is the largest gap possible while remaining within elementary schools.
Disability (Yes/No)
Behavior/disruptions (Yes/No)
- Falling in high-risk category on the academic behaviors scale (6 teacher-rating items) on SAEBRS
- Note: Our thinking is to measure and covary for this but not prioritize it as a basis for assignment to groups.
Treatment
Text difficulty
- One grade level below reading level
- One grade level above reading level
- Assess students’ reading fluency grade level on GORT-5 (fluency score)
Modeling (Yes/No)
- Researcher reads passage first
Error correction (Yes/No)
- Providing correct word for errors
Outcomes
- Measured by DIBELS ORF (or similar ORF passages)
- At grade-level or reading-level?
Setting
Group size (1:1, small group)
- Small group = 3 (will be logistically challenging, recommend not using it in pilot)
Section 4: Developing Effect Grid
Based on feedback from the Consensus Panel and our focus-group interviews, the research team developed an initial response grid incorporating all key moderators. With the seven two-level moderators identified by the Consensus Panel and focus-group members, the grid represents 128 potential treatment effects at a single site (2×2×2×2×2×2×2), of which 32 observed effects would be selected using a fractional factorial design. This initial grid is shown in Figure 1.
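The grid construction above can be sketched in code. This is a minimal illustration, not the project's actual design: the moderator labels are placeholders, and the two generator relations shown (F = ABC, G = BCD) are one standard way to build a quarter fraction (2^(7-2) = 32 runs); the fraction the team ultimately selects may use different generators.

```python
from itertools import product

# Seven two-level moderators (labels are illustrative placeholders).
base = ["grade", "disability", "behavior", "text_difficulty", "modeling"]
factors = base + ["error_correction", "group_size"]

# Full grid: every combination of the 7 moderators at levels -1/+1.
full_grid = list(product([-1, 1], repeat=len(factors)))
print(len(full_grid))  # 128 potential effects

# Quarter fraction (2^(7-2) = 32 runs): run a full factorial on the
# first five moderators and set the remaining two by generator relations.
fraction = []
for a, b, c, d, e in product([-1, 1], repeat=5):
    f = a * b * c  # generator F = ABC (illustrative choice)
    g = b * c * d  # generator G = BCD (illustrative choice)
    fraction.append((a, b, c, d, e, f, g))
print(len(fraction))  # 32 observed cells
```

Because the two aliased moderators are defined by products of the base moderators, the 32 selected cells still allow estimation of all main effects, at the cost of confounding some higher-order interactions.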
Although this effect grid comprehensively covers all key moderator variables, it is unwieldy and unrealistic to examine fully, even when strategically sampling only selected cells for investigation. The research team therefore created two smaller, more manageable effect grids (see Figures 2 and 3) by simplifying the original grid and fixing certain moderators at specific levels.
Figure 2 fixes two treatment-related moderator variables, modeling and error correction, at specific levels, ensuring that both are consistently included in the delivery of repeated reading. The result is an effect grid of 32 cells, which can be explored through an integrated research design that estimates treatment effects in eight of the 32 cells. This design focuses on estimating generalizability boundaries based on student characteristics while holding treatment implementation constant.
Figure 3 presents an alternative approach, reducing the effect grid to 16 cells by fixing certain student-level moderators—such as grade level, disability type, and behavior disruptions—at specific levels. For instance, we sample only 3rd and 4th graders identified with reading-related learning disabilities who do not frequently exhibit disruptive behaviors. In this design, we prioritize estimating generalizability boundaries based on treatment implementation variations, while fixing student characteristic levels.
The final research design (to be selected by the Consensus Panel) will likely incorporate a mix of student and intervention characteristics that the Consensus Panel prioritizes for estimating generalizability boundaries. For example, in the hybrid design represented in Figure 4, we would examine generalizability across four school sites, two grade levels (upper and lower elementary), students with and without disabilities, texts at and above grade level, and instruction delivered in both small-group and one-on-one settings. We would fix the behavioral-disruption variable by including only students who do not frequently exhibit disruptive behaviors, and the intervention protocol would consistently include modeling and error correction. The hybrid research design in Figure 4 includes 64 potential cells, 16 observed effects, and 4 effects per site.
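The site allocation in the hybrid design can be illustrated with a standard blocking construction: the 16 observed cells (a full 2^4 factorial over the four varied moderators) are split into four blocks of four by confounding two interaction contrasts with sites. This is a sketch under assumed labels; the interactions chosen for confounding here are illustrative, not the project's final allocation.

```python
from itertools import product

# Four varied moderators in the hybrid design (labels illustrative):
# grade (G), disability (D), text difficulty (T), group size (S).
cells = list(product([-1, 1], repeat=4))  # 2^4 = 16 observed cells

# Block the 16 cells into 4 sites of 4 cells each by confounding the
# GDT and DTS interaction contrasts with sites (one standard choice).
sites = {}
for g, d, t, s in cells:
    block = (g * d * t, d * t * s)  # each of the 4 sign pairs = one site
    sites.setdefault(block, []).append((g, d, t, s))

print(len(sites))                        # 4 sites
print([len(v) for v in sites.values()])  # 4 cells per site
```

With this allocation, site differences are confounded only with the chosen higher-order interactions, so the 16 observed effects (4 per site) still support estimation of main effects and low-order moderator interactions across sites.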
During the Consensus Panel meeting, we seek to finalize moderators (and measurement of moderators) for generating the effect grid that we will use to guide the pilot studies.
Section 5: Intervention & Control Protocols
We are considering three possible treatment-control contrasts for evaluating the generalizability of repeated reading effects. In each contrast, the protocol for repeated reading remains the same. Here we describe the three potential control conditions, briefly highlighting the advantages and disadvantages of each. A detailed description of the intervention condition is provided separately.
Option 1: Repeated Reading vs. Business-As-Usual (BAU)
The BAU control condition follows typical classroom reading routines without specific rereading activities, prompts, or structured feedback. In other words, the researchers do not work with students in the control group. This approach is feasible to implement, as it requires the researcher to work only with students assigned to the repeated reading condition, and it reflects “real world” classroom settings, enhancing ecological validity. However, because the BAU condition can vary widely across schools, this variability may introduce additional sources of variation in moderator effects, potentially biasing our estimates and making replication or application of findings challenging across diverse educational contexts.
Option 2: Repeated Reading vs. Non-Reading Activity
In this contrast, the comparison group engages in an activity that would not logically affect reading fluency (e.g., a math activity). This approach standardizes the control condition, enabling consistent comparison across classrooms and schools. Choosing another intervention for the comparison condition would also allow exploration of the effectiveness of two different interventions with distinct outcomes, which could enhance parental consent because both interventions show potential efficacy. Additionally, offering both a reading and a math intervention may appeal to parents and school personnel and increase engagement in study efforts. Limitations include the need to train researchers on a second protocol, the additional time and personnel required to implement two separate protocols, and potential attrition due to preference for one condition over the other.
Option 3: Repeated Reading vs. Wide Reading
Here, the comparison group reads three different passages one time each during each session. This setup isolates the specific effect of repeated readings on fluency by directly comparing it to reading passages only once. Advantages of this approach include straightforward implementation and standardization across studies, as well as high relevance for instructional practice. Additionally, the contrast allows a clear assessment of the impact of rereading specific passages on fluency, enhancing causal interpretation. However, requiring the interventionist to work with both groups necessitates pulling more students from class and doubles the personnel resources needed to implement the study. Existing literature also suggests that the benefits of repeated reading over single readings may be limited, potentially affecting the study’s impact. It is also possible to add a control group that does not receive any reading intervention to gauge the effects of repeated reading and wide reading relative to no intervention, but that would require a larger sample and may be problematic for designing the effect grid.
Each contrast presents trade-offs related to implementation costs and resources. The level of training, monitoring, and fidelity required will vary across contrasts, especially if controls are not consistent across sites. Additionally, parental and school buy-in may differ depending on perceived value (e.g., BAU could raise concerns if seen as a “non-intervention” with varied classroom conditions). These considerations are key to planning a study design that balances feasibility, rigor, and relevance for both research and practical applications.
Section 6: Issues for Consideration
Research Aims
Do we wish to prioritize examining generalizability across variations in intervention characteristics, across variations in student characteristics, or across a hybrid of both?
Moderators
Thinking about the selection of moderators for testing and levels/measurement of moderators (see Section 3):
- Is grade level above assessed reading level too high?
Control Condition
Select control condition and identify critical elements of intervention and control conditions (see Section 5).
- How many sessions is sufficient to expect an effect on reading fluency?
Student Eligibility Criteria
Student eligibility criteria – some initial thoughts:
- Grade level: Depends on final design, but lower elementary = 3rd and 4th grade and upper elementary = 5th and 6th grade
- Reading disability and at risk: Disability = on IEP with (a) primary diagnosis of LD and (b) IEP goal related to reading and scoring in risk range on screener; At risk = not on IEP or 504 plan, but scoring in at-risk range on screener.
- Floor for reading level on screener
- English proficiency
Effect Grid
Consider which area(s) we want to focus on when examining generalizability of repeated reading effects.
- For example, is examining generalizability across variations in student characteristics more important than variations in the intervention characteristics?