A Meta-Analysis on the Effects of Text Structure Instruction on Reading Comprehension in the Upper Elementary Grades

Suzanne T.M. Bogaerds-Hazenberg, Jacqueline Evers-Vermeul, and Huub van den Bergh
Utrecht University, Utrecht, The Netherlands

ABSTRACT
In this meta-analysis, the authors synthesize results from 44 (quasi-)experimental studies on informational and narrative text structure interventions involving students in grades 4-6 in regular school settings. Findings show that text structure instruction had positive immediate effects on students' reading comprehension but that effect sizes varied considerably across outcome measures: questions (Hedges' g = 0.25), summarization (g = 0.57), recall (g = 0.37), and knowledge about text structure (g = 0.38). However, students who received text structure instruction no longer outperformed control groups at delayed posttests. Content-related features, such as a focus on paragraph-level structure, active construction of graphic organizers, and teaching rule-based summarization techniques, moderated the effectiveness of text structure instruction, but these effects also varied across outcome measures. Instructional features moderated delayed effects: Interventions with opportunities for individual student practice resulted in higher delayed effects for comprehension questions. The authors argue that text structure instruction deserves a place in the primary school curriculum so that its positive effects on reading can be maintained.

Good reading comprehension skills are crucial for understanding text and play a pivotal role in academic, social, and economic success (Oakhill, Cain, & Elbro, 2015; Rapp, van den Broek, McMaster, Kendeou, & Espin, 2007). However, reading comprehension is a complex skill, requiring both fluent decoding and good language proficiency (Gough & Tunmer, 1986; Scarborough, 2001), both of which need to be promoted through instruction (Bus & van IJzendoorn, 1999; Oakhill et al., 2015). Our aim in the current meta-analysis was to examine whether text structure instruction can successfully improve the reading comprehension of students in the upper elementary grades and to determine which content and instructional components are related to the best outcomes.
According to national standards, by the end of primary education, students should be able to understand simple narrative and informational texts, distinguish various genres, and learn from texts (e.g., Expertgroep Doorlopende Leerlijnen Taal en Rekenen, 2009; National Governors Association Center for Best Practices & Council of Chief State School Officers, 2010). Despite intensive instruction, a substantial number of elementary school students struggle with reading comprehension (Kendeou & van den Broek, 2007). Comprehension problems especially arise when students enter fourth grade and have to make the transition from learning to read to reading to learn, a phenomenon known as the fourth-grade slump (Chall & Jacobs, 1983).
One of the factors that contribute to poor text comprehension is readers' inability to perceive the meaningful relations between information units (e.g., events, facts, settings) in a text (van den Broek & Kremer, 2000). As a result, readers construct a representation of the textbase (the propositions that are directly derived from the text at the sentence level) but fail to understand how these propositions are organized on a global level. According to the construction-integration model (Kintsch, 1988, 2013; van Dijk & Kintsch, 1983), it is precisely this (re)organization of propositional information at a global level and its subsequent integration with prior knowledge that are crucial for text comprehension (Stine-Morrow, Gagne, Morrow, & DeWall, 2004; van der Schoot, Horsley, & van Lieshout, 2010). The better the information is organized in mental schemata and elaborated with relevant prior knowledge, the more coherent the reader's situation model of the text is (Kintsch, 1988, 2013; Kintsch & van Dijk, 1978) and, hence, the better their understanding is.
When readers are sensitive to the hierarchical organization of information in texts, this can facilitate the construction of the situation model (Kintsch, 2013). Several rhetorical patterns in the organization of information appear in many texts, such as cause-and-effect and compare-and-contrast (Meyer, 1975). Pyle et al. (2017) defined text structure as "the organization of ideas, the relationship among the ideas, and the vocabulary used to convey meaning to the reader (Armbruster, 2004; Shanahan et al., 2010)" (p. 469). In short, text structure influences how readers read and how writers write (Jiang & Grabe, 2007) and, hence, not only describes the text itself but also characterizes readers' cognitive coherence representations (e.g., Meyer & Freedle, 1984; Sanders & Noordman, 2000).
Empirical research has shown that struggling readers typically fail to rely on text structure to guide their reading (Meyer, Brandt, & Bluth, 1980; Rapp et al., 2007). By contrast, proficient readers make active use of text structure to organize their memory for textual content; they attend to both the external physical organization of the text (e.g., headings, table of contents) and the internal structure of ideas for a better understanding (Anderson & Pearson, 1984; Kendeou & van den Broek, 2007; Meyer & Rice, 1984). They construct a higher order structure of text while reading, which "guides encoding, recall, and reproduction of the essential points of the text" (Armbruster, Anderson, & Ostertag, 1987, p. 332). Proficient readers' knowledge about genre or structure influences their expectations about the text and helps them better predict and organize textual content while reading (Zwaan, 1994). They are better able to ask relevant questions about the essential points of the text while reading, which helps them monitor their comprehension more effectively (Gersten, Fuchs, Williams, & Baker, 2001). From an instructional perspective, explicit teaching of text structure therefore seems a fruitful way to help students better anticipate, predict, and monitor their understanding of a text (Ogle & Blachowicz, 2002).
These insights have inspired a wide variety of interventions aimed at teaching the internal structure of texts as a strategy to improve students' comprehension and recall of texts. Two meta-analyses have shown that text structure instruction is a promising approach for improving text comprehension for learners of various ages (i.e., kindergarten to high school) and abilities (i.e., with and without learning disabilities), with overall effect sizes of Hedges' g = 0.56 (Hebert, Bohaty, Nelson, & Brown, 2016) and Cohen's d = 0.95 (Pyle et al., 2017). In our meta-analysis, we refined these meta-analyses by narrowing the focus to students in the upper elementary grades, who face increasing literacy demands as they transition from learning to read to reading to learn (Chall & Jacobs, 1983).
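The two pooled values cited above are reported on closely related scales: Hedges' g is Cohen's d multiplied by a small-sample correction factor J, so for the same data and samples of classroom size the two differ only slightly. The following is a minimal Python sketch of the standard textbook formulas for two independent groups, assuming posttest means, standard deviations, and group sizes are available; the function names and the input numbers are hypothetical, and this is not necessarily the exact procedure used by Hebert et al. (2016), Pyle et al. (2017), or this meta-analysis.

```python
import math


def cohens_d(mean_t: float, mean_c: float, sd_t: float, sd_c: float,
             n_t: int, n_c: int) -> float:
    """Standardized mean difference: treatment minus control, divided by the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def hedges_g(mean_t: float, mean_c: float, sd_t: float, sd_c: float,
             n_t: int, n_c: int) -> float:
    """Cohen's d shrunk by the small-sample correction J = 1 - 3 / (4 * df - 1)."""
    d = cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c)
    df = n_t + n_c - 2
    j = 1 - 3 / (4 * df - 1)
    return j * d


# Hypothetical posttest data for two classes of 25 students each.
d = cohens_d(12.0, 10.0, 4.0, 4.0, 25, 25)
g = hedges_g(12.0, 10.0, 4.0, 4.0, 25, 25)
print(f"d = {d:.3f}, g = {g:.3f}")  # d = 0.500, g = 0.492
```

With 25 students per group, the correction shrinks d by about 1.6%; it matters mainly for very small samples.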
At the same time, we broadened the scope by including studies on both narrative and informational text structures, and interventions focused on narrow text structure instruction (i.e., recognizing structures). In addition, we included closely related studies addressing structure-based summarization training, paragraph-level structure (e.g., topic sentences), and graphic organizer (GO) instruction. Although these studies did not explicitly train students in naming specific text structures, they focused students' attention on top-level structures. Moreover, summarization techniques and GOs are often part of text structure interventions, as such activities can promote students' sensitivity to hierarchical discourse patterns in texts, which could facilitate situation model construction (Kintsch, 2013).
Although Pyle et al. (2017) pointed out that it is important to find out whether and how instructional features (e.g., including collaborative activities) moderate the effectiveness of these interventions, they did not include an analysis of such features. Therefore, in our meta-analysis, we also examined the moderating impact of instructional features such as teacher modeling and collaborative versus individual student practice. This fits in well with research suggesting that more attention should be paid to the instructional context of text structure instruction (Beerwinkle, Wijekumar, Walpole, & Aguis, 2018; Pyle et al., 2017; Turcotte, Giguère, & Godbout, 2015; Wijekumar, Beerwinkle, Harris, & Graham, 2019; Williams, 2018).
A growing body of evidence suggests that reading comprehension is not a unitary construct and that different comprehension tests (e.g., questions, recall, summarization, text structure knowledge) measure different aspects of the reading process (Keenan, Betjemann, & Olson, 2008; Nation & Snowling, 1997). Although previous meta-analyses yielded smaller effects on standardized measures than on researcher-developed measures (Hebert et al., 2016; Pyle et al., 2017) and showed that the largest effects occurred on GO tasks (Pyle et al., 2017), these meta-analyses merged outcome measures in their moderator analyses of specific content-related variables. Therefore, it remains unclear whether, for instance, a focus on paragraph-level structure affects recall or only summarization.
In terms of maintenance, previous meta-analyses reported positive delayed effects on reading achievement, although these effects were smaller and typically lacked consistency, and the median delay between immediate and delayed posttests was only seven days (Hebert et al., 2016). However, it is not yet clear how delayed effects are affected by intervention characteristics and whether they vary per outcome measure. This is why we examined the impact of content-related and instructional features per outcome measure on immediate and delayed effects, which provides a valuable theoretical and methodological addition. In sum, in our meta-analysis, we evaluated the moderating effects of various content-related and instructional variables in interaction with various outcome measures on students' reading comprehension to refine and expand our knowledge of the ingredients that make text structure instruction worthwhile. Two research questions guided our meta-analysis:
1. What are the immediate and delayed effects of text structure instruction on students' text comprehension in grades 4-6, as measured by comprehension questions, recall, summarization, and knowledge of text structures?
2. How are these immediate and delayed effect sizes moderated by content-related and instructional features?
In the following sections, we provide an overview of the content-related and instructional features that were taken into account in this meta-analysis, thereby highlighting the state of the art of research on text structure interventions. These features are necessary to describe the categories of analysis and the specific issues that we investigated in our meta-analysis.

Content-Related Features of Text Structure Instruction
Reading instruction in the primary grades often starts with narrative texts, followed by an increasing number of expository texts in the upper elementary grades (Chall & Jacobs, 2003). In the 1970s, story grammar instruction was developed to aid students in their comprehension of stories (e.g., Hansen, 1978). However, it was soon discovered that students struggled most with expository text comprehension (Taylor & Beach, 1984), possibly because expository texts contain a great deal of specialized vocabulary and many unfamiliar concepts and vary more in their underlying text structure than stories typically do (Hiebert & Mesmer, 2013; Pyle et al., 2017). In addition, the reading curriculum in primary schools was strongly focused on narrative texts (Duke, 2000; Durkin, 1978; Moss & Newton, 2002), which resulted in limited exposure to expository texts. As a consequence, primary school students displayed fewer spontaneously developed intuitions about expository text structure (Goldman, 1997). Since the early 1980s, many interventions on narrative and expository text structures have been developed, which often combine the following features: structure recognition, structure visualization, and structure-based summarization. In this section, we discuss these features before turning to the instructional variables frequently found in text structure interventions.

Structure Recognition
Most text structure interventions focus on teaching students how to recognize the top-level structure of expository texts (e.g., compare-and-contrast, cause-and-effect).
Research on this topic started in the mid-1970s and was strongly influenced by information-processing theories focusing on cognitive processes that affect the storage and retrieval of information (Kelly, 2019). One pioneering study was carried out by Bartlett (1978), who found that training in text structure recognition increased ninth graders' ability to identify a text's top-level structure and use it for recall. Text structure recognition seems to improve text comprehension, especially when multiple text structures are taught (Hebert et al., 2016; Pyle et al., 2017). Meyer and Ray (2011) provided an excellent overview of interventions focused on structure recognition. Typically, these interventions consist of teaching the questions that each structure answers (e.g., What are the differences between A and B?) and practice items in which students categorize short texts as belonging to one structure or another. Students also learn about cue words or signaling words that frequently appear in these types of structures (e.g., similar or likewise in compare-and-contrast texts), as these words instruct readers in how to process an upcoming information segment and how to relate it to a previous one, thereby helping them build coherent text representations (Sanders & Spooren, 2007; van Silfhout, Evers-Vermeul, & Sanders, 2015). In most interventions, cue words are simply listed in booklets, highlighted, or mentioned as characteristics of a specific text structure, but in some interventions, students actively highlight (Bohaty, 2015), annotate (Gentry, 2006; Short & Ryan, 1984), or write down cue words found in a text (Broer, Aarnoutse, Kieviet, & van Leeuwe, 2002).
Especially in the context of informational texts, structure recognition training can also focus on the paragraph level of the text, such as by teaching students how paragraphs are typically structured in topic sentences, supporting details, and concluding sentences. Often, students receive explicit instruction about the main idea or on topic sentences and then read a text and select for each paragraph the sentence or phrase that captures the most important information at the highest level (Broer et al., 2002; de Jou & Sperb, 2009; Vidal-Abarca, 1990), or students learn through teacher modeling how to invent a good summarizing phrase when there is no clear topic sentence (Braxton, 2009). Sometimes, students learn more than simply distinguishing between main ideas and details. For instance, Gentry (2006) taught students to make annotations in the margins of paragraphs, such as writing "Ex." when the text discussed an example. In many of these interventions, paragraph-level instruction was combined with top-level structure recognition (i.e., instruction on the features of different structures).
In other studies, students learned about the blueprint of narrative texts, also called story grammar. This often involves teaching students how to identify the protagonists, their goals or problems, the actions, and the outcome (Gersten et al., 2001; Zwaan & Radvansky, 1998). Students typically receive instruction on the basic elements of a story and use this knowledge to analyze a short story (e.g., Idol & Croll, 1987). Research on these interventions in the upper elementary grades with typically developing students has been scarce, as many of these interventions focus on younger students or students with learning difficulties (Gersten et al., 2001). In general, it seems that students with knowledge of story grammar are better able to make predictions about a text, recognize what information is crucial for the plot (Wolman, 1991), and recall more about the main story elements, such as the setting or protagonist (e.g., Hansen, 1978; Mandler & Johnson, 1977; Weaver & Dickinson, 1982).
In this meta-analysis, we examined the effects of text structure recognition on comprehension. In particular, we analyzed whether informational text structure instruction and narrative text structure instruction have similar effects and whether the number of text structures taught matters.

Structure Visualization
Another family of strategies for improving comprehension and recall is to teach students how to visualize the organization of main ideas via GOs. GOs can be used as visualizations of the hierarchical relations of textual information (Griffin, Malone, & Kameenui, 1995), in which relations between concepts are communicated through the visual placement of concepts relative to each other (Robinson & Molina, 2002). Some common GOs are Venn diagrams, matrices, knowledge maps, and tree diagrams (Manoli & Papadopoulou, 2012); outlines and lists do not count as GOs, as they lack a visual argument and are more text-like (Hoffman, 2010).
GO development probably started in the early 1970s in Tokyo, Japan, where Ishikawa (1971) developed so-called fishbone diagrams, early cause-and-effect GOs that were used to control the supply chain and manufacturing process in the shipbuilding industry. In educational contexts, GOs first came into vogue as a variation on advance organizers (Barron, 1980). For a long time, GOs were used in prereading activities to activate and organize students' prior content knowledge (Moore & Readence, 1984), but nowadays, GOs are also used as a visualization strategy during reading (Leutner, Leopold, & Sumfleth, 2009) or as postreading summarization activities. Improved access to digital resources has made computer-based GOs and concept-mapping software (e.g., Kidspiration, Webspiration) increasingly popular in schools (Ciullo & Reutebuch, 2013; Smith & Okolo, 2010). Even so, most studies on structure visualization have centered on print-based modalities.
GOs constitute a valuable way to enhance comprehension because they can show the main information of a text at a glance and simultaneously clarify relations between these ideas (Jones, Pierce, & Hunter, 1988). Structure visualization in GOs constitutes a major way to make students aware of text structure, as GOs provide a visual map of the structure (Jiang & Grabe, 2007; Manoli & Papadopoulou, 2012; Pyle et al., 2017). In addition, GOs can reveal the inferential relations among text elements (Graesser, Singer, & Trabasso, 1994) and facilitate students' skill in quickly locating specific information (Robinson & Skinner, 1996).
GOs vary in how strictly they map the text structure (Jiang & Grabe, 2007). Some GOs are predefined templates that closely represent the discourse structure of the text, such as Venn diagrams for compare-and-contrast texts and timelines for chronological texts (Ocasio, 2006). These GOs are often used as instruments to directly teach expository text structures (Alvermann & Boothby, 1986; Armbruster, Anderson, & Meyer, 1991). Typically, students read a text and fill in the empty slots in a partially completed GO, which afterward serves as input for a class discussion about text structures (Alvermann & Boothby, 1984; Boothby & Alvermann, 1984; Ermis, 2008; Moore, 1996; Van Steenbrugge, 2006). In some interventions, GO activities were complemented with writing down signaling words and main ideas (Broer et al., 2002), or students performed writing tasks based on information in their GOs (Moore, 1996; Raphael, Englert, & Kirschner, 1986).
Other GOs emphasize the hierarchical relations of textual information (e.g., between main ideas and supporting details), without focusing on a specific discourse structure. One example is mapping (Armbruster & Anderson, 1982; Berkowitz, 1986; Griffin et al., 1995), in which main ideas and their relevant relations are represented in a diagram.
For instance, Berkowitz (1986) taught students to write the title of a text in the middle of a sheet of paper, surrounded by boxes in which they noted one main idea per paragraph. Interventions were only included in our meta-analysis if the mapping involved representing the discourse structure or at least the hierarchical relations between ideas (e.g., main ideas vs. details).
In the context of narrative text structure instruction, story maps (schematic representations of the key information in narrative texts; Gardill & Jitendra, 1999) are used. A story map can, for instance, include boxes for setting, goal, plot, and outcome (Tackett, Patberg, & Dewitz, 1984). Story maps can be used postreading for summarization or during reading to help students monitor comprehension and/or highlight main events (Gardill & Jitendra, 1999; Gersten et al., 2001; Idol & Croll, 1987; Tackett et al., 1984). Story grammar can also help students formulate (Short & Ryan, 1984) or answer questions during reading that help in identifying the main constituents of the story (Gordon & Pearson, 1983). Because story maps also represent the structure of narrative texts, we included them in this meta-analysis.
So far, GO research has generated mixed results. Some studies have revealed positive results (e.g., Broer et al., 2002; Hoffman, 2010; Ulper & Akkok, 2010; Wijekumar et al., 2014; Wijekumar, Meyer, & Lei, 2012), but other interventions have been less effective (e.g., Alvermann & Boothby, 1984; Raphael et al., 1986) or showed that students need a great deal of instructional support to actually benefit from GOs (Griffin et al., 1995). One major issue concerns whether students benefit more from exposure to author-constructed or teacher-constructed organizers than from self-constructed organizers. Some researchers have stated that simple exposure to GOs may not be sufficient; rather, students may need extended instructional training and practice with GOs before they are able to recognize text structure and make use of this knowledge while reading (Jiang & Grabe, 2007). Various studies have shown, for instance, that the active involvement of students in constructing GOs, even when they are already partially complete, facilitates reading comprehension (e.g., Berkowitz, 1986; Spiegel & Barufaldi, 1994; Van Steenbrugge, 2006). However, Stull and Mayer (2007) argued that author-provided organizers are more effective, as student construction of GOs might create cognitive overload.
In this meta-analysis, we therefore examined the effects of structure-based visualizations on comprehension and analyzed whether active construction of structure-based GOs has a positive or negative additional impact on comprehension.

Structure-Based Summarization
To get the gist of a text, readers must overcome the limitations of working memory by ignoring extraneous or redundant information and focusing specifically on macrolevel information, such as topic sentences (Bean & Steenwyk, 1984; Kintsch & van Dijk, 1978). This process of eliminating and reworking information can be promoted through summarization instruction (Armbruster et al., 1987; Bean & Steenwyk, 1984; Elledge, 2013; Frey, Fisher, & Hernandez, 2003; Taylor, 1985; Westby, Culatta, Lawrence, & Hall-Kenyon, 2010). Teaching summarization improves both the quality of written summaries and students' overall text comprehension (Duke, Pearson, Strachan, & Billman, 2011; Taylor & Beach, 1984). In their literature review, Miyatsu, Nguyen, and McDaniel (2018) stated that the defining characteristic of successful summarization training is the emphasis on main idea identification and text structure recognition. In fact, text structure can scaffold students' summarization skills, as it provides tools and heuristics to distinguish main ideas from unimportant information (Hogan, Bridges, Justice, & Cain, 2011; Meyer et al., 1980; E.A. Stevens, 2018; Taylor, 1985; Winograd, 1984). In addition, it helps students understand how these main ideas are organized, which helps them write coherent summaries (Miyatsu et al., 2018). Structure-based summarization strategies might also facilitate text recall, as the text structure might function as a mnemonic aid (Taylor, 1982).
In most elementary schools, teachers refrain from providing explicit instruction about specific summarization techniques (Beerwinkle et al., 2018; Elledge, 2013; McKeown, Beck, & Blake, 2009) or struggle themselves with the identification of main ideas (Kucan, Hapgood, & Palincsar, 2011; Turcotte et al., 2015). It is therefore not surprising that many students struggle to identify main ideas (Baumann, 1983; Hare & Borchardt, 1984), seldom formulate summarizing topic sentences (Garner, 1987; Hare & Borchardt, 1984), and use deletion of propositional expression as their main summarization strategy (Winograd, 1984). Since the early 1980s, researchers have developed various summarization techniques that are less intuitive and instead rely on the external or internal structure of texts.
The hierarchical summarization strategy (e.g., Taylor, 1982, 1985) consists of skimming the external organizational text structure (i.e., the headings and subheadings) first and then preparing a skeletal outline based on the headings. Next, students write one main idea per section in their outline. However, the strategy might be limited to texts with an unambiguous heading/subheading format (Armbruster et al., 1987). The rule-based summarization strategy (Brown, Day, & Jones, 1983; McNeil & Donant, 1982) relies more heavily on the internal structure of paragraphs (i.e., identifying topic sentences). This strategy provides students with a set of six summarization rules, based on the work of Kintsch and van Dijk (1978), such as "delete redundant information" and "invent a topic sentence." These rules help students first eliminate information and then rework the remaining bits into a coherent summary (Brown et al., 1983; McNeil & Donant, 1982). For instance, students are taught to highlight topic sentences, circle words that must be replaced by superordinate concepts, and cross out trivia (Braxton, 2009). Rule-based summarization is often combined with text structure recognition. For instance, students learn how signal words and structure-specific questions can be used to identify main ideas (e.g., Elledge, 2013).
Not only the internal structure of the paragraph but also that of the whole text can function as a framework for summarization (Armbruster et al., 1987; de Jou & Sperb, 2009; Ocasio, 2006; E.A. Stevens, 2018; Vidal-Abarca, 1990). For instance, in the studies by Armbruster et al. (1987) and E.A. Stevens (2018), students were taught about the characteristics of the problem-and-solution text structure. Then, students received a specific problem-and-solution frame in which they could summarize the main point of a text, as well as a list of structure-specific guidelines for their summary (e.g., "Sentence 1 - Tells who had a problem and what the problem is"; Armbruster et al., 1987, p. 337). Vidal-Abarca (1990) and de Jou and Sperb (2009) explained various text structures and modeled where and how to find the main idea in these texts.
In this meta-analysis, we examined the effects of structure-based summarization techniques. In particular, we analyzed whether the rule-based summarization approach, with its emphasis on internal text structure, has an additional impact on text comprehension.

Instructional Variables
Teaching students about text structures can be done in many different ways. Previous meta-analyses have shown that the implementer plays a crucial role. Researcher-taught interventions are often more effective than teacher-taught interventions (e.g., Dignath & Büttner, 2008), as is also the case in text structure instruction (Pyle et al., 2017).
Over the past decade, more attention has been paid to the ecological component of text structure instruction by investigating how teachers explain text structure and other evidence-based strategies in their classroom (Beerwinkle et al., 2018; Wijekumar et al., 2019) and how teachers can be trained to teach text structures (Reutzel, Jones, Clark, & Kumar, 2016). Meyer and Ray (2011) emphasized that teachers should have access to adequate instructional materials for modeling and direct instruction, such as by providing them with "intelligent tutors or scripted lessons" (p. 138). Williams (2018) also made a plea for second-generation text structure research that goes beyond developing excellent instructional materials and focuses more on the context in which the instruction occurs. In addition, the previous meta-analyses stated that future research should examine the mediating role of instructional features, as both the implementer and the type of instructional activities might affect the effectiveness of text structure interventions (Hebert et al., 2016; Pyle et al., 2017).
The gradual release of responsibility model (Fisher & Frey, 2013; Pearson & Gallagher, 1983) provides a useful framework for describing and comparing current teaching practices. This model suggests sequencing various instructional activities such that the responsibility for the learning process rests mainly with the teacher at first (e.g., direct instruction, modeling) and is then gradually transferred to the students with decreasing levels of scaffolding (e.g., guided practice, collaborative activities, individual activities). This also reflects the idea that reading comprehension lessons should follow a pattern of stepwise phasing out the teacher while phasing in the students. Students gradually take over the lead from the teacher by applying comprehension strategies first in small groups, then in pairs, and finally individually (Nolte & Singer, 1985; Singer & Donlan, 1980). Many studies on reading instruction have emphasized the importance of direct instruction of cognitive and metacognitive strategies for reading (e.g., Rosenshine & Meister, 1994; R.J. Stevens, Slavin, & Farnish, 1991).
Research has suggested that modeling also plays a pivotal role in improving reading comprehension, especially when the demonstration includes conditional knowledge (why the model is doing something) as well as metacognitive and motivational aspects (Kostons, Donker, & Opdenakker, 2014). Modeling can raise students' self-efficacy to carry out tasks on their own (van Gog & Rummel, 2010), which may be particularly beneficial for students with low reading self-efficacy. Various studies have found positive effects of explicit instruction combined with teacher modeling (Duffy, 2002).
For reading and many other areas of instruction, collaboration has been a successful way to enhance learning. If well implemented, it can increase students' time on task (E.G. Cohen & Benton, 1988), raise academic performance (Slavin, 1987), and increase the quantity and quality of student interactions (Fuchs, Fuchs, Mathes, & Simmons, 1997; Garibaldi, 1979; Vaughn, Klingner, & Bryant, 2001), among other benefits.
Apart from analyzing the effects of online tutors versus teachers (Pyle et al., 2017), prior meta-analyses did not examine how instructional components affect the effectiveness of text structure instruction. Therefore, in our meta-analysis, we examined whether it matters if teacher and student activities follow a pattern of gradual release of responsibility.

Inclusion and Exclusion Criteria
We developed a set of inclusion and exclusion criteria to guide our iterative search and selection procedure. Studies were deemed eligible for inclusion in the meta-analysis if they met the following five criteria:
1. The study was published in English, German, French, Dutch, or Spanish between 1974 and 2018 and was available online or could be retrieved directly from the author.
2. The study focused on students in general primary education in grades 4-6. Participants could include students with mild reading difficulties, but not students with severe learning or reading difficulties, students with hearing problems, or second-language learners.
3. The treatment group was taught to recognize informational text structures (e.g., description, compare-and-contrast, problem-and-solution, cause-and-effect, sequence; see Meyer, 1975) or story structure. Interventions in which text structure was explicitly used as part of a summarization or visualization technique were also included.
4. The treatment group was compared with a business-as-usual control group or with a control group receiving alternative instruction. There was no restriction on sample size or sampling procedure as long as the procedure was well documented.
5. The study included at least one posttest focused on text comprehension. The posttest could consist of one or more (non)standardized comprehension question tests, cued or free recall tasks, GO completion and/or summarization tasks, and text structure knowledge tests.
Studies were excluded if the researcher(s) did not provide the statistics necessary to calculate a weighted effect size, or if results were aggregated over multiple age groups such that effect sizes could not be calculated for grades 4-6 exclusively, even after we contacted the authors. Correlational and qualitative studies were excluded, as were studies with a within-subjects or multiple-baseline design (e.g., Haria & Midgette, 2014).

Search and Selection Procedure
We used a four-step process to conduct a comprehensive search for text structure intervention studies. First, we located studies using the electronic databases ERIC, Web of Science, PsycINFO, and Google Scholar. Second, we searched various databases of theses and dissertations (e.g., PQDT Open, EThOS, OpenThesis) to locate unpublished studies. Third, we used a series of Dutch, French, German, and Spanish search engines to identify relevant studies in languages other than English. Fourth, we conducted a cited reference search of previous reviews and meta-analyses (Hebert et al., 2016; Jiang & Grabe, 2007; Meyer & Ray, 2011; Pyle et al., 2017) and checked the reference lists of the studies judged eligible.
In the literature search, we combined the key terms text structure or top-level structure and reading (comprehension) with keywords specifying the age group (i.e., primary, elementary, fourth/fifth/sixth grade). To maximize the number of articles located, we ran a second search that also included terms for the type of text structure (i.e., description, enumeration, classification, cause effect, compare contrast, sequence, chronology, problem solution, story (grammar), narrative) and for potential ingredients of text structure interventions (i.e., topic sentence, signaling words, cue words, graphic organizer, schematizing, main idea, outline, summary). This search yielded approximately 2,900 results.
After removing the many duplicates and screening titles, we exported 408 abstracts for closer examination. We read abstracts and Method sections to determine whether these studies qualified for inclusion. We removed 355 articles because participants did not match the required age and aptitude profile (n = 78), the studies did not meet the criteria for text structure instruction (n = 208; e.g., reciprocal teaching interventions or semantic mapping studies without a focus on text structure), the studies focused on text manipulations (n = 45), they did not measure text comprehension (n = 20), or they were published in a language not covered by our criteria (n = 4). This left us with 52 eligible studies. Eight of these could not be located or provided insufficient statistics for effect size calculation, so the final set consisted of 44 studies published between 1982 and 2018.

Study Feature Coding
We coded studies on publication type, participants (number and grade), research design (quasi-experimental vs. experimental), type of outcome measure (i.e., comprehension questions, free recall tasks, summarization tasks, text structure knowledge tests), test type (standardized vs. nonstandardized), measurement timepoint (immediate vs. delayed), and reference group (business as usual vs. alternative intervention as control). We also coded text features, such as genre (narrative and/or informational) and the number and exact types of structures (e.g., narrative, description, cause-and-effect, compare-and-contrast, problem-and-solution, sequence). In addition, we coded various content-related and instructional features of both the treatment and control conditions (see Table 1).
We described the instructional content of the interventions using three higher level descriptors that formed the basis for further analysis. These descriptors were not mutually exclusive: one intervention, for instance, could include training in both text structure recognition and structure visualization. To refine these categories, we also added variables describing whether there was an additional focus on paragraph-level structure, active construction of GOs, and/or instruction in the rule-based summarization technique.
We defined two instructional variables: teacher activities and student activities. Teacher activities could be primarily focused on telling (i.e., explicit instruction) or showing (i.e., modeling). Student activities were either individual or collaborative. The analysis of these instructional components was based on the description of the intervention and procedures in the article or on an examination of the example materials provided (e.g., screenshots, examples, teaching materials in appendixes). If the text mentioned that teachers used think-aloud protocols or demonstrated/modeled the strategy, this was coded as modeling. Similarly, if the authors mentioned pair work or small-group activities, we scored this as a collaborative activity. If a feature was neither mentioned nor logically derivable from the description, we coded it as absent.
Apart from the distinction between standardized and nonstandardized tests, we also distinguished among four types of outcome measures: comprehension questions, free recall tasks, summarization tasks, and text structure knowledge tests. Outcome measures coded as comprehension questions were all tests involving multiple-choice items and short-answer questions with the text present (e.g., literal comprehension, referential comprehension, interpretation questions). Recall tasks were tasks in which students typically read a text and then performed a memory task without the text present. This could mean writing or telling everything they remembered, or only the most important information, or completing cued recall tasks in which they answered factual questions about text content (e.g., Where did the story take place?). Summarization tasks included all tasks in which students had to summarize part of a text with the text present, such as student-generated written summaries, GO completion, and tasks focused on highlighting the main ideas. Text structure knowledge tasks included all tests of knowledge about specific text structures, such as tasks in which students had to identify the structure of a text segment (e.g., E.A. Stevens, 2018), match segments to a text with a similar structure (e.g., Broer et al., 2002), or select the correct structure-specific signaling words in cloze tasks (e.g., Wijekumar et al., 2013).
All studies were coded by the first author. A research assistant was trained in the coding procedure and coded a random sample of four studies (10% of the total sample), with 94% total interobserver agreement and Cohen's kappa values ranging from .83 (instructional approach) to .95 (methodological descriptors). Table 2 provides an overview of all studies and shows immediate and delayed effect sizes per outcome measure.
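The kappa values above correct raw agreement for the agreement two coders would reach by chance. A minimal Python sketch of Cohen's kappa for two coders who each assign one category per item is shown below; the category labels and codes in the usage lines are hypothetical and are not the authors' coding data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one nominal category per item."""
    if not coder_a or len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same non-empty list of items")
    n = len(coder_a)
    # Observed agreement: proportion of items that received the same code.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over categories of the product of both coders' marginal proportions.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for ten items: "M" = modeling present, "E" = explicit instruction only.
first_author = ["M", "M", "E", "E", "M", "E", "M", "E", "M", "M"]
assistant = ["M", "M", "E", "M", "M", "E", "M", "E", "M", "M"]
print(round(cohens_kappa(first_author, assistant), 2))  # prints 0.78 (agreement .90, chance .54)
```

Even with 90% raw agreement in this toy example, kappa is noticeably lower, because both coders use the category "M" often and would agree on it fairly frequently by chance.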

Descriptor Description
Structure recognition and/or focus on the paragraph level
Students are explicitly taught about the internal structure of texts or paragraphs and practice recognizing it. A focus on the paragraph level might entail instruction about topic sentences and supporting and concluding sentences.

Structure visualization and/or active construction
Students receive instruction on schematic representations of text structure and content. They study and/or fill out or actively create maps and graphic organizers.

Structure-based or rule-based summarization
Students learn how to summarize a text on the basis of headings and other hierarchical outlining principles or learn techniques for paraphrasing main ideas or condensing text by strictly following a set of rules.

Teacher-led instruction ± modeling
The teacher provides explicit instruction and/or demonstrates structure recognition, structure visualization, or structure-based summarization techniques by thinking aloud in front of the class.

Student activities ± individual practice
Students have opportunities to practice their skills, collaboratively and/or individually.

Effect Size Calculation and Statistical Analyses
For calculation of effect sizes, we used Hedges' g, which is nearly identical to Cohen's d (J. Cohen, 1988; Fritz, Morris, & Richler, 2012) but corrects the slight upward bias of d in small samples, such as the few included studies with fewer than 20 participants (Borenstein, Hedges, Higgins, & Rothstein, 2009; Cumming, 2012). Although effect sizes of approximately 0.20 are generally classified as small, we interpret them as meaningful because they were obtained in educational contexts, where even effects generally classified as small are of interest (see Durlak, 2009; Hedges & Hedberg, 2007). Moreover, the effect sizes reported in this meta-analysis should be interpreted as additive effects, because they represent what students in a text structure condition gained on top of what students learned in an alternative intervention or the regular reading curriculum.
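The small-sample correction that distinguishes g from d can be sketched in a few lines of Python. This is an illustration, not the authors' code (they worked in R with metafor); it assumes the approximation J = 1 − 3/(4df − 1) and the usual sampling variance of d as given by Borenstein et al. (2009), and the study values in the usage line are hypothetical.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Bias-corrected standardized mean difference (Hedges' g) and its sampling variance."""
    df = n_t + n_c - 2
    # Pooled standard deviation, weighting each group's variance by its degrees of freedom.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor J (approximation given by Borenstein et al., 2009).
    j = 1 - 3 / (4 * df - 1)
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d


# Hypothetical small study: 10 students per group, means 12 vs. 10, SD = 4 in both groups.
g, var_g = hedges_g(12, 10, 4, 4, 10, 10)
print(round(g, 3), round(var_g, 3))
```

With 10 students per group, J is about 0.96, so g is roughly 4% smaller than d; with hundreds of students per group, the two are virtually indistinguishable.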
For each study, we calculated standardized mean differences (Hedges' g) for immediate and delayed effects separately. For immediate effects, we subtracted the control group's mean gain (immediate posttest − pretest) from the treatment group's mean gain (immediate posttest − pretest) and divided the result by the pooled standard deviation of the two groups. As we were interested in long-term differences between experimental and control groups, we also calculated delayed effect sizes. These indicate the effects of text structure instruction that remain over and above what students learned in business as usual. This calculation was based on the difference between the mean delayed posttest performance of the treatment group and that of the control group, divided by the pooled standard deviation of the two groups. A delayed effect of 0 would mean that there were no lasting differences between the groups, whereas a delayed effect of +1 would indicate that the experimental group outperformed the control group by one standard deviation at the delayed posttest, either because the experimental group made more progress than the control group or because the control group's performance declined while the experimental group maintained the skill it had acquired by the immediate posttest.
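The two definitions above can be written out as a short Python sketch. It returns uncorrected standardized differences (the correction factor J would then be applied to obtain g), and it assumes that a single pooled standard deviation is supplied; the section does not say whether pretest or posttest standard deviations were pooled. All group means are hypothetical.

```python
import math


def pooled_sd(sd_t, sd_c, n_t, n_c):
    """Pooled standard deviation of the treatment and control groups."""
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))


def immediate_effect(pre_t, post_t, pre_c, post_c, sd_p):
    """Treatment gain minus control gain (immediate posttest - pretest), in pooled-SD units."""
    return ((post_t - pre_t) - (post_c - pre_c)) / sd_p


def delayed_effect(delayed_t, delayed_c, sd_p):
    """Difference in delayed posttest means, treatment minus control, in pooled-SD units."""
    return (delayed_t - delayed_c) / sd_p


# Hypothetical group means; both groups have SD = 6 (n = 10 and n = 12).
sd_p = pooled_sd(6, 6, 10, 12)
print(immediate_effect(20, 26, 20, 23, sd_p))  # prints 0.5 (gains of 6 vs. 3 points)
print(delayed_effect(24, 24, sd_p))  # prints 0.0 (no lasting difference)
```

In this toy example the treatment group outgains the control group immediately, but both groups end at the same delayed posttest mean, which is the pattern of a vanishing delayed effect.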
For the six studies that did not report exact means and standard deviations, we calculated effect sizes from the reported analyses of variance. Because these six studies had fairly large samples (n > 42), the resulting values of Cohen's d were practically identical to Hedges' g (Borenstein et al., 2009; Fritz et al., 2012).
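The section does not specify which conversion was used. A common one, assumed in the Python sketch below, applies to a one-way F test with one numerator degree of freedom (two groups), where F = t² and d = √(F(n_t + n_c)/(n_t n_c)); factorial designs or ANCOVA would need other conversions. The sketch also shows why the d/g distinction hardly matters for samples of this size. The F value and group sizes are hypothetical.

```python
import math


def d_from_f(f_stat, n_t, n_c, treatment_higher=True):
    """Cohen's d from a two-group ANOVA F statistic (numerator df = 1, so F = t**2)."""
    # F carries no sign, so the direction must be taken from the reported group means.
    d = math.sqrt(f_stat * (n_t + n_c) / (n_t * n_c))
    return d if treatment_higher else -d


def correction_factor(n_total):
    """Hedges' small-sample correction factor J for a two-group design."""
    return 1 - 3 / (4 * (n_total - 2) - 1)


# Hypothetical study: F(1, 48) = 4.0 with 25 students per group.
print(round(d_from_f(4.0, 25, 25), 3))
# Even at n = 43 (just above the n > 42 threshold), J is about 0.98.
print(round(correction_factor(43), 3))
```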
When multiple outcome measures were used, we calculated separate effect sizes per type of outcome measure (i.e., comprehension questions, recall, summarization, text structure knowledge) so we could show the impact of the intervention variables on each measure of text comprehension separately. We also calculated separate effect sizes when multiple text structure interventions were compared within a study. For instance, Ulper and Akkok (2010) investigated the effectiveness of structure-based summarization strategies, both alone and in combination with training in text structure recognition. In such cases, we calculated separate effect sizes per condition, although Table 2 presents them as averaged effect sizes.
Because multiple effect sizes from one study introduce statistical dependencies in the data, we aggregated data sets per outcome measure. With this approach, we calculated no more than two effect sizes per study: one immediate and one delayed. In the analyses, studies with larger sample sizes received more weight (Borenstein et al., 2009). We ran all mixed-effects model analyses per outcome measure, which amounts to four parallel meta-analyses, and successively added the aforementioned methodological, content-related, and instructional variables as moderators. After constructing full factorial models, we simplified them to include only the relevant parameters, so we could estimate the additive effect of all parameters without running the risk of overfitting (see the tables in the Appendix for more details).
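The section does not state how dependent effect sizes from one study were combined. One standard option described by Borenstein et al. (2009), assumed in the Python sketch below, is to average them and compute the variance of that average under an assumed correlation r between the estimates; r = 1 is the most conservative choice. The effect sizes, variances, and r values are hypothetical.

```python
import math


def composite_effect(effects, variances, r=1.0):
    """Average of m dependent effect sizes from one study, with the variance of that average.

    r is an assumed correlation between the estimates; r = 1 is the most conservative choice.
    """
    m = len(effects)
    if m == 0 or m != len(variances):
        raise ValueError("need one variance per effect size")
    mean_effect = sum(effects) / m
    # Variance of a mean of correlated estimates: sum all variances and covariances, divide by m^2.
    total = 0.0
    for i in range(m):
        for j in range(m):
            if i == j:
                total += variances[i]
            else:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean_effect, total / m**2


# Hypothetical study with two recall effect sizes (e.g., two intervention conditions).
print(composite_effect([0.4, 0.6], [0.04, 0.04], r=1.0))
print(composite_effect([0.4, 0.6], [0.04, 0.04], r=0.0))
```

Treating the two estimates as independent (r = 0) would halve the variance in this example and overstate the precision of the study, which is the dependency problem the aggregation is meant to avoid.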
Because of differences in types of participants and in methodological, content-related, and instructional characteristics, we could not assume one common effect size. Therefore, we used random-effects models, which assume not one true effect size but a distribution of effect sizes. This allowed us to generalize to populations beyond the included studies (Borenstein et al., 2009). We also examined within-class goodness of fit by conducting homogeneity tests (Cooper, 1998) to check whether the variability in effect sizes was large enough to warrant moderator analyses. We tested differences in fit between successive (nested) random-effects models by means of log-likelihood ratio tests. All effect size calculations and moderator analyses were conducted in R (version 3.3.3) using the metafor package (Viechtbauer, 2010). A full overview of our stepwise model fitting can be found in the Appendix.
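To make the random-effects logic and the homogeneity test concrete, the Python sketch below pools effect sizes with the DerSimonian-Laird estimator of the between-study variance τ² and computes Cochran's Q. This is a deliberately simpler, swapped-in estimator: metafor's rma() uses REML by default, so estimates would differ somewhat from the authors' models. The effect sizes and variances in the usage line are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of between-study variance."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effect sizes, each with a variance")
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean (homogeneity test, df = k - 1).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    # Random-effects weights add tau2 to each study's sampling variance.
    w_re = [1 / (v + tau2) for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (estimate - 1.96 * se, estimate + 1.96 * se)
    return {"estimate": estimate, "se": se, "ci95": ci, "tau2": tau2, "Q": q, "df": df}


# Hypothetical effect sizes (Hedges' g) and sampling variances for four studies.
result = random_effects_dl([0.10, 0.45, 0.30, 0.80], [0.02, 0.03, 0.01, 0.05])
print(result)
```

Q is referred to a chi-square distribution with k − 1 degrees of freedom; a Q well above its df, as in the results reported below, signals heterogeneity that moderator analyses can try to explain.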

Results
For each outcome measure, we used a random-effects model to estimate the overall average effect size. The overall effect size for comprehension questions was g = 0.14, 95% confidence interval (CI) [0.03, 0.25]; for recall, g = 0.30, 95% CI [0.19, 0.41]; for summarization, g = 0.43, 95% CI [0.24, 0.61]; and for text structure knowledge, g = 0.34, 95% CI [0.28, 0.41]. These mean effect sizes must be interpreted with caution, because effect sizes were statistically significantly heterogeneous for all outcome measures: Q_Questions(88) = 310.37, p < .001; Q_Recall(84) = 248.38, p < .001; Q_Summarization(51) = 328.47, p < .001; Q_Knowledge(26) = 79.87, p < .001. This heterogeneity can partly be attributed to the fact that the overall effect sizes still combine immediate and delayed effects and various types of research designs.

Control Condition
In some studies (n = 18), the authors explicitly mentioned that the text structure intervention was compared with alternative instruction rather than with a business-as-usual control group. For instance, text structure instruction was compared with vocabulary instruction (e.g., Fitzgerald & Spiegel, 1983; Gentry, 2006), instruction in cognitive reading strategies such as making predictions and inferences and activating prior knowledge (e.g., Gordon & Pearson, 1983; McLaughlin, 1990; Ocasio, 2006), or more intuitive summarization strategies (Bean & Steenwyk, 1984; Braxton, 2009). In most studies, however, the control groups continued to follow their usual reading curriculum during the intervention. Business as usual typically involved a traditional approach to reading instruction, with students answering questions about the text and the teacher leading class discussions focused on text content, although it also contained elements of the alternative instruction programs mentioned above (e.g., activating prior knowledge, explaining vocabulary). Both types of control groups occurred with all types of outcome measures.
Stepwise model fitting showed that effect sizes were not systematically different when the intervention was compared with an alternative program instead of business as usual: comprehension questions, ∆χ²(1) = 1.18, p = .28; recall, ∆χ²(1) = 0.44, p = .51; summarization, ∆χ²(1) = 2.16, p = .14; and text structure knowledge, ∆χ²(1) = 1.14, p = .29. For example, the estimated effect size of text structure instruction on recall was g = 0.32 (standard error [SE] = 0.06) when compared with alternative instruction and g = 0.24 (SE = 0.12) when compared with business as usual; this difference was not statistically significant. For the remaining analyses, we therefore did not distinguish between the two types of control conditions.
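The ∆χ² values in this section are likelihood ratio statistics: twice the difference in log-likelihood between two nested models, referred to a chi-square distribution with as many degrees of freedom as parameters added. The Python sketch below uses closed-form upper-tail probabilities that exist only for df = 1 and df = 2 (other df would need a statistics library, e.g., scipy.stats.chi2.sf) and reproduces p-values reported above. The log-likelihoods in the first usage line are hypothetical. As a general caveat, likelihood ratio tests of fixed-effect moderators are normally based on maximum likelihood rather than REML fits.

```python
import math


def lr_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio statistic for two nested models."""
    return 2 * (loglik_full - loglik_reduced)


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable; closed forms exist for df = 1 and df = 2."""
    if x < 0:
        raise ValueError("chi-square statistics are non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))  # equals 2 * (1 - Phi(sqrt(x)))
    if df == 2:
        return math.exp(-x / 2)
    raise NotImplementedError("use a statistics library for other degrees of freedom")


# Hypothetical log-likelihoods for a model with and without one extra moderator.
print(lr_statistic(-20.1, -20.69))
# Reported control-condition comparisons for comprehension questions and recall.
print(round(chi2_upper_tail(1.18, 1), 2), round(chi2_upper_tail(0.44, 1), 2))
```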

Standardized and Nonstandardized Measures
The included studies used both standardized and nonstandardized measures to evaluate the effects of text structure instruction. For comprehension questions, approximately 22% of the effect sizes were based on standardized measures. For recall, summarization, and text structure knowledge, only nonstandardized measures were used. We could not demonstrate an effect of standardization, ∆g = −0.02, SE = 0.11; ∆χ²(1) = 0.04, p = .84: differences in effect sizes due to text structure instruction were similar for standardized and nonstandardized comprehension question tests.

Immediate and Delayed Effects
Not all studies provided data on the maintenance of effects. Delayed posttests were administered in approximately one third of the studies (n = 16). Most of these concerned comprehension questions (n = 12) and recall (n = 6), and most delayed posttests (63%) took place two or three weeks after the intervention ended. The immediate effect sizes were above 0.20 for all outcome measures and were therefore meaningful: students who received text structure instruction outperformed the control group on comprehension questions (g = 0.25, SE = 0.07), recall (g = 0.38, SE = 0.06), summarization (g = 0.58, SE = 0.09), and text structure knowledge (g = 0.34, SE = 0.03). However, effects decreased statistically significantly at the delayed posttests (see Table 3): comprehension questions, ∆χ²(1) = 6.89, p < .001; recall, ∆χ²(1) = 4.83, p = .03; and summarization, ∆χ²(1) = 9.73, p = .002. For text structure knowledge, no delayed effects could be calculated because of the limited number of observations.
In fact, for each of these outcome measures, the differences between groups with and without text structure instruction disappeared completely at the delayed posttests, as none of the estimated effect sizes reached significance: comprehension questions, g = −0.05, SE = 0.11; recall, g = 0.13, SE = 0.11; and summarization, g = −0.06, SE = 0.20. In other words, although students in the text structure condition outperformed the controls on each outcome measure at the immediate posttest, this difference was not maintained at the delayed posttest, where students in both conditions performed similarly. In most cases, the experimental groups' performance declined more after the immediate posttest than that of the control groups, whose performance remained fairly stable or decreased slightly.
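Whether an estimate reached significance can be checked from the reported g and SE alone, assuming a normal-theory (Wald) z test and a 95% confidence interval of g ± 1.96 SE. The Python sketch below applies this to the delayed effects above and, for contrast, to the immediate recall effect; it is a check on the reported numbers, not the authors' procedure.

```python
import math


def wald_test(g, se):
    """Two-sided p-value and 95% CI for an effect size, assuming a normal sampling distribution."""
    z = g / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return p, (g - 1.96 * se, g + 1.96 * se)


# Delayed effects reported above: (g, SE) per outcome measure.
delayed = {"questions": (-0.05, 0.11), "recall": (0.13, 0.11), "summarization": (-0.06, 0.20)}
for outcome, (g, se) in delayed.items():
    p, (lower, upper) = wald_test(g, se)
    print(f"{outcome}: p = {p:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")

# Immediate recall effect reported earlier: g = 0.38, SE = 0.06.
print(f"immediate recall: p = {wald_test(0.38, 0.06)[0]:.2g}")
```

All three delayed intervals include zero, whereas the immediate recall effect lies more than six standard errors above zero.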

Text Variables
To determine whether the type and number of text structures affected the effect sizes, we successively added genre (informational or narrative) and the number of different text structures taught to the four models.

Genre
Most studies examined the effects of teaching informational text structures (n = 34). In some studies, students also received instruction in narrative story structure (n = 5); in others, they received instruction in narrative story structure only (n = 5). Interventions on informational text structures used all types of outcome measures, whereas interventions involving narrative story structure mainly used comprehension questions and recall tasks. We could not demonstrate an effect of genre on these outcome measures: comprehension questions, ∆χ²(1) = 0.12, p = .73; and recall, ∆χ²(1) = 0.11, p = .74. There was also no interaction between genre and measurement timepoint (i.e., no genre effect on delayed posttest performance): comprehension questions, ∆χ²(1) = 0.89, p = .34; and recall, ∆χ²(1) = 0.11, p = .74. Interventions focusing only on informational text structures and studies (also) including narrative texts thus had comparable effects on text comprehension. Because the number of studies focusing on narrative story structure was limited, we could not examine genre effects on summarization and text structure knowledge.

Number of Text Structures
Some intervention studies focused on only one text structure (n = 20), whereas others taught up to five structures (n = 9). Overall, the description and compare-and-contrast text structures were taught most frequently. More text structures were taught in interventions with a text structure knowledge test (mean [M] = 3.81, standard deviation [SD] = 1.47) or a summarization task (M = 2.37, SD = 1.44) as the outcome measure than in interventions with comprehension questions (M = 1.53, SD = 1.07) or recall tasks (M = 1.64, SD = 1.35). The number of different text structures had a small negative effect on text structure knowledge, ∆g = −0.06, ∆χ²(1) = 4.64, p = .03, but not on comprehension questions, ∆χ²(1) = 0.54, p = .46; recall, ∆χ²(1) = 0.003, p = .96; or summarization, ∆χ²(1) = 0.21, p = .65. The more different text structures were taught during an intervention, the lower the scores on the text structure knowledge test (i.e., ∆g = −0.06 per additional text structure taught; we could not demonstrate a curvilinear effect). The number of text structures taught did not matter when students wrote a summary, answered comprehension questions, or carried out a recall task.
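Because the number of structures entered the model as a linear term, the predicted difference in text structure knowledge effect sizes between two interventions depends only on the slope and on the difference in the number of structures taught. The short Python sketch below uses the reported slope (∆g = −0.06); the model's intercept is not reported in this section, so the hypothetical helper returns differences only.

```python
SLOPE_PER_STRUCTURE = -0.06  # reported delta g per additional text structure (knowledge tests)


def predicted_difference(structures_a, structures_b, slope=SLOPE_PER_STRUCTURE):
    """Predicted change in g when moving from structures_a to structures_b structures taught."""
    return slope * (structures_b - structures_a)


# One structure versus five structures: about 0.24 SD lower on knowledge tests.
print(round(predicted_difference(1, 5), 2))
```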
We also checked for an interaction between the number of text structures taught and measurement timepoint to see whether the number of text structures mattered for students' performance on delayed posttests. This interaction effect was not found for any of the outcome measures: comprehension questions, ∆χ²(2) = 3.44, p = .18; recall, ∆χ²(2) = 0.02, p = .99; summarization, ∆χ²(2) = 0.21, p = .90; and text structure knowledge, ∆χ²(1) = 4.57, p = .03. The number of different text structures taught did not affect maintenance effects.

Content Features and Instructional Components
We analyzed whether the effects of text structure instruction were affected by the content features and instructional components listed in Table 1. Because of the limited number of observations, we did not perform this moderator analysis for text structure knowledge. All final models were an improvement over the models without moderating content-related and instructional variables: comprehension questions, ∆χ²(5) = 14.13, p = .015; recall, ∆χ²(5) = 15.49, p = .008; and summarization, ∆χ²(3) = 20.81, p = .001. The four final models explain 25-27% of the variance in effect sizes. Next, we discuss the outcomes of these moderator analyses.
Note. Delayed effect sizes are the difference between delayed and immediate posttest scores. Because of insufficient power (n = 3), no delayed effect sizes were calculated for text structure knowledge.

Content Features
Table 4 shows the estimated immediate effects per outcome measure and the estimated additional effects of various content features and instructional components. The parameter estimates from the final models (see the Appendix) show that not all features contributed evenly to the effects on the different outcome measures. Training in text structure recognition had a statistically significant effect on answering comprehension questions (g = 0.98, SE = 0.30, p = .001) and on recall (g = 1.03, SE = 0.39, p = .009). For recall, instruction on paragraph-level structure also mattered (∆g = 0.57, SE = 0.29, p = .03), but this was not the case for comprehension questions. For summarization skills, it was specifically a focus on paragraph-level structure that mattered (∆g = 0.91, SE = 0.22, p < .001), whereas training in top-level text structure recognition alone did not improve students' summarization skills (∆g = 0.22, SE = 0.40, p = .58). Apparently, when it comes to summarizing, students benefit most from text structure instruction that also focuses on the internal structure of paragraphs (g = 0.91 + 0.22 = 1.13).
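In a meta-regression with additive moderators, the predicted effect for a combination of features is the sum of the relevant coefficients, as in the 0.91 + 0.22 example above. Its standard error also requires the covariance between the two estimates, which this section does not report. The Python sketch below reproduces the point estimate; the covariance is a placeholder assumption set to zero in the usage line, so the resulting SE is illustrative only.

```python
import math


def combined_effect(est_a, se_a, est_b, se_b, cov_ab):
    """Sum of two regression coefficients and the standard error of that sum."""
    var_sum = se_a**2 + se_b**2 + 2 * cov_ab
    return est_a + est_b, math.sqrt(var_sum)


# Summarization estimates reported above: paragraph-level focus and top-level recognition.
# cov_ab = 0.0 is an assumption: the covariance is not reported in this section.
g_sum, se_sum = combined_effect(0.91, 0.22, 0.22, 0.40, cov_ab=0.0)
print(round(g_sum, 2), round(se_sum, 2))
```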
General attention to structure visualizations had no demonstrable impact on students' performance on comprehension questions unless intervention programs emphasized the actual construction of GOs and story maps (∆g = 0.39, SE = 0.15, p = .009). For recall, mere exposure to GOs had a negative effect (∆g = −0.44, SE = 0.18, p = .02), whereas active construction had a positive effect (∆g = 0.51, SE = 0.17, p = .002). Thus, when interventions asked students to actively create or fill out maps or GOs in addition to practicing text structure recognition, this had an effect on comprehension questions (g = 0.64) and recall (g = 1.03).
Structure-based summarization training in general had no demonstrable additional effect on comprehension question answering, recall, or summarization skills over and above training in text structure recognition. However, specific training in the rule-based summarization technique positively affected summarization skills (∆g = 0.64, SE = 0.21, p = .005) and recall (∆g = 0.34, SE = 0.12, p = .004) but had no statistically significant impact on comprehension questions. When interventions trained students to apply a fixed set of structure-based rules to summarize text, this resulted in net immediate effects of g = 0.73 on summarization and g = 1.12 on recall.
None of the content-related features had a demonstrable impact on delayed posttest performance. As we reported earlier, the differences between students receiving text structure instruction and students in the control groups that were visible immediately after the intervention program ended had disappeared at the delayed posttests.

Instructional Components
There was no demonstrable additional effect of instructional features on immediate measures: interventions including teacher modeling or individual student practice resulted in effect sizes similar to those of interventions with only explicit instruction or collaborative activities. However, instructional components affected the delayed effect on comprehension questions, ∆χ²(4) = 9.39, p = .05. Although teacher modeling did not have a demonstrable effect, students in interventions that lacked individual activities performed much worse on comprehension questions in the delayed posttest (∆g = −1.04, SE = 0.39, p = .007). In other words, individual activities acted as a protective factor against the drop in scores on delayed posttests. With individual practice during the intervention program, the estimated delayed effect of text structure instruction on comprehension questions would be g = 0.82, compared with g = −0.23 without individual practice.

Funnel Plot
Figure 1 shows the funnel plot of the four final models combined. The residuals of the final models with the explanatory variables are plotted against their standard errors. Most points were located in the region between the straight lines. Across all outcome measures, only 16 effect sizes (5.7% of the total sample) were identified as outliers. Both the lower-bound and the upper-bound outliers were based on various outcome measures, although comprehension question effect sizes were slightly overrepresented among the lower-bound outliers.
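The section does not state where the straight lines in the funnel plot lie. A common convention, assumed in the Python sketch below, is pseudo-95% bounds at ±1.96 times the standard error around zero, so that a residual counts as an outlier when its absolute standardized value exceeds 1.96. The residuals and standard errors in the usage line are hypothetical.

```python
def flag_outliers(residuals, standard_errors, z_crit=1.96):
    """Indices of residuals outside +/- z_crit * SE, split into lower- and upper-bound outliers."""
    lower, upper = [], []
    for i, (res, se) in enumerate(zip(residuals, standard_errors)):
        z = res / se
        if z < -z_crit:
            lower.append(i)
        elif z > z_crit:
            upper.append(i)
    return lower, upper


# Hypothetical residuals from a final model and their standard errors.
lower, upper = flag_outliers([0.10, -0.50, 0.30, 0.05], [0.10, 0.20, 0.10, 0.25])
print(lower, upper)  # prints [1] [2]
```

Under this rule, roughly 5% of residuals would fall outside the lines by chance alone if the models fit well, which is a useful benchmark for the 5.7% reported above.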

Immediate and Delayed Effects of Text Structure Instruction (Research Question 1)
Our study reveals that text structure instruction can improve comprehension skills in grades 4-6.
Narrative and expository interventions seem equally effective in this age group. Contrary to previous meta-analyses (Hebert et al., 2016; Pyle et al., 2017), we could not demonstrate a difference between standardized and nonstandardized tests, but our meta-analysis shows that the type of outcome measure (comprehension questions, recall, summarization, and/or text structure knowledge) has a substantial impact on effect sizes. Compared with regular reading comprehension instruction, text structure instruction has an overall immediate effect of g = 0.25 on comprehension questions, g = 0.38 on recall, g = 0.58 on summarization, and g = 0.34 on text structure knowledge. However, at delayed posttests, differences between groups that received text structure instruction or regular reading instruction could no longer be demonstrated. At first glance, the overall effect of text structure instruction on comprehension questions might seem relatively low (g = 0.25), but this constitutes an additional effect over and above business-as-usual gains in reading comprehension. Moreover, the effect sizes need to be evaluated in context: they were obtained in authentic educational settings, not in controlled lab settings, so effect sizes of approximately 0.20, which are often classified as small, are actually of policy interest (Durlak, 2009; Hedges & Hedberg, 2007). Furthermore, the effect size on comprehension questions is similar to the effects of other educational interventions (e.g., g = 0.24 in Lipsey et al., 2012). Therefore, text structure instruction should be considered as a way to support the transition to reading-to-learn skills, so that the persistent fourth-grade slump can be reduced (Chall & Jacobs, 1983).
We found that text structure instruction had a similar additional effect when compared with either business-as-usual reading instruction or a heterogeneous subset of alternative interventions (e.g., vocabulary instruction, cognitive reading strategies). This does not imply that, in general, these alternative interventions are no better than business-as-usual instruction. In fact, various meta-analyses have demonstrated the positive effects of vocabulary instruction (Stahl & Fairbanks, 1986) and (meta)cognitive learning strategies (e.g., Donker, de Boer, Kostons, van Ewijk, & van der Werf, 2014) on comprehension. Our result might be due to the control condition consisting of a very heterogeneous subset of alternative interventions, resulting in a baseline reference group comparable to business-as-usual instruction.
The effect size on comprehension questions (g = 0.25) was lower than on the other outcome measures; on tests in which students had to apply their knowledge of text structure more directly (e.g., recognizing text structures), the immediate effect size was g = 0.34. Although comprehension questions are presented as one outcome measure, they can target various aspects of a text and might therefore measure different skills (Keenan et al., 2008; Nation & Snowling, 1997). Because text structure instruction provides students with the knowledge and tools to process text into more coherent and organized mental schemata (Kintsch, 1988, 2013; Kintsch & van Dijk, 1978), it seems to matter whether comprehension questions involve surface code, textbase, or situation model comprehension skills (Kintsch, 1988, 2013). Unfortunately, most of the studies did not specify whether comprehension questions concerned the situation model (e.g., main idea questions) or local text issues (e.g., referential coherence) that might be answered without full understanding at the level of the situation model (van den Broek & Kremer, 2000).
Tasks that tap more into situation model comprehension, such as recall and summarization, might be more suitable candidates for evaluating the effects of text structure instruction. Indeed, recall and summary measures yielded larger overall effects of text structure instruction (g = 0.37 and 0.57, respectively), which is in line with the meta-analytic findings of Pyle and colleagues (2017), who showed that the overall effect of text structure instruction was larger on GO tasks than on comprehension questions. However, it is important to note that even summarization and recall tasks yield qualitatively different results: Summarization tasks typically evoke more main ideas, whereas recall tasks might evoke details as well (Riley & Lee, 1996).
Our meta-analysis shows that the outcome measure matters for evaluating the effectiveness of text structure interventions. As each type of outcome measure might rely on a slightly different constellation of comprehension skills and thereby measure different aspects of the reading process (Brown et al., 1983), future text structure research should include multiple outcome measures (Bohaty, Hebert, Nelson, & Brown, 2015), as in the studies by Meyer et al. (2002; Meyer, Wijekumar, & Lin, 2011) and Wijekumar et al. (2012, 2013, 2014). Another important step would be to disentangle how different reading outcome measures tap into the numerous skills involved in comprehension processes (e.g., vocabulary, strategic knowledge; Graesser et al., 1994; Keenan et al., 2008), so that reading interventions can be evaluated more adequately.
An interesting but disappointing finding is that at delayed posttests, the students in the experimental condition no longer outperformed the control group. This resonates with the finding by Hebert and colleagues (2016) that delayed effects of text structure instruction are much smaller and less consistent than immediate effects. In fact, we found that in many studies, the performance of the experimental group decreased between the immediate and delayed posttests, whereas the control group performed rather similarly on both posttests or showed a small decrease as well.
A methodological factor that might contribute to this finding is that delayed posttests sometimes required transfer, as students were tested on untaught text structures. Because of the limited number of studies with delayed posttests, we could not examine this transfer effect, but Hebert and colleagues (2016) showed much smaller effects in far-transfer cases. Another explanation for the lack of maintenance is that the intervention studies made no effort to promote it; the key elements of text structure instruction were not revisited in the period between the immediate and delayed posttests. Finally, business-as-usual instruction might not be good enough to help students maintain their newly acquired knowledge about text structures. Teachers in the upper elementary grades often fail to employ evidence-based approaches (Duke et al., 2011; Wijekumar et al., 2019) and themselves struggle with text structure recognition (Beerwinkle et al., 2018; Bogaerds-Hazenberg, Evers-Vermeul, & van den Bergh, 2019; Reutzel et al., 2016) and main idea identification (Kucan et al., 2011). Also, regular textbooks for grades 4 and 5 typically have not included much text structure instruction (Beerwinkle et al., 2018).

Content-Related and Instructional Variables Moderating the Effects (Research Question 2)
The ability to recognize text structures is beneficial for increasing reading comprehension, irrespective of outcome measure and genre. Our meta-analysis also shows that the effects of the moderating content-related and instructional variables differ per outcome measure. This moves the field beyond the question of what elements to include in text structure instruction (e.g., GOs) by providing some insight into how to include and refine these elements (e.g., how to offer GOs).

Content-Related Variables
When it comes to the effect sizes for summarization and recall, students particularly benefit from text structure instruction that focuses on paragraph-level structures (g = 1.60 on recall and 1.13 on summarization), not just on top-level structures. This corroborates the claim that successful summarization training combines text structure recognition and main idea identification within paragraphs (Miyatsu et al., 2018). The large effect of text structure instruction on summarization supports the hypothesis that text structure provides students with the necessary knowledge and tools to distinguish important from unimportant information (Hogan et al., 2011; Meyer et al., 1980; Taylor, 1985; Winograd, 1984) and that text structure helps students see how these main ideas are organized at a higher level (Miyatsu et al., 2018). Moreover, text structure can function as a mnemonic aid that improves text recall (Taylor, 1982). Alternatively, it might be easier to establish differences in main ideas at the paragraph level than at the text level, as a paragraph-level focus invites students to produce multiple main ideas instead of a single main idea for the text as a whole. Because all of the interventions included instruction about signaling words, we could not analyze whether a focus on signaling words moderated the effects of text structure instruction.
Structure-based summarization also improves students' text comprehension, especially when students learn specific rules and tricks for paraphrasing the main idea (i.e., rule-based summarization). This yielded net immediate effects of g = 1.15 for recall and 0.43 for summarization, which corroborates the idea that explicit knowledge about text structure (e.g., structure-specific questions, signaling words) can provide students with useful tools to identify and formulate main ideas and to reorganize these in a coherent way (Elledge, 2013; Meyer et al., 1980; Miyatsu et al., 2018; E. A. Stevens, 2018; Taylor, 1985) while eliminating redundant information (Brown et al., 1983; McNeil & Donant, 1982). Whereas structure-based summarization techniques were not included as a moderating variable in previous meta-analyses (Hebert et al., 2016; Pyle et al., 2017), our meta-analysis shows that they are an important ingredient of successful text structure interventions. Summarization strategies based on the internal structure of paragraphs seem more helpful in improving students' performance than strategies based on external markers of text structure (i.e., headings and subheadings), possibly because the former provide students with the necessary skills to distill main ideas from the text, even in the absence of unambiguous external markers.
Structure visualizations have often been part of larger text structure strategy interventions but have also been used in various studies on GOs and story mapping. Over the years, GO research has generated mixed results, which have often been attributed to the types of GOs used, the level of instructional support provided, or whether GOs were used as a prereading or postreading activity (Griffin et al., 1995; Jiang & Grabe, 2007). Although the authors of previous meta-analyses suggested that GOs might increase the effectiveness of text structure interventions (Hebert et al., 2016; Pyle et al., 2017), they did not examine their presence as a moderating variable. Our meta-analysis shows that the inclusion of structure-based visualizations has positive effects on comprehension and recall, as long as students actively fill out these maps and GOs. Simple exposure has no demonstrable effect or even a negative effect on recall. It therefore seems crucial that text structure instruction provide ample opportunities for students to practice filling out structure-based GOs and maps after teacher-led instruction. This underscores the importance of an instructional approach characterized by a gradual release of responsibility from teacher to student (e.g., Fisher & Frey, 2013; Pearson & Gallagher, 1983).
The importance of the active construction of GOs contradicts the conclusion of Stull and Mayer (2007), who found that constructing GOs increased extraneous cognitive processing and interfered with learning. In the studies included in our meta-analysis, however, constructing GOs consisted of students filling out missing information in a teacher-supplied GO and did not require them to draw the whole structure by themselves (i.e., choosing the right boxes, adding arrows). Possibly, this is a less complex task for students, as the text structure is scaffolded in the GO, and students' only concern is to find the right ideas to put in the boxes. This type of task might reduce extraneous load but still fits with the theory that deep learning occurs when students are encouraged to engage in productive learning activities (e.g., Mayer, 2003), so even completing a partially filled-in GO provides an opportunity for deep text processing (Jiang & Grabe, 2007). With current technological trends and the development of digital mapping software, it seems relevant to explore digital opportunities for incorporating more learning activities focused on text structure visualizations in classrooms.

Although previous meta-analyses showed that teaching multiple text structures has a positive impact on students' performance (Hebert et al., 2016; Pyle et al., 2017), we could not demonstrate this effect. Only for text structure knowledge did we find that students' performance tended to be slightly lower when they encountered more different structures. This seems logical, as students need to remember more types of structures on these text structure knowledge tests. Still, teaching multiple structures has at least no demonstrable negative impact on the other comprehension measures. Therefore, we believe that teaching multiple text structures is useful, as students can learn from comparing and contrasting the characteristics of various structures, become more aware of the differences between structures, and possibly even transfer knowledge to untaught text structures (Hebert et al., 2016). From a practical perspective as well, it is important for students to recognize more than one structure, as most (educational) texts combine multiple text structures nested within one another (Jiang & Grabe, 2007). As Pyle et al. (2017) also suggested, we believe that it is worthwhile to engage in further research that addresses the order and complexity of the text structures that are taught.

Instructional Components
Although it matters how a skill is taught to students, most meta-analyses in literacy research have not evaluated the effect of instructional components such as modeling or collaborative practice, possibly because intervention descriptions often focus extensively on content. As Pyle et al. (2017) pointed out, the term explicit instruction is often used in describing text structure interventions but is in itself very broad in terms of the instructional features present (see also Archer & Hughes, 2011). Despite the limited descriptions of the instructional approach in most research articles, our study still demonstrates that the instructional approach moderates delayed but not immediate effects: Interventions including individual student activities generated slightly higher delayed effects on comprehension questions. Although several instructional models hypothesize practice with peers to be an important step in the gradual release of responsibility from teacher to student (e.g., Fisher & Frey, 2013; Pearson & Gallagher, 1983), we could not demonstrate a moderating effect of collaborative activities. Because of the poor description of instructional features in most of the studies, this finding is hard to interpret. Students who practice only with their peers, and never alone, may not fully acquire the skill that they have to learn, easily forgetting it and therefore failing on delayed posttests. Alternatively, activities labeled as collaborative in the interventions might not actually have met the criteria for effective cooperation (see Johnson & Johnson, 2017).
Several researchers have expressed the need to pay more attention to the fidelity of implementation in text structure intervention studies (Bohaty et al., 2015). We believe that the quality of instructional components should also be included in such evaluations. More specifically, parallel to the recommendations made by writing researchers (Bouwer & De Smedt, 2018), future reading research articles should systematically provide details on the intervention context and on the design principles of the intervention, at both a macrolevel (i.e., focus and mode of instruction, sequencing of content) and a microlevel (e.g., instructional activities, learning activities, materials). This will increase the transparency of intervention results and might also promote the implementation of concrete activities in educational contexts (Fidalgo, Harris, & Braaksma, 2018). The gradual release of responsibility model (Fisher & Frey, 2013; Pearson & Gallagher, 1983) provides a useful framework for reading researchers to describe, test, and evaluate the quality of instructional components more systematically.
Given that we know from various meta-analyses that text structure instruction is effective (the what) for students of various ages (the when), it is important that future studies focus on instructional practice (the how). Now is the time to examine the effectiveness of a greater variety of instructional features in the context of reading instruction so the field can improve the context in which text structure instruction is given (Williams, 2018). First attempts were made by Meyer et al. (2002, 2010) and Wijekumar et al. (2012, 2013, 2014), who tried to unravel the effects of, for instance, providing individualized feedback or tutoring in the context of a web-based intervention. Qualitative research has also described teacher instructional practices and pedagogical content knowledge in the context of text structure interventions (Beerwinkle et al., 2018; Bogaerds-Hazenberg et al., 2019; Wijekumar et al., 2019), providing more insight into the instructional components that influence intervention success.
In sum, our meta-analysis shows that text structure instruction has a positive effect on students' reading comprehension skills over and above regular reading programs: It improves their performance on comprehension questions, recall, and summarization tasks and has a positive effect on their text structure knowledge. However, at delayed posttests, differences between experimental and control groups can no longer be demonstrated. Hence, it seems a promising avenue to incorporate text structure instruction into primary school curricula so students' comprehension skills can be strengthened and positive effects maintained over time.

sentences, supporting details), active use of GOs, and teaching rule-based summarization. This improved model fit for all outcome measures: comprehension questions, ∆χ²(2) = 6.57, p = .04; recall, ∆χ²(5) = 15.49, p = .002; and summarization, ∆χ²(5) = 15.49, p = .002. For recall, we could not simplify the full factorial model because almost all parameter estimates were statistically significant. For the other outcome measures, we concluded that the simplified model did not fit worse than the full factorial model: comprehension questions, ∆χ²(3) = 1.46, p = .69; and summarization, ∆χ²(2) = 0.79, p = .67. Subsequently, we added two instructional features to the model: teacher modeling (in addition to or instead of explicit instruction only) and individual student activities (in addition to or instead of collaborative activities only), both also in interaction with measurement timepoint. Adding these instructional components improved the model fit only for comprehension questions, ∆χ²(4) = 9.39, p = .05, not for the other outcome measures.
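The ∆χ² statistics above compare nested models. As a minimal sketch, assuming each ∆χ² is a likelihood-ratio (deviance) difference that follows a χ² distribution with ∆df degrees of freedom under the null hypothesis, p values such as those reported for comprehension questions and summarization can be recomputed from the closed-form χ² upper tail for integer degrees of freedom, using only Python's standard library. The function name `chi2_sf` and the comparison labels are illustrative; a statistics package's χ² survival function would serve the same purpose.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{k=1}^{(df-1)/2} x^(k-1) / (2k-1)!!
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)  # k = 1 term
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return tail + total


if __name__ == "__main__":
    # Model comparisons reported above: (label, delta chi-square, delta df)
    comparisons = [
        ("Comprehension questions, content moderators added", 6.57, 2),
        ("Comprehension questions, simplified vs. full factorial", 1.46, 3),
        ("Summarization, simplified vs. full factorial", 0.79, 2),
        ("Comprehension questions, instructional features added", 9.39, 4),
    ]
    for label, delta_chi2, delta_df in comparisons:
        p = chi2_sf(delta_chi2, delta_df)
        print(f"{label}: delta chi2({delta_df}) = {delta_chi2:.2f}, p = {p:.3f}")
```

Splitting on even and odd degrees of freedom avoids any dependency on an incomplete gamma routine; the trade-off is that the function only handles integer df, which is all that nested-model comparisons like these require.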

FIGURE 1 Funnel Plot of the Final Model

This work was supported by a Research Talent grant (project 406.16.052) by the Netherlands Organization for Scientific Research (NWO).

TABLE 4 Immediate Effects (∆g) for Content-Related and Instructional Variables
Variable | Outcome measure: Comprehension questions | Recall | Summarization
Note. ns = not statistically significant. Empty cells are nonsignificant.

Overview of Stepwise Model Fitting for Comprehension Questions
Note. df = degrees of freedom; MO = teacher modeling; NI = no individual activities; SN = structure number; SS = structure summarization; SV = structure visualization. The model fit for reduced M6c is not different from the full factorial M6b, ∆χ²(3) = 1.46, p = .69; similarly, the model fit for reduced M7b is not different from the full M7a, ∆χ²(1) = 1.83, p = .18. Parameter estimates are based on M7b.