Running head: Comprehension of science texts
Prior knowledge, reading skill, and text cohesion in the comprehension
of science texts
Yasuhiro Ozuru *, Kyle Dempsey, Danielle S. McNamara
University of Memphis, Psychology Building, Memphis, TN 38152-3230, USA
This study examined how text features (i.e., cohesion) and individual
differences (i.e., reading skill and prior knowledge) contribute to biology text
comprehension. College students with low and high levels of biology knowledge
read two biology texts, one of which was high in cohesion and the other low in
cohesion. The two groups were similar in reading skill. Participants’ text
comprehension was assessed with open-ended comprehension questions that measure
different levels of comprehension (i.e., text-based, local-bridging, global-bridging).
Results indicated: (a) reading a high-cohesion text improved text-based
comprehension; (b) overall comprehension was positively correlated with
participants’ prior knowledge; and (c) the degree to which participants benefited
from reading a high-cohesion text depended on participants’ reading skill, such that
skilled participants gained more from high-cohesion text.
Key words: Text comprehension; Text cohesion; Reading skill; Science learning
* Corresponding author.
E-mail: email@example.com or firstname.lastname@example.org (Y. Ozuru)
Comprehension of expository materials is a complex process that depends on
a number of factors. For example, past research on expository text comprehension
has established that how well an individual comprehends and learns from expository
texts is a function of a complex interaction between individual differences and text
features (Linderholm, Everson, van den Broek, Mischinski, Crittenden, & Samuels,
2001; McNamara, Kintsch, Songer, & Kintsch, 1996; O’Reilly & McNamara, 2006;
Voss & Silfies, 1996).
However, these studies have not been in complete agreement on how
individual differences (e.g., reading skill, prior knowledge) interact with text features
(e.g., text difficulty, text cohesion) in comprehension processes. The goal of this
article is to discern the specific nature of the contributions of two types of individual
difference factors and text features to science text comprehension. The two
individual difference factors examined are topic-relevant prior knowledge and
reading comprehension skill. The specific text feature we focus on is text cohesion,
which refers to the extent to which ideas conveyed in the text are made explicit
(Graesser, McNamara, Louwerse, & Cai, 2004).
Topic-relevant prior knowledge refers to readers’ preexisting knowledge
related to the text content and is often measured with open-ended and/or multiple
choice questions on vocabulary and relevant factual information (see Shapiro, 2004).
Readers’ topic-relevant knowledge is expected to have a large influence on text
comprehension because information explicitly stated in the text is often insufficient
for the construction of a coherent mental representation of the situation depicted by
the text, requiring the contribution of reader knowledge (Kintsch, 1988, 1998). In
support of this argument, empirical evidence has shown that a reader’s prior
knowledge facilitates and enhances text comprehension, in particular, of expository
materials (Afflerbach, 1986; Chi, Feltovich, & Glaser, 1981).
On the other hand, reading skill refers to cognitive skills associated
with the reading process in general (Gernsbacher, Varner, & Faust, 1990; Hannon &
Daneman, 2001; Walker, 1987), and may include a variety of abilities such as word
decoding (Perfetti, 1985), syntactic knowledge, and high-level inferential skills
(Oakhill & Yuill, 1996). Of all these abilities related to reading skill, one of the most
important elements is the ability and propensity to connect various concepts or ideas
contained in the text in a coherent manner (Hannon & Daneman, 2001; Oakhill &
Yuill, 1996; for a review also see Zwaan & Singer, 2003).
Currently, research remains inconclusive concerning precisely what kind of
cognitive factors underlie readers’ ability and propensity to integrate textual
information to maintain a high level of coherence. Ability to suppress irrelevant
information (Gernsbacher et al., 1997; cf. McNamara & McDaniel, 2004), working
memory capacity (Daneman & Hannon, 2001), metacognition (Hacker, 1998),
reading strategies (McNamara, 2007), and motivation (Guthrie & Wigfield, 1999)
are some of the major candidate factors underlying reading comprehension skill.
Although reading comprehension skill and prior knowledge may not be
completely separable, they are assumed to contribute to reading comprehension
processes in somewhat different ways (Walker, 1987; Hannon & Daneman, 2001).
Prior knowledge helps readers compensate for gaps in text-based information by
affording quick and relatively effortless access to relevant information in long-term
memory based on incomplete text-based information as cues. In contrast, reading
comprehension skill helps readers relate multiple ideas and concepts appearing in
different parts of a text through effortful inferential processes (Daneman & Hannon,
2001). This process of relating multiple ideas helps readers build a more integrated
understanding of text content even when readers do not have high levels of prior
knowledge to facilitate knowledge-driven recognition of the text content. Given the
different roles of prior knowledge and reading skill in reading comprehension, the
relative contribution of prior knowledge and reading skill should change depending
on the type of text and the level of comprehension involved (i.e., the types of
questions used to assess comprehension).
1.1. Prior knowledge and reading skill in relation to text cohesion
Even within a specific genre and topic (i.e., an expository biology text), texts
can vary in a number of different ways. Cohesiveness is one of the
important dimensions along which texts vary. Cohesiveness, an objective
feature of a text, is an important determinant of text coherence, which is a
subjective psychological state of a reader (Graesser, McNamara, & Louwerse, 2003;
Halliday & Hasan, 1976). When comprehending a text, readers must establish and
maintain coherence between sentences (Oakhill, Cain, & Bryant, 2003; van den
Broek, 1994). When reading a highly cohesive text, the majority of information
necessary to maintain text coherence is provided by the text itself. On the other hand,
when reading a less cohesive text, readers need to rely more heavily on relevant
knowledge to maintain coherence.
Text cohesion varies with the way in which adjacent sentences are connected,
such as the degree of conceptual overlap (e.g., argument overlap) between sentences
and the presence of specific cues (e.g., connectives) that help readers connect ideas
across sentences. Text cohesion also varies as a function of the overall organization,
which can be expressed by the temporal and causal sequence of the events in the text
(Linderholm et al., 2001) and the presence of headers and topic sentences
(McNamara et al., 1996). These features contribute to the maintenance of text
coherence by reducing the need for the reader to rely on knowledge. The
cohesiveness of text differs from text readability or difficulty, which usually refers to
sentence length and individual word difficulty, e.g., Flesch Reading Ease (Flesch,
Given that text cohesion influences readers’ maintenance of text coherence,
readers’ prior knowledge and reading skill should interact with text cohesion in
different ways in influencing comprehension. With respect to readers’ prior
knowledge level, the benefit of text cohesion should be more pronounced for readers
with less knowledge. That is, whereas maintenance of text coherence in a less
cohesive text demands the contribution of specific knowledge, a highly cohesive text is
more self-contained and hence requires less contribution of topic-specific knowledge
for maintenance of text coherence. This notion is supported by the finding that low-
knowledge readers benefit from reading high-cohesion texts, whereas high-
knowledge readers’ comprehension often suffers when reading high-cohesion texts
(McNamara et al., 1996; McNamara & Kintsch, 1996).
On the other hand, reading skill is expected to interact with text cohesion
differently. Readers with poor reading comprehension skill may not benefit as much
as skilled readers from reading a high-cohesion text because increasing text cohesion
often involves adding more information (Beck, McKeown, Sinatra, & Loxterman,
1991), resulting in increased text length, density, and complexity. As a consequence,
comprehension of a highly cohesive text may require a higher level of reading skill
because reading a highly cohesive text involves processing larger amounts of text-based information.
This proposal is not entirely new. For example, the Cognitive Load Theory
(CLT) postulates that many types of cognitive tasks, including reading
comprehension, can be understood as a process by which task performers negotiate
with the task demands using two resources: a limited cognitive resource determined
by working memory capacity and the ability to access a large amount of relevant
knowledge (e.g., schema activation) with relatively few cognitive resources
(Sweller, 1999). According to this view, reading comprehension performance is
likely to be affected not only by the extensiveness of readers’ knowledge but also by
information-processing demands determined by the text features (e.g., amount of
information that readers need to process using limited cognitive resources).
A study conducted by Voss and Silfies (1996) attempted to address this issue.
They reported that the ability to comprehend a less cohesive text is more closely
correlated with a reader’s prior knowledge level whereas the ability to comprehend
an expanded and more cohesive text is more closely related to reading skill.
However, the Voss and Silfies (1996) study is limited for several reasons. First, their
study does not provide much information about the benefit (or detrimental effect) of
cohesion. Therefore, the question of who benefits more from reading a high- as opposed to a low-
cohesion text remains unanswered. Another limitation is the text genre. Their study,
similar to many other studies of text revision (Beck et al., 1991; Linderholm et al.,
2001), was conducted with history texts (though the content was fictitious). Natural
science texts (e.g., biology texts) differ from social studies or history texts in that
natural science texts tend to present a number of new and abstract concepts (e.g.,
osmosis, mitosis, etc.) and their relations (e.g., relations between endotherms and
warm-blooded animals). These scientific concepts are often difficult to ground in
everyday experiences (Graesser, Leon, & Otero, 2002). In contrast, history texts
often present relatively familiar information (e.g., conflict between groups, desire to
gain power, independence) in a novel context (e.g., the Russian Revolution). Use of
fictitious history texts cannot eliminate this effect of topic familiarity because people
may have general schemata on typical social issues such as conflict, politics, and
financial problems. Thus, the way in which cohesion manipulations influence the
comprehension of social science and natural science texts (e.g., biology texts) may differ.
To explore this issue with science text, O’Reilly and McNamara (2006)
examined the interaction between text cohesion and two types of individual
differences (prior knowledge and reading skill) using a biology text. They found that
low-knowledge readers benefited from reading a high-cohesion text, whereas high-
knowledge readers benefited from reading a high-cohesion text only when they had a
relatively high level of reading skill. In contrast, unskilled, high-knowledge readers’
comprehension was worse for a high-cohesion text.
One limitation of the O’Reilly and McNamara (2006) study is that text
cohesion was manipulated as a between-subjects variable. This design
limits the statistical analyses available to examine the interaction between
text cohesion and individual differences, that is, how the relative impact of prior
knowledge and reading skill on comprehension depends on text cohesion. As will be
described in greater detail later, the present study attempted to overcome this
limitation by employing a within-subject manipulation of text cohesion with a new
set of biology texts.
1.2. Effects of prior knowledge and reading skill on comprehension
The other issue that the present study explored is the relative contribution of
reading skill and prior knowledge to comprehension irrespective of text cohesion. As
discussed earlier, there is ample evidence showing that prior knowledge has a large
influence on expository text comprehension (Afflerbach, 1986; Chi et al., 1981). The
present study attempted to extend these findings by exploring whether, and how, the
relative contribution of prior knowledge changes depending on the level of comprehension.
By level of comprehension, we refer to the distinction between “text base” and
“situation model” level comprehension (Kintsch, 1998). In this paper, we adopted a
rather loose distinction between text base and situation model by assuming some
continuity between them; that is, we assumed that some types of comprehension
involve less integration of information (closer to text base), and other types of
comprehension involve more extensive integration of information (closer to situation
model), and these different levels of comprehension can be assessed by different
types of comprehension questions.
According to this notion, closer-to-text base comprehension can be
operationally defined as performance on comprehension questions that require
minimal information integration (i.e., information explicitly stated within a sentence).
On the other hand, closer-to-situation model level comprehension can be
operationally defined by performance on comprehension questions that require more
extensive information integration (i.e., bridging that involves integration of
information across two or more sentences). We explored changes in the relative
contribution of prior knowledge to text comprehension as a function of different
types of comprehension questions.
1.3. Research questions – Hypotheses
Hence, this study explored two research questions: First, how does the
relative contribution of prior knowledge and reading skill to comprehension change
as a function of type of comprehension questions? Second, what is the relative
contribution of prior knowledge and reading skill to the benefit and/or detrimental
effect of text cohesion on comprehension?
To explore these research questions, participants read both low- and high-
cohesion versions of biology texts, and answered three types of comprehension
questions—that is, text-based questions, local-bridging questions, and global-
bridging questions—based on memory of the text content.
As regards the first research question, we predicted that prior knowledge would make
a greater contribution to comprehension of science text than would reading skill
(Hypothesis 1). In addition, we also hypothesized that the contribution of prior
knowledge to performance on comprehension questions would be larger for the types
of questions that require more extensive information integration—i.e., local- and
global-bridging questions as opposed to text-based questions (Hypothesis 2). This
prediction is based on the assumption that readers may not generate inferences to
attain global level comprehension unless such an inference can be drawn or readily
retrieved from preexisting knowledge (Kintsch, 1993) and with relatively little
expenditure of cognitive resources (McKoon & Ratcliff, 1992). The above
assumption is supported by the finding that readers often do not generate inferences
(e.g., causal backward inferences) involving multiple related concepts (i.e.,
synthesizing the multiple text-based ideas) online when reading unfamiliar science
materials (Noordman, Vonk, & Kempff, 1992). That is, when readers are able to
answer local- and global-bridging questions based on science materials, their answer
is likely to be largely based on retrieval of preexisting knowledge as opposed to the
product of resource consuming reasoning processes that involve linking multiple
ideas in the text while reading the text.
With respect to the second research question, we expected both prior
knowledge and reading skill to interact with text cohesion. Specifically, we expected
a benefit of increased cohesion on comprehension especially for low-knowledge
participants because cohesive text fills in conceptual gaps for low-knowledge readers
that cannot be resolved with prior knowledge (Hypothesis 3). We also expected the
benefit of reading high-cohesion text to be generally larger for more-skilled readers
because reading skill is necessary for taking advantage of added cohesion. In
particular, this interaction between reading skill and text cohesion was expected to be
pronounced for high-knowledge readers; that is, we expected high-knowledge
participants’ comprehension to remain the same across low- and high-cohesion texts
or even decline when reading high-cohesion texts (McNamara et al., 1996) unless the
participant has a high level of reading comprehension skill (O’Reilly & McNamara,
2006) (Hypothesis 4).
The formulation of Hypothesis 4 was based on the notion that a high-
cohesion text may contain information that is more familiar to high-knowledge
readers. As a consequence, they may more shallowly process the high-cohesion text
because of a false sense of understanding (McNamara et al., 1996). This will occur
unless they have a higher level of reading skill, which is typically associated with
a tendency to process textual information carefully and systematically and to generate
inferences that relate multiple concepts in the text (O’Reilly & McNamara, 2006).
Thus, reading skill induces a high-knowledge reader to actively process the text
regardless of its cohesion.
The above explanation is on the surface somewhat similar to the “expertise
reversal effect” identified in the context of CLT (Kalyuga, Ayres, Chandler, &
Sweller, 2003). According to the expertise reversal effect, providing high-knowledge
participants with highly cohesive texts with “unnecessary details” can produce
adverse effects due to interference because these high-knowledge readers can
efficiently maintain coherence based on their own knowledge alone. The expertise
reversal effect also implies that interference from added cohesion is likely to be more
pronounced among less-skilled readers because less skilled readers are typically less
efficient in using limited cognitive resources when processing a large amount of
information (Daneman & Hannon, 2001). These two explanations, one based on less-
skilled high-knowledge readers’ tendency to shallowly process a high-cohesion text
and the other based on the expertise reversal effect, are not the same (Kalyuga et al.,
2003). Within the design of the current study, however, both explanations generate similar predictions.
Further, related to the second question, we also explored the level of
comprehension at which the benefit of text cohesion is observed. We expected that
the benefit of reading a high-cohesion text, in particular for low-knowledge readers,
would be limited to relatively lower levels of comprehension, such as
performance on text-based questions (Hypothesis 5). This proposal is based on the
notion that readers are unlikely to draw inferences (i.e., local- and global-bridging
inferences to attain higher level comprehension) unless they are already familiar with
information related to the text content (Noordman et al., 1992).
2.1. Participants
Participants were recruited from two distinct sources in order to manipulate
the level of biology knowledge relevant to the reading comprehension materials. One
group of participants consisted of 108 undergraduate students enrolled in an introductory
Psychology course at the University of Memphis, of whom 93 were female and 15
were male. The mean age of this group was 21.1 years (SD = 3.6), with a range of 18 to 37.
The other group of participants consisted of 62 undergraduate students enrolled in an
introductory Biology course at Old Dominion University, of whom 53 were female
and 9 were male. The mean age of this group of participants was 23.3 years (SD = 2.3),
with a range of 21 to 37. The participants from Old Dominion University were
recruited because it was possible to specifically recruit students enrolled in Biology
courses, and this was not possible at the University of Memphis. The two universities
are considered comparable according to the college rankings reported in the U.S.
News, 2007. Testing of the two groups of participants took place within the same
2.2. Design – Materials
The 2 x 2 x 3 experimental design included text cohesion (low and high) and
type of question (text-based, local-bridging, and global-bridging) as within-subjects
variables. Knowledge level of participants (2 levels) was included as a between-
subjects variable (Biology class and Psychology class participants) upon confirming
the distinct knowledge level of the two groups of participants based on prior
knowledge measures. In addition, participants’ reading skill was assessed and
included in the analysis as a control variable using a median split technique. Effects of
the two individual differences factors were also analyzed with regression analyses.
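As a rough illustration of the median split mentioned above, the following sketch labels each participant as more or less skilled relative to the sample median. The function name, the example scores, and the treatment of scores equal to the median (assigned here to the "low" group) are assumptions made for illustration; the text does not specify how ties were handled.

```python
from statistics import median


def median_split(scores):
    """Label each score 'high' if it exceeds the sample median, else 'low'.

    Assumption: scores equal to the median are placed in the 'low' group.
    """
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical reading-skill scores, for illustration only (not study data).
example_scores = [12, 30, 25, 18, 33, 21]
print(median_split(example_scores))
```

A median split discards information about how far a score lies from the median, which is one reason to complement it with regression analyses that treat reading skill as a continuous predictor.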
2.2.1. Texts and cohesion manipulation
The two texts used for the reading comprehension task were taken from high-
school biology textbooks and were modified to produce low- and high-cohesion
texts. One text described a plant’s response to an external stimulus (Plant text), and
the other described internal distributions of heat in animals (Heat text).
Manipulations to increase cohesion included: (1) replacing ambiguous
pronouns with nouns; (2) adding descriptive elaborations that link unfamiliar
concepts with familiar concepts; (3) adding connectives to specify the relationships
between sentences or ideas; (4) replacing or inserting words to increase the
conceptual overlap between adjacent sentences; (5) adding topic headers; (6) adding
thematic sentences that serve to link each paragraph to the rest of the text and overall
topic; and (7) changing sentence structures to incorporate the additions and
modifications. Appendix A contains an example of the low- and high-cohesion
versions of one of the texts (i.e., Heat text) in which the specific changes are marked.
Table 1 provides key text features related to text cohesion and text difficulty.
As indicated in Table 1, the text revisions increased the text length by approximately
50%. However, this level of increase in the text length is common to past text
revision studies (Beck et al., 1991; Voss & Silfies, 1996). The levels of cohesion and
text difficulty of the two texts were monitored based on text features that are known
to be indicative of text cohesion and text difficulty. The features that are indicative of
text cohesion included argument overlap and Latent Semantic Analysis (LSA) cosine
between sentences. The features indicative of more conventional text
difficulty and readability included word frequency and Flesch-Kincaid grade level.
Argument overlap between sentences represents the proportion of sentence
pairs (adjacent or all) in the text that share an argument. Hence, adjacent and all
sentence-argument overlap measures represent local and global aspects of cohesion,
respectively. The LSA cosine is a proxy measure of conceptual similarity between
linguistic units (Landauer & Dumais, 1997). The LSA approximates conceptual
similarity using a mathematical technique similar to a factor analysis. Thus, LSA
cosine represents text cohesion based on conceptual similarity, not solely by overlap
of a particular word (i.e., argument overlap).
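The argument overlap measure described above can be sketched in a few lines, assuming that the arguments of each sentence have already been extracted and are supplied as sets of strings. This is a simplified illustration, not the Coh-Metrix implementation: the function name, the input format, and the example sentences are hypothetical.

```python
from itertools import combinations


def overlap_proportion(sentence_args, adjacent_only=True):
    """Proportion of sentence pairs that share at least one argument.

    sentence_args: list of sets, one set of argument strings per sentence.
    adjacent_only: True for adjacent pairs (local cohesion),
                   False for all pairs (global cohesion).
    """
    if adjacent_only:
        pairs = list(zip(sentence_args, sentence_args[1:]))
    else:
        pairs = list(combinations(sentence_args, 2))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if a & b)
    return shared / len(pairs)


# Hypothetical hand-tagged arguments for four sentences (not from the study texts).
sentences = [
    {"endotherm", "heat"},
    {"heat", "blood"},
    {"blood", "vessel"},
    {"plant", "stimulus"},
]
print(overlap_proportion(sentences, adjacent_only=True))
print(overlap_proportion(sentences, adjacent_only=False))
```

In this toy example two of the three adjacent pairs share an argument, whereas only two of the six possible pairs do, which shows how the local and global measures can diverge for the same text.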
Text difficulty varies due to sentence complexity and vocabulary difficulty.
Text difficulty was controlled by monitoring word frequency and Flesch-Kincaid grade
level. Word frequency is represented by the average frequency of the lowest-frequency
word in each sentence. Hence, texts with lower word frequency tend
to have rarer, less common content. Flesch-Kincaid grade level is computed based on
word length and sentence length and thus represents difficulty in terms of both
sentence length and word difficulty.
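For reference, the standard Flesch-Kincaid grade level formula combines average sentence length (words per sentence) with average word length (syllables per word). The sketch below takes the three counts as inputs; syllable counting itself is not shown, and the function name and example counts are illustrative assumptions rather than values from the study texts.

```python
def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid grade level from raw counts.

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59


# Hypothetical counts: the same 100 words and 150 syllables arranged
# in 10 shorter sentences versus 5 longer sentences.
shorter_sentences = flesch_kincaid_grade(100, 10, 150)
longer_sentences = flesch_kincaid_grade(100, 5, 150)
print(shorter_sentences < longer_sentences)
```

Because the sentence-length term enters the formula directly, revisions that lengthen sentences raise the grade level even when the vocabulary is unchanged.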
Monitoring of these text features was achieved using the Coh-Metrix tool
(Graesser et al., 2004), a computer-based tool that calculates over 200 measures of
text features. As indicated in Table 1, in both the Heat and Plant texts, there were
relatively large increases in Argument Overlap Adjacent and Argument Overlap All
Sentences, as well as in LSA Adjacent and LSA All Sentences, from the low- to high-
cohesion versions of the texts. These increases indicate that the cohesion manipulation
increased text cohesion both locally and globally. On the other hand, the word
frequency measure was relatively similar across the low- and high-cohesion versions
of the texts, which suggests similarity of overall content before and after the
manipulation. Finally, Flesch-Kincaid grade level increased as a result of the cohesion
manipulation. This was expected given that cohesion manipulation tends to increase
sentence length as a result of adding connectives and other cohesive elements.
Insert Table 1 about here
2.2.2. Reading comprehension questions
There were 12 comprehension questions for each text, of which 4 were text-
based questions, 4 were near- or local-bridging questions, and 4 were far- or global-
bridging questions (see Appendix B). A question was classified as text-based when
the question could be answered based on information explicitly stated within a
sentence. A question was classified as a near- or local-bridging question when the
answer to the question required an integration of information located within five
clauses across multiple sentences (generally adjacent sentences). Far- or global-
bridging questions were similar to local-bridging questions but involved the
integration of information located across larger distances, more than five clauses
apart, and more than two sentences apart.
In scoring the responses to these questions, each participant’s response to each
question was compared to answer keys that had been constructed prior to the
collection of the data. Whereas all the text-based questions were scored in a binary
manner (i.e., incorrect or correct), bridging questions were scored on a continuous
scale in half- or quarter-point increments, depending on the number of ideas involved in
the ideal answer for a specific question. For example, an ideal answer for Example 3
of the global-bridging questions (“According to the text, how would an endotherm
respond to an ambient temperature of 30 degrees Fahrenheit?”) should include these
two ideas: (a) An endotherm would increase voluntary/involuntary (e.g., shivering)
muscle movement to generate heat; (b) An endotherm would decrease the blood flow
to the extremities to reduce heat loss to cold surroundings. Participants were awarded
0.5 points for each of these ideas. Participants’ responses to the open-ended questions
were scored independently by two raters, and then compared. Interrater reliability
was greater than 95%. Discrepancies were resolved by discussion.
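The scoring scheme described above can be sketched as follows. The sketch assumes that every question is worth one point and that each idea in the answer key carries equal weight, which is consistent with the half-point example but is not stated for every question. The agreement function computes simple percent agreement, which is only one possible reliability index; the text does not say which index was used. All function names are hypothetical.

```python
def score_text_based(is_correct):
    """Binary scoring for a text-based question: 1 point if correct, else 0."""
    return 1.0 if is_correct else 0.0


def score_bridging(ideas_credited, ideas_in_key):
    """Partial credit for a bridging question worth 1 point in total.

    Assumption: each idea in the answer key is worth an equal share, so a
    two-idea key gives half-point steps and a four-idea key quarter-point steps.
    """
    if ideas_in_key <= 0:
        raise ValueError("answer key must contain at least one idea")
    if not 0 <= ideas_credited <= ideas_in_key:
        raise ValueError("credited ideas must be between 0 and the key size")
    return ideas_credited / ideas_in_key


def percent_agreement(rater_a, rater_b):
    """Share of responses given identical scores by two raters (assumed index)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of responses")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


# Endotherm example: the key has two ideas; a response that mentions only
# shivering is credited with one of them.
print(score_bridging(1, 2))  # 0.5
```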
2.2.3. Individual difference measures
Three types of individual difference measures were collected: reading skill,
biology knowledge, and topic-specific knowledge on the topic of the text. Reading
skill was measured using the Nelson-Denny (Brown, Fishco, & Hanna, 1993)
reading comprehension ability test. The Nelson-Denny reading comprehension
ability test is a standardized reading comprehension test for college level students.
Cronbach’s alpha of the 38 questions based on 170 participants in this study was .90.
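For readers unfamiliar with the reliability index reported here, Cronbach's alpha can be computed from a participants-by-items score matrix as k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below is a minimal illustration using sample variances; the function name and the toy data are assumptions, not the study's data or analysis code.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row of k item scores per participant)."""
    if len(item_scores) < 2:
        raise ValueError("need at least two participants")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    columns = list(zip(*item_scores))            # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Toy data: two items that always agree yield the maximum value of alpha.
toy = [[1, 1], [0, 0], [1, 1], [0, 0]]
print(cronbach_alpha(toy))
```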
Biology knowledge was assessed with 21 multiple-choice questions on
anatomy, reproduction, and genetics. Cronbach’s alpha for these 21 questions was
.61. Topic-specific knowledge of plants and the distribution of heat was
measured with a total of 16 open-ended questions on plant biology
(8 questions) and animal circulatory systems (8 questions). Cronbach’s alpha of these
16 questions was .73. The questions in the topic-specific knowledge measure
involved information relevant to understanding the texts, but not provided in the texts.
2.3. Procedure
Participants were first administered the biology knowledge test, followed by
the Nelson-Denny reading skills assessment. Each test (i.e., prior knowledge and
Nelson-Denny) was restricted to 15 minutes. The participants then read the texts and
answered the questions, which were presented in a booklet. Participants read two
texts, one low-cohesion text and one high-cohesion text, on two different topics (i.e.,
plant or heat) and then answered comprehension questions for both texts, in the order
of text presentation. Pairing of topic and cohesion was counterbalanced such
that half of the participants read the Heat text in the high-cohesion condition (and the
Plant text in the low-cohesion condition) and the other half read the Plant text in the
high-cohesion condition (and the Heat text in the low-cohesion condition). Order of presentation was also
counterbalanced such that half of the participants read the high-cohesion text first
and the other half read the low-cohesion text first. Hence, participants were randomly
assigned to one of four counterbalancing conditions.
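The four counterbalancing conditions follow from crossing the topic-cohesion pairing with presentation order, and can be enumerated as in the sketch below. The assignment function is an assumption for illustration: it fills the conditions as evenly as possible and then shuffles, whereas the text states only that assignment was random. The function names and the seed are hypothetical.

```python
import random
from itertools import product

TEXTS = ("Heat", "Plant")


def counterbalancing_conditions():
    """Return the four conditions as ordered pairs of (topic, cohesion)."""
    conditions = []
    for high_topic, high_first in product(TEXTS, (True, False)):
        low_topic = "Plant" if high_topic == "Heat" else "Heat"
        high, low = (high_topic, "high"), (low_topic, "low")
        conditions.append((high, low) if high_first else (low, high))
    return conditions


def assign_conditions(n_participants, seed=0):
    """Randomly assign participants so that condition sizes differ by at most one.

    Assumption: balanced block assignment followed by a shuffle.
    """
    conditions = counterbalancing_conditions()
    slots = [conditions[i % len(conditions)] for i in range(n_participants)]
    random.Random(seed).shuffle(slots)
    return slots


for condition in counterbalancing_conditions():
    print(condition)
```

Each participant thus reads both topics and both cohesion levels, while topic and order effects are balanced across the sample.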
After the participants finished reading, the experimenter took the text away
from the participants so that they could not refer to the text to answer the questions.
The decision to use memory-based comprehension questions was based on two
factors. First, several studies upon which the present study builds used this technique
(Linderholm et al., 2001; O’Reilly & McNamara, 2006; Voss & Silfies, 1996).
Hence, use of this technique facilitated a more direct comparison with these studies.
Second, in classrooms, students read science texts to prepare for exams, during which
access to the textbook is generally restricted. Therefore, answering questions based on
memory for what they had read better simulates a classroom situation.
The participants were not allowed to return to the previous section of
questions after they had moved to the next set of questions. Topic-specific prior
knowledge questions were then presented. We presented the topic-specific
knowledge questions after the reading comprehension task because being exposed to
questions closely related to the text topic might influence the participants’ reading
behavior. Care was taken to ensure that the texts did not contain answers to the topic-
specific knowledge questions.