Using LSA to Automatically Identify
Givenness and Newness of Noun Phrases in Written Discourse
Christian F. Hempelmann, David Dufty, Philip M. McCarthy, Arthur C. Graesser,
Zhiqiang Cai, and Danielle S. McNamara
({chmplmnn, ddufty, pmmccrth, a-graesser, zcai, dsmcnamr}@memphis.edu)
Institute for Intelligent Systems/FedEx Institute of Technology
University of Memphis
Memphis, TN 38152
Abstract

Identifying given and new information within a text has long been addressed as a research issue. However, there has previously been no accurate computational method for assessing the degree to which constituents in a text contain given versus new information. This study develops a method for automatically categorizing noun phrases into one of three categories of givenness/newness, using the taxonomy of Prince (1981) as the gold standard. The central computational technique used is span (Hu et al., 2003), a derivative of latent semantic analysis (LSA). We analyzed noun phrases from two expository and two narrative texts. Predictors of newness included span as well as pronoun status, determiners, and word overlap with previous noun phrases. Logistic regression showed that span was superior to LSA in categorizing noun phrases, producing an increase in accuracy from 74% to 80%.

Introduction

Successive constituents in text, such as sentences or noun phrases (NPs), vary in how much new versus given information they contain. This distinction is not binary. For example, it is uncertain how to classify an idea that would have been inferred earlier in the text rather than explicitly stated, as will be discussed later. The aim of this paper is to assess the extent to which givenness and newness can be computed algorithmically from features of the text.

Automatic assessment of givenness is useful for a variety of NLP applications, including the assessment of student responses to automatic tutoring systems, paragraph recognition, discourse feature identification, and recall scoring. The present application was devised for implementation in Coh-Metrix (Graesser et al., 2004b), a text-processing tool that provides new methods of automatically assessing text cohesion, readability, and difficulty.

When considering the dimension of familiarity, text constituents can be classified into three partitions: given, partially given (based on various types of inferential availability), or not given (that is, new). When developing an automatic system, it is more natural to view new information as that information that is not given, rather than vice versa. So we would first need to compute how much given (old) information is in a constituent and then regard the remaining information as new. Therefore, any automated measure that describes how part of a text can be established as given by a reader is valuable as it will increase the amount of identified givenness.

To illustrate the basic distinctions of givenness, consider the following example.

(1) President Bush said on Friday he recognized that there were other solutions to bolster Social Security than his contentious proposal for personal retirement accounts, but they would be part of a broader overhaul of the country’s largest entitlement program.

In this example, Social Security is new when it is first mentioned, while the country’s largest entitlement program is coreferential with it. Thus, the constituent the country’s largest entitlement program is given information even though there are lexical differences that have to be bridged inferentially. Retirement accounts, on the other hand, is only inferentially available from Social Security; that is, it is neither fully new nor unexpected in view of the previous mention of Social Security. Thus, retirement accounts is neither given nor new but somewhere in between.

We propose that any word in a text must be considered situated on a continuum between wholly given and wholly new. By extension, any phrase, clause or sentence analyzed in whole or part can be assessed for its degree of givenness. Our goal in this paper is thus to explore methods for automatically extracting these degrees of givenness for particular sections of text. However, before discussing computational measures of givenness in more detail, the theoretical basis for the relevant concepts will be addressed in the next section.

Theoretical accounts of the given/new dimension

Halliday (1967) defines given information as “recoverable either anaphorically or situationally” from the preceding discourse, and new information, conversely, as not recoverable. Chafe (1975, 1987) defines given information as “knowledge which the speaker assumes to be in the consciousness of the addressee” (1975: 30). In Chafe’s initial binary framework of given and new, given information is previously activated, whereas new information is activated only by the current segment of text. Chafe then introduces a distinction between new, given, and a third category, ‘quasi-given’ (1977: 34). This third category is related to the inferential availability of information, and has been a central concept in modern approaches. Clark and Haviland (1977) extend the distinction using Gricean maxims, proposing a
‘given-new contract’ on the inferential processes involved in meaning construction. They argue that a speaker composes speech acts that have affiliated inferences and believes that the addressee has access to the inferences.

Terminology

The terms given and new are often used to refer to theme and rheme, respectively, as well as other similar dichotomies that adopt a functional sentence perspective (Mathesius, 1947). Such issues, including foregrounding, topicality or saliency, interact with givenness, and for this reason the terms are often used synonymously (see Steedman, 2000; Kruijff-Korbayová & Steedman, 2003, for a discussion of terminology and distinctions). While the theme is usually given, and the rheme is usually new, the theme sometimes contains new information. One example of this is when there is a change of subject, as in (2a). Similarly the rheme can also, and occasionally does, contain given information, as in the case of contrasts like (2b).

(2) a. Men work hard in order to be successful.
    b. Women work hard in order to be successful, too.

On the basis of sentence (2a), in sentence (2b) the theme women is new, while the rheme work hard in order to be successful is given. As can be seen from this example, it is entirely possible for a rheme to provide old information. We are primarily interested in the contextual and semantic aspects of the given/new distinction. Thus, we want to clearly distinguish the given/new dichotomy from theme/rheme, topic/comment, and so forth, rather than conflate them as does, for example, the BEAT system (Cassell, Vilhjálmsson, & Bickmore, 2001).

Another related line of research concerns notions such as primacy (Gernsbacher & Hargreaves, 1988) and recency effects (Caplan, 1972; Chang, 1980; von Eckhardt & Potter, 1985). Primacy effects are related to the assumption that words mentioned first in sentences, and sentences mentioned first in paragraphs, are more accessible in memory. Recency effects, on the other hand, are related to the assumption that words or sentences will be more accessible to memory when they have been more recently presented or when there are fewer words between them and the currently processed sentence.

These concepts also have implications for what can be considered given or new in a text. From a psychological perspective, a concept can only be considered given in any practical sense if the reader remembers it. Although we consider memory access a relevant and fruitful avenue for research in relation to givenness, it is beyond the scope of the present research. Instead, our purpose is to operationalize the given/new distinction purely in terms of semantic recoverability. Eventually, it will be possible to compare given-new with other discourse structuring devices, such as theme-rheme, and recency-primacy.

Prince’s (1981) taxonomy of given/new

A very influential definition of givenness is provided by Prince (1981). Prince developed a systematic taxonomy of given, inferable, and new information that can be used to hand-code written text for givenness (van Donzel, 1994; Kruijff-Korbayová & Kruijff, 2004; Prince, 1988; Strube, 1998). The present paper is facilitated by three crucial advantages of Prince’s approach. First, in contrast to other conceptualizations of givenness, she crafts her familiarity scale on a theoretical basis that integrates previous theoretical discussions (Chafe, 1975; Clark, 1967; Clark and Haviland, 1977; Halliday, 1967). Second, Prince does so without diluting givenness with other focusing and discourse structuring properties of text. Third, despite the complexity of the resulting model, she provides example analyses and a systematic methodology to apply her model. Because of the formal-theoretical nature, the clear focus of her approach, and the inclusion of a methodology, Prince’s work can be applied to text analyses and ultimately implemented computationally. Prince’s analysis is restricted to NPs, but we believe that a more general version of Prince’s theory that covers units other than NPs, prominently VPs, should be developed.

Prince identifies three different sources of givenness. First, Predictability/Recoverability (GivennessP) is based on the speaker’s assumption “that the hearer CAN PREDICT OR COULD HAVE PREDICTED that a PARTICULAR LINGUISTIC ITEM will or would occur in a particular position WITHIN A SENTENCE” (1981; emphases in the original). Second, Saliency (GivennessS) is based on the speaker’s assumption “that the hearer has or could appropriately have some particular thing/entity/… in his/her consciousness at the time of hearing the utterance.” Third, Shared Knowledge (GivennessK) is based on the speaker’s assumption “that the hearer ‘knows,’ assumes, or can infer a particular thing (but is not necessarily thinking about it).” On the basis of these three types, Prince proposes the following taxonomy:

(3) BN          brand-new
    BNA[__]     brand-new anchored [Anchor type]
    U           unused
    I(__)/__    inferrable (entity inferrable from type)/inference type
    IC(__)/__   containing inferrable (containing entity inferrable from type)/inference type
    E           (textually) evoked
    ES          situationally evoked

In this taxonomy, BN indicates an item that is neither previously mentioned in the text nor readily and immediately available to the reader given the current situation. In the following example, “Heat can move from one object or place to another”, the NPs heat, one object, and place are all considered BNs. BNA marked items are BN NPs that are tied to a given NP. For example, in the following sentence, “Chlorophyll traps the energy in sunlight”, the NP energy in sunlight is a BNA: the NP energy being a BN anchored to the
NP sunlight given in a previous sentence.

Consider the following sentence: “People use thermometers to measure the temperature.” People in this sentence is considered unused (U) because the concept of humans in general is readily available to all participants regardless of textual context. Other concepts such as the sun, the moon, and Genghis Khan, would also count as unused items. Clearly, this element of the Prince taxonomy is open to some question due to the subjective judgment concerning concepts that people have available. That said, the raters of the texts in this study did not encounter any instances in which agreement could not be reached.

ICs differ from Is in that they are inferences that can be made from inferences, in other words, two-word inferences. In this sense ICs are conceptually one step further removed from the textual item from which they are inferable. Consider the following sentence: “And he knew he would miss his home: the nights in the den watching sports, the barbecue parties in the backyard, his hideout in the attic, and of course, his room.” Both raters judged the NPs the nights in the den watching sports, the barbecue parties in the backyard, and his hideout in the attic as being IC items. The head of the NP the nights in the den watching sports is the nights, which is not inferentially available from an item such as his home. However, from his home we can infer that he would have a den, and from den we can infer that he might spend nights there watching sports. All other constituents are givens: An E has been previously mentioned, whereas an ES is situationally given. For example, the word you in a text is a given because you are in fact reading the text.

Prince’s implied hierarchy can be represented in an explicit familiarity scale (4a below). The scale posits that items further to the left are more familiar to the hearer. Thus, the Gricean maxim of quantity can be applied: Speakers choose the most familiar possible way of referring to a constituent. If they choose one that is not as familiar to the hearer as they assume, the hearer will not understand (too little information). If they choose one that is too familiar to the hearer, they run the risk of sounding childish (too much information).

We adopted Prince’s (1981) familiarity scale and translated it into values of newness from 0 (fully given) to 1 (fully new) as follows (4b):

(4) a. E/ES  >  U    >  I    >  IC   >  BNA  >  BN
    b. 0        0.2     0.4     0.6     0.8     1

It should be noted that these numbers are only used for computational convenience. The scale is ordinal, not an interval or ratio scale. Type of scale affects the types of statistical analyses that can be conducted, as indicated later.

All NPs in the sample corpus described below were hand-scored according to the Prince taxonomy by two independent experts in linguistics. Inter-rater agreement produced kappa of .72. Differences occurred between raters because Prince’s taxonomy is not unambiguous and frequently allows an NP to be assigned to multiple categories (cf. Poesio & Vieira, 1998). Differences occurred in about 18% of cases and were resolved by consultation between the scorers. For an illustration of potential disagreements between judges, consider the following sentence from our corpus: “When some of his friends came to say good bye, tears flowed down his face.” One rater viewed the NP tears as a BN whereas the other viewed it as an IC. Clearly there is a case for both. On the one hand, tears had not previously been mentioned (therefore tears is new); on the other hand, saying goodbye is often very sad, and sadness leads to tears (therefore tears is a containing inferable). Although these disagreements occurred, judges were able to resolve disputes after some discussions.

LSA-Based Automated Measures for Given/New

In earlier work (Dufty et al., 2005), we evaluated a range of computational measures for given/new, including constituent/lexical/stem/lemma overlap, a simplified version of coreference on the basis of ontological semantics (Nirenburg & Raskin, 2004), as well as measures based on Latent Semantic Analysis (LSA). In the present paper we explore the capabilities of LSA in more detail.

LSA is a technique for computing the similarity of words by representing them in a vector space and computing the cosine of the angle between vectors for pairs of words (Landauer et al., 1998). Higher cosines represent greater similarity. The vector space is created by constructing a co-occurrence matrix out of a large corpus of texts. The space is then reduced using singular value decomposition, such that each word is represented in a space of approximately 300 dimensions. The dimensions themselves have no meaning, but are merely statistical constructs. Meaning is extracted by comparing the similarity of vectors in the space. LSA can be used to evaluate the similarity of text segments of any size through vector addition. For example, the similarity of two paragraphs can be calculated by adding all the vectors for words in the first paragraph to create a paragraph vector, adding the vectors for words in the second paragraph to create a second vector, then taking the cosine of the two paragraph vectors as an estimate of the similarity between them. LSA has been used for a variety of applications such as automated tutoring systems (Graesser et al., 2004a), essay grading (Foltz et al., 1999), and evaluating text coherence (Foltz et al., 1998).

LSA might seem at first glance to be the ideal candidate for evaluating the givenness of a segment of text. By comparing the vector of the current sentence with the vector for the preceding text, some estimate can be gained of the similarity of the current sentence with prior text. However, the concept of givenness, while related, is distinct from the concept of similarity. On the one hand, for a text item to be given, it need only be coreferential with one previous item. LSA captures overall similarity with the text, rather than a particular constituent. Thus, while the previous text may contain the very item that is being compared for its similarity, the measure takes all the other items in the preceding text into account as well. This dilutes the score considerably. On the
other hand, a text item can be partially given on the basis of its inferential availability and world knowledge. LSA is not a symbolic approach, so it can only roughly approximate this.

Our second main measure, based on a variant of LSA, was developed for the specific purpose of detecting new information. The method is called span (Hu et al., 2003). It was formulated to test the accuracy of student answers in the automated tutoring system AutoTutor (Graesser et al., 2004a). Rather than simply adding vectors, span constructs a hyperplane out of all previous vectors. The comparison vector (in this case the current sentence in the text) is projected onto the hyperplane. The projection of the sentence vector on the hyperplane is considered to be the component of the vector that is shared with the previous text, or given (G). The component of the vector that is perpendicular to the hyperplane is considered to be the component of the sentence that is new (N). To calculate the newness of the information, a proportion score is then taken: Span(new information) = N/(N+G). N is the component of the vector that is perpendicular to the hyperplane and G is the projection of the vector along the hyperplane.

Span captures newness in a more sophisticated way than standard LSA. Standard LSA combines all previous text into a single composite vector and compares the sentence to that vector. In doing so, much of the information contained in vectors of individual sentences is lost, as the individual vectors can cancel each other out. Span constructs a hyperplane out of all the vectors of all the sentences, and compares the new sentence to that space. This method means that no information in the individual vectors is lost.

Materials, Method, and Results

We selected four texts of approximately equal size from 4th grade textbooks: two narrative texts, ‘Moving’ (McGraw-Hill Reading - TerraNova Test Preparation and Practice - Teacher’s Edition) and ‘Orlando’ (Addison Wesley Phonics Take-Home Reader Grade 2), and two expository texts, ‘The Needs of Plants’ (McGraw-Hill Science) and ‘Effects of Heat’ (SRA Elementary Science). The texts contained 478 NPs in total, across 195 sentences.

The NPs in the texts were hand-coded according to the original categories postulated by Prince, conflating the two types of evoked, as they are both fully given, resulting in six categories. There was an inter-rater reliability of .74 given by kappa, with 88% of cases rated the same by both raters. The six categories ultimately had to be collapsed into three because of the sparseness of data. Two of the categories, unused and containing inferables, had very low counts (3 and 8 respectively), rendering them unsuitable for categories in a logistic regression. We therefore reduced the number of categories and used the common three-category system: given, new, and inferable. Hence we collapsed both of these into the category of inferable, and collapsed brand-new anchored into brand-new. Thus, the intermediate category between fully given (0) and fully new (1) subsumed all instances of NPs that were neither entirely given nor entirely new, such as unused, inferrable, containing-inferrable, and brand-new anchored.

NPs were then coded for the following binary properties: whether the NP was a pronoun, whether the NP was preceded by the definite[1] article, and whether any content word in the NP had occurred in a previous NP (a modification of argument overlap; Kintsch & van Dijk, 1978). All binary variables were coded as 1=yes, 0=no.

[1] Since the class of NPs that surface as definite is not at all coextensive with those that speakers assume can be familiar to hearers, we will not focus on the notion of definiteness beyond its use as an auxiliary identifier for givenness. While every definite NP is given under our definition, not every given NP is definite. In general, ours is the opposite vantage point from that of existing work on definiteness (e.g., Fraurud, 1990; Vieira & Poesio, 2000). They are interested in definiteness, which givenness and other semantic phenomena can help them account for. We are rather interested in a semantic phenomenon, givenness, that surface phenomena like definiteness can help to identify. A third type of related approach may be looking for other classifications that surface partially as definiteness and are partially caused by givenness, e.g. Uryupina’s (2003) unique and discourse-new.

Two computational measures were calculated based on LSA. The first was the LSA similarity between the NP and all previous noun phrases in the text. The second was the span measure between the NP and all previous noun phrases. Table 1 shows descriptive statistics for all predictor variables. The frequencies for the criterion variable, Prince category, across all 478 NPs were 317, 116, and 45, for given, inferable, and new observations, respectively.

Table 1: Descriptive statistics for all predictor variables

Binary variables             Yes     No
Pronoun                      111     367
Definite article             71      407
Word overlap                 141     347

Continuous variables         Mean    s.d.
LSA cosine with prev. NPs    .20     .27
Span with previous NPs       .29     .32

Two ordinal logistic regressions were performed with the hand-coded Prince categories as the dependent variable. An alpha level of .05 was used for all significance tests. The first analysis tested a predictive model consisting of the three binary variables (pronoun, definite article, and content word overlap), as well as LSA cosines as predictors. The second analysis tested the model in which span was added. The coefficients generated from both these analyses are shown in Table 2.

As can be seen from Table 2, LSA contributed to the categorization of NPs in the first model. As expected, pronouns and definite articles were more likely to reflect given information. Pronouns tend to refer to earlier entities in
the text or pragmatically available information. Nouns are typically (but not always) preceded by the indefinite article when they represent new information, and by the definite article on subsequent mentions. Content word overlap showed a modest positive relationship with newness, which is the opposite direction to its theoretical relationship with newness. This is probably a suppression effect caused by LSA, since LSA and content word overlap attempt to capture a similar aspect of the text.

The addition of span into the second model produced an increase in predictive accuracy from 74% to 80%, which an incremental chi-square test showed to be significant, χ2(1, N = 478) = 183.07, p < .05. Span also displaced both LSA and content word overlap as significant predictors from the second model.

Table 2: Ordinal logistic regression analysis of Prince categorizations using pronouns, definite articles, and content word overlap, and comparing LSA and span

                              β        S.E. β   Wald’s χ2   df
Model 1: Span not included as predictor [a]
  Threshold
    Prince = 0 (given)        4.74     1.05     20.24**     1
    Prince = 1 (intermed.)    6.68     1.07     39.04**     1
  Predictor
    LSA                       -4.70    .67      49.77**     1
    Pronoun                   -5.17    1.03     25.14**     1
    Content word overlap      .83      .34      6.10**      1
    Definite article          -.79     .40      3.88*       1
Model 2: Span included as predictor [b]
  Threshold
    Prince = 0 (given)        8.94     1.12     56.89**     1
    Prince = 1 (intermed.)    12.12    1.18     89.86**     1
  Predictor
    LSA                                                     1
    Span                      6.79     0.54     158.06**    1
    Pronoun                   -6.38    1.08     35.10**     1
    Content word overlap      0.68     0.38     3.25        1
    Definite article          -1.00    0.46     4.67*       1

* significant at .05 level
** significant at .01 level
[a] model χ2(4, N=478) = 173.22, p<.05. Accuracy 74%
[b] model χ2(5, N=478) = 366.29, p<.05. Accuracy 80%

In the second model, the largest contribution to prediction of newness category was made by span, followed by pronominalization. This demonstrates the different contributions that these variables make to predicting newness category. Span captures the semantic relationship between each NP and previous noun phrases. This relationship is invisible when a pronoun is used, because of span’s reliance on lexical-semantic relationships between content words. Conversely, pronouns capture indirect reference to earlier noun phrases, which in turn is invisible to LSA and span.

For comparison, an ordinal regression was also performed without either span or LSA, but retaining the three binary variables, definite article, pronoun, and content word overlap. The resulting model achieved 66% accuracy, which, given that the most common category occurred 66% of the time, is no better than always predicting the most common category, that is, no different from chance.

Discussion

We developed a multivariate model of givenness and newness using word repetition, pronominalization, articles, and a continuous measure of newness, span. The model allocated NPs to one of the three categories of newness with 80% accuracy, when compared to human ratings. Agreement between the human raters, in this case 88%, may be considered to be the benchmark of performance. Against this benchmark, span’s performance, with a difference of 8 percentage points, is very promising. Completely automatable measures were able to approximate hand-coded ratings by experts.

In a separate analysis, standard LSA was also a significant predictor of newness, although it was 6 percentage points less accurate than span. LSA was originally developed as a measure of similarity between two items of text, while span is specifically a measure of the newness of one text in comparison to another. The results confirm that span is a more appropriate measure when newness, rather than similarity, is the concept of interest in the text.

The analysis that only used simple algorithmic indicators such as whether the NP is a pronoun, whether the NP begins with the, or whether the NP repeats content words from an earlier NP, did no better than chance. This demonstrates the importance of similarity metrics such as span in determining linguistic and psycholinguistic properties of text.

The present results provide a bridge between theoretical linguistics and computational linguistics: they provide a reliable mapping between categories of newness as described by Prince (1981) and computable text-based variables.

Acknowledgements

This research was funded by Institute of Education Sciences Grant IES R3056020018-02. Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the IES.

References

Caplan, D. (1972). Clause boundaries and recognition latencies for words in sentences. Perception and Psychophysics, 12, 73-76.
Cassell, J., Vilhjálmsson, J., and Bickmore, T. (2001). BEAT: the Behavior Expression Animation Toolkit. Proceedings of SIGGRAPH. New York: ACM. 477-486.
Chafe, W. L. (1975). Givenness, Contrastiveness, Definiteness, Subjects, Topics, and Point of View. In: C. N. Li (Ed.), Subject and Topic (pp. 26-55). New York: Academic.
Chafe, W. L. (1987). Cognitive Constraints of Information Flow. In: R. S. Tomlin (Ed.), Coherence and Grounding in Discourse (pp. 21-51). Amsterdam, Philadelphia: Benjamins.
Chang, F. R. (1980). Active memory processes in visual sentence comprehension: Clause effects and pronominal references. Memory and Cognition, 8, 58-64.
Clark, H. H. & Haviland, S. E. (1977). Comprehension and the Given-New Contract. In: R. O. Freedle (Ed.), Discourse Production and Comprehension (pp. 1-40). Norwood, NJ: Ablex.
Dufty, D. F., Graesser, A. C., Hempelmann, C. F., & McCarthy, P. M. (2005). Givenness and Newness of Information: Automated Identification in Written Discourse. Manuscript submitted for publication.
Foltz, P. W., Kintsch, W., and Landauer, T. K. (1998). The measurement of textual Coherence with Latent Semantic Analysis. Discourse Processes, 25, 285-307.
Foltz, P. W., Laham, D., & Landauer, T. K. (1999). The Intelligent Essay Assessor: Applications to Educational Technology. Interactive Multimedia Electronic Journal of Computer-Enhanced Learning 1-2. Available at: http://imej.wfu.edu/articles/1999/2/04/printver.asp.
Fraurud, K. (1990). Definiteness and the processing of NPs in natural discourse. Journal of Semantics, 7, 395-433.
Gernsbacher, M. A. & Hargreaves, D. (1988). Accessing sentence participants: The advantage of first mention. Journal of Memory and Language, 27, 699-717.
Graesser, A. C., Lu, S., Jackson, G. T., Mitchell, H., Ventura, M., Olney, A., & Louwerse, M. M. (2004a). AutoTutor: A tutor with dialogue in natural language. Behavior Research Methods, Instruments, and Computers, 36, 180-193.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004b). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, and Computers, 36, 193-202.
Halliday, M. A. K. (1967). Notes on Transitivity and Theme in English. Journal of Linguistics, 3, 199-244.
Hu, X., Cai, Z., Louwerse, M. M., A. Olney, P. Penumatsa, A. C. Graesser, & TRG. (2003). A revised algorithm for Latent Semantic Analysis. Proceedings of the 2003 International Joint Conference on Artificial Intelligence: 1489-1491.
Kintsch, W., & Van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363-394.
Kruijff-Korbayová, I. & Kruijff, G. J. M. (2004). Discourse-Level Annotation for Investigating Information Structure. Paper presented at the ACL 2004 Workshop on Discourse Annotation, Barcelona, Spain, July 25-26, 2004.
Kruijff-Korbayová, I. & M. Steedman. (2003). Discourse and Information Structure. Journal of Logic, Language and Information: Special Issue on Discourse and Information Structure, 12, 249-259.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259-284.
Mathesius, V. (1947). O tak zvaném aktuálnim clenení vetném. /On the so-called actual articulation of the sentence./ In: Cestina a obecny jazykozpyt. Prague: Melantrich.
Nirenburg, S. and Raskin, V. (2004). Ontological Semantics. Cambridge: MIT Press.
Poesio, M. & Vieira, R. (1998). A corpus-based investigation of definite description use. Computational Linguistics, 24-2, 183-216.
Prince, E. (1981). Towards a Taxonomy of Given-New Information. In: Cole, Peter. Ed. Radical Pragmatics. New York: Academic: 223-255.
Prince, E. (1992). The ZPG Letter: Subjects, Definiteness, and Information-status. In: S. Thompson, & W. Mann (Eds.), Discourse Description: Diverse Analyses of a Fund Raising Text. Philadelphia/Amsterdam: Benjamins. 295-325.
Steedman, M. (2000). Information Structure and the Syntax-Phonology Interface. Linguistic Inquiry 34, 649-689.
Strube, M. (1998). Never look back: An alternative to centering. In: Proceedings of the 36th Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, 1251-1257.
Uryupina, O. (2003). High-precision Identification of Discourse New and Unique Noun Phrases. In: Proceedings of the ACL Student Workshop, Sapporo, 2003.
van Donzel, M. E. (1994). How to specify focus without using acoustic features. Proceedings, 18. Institute of Phonetic Sciences, University of Amsterdam: 1-17.
Vieira, R., & Poesio, M. (2000). An Empirically-Based System for Processing Definite Descriptions. Computational Linguistics, 26-4, 539-593.
von Eckhardt, B. & Potter, M. C. (1985). Clauses and the semantic representation of words. Memory and Cognition, 13, 371-376.
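
The vector-addition comparison described in the section on LSA-based measures (summing word vectors into segment vectors and taking their cosine) can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function names are hypothetical, and the three-dimensional vectors in TOY_VECTORS are invented stand-ins for a real LSA space of about 300 dimensions.

```python
import math

def add_vectors(vectors):
    """Sum a list of equal-length vectors component-wise."""
    return [sum(components) for components in zip(*vectors)]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def segment_similarity(segment_a, segment_b, word_vectors):
    """Cosine between the summed word vectors of two text segments."""
    vec_a = add_vectors([word_vectors[w] for w in segment_a if w in word_vectors])
    vec_b = add_vectors([word_vectors[w] for w in segment_b if w in word_vectors])
    return cosine(vec_a, vec_b)

# Hypothetical toy vectors (assumption): invented values, not from a trained space.
TOY_VECTORS = {
    "heat": [0.9, 0.1, 0.0],
    "temperature": [0.8, 0.2, 0.1],
    "plants": [0.1, 0.9, 0.2],
    "sunlight": [0.3, 0.8, 0.1],
}

same_topic = segment_similarity(["heat"], ["temperature"], TOY_VECTORS)
mixed_topic = segment_similarity(["heat"], ["plants", "sunlight"], TOY_VECTORS)
print(same_topic > mixed_topic)
```

Because every earlier word is folded into one composite vector, a single coreferential item can be outweighed by unrelated material, which is the dilution effect the paper attributes to standard LSA.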
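
The span score Span(new information) = N/(N+G), described in the section on LSA-based measures, can be sketched in the same spirit. The sketch below is an illustration under stated assumptions rather than the algorithm of Hu et al. (2003): it treats the "hyperplane" as the subspace spanned by the previous vectors, builds an orthonormal basis with Gram-Schmidt, and takes N and G to be the lengths of the perpendicular and projected components. All function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the subspace spanned by `vectors`."""
    basis = []
    for v in vectors:
        residual = list(v)
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * x for r, x in zip(residual, b)]
        length = norm(residual)
        if length > tol:  # skip vectors already inside the current subspace
            basis.append([r / length for r in residual])
    return basis

def span_newness(current, previous, tol=1e-10):
    """Return N / (N + G) for `current` relative to the subspace of `previous`.

    Assumption: G is the length of the projection onto the subspace (given)
    and N is the length of the perpendicular remainder (new).
    """
    projection = [0.0] * len(current)
    for b in orthonormal_basis(previous, tol):
        coeff = dot(current, b)
        projection = [p + coeff * x for p, x in zip(projection, b)]
    perpendicular = [c - p for c, p in zip(current, projection)]
    given, new = norm(projection), norm(perpendicular)
    if given + new <= tol:
        return 0.0  # zero vector: nothing to score
    return new / (new + given)

previous_nps = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical NP vectors
print(span_newness([1.0, 1.0, 0.0], previous_nps))  # lies inside the subspace
print(span_newness([0.0, 0.0, 1.0], previous_nps))  # orthogonal to the subspace
```

Unlike the composite-vector cosine, each earlier vector contributes its own direction to the subspace, so an item that matches one previous constituent exactly scores as fully given regardless of what else precedes it.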
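
The comparison between the 66% accuracy of the binary-variable model and the frequency of the most common category can be checked with a few lines of Python. This is only an arithmetic illustration: the counts are the category frequencies reported for the 478 NPs, while the function name majority_baseline is hypothetical.

```python
from collections import Counter

# Category frequencies reported for the 478 hand-coded NPs.
CATEGORY_COUNTS = Counter({"given": 317, "inferable": 116, "new": 45})

def majority_baseline(counts):
    """Return the most frequent label and the accuracy of always predicting it."""
    label, frequency = counts.most_common(1)[0]
    return label, frequency / sum(counts.values())

label, baseline_accuracy = majority_baseline(CATEGORY_COUNTS)
print(f"{label}: {baseline_accuracy:.1%}")  # given: 66.3%
```

Against this baseline, the reported accuracies of 74% (with LSA) and 80% (with span) are gains of roughly 8 and 14 percentage points, respectively.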