Procesamiento del Lenguaje Natural, nº39 (2007), pp. 205-212
recibido 02-05-2007; aceptado 22-06-2007
Text as Scene: Discourse Deixis and Bridging Relations
Marta Recasens, M. Antònia Martí, Mariona Taulé
Universitat de Barcelona
Gran Via Corts Catalanes, 585, 08007 Barcelona
mrecasens@ub.edu, amarti@ub.edu, mtaule@ub.edu
Abstract: This paper presents a new framework, “text as scene”, which lays the foundations for
the annotation of two coreferential links: discourse deixis and bridging relations. The
incorporation of what we call textual and contextual scenes provides more flexible annotation
guidelines, broad type categories being clearly differentiated. Such a framework that is capable
of dealing with discourse deixis and bridging relations from a common perspective aims at
improving the poor reliability scores obtained by previous annotation schemes, which fail to
capture the vague references inherent in both these links. The guidelines presented here
complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with
coreference information, thus building the CESS-Ancora corpus.
Keywords: corpus annotation, anaphora resolution, coreference resolution.
Resumen: En este artículo se presenta un nuevo marco, “el texto como escena”, que establece
las bases para la anotación de dos relaciones de correferencia: la deixis discursiva y las
relaciones de bridging. La incorporación de lo que llamamos escenas textuales y contextuales
proporciona unas directrices de anotación más flexibles, que diferencian claramente entre tipos
de categorías generales. Un marco como éste, capaz de tratar la deixis discursiva y las
relaciones de bridging desde una perspectiva común, tiene como objetivo mejorar el bajo grado
de acuerdo entre anotadores obtenido por esquemas de anotación anteriores, que son incapaces
de captar las referencias vagas inherentes a estos dos tipos de relaciones. Las directrices aquí
presentadas completan el esquema de anotación diseñado para enriquecer el corpus español
CESS-ECE con información correferencial y así construir el corpus CESS-Ancora.
Palabras clave: anotación de corpus, resolución de la anáfora, resolución de la correferencia.
1 Introduction

Due to the lack of large annotated corpora with anaphoric information, the field of computational coreference resolution is still highly knowledge-based, especially for languages other than English. With a view to building a corpus-based coreference resolution system for Spanish, our project is to extend the morphologically, syntactically and semantically annotated CESS-ECE corpus (500,000 words) with pronominal and full noun-phrase (NP) coreference information (thus building the CESS-Ancora corpus). The design of the annotation guidelines is presented in (Recasens, Martí & Taulé, 2007), but two types of coreferential links, namely discourse deixis[1] and bridging relations[2], call for a specific analysis which takes into account their complex peculiarities so as to determine the most appropriate set of attributes and values.

We believe that the more consistent the linguistic basis underlying the annotation scheme is, the easier it is to build a state-of-the-art coreference resolution system. On the other hand, coreferential relations, and anaphoric ones in particular, are very much specific to each language. Unlike English, for instance, Spanish has three series of demonstratives and pronouns marked for neuter gender. The typology presented in this paper is the completion of a flexible annotation scheme rich enough to cover the cases of coreference in Spanish.

[1] We define discourse deixis (or abstract anaphora) as reference to a discourse segment, that is, to a non-nominal antecedent.
[2] Our approach classifies as bridging (or associative anaphors) those definite or demonstrative NPs that are interpreted on the grounds of a metonymic relationship with a previous NP or VP.
ISSN: 1135-5948
© 2007 Sociedad Española para el Procesamiento del Lenguaje Natural

Marta Recasens, Antonia Martí Antonín y Mariona Taulé
Apart from being a useful resource for training and evaluating coreference resolution systems for Spanish, from a linguistic point of view, the annotated corpus will serve as a workbench to test for Spanish the hypotheses suggested by Ariel (1988) and Gundel, Hedberg & Zacharski (1993) about the cognitive factors governing the use of referring expressions. The only way theoretical claims coming from a single person's intuitions can be proved is on the basis of empirical data that have been annotated in a reliable way.

As a follow-up, this paper places the emphasis on the annotation guidelines for discourse deixis and bridging relations. Both are considered from a common perspective: what we call "text as scene", that is, the text taken as a scene in the sense that it builds up both a textual and a contextual framework as the result of an interaction between the discourse and the global context.

The rest of the paper proceeds as follows: Section 2 reviews previous work on abstract and bridging anaphora. A description of the "text as scene" framework is provided in Section 3. Specific guidelines for annotating discourse deixis and bridging relations are given in Section 4. Finally, Section 5 presents our conclusions and discussion of the guidelines.

2 Previous work

Given the difficulty of dealing with antecedents other than NPs, most of the work on anaphora resolution has ignored abstract anaphora and has been limited to individual anaphora. However, the work of Byron (2002) has emphasized that demonstrative pronouns referring to preceding clauses abound in natural discourse[3]. In this line, the corpus-based study of the use of demonstrative NPs in Portuguese and French conducted by Vieira et al. (2002) has pointed out that a system limited to the resolution of anaphors with a nominal antecedent is likely to fail on about 30% of the cases.

[3] Byron's anaphora resolution algorithm differentiates Mentioned Entities (those evoked by NPs) from Activated Entities (those evoked by linguistic constituents other than NPs, involving global focus entities).

In her seminal study, Webber (1988) coins the term "discourse deixis" for reference to discourse segments and argues that these should be included in the discourse model as discourse entities, since they can be subsequently referenced via deictic expressions. Nevertheless, a discourse entity corresponding to a textual segment is not added to the discourse model until the listener finds a subsequent deictic pronoun, in the so-called accommodation process[4]. Work on parsing texts into discourse segments (Marcu, 1997) has not dealt with the problem of discourse deixis, i.e. delimiting the extent of the antecedent.

[4] Accommodation results from the use of a singular definite, which is felt to presuppose that there is already a unique entity in the context with the given description that will allow a truth value to be assigned to the utterance (Lewis, 1979).

With respect to corpus annotation, there are not many annotation schemes that annotate antecedents other than NPs. The MUC Task Definition (Hirschman & Chinchor, 1997) explicitly defines demonstratives as non-markables. Two notable exceptions are the MATE scheme by Poesio (2000) and the scheme by Tutin et al. (2000), although both point out the difficulty of delimiting the exact part of the text that counts as antecedent as well as the type of object the antecedent is. Tutin et al. (2000) decide to select the largest possible antecedent.

As with discourse deixis, authors seem sceptical about the feasibility of the annotation task for bridging relations, especially since the empirical study conducted by Poesio & Vieira (1998), which reported an agreement of 31%. The issue under debate is where the boundary lies between a discourse-new NP and a bridging one, that is, between autonomous and non-autonomous definite NPs. Fraurud's (1990) starting point for her corpus-based study is a two-way distinction between first-mentions and subsequent mentions (coreferential NPs). On realising that 60% of the definite NPs were first-mention uses, she concludes that in addition to the syntactic (in)definiteness of an NP, the lexico-encyclopaedic knowledge associated with the head noun of the NP interacts with the general knowledge associated with present anchors in order to select one or more anchors in relation to which a first-mention definite NP is interpreted. Anchors may be provided in the discourse itself (either explicitly or implicitly), by the global context, or by a combination of the two. Although Fraurud does not use the term, the first-mention NPs that are interpreted in relation to an explicit anchor correspond to "bridging relations".
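
Agreement figures such as the 31% cited above come from comparing the labels that different annotators assign to the same items. As a minimal sketch, and not the procedure used in any of the studies cited, the following Python code computes observed agreement and Cohen's kappa (a chance-corrected measure) for two annotators. The label names and the toy annotations are invented for illustration.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Proportion of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy data (hypothetical): each position is one definite NP judged by both annotators.
annotator_a = ["bridging", "first-mention", "coreferent",
               "bridging", "first-mention", "coreferent"]
annotator_b = ["bridging", "first-mention", "coreferent",
               "first-mention", "first-mention", "bridging"]

print(observed_agreement(annotator_a, annotator_b))
print(cohen_kappa(annotator_a, annotator_b))
```

Reporting both numbers makes it visible how much of the raw agreement could be expected by chance, which matters when one category (for example, first-mention) dominates the data.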
In their analysis of the use of pronouns and demonstrative NPs in bridging relations, Gundel, Hedberg & Zacharski (2000) conclude that such cases are best analysed as minor violations of the Givenness Hierarchy, in that the listener gets away with an underspecified referent on the basis of what is predicated in the text.

What, then, do discourse deixis and bridging relations have in common? On the one hand, they are the anaphoric links with the poorest reliability scores. On the other hand (and probably as a cause of the former), their antecedents are rather fuzzy, either because their extension cannot be clearly determined or because the semantic relation that links them with their anaphor cannot be easily identified. Taking into account the low inter-annotator agreement together with the idea of vague reference, we propose viewing the text as a scene in order to provide a wider contextual framework that captures those cases in which a discourse entity alludes to something that is not literally mentioned in the discourse.

3 Text as scene

Previous attempts at annotating coreference have shown the need for reconsidering the annotation of both discourse deixis and bridging relations, since the reference of NPs such as esto, la cosa, and este mercado in (1), (2) and (3) respectively[5] cannot be accounted for by approaches that insist on linking each anaphoric expression to an explicit textual antecedent.

[5] The reader is asked to please forgive the length of most of the examples used in this paper, but the anaphoric expressions we deal with make no sense unless the context is provided.

    (1) El Komercni Banka –Banco Comercial–, uno de los cuatro bancos más grandes de la República Checa, anunció hoy que despedirá a 2.300 empleados más antes de finales del año dentro del proceso de saneamiento de la entidad estatal. El director del banco, Radovan Vrava, señaló que el motivo principal es la reestructuración del banco. El Estado dispone del 60 por ciento de las acciones del Komercni Banka y el Gobierno checo quiere comenzar el proceso de privatización de este banco ya en este año y terminarlo en septiembre del 2001. Otro de los objetivos es evitar que se repitan los errores del pasado, que obligaron al Gobierno a comprar créditos dudosos por un valor de 60.000 millones de coronas –1.500 millones de dólares. Esto permitirá al banco sanear su portafolio...[6]

[6] (1) The Komercni Banka (Commercial Bank), one of the four biggest banks in the Czech Republic, announced today that it will dismiss 2,300 more workers by the end of the year within the reform process of the state entity. The director of the bank, Radovan Vrava, pointed out that the main reason is the restructuring of the bank. The State owns 60 per cent of the shares of the Komercni Banka and the Czech Government wants to begin the privatisation process of this bank already this year and finish it in September 2001. Another of the goals is to avoid the repetition of past mistakes, which forced the Government to buy doubtful credits for the price of 60,000 million crowns (1,500 million dollars). This will allow the bank to reform its portfolio.

    (2) “Las previsiones para los próximos diez días no son nada halagueñas”, pronosticó ayer Eduardo Coca, director del Instituto Nacional de Meteorología. Tan sólo un pequeño frente con poca agua debía cruzar el norte de la península entre ayer y hoy. Por lo demás, seguirá la situación anticiclónica. Pero la cosa no acaba ahí.[7]

[7] (2) "The forecasts for the next ten days are not favourable at all", forecast yesterday Eduardo Coca, director of the National Institute of Meteorology. Only a small front with little water should cross the north of the peninsula between yesterday and today. As for the rest, the anticyclonic situation will persist. But the thing does not end there.

    (3) El presidente de la Comisión del Mercado de las Telecomunicaciones mostró su preocupación por la falta de competencia en la telefonía local, como consecuencia de que la liberalización de las telecomunicaciones se ha hecho por principios jurídicos y no técnicos y que “hay que abrir este mercado como sea”.[8]

[8] (3) The president of the Commission of the Market of Telecommunications showed his concern for the lack of competition in local telephony, as a consequence of the fact that the liberalisation of telecommunications has been done by juridical and not technical principles and that "this market must be opened at all costs".

Our coding scheme is defined from the consideration of the text as a scene in two different senses (see Figure 1), the scene being the cohesive element. On the one hand, discourse deixis captures those anaphoric expressions that refer back to the textual scene, that is, to a discourse segment (either at the sentence level or beyond the sentence) that builds up a scene as a whole. On the other hand, bridging captures those implicit relations (between two discourse entities) that are enabled by the contextual scene activated by the involved entities. A contextual scene is taken to be the knowledge which does not explicitly appear in the text, but that contributes to its comprehension. Bridging is treated within coreference in the sense that the two discourse entities share the reference point on the basis of a contextual scene.

[Figure 1: Textual and contextual scenes. Two panels. The panel labelled "Discourse deixis" shows an excerpt of example (2) ending in "Pero la cosa no acaba ahí"; the panel labelled "Bridging relation" shows an excerpt of example (3) containing "la telefonía local" and "este mercado", together with the label "ctx-sc".]

Back to example (1), the discourse segment picked up by the pronoun esto (that which is going to allow the Czech bank to reform its portfolio) results not only from the last discourse segment, but from combining the content of the events that form the entire textual scene: the dismissal of 2,300 workers, the restructuring of the bank, its privatisation, and the avoidance of past mistakes. Similarly, the definite NP la cosa in (2) makes reference to the textual scene previously described. It becomes a quasi-pronominal form in that it is almost semantically empty. Finally, example (3) shows a case of bridging, where the interpretation of the demonstrative NP este mercado is made possible by the contextual scene activated by a former NP, la telefonía local, namely, the market opened by local telephony.

Text as scene provides a common framework within which we are able to reach a consensus as to the typology of referring expressions that can code discourse deixis and bridging relations as well as the subtypes of links that need to be annotated with a view to achieving a level of inter-annotator agreement as high as possible.

4 Corpus annotation

The CESS-ECE corpus is the largest annotated corpus of Spanish; it contains 500,000 words, mostly from newspaper articles. It has been annotated with morphological information (PoS), syntactic constituents and functions, argument structures and thematic roles, and tagged with strong and weak named entities; in addition, the 150 most frequent nouns have their WordNet synset.

Drawing from the MATE scheme (Poesio, 2000) and taking into account the information already annotated, the enrichment of the corpus with coreference annotation is divided into two steps: a first automatic stage, and a second manual one. The former marks up all NPs of the corpus as <de> (discourse entity) with an ID number, and fills in the TYPE attributes with morphological information (the kind of NP); the latter step adds those <de> elements left unidentified by the automatic annotation and codes the coreferential relations by incorporating the <link> element.

It is at this second stage that antecedents expressed by phrases other than nominal ones are marked manually as <seg> elements when necessary. The <coref:link> elements serve to show coreferential relations holding between two discourse entities, and the embedded <coref:anchor> element points to the ID of the antecedent. Each <coref:link> has a TYPE attribute that specifies the kind of coreferential relation. We distinguish seven types of links:

    (i) ident (identity)
    (ii) dx (discourse deixis)
    (iii) poss (possessor)
    (iv) bridg (bridging)
    (v) pred (predicative)
    (vi) rank (ranking)
    (vii) context (contextual)

Given that the marking of both discourse deixis and bridging relations is very useful for tasks such as question answering (answer fusion), information extraction (template merging) and text summarization, but that the annotation of these two links poses great difficulty, we consider it necessary to devote the two following sections to specifying their annotation guidelines, which are based on our conception of the text as scene.

4.1 Discourse deixis (dx)

We consider an anaphoric NP to be in a dx relation when its antecedent is a textual scene expressed by a clause or a sequence of clauses. NPs that have the potential to participate in dx links are demonstrative pronouns, the neuter personal pronoun lo, the relative pronoun que, demonstrative full NPs, and definite descriptions (DD) of the kind la cosa, el fenómeno, la situación, etc. We call these NPs "quasi-pronominal DDs", as they can be replaced by the pronoun esto and are almost empty of semantic content of their own.

Textual scenes are not constituted as such until a corresponding referring expression appears in the discourse. The pronouns lo and que tend to refer to textual scenes within the same discourse segment or introduced in the previous sentence, while demonstratives and quasi-pronominal DDs can refer to scenes that are more than one sentence away. Since it is not a trivial matter to decide the exact part of the text that serves as antecedent, we distinguish between two SUBTYPE attributes for dx:

(i) subtype="sent" (sentential)

This subclass covers the less problematic cases of discourse deixis, i.e. those anaphoric NPs that refer to a textual scene whose extent is no longer than one sentence (any discourse segment from period to period). We mark the non-nominal antecedent as a <seg> element with an ID number, which fills the <coref:anchor>. When in doubt about the exact delimitation of the text segment, the entire sentence is marked up. For ease of presentation, (4a) shows the extent of the antecedent for the anaphoric demonstrative NP este camino[9], whereas (4b) codes the link as it is done in the annotation of the CESS-Ancora corpus.

[9] In the examples, underlines correspond to anaphoric expressions, while square brackets identify their antecedents.

Taking into account that the pronoun alone is not enough to pick up its referent, but that this is made clear from the predicate complement information (Byron, 2000), we further determine the "sent" value with the semantic type of the antecedent: "sent-ev" for events (4), "sent-fact" for facts (5), and "sent-prop" for propositions (6).

    (4) a. La ministra Anna Birulés animó a las pymes a [invertir en Investigación y Desarrollo] y *0* mostró a los empresarios presentes la disposición del Gobierno a facilitar este camino.[10]

        b. La ministra Anna Birulés animó a las pymes a <seg ID="seg_03"> invertir en Investigación y Desarrollo </seg> y *0* mostró a los empresarios presentes la disposición del Gobierno a facilitar <de type="dd0ms0" ID="de_06"> este camino </de>.
           <coref:link ID="de_06" type="dx" subtype="sent-ev"> <coref:anchor ID="seg_03"/> </coref:link>

[10] (4) The minister Anna Birulés encouraged SMEs [to invest in Research and Development] and showed the businessmen present the Government's willingness to facilitate this path.

    (5) Sin embargo, [los virus logran poner a su servicio al organismo vivo más desarrollado que existe: el ser humano.] Es éste un hecho que hace temblar el edificio que la humanidad ha construido.[11]

[11] (5) Nevertheless, [viruses manage to put at their service the most developed living organism that exists: the human being.] This is a fact that makes the edifice that humanity has built tremble.

    (6) [La Coordinadora de Organizaciones de Agricultores y Ganaderos teme que la falta de lluvia afecte también a los regadíos, dado que empieza a reducirse el volumen de agua embalsada.] Este temor es compartido por...[12]

[12] (6) [The Coordinator of Organisations of Farmers fears that the lack of rain will also affect irrigation, given that the volume of dammed water is starting to decrease.] This fear is shared by...

(ii) subtype="text" (textual scene)

The textual scene subtype includes those cases discussed in Section 3 ((1) and (2)), where an anaphoric expression refers to the whole scene built up by the preceding text. These are cases that result from global discourse effects, so the precise anchor goes beyond the single sentence level and is usually vague in reference.
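
The link coded in (4b) can be read back mechanically. The following Python sketch, which is only an illustration and not part of the CESS-Ancora tooling, parses a (4b)-style fragment and pairs each anaphor with its anchor. It rests on several assumptions not found in the paper: the fragment is wrapped in a <par> root element, the "coref" prefix is bound to a made-up namespace URI, and attribute values use straight quotes so that the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the paper does not specify one for the "coref" prefix.
COREF_NS = "http://example.org/coref"

# Example (4b), wrapped in an assumed <par> root so that it parses as XML.
SAMPLE = f"""<par xmlns:coref="{COREF_NS}" ID="par_01">
La ministra Anna Birulés animó a las pymes a
<seg ID="seg_03"> invertir en Investigación y Desarrollo </seg>
y *0* mostró a los empresarios presentes la disposición del Gobierno a facilitar
<de type="dd0ms0" ID="de_06"> este camino </de>.
<coref:link ID="de_06" type="dx" subtype="sent-ev"> <coref:anchor ID="seg_03"/> </coref:link>
</par>"""


def resolve_links(xml_text):
    """Return one dict per <coref:link>, with the text of anaphor and antecedent."""
    root = ET.fromstring(xml_text)

    # Index the text of every discourse entity (<de>) and segment (<seg>) by ID.
    spans = {}
    for element in root.iter():
        if element.tag in ("de", "seg") and "ID" in element.attrib:
            text = " ".join("".join(element.itertext()).split())
            spans[element.get("ID")] = text

    links = []
    for link in root.iter(f"{{{COREF_NS}}}link"):
        anchor = link.find(f"{{{COREF_NS}}}anchor")
        links.append({
            "anaphor": spans[link.get("ID")],
            "antecedent": spans[anchor.get("ID")],
            "type": link.get("type"),
            "subtype": link.get("subtype"),
        })
    return links


for link in resolve_links(SAMPLE):
    print(link)
```

Keeping the link outside the running text, as in (4b), means the anaphor and the antecedent are recovered purely through their IDs, so the same lookup would work whether the anchor is a <de>, a <seg> or some larger unit.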

Therefore, for the "text" subtype we indicate as <coref:anchor> the ID of the paragraph (<par>) to which the anaphor belongs, thus indicating that the reference is made to the textual scene going from the beginning of the paragraph to the anaphor. As an example, (7) shows the annotation for the anaphoric NP in (1).

    (7) <de type="pd0ns00" ID="de_09"> Esto </de> permitirá al banco sanear su portafolio.[13]
        <coref:link ID="de_09" type="dx" subtype="text"> <coref:anchor ID="par_05"/> </coref:link>

[13] (7) This will allow the bank to reform its portfolio.

Demonstratives which are part of idiomatic phrases, such as the connectors de esta forma or en este sentido, are not considered as markables, since they are mere linking phrases.

4.2 Bridging relations (bridg)

Bridging relations only make sense if we understand them as occurring within the contextual scene triggered by the interaction between two discourse entities. The set of bridging relations is still an open issue (see the classification schemes of Clark, 1977; Vieira, 1998; Poesio, 2000; Muñoz, 2001; Gardent, Manuélian & Kow, 2003), since rather than a binary distinction between first-mention and bridging NPs, there is a scale ranging from those definite NPs which are uniquely interpretable by means of world knowledge (i.e. self-sufficient definite descriptions (SD)[14]) to those definite NPs which depend on a previous anchor. Inevitably, however, many real examples remain in between, as in (8), where todas las administraciones does not mean "all administrations" (in the world), but just the subset relevant to this scene.

[14] We consider as SD those NPs with the definite article that depend on no antecedent, but on world knowledge. Their autonomy can result from their generic reference, their containing an explanatory modifier, or their general uniqueness.

    (8) La última edición de Barnasants, el ciclo de canción de autor, ha atraído, según su director, Pere Camps, a unas 2.000 personas. Camps destaca el apoyo unánime de todas las administraciones en la edición de este año.[15]

[15] (8) The last edition of Barnasants, the singer-songwriter cycle, has attracted, according to its director, Pere Camps, about 2,000 people. Camps emphasizes the unanimous support of all the administrations in this year's edition.

In our annotation scheme, we consider NPs such as that in (8) as generic. They are framed by the textual scene, but do not require any anchor for their interpretation. Therefore, first-mentions of such NPs are considered to be SDs, while subsequent mentions are annotated as identity coreference.

We limit the term bridging to NPs (either definite or demonstrative) that are metonymically interpreted, to a greater or lesser extent, on the basis of a former NP or VP. The second discourse entity is anchored on the entity which contributes to activating the necessary scene for its interpretation. Within the "text as scene" approach, all bridging relations are taken to be contextual scene relations. So we only subspecify three very basic distinctions, which tend to be widely agreed upon. The three SUBTYPE attributes are:

(i) subtype="part" (part-of)

The antecedent of the anaphoric NP corresponds to the whole of which the anaphor is a part, as in (9).

    (9) La reestructuración de [los otros bancos checos] se está acompañando por la reducción del personal.[16]

[16] (9) The restructuring of [the other Czech banks] is accompanied by the reduction of the staff.

(ii) subtype="member" (set-member)

As illustrated by (10), the subsequent NP refers to one or more members of the set expressed by the NP anchor.

    (10) a. [la tropa]...uno de los soldados.

         b. Ante [unas mil personas], entre ellas la ministra de Ciencia y Tecnología, Anna Birulés, el alcalde de Barcelona, Joan Clos, la Delegada del Gobierno, Julia García Valdecasas, y una representación del gobierno catalán, Pujol dijo...[17]

[17] (10) a. [the troop]...one of the soldiers.
b. Before about [one thousand people], among them the minister of Science and Technology, Anna Birulés, the mayor of Barcelona, Joan Clos, the Delegate of the Government, Julia García Valdecasas, and a representation of the Catalan government, Pujol said...
(iii) subtype="them" (thematic)

The anaphoric NP is related to a VP (the anchor) via a thematic relation. In (11), for instance, estas inversiones is the patient of the previous verb invertir. Like sentential anchors in discourse deixis, antecedents corresponding to VPs are marked by hand with a <seg> tag.

    (11) *0* podría hacer que la empresa dominante dejara de [invertir en la red] por no considerarla como una inversión atractiva, y el Gobierno debe incentivar estas inversiones.[18]

If no subtype is specified, it means that the anaphoric NP is interpreted on the basis of a contextual scene, but that it is not related to its anchor via a clear part-of, set-member or thematic relation. This includes cases commonly referred to as "discourse topic" or general "inference" bridging. Examples can be found in (3) and (12).

    (12) El cambio de [17 acciones de Alcan]...los accionistas.[19]

5 Conclusions and discussion

In this paper we have developed the specific framework, "text as scene", on which we base the annotation guidelines for both discourse deixis and bridging relations. The former is annotated as coreferring with a certain textual scene, while the latter is coded on the basis of a contextual scene activated by the conjunction of two discourse entities.

Given the rather vague antecedents that anaphoric expressions interpreted via either of these relations have, the annotation of both discourse deixis and bridging relations has usually obtained rather low inter-annotator agreement. Our annotation scheme is unique in that we deal with these two relations from a common framework. In contrast to other annotation schemes, ours assumes two additional sources for the referent to be interpreted (a textual and a contextual scene), which allow broader categories and thus more flexible annotation guidelines. Other interesting contributions of our scheme are the consideration of what we call "quasi-pronominal DDs" as discourse deictics together with the inclusion of demonstrative NPs into the range of potential candidates for bridging relations.

These guidelines complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, thus giving birth to the CESS-Ancora corpus. It is a scheme rich enough to cover the different types of coreference in Spanish. Nevertheless, coreference annotation is such a complex task, involving several types of linguistic items and different factors responsible for linking two items as coreferential, that we are currently conducting a reliability study on a subset of the corpus to investigate the feasibility and validity of our annotation scheme. The results obtained might lead us to extend and refine it. One of the issues whose reliability needs to be proved is the extent to which abstract antecedents can be semantically classified into events, facts and propositions.

We believe that a 500,000-word corpus annotated from the morphological to the pragmatic level may shed new light on key factors about the nature and working of expressions creating coreference. It has not yet been determined, for instance, how contextual scenes come into play or what their scope is (Fraurud, 1990). The CESS-Ancora corpus will provide quantitative data from natural written discourse from which it will be possible to infer more precise and realistic linguistic generalisations about the use of coreferential and anaphoric expressions in Spanish.

On the other hand, the rich tagset that distinguishes seven types of coreferential relations and that separates individual from abstract anaphora (each with different attributes) makes the CESS-Ancora corpus a very fruitful language resource. Since it will be publicly released, it can be used both for training and evaluating coreference resolution systems and in competitions such as ACE or ARE.

In brief, the goal of our project is twofold. From a computational perspective, the CESS-Ancora corpus will be used to construct an automatic corpus-based coreference resolution system for Spanish. From a linguistic point of view, hypotheses on the use of coreferential expressions (Ariel, 1988; Gundel et al., 1993)

[18] (11) S/he could make the dominant company stop [investing in the net] for not considering it as an
will be tested on the basis of the annotated data
attractive inversion, and the Government must
and new linguistic theories might emerge.
motivate these inversions.
19 (12) The change of [17 shares] of Alcan...the
shareholders.
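The reliability study on the classification of abstract antecedents into events, facts and propositions could be quantified with a chance-corrected agreement coefficient. The paper does not state which measure is used; the sketch below assumes Cohen's kappa for two annotators, and the function name `cohen_kappa` and the toy labels are hypothetical illustrations, not data from the corpus.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random
    # according to their own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical toy data: semantic class assigned to six abstract antecedents.
annotator_1 = ["event", "fact", "proposition", "event", "fact", "event"]
annotator_2 = ["event", "fact", "fact", "event", "proposition", "event"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A value near 1 would indicate that the three semantic classes can be assigned consistently, while a value near 0 would indicate agreement no better than chance.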

Marta Recasens, Antonia Martí Antonín y Mariona Taulé
Acknowledgments

We would like to thank Mihai Surdeanu for his helpful advice and suggestions.

This paper has been supported by the FPU grant (AP2006-00994) from the Spanish Ministry of Education and Science. It is based on work supported by the CESS-ECE (HUM2004-21127), Lang2World (TIN2006-15265-C06-06), and Praxem (HUM2006-27378-E) projects.

References

Ariel, M. 1988. Referring and accessibility. Journal of Linguistics, 24(1):65-87.

Byron, D. K. 2000. Semantically enhanced pronouns. In Proceedings of the 3rd Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2000), Lancaster.

Byron, D. K. 2002. Resolving pronominal reference to abstract entities. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), Philadelphia, 80-87.

Clark, H. 1977. Bridging. In P.N. Johnson-Laird and P.C. Wason (editors), Thinking: Readings in Cognitive Science, Cambridge University Press.

Fraurud, K. 1990. Definiteness and the processing of NPs in natural discourse. Journal of Semantics, 7:395-433.

Gardent, C., H. Manuélian, and E. Kow. 2003. Which bridges for bridging definite descriptions? In Proceedings of the EACL 2003 Workshop on Linguistically Interpreted Corpora, Budapest, 69-76.

Gundel, J., N. Hedberg, and R. Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language, 69(2):274-307.

Gundel, J., N. Hedberg, and R. Zacharski. 2000. Statut cognitif et forme des anaphoriques indirects. Verbum, 22:79-102.

Hirschman, L. and N. Chinchor. 1997. MUC-7 coreference task definition. In MUC-7 Proceedings. Science Applications International Corporation.

Lewis, D. 1979. Score keeping in a language game. In R. Bäuerle et al. (editors), Semantics from a different point of view. Springer Verlag, Berlin.

Marcu, D. 1997. The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts. PhD Thesis, Department of Computer Science, University of Toronto.

Muñoz, R. 2001. Tratamiento y resolución de las descripciones definidas y su aplicación en sistemas de extracción de información. PhD Thesis, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante.

Poesio, M. 2000. MATE Dialogue Annotation Guidelines – Coreference. Deliverable D2.1. http://www.ims.uni-stuttgart.de/projekte/mate/mdag

Poesio, M. and R. Vieira. 1998. A corpus-based investigation of definite description use. Computational Linguistics, 24(2):183-216.

Recasens, M., M.A. Martí, and M. Taulé. 2007. Where anaphora and coreference meet. Annotation in the Spanish CESS-ECE corpus. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP2007), Borovets, Bulgaria, forthcoming.

Tutin, A., F. Trouilleux, C. Clouzot, E. Gaussier, A. Zaenen, S. Rayot, and G. Antoniadis. 2000. Annotating a large corpus with anaphoric links. In Proceedings of the 3rd Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2000), Lancaster.

Vieira, R. 1998. Definite Description Processing in Unrestricted Texts. PhD Thesis, University of Edinburgh, Centre for Cognitive Science.

Vieira, R., S. Salmon-Alt, C. Gasperin, E. Schang, and G. Othero. 2002. Coreference and anaphoric relations of demonstrative noun phrases in a multilingual corpus. In Proceedings of the 4th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2002), Lisbon.

Webber, B. 1988. Discourse deixis: reference to discourse segments. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics (ACL'88), New York, 113-122.