Statistical Sentence Condensation using Ambiguity Packing and Stochastic
Disambiguation Methods for Lexical-Functional Grammar
Stefan Riezler and Tracy H. King and Richard Crouch and Annie Zaenen
Palo Alto Research Center, 3333 Coyote Hill Rd., Palo Alto, CA 94304
{riezler|thking|crouch|zaenen}@parc.com
Abstract
a sentence condensation module designed for com-
bination with sentence extraction systems (Knight
We present an application of ambigu-
and Marcu, 2000; Jing, 2000). The challenge for
ity packing and stochastic disambiguation
such systems is to guarantee the grammaticality and
techniques for Lexical-Functional Gram-
summarization quality of the system output, i.e. the
mars (LFG) to the domain of sentence
generated sentences need to be syntactically well-
condensation. Our system incorporates
formed and need to retain the most salient informa-
a linguistic parser/generator for LFG, a
tion of the original document. For example a sen-
transfer component for parse reduction
tence extraction system might choose a sentence
operating on packed parse forests, and
like:
a maximum-entropy model for stochas-
tic output selection. Furthermore, we pro-
The UNIX operating system, with implementations
pose the use of standard parser evaluation
from Apples to Crays, appears to have the advan-
methods for automatically evaluating the
tage.
summarization quality of sentence con-
from a document, which could be condensed as:
densation systems. An experimental eval-
uation of summarization quality shows a
UNIX appears to have the advantage.
close correlation between the automatic
parse-based evaluation and a manual eval-
In the approach of Witbrock and Mittal (1999),
uation of generated strings. Overall sum-
selection and ordering of summary terms is based
marization quality of the proposed system
on bag-of-words models and n-grams. Such mod-
is state-of-the-art, with guaranteed gram-
els may well produce summaries that are indica-
maticality of the system output due to the
tive of the original’s content; however, n-gram mod-
use of a constraint-based parser/generator.
els seem to be insufficient to guaranteee gram-
matical well-formedness of the system output. To
1
Introduction
overcome this problem, linguistic parsing and gen-
eration systems are used in the sentence conden-
Recent work in statistical text summarization has put
sation approaches of Knight and Marcu (2000) and
forward systems that do not merely extract and con-
Jing (2000). In these approaches, decisions about
catenate sentences, but learn how to generate new
which material to include/delete in the sentence
sentences from Summary, T ext tuples. Depend-
summaries do not rely on relative frequency infor-
ing on the chosen task, such systems either gener-
mation on words, but rather on probability models
ate single-sentence “headlines” for multi-sentence
of subtree deletions that are learned from a corpus
text (Witbrock and Mittal, 1999), or they provide
of parses for sentences and their summaries.
A related area where linguistic parsing systems
A second goal of our approach is to apply the
have been applied successfully is sentence simplifi-
standard evaluation methods for parsing to an auto-
cation. Grefenstette (1998) presented a sentence re-
matic evaluation of summarization quality for sen-
duction method that is based on finite-state tech-
tence condensation systems. Instead of deploying
nology for linguistic markup and selection, and
costly and non-reusable human evaluation, or using
Carroll et al. (1998) present a sentence simplifica-
automatic evaluation methods based on word error
tion system based on linguistic parsing. However,
rate or n-gram match, summarization quality can be
these approaches do not employ statistical learning
evaluated directly and automatically by matching re-
techniques to disambiguate simplification decisions,
duced f -structures produced by the system against
but rather iteratively apply symbolic reduction rules,
manually selected f -structures of a set of manually
producing a single output for each sentence.
created condensations. Such an evaluation only re-
The goal of our approach is to apply the
quires human labor for the construction and manual
fine-grained tools for stochastic disambiguation in
structural disambiguation of a reusable gold stan-
Lexical-Functional Grammar parsing to the task of
dard test set. Matching against the test set can be
sentence condensation. The system presented in this
done automatically and rapidly, and is repeatable for
paper is conceptualized as a tool that can be used as a
development purposes and system comparison. As
standalone system for sentence condensation or sim-
shown in an experimental evaluation, a close corre-
plification, or in combination with sentence extrac-
spondence can be established for rankings produced
tion for text-summarization beyond the sentence-
by the proposed f -structure based automatic evalu-
level. In our system, to produce a condensed ver-
ation and a manual evaluation of generated strings.
sion of a sentence, the sentence is first parsed us-
ing a broad-coverage LFG grammar for English. The
2
Statistical Sentence Condensation in the
parser produces a set of functional (f )-structures
LFG Framework
for an ambiguous sentence in a packed format. It
In the following, each of the system components will
presents these to the transfer component in a sin-
be described in more detail.
gle packed data structure that represents in one place
the substructures shared by several different inter-
2.1 Parsing and Transfer
pretations. The transfer component operates on these
In this project, a broad-coverage LFG gram-
packed representations and modifies the parser out-
mar and parser for English was employed (see
put to produce reduced f -structures. The reduced
Riezler et al. (2002)). The parser produces a set
f -structures are then filtered by the generator to
of context-free constituent (c-)structures and as-
determine syntactic well-formedness. A stochastic
sociated functional (f -)structures for each in-
disambiguator using a maximum entropy model is
put sentence, represented in packed form (see
trained on parsed and manually disambiguated f -
Maxwell and Kaplan (1989)). For sentence conden-
structures for pairs of sentences and their conden-
sation we are only interested in the predicate-
sations. Using the disambiguator, the string gener-
argument structures encoded in f -structures. For ex-
ated from the most probable reduced f -structure
ample, Fig. 1 shows an f -structure manually se-
produced by the transfer system is chosen. In con-
lected out of the 40 f -structures for the sentence:
trast to the approaches mentioned above, our system
guarentees the grammaticality of generated strings
A prototype is ready for testing, and Leary hopes to
set requirements for a full system by the end of the
through the use of a constraint-based generator for
year.
LFG which uses a slightly tighter version of the
grammar than is used by the parser. As shown in an
The transfer component for the sentence con-
experimental evaluation of the system output, sum-
densation system is based on a component previ-
marization quality of our system is high, due to the
ously used in a machine translation system (see
combination of linguistically fine-grained analysis
Frank (1999)). In this application, it consists of an
tools and expressive stochastic disambiguation mod-
ordered set of rules that rewrite one f -structure into
els.
another. Structures are broken down into flat lists of
"A prototype is ready for testing , and Leary hopes to set requirements for a full system by the end of the year."
are added. delete-node(Z,r1) indicates that
PRED
’be<[93:ready]>[30:prototype]’
PRED
’prototype’
NTYPE GRAIN count
the structure rooted at node Z is to be deleted, and
SUBJ
PRED ’a’
SPEC
DET DET−FORM a, DET−TYPE indef
rule-trace(r1,del(Z,X)) adds a trace of
30 CASE nom, NUM sg, PERS 3
PRED
’ready<[30:prototype]>’
XCOMP
SUBJ
[30:prototype]
93 ADEGREE positive, ATYPE predicative
this rule to an accumulating history of rule applica-
PRED
’for<[141:test]>’
PRED
’test’
tions. This history records the relation of transferred
ADJUNCT
OBJ
NTYPE GRAIN gerund
141 CASE acc, NUM sg, PERS 3, PFORM for, VTYPE main
125 ADV−TYPE vpadv, PSEM unspecified, PTYPE sem
f -structures to the original f -structure and is avail-
TNS−ASP MOOD indicative, PERF −_, PROG −_, TENSE pres
PASSIVE −, STMT−TYPE decl, VTYPE copular
>s
[252:hope]
73
able for stochastic disambiguation.
PRED
’hope<[235:Leary], [280:set]>’
PRED
’Leary’
Rules used in the sentence condensation transfer
GRAIN proper
SUBJ
NTYPE NSEM PROPER name
235 ANIM +, CASE nom, NUM sg, PERS 3
system include the optional deletion of all adjuncts
PRED
’set<[235:Leary], [336:requirement], [355:for]>’
SUBJ
[235:Leary]
PRED
’requirement’
except negatives (e.g., He slept in the bed. can be-
OBJ
NTYPE GRAIN unspecified
336 CASE acc, NUM pl, PERS 3
PRED
’for<[391:system]>’
come He slept., but He did not sleep. cannot become
PRED
’system’
PRED ’full’
ADJUNCT 398 ADEGREE positive, ADJUNCT−TYPE nominal, ATYPE attributive
He did sleep. or He slept.), the optional deletion
OBL
OBJ
NTYPE
GRAIN unspecified
PRED ’a’
SPEC
DET
of parts of coordinate structures (e.g., They laughed
DET−FORM a, DET−TYPE indef
391 CASE acc, NUM sg, PERS 3, PFORM for
355 PSEM unspecified, PTYPE sem
and giggled. can become They giggled.), and simpli-
PRED
’by<[469:end]>’
XCOMP
PRED
’end’
PRED
’of<[519:year]>’
fications (e.g. It is clear that the earth is round. can
PRED
’year’
NTYPE GRAIN count
become The earth is round.). For example, one pos-
ADJUNCT
OBJ
PRED ’the’
SPEC
DET DET−FORM the, DET−TYPE def
ADJUNCT
OBJ
519 CASE acc, NUM sg, PERS 3, PFORM of
sible post-transfer output of the sentence in Fig. 1 is
512 ADJUNCT−TYPE nominal, PSEM unspecified, PTYPE sem
NTYPE
GRAIN count
shown in Fig. 2.
PRED ’the’
SPEC
DET DET−FORM the, DET−TYPE def
469 CASE acc, NUM sg, PERS 3, PFORM by
451 ADV−TYPE vpadv, PSEM unspecified, PTYPE sem
TNS−ASP PERF −_, PROG −_
"A prototype is ready for testing."
280 INF−FORM to, PASSIVE −, VTYPE main
TNS−ASP
MOOD indicative, PERF −_, PROG −_, TENSE pres
252 PASSIVE −, STMT−TYPE decl, VTYPE main
PRED
’be<[93:ready]>[30:prototype]’
197 COORD +_, COORD−FORM and, COORD−LEVEL ROOT
PRED
’prototype’
NTYPE GRAIN count
SUBJ
PRED ’a’
Figure 1: F -structure for non-condensed sentence.
SPEC
DET DET−FORM a, DET−TYPE indef
30 CASE nom, NUM sg, PERS 3
PRED
’ready<[30:prototype]>’
XCOMP
SUBJ
[30:prototype]
93 ADEGREE positive, ATYPE predicative
facts, and rules may add, delete, or change individ-
PRED
’for<[141:test]>’
ual facts. Rules may be optional or obligatory. In the
PRED
’test’
ADJUNCT
OBJ
NTYPE GRAIN gerund
case of optional rules, transfer of a single input struc-
141 CASE acc, NUM sg, PERS 3, PFORM for, VTYPE main
125 ADV−TYPE vpadv, PSEM unspecified, PTYPE sem
ture may lead to multiple alternate output structures.
TNS−ASP MOOD indicative, PERF −_, PROG −_, TENSE pres
The transfer component is designed to operate on
73 PASSIVE −, STMT−TYPE decl, VTYPE copular
packed input from the parser and can also produce
packed representations of the condensation alterna-
Figure 2: Gold standard f -structure reduction.
tives, using methods adapted from parse packing.1
An example rule that (optionally) removes an ad-
junct is shown below:
2.2
Stochastic Selection and Generation
+adjunct(X,Y), in-set(Z,Y)
?=>
The transfer rules are independent of the grammar
delete-node(Z,r1), rule-trace(r1,del(Z,X)).
and are not constrained to preserve the grammatical-
This rule eliminates an adjunct, Z, by deleting the
ity or well-formedness of the reduced f-structures.
fact that Z is contained within the set of adjuncts,
Some of the reduced structures therefore may not
Y, associated with the expression X. The + before
correspond to any English sentence, and these are
the adjunct(X,Y) fact marks this fact as one
eliminated from future consideration by using the
that needs to be present for the rule to be applied,
generator as a filter. The filtering is done by running
but which is left unaltered by the rule application.
each transferred structure through the generator to
The in-set(Z,Y) fact is deleted. Two new facts
see whether it produces an output string. If it does
not, the structure is rejected. For example, for the
1The packing feature of the transfer component could not
f -structure in Fig. 1, the transfer system proposed
be employed in these experiments since the current interface
to the generator and stochastic disambiguation component still
32 possible reductions. After filtering these struc-
requires unpacked representations.
tures by generation, 16 reduced f -structures com-
prising possible condensations of the input sentence
• Property-functions indicating cooccurences of
survive. The 16 well-formed structures correspond
verb stems and subcategorization frames (≈
to the following strings that were outputted by the
12,000 properties)
generator (note that a single structure may corre-
• Property-functions indicating transfer rules
spond to more than one string and a given string may
used to arrive at the reduced f - structures (≈
correspond to more than one structure):
60 properties).
A prototype is ready.
A prototype is ready for testing.
A trained probability model is applied to un-
Leary hopes to set requirements for a full system.
seen data by selecting the most probable transferred
A prototype is ready and Leary hopes to set require-
f -structure, yielding the string generated from the
ments for a full system.
A prototype is ready for testing and Leary hopes to
selected structure as the target condensation. The
set requirements for a full system.
transfered f -structure chosen for our current exam-
Leary hopes to set requirements for a full system by
ple is shown in Fig. 3.
the end of the year.
A prototype is ready and Leary hopes to set require-
ments for a full system by the end of the year.
"A prototype is ready."
A prototype is ready for testing and Leary hopes to
PRED
’be<[93:ready]>[30:prototype]’
set requirements for a full system by the end of the
PRED
’prototype’
year.
NTYPE GRAIN count
SUBJ
PRED ’a’
SPEC
DET
After filtering by the generator, the remaining
DET−FORM a, DET−TYPE indef
30 CASE nom, NUM sg, PERS 3
f -structures were weighted by the stochastic dis-
PRED
’ready<[30:prototype]>’
ambiguation component. Similar to stochastic dis-
XCOMP
SUBJ
[30:prototype]
93 ADEGREE positive, ATYPE predicative
ambiguation for constraint-based parsing (Johnson
TNS−ASP MOOD indicative, PERF −_, PROG −_, TENSE pres
73 PASSIVE −, STMT−TYPE decl, VTYPE copular
et al., 1999; Riezler et al., 2002), an exponential
(a.k.a. log-linear or maximum-entropy) probability
model on transferred structures is estimated from a
Figure 3: Transferred f -structure chosen by system.
set of training data. The data for estimation consists
of pairs of original sentences y and gold-standard
This structure was produced by the following set
summarized f -structures s which were manually se-
of transfer rules, where var refers to the indices in
lected from the transfer output for each sentence. For
the representation of the f -structure:
training data {(s
rtrace(r13,keep(var(98),of)),
j , yj )}m
j=1 and a set of possible sum-
rtrace(r161,keep(system,var(85))),
marized structures S(y) for each sentence y, the ob-
rtrace(r1,del(var(91),set,by)),
jective was to maximize a discriminative criterion,
rtrace(r1,del(var(53),be,for)),
namely the conditional likelihood L(λ) of a summa-
rtrace(r20,equal(var(1),and)),
rtrace(r20,equal(var(2),and)),
rized f -structure given the sentence. Optimization
rtrace(r2,del(var(1),hope,and)),
of the function shown below was performed using a
rtrace(r22,delb(var(0),and)).
conjugate gradient optimization routine:
These rules delete the adjunct of the first conjunct
m
(for testing), the adjunct of the second conjunct (by
eλ·f(sj)
L(λ) = log
.
the end of the year), the rest of the second conjunct
j=1
s∈S(yj) eλ·f(s)
(Leary hopes to set requirements for a full system),
and the conjunction itself (and).
At the core of the exponential probability model is a
vector of property-functions f to be weighted by pa-
3
A Method for Automatic Evaluation of
rameters λ. For the application of sentence conden-
Sentence Summarization
sation, 13,000 property-functions of roughly three
Evaluation of quality of sentence condensation sys-
different categories were used:
tems, and of text summarization and simplification
• Property-functions indicating attributes, attri- systems in general, has mostly been conducted as
bute-combinations, or attribute-value pairs for
intrinsic evaluation by human experts. Recently, Pa-
f -structure attributes (≈ 1,000 properties)
pineni et al.’s (2001) proposal for an automatic eval-
uation of translation systems by measuring n-gram
det_type(a:7, indef),
matches of the system output against reference ex-
adjunct(be:0, for:12),
obj(for:12, test:14),
amples has become popular for evaluation of sum-
adv_type(for:12, vpadv),
marization systems. In addition, an automatic eval-
psem(for:12, unspecified),
uation method based on context-free deletion deci-
ptype(for:12, semantic),
num(test:14, sg),
sions has been proposed by Jing (2000). However,
pers(test:14, 3),
for summarization systems that employ a linguistic
pform(test:14, for),
parser as an integral system component, it is pos-
vtype(test:14, main).
sible to employ the standard evaluation techniques
Matching these f -structures against each other cor-
for parsing directly to an evaluation of summariza-
responds to a precision of 1, recall of .61, and F-
tion quality. A parsing-based evaluation allows us
score of .76.
to measure the semantic aspects of summarization
The fact that our method does not rely on a com-
quality in terms of grammatical-functional informa-
parison of the characteristics of surface strings is a
tion provided by deep parsers. Furthermore, human
clear advantage. Such comparisons are bad at han-
expertise was necessary only for the creation of
dling examples which are similar in meaning but
condensed versions of sentences, and for the man-
differ in word order or vary structurally, such as in
ual disambiguation of parses assigned to those sen-
passivization or nominalization. Our method han-
tences. Given such a gold standard, summarization
dles such examples straightforwardly. Fig. 4 shows
quality of a system can be evaluated automatically
two serialization variants of the condensed sentence
and repeatedly by matching the structures of the sys-
of Fig. 2. The f -structures for these examples are
tem output against the gold standard structures. The
similar to the f -structure assigned to the gold stan-
standard metrics of precision, recall, and F-score
dard condensation shown in Fig. 2 (except for the re-
from statistical parsing can be used as evaluation
lations ADJUNT-TYPE:parenthetical versus
metrics for measuring matching quality: Precision
ADV-TYPE:vpadv versus ADV-TYPE:sadv).
measures the number of matching structural items
An evaluation of summarization quality that is based
in the parses of the system output and the gold stan-
on matching f -structures will treat these exam-
dard, out of all structural items in the system out-
ples equally, whereas an evaluation based on string
put’s parse; recall measures the number of matches,
matching will yield different quality scores for dif-
out of all items in the gold standard’s parse. F-score
ferent serializations.
balances precision and recall as (2 × precision ×
In the next section, we report experimental re-
recall)/(precision + recall).
sults of an automatic evaluation of the sentence con-
For the sentence condensation system presented
densation system described above, and show a close
above, the structural items to be matched consist of
correspondence between the automatically produced
relation(predicate, argument) triples. For example,
evaluation results and human judgments on the qual-
the gold-standard f -structure of Fig. 2 corresponds
ity of generated condensed strings.
to 23 dependency relations, the first 14 of which are
4 Experimental Evaluation
shared with the reduced f -structure chosen by the
stochastic disambiguation system:
The
sentences
and
condensations
we
used
tense(be:0, pres),
are taken from data for the experiments of
mood(be:0, indicative),
Knight and Marcu (2000), which were provided to
subj(be:0, prototype:2),
us by Daniel Marcu. These data consist of pairs
xcomp(be:0, ready:1),
stmt_type(be:0, declarative),
of sentences and their condensed versions that
vtype(be:0, copular),
have been extracted from computer-news articles
subj(ready:1, prototype:2),
and abstracts of the Ziff-Davis corpus. Out of
adegree(ready:1, positive),
atype(ready:1, predicative),
these data, we parsed and manually disambiguated
det(prototype:2, a:7),
500 sentence pairs. These included a set of 32
num(prototype:2, sg),
pers(prototype:2, 3),
sentence pairs that were used for testing purposes
det_form(a:7, a),
in Knight and Marcu (2000). In order to control for
"A prototype, for testing, is ready."
per bound was measured by recording the F-score
PRED
’be<[221:ready]>[30:prototype]’
for the f -structure that received highest probability
PRED
’prototype’
NTYPE GRAIN count
according to the learned distribution on transferred
SUBJ
PRED ’a’
SPEC
DET DET−FORM a, DET−TYPE indef
structures.
30 CASE nom, NUM sg, PERS 3
PRED
’ready<[30:prototype]>’
In order to make our results comparable to the
XCOMP
SUBJ
[30:prototype]
221 ADEGREE positive, ATYPE predicative
results of Knight and Marcu (2000) and also to in-
PRED
’for<[117:test]>’
PRED
’test’
vestigate the correspondence between the automatic
ADJUNCT
OBJ
NTYPE GRAIN gerund
117 CASE acc, NUM sg, PERS 3, PFORM for, VTYPE main
evaluation and human judgments, a manual evalua-
73 ADJUNCT−TYPE parenthetical, PSEM unspecified, PTYPE sem
TNS−ASP
MOOD indicative, PERF −_, PROG −_, TENSE pres
tion of the strings generated by these system vari-
201 PASSIVE −, STMT−TYPE decl, VTYPE copular
ants was conducted. Two human judges were pre-
"For testing, a prototype is ready."
sented with the uncondensed surface string and five
PRED
’be<[177:ready]>[131:prototype]’
PRED
’prototype’
condensed strings that were displayed in random
NTYPE GRAIN count
SUBJ
PRED ’a’
order for each test example. The five condensed
SPEC
DET DET−FORM a, DET−TYPE indef
131 CASE nom, NUM sg, PERS 3
strings presented to the human judges contained
PRED
’ready<[131:prototype]>’
XCOMP
SUBJ
[131:prototype]
(1) strings generated from three randomly selected
177 ADEGREE positive, ATYPE predicative
PRED
’for<[27:test]>’
f -structures, (2) the strings generated from the f -
PRED
’test’
ADJUNCT
OBJ
NTYPE GRAIN gerund
structures which were selected by the stochastic
27 CASE acc, NUM sg, PERS 3, PFORM for, VTYPE main
11 ADV−TYPE sadv, PSEM unspecified, PTYPE sem
model, and (3) the manually created gold-standard
TNS−ASP
MOOD indicative, PERF −_, PROG −_, TENSE pres
condensations extracted from the Ziff-Davis ab-
83 PASSIVE −, STMT−TYPE decl, VTYPE copular
stracts. The judges were asked to judge summariza-
tion quality on a scale of increasing quality from 1
Figure 4: F -structure for word-order variants of gold
to 5 by assessing how well the generated strings re-
standard condensation.
tained the most salient information of the original
uncondensed sentences. Grammaticality of the sys-
the small corpus size of this test set, we randomly
tem output is optimal and not reported separately.
extracted an additional 32 sentence pairs from the
Results for both evaluations are reported for two
500 parsed and disambiguated examples as a second
test corpora of 32 examples each. Testset I contains
test set. The rest of the 436 randomly selected
the sentences and condensations used to evaluate the
sentence pairs were used to create training data.
system described in Knight and Marcu (2000). Test-
For the purpose of discriminative training, a gold-
set II consists of another randomly extracted 32 sen-
standard of transferred f -structures was created
tence pairs from the same domain, prepared in the
from the transfer output and the manually selected
same way.
f -structures for the condensed strings. This was
Fig. 5 shows evaluation results for a sentence
done automatically by selecting for each example
condensation run that uses manually selected f -
the transferred f -structure that best matched the
structures for the original sentences as input to the
f -structure annotated for the condensed string.
transfer component. These results demonstrate how
In the automatic evaluation of f -structure match,
the condenstation system performs under the opti-
three different system variants were compared.
mal circumstances when the parse chosen as input
Firstly, randomly chosen transferred f -structures
is the best available. Fig. 6 applies the same eval-
were matched against the manually selected f -
uation data and metrics to a sentence condensation
structures for the manually created condensations.
experiment that performs transfer from packed f -
This evaluation constitutes a lower bound on the
structures, i.e. transfer is performed on all parses
F-score against the given gold standard. Secondly,
for an ambiguous sentence instead of on a single
matching results for transferred f -structures yield-
manually selected parse. Alternatively, a single input
ing the maximal F-score against the gold stan-
parse could be selected by stochastic models such
dard were recorded, giving an upper bound for the
as the one described in Riezler et al. (2002). Such a
system. Thirdly, the performance of the stochastic
separate phase of parse disambiguation, and perhaps
model within the range of the lower bound and up-
the effects of any errors that this might introduce,
lower
system
upper
lower
system
upper
testset I
testset I
bound
selection
bound
bound
selection
bound
F-score
58%
67.3%
77.2 %
F-score
55.2%
63.0%
72.0%
sum-quality
2.0
3.5
4.4
sum-quality
2.1
3.4
4.4
compr.
50.2%
60.4%
54.9%
compres.
46.5%
61.6%
54.9%
lower
system
upper
lower
system
upper
testset II
testset II
bound
selection
bound
bound
selection
bound
F-score
59%
65.4%
83.3%
F-score
54%
59.7%
76.0 %
sum-quality
2.1
3.4
4.6
sum-quality
1.9
3.3
4.6
compr.
52.7%
65.9%
56.8%
compres.
50.9%
60.0%
56.8%
Figure 5: Sentence condensation from manually se-
Figure 6: Sentence condensation from packed f -
lected f -structure for original uncondensed sen-
structures for original uncondensed sentences.
tences.
report compression ratios. As can be seen from
can be avoided by transferring from all parses for
these tables, the ranking of system variants pro-
an ambiguous sentence. This approach is computa-
duced by the automatic and manual evaluation con-
tionally feasible, however, only if condensation can
firm a close correlation between the automatic eval-
be carried all the way through without unpacking.
uation and human judgments. A comparison of eval-
Our technology is not yet able to do this (in partic-
uation results across colums, i.e. across selection
ular, as mentioned earlier, we have not yet imple-
variants, shows that a stochastic selection of trans-
mented a method for stochastic disambiguation on
ferred f -structures is indeed important. Even if all
packed structures). However, we conducted a pre-
f -structures are transferred from the same linguisti-
liminary assessment of this possibility by unpacking
cally rich source, and all generated strings are gram-
and enumerating the transferred f -structures. For
matical, a reduction in error rate of around 50% rela-
many sentences this resulted in more candidates than
tive to the upper bound can be achieved by stochastic
we could operate on in the available time and space,
selection. In contrast, a comparison between trans-
and in those cases we arbitrarily set a cut-off on the
fer runs with and without perfect disambiguation of
number of transferred f -structures we considered.2
the original string shows a decrease of about 5%
The result of this experiment, shown in Fig. 6, thus
in F-score,3 and of only .1 points for summariza-
provides a conservative estimate on the quality of the
tion quality when transferring from packed parses
condensations we might achieve with a full-packing
instead of from the manually selected parse. This
implementation.
shows that it is more important to learn what a good
In Figs. 5 and 6, the first row shows F-scores
transferred f -structure looks like than to have a per-
for a random selection, the system selection, and
fect f -structure to transfer from. The compression
the best possible selection from the transfer output
rates associated with the systems that used stochas-
against the gold standard. The second rows show
tic selection is around 60%, which is acceptable, but
summarization quality scores for generations from
not as aggressive as human-written condensations.
a random selection and the system selection, and
Overall, the summarization quality achieved by
for the human-written condensation. The third rows
our system is similar to the results reported in
2Since the output of the transfer system is set up to pro-
Knight and Marcu (2000). However, the data used
duce smaller f -strucutures first, i.e. transferred f -structures are
in the experiments of Knight and Marcu (2000) and
produced according to the number of rules applied to transfer
them, a cutoff on the transfer output will keep more condensed
3The drop in F-score for the upper bound compared to trans-
variants and discard less condensed ones. Furthermore, with our
fer from a manually selected parse is due to a fall-back mech-
current implementation, in some cases the transfer component
anism for transferring from a randomly selected parse that had
was also unable to operate on the packed represenation, and in
to be applied occasionally in the current experiments. Despite
those cases we chose a parse at random, again in order to pro-
these near-term system deficiencies, overall results on summa-
vide a conservative estimate of condensation quality.
rization quality are still state-of-the-art.
therefore in our experiments are somewhat artifi-
the MT Summit VII. MT in the Great Translation Era,
cial: The human-written condensations in the data
pages 134–142. Kent Ridge Digital Labs, Singapore.
set extracted from the Ziff-Davis corpus show
Stuart Geman and Mark Johnson.
2002.
Dynamic
the same word order as the original sentences
programming for parsing and estimation of stochas-
and do not exhibit structural modifications such
tic unification-based grammars. In Proceedings of the
as nominalization, which are common in human-
40th Annual Meeting of the Association for Computa-
tional Linguistics (ACL’02), Philadelphia, PA.
written summaries. Also, no additional condensa-
tions were created for the original sentences other
Gregory Grefenstette. 1998. Producing intelligent tele-
than the condensed versions extracted from the
graphic text reduction to provide an audio scanning
human-written abstracts. This simplifies the con-
service for the blind.
In Proceedings of the AAAI
Spring Workshop on Intelligent Text Summarization,
densation task strictly to the operation of dele-
Stanford, CA.
tion. Clearly, it would be desirable to match each
system output against any number of independent
Hongyan Jing. 2000. Sentence reduction for automatic
text summarization. In Proceedings of the 6th Applied
human-written condensations of the same origi-
Natural Language Processing Conference (ANLP’00),
nal sentence, without restrictions on word-order
Seattle, WA.
or structural modifications. The idea of comput-
ing matching scores to multiple reference exam-
Mark Johnson, Stuart Geman, Stephen Canon, Zhiyi Chi,
and Stefan Riezler. 1999. Estimators for stochastic
ples was proposed by Alshawi et al. (1998), and
“unification-based” grammars. In Proceedings of the
later by Papineni et al. (2001) for evaluation of ma-
37th Annual Meeting of the Association for Computa-
chine translation systems. Similar to these propos-
tional Linguistics (ACL’99), College Park, MD.
als, an evaluation of condensation quality could con-
Kevin Knight and Daniel Marcu. 2000. Statistics-based
sider multiple reference condensations and record
summarization—step one: Sentence compression. In
the matching score against the most similar exam-
Proceedings of the 17th National Conference on Arti-
ple. Also, an evaluation of our system on a corpus of
ficial Intelligence (AAAI-2000), Austin, TX.
more diverse condensations would test our system in
John Maxwell and Ronald M. Kaplan.
1989.
An
a more interesting way, and is thus a desideratum for
overview of disjunctive constraint satisfaction. In Pro-
future work. Furthermore, work on employing pack-
ceedings of the International Workshop on Parsing
ing techniques not only for parsing and transfer, but
Technologies, Pittsburgh, PA.
also for generation and stochastic selection is cur-
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-
rently underway (see Geman and Johnson (2002)).
Jing Zhu.
2001.
Bleu: a method for automatic
This will eventually lead to a system that completely
evaluation of machine translation. Technical Report
IBM Research Division Technical Report, RC22176
avoids costly unpacking of representations.
(W0190-022), Yorktown Heights, N.Y.
Stefan Riezler, Tracy H. King, Ronald M. Kaplan,
References
Richard Crouch, John T. Maxwell, and Mark John-
son. 2002. Parsing the Wall Street Journal using a
Hiyan Alshawi, Srinivas Bangalore, and Shona Douglas.
Lexical-Functional Grammar and discriminative esti-
1998.
Automatic acquisition of hierarchical trans-
mation techniques. In Proceedings of the 40th Annual
duction models for machine translation. In Proceed-
Meeting of the Association for Computational Linguis-
ings of the 36th Annual Meeting of the Association for
tics (ACL’02), Philadelphia, PA.
Computational Linguistics (ACL’98), Montreal, Que-
bec, Canada.
Michael J. Witbrock and Vibhu O. Mittal. 1999. Ultra-
summarization: A statistical approach to generating
John Carroll, Guido Minnen, Yvonne Canning, Siobhan
highly condensed non-extractive summaries. In Pro-
Devlin, and John Tait. 1998. Practical simplification
ceedings of the 22nd ACM SIGIR Conference on Re-
of english newspaper text to assist aphasic readers.
search and Development in Information Retrieval,
In Proceedings of the AAAI Workshop on Integrating
Berkeley, CA.
Artificial Intelligence and Assistive Technology, Madi-
son, WI.
Anette Frank. 1999. From parallel grammar develop-
ment towards machine translation. In Proceedings of
Add New Comment