Converting the Penn Treebank to Systemic Functional Grammar
Matthew Honnibal
Department of Linguistics, Macquarie University
Macquarie University
2109 Sydney
Australia
mhonn@it.usyd.edu.au
Abstract

Systemic functional linguistics offers a grammar that is semantically organised, so that salient grammatical choices are made explicit. This paper describes the explication of these choices through the conversion of the Penn Treebank into a systemic functional grammar corpus. Developing such a resource can help connect work in natural language processing to a significant body of research dealing explicitly with the issue of how lexical and grammatical selections create meaning.

1 Introduction

The Penn Treebank was designed to maximise consistency and annotator efficiency, rather than conformity with any particular linguistic theory (Marcus et al., 1994). This results in trees that strongly suggest the use of synthetic features to explicate semantically significant grammatical choices like mood, tense, voice or negation. These distinctions lie latent in the configuration of the tree in the Treebank II annotation scheme, making it difficult for a machine learner to make use of them.

Rather than the ad hoc addition of this information at the feature extraction stage, the corpus can be re-presented in a way that makes feature extraction more principled. This involves increasing the size and complexity of the representation of a sentence by organising the tree semantically. Organising a grammar semantically is by no means a trivial task, and has been an active area of linguistic research for the last forty years. This paper describes the conversion of the Penn Treebank into a prominent output of such research, systemic functional grammar (SFG).

Systemic functional grammar does not confine its description to syntactic structure, but includes a representation of the choices grammatical configurations represent, or ‘realise’, to use the term preferred in the linguistics literature (Halliday, 1976).

There is growing evidence that systemic functional grammar can be usefully applied to natural language processing (Munro, 2003; Couchman and Whitelaw, 2003), and there is a strong history of interaction between systemic functional linguistics and natural language generation (Matthiessen and Bateman, 1991). However, there is currently a lack of computational SFG resources. There is no standard format for machine readable annotation, no annotated corpora, and no useable parsers. Converting the Penn Treebank will make a large body of SFG annotated data available to computational linguists for the first time, an important step towards addressing this situation.

We first discuss some preliminaries relating to the nature of systemic functional grammar, and the scope of the converted corpus’s annotation. We then discuss the conversion of the treebank’s phrase-structure representation to SFG constituency structure, and finally we discuss the addition of interpersonal and textual function structures.

2 Some preliminaries

2.1 Structure of the SFG analysis

Systemic functional grammar divides the task of grammatical analysis (the process of stating the grammatical properties of a text) into two parts: analysis of syntactic structures, and analysis of function structures.

SFG syntactic analysis is constituency based, and is predicated on Halliday’s notion of the rank scale (Halliday, 1966): clauses are composed of groups/phrases, which are composed of words, which are composed of morphemes. The main concerns of SFG syntactic analysis are the chunking of words into groups/phrases, and the chunking of groups/phrases into clauses. Levels of constituency between groups/phrases and their words are recognised in the literature (Matthiessen, 1995), but rarely brought into focus in research unless the group/phrase contains, or is, an embedded constituent from another rank (e.g., a nominal group like ‘the man’ with an embedded relative clause like ‘who knew too much’).
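
As an illustration of the rank scale just described, the following is a minimal sketch in Python of clauses composed of groups/phrases composed of words, with an embedded (rank-shifted) clause inside a nominal group. The class names, the `label` values and the helper `words_of` are assumptions made for this example and are not taken from the conversion script; the morpheme rank is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Word:
    text: str


@dataclass
class Group:
    # A group or phrase. Label values such as "nominal" are illustrative only.
    label: str
    # A group normally holds words, but may hold an embedded clause.
    constituents: List[Union[Word, "Clause"]] = field(default_factory=list)


@dataclass
class Clause:
    groups: List[Group] = field(default_factory=list)


def words_of(unit: Union[Word, Group, Clause]) -> List[str]:
    """Flatten any unit on the rank scale to its words, in order."""
    if isinstance(unit, Word):
        return [unit.text]
    children = unit.groups if isinstance(unit, Clause) else unit.constituents
    return [w for child in children for w in words_of(child)]


# 'the man who knew too much': a nominal group containing an embedded clause.
relative = Clause([
    Group("nominal", [Word("who")]),
    Group("verbal", [Word("knew")]),
    Group("nominal", [Word("too"), Word("much")]),
])
nominal = Group("nominal", [Word("the"), Word("man"), relative])
print(" ".join(words_of(nominal)))  # the man who knew too much
```

Allowing a `Clause` among a group's constituents is one simple way to represent embedding from another rank; other representations are equally possible.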

Function structures can refer to any rank of the constituency, but clause rank functional analysis is generally regarded as the most important. The grammar defines a set of systems, which can be defined recursively using conjunction and disjunction. They are usually represented graphically in system networks (Matthiessen, 1995), as in Figure 1.

In this figure, the nested disjunction ‘declarative or interrogative’ represents a more delicate, or finer grained, distinction than that between indicative and imperative. After selecting from the initial choice, one proceeds from left to right into increasingly delicate distinctions. These systems are categorised into three metafunctions, which represent different types of meaning language enacts simultaneously (ideational, interpersonal and textual) (Halliday, 1969).

Figure 1: A simple mood system, ‘(declarative or interrogative) or imperative’. [System network: the initial choice is between indicative and imperative; indicative leads to a further choice between declarative and interrogative.]

2.2 Scope of target annotation

There is no clearly defined limit to systemic functional grammar, in the sense that one could say that a text has been ‘fully’ analysed. The grammar is constantly being extended, with new kinds of analysis and levels of delicacy suggested. The ultimate aim of the approach is to distinguish every semantically distinct wording choice (Hasan, 1987).

When working with systemic functional grammar, then, practitioners generally define the scope of their analysis. We must do the same, although the reasons are different. Analysis, so far, has always been performed manually, with only finite time available. Projects have therefore had to decide between the size of a sample and the detail of its analysis. In our case, we are limited to the kinds of analysis which can be directly inferred from the Penn Treebank. Future research will doubtless leverage other resources to extend the analysis of the corpus we present, but attempts to do so are beyond the scope of this paper.

The Penn Treebank presents accurate constituency and part-of-speech information. This is enough information to annotate the corpus automatically with roughly two thirds of the most important clause rank systems: mood and theme, but not transitivity.

The distinction between systems which can be automatically annotated and systems which cannot lies in the way the systems are realised. Mood and theme are realised primarily through the order of constituents (the order of Subject and Finite in the case of mood, and the first Adjunct, Subject, Complement or Predicator in the case of theme). They are realised structurally, as opposed to lexically. Other systems are realised through the selection of grammatical items (also called ‘function words’, a term we prefer not to use because of the special sense of ‘function’ in the context of SFG). Systems that are realised with grammatical items, such as voice, polarity and tense, can also be automatically annotated. Lexically realised systems, on the other hand, require a lexicon or equivalent resource, since the choice of words within identical syntactic structures changes the selection from the system. Trees which are identical at every level except their leaves have different process type selections. The central system of transitivity, process type, cannot be analysed for this reason.

The annotation of the corpus we present therefore attempts to include selections from the following systems at clause rank:

  • interpersonal
      - mood (i.e. mood type and role tags for Subject, Finite, Predicator, Adjunct, Complement, Vocative)
      - clause class
      - status
      - tense
      - polarity
  • textual
      - theme (i.e. role tags for Textual Theme, Interpersonal Theme, Topical Theme, Rheme)
      - voice

Ideational analysis is omitted entirely, because transitivity analysis requires a more complicated approach, as discussed above. Although arguably some aspects of taxis and expansion type could be annotated automatically, because the central information cannot be annotated, we have left it out entirely.

3 Constituency Conversion

We have not found it necessary to use a method of automatic rule induction to generate a CFG. The
lack of a suitable training set made that approach impractical for the time and resources we have had available; and good results have been obtained by simply using a set of hard-coded transformation functions, implemented as a Python script. This approach does have a significant drawback, however: because the script does not output a conversion grammar, correcting systematic errors and other maintenance or extension tasks are much more difficult.

The first process in the conversion of a sentence is to parse the Lisp-style string representation into a tree of generic node objects. Each node contains a function tag (which may be null), a node label and a set of children (which may be empty). The root node is then used to initialise a sentence object, which sorts its immediate children into clause, group and verbal group objects. As each class is initialised, it initialises a clause, verbal group, other group or lexis object with each of its children. The tree is thus recursively re-represented by more specific constituent objects, rather than generic node objects. Subtyping the nodes facilitates the changes to the structure that must be performed, since the structural changes are mostly specific to either verbal groups or clauses.

These changes are divided into a series of steps, each coded as a function. Each function contains a series of conditionals which identify the structure being targeted and how it should be altered. The most significant functions are described in more detail below. This is not an exhaustive list, however, as several trivial changes have been omitted. These include things like node relabelling and the addition of group nodes for conjunctions. There are many changes of this sort, some introduced by the specific mechanics of altering the tree. They are not generally interesting differences between the constituency representations of the Treebank’s phrase-structure representation and systemic functional grammar.

3.1 Raising verb phrase predicates

The most obvious difference between SFG constituency and the Treebank II annotation scheme is the flatter, ‘minimal bracketing’ style SFG uses. To convert a tree to SFG clause constituency, all complements and adjuncts must be raised by attaching them to the clause node; in the Treebank annotation they attach to the verb. Figure 2 illustrates the raising of clause constituents from the verb phrase.

Figure 2: Raising of NP and PP nodes dominated by a VP. [Tree diagram not reproduced; node labels: S, NP, VP, NP, NP, PP.]

3.2 Raising hypotactic clauses

SFG represents the distinction between hypotaxis and parataxis with features, rather than tree structure. All non-nominalised, non-embedded clauses are therefore siblings dominated by the root clause complex.

Figure 3 shows the Treebank representation, with a hypotactic clause as a child of a VP. Hypotactic clauses are raised to be siblings of the nearest clause node above them. Figure 4 shows the tree after this has been performed.

Figure 3: A clause dominating another. [Tree diagram, rendered here in brackets: (Sentence (NP-SBJ A Lorillard spokeswoman) (VP said (Sentence (NP-SBJ This) (VP is (NP an old story)))))]

Figure 4: Equally ranked clauses. [Tree diagram, rendered here in brackets: (ClauseComplex (Clause (NP-SBJ A Lorillard spokeswoman) (VP said)) (Clause (NP-SBJ This) (VP is (NP an old story))))]

3.3 Flattening auxiliaries

In the Treebank II annotation scheme, each auxiliary, and the main verb, is given its own node, dominated by the auxiliary before it. This structure needs to be flattened to match the SFG representation. If all of a verb phrase’s lexical items have POS tags in the following list: VB, VBD, VBG, VBN, VBP, VBZ; and it only has one verb phrase child, then its lexis attaches to the verb phrase below it. The empty internal node will later be removed in
the generic ‘flattening’ stage.

3.4 Verbal group complexing

SFG distinguishes between clause complexes and verbal group complexes. The rules for parsing a tree as one or the other type of construction are quite simple.

If a verb phrase has one verb phrase child, and dominates a lexis node that is not a finite, then it is treated as a verbal group complex. Additionally, if a verb phrase has a sentence child that is not a direct quotation, does not have the function tag PRN (parenthetical), and is not labelled SBAR (used for relative and subordinate clauses), it is treated as a verbal group complex. For example, SFG renders the tree in Figure 5 as a single clause, with the verbal group “(continued) (to slide)”.

Figure 5: Treebank representation of a sentence that contains a verbal group complex. [Tree diagram not reproduced; sentence: ‘Yields on mutual funds continued to slide’; node labels: Sentence, NP-SBJ, VP, PP, Sentence, NP, VP, Trace-NP.]

Group and phrase complexing is actually represented a little inaccurately in the script. Ideally, a structural Complex node should be created, and all groups attached to it. This representation would mirror the way clause complexing is handled. Instead, group or phrase complexing is treated like rank-shifting, with the first group dominating the others. This concern is not crucial, however, since it does not affect the clause division or the annotation of function structures.

3.5 Ellipsis

Ellipsis was the most difficult case to deal with, since it involves more than just relocating nodes in the tree. A new clause is created when a verb phrase is identified as part of a clause with an ellipsed subject. The verb phrase is moved to the new clause, along with all of its children, and any items identified as ellipsed are copied and attached. Lexis that is copied in this way must be renumbered, so that the clause sorts properly.

When a verb phrase has two or more verb phrase children, each verb phrase child after the first is moved to a new clause. Figure 6 shows the structure of a sentence containing an ellipsed clause. The siblings of the dominant verb phrase (such as the subject), all lexis of the dominant verb phrase (such as the finite), and all children of the ellipsed verb phrase (such as the complement) are copied to the new clauses. In effect, the only items in the ‘original’ clause that are not in the ‘ellipsis’ clauses are children of the first verb phrase (such as the adverbial phrase).

It is not entirely clear that copying the words is the best solution. A trace (an empty group that simply references the original version) is possibly more convenient. The trace solution is more convenient when using the corpus as training data for a computational linguistics task, while copying the elements makes the corpus easier to use for linguistic research. The SFG literature is unhelpful for these kinds of decisions: it is concerned with content descriptions, not representation descriptions.

3.6 Pruning and truncating

Lexical nodes that contain only punctuation or traces are pruned from the tree. Group nodes that contain no lexis are also pruned. This operation is performed recursively, from the bottom up, clearing away any branches that have no lexical leaves. Internal nodes that contain only one child are replaced by that child, truncating non-branching arcs of the tree.

The clearance of punctuation is a problem with the script as it currently stands, since clearly this information should not be lost.

4 Adding Metafunctional Analysis

Function structures must be added after the constituency conversion. The structures attach to clauses in the constituency tree, making separation into clauses essential before systems can be annotated.

Function structures fall into two categories: metafunctional roles, and systems. Metafunctional roles describe the interpersonal, textual or ideational function of a particular constituent, which is considered the role’s realisation. Systems are instead disjunctions from which a term is selected if the entry condition is met. The names of metafunctional roles are generally capitalised in the literature, while system names are given in italics. We follow this convention to help make the distinction clearer.

As with the constituency conversion, function structures were added by hard-coded functions, implemented as a Python script. Four kinds of information are used for metafunctional analysis:

1. The Penn Treebank’s function tags
2. The Penn Treebank’s POS tags
3. The value of other systems
4. The order of constituents in the SFG representation

The use of values from other systems makes the annotation procedure order dependent. They are usually used to determine whether a system’s entry condition has been met. For instance, tense is not selected by non-finite clauses, so the function that discerns tense first checks that requirement, and assigns null tense if the clause has no Finite.

The subsections below give a brief linguistic description of the system being annotated, and then describe the way its selection is calculated. If the entry condition is not met, the selection is considered ‘none’.

Figure 6: Treebank representation of an ellipsed clause, with verb phrases named. [Tree diagram not reproduced; sentence: ‘Conditions were worsening daily and quickly becoming intolerable’; node labels: Clause, NP-SBJ, Dominant VP, Original VP, Ellipsed VP, ADVP, ADVP, NP.]

4.1 Class

Class is an interpersonal system with the possible values ‘major’ and ‘minor’. Major clauses are those with a verbal group. Minor clauses are equivalent to sentence fragments in other grammatical theories. An example from the Penn Treebank is the fragment “Not this year.”

If a clause contains a verbal group, it is marked ‘major clause’. If it has no verbal group, it is marked ‘minor clause’.

4.2 Finite

Finite is an interpersonal role. The Finite is the tense marker of a verbal group. It is either the first auxiliary, or it is included with the lexical verb as a morphological suffix. The Finite is a significant unit of the grammar, because the placement of it in relation to the Subject realises mood type, and its morphology realises tense selection and number agreement with the Subject.

If a clause is minor class, or the first word of its verbal group has one of the following POS tags: TO, VBG, VBN; then it does not contain a Finite. Otherwise the first word of the verbal group receives the interpersonal role Finite.

4.3 Predicator

Predicator is an interpersonal role. The Predicator is the lexical verb of a verbal group.

If a clause is minor class, it does not contain a Predicator. Otherwise, the last word of the verbal group receives the interpersonal role Predicator. If a verbal group has only one word, that word will therefore receive two interpersonal roles (Finite and Predicator). This is the analysis recommended in the literature (Halliday, 1994).

4.4 Status

Status is an interpersonal system with the possible values ‘free’ and ‘bound’. Status refers to whether a clause is ‘independent’ or ‘dependent’, to use the terms from traditional grammar.

Minor clauses do not select from the status system, so receive the value ‘none’. Major clauses that have no Finite, or were originally attached to another clause and were tagged SBAR, or are rank-shifted, are considered bound. All other clauses are considered free.

4.5 Subject

Subject is an interpersonal role. The Subject of a verbal group is the nominal group whose number the verbal group must agree with.

Nominal groups realising Subject are generally tagged explicitly in Treebank II annotation. The exception to this is wh- subjects like ‘who’, ‘what’ or ‘which’. If no nominal group has the function tag SBJ, and there is a wh- nominal group that was not attached to the verbal group, that nominal group is considered the Subject.

In clauses with an Initiator (‘I made him paint the fence’), two nominal groups will usually have been marked subject (‘I’, ‘him’). In these cases, the first
occurring nominal group is considered the subject (‘I’).

4.6 Mood type

Mood type is an interpersonal system with the possible values ‘declarative’, ‘interrogative’ and ‘imperative’. Mood type refers to whether a clause is congruently a question (interrogative), command (imperative) or statement (declarative).

Minor and bound clauses do not select from this system, and therefore receive the value ‘none’. Free clauses with no subject are marked ‘imperative’. Clauses with the node labels SQ or SBARQ are marked ‘interrogative’. Other free clauses are marked ‘declarative’.

4.7 Tense

Tense is an interpersonal system whose value is some sequence of ‘present’, ‘past’, ‘future’, ‘modal’. Tense refers to the temporal positioning of the process of a clause, with respect to the time of speaking. In English, it is a serial value, because sequences of tenses can be built (‘have (present) been (past) going (present)’).

Finite declarative and interrogative clauses receive one or more tense values. The function iterates through the words of the verbal group (or the first verbal group in a verbal group complex), and assigns these values based on the words’ POS tags, and in special cases their text.

If a tag is either VBD or VBN, the value ‘past’ is appended to the tense list. If the tag is either VB, VBG, VBZ or VBP, the value ‘present’ is appended to the tense list. If the tag is MD, then the text is checked. If the word is “’ll”, ‘will’ or ‘shall’, the value ‘future’ is appended to the tense list. The value ‘modal’ is appended to the tense list for lexical items tagged MD. When an MD tag is seen, the next word in the list is skipped, since it will be a bare infinitive that does not represent a tense selection. If the lexical items ‘going’ or ‘about’ are seen, the value ‘future’ is appended to the tense list, and the next two words are skipped, as they will be ‘to’ and an infinitive verb. This does not occur if ‘going’ is the last word of the verbal group, since in that case it is the process, not a tense marker.

Passive clauses will have received an extra ‘past’ tense value, so when a clause is labelled passive, its last tense selection is removed.

4.8 Polarity

Polarity is an interpersonal system with the possible values ‘positive’ and ‘negative’. Polarity refers to whether the verbal group is directly negated.

Polarity is the simplest system to determine, since it only involves checking the verbal group for the word “not” (or “n’t”). Looking at negation more generally would be far more difficult, since it is more of a semantic motif than a specific grammatical system.

4.9 Adjuncts, Complements, Vocatives

Adjunct, Complement and Vocative are interpersonal roles. Nominal groups can be either Vocatives, Adjuncts or Complements. Adjuncts represent circumstances of a clause: the where, why and when of its happening. Complements represent its non-Subject participants: the whom, to whom and for whom of its happening. Vocatives are nominal groups that name the person the clause is addressed to.

Adverbial groups, prepositional phrases and particles are always given the interpersonal function ‘Adjunct’. Vocatives are explicitly marked in the Treebank, with the VOC tag. Nominal groups that realise an adverbial function are also explicitly tagged, with either TMP, DIR, LOC, MNR or PRP. Nominal groups with one of these tags receive the interpersonal role ‘Adjunct’. All other non-Subject nominal groups receive the interpersonal role ‘Complement’.

4.10 Voice

Voice is a textual system with the possible values ‘active’, ‘passive’ and ‘middle’. Voice refers to whether the Subject is also the ‘doer’ of the clause, or whether the participants have been switched so that the Subject is the ‘done to’. Compare the active clause “the dog bit the boy” with the passive version “the boy was bitten by the dog”. If clauses do not have a ‘done to’ constituent which might have been made Subject (i.e. a Complement), they are considered ‘middle’ (‘the boy slept’).

Minor clauses do not select for voice, and therefore receive the value ‘none’. Non-finite clauses are typed according to the POS tag of their Predicator. If the tag is VBG, voice is determined to be active; if the tag is VBN, voice is determined to be passive. Infinitive non-finite clauses receive the value ‘none’.

Finite clauses with a final tense other than ‘past’ are labelled active. If the final tense is ‘past’, and the penultimate word of the verbal group is a form of the verb ‘be’, the clause is labelled passive, and the tense sequence is corrected accordingly.

Active clauses are then subtyped into true active and middle voices. Middle clauses are active clauses which have no Complement.
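
The tense rules of Section 4.7 and the finite-clause voice rules of Section 4.10 interact, since labelling a clause passive removes its last tense value. The following Python sketch illustrates that interaction under stated assumptions; it is not the script used for the conversion. The function names, the `(word, tag)` token format and the `BE_FORMS` list are hypothetical. Two readings of the text are also assumed: that ‘modal’ is appended only for MD items other than “’ll”, ‘will’ and ‘shall’, and that a final ‘past’ tense without a preceding form of ‘be’ leaves the clause active.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

PAST_TAGS = {"VBD", "VBN"}
PRESENT_TAGS = {"VB", "VBG", "VBZ", "VBP"}
FUTURE_MODALS = {"'ll", "will", "shall"}
# Assumed list of forms of 'be'; contractions are left out of this sketch.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def tense_sequence(verbal_group: List[Token]) -> List[str]:
    """Build the serial tense value for the words of a finite verbal group."""
    tenses: List[str] = []
    last = len(verbal_group) - 1
    i = 0
    while i <= last:
        word, tag = verbal_group[i]
        word = word.lower()
        if tag == "MD":
            tenses.append("future" if word in FUTURE_MODALS else "modal")
            i += 2  # skip the bare infinitive after the modal
        elif word == "about" or (word == "going" and i != last):
            tenses.append("future")
            i += 3  # skip 'to' and the infinitive verb
        else:
            if tag in PAST_TAGS:
                tenses.append("past")
            elif tag in PRESENT_TAGS:
                tenses.append("present")
            i += 1
    return tenses


def finite_voice(verbal_group: List[Token], tenses: List[str],
                 has_complement: bool) -> Tuple[str, List[str]]:
    """Return (voice, corrected tense list) for a finite major clause."""
    if (tenses and tenses[-1] == "past" and len(verbal_group) >= 2
            and verbal_group[-2][0].lower() in BE_FORMS):
        return "passive", tenses[:-1]  # drop the extra 'past' value
    # Active clauses without a Complement are subtyped as middle.
    return ("active" if has_complement else "middle"), list(tenses)


group = [("was", "VBD"), ("bitten", "VBN")]
print(finite_voice(group, tense_sequence(group), has_complement=False))
# ('passive', ['past'])
```

Keeping the voice function separate from the tense function mirrors the order dependence noted in Section 4: voice needs the tense list as input, and returns the corrected list rather than mutating it.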

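The mood type rules of Section 4.6 amount to a short decision sequence over values computed earlier (clause class, status and Subject). A minimal Python sketch follows; the function name and its parameters are hypothetical, and the rules are applied in the order in which the text states them.

```python
from typing import Optional


def mood_type(clause_class: str, status: str, has_subject: bool,
              node_label: Optional[str]) -> str:
    """Select mood type from clause class, status, Subject and node label."""
    # Entry condition: only free major clauses select from the system.
    if clause_class == "minor" or status != "free":
        return "none"
    if not has_subject:
        return "imperative"
    if node_label in ("SQ", "SBARQ"):
        return "interrogative"
    return "declarative"


print(mood_type("major", "free", True, "S"))   # declarative
print(mood_type("major", "bound", True, "S"))  # none
```

Because the status value gates the entry condition, an incorrect ‘free’ status on a bound clause yields a mood type where ‘none’ was expected.
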
4.11 Theme/Rheme

Theme and Rheme are textual roles. Theme refers to the order of information in a clause. The Theme/Rheme structure of a clause is often called Topic/Comment in other theories of grammar. The Theme is the departure point of information in a clause. The Rheme is the information not encompassed by the Theme.

The first Adjunct, Complement, Subject or Predicator that occurs is marked ‘Topical Theme’. Any conjunctions that occur before it are marked ‘Textual Theme’, while any vocatives or finites that occur before it are marked ‘Interpersonal Theme’. All other clause constituents are marked ‘Rheme’.

5 Accuracy

Accuracy was checked using 100 clauses that had not been sampled while the script was being developed or debugged. Each clause was checked for constituency accuracy to the group and phrase rank (i.e., clause division and clause constituency were checked). Each of the eleven function structures was also checked: clause class, status, mood, tense, polarity, Subject, Finite, voice, Topical Theme, Textual Themes, Interpersonal Themes.

Two errors were found, both on the same clause. The status selection of an indirect projected speech clause was marked ‘free’ instead of ‘bound’. This occurred because the projected clause was topicalised (i.e., it occurred before the projecting clause), which is rare for indirect speech. To correct this, the script must consider the presence or absence of quotation marks, which may be complicated by the slightly inconsistent attachment of punctuation in the Penn Treebank (Bies, 1995). Because the status of this clause was given as free, the clause incorrectly met the entry condition for the mood type system, causing the second error: a mood type selection of ‘declarative’ instead of ‘none’.

In this somewhat small sample, 1198/1200 (99.83%) properties were correct (100 clauses, each checked for constituency and for the eleven function structures), and 99% of clauses were annotated without any errors. The lack of plausible Adjunct subtyping may present problems for the accurate determination of Topical Theme in a more register varied sample, such as the Brown corpus.

Adjuncts should be subtyped into Modal Adjuncts (such as ‘possibly’), Comment Adjuncts (such as ‘unfortunately’), Conjunctive Adjuncts (such as ‘however’) and Experiential Adjuncts (such as ‘quickly’). Only Experiential Adjuncts can be Topical Theme; if another kind of Adjunct occurs first it should be marked Interpersonal Theme (in the case of Modal and Comment Adjuncts), or Textual Theme (in the case of Conjunctive Adjuncts). The Wall Street Journal corpus, which was the only section of the Penn Treebank available for this research, contains very few Modal, Comment or Conjunctive Adjuncts, so the extent of this problem could not be properly measured.

6 Conclusion

This work is approximately ten years overdue, in the sense that that is how long the resources required to perform it have existed. The motivations for it are even older: corpus linguistics has been a pillar of systemic functional linguistic research since it began, and raw text corpora are inadequate for many of the questions systemic functional linguistics asks (Honnibal, 2004). The first effort to convert the Penn Treebank to another representation was presented within months of the corpus’s completion (Wang et al., 1994). Since then, treebanks have been converted to several grammatical theories (cf. Lin, 1998; Frank et al., 2003; Watkinson and Manandhar, 2001). It is unclear why SFG has been left behind for so long.

A corpus of over two million words of SFG constituency analysed text, annotated with the most important clause rank interpersonal and textual systems and functions, is now available. This is an important resource for linguistic research, the development of SFG parsers, and research into applying systemic linguistics to language technology problems.

Acknowledgements

I would like to thank Jon Patrick for his useful feedback on this paper. I also owe thanks to the many people who have helped me on my honours thesis, from which this paper is mostly drawn. Christian Matthiessen and Canzhong Wu have both been wonderful supervisors. Paul Nugent helped with the graphics used in this paper and my thesis, and proof-read with invaluable diligence. James Salter has shown remarkable patience over the last year and a half while teaching me to program. Finally, the language technology research group at Sydney Uni have all contributed sound advice and interesting discussions on my work.

References

A. Bies. 1995. Bracketing guidelines for Treebank II style. Penn Treebank Project.

Maria Herke Couchman and Casey Whitelaw. 2003. Identifying interpersonal distance using systemic features. In Proceedings of the first
Australasian Language Technology Workshop (ALTW2003).

Anette Frank, Louisa Sadler, Josef van Genabith, and Andy Way. 2003. From Treebank Resources To LFG F-Structures - Automatic F-Structure Annotation of Treebank Trees and CFGs extracted from Treebanks. Kluwer, Dordrecht.

Michael A. K. Halliday. 1966. The concept of rank: a reply. Journal of Linguistics, 2(1):110–118.

Michael Halliday. 1969. Options and functions in the English clause. Brno Studies in English.

Michael Halliday. 1976. System and Function in Language. Oxford University Press, Oxford.

Michael Halliday. 1994. Introduction to Functional Grammar, 2nd ed. Arnold, London.

Ruqaiya Hasan. 1987. The grammarian’s dream: lexis as most delicate grammar. In Halliday and Fawcett, editors, New developments in systemic linguistics: theory and description. Pinter, London.

Matthew Honnibal. 2004. Design, creation and use of a systemic functional grammar annotated corpus. Macquarie University.

Dekang Lin. 1998. A dependency-based method for evaluating broad-coverage parsers. Natural Language Engineering, 4(2):97–114.

M. Marcus, G. Kim, M. Marcinkiewicz, R. MacIntyre, A. Bies, M. Ferguson, K. Katz, and B. Schasberger. 1994. The Penn Treebank: Annotating predicate argument structure. In Proceedings of the 1994 Human Language Technology Workshop.

Christian M. I. M. Matthiessen and John A. Bateman. 1991. Text generation and systemic-functional linguistics: experiences from English and Japanese. Frances Pinter Publishers and St. Martin’s Press, London and New York.

Christian Matthiessen. 1995. Lexicogrammatical Cartography. International Language Sciences Publishers, Tokyo, Taipei and Dallas.

Robert Munro. 2003. Towards the computational inference and application of a functional grammar. Sydney University.

Jong-Nae Wang, Jing-Shin Chang, and Keh-Yih Su. 1994. An automatic treebank conversion algorithm for corpus sharing. In Meeting of the Association for Computational Linguistics, pages 248–254.

S. Watkinson and S. Manandhar. 2001. In Proceedings of the workshop on evaluation methodologies for language and dialogue systems.
