Language Variation and Change, 14 (2002), 79–118. Printed in the U.S.A.
© 2002 Cambridge University Press 0954-3945/02 $9.50
DOI: 10.1017/S0954394502141044
It’s not or isn’t it? Using large corpora to determine
the influences on contraction strategies
Malcah Yaeger-Dror, Lauren Hall-Lew,
and Sharon Deckert
University of Arizona
ABSTRACT
In analyzing not-negation variation in English it becomes clear that specific strat-
egies are used for prosodic emphasis and reduction of not in different social situa-
tions, and that contraction strategies vary independently of prosodic reduction. This
article focuses on the factors influencing contraction strategies that are clearly di-
alect related and attempts to tease out those factors that are related to register and
speaker stance. First, we review background information critical to an adequate
analysis of not-negation and not-contraction. We then describe the corpora chosen
for the present study, the research methods employed in the analysis, and the results
of the analysis. The variable under analysis is the choice between uncontracted and
not-contracted forms and between not-contracted and Aux-contracted forms in well-
formed declarative sentences, for verbs which permit both. We end with some sug-
gestions for corpus composition that will enable meaningful comparisons between
social situations and between speakers, or characters, within one corpus. As re-
searchers we can ensure that future corpora will permit increasingly inclusive and
interesting comparative studies; we close with some suggestions for those who wish
to carry out studies.
This article would not have been written without the funding of NSF#9808994 or without the LDC
corpus, accessed—with much technical support and advice from Dave Graff and the moral support of
Merrill Garrett—through the Cognitive Science Program of the University of Arizona. We are also
grateful to Mary Finch of the Bush Presidential Library, Ron Whealan of the Kennedy Presidential
Library, and the linguistics librarian at the University of Arizona, Sara Heitshu, all of whom
facilitated access to the text and sound files of political corpora. Doug Biber, Crawford Feagin, Candy and
Chuck Goodwin, Greg Guy, John Heritage, Chuck Meyer, Michael Montgomery, and Sali Tagliamonte
provided interesting insights, as did the journal’s anonymous reviewers. Any remaining
shortcomings are our own.

Tottie (1991) showed that there are three direct ways to express negation in
English. These are shown in Table 1. She found that the vast majority of English
negatives used are not-negatives. For that reason, the present study narrows its
focus to the analysis of not-negation. Not-contraction, which entered the English
language around 1600 (Jespersen, 1917; Warner, 1993) or even earlier (Rissanen,
1999), has become the norm in most varieties of spoken British and American
English. “British” is used as a cover term for the English spoken in England,
Scotland, and Ireland and “American” as a cover term for United States and
Anglophone Canadian speech. British nonfiction uses contraction less consistently
than fiction (Kjellmer, 1998). Older texts use full forms more than recent texts in
the same genre or register (Biber, 1988). Written registers use full forms more
consistently than spoken registers (Kjellmer, 1998, and Tottie, 1991, for British
English; Yaeger-Dror, 1997, for American English). Bell (1984) showed that, in
declarative sentences in news reporting, contracted forms are more common in
the United States than in the British Commonwealth, and it is commonly believed
that the full form is more common in British than in American conversational
declaratives as well.

TABLE 1. Types of negation
Tottie’s Terminology | Examples | Sample Sentences
not-negation | is not, isn’t, ’s not | It isn’t really possible.
no-negation | nowhere, never, nothing, nobody . . . | I never did that.
affixal negation | imperfect, irrespective, independent, nonfunctional, disingenuous, unable . . . | I am incapable of doing it!
Source: Tottie (1991).

Biber (1988), who has done the most work to compare large linguistic corpora
from different social situations (or speech “registers”), showed that, if a multi-
variate analysis is carried out on information concerning variation in many lin-
guistic factors, five register continua (or dimensions) can be isolated for English.
Dimension 1, the statistically most significant of these, is a continuous parameter
fluctuating from a more informative pole, which he referred to as “Information-
al,” to a socially more interactive pole, which he referred to as “Involved.” There
is now a fair amount of evidence to support the claim that register influences
contraction strategies.
Cognitive theories would project that, when given the choice, forms with full
not retained would be favored in informative settings, since not carries important
semantic information; we have referred to this as the Cognitive Prominence Prin-
ciple (Yaeger-Dror, 1996, 1997, 2002b). It is not coincidental that contraction is
the most extreme form of lexical reduction available to the English speaker, and
that speakers avoid contraction in informative situations where the significance
of not is most important.
The pattern for interactive data is also influenced by a conflicting Social Agree-
ment Principle, which has been identified in the work of Goffman (1971), Sacks
(1992), and their students; these researchers showed that repair (or remedial turns)
is dispreferred in conversation, whereas supportive turns are preferred. Yaeger-
Dror (1985, 1997, 2002a, 2002b) found considerable evidence that, in both spo-
ken and written use of American English, where the speaker’s purpose is
informative and socially neutral, full forms of not are favored; not-contracted
forms are favored when the speaker’s purpose is supportively interactive and the
negation is used to express a repair. Consequently, Yaeger-Dror (1996, 1997,
2001, 2002b, 2002c) found that negation has characteristics that particularly mil-
itate against a simple analysis based on one corpus with a limited register, and
that the ideal would be to access data that would enable researchers to compare
the negatives used by speakers in several social situations.
Holding dialect and chronological era steady, the informative registers of Brit-
ish and American English (e.g., news, tutorials, or written descriptive texts) use
more full forms than the interactive registers (e.g., conversations or written dia-
logue) (Biber, 1988; Yaeger-Dror, 1997).1 In her analysis of not-negation in two
corpora of newspaper prose, Westergren-Axelsson (1998) provided confirmatory
evidence for variation in negative syntactic strategies, which coincides with the
informative–interactive continuum. She isolated three subgenres: reporting, ed-
itorials, and reviews.2 She also made a distinction (already isolated in Yaeger-
Dror, 1996, 1997) between material inside and outside quotation marks—with
more not-contraction in dialogue and actual conversations than in informative/
narrative segments of text. Both Yaeger-Dror (1997; Yaeger-Dror, Hall-Lew, &
Deckert, in press) and Westergren-Axelsson found that there is a large gap be-
tween narrative prose and written dialogue, which are presumably more infor-
mative and more (pseudo)interactive, respectively.
Both contraction and prosodic strategies in dialogue differed significantly from
those in read descriptive prose and were more similar to actual interaction. Thus,
when interactive rules are more relevant—whether signaled by quotation marks
in print or triggered by the interactiveness of a social situation—the likelihood of
not-contraction increases; conversely, when conveying information is primary,
not-contraction is curtailed.
Biber and Finegan developed the ARCHER corpus specifically to permit analy-
sis of change in time (Biber, Conrad, & Reppen, 1998). When ARCHER written
corpora from different eras were compared, Dimension 1 was shown to have
varied over the last few centuries; some social situations (or registers) have be-
come more interactive, while others have become more informative. Thus, for
example, while most registers have become more interactive, medical writings
have become more informative (Biber, Finegan, & Atkinson, 1993:9). Both Amer-
ican and British journals (diaries) have become more interactive over the last 240
years; moreover, the American diaries that were initially more informative have
become more interactive (1993:10).
Not only is the intention of the speaker relevant (to convey information? to repair
another’s turn? etc.), but speaker stance (Goffman, 1981) is critical as well. In cer-
tain registers disagreement is preferred. For example, considerable evidence has
now been presented to show that children express disagreement prominently in cer-
tain registers (Corsaro & Rizzo, 1990; Goodwin, 1983; Goodwin, Goodwin, &
Yaeger-Dror, 2002; Hoyle & Adger, 1998; Kyratzis & Guo, 2001; Sheldon, 1996,
1998). Even among adults, an adversarial stance, which requires negation to be em-
phasized, is not as uncommon as early conversation literature would have us con-
clude (Clayman, 2002, in press; Heritage, 2002; Hutchby, 1996, 1999).
For example, comparing evidence from political debates with data from other
registers, Yaeger-Dror found that interactive registers vary along this second con-
tinuum, from more supportive turns, as in conversational interactions analyzed
by Sacks (1992) and Schegloff, Jefferson, and Sacks (1977), to more adversarial
turns, as in political interviews, legal interactions, and debates (Yaeger-Dror,
1996, 1997, 2002a). She also found that full not would be retained in an adver-
sarial stance, as in debates, but not when used by the program moderator (Yaeger-
Dror & Hall-Lew, 2000). Other situations in which adults were expected to express
disagreement quite emphatically (but the moderators were not) included political
TV programs (Blum-Kulka, Blondheim, & Hacohen, 2002; Scott, 1998), politi-
cians’ news conferences (Clayman, in press; Heritage, 2002; Perez de Ayala,
2001), call-in programs (Hutchby, 1996, 1999, 2001), candid camera programs
(Al-Khatib, 1997), and talk shows (Ilie, 1999). In such situations, unreduced not
tokens were preferred for the participant stance.
In contrast, overt disagreement is even more tabooed for moderators or nego-
tiators than it is for polite conversationalists (Clayman, 2002; Jacobs, 2002). It is
also true that there is a growing body of evidence to demonstrate that a specific
register may require an adversarial stance in one culture but not in another (Yaeger-
Dror, 2002a, 2002b).3
Contraction is also correlated with sentence type. Not tokens in imperatives, as
in (1), and interrogatives, as in (2), are almost categorically contracted in Amer-
ican English (Yaeger-Dror, 1996, 1997), even in informative written contexts. Con-
traction is not inevitable in these sentence types in British data, as shown in
examples (2c) through (2g). In his analysis of written British corpora from the
1960s, Kjellmer (1998) found that questions were only 90% not-contracted. In their
study of interviews with older speakers of northern and rural British dialects,
Tagliamonte and Smith (in press) 4 found that only 65% of questions were not-
contracted, although tags were categorically contracted as they were in our Amer-
ican sample. Moreover, Tagliamonte and Smith found that for Scots speakers the
bulk of the contracted forms were used in rhetorical questions, which are more like
tags, and so the percentages for full forms in questions requiring an answer were
even higher and less like the American interrogatives. The interaction of dialect
with sentence type must be considered as a separate issue.
(1) a. Please don’t eat the daisies! (American)
b. Don’t mess with social security! (PD, St. Louis Debate, Bush 978)5
c. . . . But don’t just sit here slow dancing for 4 years! (PD, Richmond Debate,
Perot 310)
d. Don’t go away yet! (PD, St. Louis Debate, Jim Lehrer 1130)
(2) a. Isn’t she sweet?/Ain’t she sweet? (American)
b. “Oh, aren’t you well? Sha’n’t I bring your dinner?” (Wharton, 1911/1969, 253)
c. Well, why should they not use the words of the original? (BNC, S1.1, 964)
d. but . is– is that not a private library? (BNC, S3.3, 194)
e. Is there not somewhere you can copy it up? (COLT, F, 47yrs.)
f. Is that not scary as crap? (COLT, F, 17yrs.)
g. Is she not wearing tights today? It doesn’t look very nice. (COLT, F, 14yrs.)
For the present study the analysis focuses only on not-negatives in complete
declarative sentences. Interrogatives and imperatives are not included in the analy-
sis, nor are sentences that are radically elliptical.
In English language studies (e.g., Biber, 1988), as in the preceding overview,
the full form is generally contrasted with the contracted form. The majority of
verbs fall into a class which only permits one form of contraction: not-contraction.
Consonant-initial verbs which only permit not-contraction are referred to here as
“other” verbs.

TABLE 2. Full and contracted forms
Full Form | not-Contracted | Aux-Contracted
He cannot | He can’t | —
He has not done that | He hasn’t done it | He’s not done that
We will not do it | We won’t do it | We’ll not do it
We have not done it | We haven’t done it | We’ve not done it
He is not here | He isn’t here | He’s not here
We are not here | We aren’t here | We’re not here

However, Table 2 shows that for some verbs there are actually two possible
contracted forms. These two contraction strategies are referred to as not-contraction
(isn’t) and Aux-contraction (’s not). Note that Aux-contraction permits not to be
unreduced; this has important implications for our hypothesis, which posits that
uncontracted negatives would be more likely to occur in informative registers
(Yaeger-Dror, 1997, 2002a).
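The three-way distinction in Table 2 can be made concrete with a small sketch. The Python below is an illustration only and is not the coding procedure used in this study: the names classify_be_negation and tally_forms and the regular expressions are hypothetical, they cover only {is not, are not}, and they ignore complications such as ’s standing for has (He’s not done that) or for us (let’s not), which a real coding pass would have to resolve by hand.

```python
import re
from collections import Counter

# Hypothetical patterns (not the authors' coding scheme) for the three
# variants of {is not, are not}. Straight and curly apostrophes both count.
APOS = "['’]"
PATTERNS = [
    ("not-contracted", re.compile(rf"\b(?:is|are)n{APOS}t\b", re.IGNORECASE)),
    ("aux-contracted", re.compile(rf"\w{APOS}(?:s|re)\s+not\b", re.IGNORECASE)),
    ("full", re.compile(r"\b(?:is|are)\s+not\b", re.IGNORECASE)),
]


def classify_be_negation(sentence):
    """Return the label of the first pattern (in list order) that matches, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(sentence):
            return label
    return None


def tally_forms(sentences):
    """Count sentences per variant; sentences matching no pattern are skipped."""
    labels = (classify_be_negation(s) for s in sentences)
    return Counter(label for label in labels if label is not None)
```

Applied to the sentence types illustrated in Table 2, He isn’t here would be labeled not-contracted, He’s not here Aux-contracted, and He is not here full. Because Aux-contraction leaves not unreduced, keeping it apart from not-contraction in the tally is what allows the hypothesis above to be tested.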
The extent of variability in contraction strategies and the fact that they are
subject to internal-linguistic constraints (such as preceding phonological unit or
whether the sentence is declarative or interrogative) as well as dialect and register
constraints provide a very interesting set of problems for analysis.
Initially it was assumed that Aux-contractable verbs were classed together
because they were vowel-initial (that is, for a fairly surface structure reason).
However, Lightfoot (1999:186–195) presented historical evidence that {be, will,
have} have never functioned like other English verbs. While surface factors (e.g.,
whether the preceding word ends in a vowel or a consonant) are certainly relevant
to choice of Aux-contracted or not-contracted form (Hiller, 1987; Kjellmer, 1998;
McElhinny, 1993), they are not discussed in detail here.
In British dialects the relative frequency of Aux-contraction is said to increase
the further north one goes (Trudgill, 1978). However, Tagliamonte and Smith (in
press) found that, while the range of Aux-contraction varies more widely than it
does for our American speakers, the geographical picture is much more complex
than Trudgill’s comments would suggest.
A number of linguistic constraints restrict variation. In Britain {will, is, are}
are said to be contracted more often than other auxiliaries in declarative sentences
(Tagliamonte & Smith, in press).6 For American English we try to show that
{have, is, are} are the auxiliaries most often contracted, while will is very rarely
contracted in our corpus.
Jespersen (1917) and Denison (1999) discussed the fact that amn’t* > an’t >
ain’t. By the early 20th century, in England ain’t was still preferred to aren’t as a
contraction for am not, but was condemned as a contraction for other subjects. That
is, I ain’t was acceptable in British English, while he ain’t was a stigmatized ver-
nacular form (Trudgill, 1990:94). In the Irish and British vernaculars today, an’t
or amn’t is used in r-ful dialect areas, and aren’t is used in r-less dialects (Bresnan,
2000; Hudson, 2000; Tagliamonte & Smith, in press; Trudgill, 1990).
Two scholars have analyzed variation in not-contraction and Aux-contraction
among southern vernacular speakers in the United States, based on analysis of
their own Labov-style interviews. Both Feagin (1979) and Hazen (1996) ana-
lyzed the ratio of different be-contraction forms, {isn’t, ’s not, ain’t} and {aren’t,
’re not, ain’t}, among southern vernacular speakers and found that there is a high
percentage of {is not, are not} realized as ain’t tokens for both rural and urban
working-class speakers, while there is a much lower percentage of ain’t among
urban middle-class speakers. They both found that the use of is and are as verb or
copula did not appear to influence contraction preferences. Thus, in southern
non-middle-class vernacular, as in British vernaculars, ain’t is used not just for
am not, but also for isn’t, aren’t, and even haven’t and hasn’t (Feagin, 1979;
Hazen, 1996).
Given that ain’t occurs only rarely in our sample, except in some of the 19th-
century literature, these findings are relevant primarily because of a conjecture
made by Feagin. Looking at evidence from change in apparent time, Feagin sug-
gested that the fact that Southern Standard speakers appear to favor Aux-contraction
over not-contraction may stem from their wish to avoid ain’t. She thus assumed
that southern middle-class speakers use Aux-contraction for {is not/are not} more
consistently than middle-class speakers from other regions. In fact, one of our goals
in this study is to determine if speakers from different regions in the United States
vary in the extent to which they favor (or disfavor) Aux-contraction.
Both is not (3) and are not (4) tokens can be full, Aux-contracted, or
not-contracted.
(3) a. But that is not the only way . . . (Wharton, 1917/1998, 639; descriptive passage)7
b. “That’s all, is it? It’s not much!” (Wharton, 1917/1998, 402)
c. This is not mud slinging. This is fact! (PD, Richmond Debate, Perot 224)
d. Now, it’s not the Republicans’ fault, . . . (PD, Richmond Debate, Perot 82)
e. But it isn’t going to get the job done, . . . (PD, Richmond Debate, Perot 594)
f. Runway 3-3 is not available this morning. (ATC, Boston, 20295)
g. If it’s not, I’ll uh– I’ll have to check. (ATC, Dallas, 31457)
h. . . . but the frequency isn’t -uh– that good. (ATC, Boston, 42477)
(4) a. All the nuclear weapons are not dismantled! (PD, Richmond Debate, Perot 224)
b. You know, we’re not under oath at this point! (PD, Richmond Debate, Perot
164)
c. Everybody cares if people aren’t doing well! (PD, Richmond Debate, Clinton
448)
d. Replies are not received from several flights. (ATC, Boston, 38524)
e. Alright, they’re not working then. (ATC, Dallas, 14890)
f. . . . We don’t know because inspectors aren’t in. (PD, Bush/Gore, Debate 3)
As already stated, the present study focuses on the choice between Aux-contracted
and not-contracted declarative tokens of {is not, are not} which do permit vari-
ation in American English. Ain’t occurs too rarely in the corpus analyzed here to
be discussed further.
We start with the hypothesis that prominent not is preferred in informative
situations and dispreferred in interaction (Yaeger-Dror, 1996). Meaningful con-
clusions about dialect influence on contraction strategies can only be drawn when
there is ready access to large corpora of transcribed speech coded for a range of
sociolinguistic variables (age, sex, region, social class) as well as register. The
present study makes use of a corpus which permits a pilot study of such dialect
variation and compares the results with those from data in other registers.
Today, there are several large transcribed corpora which permit analysis of
lexical and morphological variation, and studies based on those corpora have
been published (Biber, Conrad, & Reppen, 1998; Biber, Johansson, Leech, Con-
rad, & Finegan, 2000; Johansson & Oksefjell, 1998; Kennedy, 1998). The present
study analyzes data from several large corpora which are now available. The
primary goal is to analyze variation in contraction strategies, and a secondary
goal is to determine how feasible it is to compare data from corpora which differ
in multiple ways simultaneously.
Earlier studies of the use of negatives have shown that sentence type, dialect,
time, social situation, and speaker stance all influence contraction strategies. Cer-
tainly the full form is more common in interrogatives and imperatives in British
English than in American English, where contraction has been found to be almost
categorical even in writing (Yaeger-Dror, Hall-Lew, & Deckert, ms.). The con-
sensus is that in declarative sentences contraction has become more acceptable in
20th-century written texts. However, we show that even today scripted texts (like
read news) use full form more consistently than unscripted informative texts (like
Air Traffic Control or academic data). We project that Aux-contraction for {is
not/are not} is higher in those registers which permit more full form (i.e., infor-
mative and adversarial stance registers), and that dialect is a relevant parameter in
the United States as well as in the United Kingdom.
One purpose of the present study is to determine whether, holding register and
sentence type steady, regional dialect is a factor for not-contraction. Another
purpose is to discover whether it is possible to analyze variation in register, stance,
and dialect simultaneously to determine which has the strongest influence and to
see whether the corpora presently available permit any viable multivariate analysis.
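The comparison envisaged here, holding register steady and asking whether region still matters, amounts to a cross-tabulation of coded tokens. The following Python sketch assumes that each token has already been coded as a (register, region, form) triple. The function name aux_contraction_rates and the form labels are hypothetical, and the sketch shows only the raw percentages, not a multivariate analysis.

```python
from collections import defaultdict


def aux_contraction_rates(tokens):
    """Share of Aux-contracted tokens among contracted tokens, per (register, region).

    `tokens` is an iterable of (register, region, form) tuples, where form is
    one of the hypothetical labels "full", "aux-contracted", "not-contracted".
    Full forms are ignored, since the variable here is the choice between the
    two contracted variants.
    """
    counts = defaultdict(lambda: {"aux-contracted": 0, "not-contracted": 0})
    for register, region, form in tokens:
        if form in ("aux-contracted", "not-contracted"):
            counts[(register, region)][form] += 1

    rates = {}
    for cell, c in counts.items():
        # A cell exists only if it holds at least one contracted token,
        # so the denominator is never zero.
        total = c["aux-contracted"] + c["not-contracted"]
        rates[cell] = c["aux-contracted"] / total
    return rates
```

Reading the resulting rates across regions within a single register, rather than pooling registers, keeps a regional difference from being confused with a difference in the mix of social situations sampled for each region.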
T H E P R E S E N T C O R P U S
Various corpora are analyzed to attempt a meaningful comparison of {is not, are
not} contraction strategies in declaratives for different registers and among speak-
ers from different regions. Our goal here is to demonstrate the influence of dialect
and register on the choice between Aux-contracted and not-contracted forms. In
the course of the preliminary discussion, we characterize the different corpora
used; the contraction percentages for the “other” verbs in these corpora are noted.
The problems facing researchers who wish to do a comparative survey using
ready-made corpora should become obvious.
Table 3 provides a list of the corpora consulted for this study, when and where
they were collected, and the number of words in the corpus. The Appendix at the
end of this article clarifies the abbreviations used to refer to the different corpora,
describes the corpora a bit more fully, and lists a URL where the corpus, or at least
a description of it, can be found online.

TABLE 3. Corpora included in this analysis
Text Type | Source | Date | Region | Number of Words
Informative! | ATC | 1980s | ne/DC/w | —
Radio: Boston News | NPR | 1980 | ne:MA | 54,739
Marketplace | USC | 1996 | Los Angeles | —
ftf: Lectures | MICASE | 1995f | nc:MI; | 222,000
Student Presentations | ″ | ″ | ″ | 83,027
Seminars | ″ | ″ | ″ | 34,982
Defenses | ″ | ″ | ″ | 48,596
ftf: Q/A | Kennedy | 1961–62 | ne:MA | 93,545
 | Nixon | 1969–74 | w:CA | 50,166
 | Ford | 1974–76 | nc:MI | 37,710
 | Carter | 1977–80 | s:GA | 68,664
 | Reagan | 1981–86 | nc:IL | 52,272
 | Bush | 1989–91 | ne/CT | 30,727
 | Clinton | 1993–99 | s/w:AR | 35,961
Informative News | G/M | 1999 | CDN | 49,446
Informative News | NYT | 1999 | US | ??
Inf. §s | NYT | 1997–99 | US | 33,689
Inf. §s [1st p] | NYT | 1997–99 | CDN(a) | 23,426
Book Reviews | NYT, NYRB | 1999 | US | 29,688
Book Reviews | G/M | 1999 | CDN | 38,877
Literary §s [1st p] | NYT | 1997– | CDN(a) | 18,475
Literary §s | NYT | 1997– | CDN(a) | 10,784
Literary §s | NYT | 2000 | US | 28,000
Literary Texts | Jane Austen | 1814 | UK:S | 159,911
 | C. Bronte: J.Eyre | 1846 | UK:N | 186,000
 | E. Bronte: Wuther | 1847 | UK:N | 116,700
 | A. Bronte: AG | 1848 | UK:N | 169,000
 | Hawthorne | 1850 | US:MA | 83,688
 | Stowe | 1852 | US:ne | 182,450
 | Gaskell: N&S | 1853–54 | UK:N | 284,000;
 | Eliot: Mill | 1860 | UK:MID | 45,000
 | Collins: Woman | 1860 | UK:London | 252,160
 | Dickens | 1861 | UK:London | 187,400
 | Trollope: PhF | 1869 | UK:S | 126,750
 | Alcott: LittleMen | 1871 | US:MA | 185,800
 | Trollope: EuD | 1873 | UK:S | 272,500
 | Twain: Huck / Tom | 1876/1884 | US:MO/LA | 240,000;
 | Twain: Abroad | — | US:MO/LA | —
 | Henry James | 1880 | US:NY/London | 64,000
 | Twain: Yankee | 1889 | US:MO/CT | 120,700
 | Hardy: Tess | 1891 | UK:S | 150,000
 | Chopin | 1899 | US:St. Louis | 90,700
 | Glasgow | 1904 | US:VA | 135,300
 | Cabell: Hour | 1909 | US:VA | 56,000
 | Wharton: EFrome | 1911 | US:NY/MA | 31,274
 | Wharton: Summer | 1914 | US:NY/IL | 57,437
 | Maugham | 1915 | UK:Kent | 76,000
 | Cather: Lark | 1915 | US:NE | 152,700
 | Woolf: N&D | 1919 | UK:London | 167,300
 | Cather: Prof | 1925 | US:NE/NY | 62,000
 | Cleary | 1950 | US:OR | 22,597
 | Tyler | 1988 | US:NC/MD | 65,932
 | Beattie | 1990s | US:DC/ME | 5,100
 | Keillor | 1985 | US:MN | 13,000
Interactive, phone | SWB by region | 1980s | US | 339,000
 | | | ne/n | 56,000
 | | | n mid | 56,000
 | | | nyc | 56,000
 | | | s | 56,000
 | | | s mid | 56,000
 | | | w | 56,000
Interactive, phone | Call Home | 1980s | US | 144,000
Interactive, ftf | Upholstery Shop | 1971 | NYC | 32,500
Interactive, ftf | Segrin, Supportive | 1996–97 | midwest | 74,400
Interactive, ftf | Segrin, Remedial | 1996–97 | midwest | 140,400
Interactive, ftf | COLT: adult | 1993 | UK[teen] | 500,000?
 | teenager | ″ | ″ | ″
Adversarial, phone | T/L | 1997– | n/w | 16,962
Adversarial!, ftf | PD: Kennedy x4 | 1960 | ne:MA | 18,032
 | PD: Nixon x4 | 1960 | n/w:CA | 18,648
 | PD: Ford x3 | 1976 | nc:MI | 12,854
 | PD: Carter x4 | 1976 | s:GA | 15,242
 | PD: Reagan: cmb. | 1980 | nc:IL | 25,548
 | M/L: Mecham | 1988 | w:UT | 1,260
 | M/L: Babbitt | 1988 | w:AZ | 475
 | PD: Bush x3 | 1992 | ne:CT | 15,170
 | PD: Perot x3 | 1992 | s/w:TX | 13,296
 | PD: Clinton x3 | 1992 | s/w:AR | 14,450
 | PD: GWBush x2 | 2000 | w:TX | 13,015
 | PD: Gore x2 | 2000 | s:TN | 13,143
 | MD: Tucson | 1999 | w:AZ | 7,670
 | MD: McCasson | 1999 | w | 1,782
 | Ont. Primary | 1995 | CDN | 18,000
Note: See the Appendix for more elaborate discussion of the corpora.
(a) In the NYT corpus, Canadian authors were isolated from United States authors.

Written registers
Informative journalistic prose from the United States and Canada was collected di-
rectly from the web. The advantage of downloading one’s own text corpus is that
one can choose a very narrow, clearly defined register and verify the native dialect
area of specific journalists or authors included in the sample. The Oxford Text
Archives, the British National Corpus (Aston & Burnard, 1997), and the Linguistic
Data Consortium (LDC) all have journalistic text files, but they do not permit the
analyst to code for dialect or register. The massive news files include wire data (e.g.,
from AP or Reuters) which preclude the tracking of dialect information. They also
merge files from various sections of newspapers which are dissimilar in register
and stance. Downloading one article at a time is much more time-consuming, but
part of that time is spent verifying where the journalist is from8 and determining
what register is being used.
Two sources for American journalistic prose were analyzed: scientific articles
from the New York Times (NYT) or the New York Review of Books (NYRB) for
northeastern United States speech and from clearly Toronto born and raised jour-
nalists at the Toronto Star, Macleans, or Globe and Mail (henceforth collectively
designated as G/M) for Canadian speech.9 These were supplemented with first
chapters (or §1) of nonfiction books from the NYT for northeastern United States
authors and from the NYT and G0M for Ontario authors.
Book reviews were also collected from the same sources, with the same at-
tention paid to where the reviewer was said to have been raised. For the most part,
informative-style reviews of informative texts (biographies, science books) were
chosen from the NYT and NYRB for northeastern United States authors and from
the NYT and G/M for Ontario authors. Downloading one review at a time, it was
also possible to verify the journalist’s region of origin and his/her stance vis-à-
vis the topic.10 The comparison of NYT and G/M corpora revealed that both
region and register had an impact on contraction.
Prose was also collected from the web. Older classical literary prose could be
contrasted with more recent literary prose, which was only available by scanning
the data11 or by taking the first chapter samples available from the NYT and
G/M. As with the journalists, authors’ biographies were available on the site
itself or were found with google.com. Narrative was isolated from dialogue.
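Isolating narrative from dialogue can be approximated mechanically when quotation marks are used consistently. The Python sketch below is a simplifying assumption and not the procedure followed for the present corpus: the function name split_dialogue is hypothetical, it handles only paired double quotation marks (straight or curly), and it would need hand correction for nested quotations, dialogue marked with single quotation marks or dashes, and quoted material that is not speech.

```python
import re

# Hypothetical pattern: a passage enclosed in curly or straight double quotes.
QUOTED = re.compile(r"“([^”]*)”|\"([^\"]*)\"")


def split_dialogue(text):
    """Return (narrative, dialogue_passages) for one stretch of prose."""
    dialogue = [
        m.group(1) if m.group(1) is not None else m.group(2)
        for m in QUOTED.finditer(text)
    ]
    # Remove the quoted passages, then collapse the whitespace left behind.
    narrative = " ".join(QUOTED.sub(" ", text).split())
    return narrative, dialogue
```

Each half can then be coded separately, so that contraction within quotation marks is not averaged together with contraction in the surrounding descriptive prose.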
British authors’ use of contraction was contrasted with American authors’
usage. Dialect area was determined (as closely as possible) not just for the
authors but for their characters as well. It is obvious in certain cases that the
authors used very different contraction strategies for some characters than for
others, and that, just as descriptive prose and dialogue must be isolated, ulti-
mately each character’s dialogue should be coded separately for variables like
contraction, for which generation, class, and dialect area might be critical fac-
tors in the analysis.
Three late 20th-century texts (Cleary, 1968; Keillor, 1985; Tyler, 1988) were
scanned to provide data that could not otherwise be accessed. The fact that chil-
dren are known to use aggravated disagreement more consistently than adults
entails that children’s literary dialogue should be studied. While Cleary (Ramo-
na) or Rowling (Harry Potter) may soon be available on the web, scanning the
Cleary text was the only way of getting child dialogue into the corpus.12 The other
two authors were chosen to provide 20th-century adult dialogue from specific
areas of the country. In addition, all three authors’ characters come from roughly
the same region as the author, so contraction strategies would not be influenced
by the authors’ assumptions about speakers from other areas.