Cognitive determinants of subtractive word
formation: A corpus-based perspective*
STEFAN TH. GRIES
Abstract
This paper investigates mechanisms underlying the coining of intentional
morphological blends and complex clippings. In one case study, I investi-
gate the degree to which (a corpus-based definition of) psycholinguistic re-
cognition points play a role in these subtractive word-formation processes.
Also, I am concerned with the issue whether a separation of these two cate-
gories, which has been embraced by some but not all morphologists, is sup-
ported. Given the role that similarity plays in subtractive word-formation
processes, a second case study investigates the degree to which the source
words of blends and complex clippings are similar to each other and, again,
whether the empirical findings warrant this distinction in the first place.
Keywords: blends; complex clippings; subtractive word-formation; uniqueness points; recognition points; recognizability; similarity; corpora.
1. Introduction
One of the most creative word-formation processes—where creative is
used in the sense of ‘defying characterization by means of hard-and-fast
productive rules’—is blending, i.e., the intentional subtractive word-
formation process exemplified by a few examples in (1), where parenthe-
sized letters are those that enter into the blend.
(1) a. (br)eakfast × l(unch) → brunch
    b. (mot)or × h(otel) → motel
    c. (fanta)stic × f(abulous) → fantabulous
    d. (fraud) × (auditor) → frauditor
While these examples are probably all too well-known, they mask the
fact that blending is a process which has so far not been defined in such
Cognitive Linguistics 17–4 (2006), 535–558
0936–5907/06/0017–0535
DOI 10.1515/COG.2006.017
© Walter de Gruyter
Figure 1. Simplistic schema of a frequent classification of morphological processes [tree diagram; node labels: morphological processes; inflectional processes; lexical processes; derivational processes; compounding processes; other processes; abbreviations; acronyms; blends; clippings; complex clippings; other]
a way as to properly set it apart from a variety of other subtractive pro-
cesses which are superficially similar on one or more dimensions. Part of
the reason for this lack of a widely accepted definition is probably the fact
that subtractive word-formation processes are among the most under-
studied word-formation processes. In fact, for some scholars and in a va-
riety of textbooks, they are not even part of regular derivational word
formation proper because they are conscious processes that defy charac-
terization by hard-and-fast productive morphological rules. More specifi-
cally, a very simplistic classification openly embraced or at least implied
in much morphological work is the one represented in Figure 1 (cf. Algeo
1978 for a refined multidimensional classification).
A definition of blending to which many scholars could probably agree
is the one in (2).
(2) Blending as a word-formation process involves coining a new word out of already existing source words such that, typically,
– two words (rather than three or more) are merged;
– one or both of the words undergoes shortening in the merger, which may be graphemic or segmental;
– if no shortening occurs, the words exhibit partial overlap, which may be graphemic or segmental (cf. (1d)).
While the above examples in (1) fit the definition in (2) quite well, the issue
becomes much more complicated when looking at more varied cases. Is a
word W a blend even if
– the merging is ‘nonlinear / recursive’? (cf., e.g., transmigrate × modify → transmogrify)
– W contains material that is not from one of the source words or that has been changed in the process of blending? (cf., e.g., quick × concrete → quikrete and deliciously × delightfully → delishfully)
– W looks like a neo-classical compound? (cf., e.g., movie × marathon → moviethon)
– the process contracts syntagmatically adjacent words? (cf., e.g., permanent × agriculture → permaculture)
Also, note that the above definition in (2) leaves open the location of
the splinters in their source words: Is a word W a blend only if the begin-
ning of the first source word (henceforth sw1) is merged with the end of
the second source word (henceforth sw2) as above in (1) or also if the be-
ginning of sw1 is merged with the beginning of source word sw2, i.e.,
what has sometimes been referred to as a complex clipping (cf., e.g., system × administrator → sysadmin)?
If one turns to previous work on blends, three points become immedi-
ately obvious. First, there is a large body of mostly classificatory work,
trying to answer questions such as ‘how does one distinguish blends
from other similar word-formation processes such as the one outlined
above?’. This would be exactly the kind of study needed to determine the
categorial status of blends, and much of the work falling under this head-
ing raises important issues and/or proposes interesting criteria on the
basis of which word-formation processes can be distinguished (cf. Algeo
1977, Algeo 1978 for a paradigm example, and López Rúa 2002, 2004
for a prototype-inspired approach largely using, but apparently unaware
of, Algeo’s criteria). However interesting these studies are, one of their
shortcomings is that, polemically speaking, they basically attempt to
squeeze blends etc. into an a priori established set of categories on the
basis of some criteria without ever determining to what degree the criteria
invoked are warranted when subjected to empirical scrutiny.
Second, in contrast to much classificatory work, there is only a handful
of studies which attempt to tackle the issue of how blends are actually
formed.1 These studies adopt a preliminary definition of blends much as
the one I proposed above and investigate, for example, the order of the
source words in blends, the choice of the location of the cut-off points,
the lengths of the source words’ splinters constituting the blend, the role
similarity plays on different levels of analysis, etc.; cf. Kubozono (1990),
Berg (1998), Kelly (1998), Kaunisto (2000), and for work from a more
cognitive perspective, cf. Lehrer (1996), Kemmer (2003), Gries (2004a, b,
c). Especially the work by Gries has been concerned with the fact that
blend coiners choose source words for a blend that (i) communicate
what is to be communicated and that (ii) are more similar to each other
graphemically, segmentally, and phonologically than one would expect
on the basis of chance. Also, not only do blend coiners choose similar
words to blend, they also blend them in such a way as to render the blend
similar enough for both source words to be recognized again since other-
wise the wit of many blends could not be appreciated in the first place.
However, a shortcoming of this work by Gries is that he investigated the
notion of recognizability only with respect to the amount of material of
each source word that is still part of the blend even though other ap-
proaches are potentially more useful and revealing.
Third, some scholars at least are puzzled by these two issues—the com-
plexity of how blends are actually formed and the issue of how to come
up with a viable morphological taxonomy—to such a degree that their
conclusions about the degree of patterning observable at all are rather
pessimistic. Bauer (1983) and Cannon (1986) admit it most openly:
in blending, the blender is apparently free to take as much or as little from either
base as is felt to be necessary or desirable. [ . . . ] Exactly what the restrictions are,
however, beyond pronounceability and spellability is far from clear. (Bauer 1983:
225)
we find no discernible relationship between phonology [ . . . ] and a viable blend.
[ . . . ] This fact helps to make blends one of the most unpredictable categories of
word-formation. (Cannon 1986: 744)
In the present study, I will address the two shortcomings mentioned
above: (i) the fact that recognizability may correlate with more than just
the amount of graphemic/segmental material of the source words and the
blend and (ii) the fact that classificatory approaches to subtractive word-
formation processes often do not motivate the choice of parameters,
which is why a bottom-up test of proposed distinctions may be useful
and in fact called for.
Section 2 will investigate the degree to which the psycholinguistic notion of recognition or uniqueness points plays a role in the off-line formation of blends. One point to be looked at is that coiners of subtractive
word formations must ensure that their creation’s component parts can
be recognized again. However, the secure way of doing this—simply in-
cluding (nearly) the whole word—is not available since blends and com-
plex clippings would then not exhibit the wit for which they are frequently
put to use (esp. in advertising) because (i) no cunning word play would be
involved and (ii) the blend would not be similar to both its source words
anymore. If, for example, the automobile brands Chevrolet and Cadillac
were to merge, I dare say that nobody asked to symbolize the merger in a witty
blend would suggest Chevrolet × Cadillac → Chevroladillac. Thus, I will
investigate whether word coiners make use of the so-called recognition
point of the source words involved in order to ensure that, first, their
new creation is not too long and thus not very witty (as it would be if
both words were hardly shortened and just stuck together) and, second, not too short to be recognized in the first place (as it would be if too little of the source words were still present in the blend, as in Chevrolet × Cadillac → Chac). As a matter of fact, I would assume that in, say, advertising and brand name development, even more factors play a role, including for example the desire to make the word formation not too similar to competing product names, which can be important to minimize the risks of trademark infringements or customers mixing up names of medications etc. In a way, all this can be phrased in a parlance that is very familiar to cognitive linguists such that blend coining is a very intricate process requiring coiners to deliberate how to strike an optimal balance between many different competing motivations; the process will therefore often involve experimenting with, and fine-tuning of, different formations until the ‘right’ formation has been identified and is, thus, clearly an off-line process.
Another point to be tested in the very same section is whether the notion of recognition points also allows us to distinguish between two different subtractive word-formation processes, namely blends and complex clippings. This comparison would be interesting because scholars are divided as to whether these are actually two different classes, which is why
an empirical study may contribute to our knowledge of the theoretical
status of the two processes.
Given the above and the results to be discussed in Section 2, Section 3
will then turn to the role of similarity in subtractive word formation.
While earlier work by Gries has shown that similarity plays a role for
the formation of blends (cf. especially Gries 2004b), it has remained un-
clear whether this also applies to other subtractive word formations. I
will investigate whether blends and complex clippings behave differently when looking at the role similarity plays in their formation and the objective is, again, to determine whether different subtractive word-formation
processes can be distinguished on the basis of data rather than preconcep-
tions about what their defining characteristics are.
2. A corpus-based approach to recognition/uniqueness points
2.1 Methods
In this section, I will investigate the role of recognition/uniqueness points
on blend-formation. However, I must first clarify the corpus-based oper-
ationalization in quite some detail to make explicit what method was cho-
sen on which grounds. The uniqueness point UP of a word W is the point
at which W can be uniquely identified from a set of candidate words. The
recognition point RP of a word W is the empirical estimate of W’s UP. More specifically, RP is the point at which a majority of speakers (e.g., 85%) can recognize W with a high probability (e.g., 80%) when presented with parts of W. It will be important below to know that RPs exhibit a word-frequency effect of tokens: more frequent words are recognized faster (by approx. 20%) than their closest competitors (cf. Marslen-Wilson 1987: 91f.).
RPs have been determined both experimentally—for example using
gating tasks, phoneme monitoring, shadowing, lexical decision tasks, or
word vs. nonword detection—as well as on the basis of (usually elec-
tronic) dictionaries or on the basis of natural language corpora. As is ob-
vious from the title of this paper, I will approach RPs in the latter fash-
ion, i.e., on a strictly corpus-linguistic basis. To give two examples for
how RPs may be approximated very simplistically in corpora:
– in the British National Corpus World Edition, the letter sequence islamiciza narrows possible continuations down to the unique possibility islamicization;
– in the CELEX database for English (Baayen et al. 1995), the phoneme sequence [ebnaIzeI] narrows possible continuations down to the unique possibility [ebnaIzeI§n].2
One attractive feature of using a corpus-based approach to RPs is that it makes it possible not only to identify the RP as such but also to identify easily the number of types in each candidate set as well as the frequency distribution of each candidate set. For example, when the target word is islamicization, a corpus-based frequency list of words allows for identifying all words starting with i and their frequencies of occurrence, all words starting with is and their frequencies of occurrence, etc. up to islamiciza, where only islamicization and its frequency are left. However, how would one approach cut-off points of blends and other subtractive word-formation processes?
Let us approach this question using the example of (agit)ation × (prop)aganda → agitprop. In other words, how would we operationalize
the RP of agitation, i.e. the point where subjects may be (most) likely to
guess from the part they are exposed to that sw1 that entered into the
complex clipping is agitation? One easily conceivable possibility would
be to, first, determine for each beginning of agitation the number of types
and/or tokens that start with this beginning (cf. Table 1).
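The counting step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `prefix_counts` and the toy frequency list are hypothetical, and the toy figures are invented for the example rather than taken from CELEX.

```python
def prefix_counts(target, freq):
    """For each beginning of `target`, return (prefix, n_types, n_tokens).

    `freq` maps each word type to its token frequency in some corpus.
    """
    rows = []
    for i in range(1, len(target) + 1):
        prefix = target[:i]
        # Token frequencies of all types that start with this beginning.
        matching = [f for word, f in freq.items() if word.startswith(prefix)]
        rows.append((prefix, len(matching), sum(matching)))
    return rows


# Hypothetical toy frequency list (invented numbers, NOT CELEX data).
toy_freq = {
    "a": 500, "able": 90, "age": 120, "agile": 10,
    "agitate": 8, "agitation": 20, "agitations": 2,
}

for prefix, n_types, n_tokens in prefix_counts("agitation", toy_freq):
    print(f"{prefix:<10} types={n_types:<3} tokens={n_tokens}")
```

Applied to a real frequency list, the same loop would yield one row per beginning of the source word, i.e., the kind of information tabulated for agitation.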
In a second step, one could then plot the type and token frequencies
along the parts of agitation to determine the point where the cost of add-
ing another letter (of course, the logic also applies to phonemes) does
not result in an appropriate further reduction of the search space. This
Table 1. Type and token frequencies of words beginning with beginnings of agitation (based on the CELEX database)

Part of sw1   Types starting          Tokens starting         Examples
              with part of sw1        with part of sw1
a             4,347                   2,840,567               a, able, adore, agree, . . .
ag            137                     45,320                  agave, age, . . .
agi           12                      347                     agile, agitator, . . .
agit          8                       267                     . . .
agita         8                       267                     . . .
agitat        8                       267                     . . .
agitati       3                       125                     agitation(s), agitating
agitatio      2                       118                     agitation(s)
agitation     2                       118                     agitation(s)
Figure 2. A ‘scree plot’ representation for type and token frequencies of parts of agitation (based on the CELEX database)
approach, basically an adaptation of the scree plot technique used in fac-
tor analysis, is represented in Figure 2.
Figure 2 suggests two approximations to the RP that are marked with
the rectangles. One is at the third letter, i.e., at agi, while the other is at
the seventh letter, i.e., at agitati. On both occasions, the search space is
reduced markedly but the next letter will not make guessing the word
much easier.
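As a rough illustration of what such a plot encodes, the following Python sketch turns the counts reported in Table 1 into log-scaled points and lists how much each additional letter shrinks the candidate set. The base-10 logarithm is an assumption based on the log10 example used later in this section; the function names are hypothetical, and no automatic elbow detection is attempted, since the two candidate points were identified by visual inspection.

```python
import math

# Type and token counts for beginnings of "agitation", copied from Table 1.
TABLE_1 = [
    ("a", 4347, 2840567), ("ag", 137, 45320), ("agi", 12, 347),
    ("agit", 8, 267), ("agita", 8, 267), ("agitat", 8, 267),
    ("agitati", 3, 125), ("agitatio", 2, 118), ("agitation", 2, 118),
]


def scree_points(rows):
    """Return (letter position, log10 types, log10 tokens) for each beginning."""
    return [(len(p), math.log10(types), math.log10(tokens))
            for p, types, tokens in rows]


def type_reductions(points):
    """Return (position, drop in log10 types caused by adding that letter)."""
    return [(cur[0], prev[1] - cur[1])
            for prev, cur in zip(points, points[1:])]


points = scree_points(TABLE_1)
for position, drop in type_reductions(points):
    print(f"letter {position}: log10(types) falls by {drop:.3f}")
```

In this listing, a large drop followed by a much smaller one corresponds to a position where a further letter buys little additional reduction of the search space.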
However attractive this approach may seem at first, it has a few
problems associated with it. A practical problem is that, the larger the
Figure 3. A scree plot representation for type and token frequencies of parts of absolutely (based on the CELEX database)
database, the more graphs one would have to inspect. Even worse, for
each blend one would have to look at four graphs: (i) sw1 using letters
(as above), (ii) sw2 using letters, (iii) sw1 using phonemes, and (iv) sw2
using phonemes. With more than a few hundred cases this becomes infea-
sible quickly. Another practical problem is that not all cases can be de-
cided straightforwardly, as is obvious from the analogous representation
for absolutely (as used in, say, absolutely × positively → absotively), where
Figure 3 shows that no similarly obvious RP emerges.
However, while these shortcomings might be overcome using some ingenious statistical technique,3 there is one shortcoming that cannot. The
point is that this method only looks at the number of word types or to-
kens which are possible given a particular part of a source word—it does
not take into consideration the frequency distributions of these candidate
types or tokens. Imagine a word starting with the letter sequence abs. Let
us also assume that upon giving the first two letters, ab, there are 100
word tokens in our corpus that start with ab. Now, the method exempli-
fied above would result in a data point (2, 2): we are looking at the second
letter, and log10 100 is two, too. However, the method does not take into
consideration the frequency distribution of the 100 tokens. Let us assume
just for the sake of the argument that the 100 tokens in fact instantiate
just four types. There are now two extreme possibilities for what the distribution may look like, which are represented in Figure 4.
100 tokens                    type 1   type 2   type 3   type 4
uninformative distribution    25       25       25       25
informative distribution      95       3        1        1

Figure 4. Hypothetical distributions of four types across 100 tokens
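One standard way to quantify how informative each of the two distributions in Figure 4 is would be Shannon entropy measured in bits. The following Python sketch assumes exactly that measure (base-2 logarithm); the function name `entropy_bits` is hypothetical.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (in bits) of a frequency distribution given as counts."""
    total = sum(counts)
    # Types with zero tokens contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


uninformative = [25, 25, 25, 25]  # left distribution in Figure 4
informative = [95, 3, 1, 1]       # right distribution in Figure 4

print(round(entropy_bits(uninformative), 2))  # 2.0
print(round(entropy_bits(informative), 2))    # 0.35
```

The higher the value, the less the remaining candidate set favors any single type.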
Obviously, the left distribution is extremely uninformative: ab is not a good clue because the four types from the remaining candidate set are all equally likely, which is also reflected in the entropy value for this distribution: H = 2. The right distribution, however, is very informative: the likelihood that type 1 is the target word is overwhelmingly high and entropy is correspondingly low: H ≈ 0.35. However, the method outlined above cannot distinguish between these two distributions. Thus, what is needed is a way of identifying distributions which makes it easy to identify a source word that does not only depend on the number of types or tokens in the candidate set. In this paper, I will adopt the following method to approximate the RP of a word W. For each part of a source word W of a subtractive word-formation process (i.e., for a, ag, agi, agit, . . . , agitatio, agitation)
– count the number of types in the corpus that begin with this part;
– count the number of tokens in the corpus that begin with this part;
– determine the number of types that begin with this part that have higher token frequencies than the target word;
– locate the first position of the minimum of these frequencies.
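The steps above can be sketched as follows. This is a minimal illustration, not the implementation used for the study: the function name `selection_point`, the rank definition (number of more frequent types plus one), and the toy frequency list with invented numbers are all assumptions made for the example.

```python
def selection_point(target, freq):
    """Return (prefix, ranks) for `target`.

    `ranks[i]` is the frequency rank of `target` among all types sharing its
    first i+1 letters (1 = no competitor is more frequent). The returned
    prefix is the first beginning at which the rank reaches its minimum.
    """
    target_freq = freq[target]
    ranks = []
    for i in range(1, len(target) + 1):
        prefix = target[:i]
        # Competitors with the same beginning and a higher token frequency.
        more_frequent = sum(
            1 for word, f in freq.items()
            if word.startswith(prefix) and f > target_freq
        )
        ranks.append(more_frequent + 1)
    first_min = ranks.index(min(ranks))  # index() returns the FIRST minimum
    return target[:first_min + 1], ranks


# Hypothetical toy frequency list (invented numbers, NOT CELEX data).
toy_freq = {
    "a": 500, "able": 90, "age": 120, "agile": 10,
    "agitate": 8, "agitation": 20, "agitations": 2,
}

prefix, ranks = selection_point("agitation", toy_freq)
print(prefix, ranks)
```

With these toy figures, the rank of agitation falls to 1 at the third letter, so the function returns agi.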
Let us clarify this procedure on the basis of the example from Table 1
above; consider Table 2.
Table 2. Type and token frequencies of words beginning with beginnings of agitation

Part of sw1   Types starting          Tokens starting         Frequency
              with part of sw1        with part of sw1        rank of agitation
a             4,347                   2,840,567               595
ag            137                     45,320                  24
agi           12                      347                     1
agit          8                       267                     1
agita         8                       267                     1
agitat        8                       267                     1
agitati       3                       125                     1
agitatio      2                       118                     1
agitation     2                       118                     1
The three left columns are the same as before; the key change is the rightmost column. It provides the frequency rank of the target word, agitation, among all types that start with the part given in the leftmost column. In other words, Table 2 is to be interpreted as follows. There are 2,840,567 tokens in the CELEX database starting with a. These are made up of 4,347 types. Of these 4,347 types, 594 (= 595 − 1 for agitation itself) are more frequent than agitation, which is why a is not a good clue to agitation. The second row reveals that there are 45,320 tokens in the CELEX database starting with ag. These are made up of 137 types. Of these 137 types, 23 are more frequent than agitation. Now finally, there are 347 tokens in the CELEX database starting with agi, which are made up of 12 types, and of these 12 types, agitation is the most frequent one. Thus, this is the first position of the overall minimum, and, thus, it is here where the part of the leftmost column becomes the most likely clue for agitation for the first
time. While it is this point within agitation that is singled out by the pro-
posed method, this point is probably still a little early for a psycholinguis-
tic RP proper, which is why I will refer to agi as the SP (for selection
point) of agitation (following a suggestion by R. Harald Baayen).
One final step is necessary. We have now seen how RPs can be ap-
proximated on a corpus-linguistic basis using SPs, but the final questions
that remain are (i) how to determine whether coiners of blends or complex clippings care about SPs when they choose a cut-off point and (ii) how to test whatever result we get for significance. What is needed is a random baseline or, even better, an index that measures the deviation of all possible cut-off points from the actually chosen cut-off point.
My answer to these challenges can be understood most easily with reference to Figure 5. I compute for each source word the SP as above (circled in Figure 5), and for each position at which the coiner of a subtractive word formation could have cut the source word, I compute the distance in letters/segments from the SP; these distances are given in the last row of Figure 5. From this we can compute the average random distance to the cut-off point at the SP, namely the mean of the set of all distances {−2, −1, 0, 1, 2, 3, 4, 5, 6}, which is +2. The final step then consists in comparing the actual cut-off point at distance +1
letters of source word to be recognized   a     g     i     t     a     t     i     o     n
frequency rank of source word             595   24    1     1     1     1     1     1     1
distance to the ideal cut-off point       −2    −1    0     +1    +2    +3    +4    +5    +6

[The first rank of 1, at agi, is circled as the SP; arrows below the last row mark the actual cut-off point and a ‘randomly’-chosen point.]

Figure 5. Distances of cut-off points from the SP