LANGUAGE AND SPEECH, 2009, 52 (2/3), 287–314
Visual Intonation in the Prosody of a Sign Language
Svetlana Dachkovsky, Wendy Sandler
University of Haifa
While visual signals that accompany spoken language serve to augment the
communicative message, the same visual ingredients form the substance of the
linguistic system in sign languages. This article provides an analysis of visual
signals that comprise part of the intonational system of a sign language.
The system is conveyed mainly by particular actions of the upper face, and
is shown to pattern linguistically and predictably in Israeli Sign Language.
Its components, aligned with prosodic constituents, are associated with
particular but general meanings and may be combined to derive complex
meanings. The Brow Raise component is functionally comparable to H tones,
signaling continuation and dependency, and characterizing yes/no questions and the if-clause of
conditionals, for example. The component Squint instructs the addressee to retrieve information
that is not readily accessible, and characterizes relative clauses, topics, and other structures. The
details of the componential analysis proposed here explain why the two components together
co-occur on such seemingly diverse structures as yes/no questions about mutually retrievable
information and counterfactual conditionals. Like auditorily perceived intonational melodies, the
visual intonational arrays in sign language provide a subtle, intricately structured, and meaningful
accompaniment to the words and sentences of language.
Recent decades have seen an increased awareness of the fact that linguistic communication
is not limited to the oral-aural channel. Visually perceived gestures of the hands,
face, and body have been brought into the purview of research on spoken language,
and some of these nonverbal communicative behaviors constitute what is sometimes
Acknowledgements: This work was supported in part by the Israel Science Foundation. We
are grateful to participants in the Workshop on Visual Prosody at the Max Planck Institute
for Psycholinguistics for useful discussion of some of the issues addressed here.
Address for correspondence. Wendy Sandler, Sign Language Research Lab, University of
Haifa, Mount Carmel, Haifa, 31905 Israel; e-mail <firstname.lastname@example.org>
Language and Speech
© The Authors, 2009. Reprints and permissions: www.sagepub.co.uk/journalsPermissions.nav
Language and Speech 0023-8309; Vol 52(2/3): 287–314; 103175; DOI:10.1177/0023830909103175
288 Visual intonation in the prosody of a sign language
referred to as visual prosody, the topic of this special issue of Language and Speech. In
sign languages, languages that are transmitted entirely in the visual modality, the same
visual signals are organized into a constrained linguistic system, a system that shares
certain key features with the prosody of spoken languages (e.g., Wilbur, 1991, 2000, for
American Sign Language; Nespor & Sandler, 1999, and Sandler, 1999a, 1999b, for Israeli
Sign Language; van der Kooij, Crasborn, & Emmerik, 2006, for Sign Language of the
Netherlands). Sign languages have conventionalized ways of (1) dividing utterances
into prosodic constituents; (2) making signs more or less prominent; and (3) conveying
intonational “tunes,” tunes that are seen and not heard. Since the linguistic prosodic
system of sign languages is constructed from the same raw ingredients available to
speakers, studying the functions and patterning of sign language prosody can inform
the investigation of the visual signals that accompany spoken language communication.
Here we take a closer look at the intonational part of the prosodic system in one
sign language, Israeli Sign Language (ISL).1 We demonstrate that specific actions of
the upper face, actions that also occur, if idiosyncratically, on the faces of speakers,
comprise part of the linguistic intonational system in this language.
The view that facial expression in sign language corresponds to intonation in
spoken language has been suggested by a number of researchers (e.g., Nespor & Sandler,
1999; Padden, 1990; Reilly, McIntire, & Bellugi, 1990a, 1990b; Sandler, 1999a, 1999c,
2005; Wilbur, 1991, 2000).2 Here we provide evidence showing that specific actions
of the brows and eyes in a sign language function and pattern much like intonational
melodies of spoken language. We argue that the articulations Brow Raise and Squint
have identifiable but general meanings, an approach which explains their occurrence
on a range of utterance types. When they combine with one another, the resulting
complex interpretation provides evidence that ISL intonational meaning is componential,
as some have argued is the case in spoken language intonation as well (e.g.,
Bartels, 1999; Hayes & Lahiri, 1991; Pierrehumbert & Hirschberg, 1990). In the course
of the exposition, we show that these actions are conventionalized and are aligned with
prosodic constituents. In all of these ways, these linguistic facial expressions differ from
emotional facial expressions also used by signers, a difference which is expected to be
instructive in our understanding of the visual prosody that accompanies speech.
1 In earlier work, the term “superarticulation” was adopted in order to avoid the aural connotation
of the word “intonation,” but, just as Stokoe’s term “cherology” gave way to the more
general term “phonology,” we accept the suggestion of a reviewer and adopt the more general
term “intonation” here. However, we draw the line at the words “tunes” and “melodies” for
combinations of intonational elements, and substitute the word “arrays” instead.
2 The association of certain facial expressions and other nonmanual behaviors with specific
linguistic structures was first documented in detail by Liddell (1978, 1980). Liddell and some
subsequent researchers have explained this association by positing a direct link between
these nonmanuals and the syntactic structure of American Sign Language (see, e.g., Aarons,
Bahan, Neidle, & Kegl, 1992; Petronio & Lillo-Martin, 1997; Wilbur, 1999). Here we assume
that the link is indirect, and that intonation is part of the prosodic system, which in turn often
aligns with syntactic structure. See Sandler and Lillo-Martin (2006) for explicit arguments
in favor of this view.
We begin in Section 2 with a brief overview of relevant aspects of spoken language
prosody. In Section 3 we outline the form and function of ISL intonation within the
prosodic system. An analysis of the intonation system of ISL comprises Section 4,
focusing specifically on yes/no and wh-questions, so-called “shared information,” and
conditionals, plain and counterfactual. The form and distribution of the intonational
arrays marking these constituents in our data will be dealt with in that section, where
we investigate the meanings contributed by the individual actions Brow Raise and
Squint. In Section 5, we deal with componential behavior of these articulations in ISL,
showing how they combine to derive more complex meanings. In the final section of
the article, we summarize the similarities but also describe differences between the
intonational systems of spoken and signed languages, and point to some directions
for future research.
2 Intonation as part of prosody in spoken language
The language stream is not a monotonous string, but is divided up into hierarchically
organized rhythmic constituents. When we speak, we may mark the boundary of a
prosodic phrase by lengthening the word that ends it, by pausing, or both. Prominence
is assigned to some position in the phrase, and this phrasal stress also contributes to
the rhythm and serves to set one phrase off from another.
Intonation, the music of everyday speech, constitutes part of the prosodic
system. In our study of sign language intonation, we adopt Ladd’s definition of
intonation as “the use of suprasegmental phonetic features to convey postlexical
meanings in a linguistically structured way” (Ladd, 1996, p.6). Prosodic features
of fundamental frequency, intensity and duration are suprasegmental in the sense
that they are superimposed over constituents of different sizes, such as the word,
the phrase, or the whole utterance. This system is postlexical because it conveys
functions, meanings, and relations such as sentence type, speech act, focus, and
other aspects of information structure at the level of phrases, utterances, or the
discourse as a whole.
The intonational part of prosody is linguistically structured in the sense that it
is made up of a finite list of primitives—tones—which occur and combine with one
another according to rules (Beckman & Pierrehumbert, 1986; Pierrehumbert, 1980).
By dint of its temporal distribution, intonation reinforces the rhythmic structure of
an utterance, while at the same time the individual tones add elements of meaning
to the overall interpretation of the tune (Gussenhoven, 1984, 2004; Pierrehumbert &
Hirschberg, 1990). Pitch accents, phrase accents, and boundary tones are aligned with
elements on different levels of the prosodic hierarchy: the syllable, the prosodic word,
the intermediate phrase, or the intonational phrase.
Intonational phrases, the focus of the present study, are the primary domain of
intonational tunes; pitch accents, phrase accents, and boundary tones cluster together
at their edges, each element having scope over its respective domain of interpretation
(Beckman & Pierrehumbert, 1986; Hayes & Lahiri, 1991). Constituents such as
nonrestrictive relative clauses ([His books,]I; [which I liked a lot,]I; [are out of print]I;)
typically occur in independent intonational phrases, as do topics, parentheticals, right
dislocated elements, and other constituents (Nespor & Vogel, 1986). The clustering
of the individual tones at the intonational phrase boundary and the componential
interpretation of the intonational contour are demonstrated in example (1), from
Pierrehumbert and Hirschberg (1990, p.273), which has a typical yes/no question intonation:
(1) Are legumes a good source of vitamins?
L* H H%
According to their analysis, the L* pitch accent on the stressed syllable of vitamins
implies that the item is salient but does not form part of the utterance predication
(in fact, by asking this question the speaker expects the hearer to include the marked
item in his/her answer, where it constitutes the predication). The forward directionality
of the utterance is emphasized by the high rise, made up of a high phrase accent and
boundary tone. The H phrase accent signals that the current intermediate phrase forms
part of a bigger interpretational unit, and the H% boundary tone conveys the same information
about the whole intonational phrase, which is to be followed and completed
by the interlocutor’s answer.3
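The componential reading of example (1) can be sketched as a simple lookup over the components of the tune. The Python fragment below is illustrative only: the glosses paraphrase the analysis summarized above, and the names TONE_MEANINGS and interpret are hypothetical, not part of any published annotation tool.

```python
# Illustrative only: glosses paraphrase the analysis summarized in the text.
# TONE_MEANINGS and interpret are hypothetical names introduced for this sketch.
TONE_MEANINGS = {
    "L*": "item is salient but not part of the predication",
    "H": "intermediate phrase belongs to a larger interpretive unit",
    "H%": "intonational phrase belongs to a larger interpretive unit",
}


def interpret(tune):
    """Return one gloss per tone, in order, for a space-separated tune."""
    return [TONE_MEANINGS[tone] for tone in tune.split()]


# The tune of example (1): pitch accent, phrase accent, boundary tone.
tune = "L* H H%"
for tone, gloss in zip(tune.split(), interpret(tune)):
    print(f"{tone}: {gloss}")
```

Because each tone contributes its own gloss, the same entry for H% can be reused for any other tune that ends in a high boundary tone.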
Individual tones can reoccur in different environments conveying a stable basic
meaning regardless of intonational, syntactic or lexical context. Consider the H%
boundary tone in utterance (2), occurring at the end of instructions for starting a car,
also from Pierrehumbert and Hirschberg (1990).
(2) If you’re lucky you’ve started your car
The high boundary tone (H%) conveys continuation or incompleteness, indicating
that the current phrase is to be interpreted with respect to a succeeding phrase. While in
example (1) above, H% gets a canonical yes/no question interpretation, in example (2) the
H% boundary tone contributes to the contingency relations in the conditional, and can
imply a relation of causality and conditionality between two conjoined clauses. Examples
(1) and (2) show that the connection between a certain intonational tune and an utterance
type is not accidental; it reflects a particular functional feature of the construction.4 The
linguistic structure of spoken language intonation (its alignment with prosodic boundaries
and its conventionalized componential interpretation) contrasts with paralinguistic
3 Dainora (2002) shows that there are strong statistical propensities for some pitch accents and
boundary tones to occur together, and suggests that a tunal rather than a tonal interpretation
of intonational tunes is called for. Her results are striking, and call for perceptual experiments
in order to determine whether they contradict Pierrehumbert and Hirschberg’s claim that the
individual components contribute to interpretation.
4 Bartels (1999) argues that the H% boundary tone expresses the general meaning of unassertive-
ness and incompleteness. Her analysis explains why H% is such a common intonational phrase
boundary tone, and provides a unified interpretation of the kinds of constituents it bounds.
uses of pitch variation for signaling emotions, which is not organized along linguistic lines
(Ladd, 1996, p.12).
3 Intonation and prosody in Israeli Sign Language
In sign languages, intonational properties are conveyed mostly by articulations of the
upper face. Like intonational pitch excursions, intonational arrays of facial expression
function postlexically to signal meanings which are typical of intonation: they mark
prosodic constituents for various discourse functions, such as distinguishing sentence
types (declarative utterances, wh- and yes/no questions); and they also express various
propositional attitudes5 like disbelief or assumption of shared knowledge. As we will
show in detail in the following section, conventionalized facial expressions in ISL
meet all of Ladd’s criteria for intonation, and they are also componential in structure.
These linguistic properties put the facial articulations discussed in this article in the
category of linguistic intonational signals, and distinguish them from paralinguistic
uses of face (see Dachkovsky, 2005).
We do not mean to imply that all facial expression in sign languages is intonational.
Facial expression also functions in the grammatical system as a phonological
component of lexical signs, as adverbial or adjectival markers (Anderson & Reilly,
1998; Liddell, 1980; Meir & Sandler, 2008), as mimetic character attributes, or as iconic
gestures (of the mouth in particular; Sandler, 2003, in press). As with all humans, deaf
or hearing, facial expression may also reflect emotions. We deal here only with the
linguistic intonational function of facial expression.
In earlier work, ISL was shown to have prosodic constituents at the following
levels of the prosodic hierarchy: phonological word, phonological phrase (see note
6), and intonational phrase (Nespor & Sandler, 1999; Sandler, 1999a, 1999b, 2005,
2006, in press; Sandler & Lillo-Martin, 2006). Here we focus on intonational phrases
alone, the primary domain for intonation, and the patterns of facial actions that
In an earlier study of 90 elicited sentences (30 sentences signed by three native ISL
signers), Nespor and Sandler (1999) found that intonational phrases are systematically
separated from one another by changes in head and/or body position and optional
eyeblink. These corporeal signals are enhanced by rhythmic characteristics of the
manual articulation of the final sign in the phrase, such as pause, hold, and increased
5 By the term “propositional attitude” we mean attitudes toward the propositions expressed in
interactive discourse. That is, we communicate the way in which our mind entertains those
propositions that we express: e.g., with doubt, belief, regret or pretense (Andersen & Fretheim,
2000; Sperber & Wilson, 1986). According to Andersen and Fretheim, basic types of propositional
attitudes are cross-linguistically communicated via sentence types—declarative, interrogative,
imperative, exclamative—while “more delicate attitudinal differences” can be expressed with the
help of non-truth conditional particles, or intonation, or a combination of the two (2000, p.6).
Figure 1. Distribution of intonational arrays on two intonational phrases. “If the
goalkeeper had caught the ball, (the team) would have won the game.”
size and duration. The latter are found at the lower level intermediate phrase6 boundary
as well, but are often more exaggerated at the intonational phrase boundary. Crucially,
intonational phrase boundaries are also signaled by an across-the-board change in facial
expression. No matter which facial articulators are involved, for example, outer or inner
eyebrows, upper or lower eyelids, and regardless of the articulation they manifest,
they all typically change their position at the boundary between intonational phrases.
The alignment of facial expression with intonational phrase boundaries is one of the
motivations for attributing intonational status to facial expression in ISL (and possibly
in sign languages generally). Just as elaborate and salient pitch excursions occur at
intonational phrase boundaries in spoken languages, so do full intonational arrays
change their configurations at these boundaries in this sign language.
Figure 1 shows the ISL sentence, “If the goalkeeper had caught the ball, they
would have won the game.” Like the other sentences in the present study, we coded
this sentence using an elaborate coding system that we developed, with 12 categories of
rhythmic and intonational signals of the hands, body, head, and each facial articulator
(see Section 4.1). Here, we show only those elements of the intonational arrays that
are relevant for the discussion, specifically, articulations of the face and head. The
line under the name of the articulation indicates its scope. Crucially, all aspects of
facial expression and head position change between the two intonational phrases, as
illustrated in the close-up in Figure 2.
6 The Nespor and Sandler (1999) study adopts Nespor and Vogel’s (1986) terminology, in which
the level below the intonational phrase is the phonological phrase. Here we use Beckman and
Pierrehumbert’s (1986) term “intermediate phrase.” The differences between them need not
concern us here.
Figure 2. Change in visual intonation at the IP boundary in a counterfactual conditional sentence.
The first intonational phrase of the utterance is characterized by Brow Raise and
Squint, which co-occur with the whole phrase, and is terminated with a lean forward
on the last sign. All nonmanual signals relax at the intonational phrase boundary, and
the second intonational phrase starts with the head position up and back, and neutral
expression on the upper face.
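The across-the-board change described here can be sketched as a comparison of articulator states on either side of the boundary. The Python fragment below is a minimal illustration, not the coding system used in the study: the state labels paraphrase the description above, and the function names are hypothetical.

```python
# Articulator states per intonational phrase, paraphrased from the description
# above. Labels and function names are hypothetical, chosen for this sketch.
ip1 = {"brows": "raised", "eyes": "squint", "head": "lean forward"}
ip2 = {"brows": "neutral", "eyes": "neutral", "head": "up and back"}


def changed_articulators(before, after):
    """Return the sorted articulators whose state differs across the boundary."""
    return sorted(a for a in before if before[a] != after.get(a))


def across_the_board(before, after):
    """True if every coded articulator changes its state at the boundary."""
    return changed_articulators(before, after) == sorted(before)


print(changed_articulators(ip1, ip2))
print(across_the_board(ip1, ip2))
```

A boundary at which only some articulators change would make across_the_board return False, which is one way to flag tokens for closer inspection.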
This description gives us good reason to believe that we are talking about a
prosodic system, but some important observations about the interpretation of intonation
remain unexplained. Specifically, we have found that Brow Raise characterizes
not only yes/no questions, but conditionals and temporal adverbial phrases as well.
And Squint characterizes mutually retrievable or “shared” information in some sense,
but it also typically characterizes relative clauses, remote past, and other structures.
Finally, the two articulations may combine as they do in Figure 1. A closer look at
the semantics of these elements and their distribution provides a unified explanation
of the way in which they function in the intonational system of ISL.
4 Meanings of ISL intonational articulations
As we have mentioned, previous research has shown that ISL has a rich system of facial
configurations serving a wide range of communicative functions (Meir & Sandler, 2008;
Nespor & Sandler, 1999; Sandler, 1999a, 2005). In this study we provide quantitative
support for these observations. In addition, we break down complex intonational
arrays into smaller components, and provide an analysis of the way complex meaning
is built up by combining them.7
7 These results are based on Dachkovsky (2005).
We created target sentences in Hebrew with specific types of linguistic structures that
we expected would elicit particular facial expressions, based on earlier research. We
aimed to elicit the following linguistic constructions: yes/no questions, wh-questions,
neutral and counterfactual conditionals, relative and temporal clauses, and constituents
containing mutually retrievable information. The sentence types were intermixed, and
from eight to ten tokens were elicited for each construction. To avoid a listing effect,
we wrote each sentence on a separate card. Each target sentence was embedded in a
mini-discourse in order to provide a controllable context and to minimize extraneous
associations that a signer might have had in his/her mind.
In choosing an elicitation procedure over analysis of spontaneous discourse, we follow
the “read and pronounce” methodology commonly used in much spoken language intonation
research (see Cerrato & Imperio, 2003; Hedberg & Sosa, 2003). In order to reduce both
artificiality and interference from Hebrew, we modify the read and pronounce technique
in the following way. Subjects do not read/translate the written sentence, but internalize its
meaning, put the card aside, and create a corresponding ISL sentence, which they convey
to another signer.8 Subjects were five native ISL signers.
This technique aims to establish a base line for the intonational system in an under-
studied modality, the sign modality, and is intended to pave the way for future studies of
spontaneous discourse. Our approach allows us to identify intonational patterns associated
with certain meanings and types of utterances with a minimum of uncontrolled variables
from the general discourse that could affect the intonation in ways that may not be rigorously
analyzable. The study attempts to establish such a base line for Squint and Brow Raise in
ISL, a language with a rich system of facial expression. Another reason for choosing an
elicitation technique is the scarcity of certain sentence types in spontaneous discourse.
Neutral and especially counterfactual conditionals are quite rare in natural discourse,
making elicitation necessary in order to amass a large enough corpus for identifying patterns
and establishing generalizations. Studies such as the present one based on elicited data will
provide a basis for further research with spontaneous data that are necessarily messier and
include contextual information that cannot be controlled.
The subjects were videotaped, and each sentence glossed with the help of native
signer consultants. The data were then coded through frame-by-frame viewing, using
the Facial Action Coding System (FACS) (Ekman & Friesen, 1978), which specifies each
Facial Action Unit (AU) with a numeral.9 Interpretations were checked by interviews
with native signer consultants.
8 We attempt to avoid the pitfalls of a translation technique by training our subjects to become metalinguistically
aware of the difference between Hebrew and ISL and by having them convey the message
to another deaf signer. This method has proved successful in our work, measured by regularities in
structure of the ISL sentences and the extent to which they differ from the Hebrew prompt.
9 FACS is an anatomically based, descriptive system, identifying a set of 44 AUs. Alone or in
combination, these AUs are intended to account for any observed facial movement and head
posture. With a few exceptions, the Action Units have a one-to-one correspondence with
single muscles as identified by anatomists. FACS also specifies a way of coding four levels of
intensity, using the letters a–d, going from lowest to highest on the intensity continuum.
Figure 3. Typical yes/no question intonation. “Do you think it’s possible to learn a foreign language
in one year?”
Results: Correlation of facial expressions with particular
The coded data were analyzed to see if there was systematic correlation between the
type of linguistic construction/pragmatic function and nonmanual clusters. The data
showed a very high degree of regularity in these structures, strongly supporting the
conclusion that they reflect a conventionalized linguistic system.
For example, the study confirmed that yes/no questions in ISL are systematically
marked by Brow Raise (AUs 1+2, typically accompanied by lines in the forehead) and
wide eyes (AU 5). This pattern was found in 100% of the yes/no questions elicited.
Forward head movement (AU 57) occurred in over 94% of yes/no questions. This
intonational array is illustrated in Figure 3. The underlined word in the figure caption
is the one pictured. Table 1 shows the number of sentences of each type that was
elicited and coded.10
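Percentages of this kind can be derived from the coded tokens by counting how many tokens of a construction contain a given set of Action Units. The Python fragment below is a minimal sketch under stated assumptions: the function name au_rate is hypothetical, and the toy codings are invented for illustration and are not the study's data.

```python
def au_rate(tokens, required):
    """Percentage of tokens whose coded AU set contains every AU in `required`."""
    if not tokens:
        raise ValueError("no tokens to score")
    hits = sum(1 for aus in tokens if set(required) <= set(aus))
    return 100.0 * hits / len(tokens)


# Invented toy codings (NOT the study's data): one set of AU numbers per token.
toy_yes_no = [{1, 2, 5, 57}, {1, 2, 5, 57}, {1, 2, 5}, {1, 2, 5, 57}]

print(au_rate(toy_yes_no, {1, 2, 5}))  # Brow Raise plus wide eyes
print(au_rate(toy_yes_no, {57}))       # forward head movement
```

Scoring sets of AUs rather than single AUs makes it possible to ask about a whole array (for example, Brow Raise together with wide eyes) in one call.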
10 A similar combination of facial markers and head position for yes/no questions has been
identified in American Sign Language (ASL) (Baker & Cokely, 1980; Baker-Shenk, 1983;
Liddell, 1980), British Sign Language (BSL) (Deuchar, 1984; Woll, 1981), Swedish Sign
Language (SSL) (Bergman, 1984), Sign Language of the Netherlands (SLN) (Coerts, 1992),
Norwegian Sign Language (NSL) (Vogt-Svendsen, 1990).
Figure 4. Typical wh-question intonation. “Where is the book?”
Figure 4 illustrates a typical wh-question: AU 4 (which draws the brows together
and/or lowers them) occurs in 92% of the wh-questions in our corpus, and AU 5 (which
raises the upper eyelids making the eyes look bigger) in 75%. As with yes/no questions,
a forward head movement (AU 57) characterizes wh-questions (100%).11
Squint (AU 44) is strongly associated with constituents whose status is negotiated
between the interlocutors as retrievable: it appears in 95% of the relevant environments.
For now, we will name this sort of information “mutually retrievable,” and will provide
a more detailed description and analysis below.12
In Figure 5, Yossi is assumed to be someone known to both interlocutors, but
not previously mentioned in the discourse. The Squint is an instruction to retrieve this
shared information. This kind of facial expression also marked 85% of the restrictive
relative clauses in our data.13 It can occur, sometimes in combination with additional
11 Wh-questions are described as having furrowed brows as well as a characteristic head position
in ASL (Baker-Shenk, 1983), SSL (Bergman, 1984), BSL (Deuchar, 1984; Kyle & Woll,
1985; Woll, 1981), SLN (Coerts, 1992).
12 Brow Raise has been described as the most prototypical nonmanual marker in ASL topics
(Baker-Shenk, 1983; Coulter, 1979; Liddell, 1980) and in SSL (Bergman, 1984). Alongside
Brow Raise, Squint was discussed as an alternative possibility for topic marking in DSL
(Engberg-Pedersen, 1990), in which the two distinct markers indicate different pragmatic
functions of topics. A similar difference was observed in ASL by Baker and Cokely (1980).
13 The other 15% of the relative clauses in our data were marked not by Squint, but by Brow
Raise. The content of these relative clauses restricts the referents by supplying the condition
on which the fulfillment of the main predication is contingent. This condition can be
understood as “systematic dependence” (Langacker, 1997; Ziv, 1997) holding between the
content of the relative clause and the main predication. It is these contingency and continuation
dependency relations that are signaled by Brow Raise in ISL.