Integrating computer games in speech therapy
for children who stutter
Walter Kosters, Fons Verbeek
Niels O. Schiller
Leiden University Centre for Linguistics
Leiden Institute of Advanced
Leiden Institute for Brain and
Leiden Institute for Brain and Cognition
Leiden, The Netherlands
Leiden, The Netherlands
Leiden, The Netherlands
ABSTRACT
In this paper we describe our work on the development of a
novel computer game supporting speech therapy for children who
stutter. We discuss the motivation for our work and the theoretical
background, and outline the plans and strategies for the
development process. Finally, we describe a preliminary study
carried out to evaluate the potential of integrating computer games
in speech therapy for children.

Keywords
Stuttering, speech-motor training, real-time visual feedback

1. INTRODUCTION
Stuttering is a speech disorder in which the flow of speech is
disrupted by involuntary repetitions, prolongations of sounds, and
silent blocks. From a motor point of view, stuttering can be
described as a disorder in the timing and coordination of the
subsystems involved in speech production, namely respiration,
phonation, and articulation. One of the most common forms of
stuttering treatment for children is speech motor training. In a
speech clinic, a child is typically asked to repeat a large number of
syllable strings, such as "puh-tuh-kuh". By systematically
manipulating the complexity of syllables, speech motor training
aims at establishing correct speech motor programs at the syllable
level. After training, a child uses those motor programs during
spontaneous speech. While training, the speech therapist provides
verbal feedback on the child's production, and directs the child to
the next exercise. This form of practice presents several inherent
limitations:
» A child is expected to remain motivated to perform a large
amount of practice, while pursuing tasks which are rather abstract.
» The feedback is provided to the child only after he/she has
produced the syllable set. This does not allow the child to monitor
specific articulatory movements in real-time and learn to correct
them.
» Training is dependent on the presence of a speech therapist,
which is expensive. A child has no possibility to practice the
learned skills at home at a preferred moment of time.

One potential tool to improve the current training situation is a
computer-based system. Considering the remarkable fascination
of children with computer games, it seems reasonable to house
speech motor training within a computer-game environment.
Educational computer games have been widely recognized as
having great potential to motivate children to perform tasks which
are not intrinsically motivating. This is especially relevant for
children who need to practice certain skills where a disability is
diagnosed. The usage of new media enables developers to engage
children in learning not only by boosting motivation, but also by
creating conceptually novel tasks, which may go beyond the
paper-based training methods. The major challenge of
training-system developers is to seamlessly incorporate relevant
educational or therapeutic elements into the “world” of the game.
If such elements trigger learning mechanisms of a particular skill
while taking advantage of the increased attention and motivation
levels of the player, a potentially highly effective training tool can
be offered. According to Hubbard, attractiveness should be a
primary concern when designing educational software. He suggests
that while playing computer games, the immersion effect is
created, whereby the environment into which the players are
submerged progressively increases their levels of attention and
concentration on the goal, which stimulates the learning process.
The power of computer games is in providing immediate and
accurate feedback in a systematic way. As opposed to a delayed
form of feedback, a child perceives the correlation of his/her
performance to the presented goal in real-time, and gains the
ability to monitor and adjust his/her behaviour in a very precise
manner. This immediate correlation is central to the speech
training environment we are developing. The issue of control
plays a major role in the design of interactive programs for
children. The common approach is to allow children to choose
between different formats of exercise and different levels of
difficulty. In general, research indicates that the key points for
creating effective training computer games are provision of
feedback, control, curiosity and a feeling of competence. To
create the feeling of challenge, goals must be clearly presented,
and be neither too easy nor too difficult. When children try to
reach a specific goal, they must be provided with intuitive feedback.
Curiosity is recognized as a powerful source of intrinsic
motivation. A structured progression through difficulty
levels will increase motivation to go further in the training. The
idea is to channel existing computer-usage habits of children
towards a useful and learning-aimed interaction.

With those ideas in mind, together with speech therapists we have
decided to develop a new training tool for children who stutter.
The training tool is implemented as a computer game, in which
the child is encouraged to achieve specific goals by using his/her
voice to play the game. The game provides real-time visual
feedback on the quality of speech productions and motivates the
child to improve speech motor skills in order to progress in the
game. The feedback we plan to provide is related to the
bio-feedback paradigm, where a certain physiological signal is
visualized in real-time to allow a person to gain some amount of
control over it. In our case the bio-signals are the voice
parameters relevant to the specific speech motor training
methodology. The structure of exercises used to train speech
motor skills is reflected in the game, which becomes a dynamic
extension of the treatment program. The therapy game is by no
means assumed to replace therapy sessions with a clinician, but
rather to complement existing therapy, and provide an attractive
possibility to practise speech motor skills at home.
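The real-time feedback idea, in which a voice parameter directly
drives what the child sees, can be illustrated with a minimal sketch.
Everything in it is an assumption made for illustration: the choice
of frame intensity (RMS) as the voice parameter, the exponential
smoothing, and the names frame_rms and feedback_positions do not come
from the system described in this paper.

```python
import math

def frame_rms(samples):
    """Root-mean-square intensity of one audio frame (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def feedback_positions(frames, screen_height=100, max_rms=1.0, smoothing=0.5):
    """Map each incoming frame to a vertical position of a game figure.

    smoothing is the weight given to the previous level (exponential
    moving average); it is a hypothetical choice that keeps the figure
    from jittering between frames.
    """
    positions = []
    level = 0.0
    for frame in frames:
        level = smoothing * level + (1.0 - smoothing) * frame_rms(frame)
        clipped = min(level / max_rms, 1.0)        # never leave the screen
        positions.append(int(clipped * screen_height))
    return positions

if __name__ == "__main__":
    silent = [0.0] * 4
    loud = [1.0, -1.0, 1.0, -1.0]
    # The figure rises while the child phonates and sinks when voicing stops.
    print(feedback_positions([silent, loud, loud, silent]))
```

Because each position is computed as soon as a frame arrives, the
child sees the consequence of a change in voicing within one frame,
which is the property that distinguishes this kind of feedback from
a therapist's comment given after the whole syllable set.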
3. SYSTEM DEVELOPMENT
3.1 Training methodology
An important aspect of speech motor training methodology is the
choice of verbal stimuli to be practised. Researchers distinguish
between stimuli of real words, nonsense words which have a
legal word structure, and nonsense syllable strings which have an
unnatural prosodic structure, since they are equally stressed. It is
agreed that the nonsense-syllable stimulus yields a measure most
related to speech motor abilities, since it is least affected by
linguistic factors which influence real-word and "legal" nonsense
word repetitions. Since the syllable string is completely
unfamiliar, most likely never heard or produced before, its
production cannot rely on prior linguistic knowledge, and a child
has no stored motor programs to rely on. Therefore, this kind of
practice requires an assembly of new motor programs, which
allows researchers to focus on measuring and training the speech
motor skills per se.

We plan to take advantage of psycholinguistic studies about the
“mental syllabary” to contribute to the methodology
underlying speech motor training. This syllabary is assumed to
contain pre-compiled articulatory motor programs for the most
frequently used syllables in order to facilitate their rapid
production, such that speakers do not have to assemble these
syllabic motor programs from scratch each time they produce
them. To support those assumptions, Schiller et al. have found
that for English, Dutch and German, approximately 80% of
speech in those languages utilizes only 500 different syllables,
which is only 5% of the entire syllable inventory.

This suggests that speakers produce the same syllables over and
over again. Therefore, training children who stutter with syllables
from different "frequency bins" in a structured program should
produce different effects on speech motor skills. More precisely,
training syllables that occur often in the language of the speaker
should produce a greater advantage in daily language use for the
speaker than syllables that rarely occur in the language.

3.2 Design strategy
Our starting points in developing the new training system are the
following key questions regarding system usability:
» How to make the system adaptive to different users?
» Which age groups can benefit from the therapy game?
» Which form of feedback best stimulates children to practice?

In the process of design and implementation, the aim is to draw
from existing research in developmental psychology dealing with
learning and child-computer interaction. This knowledge, together
with clinical experience, is used to realize the new training system.
The departure point is interviewing speech therapists in order to
elicit requirements for the new system. Once the concepts for the
system are formulated, the research-through-design approach is
taken. The process involves several cycles of prototyping and
testing for usability with children (not only children who stutter).
Our guidelines for designing a sustainable therapy tool are:
» Adaptive mechanisms to support individual, child-specific
training procedures. One possibility is to allow the clinician to set
child-specific goals and training sets. Another option is to
program the system to adapt dynamically to the performance of
the user by adjusting targets on the fly.
» Key partners in the process of game development are the
children themselves. Our experience so far echoes the enormous
value of children's contribution to the design and evaluation of
interactive computer games (see also Druin).
» Sensible speech analysis and visualization. The game should
provide visual feedback on relevant speech parameters, such as
intensity and pitch contour, voiced/voiceless consonant detection,
smooth coarticulation, correct rate and rhythm.

3.3 Speech signal processing
Following the need for a more intelligent analysis of the incoming
speech signal, we are now investigating the possibilities of
integrating a machine learning component based on speech
recognition (SR) technology. The idea to utilise a recogniser is
related to the desire of providing feedback on the quality of
specific phonemes or syllables which will be part of the training
program. Since we know in advance which syllable sequences are
expected, the SR component could segment the
incoming speech signal according to the expected syllable
sequence, that is, align the two sequences with markers
on segment borders. Next, the segments would have to be
evaluated for the relevant parameters (precision of articulation,
substitution of phonemes, smoothness of coarticulation etc.)
according to the acoustic models corresponding to the expected
segments. This is in fact an elaborate pattern recognition process
which can potentially yield quality scores (confidence/distance
measures) for our exercises. However, in order to perform such
elaborate analysis, the system must first be trained on a rather
large database of speech samples, representative of the expected
input tokens. Furthermore, next to each syllable sequence
production, such a database must contain a transcription of the
relevant error types, as well as clinicians' judgements of the
sequence. The compilation of such an elaborate database is
logistically non-trivial, demanding many expert working
hours. This disadvantage is common to supervised machine
learning techniques. In general, those methods need a fully
annotated training database to learn to discriminate incoming
signals. Alternatively, we can think of looking on the fly at the
acoustic features of the incoming signal, having a set of acoustic
filters (or discriminators) to decide on the overall quality. The
disadvantage of such an approach is that the system is not learning.

A reasonable solution would be to conceive an unsupervised
learning routine, in which a database of speech samples (not fully
annotated, but with overall quality scores) is passed through a set
of acoustic-based discriminators (a sequence of acoustic analysis
functions). The resulting data structure is an n-dimensional space,
where n = number of discriminators. A classification function is
then derived which separates the data-space into areas of quality
scores. Further on, in the real-time situation, the same set of
acoustic discriminators is applied to the incoming speech signal,
and the classification function is used to obtain a quality score.
The advantage of such an approach is that machine learning is
applied to incorporate previous knowledge of utterance judgement
(by clinicians) to the evaluation of productions in real time. The
database needed to train such a system need not be fully
annotated, which makes its compilation much more feasible.

4. SYSTEM VERIFICATION
4.1 Usability studies
After each design iteration, the developed prototype will be tested
with children. The usability tests will aim to answer the posed
research questions, revealing which age groups are most suited
to use the system, and how motivated children are to interact with
the current prototype. Following guidelines of testing usability in
child-computer interaction, we will conduct qualitative studies
by means of:
» Questionnaires: items on motivation, attitude, appreciation.
» Behavioural observations: video recordings and independent
judgements about engagement.
» Free choice: frequency of selection and the time spent with
the speech-training system, or with two other games.

4.2 Treatment efficacy studies
Once the developed training system is “mature”, we will conduct
treatment efficacy studies with children who stutter. Together with
speech therapists, we will measure the effect of training with the
new system on stuttering severity of children. In the process of
testing the effectiveness of the developed system, questions will
be raised concerning the effects of training on fluency:
» How does training affect spontaneous speech of children who
stutter in the long term?
» How does an individual child's stuttering history (time since
onset of stuttering, severity of stuttering) affect the treatment
outcomes for that child?
» Which syllable sets are most effective as training material?
The treatment efficacy studies will utilize a group comparison
design. One group of children who stutter will undergo speech
motor training with the computer game, and a control group will
receive standard speech motor therapy. A total of 30
subjects will be recruited, aged 6 to 9 years. The training
will last for 15 weeks. Measurements will take place
pre-treatment, 5 and 10 weeks into treatment, and post-treatment.
Dependent variables are stuttering frequency, measured as % SS =
percentage of stuttered syllables (see Figure 1), and stuttering
severity rating (a continuous scale from 0 to 7). Audio and video
samples of children's productions will be collected throughout
various speech tasks. Those tasks range in complexity from picture
naming through reading to spontaneous speech.
Samples will be analysed with the help of a research assistant, and
stuttering frequencies will be measured independently by two
researchers in order to provide high reliability. The stuttering
severity rating will be perceptually determined by a speech
therapist and the parents of a child.

Figure 1: Exemplary fluency measurements: percentage of
stuttered syllables over time (x-axis: training duration in weeks)

4.3 Psycholinguistic studies
A psycholinguistic study will measure how treatment outcomes
are affected by manipulating different syllable sets for training.
The aim of the psycholinguistic study is to reveal which choice of
syllable sets will provide the most efficient training material. In
the empirical study, we will parametrically vary the frequency of
occurrence of the to-be-trained syllables while controlling for
potentially confounding factors such as segment frequency and
articulatory difficulty arising from the combination of certain
segments (co-articulation). Productions of various target syllables
will be evaluated by professional speech therapists before and
after the training, using quantitative criteria such as voice onset
latencies and speech errors.

5. PRELIMINARY STUDY
5.1 A prototype game
So far, we have conducted an initial study on the potential of
using a therapy game to supplement existing forms of speech
motor training. In that study, we have built and experimented with
a straightforward prototype of a speech training game. In a
training session, a child controls the motion of a protagonist
figure using his/her voice, with the goal of hitting targets on the
screen. Those targets represent the syllables which the child has to
produce, so that targets in the game are hit when the child
produces the correct speech pattern in terms of rhythm and
continuous phonation (see Figure 2).

Figure 2: A screen shot from a prototype game

A rather basic analysis is used by applying a low-pass filter to the
intensity signal so that a smooth loudness pattern is obtained and
can be visualised. This analysis returns the loudness level of the
incoming signal based on the psycho-acoustic approach to human
sound perception. This is found to be the most practical and robust
approach in application-driven work. The loudness is the
subjective judgement of the intensity of sound. First, the Short
Time Fourier Transform is applied to a window of size 512, and
the spectrogram is obtained, containing the energy levels of 512
frequency bands. Subsequently, the energy values
E_k are summed up and the average constitutes the loudness at time
t (see formula below).

$$L(t) = \frac{1}{N} \sum_{k=1}^{N} E_k(t)$$

E_k is the amplitude of frequency band k of total N in the
spectrogram (N = 512). This approach allows us to catch even
subtle pauses in phonation and thereby to discriminate between
relatively continuous and abrupt voicing.
Short gaps in the signal intensity are detected and assumed to
reflect breaks in phonation, enabling the program to trigger
appropriate visual cues. In the current version, the wooden floor
underneath the game hero would break at the point where the break
in phonation was detected. The game prototype does not include any
mechanism to distinguish and recognise specific phonemes or
syllables. After each exercise, the score is presented and the child
proceeds to the next level. The clinician can set a child-specific
difficulty level and rate of the exercise, and switch between
different modes of visualisation (see Figure 3).

Figure 3: An interface screen for the clinician

5.2 Clinical tests
We have tested the prototype game in a speech clinic with
children who stutter. We have conducted two experiment sessions
with children at the Rijnland Stuttering Clinic in Oegstgeest, The
Netherlands. In each session we tested 8 children, one
at a time. There were 2 girls and 14 boys. We invited both
children who stutter and those who do not. In total, 4 out of 16
children were non-stutterers. We wanted to check if those children
relate differently to the game than those who stutter. The ages of
the children in the first session were: 9, 11, 12, 9, 11, 4, 7, 8.
Children in the second session were: 6, 10, 8, 6, 10, 5, 9, 9.
The main question we have posed is whether children relate easily
to the kind of interaction the game offers. The goals of those
experiments were as follows:
» Observe the reaction of children to the prototype of our game.
» Observe whether children show motivation to succeed.
» Collect explicit impressions of children about their experience.
» Look at the way clinicians refer to the game as a therapy tool.
» Observe the reaction of parents to the training process.
» Analyse the strong and weak points of the development so far.
The overall reaction of children to the game prototype was
positive. This impression was built from both observing the
children at play time and asking them explicit questions
afterwards. In almost all cases, children
seemed involved and focused on the exercise. Another clear point
was that the children immediately understood the principle and
rules of the game without explicit explanations. In order to
evaluate how intuitive the interaction with the game is, we
simply let a child play the game without instructions and
guidance. After the first few trials with their voice, all children
picked up the principle and started aiming to hit the target objects
while pronouncing the syllable sets. We have recorded no case of
a child asking how the game works or being puzzled by the
reaction of the moving figure to his/her voice or by the way the
target objects react to the figure hitting them. Based on a clear
pattern of immediate relation to the game's mechanism, we can
conclude that children found the interaction principle simple and
intuitive. Observing the determination expressed by most children
during the syllable production as well as the joyful reactions upon
completing a screen with optimal performance, we can assume
they experience a rather high motivation to succeed. The scoring
screens of the game clearly contributed to the motivation factor
during the second test session. The children could
perceive in a clear way how well they had performed and reacted
strongly either with satisfaction or dissatisfaction, and in both
cases with an intention to perform better. This observation agrees
well with our prediction that a concrete and simple goal
would create motivation to succeed in the exercises.
A very important factor in the child's approach to the game
proved to be the presence of the clinician. It was clear that
children were more comfortable with her being present. Her role
proved to be highly effective in guiding a child to interpret the
visual feedback of the game and correct the syllable production
based on that. For example, she would ask the child “Why did the
running figure not reach the target?”, and the child would
respond by recognising the relation of his/her voice volume to
matching the target. By guiding the child, the clinician is able to
refer to specific ways for improving certain voice quality in terms
of the game's symbols, to which the child easily relates. For
example, instead of saying "Your volume is lower at the last part
of the syllable set", the clinician would say "Try to hit the last
three cows as well now". Thus, the role of the clinician during the
training is important in a few respects, at least in the first stage,
when a child gets accustomed to the game. First, she makes the child
comfortable with the training situation. This we have especially
observed when the child first made contact with the game. There
were a few cases during the second test session when children
refused to approach the game in the first place without the
clinician being present. Secondly, she provides guiding cues to the
child during the exercises through the symbolic language of the
game, thus referring to concrete visual objects and tasks.

6. CONCLUSIONS & FUTURE WORK
The results of our preliminary clinical tests and usability
evaluations suggest that computer games are a
powerful tool for motivating children to practice speech motor
skills. The experiments showed high attention and concentration
levels of children who practised, as well as short term
improvement in performance in terms of the game scores. The
visual environments used in the prototype game proved to be easy
for children to relate to; however, more variety is needed.

Following our current results and experience, we have sketched
the main directions for future work, with the idea to proceed with
building a widely usable training tool. Psycholinguistic studies
should expand our understanding about the relative effectiveness
of various syllable types in speech motor training. Besides
providing a firm theoretical framework for the treatment program,
such findings will indicate the best choice of training material for
the program. Furthermore, we plan to investigate which speech
processing techniques can best provide evaluation on the quality
of production in a real-time situation. We aim to strike a balance
between utilising machine learning approaches while not relying
on the existence of a very specific speech corpus.
Finally, a treatment efficacy study should provide quantitative
support for the effectiveness of our training system in real-life
treatment situations. Once developed and verified, the system will
offer clinicians and children a new evidence-based tool to practice
speech motor skills in an attractive and playful way.
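
To make the corpus-light scoring direction more concrete, the sketch
below passes utterances through a set of acoustic discriminators,
places them in an n-dimensional feature space, and derives a
classification function from samples that carry only an overall
quality label. It is a sketch under stated assumptions, not the
planned implementation: the two discriminators (mean_loudness,
gap_count), the nearest-centroid rule standing in for the
unspecified classification function, the 0.1 silence threshold and
the toy data are all hypothetical.

```python
import math

def mean_loudness(energies):
    """Discriminator 1: average per-frame loudness of an utterance."""
    return sum(energies) / len(energies)

def gap_count(energies, threshold=0.1):
    """Discriminator 2: number of silent runs inside the voiced stretch."""
    voiced = [e >= threshold for e in energies]
    if True not in voiced:
        return 0
    first = voiced.index(True)
    last = len(voiced) - 1 - voiced[::-1].index(True)
    # Count voiced -> silent transitions between first and last voiced frame.
    return sum(1 for i in range(first + 1, last + 1)
               if voiced[i - 1] and not voiced[i])

DISCRIMINATORS = [mean_loudness, gap_count]   # n = 2 in this toy example

def features(energies):
    """Project one utterance into the n-dimensional discriminator space."""
    return [d(energies) for d in DISCRIMINATORS]

def train_centroids(samples):
    """samples: list of (energies, overall_quality_label) pairs."""
    sums, counts = {}, {}
    for energies, label in samples:
        point = features(energies)
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def classify(energies, centroids):
    """Return the label whose centroid is closest to the new utterance."""
    point = features(energies)
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

if __name__ == "__main__":
    training = [
        ([0.8, 0.9, 0.8, 0.9], "good"),
        ([0.9, 0.8, 0.9, 0.8], "good"),
        ([0.8, 0.0, 0.9, 0.0, 0.8], "poor"),
        ([0.7, 0.0, 0.0, 0.8, 0.0, 0.9], "poor"),
    ]
    centroids = train_centroids(training)
    print(classify([0.9, 0.9, 0.8], centroids))
    print(classify([0.9, 0.0, 0.8, 0.0, 0.9], centroids))
```

Only the overall label per utterance is needed for training, which
is what keeps the database requirements modest. In practice the
discriminator outputs would need to be scaled to comparable ranges
before distances are computed; the toy example skips that step.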

7. ACKNOWLEDGMENTS
This research is funded by a Mosaic grant from the Netherlands
Organisation for Scientific Research (NWO).
Our kind gratitude goes to the speech therapists of the VSN (The
Dutch Association of Stuttering Clinics) for guiding our
development, and to all the children who are involved.

8. REFERENCES
Druin, A. “The role of children in the design of new
technology.” Behaviour & Information
Technology, 21(1), 1-25 (2000).
Hubbard. “Evaluating computer games for language
learning.” Simulation and Gaming, 22, 220–223 (1991).
Schiller, N. O., Meyer, A. S., Baayen, R. H., & Levelt, W. J.
M. “A comparison of lexeme and speech syllables in Dutch.”
Journal of Quantitative Linguistics, 3, 8–28 (1996).
Cholin, J., Levelt, W. J. M., & Schiller, N. O. “Effects of
syllable frequency in speech production.” Cognition, 99.
Kent, R. D. “Research on speech motor control and its
disorders, a review and prospective.” Journal of
Communication Disorders, 33, 391-428 (2000).
Darby, Mc., Condron, G., Hughes, G., AugenBlick, N.
“Affective feedback.” Submission chapter for a book entitled:
Enabling Technologies, Elsevier Science Ltd. (2001).
Ryan, R. M. & Deci, E. L. “Intrinsic and extrinsic motivations:
classic definitions and new directions.” Contemporary
Educational Psychology, 25, 54-67 (2000).
Donker, A., & Reitsma, P. “Usability testing with young
children.” Proceedings of IDC 2004, pp. 43-48. New York:
ACM Press (2004).
Umanski, D. “Computer game development for speech
therapy support.” MSc thesis, Leiden University (2006).