Context and Situation as Enablers for Multi-Modal
Interaction in Mobile Games

Armin B. Cremers
University of Bonn
ABSTRACT
Location-based mobile multiplayer games can offer a unique and intense game experience. Three spheres of experience merge: social interaction, the virtual game world, and the real location. While the potential is huge, the main obstacle to a seamless and enjoyable experience is the user's interaction with the mobile system. The problem is not a sheer lack of interaction techniques, but the assignment of the appropriate modality in different game situations. The main statement of this paper is: multimodality alone is not enough to cope with mobile gaming. Context-awareness that leads to situation-specific interaction modes may come to the rescue. This paper discusses the requirements of modality selection against the background of location-based multiplayer games and illustrates the correlation to context-awareness and situation detection.

INTRODUCTION
In the last years, mobile phones moved from simple voice and text handsets to small computing platforms. With this, the possibilities for enhanced entertainment options have increased: photos, video, music, everything on the go. Since the advent of Java-based phones a vivid and growing mobile games market has evolved. A recent study by Juniper Research expects the mobile games market to reach 10 billion US-$ by 2009, and more than 460 million mobile users are expected to download games for their devices in that year. When the iPhone AppStore launched on July 10, 2008, more than 160 out of the 500 available applications were games.

Currently most of the discussed mobile games focus on a casual single-player mode, but powerful location-aware mobile devices enable more games that exploit location and other context as well as social interaction to create games with a rich experience. This market share has recently shown the biggest growth. Hence, the development of such games will be an issue of growing importance in the next years.

GAMES IN MOTION
Location-based multiplayer games use the location of players, but the way they use motion differs. Explorative games or location-based puzzles can have a rather casual attitude towards location. In this paper we focus on those games where location and movements are of major importance. To characterize this family of games we give a definition of motion intensive games here:

    Motion intensive games depend on the movements of the players, which take place in physical reality and influence the game in real time.

Typical exponents of motion intensive games are rallies, hide-and-seek games, etc. They are characterized by continuously moving players who frequently interact with the virtual game world and communicate with teammates or opponents. We are currently developing Scotland Yard to go, an adaptation of the classical board game Scotland Yard2, where a group of detectives works together to catch Mr. X in the city of London. In the original board game the focus is on strategic pondering about different theories regarding Mr. X's current location, which is revealed only every 5th round. Movements are round-based and only symbolic, in terms of relocating tokens on a board. In contrast, Scotland Yard to go is played in the real city, supported by wirelessly networked mobile computers3.

So, what specific requirements do these games impose on HCI? They often feature situations which demand time-critical game interaction. In such situations the game activities require the player's full attention, and his capabilities to interact with the device are very limited. As an example, during a hide-and-seek game the fleeing player cannot interact with a computer by typing while running 'for his life'. Even observing a small display while moving seems unfeasible (more detailed examples are presented in the section on typical game situations). The lack of attention the user can afford for the interaction with the computing device has to be considered when designing an intuitively controllable motion intensive game.

2 Developed in 1983 by Ravensburger Spieleverlag; English edition published in 1985 by Milton Bradley/Hasbro.
3 The current version of Scotland Yard to go demands UMPCs running Windows XP. The next version will use standard mobile phones.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
HCI September 1-5, 2008, Liverpool, Great Britain.
Copyright 2008 ACM X-XXXXX-XX-X/XX/XX ...$5.00.
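To make the definition concrete, here is a minimal sketch of the real-time coupling between physical movement and game state: each incoming GPS fix is immediately checked against a catch radius around Mr. X. The radius, coordinates, and state names are illustrative assumptions, not details of the actual Scotland Yard to go implementation.

```python
import math

CATCH_RADIUS_M = 25.0  # hypothetical arrest radius; the paper fixes no value


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes (haversine)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def update_game(mr_x_fix, detective_fixes):
    """Every position fix influences the game state without delay."""
    for d in detective_fixes:
        if distance_m(mr_x_fix[0], mr_x_fix[1], d[0], d[1]) <= CATCH_RADIUS_M:
            return "caught"
    return "fleeing"
```

In a round-based board game this check would run once per turn; in a motion intensive game it runs on every sensor update, which is exactly the property the definition demands.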
SENSING IN MOTION
Moving needs vision; hence while we move our vision is highly stressed. Consequently, visual interaction with a display is limited or nearly impossible when running. Semi-transparent head-mounted displays or retina projectors are a potential solution, as they can display information overlaying the real field of vision. Unfortunately, they will not be widely available soon, so we cannot rely on them. Therefore, game design for motion intensive games should not rely solely on visual interaction in situations where motion is crucial.

Audition is not as stressed by movement as vision. Although in some situations audible capacity is reduced too (e. g. while talking face to face with teammates, or due to background or traffic noise), it can take the pressure off the player's visual attention. Even though information transfer is limited by a narrow bandwidth and pure sequentiality, acoustic signals are valuable as they can be noticed in parallel to other audible perception and thus are relatively independent of the player's situation. Emotional influence by sounds and music is a specificity of the medium that can be used to enhance the game experience and communicate the course of the game on a high and rather vague level. Brewster gives in [2] an overview of non-speech auditory output. There are plenty of studies attesting the influence of music on perception and performance that could be exploited for motion intensive as well as dramatist games.

Tactition can also be used for interaction. A vibrating device, for example, can communicate signals in parallel to other communication while the player is moving. There are some special effectors like vibration belts or tactile interacting shoes that transfer more complex information such as direction, but they are far from being standard equipment. Nevertheless, the standard mobile device contains a simple vibration effector that can be modulated in frequency, sometimes even in intensity. We envision applying vibration frequency to communicate vicinity. Hemmert already used it to emit a continuous 'sign of life' to communicate the overall status of the mobile phone [7]. Nevertheless, vibration effectors can only transfer very little information, and their perception is unreliable.

Other human senses can currently not be used in practice. Smell and taste cannot be reached by computing devices. Thermoception could theoretically be used, but there are no effectors available on the market today4. Practically it already works, e. g. when you notice the warmth of your computer and become alert that it did not switch off correctly. But you can use it only in very special situations, e. g. in a crime thriller game as a clue. Nociception has already been used by computer game designers but cannot be stimulated with standard mobile handsets. Equilibrioception might be stimulated by certain optical effects, but no studies which apply this on a mobile device are known to the authors. We discuss a possible example in the subsection on clues and disturbing puzzles.

4 The heat which some mobile devices, especially first-generation UMPCs, produce when running is currently not directly controllable with an API.
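The envisioned vicinity signal can be sketched as a simple mapping from opponent distance to the pause between vibration pulses, so that the tactile 'heartbeat' quickens as Mr. X gets closer. All thresholds here are assumptions made for illustration; the paper prescribes no concrete values.

```python
def vibration_interval_s(distance_m, near_m=50.0, far_m=500.0,
                         fastest_s=0.2, slowest_s=2.0):
    """Map opponent distance to the pause between vibration pulses.

    Inside near_m the pulse rate saturates at its fastest; beyond far_m
    it stays at its slowest; in between it is interpolated linearly.
    """
    if distance_m <= near_m:
        return fastest_s
    if distance_m >= far_m:
        return slowest_s
    frac = (distance_m - near_m) / (far_m - near_m)
    return fastest_s + frac * (slowest_s - fastest_s)
```

Because perception of vibration is unreliable, a coarse mapping like this (few distinguishable rates rather than a continuous scale) is probably all a player can be expected to read while running.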
ACTING IN MOTION
Direct manual interaction is the standard input for computing devices. Examples are keyboards for text input, hardware buttons for special functions, as well as touchscreens with arbitrary virtual widgets. Although hand-held touchscreens in general allow for intuitive usability, they are not always appropriate for players on the move.

Large-scale motion is an inherent interaction mode for location-based games. Hence the computing system already has to support location sensing and processing. Beyond position, a fruitful context parameter to measure is orientation. Digital compasses are available either as separate small sensors or integrated in mobile phones such as the Nokia 5140.

Gestures based on small-scale body motion can be used in different ways for input. (1) The device can be moved systematically; e. g. tilting or shaking the device can be sensed by accelerometers and used to detect gestures. Stegmann et al. discuss in [12] how they successfully apply motion control and other modalities in the course of the MediaScout project of the Deutsche Telekom Laboratories to enhance media access. The implementation of some basic gestures for using a mobile phone as a virtual music instrument is described in [5]. (2) Mobile devices could be used as pointing devices in the real environment, as Simon et al. discuss in [11]. Based on such reality pointers, gestures could also be perceived in the context of the ambience.

Audio input can be used in terms of voice control (active or interactive) or noise sensors. Speaker-independent speech recognition is feasible when the vocabulary is small enough. Audio commands can be recognized on the handset, but due to the limited resources of mobile devices, true voice recognition requires redirecting to a powerful server, as with the VLingo system. For most games a limited set of commands is already very helpful and even allows for steering, as for example the Vocal Joystick project shows, c. f. [6].

In addition to tangible output, contact differentiation also plays a major role in input interaction. A perceptible difference between input modes (as known from the 'F' and 'J' keys on a standard PC keyboard, which support touch typing by small raisings) can help to interact with a mobile device in situations where vision-controlled input is not applicable. Tactons (tactile icons) are a concept for tactile displays developed by Brewster, Brown and others, c. f. [3]. Even systematic deformation of tangible devices can be used to increase information transfer. First prototypes of dynamic knobs have been developed (c. f. [8]), but they are too far from maturity to consider them here in more detail.
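A tilt gesture of the kind described above can be read directly from the gravity component of an accelerometer. The following sketch classifies a static reading into one of four menu commands; the axis convention, units, and threshold are assumptions of ours, not part of any cited system.

```python
def tilt_command(ax, ay, threshold=3.0):
    """Classify a static accelerometer reading into a menu command.

    ax/ay are in m/s^2 with the device lying flat reading ax = ay = 0;
    tilting lets part of gravity appear on the corresponding axis.
    Returns 'left'/'right'/'up'/'down', or None when held level.
    """
    if abs(ax) < threshold and abs(ay) < threshold:
        return None  # no deliberate tilt detected
    if abs(ax) >= abs(ay):
        return "right" if ax > 0 else "left"
    return "up" if ay > 0 else "down"
```

The dead zone around the level position is essential: while the player walks or runs, small incidental tilts must not be misread as commands.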
SOME TYPICAL GAME SITUATIONS
In the following we present examples of recurring pertinent game situations that can be supported by certain modality mixes. Some of these scenarios have already been implemented and tested, others are planned, and some are not realistic yet and need further research. The scenarios are analyzed with respect to determining contexts. Beside these particular conditions, the game itself is a very special context that greatly influences the adequateness. This holds in particular when mobile phones are used and the user is accustomed to communication-specific interaction modes, as the in-game communication situation illustrates.

On the Move
As we pointed out in the section on sensing in motion, motion limits visual perception, thus an adaptation towards auditive and tactile interaction is meaningful. Headsets and vibration units are widespread but typically used only for passive phone applications, which is achieved with one button on the headset. For game interaction, normally a larger set of commands should be instantly accessible. This demands further interaction than passive phoning does. Gestures are already being used for the selection of commands, e. g. in the UTI project at T-Labs, where tilting a remote control is used to browse through a menu hierarchy. This application relies on visual feedback. In our scenario auditive or tactile feedback would be more appropriate. A limited number of commands can be used with tactile feedback only, as illustrated in figure 1. For complex command sets like menu structures, audible feedback for the selection of certain commands is an option.

Figure 1: Tilt the device in different directions to select commands or browse menus.

In the case of Scotland Yard to go, the set of commands likely to be demanded by a detective on the move contains, for example, establishing a conference call to all detectives, communicating the fact that Mr. X is very close, and asking the other detectives to help and come close.

Oral communication is powerful and important. In the case of Scotland Yard to go the detectives cooperate intensively, therefore conference calls are vital for the game experience. In some situations, Mr. X can apply wiretapping to be informed about the most recent theories and plans of the detectives. This is crucial for a rich game experience, since one of Mr. X's aims is to lead the detectives up the garden path and obtain the possibility to run away unnoticed. We think about adapting the typical phone user interaction in these cases by (1) using a game-specific ring tone or vibration pattern; (2) accepting in-game calls automatically without pressing a button.

Hearkening and Tiptoeing
When hiding is an issue, hearkening and tiptoeing are natural behaviors. In a computer-supported mobile game this could be integrated in two different ways: simulating the situation and supporting the real situation. Let us first look at hearkening and tiptoeing as a virtual situation. We can simulate the situation even while players are in reality too far away. The pace of the hiding player can be sensed by accelerometers. In case he moves too fast, the noise of his paces can be simulated and communicated to the other players. For many smartphones, software that tracks the pace is readily available (c. f. [4] and [9]). Second, if the situation occurs in reality, the interaction modality should be adapted. When Mr. X is tiptoeing because some detectives are very close to him, he is careful not to make a noise. In this situation audio would be a misplaced medium, even if he was running shortly before and auditive interaction was therefore appropriate.
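The pace sensing used for virtual hearkening can be approximated with a naive step counter over accelerometer magnitudes: when the hiding player exceeds an allowed pace, simulated footstep noise is sent to the seekers. The step heuristic and all thresholds are deliberately simplistic assumptions, not the algorithm of the pace-tracking products cited above.

```python
def count_steps(magnitudes, g=9.81, threshold=2.5):
    """Count steps as upward crossings of |a| - g over a threshold.

    magnitudes is a stream of accelerometer magnitudes in m/s^2;
    each foot strike shows up as a spike above resting gravity.
    """
    steps, above = 0, False
    for m in magnitudes:
        if m - g > threshold and not above:
            steps += 1
            above = True
        elif m - g <= threshold:
            above = False
    return steps


def simulated_footsteps(magnitudes, window_s, max_steps_per_s=2.0):
    """True when the hiding player moves fast enough that simulated
    pace noise should be played to the seekers (virtual hearkening)."""
    return count_steps(magnitudes) / window_s > max_steps_per_s
```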
Clues and Disturbing Puzzles
In adventure-like location-based games, clues are a regular means to lead the players towards their goal. In motion-intensive games, for example in hide-and-seek games, it could be exciting to combine the discovery of the helpful clue with a disturbance of the player's equilibrioception. In such a case, the embedding into the right situation is crucial to gain an effective game experience. Assume a player needs to solve a puzzle to get a hint about Mr. X's current hiding position. Figure 2 illustrates how such a puzzle could look6. Such a puzzle would best be placed immediately before the player has to move on chasing Mr. X.

Figure 2: Count the straight lines to see Mr. X.

6 Picture in figure 2 taken from www.eyetricks.com.

Approaching a Suspense Climax
In games with multiple parallel threads of action, unforeseen climactic situations often emerge. Multiplayer games like Scotland Yard to go fall into this category, and the relative location of the detectives and Mr. X often leads to critical situations where his arrest hangs by a hair. Two urgent demands now occur conjointly: movability and awareness of the mutual opponents. Conventional user interfaces do not suit these combined needs. A fleeing Mr. X is neither able to observe a display nor can he trigger a rescuing action by typing on a button (e. g. throwing a virtual "fog bomb" to disappear from the detectives' displays for some minutes).

We aim at a solution by modulating the frequency of vibrations or acoustic signals. This has already been applied successfully in the Ambient Life project to continuously communicate the state of a phone by life-like signals, c. f. [7]. As input for triggering an important action, gestures performed with the device as a whole can be recognized through accelerometers. This sort of interaction seems much more appropriate and easier to manage in a hectic atmosphere, and appears much more intuitive.
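The whole-device gesture for triggering a rescuing action could, for instance, be a vigorous shake, detected as the energy of accelerometer deviations from gravity. Both the energy measure and the threshold are hypothetical and would need calibration in play tests.

```python
def shake_energy(samples, g=9.81):
    """Mean squared deviation of accelerometer magnitude (m/s^2) from
    gravity: a crude measure of how vigorously the device is shaken."""
    return sum((m - g) ** 2 for m in samples) / len(samples)


def fog_bomb_triggered(samples, threshold=20.0):
    """Fire the rescuing in-game action on a vigorous whole-device
    shake, so a fleeing Mr. X never has to look at the screen."""
    return shake_energy(samples) > threshold
```

The threshold must sit well above the energy produced by plain running, otherwise the fog bomb would fire accidentally in exactly the situations it is meant for.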
Surrounded by the Game
Often the relative position of a player to others or to game artifacts is crucial. In Scotland Yard to go, Mr. X is often surrounded by detectives. It would certainly be a great experience to hear their voices or steps from the according directions and with an intensity representing their distance.

Such a scenario is at least partially realizable. First of all, besides the positions of all involved players, the orientation of Mr. X could be tracked by a digital compass as integrated, for example, in the Nokia 5140. The harder part is the sound output. The technique of in-ear recording with a dummy head has been established for decades. Perception requires only usual earphones, and recording is also easy as long as the relative locations of output and input are static, as is typically the case when visiting a concert or stage play.

Such a static setting drastically reduces the applicability in location-based games. Nevertheless, for some game situations the static orientation of a player and his relative location to other players and artifacts can be preplanned in advance, or do not have to be represented exactly but can be abstracted to a rather symbolic position. For example, when Mr. X is surrounded by detectives, their distance can be communicated adequately by prefabricated surround sound. We expect the drawback of the inaccurate position representation to be negligible for the game experience.

Furthermore, we suggest expending research efforts on rendering prerecorded sounds and live audio streams according to dynamic positions. The SWAN project has already developed a prototype using acoustic signals for auditive navigation (c. f. [13]). If this were technically feasible for voice or music, and in particular for live streams, even low sound quality would create a stunning experience and allow for completely new game design and interaction.
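The directional audio idea can be sketched with plain stereo panning: from the players' positions and Mr. X's compass heading, compute each detective's relative bearing and distance, and map them to a pan and a gain value. Local metric coordinates and the linear distance falloff are simplifying assumptions of this sketch, not claims about binaural rendering.

```python
import math


def detective_pan_gain(x_pos, x_heading_deg, det_pos, max_dist_m=300.0):
    """Place a detective's footstep sound in Mr. X's stereo field.

    Positions are local metric coordinates (x east, y north); the
    heading comes from a digital compass (0 = north, 90 = east).
    Returns (pan in [-1, 1], gain in [0, 1]).
    """
    dx, dy = det_pos[0] - x_pos[0], det_pos[1] - x_pos[1]
    dist = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy))            # absolute bearing
    rel = (bearing - x_heading_deg + 180.0) % 360.0 - 180.0  # relative to view
    pan = math.sin(math.radians(rel))                     # left (-1) .. right (+1)
    gain = max(0.0, 1.0 - dist / max_dist_m)              # louder when closer
    return pan, gain
```

A detective due east of a north-facing Mr. X pans fully right; the same detective pans fully left once Mr. X turns to face south, which is exactly the orientation-dependence a compass makes possible.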
CONTEXT AND SITUATION IN GAMING
Bichard and Waern discuss the design of game activities against three main styles of games: gamist games that rely mostly on quests and puzzles, simulationist games where the player immerses with his own playful character into an evolving game story, and dramatist games that prescribe most of a story and let the player "listen" to the game. In [1] they present the dramatist game "Interference", which could be characterized as a pervasive adventure game.

If we follow this classification, it seems reasonable that for gamist and dramatist games many situations can be pre-designed in a screen-play. Those scenes could be supported by preplanned, specific user interaction modalities. Dramatist games often require a holistic design approach that resembles drama authoring more than technical event flow design. However, for scavenger hunt games and the like there are already some editors available (c. f. GPSMission by Orbster or MediaScape).

Simulationist games such as Scotland Yard to go do not predefine the story in such detail. To the contrary, some characteristic situations evolve during play. Hence, possible adaptations of the user interaction modes can only rely on situation detection at runtime. In the course of the project Context-Sensitive Intelligence we developed a prototypical architecture for such runtime adaptations (c. f. [10]).

CONCLUSION AND OUTLOOK
This paper argues that multimodal and intermodal interaction alone cannot cope with location-based games, in particular when they are motion intensive. Awareness of the player's context and gaming situation is a key to selecting the appropriate interaction mode. We discuss how sensing and acting relate to the player's context and show examples of candidates for game design patterns that can be applied in specific situations. We conclude that context-sensitivity should be an integral part of game design methodology.

We developed this paper based on our experiences in designing Scotland Yard to go, a pervasive hide-and-seek game. Further, we took a look at different game families such as pervasive adventures, explorative games and rallies, and defined the category of motion intensive games to focus on the characteristic that makes intuitive usability so difficult to achieve. Our suggestion is to apply multimodality in innovative ways and to parallel it with context-sensitivity. This can be done in a predefined manner at game design time, in a reflective way at runtime, or a mix of both.

REFERENCES
[1] J.-P. Bichard and A. Waern. Pervasive play, immersion and story: designing Interference. In DIMEA, Athens, Greece. ACM, September 2008.
[2] S. Brewster. Nonspeech auditory output. In A. Sears and J. Jacko (Eds.), The Human-Computer Interaction Handbook, chapter 13, pages 247-264. Lawrence Erlbaum Associates, USA, 2nd edition, 2008.
[3] S. Brewster and L. Brown. Tactons: structured tactile messages for non-visual information display. In Proceedings of the Australasian User Interface Conference, pages 15-23. Australian Computer Society, 2004.
[4] Edovia Inc. Steps. www.edovia.com/steps/, July 2008.
[5] G. Essl and M. Rohs. ShaMus - a sensor-based integrated mobile phone instrument. In Proceedings of the International Computer Music Conference, 2007.
[6] S. Harada, J. A. Landay, J. Malkin, X. Li, and J. Bilmes. The Vocal Joystick: Evaluation of voice-based cursor control techniques. In Proc. of ASSETS, Portland, Oregon, USA, October 2006.
[7] F. Hemmert. Ambient Life: Calm and excited pulsation as a means of life-like permanent tactile status display in mobile phones. In Proceedings of the Design & Emotion Conference, Hong Kong, 2008.
[8] F. Hemmert, A. Knörig, G. Joost, and R. Wettach. Dynamic knobs: Shape change as a means of interaction on a mobile phone. In CHI 2008 Proceedings. ACM, April 2008.
[9] O. Kirkeby and M. Kähäri. Activity Monitor. Nokia Research Center, 2008.
[10] H. Mügge, T. Rho, and A. B. Cremers. Integrating aspect-orientation and structural annotations to support adaptive middleware. In Proceedings of the 1st Workshop on Middleware-Application Interaction, in conjunction with EuroSys 2007. ACM, 2007.
[11] R. Simon, H. Kunczier, and H. Anegg. Towards orientation-aware location based mobile services. In 3rd Symposium on LBS and TeleCartography, 2005.
[12] J. Stegmann, K. Henke, and R. Kirchherr. Multimodal interaction for access to media content. October 2008.
[13] B. N. Walker and J. Lindsay. Navigation performance with a virtual auditory display: Effects of beacon sound, capture radius, and practice. Human Factors.