Applying Artificial Intelligence to Virtual Reality: Intelligent Virtual Environments
Centre for Virtual Environments
University of Salford
Salford, M5 4WT

Department of Computer Science
University of Warwick
Coventry, CV4 7AL
Research into virtual environments on the one hand and artificial intelligence and artificial life
on the other has largely been carried out by two different groups of people with different preoccu-
pations and interests, but some convergence is now apparent between the two ﬁelds. Applications
in which activity independent of the user takes place — involving crowds or other agents — are
beginning to be tackled, while synthetic agents, virtual humans and computer pets are all areas in
which techniques from the two fields require strong integration. The two communities have much
to learn from each other if wheels are not to be reinvented on both sides. This paper reviews
the issues arising from combining artiﬁcial intelligence and artiﬁcial life techniques with those
of virtual environments to produce just such intelligent virtual environments. The discussion is
illustrated with examples that include environments providing knowledge to direct or assist the
user rather than relying entirely on the user’s knowledge and skills, those in which the user is
represented by a partially autonomous avatar, those containing intelligent agents separate from
the user, and many others from both sides of the area.
In a period when niche areas of cutting-edge technological research are capturing the public imagination and moving out of the laboratory into everyday life, the resulting broad impetus can itself be a key ingredient in dramatic progress. Though the label dramatic may be regarded
as excessive in a quantitative sense, it is very apt in the literal sense of developments and possibilities
that include new and advanced forms of entertainment, communication and education, for example,
in which users can interact with technology in fundamentally different ways. Indeed, the vision of
exciting applications can be regarded as a driving force behind the premise of this paper that a conver-
gence has begun to take place between branches of advanced computing and research communities
which, until recently, were quite separate — namely Artiﬁcial Intelligence (AI), Artiﬁcial Life (AL)
and Virtual Reality (VR), or, as it is sometimes now known, Virtual Environments (VE). This com-
bination of intelligent techniques and tools, embodied in autonomous creatures and agents, together
with effective means for their graphical representation and interaction of various kinds, has given rise
to a new area at their meeting point, which we call intelligent virtual environments.
A number of factors are allowing the use of virtual environments in AI and AL research just at
the point when the development of particular ﬁelds of exploration, including that of intelligent and
autonomous agents, makes such use an obvious step to take. First, the continuing growth in the
amount of computing power that can be put on a desktop not only supports a much higher degree of
visual realism, but even leaves a little processing power to spare that can be used to add intelligence. A
second factor relates to the maturing and more widespread availability of 3D graphics software, and
the development of 3D graphics standards such as VRML ’97 (Hartman and Wernecke, 1996). Third,
AI technologies such as natural language processing have matured in parallel with this to the point
where they can be used as a means of interaction with a virtual environment.
At the same time, some researchers in the ﬁeld of virtual environments and advanced graphics
are seeking to progress beyond visually compelling but essentially empty environments to incorporate
other aspects of physical reality that require intelligent behaviour. This may involve populating urban
models with crowds (Musse and Thalmann, 1997) or trafﬁc (Wright et al., 1998), the investigation
of virtual humans (Thalmann and Thalmann, 1998) or virtual actors (Shawver, 1997; Wavish and
Connah, 1997), the creation of virtual non-humans (Terzopoulos et al., 1994) or, at the more abstract
level, the attempt to produce more adequate representations within VE tools for modelling behaviour
and intelligence (VRML Consortium, 1998).
In this paper we discuss the main research areas of this convergence, and begin in the next section
with a consideration of the basic issues underlying the enabling technologies for intelligent virtual
environments. The following three sections focus primarily on agents, ﬁrst considering broad agent
issues, and then examining the particular concerns relevant to the physical and cognitive ends of the
agent spectrum in turn. Though the area of agents is overwhelmingly where most of the effort is cur-
rently being directed, and demands most attention, we also then focus on virtual worlds themselves.
Finally, we conclude with a discussion of the research issues and the problems yet to be addressed
and consider possible future directions. At each point, we illustrate the issues with reference to sys-
tems and applications.
Virtual environments have at least one thing in common with robotics. This is the need to respect
real-time processing constraints. A VE is a system driven by a rendering cycle that ideally works at
50 or 60 Hertz so that change appears as smooth animation rather than a series of jerks. At frame rates
below 10 Hertz, it becomes impossible to sustain the illusion of physical reality that is so important
to the feeling of presence for the user of the VE.
In a given cycle, the rendering algorithm traverses a scene graph — usually a hierarchical structure
in which all the components of a VE are represented using nodes of different types linked together.
The more complex the scene graph, the longer it takes to traverse, and the harder it becomes to
maintain a high frame rate. Researchers working on the development of VEs are very conscious of
this problem, and much individual effort has targeted the creation of visually appealing components
with as low a polygon count as possible, as well as general mechanisms, such as Level of Detail
(LOD), that allow components to be represented by successively more detailed models as the user of
the VE approaches them.
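The scene-graph traversal and Level of Detail mechanism described above can be sketched as follows. This is a minimal illustration, not any particular toolkit's API; class names, polygon counts and distance thresholds are all illustrative assumptions.

```python
import math

class Node:
    """Base scene-graph node: the renderer traverses all children each frame."""
    def __init__(self, children=None):
        self.children = children or []

    def render(self, viewer_pos):
        return sum(child.render(viewer_pos) for child in self.children)

class Leaf(Node):
    """Leaf node: a graphical primitive with a fixed polygon count."""
    def __init__(self, polygon_count):
        super().__init__()
        self.polygon_count = polygon_count

    def render(self, viewer_pos):
        return self.polygon_count

class LODNode(Node):
    """Level-of-Detail node: selects one child model by viewer distance.
    Levels are (max_distance, model) pairs, ordered nearest first."""
    def __init__(self, position, levels):
        super().__init__()
        self.position = position
        self.levels = levels

    def render(self, viewer_pos):
        d = math.dist(self.position, viewer_pos)
        for max_d, model in self.levels:
            if d <= max_d:
                return model.render(viewer_pos)
        return 0  # beyond the furthest level: cull entirely

# A building represented at three successively coarser levels of detail
house = LODNode((0.0, 0.0), [
    (10.0, Leaf(5000)),   # close: full model
    (50.0, Leaf(500)),    # mid-range: simplified
    (200.0, Leaf(50)),    # far: little more than a textured box
])
scene = Node([house])

print(scene.render((5.0, 0.0)))    # near viewer: full polygon cost
print(scene.render((100.0, 0.0)))  # distant viewer: a fraction of the cost
```

The polygon count returned stands in for traversal cost: the nearer the viewer, the more detailed the model selected and the longer the frame takes.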
Just as in robotics, however, adding intelligence potentially steals processing power from the basic
cycle. This is literally true where added intelligence uses the same processor, but distribution onto
extra processors may have the same effect if the parallel processing is not accurately synchronised
with the frame rate. A number of the systems discussed below cannot render in real-time but must
instead render off-line and run subsequently as animations — an approach that precludes the normal
interactive use of the VE. Thus, although the growth in processing power and the development of
improved algorithms makes it possible to add intelligence to VEs, it is still currently inadequate for
many of the systems that are being developed in research laboratories.
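The budget problem can be made concrete with a little arithmetic: at 60 Hertz each frame allows roughly 16.7 milliseconds, and whatever time the intelligence consumes comes directly out of the rendering budget. The per-polygon cost below is an illustrative assumption, not a measured figure.

```python
# At 60 Hz, each frame has ~16.67 ms; AI time is stolen from rendering time.
FRAME_MS = 1000.0 / 60.0       # milliseconds available per frame at 60 Hz
MS_PER_1000_POLYGONS = 0.5     # assumed cost of rendering 1000 polygons

def polygon_budget(ai_ms):
    """Polygons renderable per frame once ai_ms is spent on intelligence."""
    remaining = max(0.0, FRAME_MS - ai_ms)
    return int(remaining / MS_PER_1000_POLYGONS * 1000)

print(polygon_budget(0.0))   # no AI: the full polygon budget
print(polygon_budget(10.0))  # heavy AI: well under half the budget remains
```

Under these assumptions, ten milliseconds of per-frame reasoning more than halves the visual complexity that can be sustained, which is precisely why many of the systems discussed below fall back on off-line rendering.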
Virtual Environment Tools
The combination of techniques necessary to drive this combination of advanced technologies suggests
that, as with the development of other sophisticated systems, tools and development environments can
play a pivotal role in the progress of the ﬁeld. There are several different issues to consider, relating
to the level of abstraction of this support, knowledge representation and the integration of complex
properties, each of which we describe in turn below.
The Level of Support
The development of intelligent VEs is even further constrained by the bias of most generally avail-
able VE toolkits towards visual realism and the graphical support of the VE, rather than towards the
addition of intelligence.
At the lowest level, a system might be developed using, for example, Open GL, or some other 3D
library system and C++ but, as usual, flexibility is traded off against the time and effort required to
achieve the desired functionality. One example of VE development at this lowest level is the AReVi
toolkit (Reignier et al., 1998) which offers a set of C++ classes built around an agent programming
language, oRis (though the sense of agent is in this context a weak one, and corresponds to the notion
of a programming entity much more than to a virtual agent in the sense discussed below).
At the next level of abstraction, VE toolkits use the scene-graph representation already referred to.
This is a convenient way of representing the graphical aspects of objects, since leaf nodes in the scene-
graph normally represent graphical primitive objects as a collection of polygons. Such primitives are
then grouped into more complex graphical objects using group nodes to which the components are attached.
Incorporating Knowledge Representation
However, if we wish to attach knowledge to objects — and in particular to manipulate objects at a
knowledge level, the scene-graph representation is much less convenient, since it is not always clear
how conceptual objects can be mapped onto collections of graphical primitives. As a consequence, it
has been argued that it is now time for the designers of VE toolkits to consider the incorporation of
explicit knowledge representation facilities (West and Hubbold, 1998), an area in which AI has much
to offer, not least in helping to avoid the reinvention of a large number of knowledge representation schemes.
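One way of picturing such a facility is a thin knowledge layer over the scene graph, in which a single conceptual object is mapped onto the set of graphical primitives that realise it and carries symbolic properties that an agent can query. This is a hypothetical sketch of the idea, not the proposal of West and Hubbold; all names are illustrative.

```python
class Annotated:
    """Wraps a scene-graph subtree with symbolic knowledge: a concept label
    and a property dictionary queryable at the knowledge level."""
    def __init__(self, concept, subtree_ids, **properties):
        self.concept = concept
        self.subtree_ids = subtree_ids  # graphical primitives realising the concept
        self.properties = dict(properties)

# The conceptual object 'door' maps onto several graphical primitives
door = Annotated("door", ["panel_mesh", "handle_mesh", "hinge_mesh"],
                 openable=True, locked=False)

def can_open(obj):
    """A knowledge-level query, independent of the graphical representation."""
    return obj.properties.get("openable", False) and \
        not obj.properties.get("locked", True)

print(can_open(door))  # True: the door affords opening
```

The point of the wrapper is that an agent reasons about "the door" as a unit, while the renderer continues to see only polygons.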
The orientation of existing toolkits to graphical representations and to the visual perspective of
the VE user can be seen in other ways, too. In most toolkits, the object representing the VE user has a
privileged position in the system so that, for example, in VRML’97 the user is provided with automatic
facilities for the detection of collisions with parts of the environment (using bounding boxes), while
this must be explicitly programmed for any other object. Support for animation is widely provided,
but this is oriented towards trajectories calculated by the designer in advance (automatic interpolation
between way-points is usually provided), rather than to the autonomous motion of objects driven by their own internal behaviour.
Indeed, while many VE toolkits provide sensors, these do not correspond to a virtual representa-
tion of the type of sensing that a robot might carry out, but to facilities for detecting user interaction,
such as an alarm when a wall is hit or the generation of events in response to a mouse-click. The
addition of interesting behaviour in the AI sense normally requires direct programming via whatever
language facility the toolkit supports (which is typically C++ in many proprietary toolkits, or Java in
the case of VRML).
VRML’97 has incorporated Script nodes into its scene-graph representation in order to provide
a clean interface to user-added functionality, and this is the standard method for adding behavioural
complexity to VRML applications. However, criticisms of this approach include the amount of inter-
node traffic that is generated for interesting behaviour, and there is at least one proposal for the packaging of a neural net into a VRML node in order to make behaviour more responsive without a large
inter-node routing load.
Interaction with Complex Properties
If we wish to attach more complex properties than visual appearance to objects in a VE, a further issue
arises relating to the ways in which an object and a VE interact. Visual interaction in the classical sense
takes place largely between a VE object and the VE user, and is encompassed by the overall visual
appearance of an object to the user, including texture, lighting effects and level of detail. In a standard
VE, objects only interact visually with each other insofar as one hides another from the user’s view.
However, once more complex properties are introduced, the amount of interaction between VE objects
and between objects and the VE itself, increases sharply. The question that needs addressing is how
far such interactions should be driven by the object and how much by the environment in which the
object is located.
As yet, there is no clear resolution of this problem. One approach is to embed, inside the object,
the properties and the necessary knowledge of how they interact with an environment. For example,
the IMPROV system (Goldberg, 1997) adopts what Goldberg calls inverse causality, and stores ani-
mations of the interaction between an object and a virtual actor within the object, in order to remove
any learning requirement from the virtual actor. Thus, a virtual actor that points at a virtual bottle of
beer is given the option of drinking from it without having to learn the necessary actions for so doing.
This seems a little counter-intuitive when considering the incorporation of aspects of physics —
such as gravity (Aylett et al., 1999) — into a VE. Here, it seems preferable that all objects placed
within a VE should obey whatever physical laws are current, whether this is falling downwards if
unsupported in the case of gravity, or ﬂoating in the case of non-gravity. An example of the same
type might be how a ﬁsh should behave if it is placed in a VE that is not full of water (West and
Hubbold, 1998). One approach might be to provide properties — such as force — that allow an object
to interact in a sensible way with any VE in which it has been embedded. It seems clear that work in
AI on common ontologies may have an important application here.
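The environment-driven alternative can be sketched in a few lines: the physical law lives in the VE, and any object embedded in it obeys that law without knowing anything about it. The simple Euler integration and all names here are illustrative assumptions.

```python
GRAVITY = -9.81  # m/s^2, assumed constant for this sketch

class Body:
    def __init__(self, name, height, supported=False):
        self.name = name
        self.height, self.vy = height, 0.0
        self.supported = supported

class Environment:
    """The VE owns the physical law; embedded objects simply obey it."""
    def __init__(self, gravity=GRAVITY):
        self.gravity = gravity
        self.bodies = []

    def embed(self, body):
        self.bodies.append(body)  # the object need know nothing about gravity

    def step(self, dt):
        for b in self.bodies:
            if not b.supported and b.height > 0.0:
                b.vy += self.gravity * dt
                b.height = max(0.0, b.height + b.vy * dt)  # floor at zero

world = Environment()
cup = Body("cup", height=1.0)                  # unsupported: should fall
table = Body("table", height=0.0, supported=True)
world.embed(cup)
world.embed(table)
for _ in range(100):                           # simulate ~1.6 s at 60 Hz
    world.step(0.016)
print(cup.height)    # the unsupported cup has fallen to the floor
print(table.height)  # the supported table has not moved
```

A fish dropped into this world would fall like the cup; only an environment that also modelled water, or an object carrying a buoyancy property, would make it float, which is exactly the ontology problem raised above.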
In conclusion, it must be pointed out that VE toolkits were never intended to support the kind
of functionality discussed in this paper, so that it would have been very surprising if the difﬁculties
identiﬁed here had not existed. The creation of a new generation of VE tools is an enterprise in which
the two communities of AI and VE might proﬁtably collaborate, and work is already in progress
within the VRML consortium (VRML Consortium, 1998) that is considering the future of VRML.
The convergence between AI and AL on the one hand and VEs on the other is nowhere so obvious
as in the area of agents. Autonomous agent research in AI has burst into a frenzy of
activity in the last ﬁve years or so, as indicated, for example, by the increasing number of workshops
and conferences, and the large number of active research groups (Aylett et al., 1998). We distinguish
here between autonomous agent research and the more general ﬁeld of multi-agent systems (Luck,
1997; Luck et al., 1998). The latter area encompasses distributed problem-solving applications, such
as network management, which do not typically involve VR or virtual environments and has a focus
on inter-agent communication and negotiation that may not be required in the autonomous agents sub-
ﬁeld. In this paper we are focussed speciﬁcally on work using VEs as a technique for exploring agent
behaviour and agent believability, or agents as a way of extending VEs into new application areas.
This includes synthetic agents, virtual actors, virtual humans, as well as avatars (which are physical
representations of users) in 3D multi-user web environments. We start with a consideration of the
nature and role of autonomy, and then examine the range of agents, using emotion, which is discussed
last, as a means of differentiating them. In all this, we discuss work with a starting point in the area
of VEs as well as work beginning from issues in AI.
The notion of autonomy has become increasingly important and studied in relation to agents that must
function effectively and independently in a dynamic environment. A range of work has attempted to
consider many issues concerned with agent autonomy, including its nature, what it entails (Luck and
d’Inverno, 1995), how it may be determined by agent architecture (Castelfranchi, 1995) and the sub-
tleties of its use by different researchers (Franklin and Graesser, 1996), for example. A question that
immediately arises is whether autonomy is useful and appropriate for agents in virtual environments
in the way it is for agents in the real world. In the real world, the environment functions independently
of the agents within it — an individual agent can only perceive part of it (and may be wrong about
what it does perceive) and is subject to independent processes and the activity of other agents. Under
these circumstances, predictions about the world are always likely to be fallible. Autonomy is an
appropriate response because leaving the agent to decide its actions allows it to take account of the
current — rather than a predicted — state of the world.
In a virtual environment, the situation is very different. The designer has a ‘god’s-eye’ view of
both the environment and the agent, and need not distinguish between them. Moreover, the whole
environment is available to the agent — there need be no difference between its model of the virtual
world and the virtual world itself. Autonomy might appear a needless overhead from a practical
perspective, and only useful as a basis for more scientiﬁc investigations of agenthood.
However, as discussed in (Petta and Trappl, 1997), the omniscient approach to virtual agents in
fact turns out to be very inefﬁcient. The problem is that if virtual agents are to behave in a way that
is convincing to the user and sustains the feeling of presence in a virtual world, they ought to appear
to have the same limitations as agents in the real world. They ought to seem to collect information as
it becomes available to them and to interact with objects — noticing, avoiding and manipulating —
in a plausible manner. Omniscient agent management soon runs into combinatorial problems when it
must keep track of what each agent is supposed to know and perceive. It is far simpler to equip each
agent with virtual sensors (Thalmann et al., 1997) and use these to autonomously drive their physical manifestation as virtual effectors. Thus, most of the work concerned with virtual agents follows the autonomous approach.
There are other implementation-level advantages of applying the autonomous approach, especially
the potential for reuse of agents in different VEs and the ability to distribute individual agents over
separate processors. However these remain theoretical to a large extent with little evidence of reuse or
distribution in practice, perhaps reﬂecting the current immaturity of the ﬁeld and the diversity of the
problems being tackled. It is also true that while autonomy may be a prerequisite for reusable agents,
it is far from sufﬁcient, with many representational issues of agent, environment and interaction yet to
be successfully tackled.
The Spectrum of Agents
In order to divide the very large number of systems in some tractable way, we imagine a spectrum
of agency. At one end of this spectrum, we place physical agents, by which we mean agents where
the focus is on believable physical behaviour, in a virtual environment. Topics here include realistic
movement and physical interaction with the environment — for synthetic animals as well as humans
— in addition to body language, gesture and facial expression. Such agents normally interact with a
VE through virtual sensors working at a non-symbolic level.
At the other end of this spectrum, we place agents where the focus is on human cognitive behaviour
and on cognitive interaction with the human user of the system. Many of the topics here are related
to natural language and cognitive processes such as planning. Such agents often sense symbolic
information directly from the VE and it is sometimes less obvious how far they can be said to have an
autonomous perceptual apparatus.
We speak of a spectrum rather than mutually exclusive categories because more cognitive agents
usually require some degree of physical interaction with a VE while more physical agents often require
some kind of control at the cognitive level. Indeed, one could characterise work at the cognitive
end of the spectrum as working from cognition outwards and at the physical end of the spectrum as
working from the body inwards. Ideally, virtual agents should have completely realistic movement
and physical interaction as well as human-like cognitive abilities. In reality, both involve solving many
hard problems so that there is a tendency for groups to place their emphasis at one or other end of
the spectrum. This difference of emphasis can be found in a number of speciﬁc topics within virtual
agents, and is excellently illustrated by the issues involved in the key area of emotion, which raises
problems at both ends of this spectrum.
We should note here that work in virtual agents has given a fresh impetus to the whole ﬁeld of moti-
vation and emotion in agents, perhaps for two reasons. Firstly, an embodied virtual agent in a virtual
environment provides many more external channels for the representation of emotional state — gaze,
facial expression, gesture and overall body language — than was the case with disembodied intelli-
gent agents, where language content was just about the only means of expression. Secondly, as seen
below, many virtual agents domains are those in which the expression of emotional state is essential to
the application. Here, the use of avatars in distributed multi-user environments has provided a driving force.
The emphasis of those working at the cognitive end of the agent spectrum is on emotion as a
cognitive state, while for those working at the more physical end it is on emotion as a bodily state.
Note that by this we mean the internal modelling of emotion, rather than its external expression.
These two approaches reﬂect a long-standing debate within psychology itself (Picard, 1997) and can
be traced back as far as the separation of body and mind by Descartes.
The more long-standing approach of cognitive modelling (Ortony et al., 1988; Frijda, 1987), has
the advantage that the agent is always in an explicitly deﬁned emotional state or states, giving a clear
linkage to the external manifestation of that emotion. However, at a more physical or behavioural
level, emotion is produced by the working of lower level structures. The simplest — but rather crude
— way of modelling emotion at such a low level is to equip the agent with meters that are incremented
or decremented according to interaction with the environment, with other virtual agents or with a
human user. A more sophisticated and realistic approach at this level is to model an endocrine system,
as in Creatures (Grand et al., 1997), with chemical emitters and receptors (Canamero, 1998). Emotion
is then manifested as part of the overall interaction of the agent with its environment rather than
being modelled as a cognitive state, and much work in this area has been done by those attempting to
construct animats or artiﬁcial animals, such as (Schnepf, 1991) and (Donnart and Meyer, 1994).
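The meter model described above amounts to a few scalar levels adjusted by interaction events. The event names, deltas and clamping below are illustrative assumptions rather than details of any cited system.

```python
class EmotionMeters:
    """The crude 'meter' model of emotion: scalar levels in [0, 1]
    incremented or decremented by interaction events."""
    def __init__(self):
        self.levels = {"happiness": 0.5, "fear": 0.0}

    def event(self, name):
        deltas = {
            "petted_by_user": {"happiness": +0.2},
            "loud_noise":     {"fear": +0.3, "happiness": -0.1},
            "threat_passed":  {"fear": -0.2},
        }
        for emotion, d in deltas.get(name, {}).items():
            # clamp each meter to [0, 1]
            self.levels[emotion] = min(1.0, max(0.0, self.levels[emotion] + d))

pet = EmotionMeters()
pet.event("loud_noise")
pet.event("petted_by_user")
print(round(pet.levels["happiness"], 2))  # 0.6
print(round(pet.levels["fear"], 2))       # 0.3
```

An endocrine model of the Creatures kind replaces these direct increments with emitters and receptors mediated by simulated chemistry, so that emotional state emerges from the interaction rather than being set by it.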
In this section, we consider virtual agents where physical behaviour is seen as the key issue. Such
agents need not be human in form: they could be abstract (Sims, 1995) or mechanical (Prophet,
1996), they could be animals such as birds (Reynolds, 1987), fish (Terzopoulos et al., 1994) or
dolphins (Martinho et al., 1998), or they could be ﬁctional, such as Teletubbies (Aylett et al., 1999).
Human forms (Badler et al., 1993) are, of course, also common, whether as virtual actors (Shawver,
1997; Wavish and Connah, 1997), virtual humans (Thalmann and Thalmann, 1998) or avatars (Damer,
1998) in web-based multi-user virtual environments. In all these cases, common issues have to be
faced. We ﬁrst review the broader issues involved in physical agents before examining in more detail
how bodies can be animated, using two specific examples as illustrations. Then, we consider the
possibilities for non-verbal communication with physical agents, and end the section by discussing
the issues relating to physical agents in their interaction with their environments.
Firstly, body movement and mobility must be handled, often raising important issues of body struc-
ture. This suggests that ideas surrounding the notion of embodiment may be as signiﬁcant for virtual
agents as they are for real agents. Once a sophisticated physical representation can be controlled, there
is the opportunity to use it for non-verbal communication, including gaze, facial expression, gesture
and overall body language.
Secondly, once agents have mobility, they must be able to avoid undesired collisions with other
objects in their environment, whether these are stationary, as in trees, buildings and furniture, or also
moving, as with other agents. Indeed the introduction of other agents leads to concerns of social
movement such as herding, ﬂocking (Reynolds, 1987) or crowd motion (Musse and Thalmann, 1997),
which brings still further problems to be tackled. Other kinds of interaction involving contact with
the environment, beyond mere collisions, must also be considered, such as the grasping of objects, or
physical interactions with other agents ranging from hugging to eating.
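One of the steering rules behind Reynolds-style flocking, separation, also serves as a basic form of collision avoidance: each agent steers away from neighbours that are too close. This is a sketch of that single rule under illustrative parameters, not a full boids implementation.

```python
import math

def separation(agent, neighbours, min_dist=2.0):
    """Return a steering vector pushing the agent away from neighbours
    closer than min_dist; nearer neighbours push harder."""
    sx = sy = 0.0
    for nx, ny in neighbours:
        dx, dy = agent[0] - nx, agent[1] - ny
        d = math.hypot(dx, dy)
        if 0 < d < min_dist:
            # unit vector away from the neighbour, weighted by proximity
            sx += dx / d * (min_dist - d)
            sy += dy / d * (min_dist - d)
    return (sx, sy)

# One neighbour is uncomfortably close on the +x side; the other is far away
steer = separation((0.0, 0.0), [(1.0, 0.0), (0.0, 5.0)])
print(steer)  # pushed along -x, away from the close neighbour only
```

Full flocking adds alignment and cohesion terms to the same per-agent loop, which is why crowd and flock behaviour scales to many agents without any central controller.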
Finally, given a sophisticated physical representation and a repertoire of physical behaviours to go
with it, a number of control issues arise. For example, the level at which control should be exercised
must be determined, and includes possibilities of control at the level of individual muscles, or at the