A Game Pidgin Language For Speech Recognition In Computer Games

Tarashankar Rudra, Charles Sturt University, Bathurst, NSW, Australia 2795, trudra@csu.edu.au
Manolya Kavakli, Charles Sturt University, Bathurst, NSW, Australia 2795, mkavakli@csu.edu.au
Terry Bossomaier, Charles Sturt University, Bathurst, NSW, Australia 2795, tbossomaier@csu.edu.au

ABSTRACT
To date, very little progress has been made in Computer Science towards the development of robust Natural Language Processors. Developing a Computer Pidgin Language (CPL) with a limited vocabulary and a simple set of grammatical rules can be an effective approach to the problem of humans interacting with computers. A CPL [7] is a new spoken language, which is taught to the user and is efficient for dialogues with the computer. In this paper we attempt to develop a Game Pidgin Language (GPL) with limited vocabulary, grammar and syllables for use with speech interactive computer games. The GPL is illustrated with a vocabulary that is designed to optimise the efficiency of an automatic speech recognizer (ASR).

Keywords
Pidgin languages, Information theory, Speech Recognition, Extensible Markup Language.

1. INTRODUCTION
The earliest record of the use of “pidgin” languages dates back to the Middle Ages, when the European crusaders and traders of the eastern end of the Mediterranean used them. Due to the dominance of French among the crusaders the language was known as Lingua Franca, a term that now denotes any language used as a medium of communication among people having no other language in common. When a speech community endorses a pidgin language, it becomes a Creole. For a language to be a pidgin, it must satisfy two conditions [5]:
1) It must have a sharply reduced grammatical structure and vocabulary.
2) The language must be native to none who use it.

The simplification of natural language has been envisaged as a way to communicate complex concepts with the help of simple expressions. Controlled language applications [e.g., 4, 13] have been developed in computational linguistics. Most controlled languages are for people whose profession is to write and who can be trained in the use of the new language. We attempt to design a language for speech interactive computer games. We use a simplified language because we want to minimize repeated words.

“CPL or Computer Pidgin Language is a radical departure from the normal approach to Speech Recognition Systems. CPL is inspired by a frustration at a perceived lack of progress in Spoken Language Research over the last 20-30 years” [7]. Hinde & Belrose believe that systems that only understand people 85% of the time are hardly usable, so speech recognition is very much a last resort technology or a curiosity. “One of the inspirations for thinking about CPL as an approach to spoken language recognition is observing the evolution of handwriting recognition.” [7].

Computer games use ’barks’, a slang developed for communication between game agents. The GPL we have developed will enhance group bonding, especially in multi-player games.

Speech recognition, or speech-to-text conversion, involves capturing and digitising the sound waves, converting them into basic language units or phonemes, constructing words from phonemes, and finally contextually analysing the words to ensure correct spelling for words that sound alike (such as site and sight). The reverse process takes place in speech synthesis, or text-to-speech conversion.

Due to the absence of systematic cues marking word boundaries in continuous speech [2], we will incorporate short words that carry maximum meaning, with minimal syllables and grammatical constraints, in the new language. Since computer games are primarily a source of fun and entertainment, our attempt is to create funny sounding but quickly memorisable words.

In this paper, we use Information Theory [15] to analyse four aboriginal languages and a sample pidgin language that we have developed.

Our hypothesis in this study is that, for the pidgin language, the difference between entropy and perfect information content is minimal, and that this characteristic makes it rich in vocabulary and simple in grammar.

Information Theory, a branch of probability theory, has two primary goals [3]:
1) Development of fundamental theoretical limits on the achievable performance when communicating a given information source over a given communication channel using coding schemes from within a prescribed class.
2) Development of coding schemes that are reasonably good in comparison with the optimal performance given by the theory.

Entropy provides the information of a random process about itself and measures the information content or uncertainty of ‘x’.

Entropy (H(X)) is given by:

Formula - I: $H(X) = -\sum_{x \in A_X} P(x) \log_2 P(x)$ [3].

Here an ensemble ‘X’ is a random variable ‘x’ with a set of possible outcomes, $A_X = \{a_1, a_2, \ldots, a_i\}$, having probabilities
$P_X = \{P_1, P_2, \ldots, P_i\}$, with $P(x = a_i) = P_i$, where $P_i > 0$ and $\sum_{x \in A_X} P(x) = 1$.

In our study, $A_X$ is the set of distinct words appearing in the paragraph, and the set $P_X$ consists of the probabilities of occurrence of those distinct words in the paragraph.

Perfect Information content (H0(X)) is a lower bound for the number of binary questions that are guaranteed to identify the outcome. It is given by:

Formula - II: $H_0(X) = \log_2 |A_X|$ [12].

In this paper we have focused on developing a framework for a Computer Pidgin Language (CPL), illustrated with a vocabulary for speech interactive computer games. The emphasis of the vocabulary is on having a minimum difference between the Entropy and the Perfect Information content of the language. We also aim for a much lower value of Entropy for GPL than for English. We have developed an Extensible Markup Language (XML) vocabulary called GPLXML to structure our grammar and demonstrate its use with an instance. XML defines a set of rules that identify how we can define tags that separate a document into individual parts and subparts [17].

2. INFORMATION CONTENT IN THE USE OF ABORIGINAL LANGUAGES
There were approximately 200 aboriginal languages in Australia, out of which nearly 100 are in use [1]. The characteristics of Aboriginal languages are [8]:
1) They are rich in vocabulary.
2) Complex words can be formed by compounding, reduplication or by using suffixes.
3) They normally do not have sounds of f, v, s, z and sh.
4) They have terms for every species of animal and plant in their environment.
5) They have elaborate vocabulary in the area of kinship.
6) They have complex syntax and word building processes.
7) They tend to have similar sets of speech sounds and share numerous grammatical features but differ greatly in vocabulary.
8) All of them usually have the sounds of p, b, t, d, k and g.

In this paper, our purpose is to focus on information content related to semiotics rather than phonemes. The first four characteristics of Aboriginal languages prompt us to incorporate some of their vocabulary in our GPL, along with words inherited from other languages of the world.

Reducing the number of phonemes has considerable advantages for speech recognition. Recent work suggests that languages such as Italian, which are largely phonetic and have fewer phonemes than English, have a much lower rate of dyslexia [6]. For our purposes, fewer phonemes can make recognition faster and more robust. However, the focus of this paper is not on the phonetic representation as such, but on the semiotic qualities of the pidgin language. There are many ways of calculating the entropy of a block of text. Words have strong serial correlations, which affect the joint entropy. In this first study we have elected to look at word frequencies only.

An example showing the calculation procedure for extracting the H(X) and H0(X) parameters is given below; the resulting data is listed in Table 3:
a) Language 1: English
b) Test paragraph: The Guugu Yimithirr people lived in Northern Queensland, in an area northwest of Cooktown, where Captain Cook first landed in Australia and first met an aboriginal language. So, the first aboriginal language they heard was Guugu Yimithirr. The remaining Guugu Yimithirr live now in Hopevale, and there around 100 speakers of the language left. Names correspond to traditional areas where the features named by colonialists appear in maps.
c) Total number of words: 68
d) Number of unique words: 47
e) Occurrence of distinct words and their probabilities in b) are listed in Table-1:

Table-1
Word            Occurrence   Pi
The             5            0.0735
Guugu           3            0.0441
Yimithirr       3            0.0441
people          1            0.0147
lived           1            0.0147
in              5            0.0735
Northern        1            0.0147
Queensland      1            0.0147
an              2            0.0294
area            1            0.0147
Northwest       1            0.0147
Of              2            0.0294
Cooktown        1            0.0147
where           2            0.0294
Captain         1            0.0147
Cook            1            0.0147
first           3            0.0441
landed          1            0.0147
Australia       1            0.0147
and             2            0.0294
met             1            0.0147
Aboriginal      2            0.0294
language        3            0.0441
So              1            0.0147
they            1            0.0147
heard           1            0.0147
was             1            0.0147
Remaining       1            0.0147
live            1            0.0147
now             1            0.0147
Hopevale        1            0.0147
there           1            0.0147
Around          1            0.0147
100             1            0.0147
speakers        1            0.0147
Left            1            0.0147
Names           1            0.0147
Correspond      1            0.0147
To              1            0.0147
Traditional     1            0.0147
areas           1            0.0147
features        1            0.0147
named           1            0.0147
By              1            0.0147
colonialists    1            0.0147
appear          1            0.0147
maps            1            0.0147

f) Entropy: 5.32, by applying formula I.
g) Perfect Information: 5.55, by applying formula II.

a) Language 2: Guugu Yimithirr
b) Test paragraph: Guugu Yimithirr herria bizi zen Queensland iparraldean, Australian, Cook Kapitainak lur hartan oina lehenbizikoz jarri zuen tokian. Beraz, hizkuntza hau izan zen Australian zuriek entzun zuten legena. Oraingo Guugu Yimithirr jendea Hopevale deritzon tokian bizi dira, eta hiztunak 100 inguru izan litezke. Gure zerrendako izenak dira eremu tradizionalen izenak, non kolonialistek izendatutako tokiak ageri diren mapetan.
c) Total number of words: 56
d) Number of unique words (UW): 47
e) Occurrence of distinct words and their probabilities in b) are listed in Table-2:

Table-2
Word            Occurrence   Pi
Guugu           2            0.0357
Yimithirr       2            0.0357
herria          1            0.0179
bizi            2            0.0357
zen             2            0.0357
Queensland      1            0.0179
Iparraldean     1            0.0179
Australian      2            0.0357
Cook            1            0.0179
Kapitainak      1            0.0179
lur             1            0.0179
hartan          1            0.0179
oina            1            0.0179
Lehenbizikoz    1            0.0179
jarri           1            0.0179
zuen            1            0.0179
tokian          2            0.0357
Beraz           1            0.0179
Hizkuntza       1            0.0179
hau             1            0.0179
izan            2            0.0357
zuriek          1            0.0179
entzun          1            0.0179
zuten           1            0.0179
legena          1            0.0179
Oraingo         1            0.0179
jendea          1            0.0179
Hopevale        1            0.0179
deritzon        1            0.0179
dira            2            0.0357
eta             1            0.0179
hiztunak        1            0.0179
100             1            0.0179
Inguru          1            0.0179
litezke         1            0.0179
Gure            1            0.0179
Zerrendako      1            0.0179
izenak          2            0.0357
eremu           1            0.0179
Tradizionalen   1            0.0179
non             1            0.0179
Kolonialistek   1            0.0179
Izendatutako    1            0.0179
tokiak          1            0.0179
ageri           1            0.0179
diren           1            0.0179
mapetan         1            0.0179

f) Entropy: 5.49, by applying formula I.
g) Perfect Information: 5.55, by applying formula II.

Analyzing the data of Tables 3 to 6, we find that our GPL (Table 8) is most suitable as a new pidgin language for games and Guugu Yimithirr is least suitable.

The following tables provide the statistical measures of information in Aboriginal languages [1] based on the principles of Information theory.

In the following tables, UW = unique words; H = entropy; H0 = perfect information.
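The entropy and perfect information figures quoted for the English test paragraph can be checked with a few lines of code. The sketch below is not the authors' own procedure: it assumes that words are compared case-insensitively and that punctuation is ignored, and the function names `tokenize`, `entropy` and `perfect_information` are illustrative only. It applies Formula I and Formula II to the word frequencies of the paragraph.

```python
import math
import re
from collections import Counter

# English test paragraph from Section 2, copied verbatim.
PARAGRAPH = (
    "The Guugu Yimithirr people lived in Northern Queensland, in an area "
    "northwest of Cooktown, where Captain Cook first landed in Australia "
    "and first met an aboriginal language. So, the first aboriginal "
    "language they heard was Guugu Yimithirr. The remaining Guugu "
    "Yimithirr live now in Hopevale, and there around 100 speakers of the "
    "language left. Names correspond to traditional areas where the "
    "features named by colonialists appear in maps."
)


def tokenize(text):
    """Split text into lower-case words (assumption: case and punctuation are ignored)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def entropy(words):
    """Formula I: H(X) = -sum of P(x) * log2 P(x) over the distinct words."""
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perfect_information(words):
    """Formula II: H0(X) = log2 of the number of distinct words."""
    return math.log2(len(set(words)))


words = tokenize(PARAGRAPH)
print(len(words), len(set(words)))            # total words, unique words
print(round(entropy(words), 2))               # entropy H
print(round(perfect_information(words), 2))   # perfect information H0
```

Under these assumptions the script reports 68 words, 47 unique words, H of about 5.32 and H0 of about 5.55, which agrees with the values quoted for the English paragraph. A different tokenisation rule (for example, treating capitalised and lower-case forms as distinct words) would change the counts and therefore both measures.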
Table-3
Language               Words   UW   H      H0
English                68      47   5.32   5.55
Guugu Yimithirr        56      47   5.49   5.55

Table-4
Language               Words   UW   H      H0
English                57      47   5.40   5.55
Arabana-Wangkangurru   36      33   5.00   5.04

Table-5
Language               Words   UW   H      H0
English                93      68   5.84   6.09
Dyirbal                64      57   5.76   5.83

Table-6
Language               Words   UW   H      H0
English                59      52   5.63   5.70
Yagara / Yugambeh      49      41   5.26   5.36

3. MODEL OF A PIDGIN LANGUAGE

3.1 GPL Vocabulary
Advantages of GPL in computer games are:
1) The more limited vocabulary and simple grammar make speech processing across many ethnic backgrounds much easier.
2) Games move fast, and short simple utterances are more appropriate.
3) Peer groups and sub-cultures love to have their own "cool" vocabulary. An extension to the work we discuss will look at adaptive mechanisms for modifying the language by the game players themselves.
4) It has great significance in building the cognitive models for the animats in the game. The mini-language defines the cognitive framework within which they operate and also determines the sophistication (and feasibility) of their world view.

The words in the sample GPL dictionary have been compiled from English, Australian Aboriginal languages and some international languages. The sources are: Arabic, English, Bengali, Gunyah [14], Hindi, Kamilaroi [11], Portuguese, Russian, Slang English, Wiradjuri [9].

The sample GPL dictionary has the following words. Here N = noun; Pron = pronoun; V = verb; Adj = adjective. A dash marks an empty cell.

Table-7
Word      Meaning                              Type   Phrase   Source
Ma        I, my, me, myself                    N      -        -
Gaba      Man                                  N      -        Wiradjuri
Tum       You, your                            Pron   -        Hindi
Bingo     Expression of joy                    V      Y        English
Fat       Expression of despair                V      Y        -
Aka       Master                               N      -        Arabic
Kid       an inferior                          N      -        slang
Goofy     stupid                               Adj    -        slang
Yea       Yes                                  N      -        English
Nay       No                                   N      -        English
Wa        What, when, where, which, how        V      -        -
Wana      do you want?                         V      Y        -
Wata      What is it?                          V      Y        -
Wamba     Who is it/he?                        V      Y        Kamilaroi
Wara      Where is/are it/you                  V      Y        -
Waay      look out                             V      Y        Wiradjuri
Birra     move/went away                       V      -        Wiradjuri
Gaja      get away                             V      Y        Wiradjuri
Gaa       to take                              V      Y        Kamilaroi
Eta       it/he/she is a                       Pron   -        Russian
Bom(b)    Bomb                                 N      -        -
Buma      to hit, kill                         V      Y        Kamilaroi
Samba     dance                                N      -        Portuguese
Jet       Aircraft                             N      -        -
Noka      Ship                                 N      -        Bengali
Villi     Enemy                                N      -        -
Buddy     Friend                               N      -        English
Limbo     trouble                              N      -        slang
Gali      Water                                N      -        Kamilaroi
Kola      drink                                N      -        -
Rivi      River                                N      -        -
Croc      Crocodile                            N      -        English
Duma      House                                N      -        Russian
Humpy     shelter                              N      -        Gunyah
Dud       dumb, slack                          Adj    -        English
Gubi      swim                                 V      Y        Kamilaroi
Guju      jump                                 V      Y        -
Gumo      Climb                                V      Y        -
Zoom      to fly, drive, run, fast movement    V      Y        English
Yami      Food                                 N      -        -
Yaki      dirty                                Adj    -        -
Wee       sit                                  V      Y        Wiradjuri
Kam       Work                                 N      -        -
Jumbo     huge                                 Adj    -        English
Dagi      to pierce with sharp object          V      Y        -
i (é)     Is                                   V      -        -
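One way a game could hold such a dictionary in memory is as a small lookup table keyed by the GPL word. The sketch below is only an illustration under assumptions: the `Entry` record, its field names and the helpers `lookup` and `phrase_verbs` are hypothetical and do not come from the paper, and only a handful of rows from Table-7 are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One row of the sample GPL dictionary (field names are illustrative)."""
    word: str
    meaning: str
    kind: str                      # N, Pron, V or Adj, as in Table-7
    phrase: bool = False           # True where the Phrase column holds "Y"
    source: Optional[str] = None   # None where no source language is listed


# A few rows copied from Table-7; the full table would be loaded the same way.
ENTRIES = [
    Entry("gaba", "man", "N", source="Wiradjuri"),
    Entry("jet", "aircraft", "N"),
    Entry("buma", "to hit, kill", "V", phrase=True, source="Kamilaroi"),
    Entry("waay", "look out", "V", phrase=True, source="Wiradjuri"),
    Entry("jumbo", "huge", "Adj", source="English"),
]

# Index by lower-cased word so a recognised word can be looked up directly.
DICTIONARY = {e.word: e for e in ENTRIES}


def lookup(word):
    """Return the dictionary entry for a recognised word, or None if unknown."""
    return DICTIONARY.get(word.lower())


def phrase_verbs():
    """List the verbs that are marked as phrases in the table."""
    return sorted(e.word for e in ENTRIES if e.kind == "V" and e.phrase)


print(lookup("Buma").meaning)   # meaning of a known word
print(lookup("xyz"))            # unknown words yield None
print(phrase_verbs())
```

Keying the table on the word keeps lookup constant-time, which suits the short, fast utterances the vocabulary is designed for; a recogniser could reject any utterance containing a word for which `lookup` returns `None`.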
The dictionary we have developed is an example that illustrates the idea of developing a pidgin language for game play.

3.2 GPL Grammar
The GPL has a very limited set of grammatical rules. It primarily has eight rules:
1) A sentence may begin with a noun or noun phrase followed by a verb.
2) A sentence may begin with a verb phrase followed by a noun.
3) An adjective is a valid sentence.
4) A noun is a valid sentence.
5) A verb phrase is a valid sentence.
6) The auxiliary ‘i’ (é) is added as a suffix to a noun that does not end with an ‘i’, ‘a’ or ‘y’ sounding letter.
7) There is no tense in this language, as computer games are played in real time.
8) The language has no gender.

A valid sentence is any word or phrase that satisfies conditions 1 to 5 and is composed of a word or words within the GPL dictionary. A valid sentence may generate a response from the system.

Examples of some GPL sentences:
1) I went home. - Ma birra duma.
2) I had food and water. - Ma yami gali gaa.
3) Shoot the Aircraft. - Jet buma.
4) Jump in the river. - Rivi guju.
5) He is a dumb guy. - Eta dud.
6) A crocodile is swimming in the dirty river. - Kroki gubi yaki rivi.
7) Look out for the enemy. - Waay villi.
8) Gosh! You killed a friend. - Fat! tu buddy buma.
9) Do you want a coke? - Wana kola?
10) Where are the enemy aircraft? - Wara villi Jet?
11) Run for shelter! - Zoomi humpy!

3.3 Document Type Definition (DTD) of the GPL Grammar
A DTD defines rules for validating an XML document, using a Backus-Naur-Form grammar to identify which elements are valid for a particular XML document and which attributes are then valid to be used with each of those elements [17]. The popularity of DTDs is due to their ease of development, coupled with the wide availability of software for validating and testing the conformance of XML documents.

The GPLXML we have developed will be used for word recognition. The simple tags will make it easier to synthesize utterances from cognitive models. It has the following DTD:

<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT GPLXML (grammar)>
<!ELEMENT grammar (nouns, verbs, adjectives, pronouns)>
<!ELEMENT nouns (noun)+>
<!ELEMENT noun (word, meaning, feature+)>
<!ELEMENT verbs (verb)+>
<!ELEMENT verb (word, meaning, feature+)>
<!ELEMENT adjectives (adjective)+>
<!ELEMENT adjective (word, meaning, feature+)>
<!ELEMENT pronouns (pronoun)+>
<!ELEMENT pronoun (word, meaning, feature+)>
<!ELEMENT word (#PCDATA)>
<!ELEMENT meaning (#PCDATA)>
<!ELEMENT feature (phoneme, emotion?)>
<!ELEMENT phoneme (#PCDATA)>
<!ELEMENT emotion (#PCDATA)>
<!ATTLIST noun type CDATA #REQUIRED>
<!ATTLIST verb phrase (yes|no) "yes">
<!-- "prosody" features as attributes in "emotion"; the prob attribute provides accuracy in judgement -->
<!ATTLIST emotion pitch CDATA #IMPLIED
                  range CDATA #IMPLIED
                  prob  CDATA #IMPLIED>

The GPL DTD has GPLXML as its root element, which contains a single grammar node. The grammar has words along with their meanings and features, classified into nouns, verbs, adjectives and pronouns. The feature has phoneme and emotion tags that store the phonetic representation and emotion features of every word. The emotion tag has the prosody features of pitch and range, inherited from the Java Speech API Markup Language [16], as attributes, along with prob, which stores the probability of accuracy of the judgement of emotion. The prob attribute will provide flexibility of programming to the game developer.

XML Schema definition language (XSD) is a powerful but flexible document definition language that provides control not only over element and attribute existence, content and order, but also specifies when and how elements and attributes can be used, along with the content of attributes based on the position of elements within the document hierarchy [17]. The work on XML Schema began in 1999 and reached recommendation status in May 2001 [10].

The GPL Schema for the GPL grammar is as follows:

<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:attributeGroup name="prosody">
    <xsd:attribute name="pitch" type="xsd:string" use="optional"/>
    <xsd:attribute name="range" type="xsd:string" use="optional"/>
    <xsd:attribute name="prob" use="optional">
      <xsd:simpleType>
        <xsd:restriction base="xsd:nonNegativeInteger">
          <xsd:minInclusive value="0"/>
          <xsd:maxInclusive value="100"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
  </xsd:attributeGroup>
  <xsd:element name="GPLXML">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="noun" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="word" type="xsd:string"/>
              <xsd:element name="meaning" type="xsd:string"/>
              <xsd:element name="feature" minOccurs="1" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="phoneme" type="xsd:string"/>
                    <xsd:element name="emotion" minOccurs="0" maxOccurs="1">
                      <xsd:complexType>
                        <xsd:simpleContent>
                          <xsd:extension base="xsd:string">
                            <xsd:attributeGroup ref="prosody"/>
                          </xsd:extension>
                        </xsd:simpleContent>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
            <xsd:attribute name="type" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="verb" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="word" type="xsd:string"/>
              <xsd:element name="meaning" type="xsd:string"/>
              <xsd:element name="feature" minOccurs="1" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="phoneme" type="xsd:string"/>
                    <xsd:element name="emotion" minOccurs="0" maxOccurs="1">
                      <xsd:complexType>
                        <xsd:simpleContent>
                          <xsd:extension base="xsd:string">
                            <xsd:attributeGroup ref="prosody"/>
                          </xsd:extension>
                        </xsd:simpleContent>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
            <xsd:attribute name="phrase" default="yes">
              <xsd:simpleType>
                <xsd:restriction base="xsd:string">
                  <xsd:enumeration value="yes"/>
                  <xsd:enumeration value="no"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:attribute>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="adjective" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="word" type="xsd:string"/>
              <xsd:element name="meaning" type="xsd:string"/>
              <xsd:element name="feature" minOccurs="1" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="phoneme" type="xsd:string"/>
                    <xsd:element name="emotion" minOccurs="0" maxOccurs="1">
                      <xsd:complexType>
                        <xsd:simpleContent>
                          <xsd:extension base="xsd:string">
                            <xsd:attributeGroup ref="prosody"/>
                          </xsd:extension>
                        </xsd:simpleContent>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="pronoun" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="word" type="xsd:string"/>
              <xsd:element name="meaning" type="xsd:string"/>
              <xsd:element name="feature" minOccurs="1" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="phoneme" type="xsd:string"/>
                    <xsd:element name="emotion" minOccurs="0" maxOccurs="1">
                      <xsd:complexType>
                        <xsd:simpleContent>
                          <xsd:extension base="xsd:string">
                            <xsd:attributeGroup ref="prosody"/>
                          </xsd:extension>
                        </xsd:simpleContent>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

3.4 An Instance of GPLXML
The instance of GPLXML begins with the grammar tag, which has nouns, verbs, adjectives and pronouns as sub-nodes. The nouns tag has multiple noun(s) within it, and each noun has type as an attribute along with word, meaning and feature as sub-nodes. There can be more than one feature tag per noun; each stores a phonetic representation of the word along with its emotional content as sub-nodes. The emotion tag has the optional attributes pitch, range and prob to store the pitch, frequency range and probability (of judgement in the estimation of emotional content) respectively. The verbs tag has multiple verb(s) within it, and each verb has phrase as an attribute along with word, meaning and feature as sub-nodes. The adjectives tag has multiple adjective(s) within it, and the pronouns tag has multiple pronoun(s) within it; each adjective and each pronoun has word, meaning and feature as sub-nodes. In all three cases the rest is similar to the nouns tag.

An instance that conforms to the GPLXML DTD is given below:

<!DOCTYPE GPLXML SYSTEM "GPLXML.dtd">
<GPLXML>
  <grammar>
    <nouns>
      <noun type="proper">
        <word>duma</word>
        <meaning>home</meaning>
        <feature>
          <phoneme> D UW M AH </phoneme>
          <emotion>
            joy
          </emotion>
        </feature>
      </noun>
      <noun type="common">
        <word> buddy </word>
        <meaning> friend </meaning>
        <feature>
          <phoneme> B AH D IY </phoneme>
          <emotion>
            joy
          </emotion>
        </feature>
      </noun>
    </nouns>
    <verbs>
      <verb phrase="yes">
        <word>wata</word>
        <meaning>what is it</meaning>
        <feature>
          <phoneme>HH W AH T . IH Z . IH T</phoneme>
          <emotion prob="90">
            Anger
          </emotion>
        </feature>
      </verb>
      <verb phrase="no">
        <word>i</word>
        <meaning>is, was</meaning>
        <feature>
          <phoneme>IY</phoneme>
        </feature>
      </verb>
    </verbs>
    <adjectives>
      <adjective>
        <word>jumbo</word>
        <meaning>huge</meaning>
        <feature>
          <phoneme>JH AH M B OW</phoneme>
          <emotion prob="80">
            astonishment
          </emotion>
        </feature>
      </adjective>
    </adjectives>
    <pronouns>
      <pronoun>
        <word>tu</word>
        <meaning>you</meaning>
        <feature>
          <phoneme>T UW</phoneme>
          <emotion>
            anger
          </emotion>
        </feature>
      </pronoun>
    </pronouns>
  </grammar>
</GPLXML>

3.5 Information Content of GPL
Since GPL is used in short bursts, we have analyzed the examples of section 3.2 to extract the values of Entropy and perfect information for English and GPL.

Favorable values of the measures of information theory for GPL are as follows:
1. Lower value of perfect information, and hence fewer unique words, in GPL than in its English equivalent.
2. Lower value of entropy in GPL than in its English equivalent.
3. Lower difference between the values of perfect information and entropy in GPL.

The values are listed in Table 8. From the table we can see that only 30 words of GPL are needed to convey the same information expressed by 52 words in English, resulting in a lower value of Entropy for GPL than for English. This signifies a substantial reduction in grammar in GPL compared with English. We also find that the ratio of the perfect information of GPL to that of English is less than 1, and that the ratio of unique words (UW) to Words is much higher in GPL than in English (27/30 against 38/52), implying that the core of GPL lies in its vocabulary, which is one of the desired criteria.

Table-8
Language   Words   UW   H      H0
English    52      38   5.05   5.25
GPL        30      27   4.71   4.75

The requirements of the game are for rather short utterances, where we would like to have a significant semiotic weight carried by every word. Thus the entropy should approach the maximum value, where every word contributes to an utterance with equal probability. We show some examples of typical utterances and calculate the corresponding entropy. Note that we do not use all of the words of our vocabulary, but use this to demonstrate the procedure by which we measure word independence.

4. CONCLUSION
1) Both Aboriginal languages (ALs) and GPL have low values of H0(X) and H(X), and hence a low number of unique words, compared to English for expressing the same information. Thus we can express more information with a minimum number of words in both ALs and GPL.
2) Low differences between the values of H0(X) and H(X) in ALs and GPL show that both ALs and GPL are rich in vocabulary.
3) The extremely low values of, and difference between, H0(X) and H(X) for GPL compared with English signify minimal grammar and richness in vocabulary of GPL.

We have shown in this paper how a CPL with a small vocabulary, with cues from aboriginal and other languages, can be used to develop a GPL. Although much work remains to be done in generating a cross-continental, socio-culturally acceptable vocabulary, we believe that GPLXML will be sufficient for representing the grammar of GPL. The limitation of analysing a language with Information theory is that it is silent about the characteristics and complexity of the vocabulary in that language; hence we have focused on developing a limited vocabulary with the least number of syllables, bound by simple and non-rigid grammatical rules, for use with speech interactive computer games.

5. ACKNOWLEDGMENTS
This research project is funded by the Australian Research Council (ARC) through a linkage grant numbered LP0216837.
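As a supplementary sketch for Section 3.4, a GPLXML instance can be read with a standard XML parser. The example below uses Python's `xml.etree.ElementTree`; the function name `read_entries`, the layout of the returned dictionaries and the shortened sample document are assumptions made for illustration. The sample keeps only two entries, so it would not validate against the DTD of Section 3.3, and the DOCTYPE line is left out because this parser does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Shortened GPLXML instance (two entries taken from Section 3.4).
SAMPLE = """
<GPLXML>
  <grammar>
    <nouns>
      <noun type="common">
        <word> buddy </word>
        <meaning> friend </meaning>
        <feature>
          <phoneme> B AH D IY </phoneme>
          <emotion>joy</emotion>
        </feature>
      </noun>
    </nouns>
    <adjectives>
      <adjective>
        <word>jumbo</word>
        <meaning>huge</meaning>
        <feature>
          <phoneme>JH AH M B OW</phoneme>
          <emotion prob="80">astonishment</emotion>
        </feature>
      </adjective>
    </adjectives>
  </grammar>
</GPLXML>
"""


def read_entries(xml_text):
    """Return one dict per word with its class, meaning, phonemes and emotion."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    # grammar -> nouns/verbs/adjectives/pronouns -> individual word elements
    for group in root.find("grammar"):
        for item in group:
            # Only the first feature is read here; the DTD allows several.
            feature = item.find("feature")
            # emotion is optional in the DTD, so it may be missing.
            emotion = feature.find("emotion")
            prob = emotion.get("prob") if emotion is not None else None
            entries.append({
                "class": item.tag,
                "word": item.findtext("word").strip(),
                "meaning": item.findtext("meaning").strip(),
                "phonemes": feature.findtext("phoneme").split(),
                "emotion": emotion.text.strip() if emotion is not None else None,
                "prob": int(prob) if prob is not None else None,
            })
    return entries


for entry in read_entries(SAMPLE):
    print(entry)
```

Returning plain dictionaries keeps the sketch independent of any game engine; a developer could, for instance, ignore an emotion label whose `prob` value falls below a chosen threshold.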
6. REFERENCES
[1] Australia Aboriginal Languages. http://www.geocities.com/Athens/9479/guugu.html.
[2] Bagou, O., Fougeron, C., Frauenfelder, U. H., Contribution of Prosody to the Segmentation and Storage of “Words” in the Acquisition of a New Mini-Language, Speech Prosody 2002, an International Conference, France, 11-13 April 2002.
[3] Gray, R. M., Entropy and Information Theory, Springer Verlag, 1990.
[4] Grover, C., Holt, A., Klein, E., Moens, M., Designing a controlled language for interactive model checking, CLAW 2000, 3rd International Controlled Language Applications Workshop, Seattle, Washington, USA, 29-30 April 2000.
[5] Hall, R. A., Pidgin and Creole Languages, 1966, Cornell University Press, London.
[6] Helmuth, L., Dyslexia: same brain different languages, Science, Vol. 291, pp 2064-5, 2001.
[7] Hinde, S., Belrose, G., Computer Pidgin Language: A new Language to talk to your Computer?. http://www.hpl.hp.com/techreports/2001/HPL-2001-182.pdf.
[8] Horton, D., The Encyclopaedia of Aboriginal Australia, Vol 1, 1994, Australian Institute of Aboriginal and Torres Strait Islander Studies, Australia.
[9] Hosking, D., McNicol, S., Wiradjuri, Panther Publishing and Printing, Canberra, 1993.
[10] Hunter, D., Cagle, K., Dix, C., Kovack, R., Pinnock, J., Rafter, J., Beginning XML 2nd Edition, 2001, Wrox Press Ltd, UK.
[11] Kamilaroi/Gamilaraay Dictionary. http://coombs.anu.edu.au/WWWVLPages/AborigPages/LANG/GAMDICT/GAMDICTF.HTM.
[12] MacKay, D. J. C., A Short Course in Information Theory: Lecture notes 1 & 2. http://www.inference.phy.cam.ac.uk/mackay/info-theory/course.html.
[13] Pulman, S., Controlled language for knowledge representation, CLAW96: Proceedings of the first International workshop on Controlled language Applications, Belgium, March 1996, pp 233-242.
[14] Robinson, J., Voices of Queensland, 2001, Oxford University Press, Australia.
[15] Shannon, C. E., A Mathematical Theory of Communication, The Bell System Technical Journal, vol 27, pp 379-423, 1948.
[16] Sun Microsystems Inc, Java Speech API Markup Language Specification version 0.6, 2001. http://java.sun.com/products/java-media/speech/forDevelopers/JSML/.
[17] Williamson, H., XML: The Complete Reference, 2001, Tata McGraw-Hill Publishing Company Ltd, New Delhi.