International Journal of
Biometrics and Bioinformatics
(IJBB)
Volume 4, Issue 4, 2010
Edited By
Computer Science Journals
www.cscjournals.org
Editor in Chief Professor João Manuel R. S. Tavares
Book: 2010 Volume 4, Issue 4
Publishing Date: 30-10-2010
Proceedings
ISSN (Online): 1985-2347
This work is subject to copyright. All rights are reserved, whether the whole or
part of the material is concerned, specifically the rights of translation, reprinting,
re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any
other way, and storage in data banks. Duplication of this publication or parts
thereof is permitted only under the provisions of the copyright law 1965, in its
current version, and permission for use must always be obtained from CSC
Publishers. Violations are liable to prosecution under the copyright law.
IJBB Journal is a part of CSC Publishers
http://www.cscjournals.org
© IJBB Journal
Published in Malaysia
Typesetting: Camera-ready by author, data conversion by CSC Publishing
Services – CSC Journals, Malaysia
CSC Publishers
Editorial Preface
This is the fourth issue of volume four of the International Journal of Biometrics
and Bioinformatics (IJBB). The Journal is published bi-monthly, with papers
peer reviewed to high international standards. IJBB is not limited to a specific
aspect of biology; it is devoted to the publication of high-quality papers in all
divisions of bioscience. IJBB intends to disseminate knowledge in the
various disciplines of the biometrics field, from theoretical, practical and
analytical research to physical implications and theoretical or quantitative
discussion intended for academic and industrial progress. To position
IJBB as one of the good journals in the biosciences, a group of highly valued
scholars serves on the editorial board. The International Editorial Board
ensures that significant developments in biometrics from around the world
are reflected in the Journal. Some important topics covered by the journal are
bio-grid, biomedical image processing (fusion), computational structural biology,
molecular sequence analysis and genetic algorithms.
The coverage of the journal includes all new theoretical and experimental
findings in the fields of biometrics that enhance the knowledge of scientists,
industrialists, researchers and all those associated with the
bioscience field. IJBB's objective is to publish articles that are not only
technically proficient but also contain information and ideas of fresh interest
for an international readership. IJBB aims to handle submissions courteously
and promptly, and to promote and extend the use of all
methods in the principal disciplines of bioscience.
IJBB editors understand how important it is for authors and
researchers to have their work published with minimum delay after
submission of their papers. They also strongly believe that direct
communication between the editors and authors is important for the
welfare, quality and wellbeing of the Journal and its readers. Therefore, all
activities from paper submission to paper publication are controlled through
electronic systems that include electronic submission, an editorial panel and
a review system, ensuring rapid decisions with the least delay in the publication
process.
To build the journal's international reputation, we disseminate publication
information through Google Books, Google Scholar, Directory of Open Access
Journals (DOAJ), Open J Gate, ScientificCommons, Docstoc and many more.
Our International Editors are working on establishing ISI listing and a good
impact factor for IJBB. We would like to remind you that the success of our
journal depends directly on the number of quality articles submitted for
review. Accordingly, we would like to request your participation by
submitting quality manuscripts for review and encouraging your colleagues to
submit quality manuscripts for review. One of the great benefits we can
provide to our prospective authors is the mentoring nature of our review
process. IJBB provides authors with high quality, helpful reviews that are
shaped to assist authors in improving their manuscripts.
Editorial Board Members
International Journal of Biometrics and Bioinformatics (IJBB)
Editorial Board
Editor-in-Chief (EiC)
Professor. João Manuel R. S. Tavares
University of Porto (Portugal)
Associate Editors (AEiCs)
Assistant Professor. Yongjie Jessica Zhang
Carnegie Mellon University (United States of America)
Professor. Jimmy Thomas Efird
University of North Carolina (United States of America)
Professor. H. Fai Poon
Sigma-Aldrich Inc (United States of America)
Professor. Fadiel Ahmed
Tennessee State University (United States of America)
Mr. Somnath Tagore (AEiC - Marketing)
Dr. D.Y. Patil University (India)
Professor. Yu Xue
Huazhong University of Science and Technology (China)
Professor. Calvin Yu-Chian Chen
China Medical University (Taiwan)
Associate Professor. Chang-Tsun Li
University of Warwick (United Kingdom)
Editorial Board Members (EBMs)
Dr. Wichian Sittiprapaporn
Mahasarakham University (Thailand)
Assistant Professor. M. Emre Celebi
Louisiana State University (United States of America)
Dr. Ganesan Pugalenthi
Genome Institute of Singapore (Singapore)
Dr. Vijayaraj Nagarajan
National Institutes of Health (United States of America)
Dr. Paola Lecca
University of Trento (Italy)
Associate Professor. Renato Natal Jorge
University of Porto (Portugal)
Assistant Professor. Daniela Iacoviello
Sapienza University of Rome (Italy)
Professor. Christos E. Constantinou
Stanford University School of Medicine (United States of America)
Professor. Fiorella SGALLARI
University of Bologna (Italy)
Professor. George Perry
University of Texas at San Antonio (United States of America)
Table of Contents
Volume 4, Issue 4, October 2010
Pages
136 - 146 Gene Expression Based Acute Leukemia Cancer Classification: a
Neuro-Fuzzy Approach
B. B. M. Krishna Kanth, U. V. Kulkarni, B. G. V. Giridhar
147 - 160 Bimodal Biometric Person Authentication System Using Speech
and Signature Features
Prof. M.N.Eshwarappa, Prof. (Dr.) Mrityunjaya V. Latte
International Journal of Biometrics and Bioinformatics (IJBB), Volume (4): Issue (4)
B. B. M. Krishna Kanth, U. V. Kulkarni & B. G. V. Giridhar
Gene Expression Based Acute Leukemia
Cancer Classification: a Neuro-Fuzzy
Approach
B. B. M. Krishna Kanth
bbkkanth@yahoo.com
Research Scholar S.R.T.M.University
Nanded, Maharashtra, India
U. V. Kulkarni
kulkarniuv@yahoo.com
Dean of Academics and Head Department
of Computer Science S.R.T.M.University,
Nanded, Maharashtra, India
B. G. V. Giridhar
murarihamlet@rediffmail.com
Assistant Professor Department of Endocrinology
Andhra Medical College Visakhapatnam,
A.P, India
Abstract
In this paper, we propose the Modified Fuzzy Hypersphere Neural Network
(MFHSNN) for discriminating acute lymphoblastic leukemia (ALL) from
acute myeloid leukemia (AML) in the leukemia dataset. Dimensionality reduction
methods, namely the Spearman Correlation Coefficient and the Wilcoxon Rank Sum Test,
are used for gene selection. The performance of the MFHSNN system is encouraging
when benchmarked against the Support Vector Machine (SVM) and
K-nearest neighbor (KNN) classifiers. A classification accuracy of 100% is
achieved by the MFHSNN classifier using only two genes. Furthermore,
MFHSNN is found to be much faster with respect to training and testing time.
Keywords: gene expression data, cancer classification, ALL/AML, membership function, hypersphere
1. INTRODUCTION
Microarrays [1], also known as gene chips or DNA chips, provide a convenient way of obtaining
gene expression levels for a large number of genes simultaneously. Each spot on a microarray
chip contains the clone of a gene from a tissue sample. Some mRNA samples are labeled with
two different kinds of dyes, for example, Cy5 (red) and Cy3 (green). After the mRNA interacts with the
genes, i.e., hybridization, the color of each spot on the chip changes. The resulting image reflects
the characteristics of the tissue at the molecular level. Microarrays can thus be used to help
classify and predict different types of cancers. Traditional methods for cancer diagnosis are
mainly based on the morphological appearance of the cancers; however, it is sometimes extremely
difficult to find clear distinctions between some types of cancers according to their appearance.
Hence microarray technology stands to provide a more quantitative means for
cancer diagnosis. For example, gene expression data have been used to obtain good results in
the classification of lymphoma, leukemia [2], breast cancer and liver cancer. It is challenging
to use gene expression data for cancer classification because of the following two special aspects
of gene expression data. First, gene expression data are usually very high dimensional.
The dimensionality ranges from several thousand to over ten thousand. Second, gene expression
data sets usually contain relatively small numbers of samples, e.g., a few tens. If we treat
this pattern recognition problem with supervised machine learning approaches, we need to deal
with the shortage of training samples and high dimensional input features.
Recent approaches to this problem include unsupervised methods, such as clustering [3]
and Self-Organizing Maps (SOM) [4], and supervised methods, such as Support Vector Machines
(SVM) [5], Multi-Layer Perceptrons (MLP) [6], Decision Trees (DT) [7] and K-Nearest Neighbor
(KNN) [8, 9]. Su et al. [10] employed modular neural networks to classify two types of acute
leukemia and reached at best 75% correct classification. Xu et al. [11] adopted the ellipsoid
ARTMAP to analyze the ALL/AML data set, and the best result was 97.1%. However, most
current methods in microarray analysis cannot completely bring out the hidden information in the
data, and they generally lack robustness with respect to noisy and missing data.
Some studies have shown that a small collection of correctly selected genes [12] can lead to
good classification results [13]. Gene selection is therefore crucial in the molecular classification of
cancer. Although most of the algorithms mentioned above can reach a high prediction rate, any
misclassification of the disease is still intolerable in the treatment of acute leukemia. Hence there is
an urgent demand for a reliable classifier that gives 100% accuracy in predicting the type of cancer.
In this paper, we apply a robust MFHSNN classifier, an extension of the Fuzzy Hypersphere
Neural Network (FHSNN) proposed by Kulkarni et al. [14], to the problem of cancer classification
based on gene expression data. To reduce the dimensionality of the gene data, a correlation method,
the Spearman Correlation Coefficient, and a statistical method, the Wilcoxon Rank Sum Test, are
used. The MFHSNN utilizes fuzzy sets as pattern classes in which each fuzzy set is a union of
fuzzy set hyperspheres. A fuzzy set hypersphere is an n-dimensional hypersphere defined by a
center point and a radius, together with its membership function. We first train the classifier with 38
leukemia samples and then test it with another 34 samples to obtain the accuracy rate.
This study also reveals that the classification result is greatly affected by how strongly the selected
genes correlate with the class distinction in the data set. The remainder of the paper is organized as follows. The
gene selection methods used to choose effective predictive genes are introduced in Section 2.
Section 3 gives a brief introduction to the architecture of the MFHSNN, followed by
its learning algorithm in Section 4. Section 5 examines the experimental results of the classifiers
on the leukemia data set. Conclusions are drawn in Section 6.
2. GENE SELECTION METHODS
Among the large number of genes, only a small part may benefit the correct classification of cancers.
The rest of the genes have little impact on the classification. Even worse, some genes may
act as noise and undermine the classification accuracy. Hence, to obtain good classification accuracy,
we need to pick out the genes that benefit the classification most. In addition, gene selection
is also a procedure of input dimension reduction, which leads to a much lower computational load for
the classifier. Perhaps more importantly, reducing the number of genes used for classification can
help researchers focus on these important genes and find the relationship between
the genes and the development of the cancer.
2.1. Correlation Analysis for Gene Selection
In order to score the similarity of each gene, an ideal feature vector [15] is defined. It is a vector
consisting of 0’s in one class (ALL) and 1’s in the other class (AML). It is defined as follows:

\[ \mathrm{ideal}_i = (0,0,0,0,0,0,1,1,1,1,1,1) \qquad (1) \]

The ideal feature vector is highly correlated to a class. If a gene is similar to the ideal vector
(the distance between the ideal vector and the gene is small), we consider the gene
informative for classification. The similarity of $g_i$ and $g_{ideal}$ under a similarity measure such as the
Spearman coefficient is defined as follows:

\[ SC = 1 - \frac{6 \sum_{i=1}^{n} (\mathrm{ideal}_i - g_i)^2}{n \times (n^2 - 1)} \qquad (2) \]

where $n$ is the number of samples, $g_i$ is the $i$-th real value of the gene vector and $\mathrm{ideal}_i$ is the
corresponding $i$-th binary value of the ideal feature vector.
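As an illustration only, the score of Eq. (2) can be sketched in a few lines of Python. The function names, the sample labels and the expression values below are hypothetical and are not taken from the paper; the values are assumed to be scaled to [0, 1]. The sketch applies Eq. (2) directly to the expression values, as the text defines it, whereas the conventional Spearman coefficient applies the same formula to ranks.

```python
def ideal_vector(labels):
    """Build the ideal feature vector: 0 for ALL samples, 1 for AML samples."""
    return [0.0 if label == "ALL" else 1.0 for label in labels]


def sc_score(gene, ideal):
    """Score one gene against the ideal vector using Eq. (2)."""
    n = len(gene)
    squared_diff = sum((t - g) ** 2 for t, g in zip(ideal, gene))
    return 1.0 - 6.0 * squared_diff / (n * (n ** 2 - 1))


# Hypothetical sample labels matching the 6 + 6 layout of Eq. (1).
labels = ["ALL"] * 6 + ["AML"] * 6
ideal = ideal_vector(labels)

# Hypothetical expression values, assumed to be scaled to [0, 1].
gene_a = [0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.9, 1.0, 0.8, 0.9, 1.0, 0.9]
gene_b = [0.5, 0.4, 0.6, 0.5, 0.5, 0.4, 0.5, 0.6, 0.4, 0.5, 0.5, 0.6]

score_a = sc_score(gene_a, ideal)  # gene that tracks the class labels
score_b = sc_score(gene_b, ideal)  # gene that ignores the class labels
```

Under these assumptions, a gene that follows the class labels scores higher than one that does not, so ranking genes by this score and keeping the top few is one plausible way to use it for selection.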
2.2. Wilcoxon Rank-Sum Test (WRST) for Gene Selection
The Wilcoxon rank-sum test [16, 17] belongs to a large category of non-parametric tests. The general idea is
that, instead of using the original observed data, we list the data in ascending order of value
and assign each data item a rank, which is its position in the sorted list. The
ranks, rather than the original observed values, are then used in the subsequent analysis. Using ranks makes the
rank-sum test much less sensitive to outliers and noise than the classical (parametric) tests [18].
The major steps in applying the WRST are as follows:
(i) Merge all observations from the two classes and rank them in ascending order of value.
(ii) Calculate the Wilcoxon statistic by adding all the ranks associated with the observations from
the class with the smaller number of observations.
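The two steps can be sketched as follows for a single gene. This is a minimal illustration, not the authors' implementation: the function names and the expression values are hypothetical, and two choices are assumptions the text does not specify, namely that tied values share the average of their ranks and that the first class is used when both classes have the same size.

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def rank_sum_statistic(class_a, class_b):
    """Step (i): merge and rank. Step (ii): sum the ranks of the smaller class."""
    merged = list(class_a) + list(class_b)
    ranks = average_ranks(merged)
    ranks_a = ranks[:len(class_a)]
    ranks_b = ranks[len(class_a):]
    smaller = ranks_a if len(class_a) <= len(class_b) else ranks_b
    return sum(smaller)


# Hypothetical expression values of one gene in the two classes.
all_values = [0.2, 0.4, 0.1, 0.3, 0.5]
aml_values = [0.9, 0.8, 0.7]
w = rank_sum_statistic(all_values, aml_values)
```

In this example the smaller class holds the three largest values, so its ranks are 6, 7 and 8 and the statistic is at its maximum for these class sizes. A statistic near either extreme suggests a gene that separates the classes well.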
3. MODIFIED FUZZY HYPERSPHERE NEURAL NETWORK CLASSIFIER
The MFHSNN consists of four layers as shown in Figure 1(a). The first, second, third and fourth
layers are denoted $F_R$, $F_M$, $F_N$ and $F_O$ respectively. The $F_R$ layer accepts an input pattern and
consists of $n$ processing elements, one for each dimension of the pattern. The $F_M$ layer consists
of $q$ processing nodes that are constructed during training, and each node represents a hypersphere
fuzzy set characterized by a hypersphere membership function. The processing performed
by each node of the $F_M$ layer is shown in Figure 1(b). The weights between the $F_R$ and $F_M$ layers
represent the center points of the hyperspheres. As shown in Figure 1(b),
$C_j = (c_{j1}, c_{j2}, c_{j3}, \ldots, c_{jn})$ represents the center point of the hypersphere $m_j$. In addition, each
hypersphere takes one more input, denoted as threshold $T$, which is set to one, and the weight
assigned to this link is $\zeta_j$. The $\zeta_j$ represents the radius of the hypersphere $m_j$, which is updated during
training. The center points and radii of the hyperspheres are stored in matrix $C$ and vector $\zeta$
respectively. The maximum size of a hypersphere is bounded by a user-defined value $\lambda$,
where $0 \le \lambda \le 1$. The $\lambda$ is called the growth parameter; it controls the maximum size of
the hypersphere by putting an upper limit on its radius. Assuming the training
set is defined as $R \in \{R_h \mid h = 1, 2, \ldots, P\}$, where $R_h = (r_{h1}, r_{h2}, r_{h3}, \ldots, r_{hn}) \in I^n$ is the
$h$-th pattern, the membership function of the hypersphere node $m_j$ is

\[ m_j(R_h, C_j, \zeta_j) = 1 - f(l, \zeta_j, \gamma) \qquad (3) \]

where $f(\cdot)$ is the three-parameter ramp threshold function defined as

\[ f(l, \zeta_j, \gamma) = \begin{cases} 0, & \text{if } 0 \le l \le \zeta_j \\ (l - \zeta_j)\gamma, & \text{if } \zeta_j \le l \le 1 \\ 1, & \text{if } l \ge 1 \end{cases} \qquad (4) \]
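To make Eqs. (3) and (4) concrete, the following Python sketch evaluates the membership of a pattern in a single hypersphere. It rests on assumptions that should be read with care: the argument l is not defined in this part of the text, and the sketch assumes it is the Euclidean distance between the pattern and the center point. The function names and the numeric values are hypothetical. At the overlapping boundary l = 1 the sketch returns 1, which is one possible reading of Eq. (4).

```python
import math


def ramp(l, zeta, gamma):
    """Three-parameter ramp threshold function f(l, zeta_j, gamma) of Eq. (4)."""
    if l <= zeta:
        return 0.0                   # inside the hypersphere
    if l < 1.0:
        return (l - zeta) * gamma    # grows linearly outside the radius
    return 1.0                       # l >= 1 (boundary choice is an assumption)


def membership(pattern, center, zeta, gamma):
    """Membership m_j(R_h, C_j, zeta_j) = 1 - f(l, zeta_j, gamma) of Eq. (3)."""
    # Assumption: l is the Euclidean distance between pattern and center.
    l = math.dist(pattern, center)
    return 1.0 - ramp(l, zeta, gamma)


# Hypothetical two-gene hypersphere and patterns, with features in [0, 1].
center = (0.5, 0.5)
zeta, gamma = 0.2, 1.0

inside = membership((0.5, 0.6), center, zeta, gamma)   # distance 0.1 <= zeta
outside = membership((0.5, 0.9), center, zeta, gamma)  # distance 0.4 > zeta
```

Under these assumptions a pattern inside the hypersphere has full membership, and membership decreases with distance once the pattern lies outside the radius, at a rate set by gamma.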