INTERNATIONAL JOURNAL OF WISDOM BASED COMPUTING, VOL. 1(1), 2011
Extreme Learning Machines - A Review
and State-of-the-art
R. Rajesh, J. Siva Prakash
Abstract
Learning time is an important factor when designing any computationally intelligent
algorithm for classification, medication, control, etc. Recently, the Extreme Learning
Machine (ELM) has been proposed, which significantly reduces the amount of time needed
to train a neural network. It has been widely used in many applications. This paper
surveys ELM and its applications.
Index Terms
Extreme learning machine, Neural Network, Single layer feedforward neural
network, classification, regression
I. INTRODUCTION
Neural networks have been extensively used in many fields due to their ability
to approximate complex nonlinear mappings directly from the input samples, and
to provide models for a large class of natural and artificial phenomena that are
difficult to handle using classical parametric techniques. There are many algorithms
for training neural networks, such as back propagation, Support Vector Machine
(SVM) [41], Hidden Markov Model (HMM), etc. One of the disadvantages of
neural networks is the learning time.
Recently, Huang et al. [25], [67] proposed a new learning algorithm for the Single
Layer Feedforward Neural Network architecture, called Extreme Learning Machine
(ELM), which overcomes the problems caused by gradient-descent-based
algorithms such as back propagation applied in ANNs. ELM can significantly
reduce the amount of time needed to train a neural network.
This paper presents a survey of the Extreme Learning Machine (ELM). It is
organized as follows: Section 2 describes the working of ELM, and Section
3 presents the learning of ELM. Applications of ELM are reviewed in Section 4,
and Section 5 concludes the paper.
II. EXTREME LEARNING MACHINE - A REVIEW
The Extreme Learning Machine proposed by Huang et al. [25], [29] uses the Single Layer
Feedforward Neural Network (SLFN) architecture [1]. It randomly chooses the
input weights and analytically determines the output weights of the SLFN. It has much
better generalization performance with much faster learning speed. It requires less
human intervention and can run thousands of times faster than conventional
methods. It automatically determines all the network parameters analytically,
Dr. R. Rajesh is at the School of Computer Science and Engineering, Bharathiar University. He can be contacted
at kollamrajeshr@ieee.org
Mr. J. Siva Prakash is at Daffodills India Technologies, 211, TVS Nagar, Edayarpalayam, Coimbatore - 25.
He has done his Master of Philosophy in Computer Science at Bharathiar University. He can be contacted
at siva5200@gmail.com
which avoids trivial human intervention and makes it efficient in online and real-time
applications. The Extreme Learning Machine has several advantages: ease of
use, faster learning speed, higher generalization performance, and suitability for many
nonlinear activation functions and kernel functions.
2.1. A Note on Single Hidden Layer Feedforward Neural Network
The output of a Single Hidden Layer Feedforward Neural Network (SLFN) with $L$
hidden nodes [31], [49], incorporating both additive and RBF hidden nodes in a
unified way, can be represented as

$$ f_L(\mathbf{x}) = \sum_{i=1}^{L} \beta_i G(\mathbf{a}_i, b_i, \mathbf{x}), \quad \mathbf{x} \in \mathbf{R}^d,\ \mathbf{a}_i \in \mathbf{R}^d, \tag{1} $$
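where $\mathbf{a}_i$ and $b_i$ are the learning parameters of the hidden nodes and $\beta_i$ is the weight
connecting the $i$th hidden node to the output node. $G(\mathbf{a}_i, b_i, \mathbf{x})$ is the output of
the $i$th hidden node with respect to the input $\mathbf{x}$. For an additive hidden node with
the activation function $g(x): \mathbf{R} \to \mathbf{R}$ (e.g., sigmoid and threshold),
$G(\mathbf{a}_i, b_i, \mathbf{x})$ is given by

$$ G(\mathbf{a}_i, b_i, \mathbf{x}) = g(\mathbf{a}_i \cdot \mathbf{x} + b_i), \quad b_i \in \mathbf{R}, \tag{2} $$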
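where $\mathbf{a}_i$ is the weight vector connecting the input layer to the $i$th hidden node
and $b_i$ is the bias of the $i$th hidden node. $\mathbf{a}_i \cdot \mathbf{x}$ denotes the inner product of the vectors
$\mathbf{a}_i$ and $\mathbf{x}$ in $\mathbf{R}^d$.
For an RBF hidden node with activation function $g(x): \mathbf{R} \to \mathbf{R}$ (e.g., Gaussian),
$G(\mathbf{a}_i, b_i, \mathbf{x})$ is given by

$$ G(\mathbf{a}_i, b_i, \mathbf{x}) = g(b_i \|\mathbf{x} - \mathbf{a}_i\|), \quad b_i \in \mathbf{R}^{+}, \tag{3} $$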
where $\mathbf{a}_i$ and $b_i$ are the center and impact factor of the $i$th RBF node. $\mathbf{R}^{+}$ indicates
the set of all positive real values. The RBF network is a special case of SLFN
with RBF nodes in its hidden layer.
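Consider $N$ arbitrary distinct samples $(\mathbf{x}_i, \mathbf{t}_i) \in \mathbf{R}^d \times \mathbf{R}^m$. Here, $\mathbf{x}_i$ is a
$d \times 1$ input vector and $\mathbf{t}_i$ is an $m \times 1$ target vector. If an SLFN with $L$ hidden nodes can
approximate these $N$ samples with zero error, then there exist $\beta_i$,
$\mathbf{a}_i$ and $b_i$ such that

$$ f_L(\mathbf{x}_j) = \sum_{i=1}^{L} \beta_i G(\mathbf{a}_i, b_i, \mathbf{x}_j) = \mathbf{t}_j, \quad j = 1, \ldots, N. \tag{4} $$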
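Equation (4) can be written compactly as

$$ \mathbf{H} \boldsymbol{\beta} = \mathbf{T}, \tag{5} $$

where

$$ \mathbf{H}(\hat{\mathbf{a}}, \hat{b}, \hat{\mathbf{x}}) = \begin{bmatrix} G(\mathbf{a}_1, b_1, \mathbf{x}_1) & \cdots & G(\mathbf{a}_L, b_L, \mathbf{x}_1) \\ \vdots & \ddots & \vdots \\ G(\mathbf{a}_1, b_1, \mathbf{x}_N) & \cdots & G(\mathbf{a}_L, b_L, \mathbf{x}_N) \end{bmatrix}_{N \times L} \tag{6} $$

with $\hat{\mathbf{a}} = \mathbf{a}_1, \ldots, \mathbf{a}_L$; $\hat{b} = b_1, \ldots, b_L$; $\hat{\mathbf{x}} = \mathbf{x}_1, \ldots, \mathbf{x}_N$, and

$$ \boldsymbol{\beta} = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \qquad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix}_{N \times m}. \tag{7} $$

$\mathbf{H}$ is the hidden layer output matrix of the SLFN, with the $i$th column of $\mathbf{H}$ being the
$i$th hidden node's output with respect to the inputs $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$.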
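As an illustration of the hidden layer output matrix of Eq. (6), the following minimal sketch builds $\mathbf{H}$ for additive and RBF nodes. It assumes a sigmoid $g$ for additive nodes and a Gaussian $g(z) = e^{-z^2}$ for RBF nodes; the function name `hidden_layer_output` and the sampling ranges are hypothetical choices made for this example, not taken from the surveyed papers.

```python
import numpy as np

def hidden_layer_output(X, A, b, node_type="additive"):
    """Return the N x L hidden layer output matrix H of Eq. (6).

    X: (N, d) inputs; A: (L, d) weights or centers; b: (L,) biases or impact factors.
    """
    if node_type == "additive":
        # Eq. (2) with an assumed sigmoid g: G(a_i, b_i, x) = g(a_i . x + b_i)
        return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    if node_type == "rbf":
        # Eq. (3) with an assumed Gaussian g(z) = exp(-z^2): G = g(b_i * ||x - a_i||)
        dist = np.linalg.norm(X[:, None, :] - A[None, :, :], axis=2)
        return np.exp(-(b * dist) ** 2)
    raise ValueError("node_type must be 'additive' or 'rbf'")

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))            # N = 5 samples, d = 2 features
A = rng.uniform(-1, 1, size=(3, 2))    # L = 3 hidden nodes
H_add = hidden_layer_output(X, A, rng.uniform(-1, 1, size=3), "additive")
H_rbf = hidden_layer_output(X, A, rng.uniform(0.1, 1, size=3), "rbf")
print(H_add.shape, H_rbf.shape)        # both are N x L, i.e. (5, 3) (5, 3)
```

Each column of the returned matrix holds one hidden node's output over all samples, matching the description of $\mathbf{H}$ above.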
2.2. Principles of ELM - A Survey
ELM [25], [29], designed as an SLFN with $L$ hidden neurons, can learn $L$ distinct
samples with zero error. Even if the number of hidden neurons ($L$) is smaller than the number
of distinct samples ($N$), ELM can still assign random parameters to the hidden
nodes and calculate the output weights using the pseudoinverse of $\mathbf{H}$, giving only a
small error $\epsilon > 0$. The hidden node parameters of ELM, $\mathbf{a}_i$ and $b_i$ (input weights
and biases, or centers and impact factors), need not be tuned during training and
may simply be assigned random values. The following theorems state the
same.
Theorem 1: (Liang et al. [49]) Let an SLFN with $L$ additive or RBF hidden
nodes and an activation function $g(x)$ which is infinitely differentiable in any
interval of $\mathbf{R}$ be given. Then, for $L$ arbitrary distinct input vectors
$\{\mathbf{x}_i \mid \mathbf{x}_i \in \mathbf{R}^d, i = 1, \ldots, L\}$ and $\{(\mathbf{a}_i, b_i)\}_{i=1}^{L}$ randomly generated with any continuous
probability distribution, respectively, the hidden layer output matrix $\mathbf{H}$ of the SLFN
is invertible with probability one and $\|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\| = 0$.
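Theorem 2: (Liang et al. [49]) Given any small positive value $\epsilon > 0$ and an
activation function $g(x): \mathbf{R} \to \mathbf{R}$ which is infinitely differentiable in any
interval, there exists $L \le N$ such that for $N$ arbitrary distinct input vectors
$\{\mathbf{x}_i \mid \mathbf{x}_i \in \mathbf{R}^d, i = 1, \ldots, N\}$, for any $\{(\mathbf{a}_i, b_i)\}_{i=1}^{L}$ randomly generated according to
any continuous probability distribution, $\|\mathbf{H}_{N \times L} \boldsymbol{\beta}_{L \times m} - \mathbf{T}_{N \times m}\| < \epsilon$ with probability
one.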
Since the hidden node parameters of ELM need not be tuned during training
and are simply assigned random values, Eq. (5) becomes a linear
system and the output weights can be estimated as

$$ \hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger} \mathbf{T}, \tag{8} $$

where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse [60] of the hidden layer
output matrix $\mathbf{H}$, which can be calculated using several methods including the orthogonal
projection method, the orthogonalization method, iterative methods, and singular value
decomposition (SVD) [60]. The orthogonal projection method can be used only
when $\mathbf{H}^T \mathbf{H}$ is nonsingular, in which case $\mathbf{H}^{\dagger} = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T$. Due to the use of searching
and iterations, the orthogonalization method and the iterative method have limitations.
Implementations of ELM use SVD to calculate the Moore-Penrose generalized
inverse of $\mathbf{H}$, since it can be used in all situations. ELM is thus a batch learning
method.
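The batch procedure described above (random hidden parameters, then Eq. (8)) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: sigmoid additive nodes, hidden parameters drawn uniformly from $[-1, 1]$, a toy regression target $\sin(x)$, and the hypothetical helper names `elm_train` and `elm_predict`. NumPy's `np.linalg.pinv` computes the Moore-Penrose inverse via SVD.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Train a basic ELM: random hidden layer, output weights from Eq. (8)."""
    A = rng.uniform(-1, 1, size=(n_hidden, X.shape[1]))  # random input weights (assumed range)
    b = rng.uniform(-1, 1, size=n_hidden)                # random hidden biases (assumed range)
    H = sigmoid(X @ A.T + b)                             # hidden layer output matrix, N x L
    beta = np.linalg.pinv(H) @ T                         # beta = pinv(H) T, pinv is SVD-based
    return A, b, beta

def elm_predict(X, A, b, beta):
    return sigmoid(X @ A.T + b) @ beta

# Toy regression problem (an assumption for illustration): fit sin(x) on [-3, 3].
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
A, b, beta = elm_train(X, T, n_hidden=30, rng=rng)
mse = float(np.mean((elm_predict(X, A, b, beta) - T) ** 2))
print(f"training MSE: {mse:.2e}")
```

No iterative tuning takes place: the only fitted quantity is `beta`, obtained from a single linear least-squares solve.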
The universal approximation capability of ELM has been analyzed in [31] using an
incremental method, and it has been shown that SLFNs with randomly
generated additive or RBF nodes with a wide range of piecewise continuous
activation functions can universally approximate any continuous target function
on any compact subspace of the Euclidean space $\mathbf{R}^d$.
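Theorem 3: (Huang et al. [31]) Given any bounded nonconstant piecewise continuous
function $g: \mathbf{R} \to \mathbf{R}$ for additive nodes, or any integrable piecewise
continuous function $g: \mathbf{R} \to \mathbf{R}$ with $\int_{\mathbf{R}} g(x)\,dx \ne 0$ for RBF nodes, for
any continuous target function $f$ and any randomly generated function sequence
$\{g_n\}$, $\lim_{n \to \infty} \|f - f_n\| = 0$ holds with probability one if

$$ \beta_n = \frac{\langle e_{n-1}, g_n \rangle}{\|g_n\|^2}, \tag{9} $$

where $f_n$ is the output of the network with $n$ hidden nodes and $e_{n-1} = f - f_{n-1}$ is the
residual error before the $n$th node is added.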
An incremental algorithm (also called I-ELM) has been proposed by Huang et al.
[31] for SLFNs and TLFNs (two-hidden-layer feedforward neural networks), which
adds hidden neurons one by one until the error becomes less than a
predefined constant $\epsilon$.
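The incremental construction can be sketched as follows, using the sample-based form of Eq. (9) in which the inner product and norm are evaluated on the training set. The sketch assumes a single output, sigmoid additive nodes, uniform sampling in $[-1, 1]$, and an RMSE stopping criterion; the name `i_elm` and the toy target are hypothetical.

```python
import numpy as np

def i_elm(X, t, max_nodes, eps, rng):
    """Add random hidden nodes one by one until the training RMSE drops below eps."""
    e = t.astype(float).copy()             # residual e_0 = f, since f_0 = 0
    nodes = []
    while len(nodes) < max_nodes and np.sqrt(np.mean(e ** 2)) >= eps:
        a = rng.uniform(-1, 1, size=X.shape[1])   # random input weights (assumed range)
        b = rng.uniform(-1, 1)                    # random bias (assumed range)
        h = 1.0 / (1.0 + np.exp(-(X @ a + b)))    # new node's output on all samples
        beta = (e @ h) / (h @ h)                  # Eq. (9): <e_{n-1}, g_n> / ||g_n||^2
        e = e - beta * h                          # residual after adding the node
        nodes.append((a, b, beta))
    return nodes, e

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
t = np.sin(X).ravel()
nodes, e = i_elm(X, t, max_nodes=200, eps=0.05, rng=rng)
rmse_before = float(np.sqrt(np.mean(t ** 2)))
rmse_after = float(np.sqrt(np.mean(e ** 2)))
print(len(nodes), rmse_before, rmse_after)
```

Existing output weights are never revisited; each step only projects the current residual onto the newly drawn node, so the residual norm cannot increase.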
Convex incremental ELM (CI-ELM) [33] is another extension of ELM. In CI-ELM,
the output weights of the existing nodes are recalculated based on Barron's
convex optimization concept [4] when a new hidden node is randomly added,
using $f_n = (1 - \beta_n) f_{n-1} + \beta_n g_n$, where $0 \le \beta_n \le 1$.
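Theorem 4: (Huang et al. [33]) Given any nonconstant piecewise continuous
function $g: \mathbf{R} \to \mathbf{R}$, if span$\{G(\mathbf{a}, b, \mathbf{x}) : (\mathbf{a}, b) \in \mathbf{R}^d \times \mathbf{R}\}$ is dense in $L^2$,
then for any continuous target function $f$ and any function sequence
$g_n(\mathbf{x}) = G(\mathbf{a}_n, b_n, \mathbf{x})$ randomly generated based on any continuous sampling distribution,
$\lim_{n \to \infty} \|f - ((1 - \beta_n) f_{n-1} + \beta_n g_n)\| = 0$ holds with probability 1 if

$$ \beta_n = \frac{\langle e_{n-1},\, e_{n-1} - (f - g_n) \rangle}{\|e_{n-1} - (f - g_n)\|^2}, $$

where $G(\mathbf{a}, b, \mathbf{x})$ is the output of the hidden nodes and $e_{n-1} = f - f_{n-1}$ is the residual error.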
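Based on the above theorem, the output weight for the newly added hidden
node $L$ is

$$ \beta_L = \frac{\mathbf{E}^T [\mathbf{E} - (\mathbf{T} - \mathbf{H})]}{[\mathbf{E} - (\mathbf{T} - \mathbf{H})]^T [\mathbf{E} - (\mathbf{T} - \mathbf{H})]}, $$

the output weights of the existing hidden nodes are recalculated by
$\beta_i = (1 - \beta_L)\beta_i$, $i = 1, \ldots, L-1$, and the residual error
after adding the new hidden node $L$ is $\mathbf{E} = (1 - \beta_L)\mathbf{E} + \beta_L(\mathbf{T} - \mathbf{H})$, where the
estimates are based on the training set: $\mathbf{H} = [h(1), \ldots, h(N)]^T$ is the activation
vector of the new node for all the $N$ training samples (here $\mathbf{H}$ denotes a vector, not the
hidden layer output matrix), $\mathbf{E} = [e(1), \ldots, e(N)]^T$ is
the residual vector before the new hidden node is added, and $\mathbf{T} = [t_1, \ldots, t_N]^T$ is
the target vector.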
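The convex update just described can be sketched as below. The code assumes a single output, sigmoid additive nodes and uniform sampling in $[-1, 1]$; `ci_elm` and the toy target are hypothetical names and data for illustration only. It follows the formula as stated, so the computed weight is not clipped to any interval.

```python
import numpy as np

def ci_elm(X, t, n_nodes, rng):
    """Convex incremental ELM sketch: rescale old weights when a node is added."""
    e = t.astype(float).copy()            # E: residual vector, starting from f_0 = 0
    H_cols, betas = [], []
    for _ in range(n_nodes):
        a = rng.uniform(-1, 1, size=X.shape[1])    # random input weights (assumed range)
        b = rng.uniform(-1, 1)                     # random bias (assumed range)
        h = 1.0 / (1.0 + np.exp(-(X @ a + b)))     # H: activation vector of the new node
        diff = e - (t - h)                         # E - (T - H)
        beta_new = (e @ diff) / (diff @ diff)      # output weight of the new node
        betas = [(1.0 - beta_new) * w for w in betas] + [beta_new]  # rescale old weights
        e = (1.0 - beta_new) * e + beta_new * (t - h)               # updated residual
        H_cols.append(h)
    return np.column_stack(H_cols), np.array(betas), e

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
t = np.sin(X).ravel()
H_ci, betas_ci, e_ci = ci_elm(X, t, n_nodes=50, rng=rng)
print(np.linalg.norm(t), np.linalg.norm(e_ci))
```

Because every existing weight is multiplied by $(1 - \beta_L)$, the network output stays a combination of the form $(1 - \beta_L) f_{L-1} + \beta_L g_L$ at each step.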
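The following theorem states the universal approximation capability for any type of piecewise
continuous computational hidden nodes.
Theorem 5: (Huang et al. [33]) Given any nonconstant piecewise continuous
function $g: \mathbf{R} \to \mathbf{R}$, if span$\{G(\mathbf{a}, b, \mathbf{x}) : (\mathbf{a}, b) \in \mathbf{R}^d \times \mathbf{R}\}$ is
dense in $L^2$, then for any continuous target function $f$ and any function sequence
$g_n(\mathbf{x}) = G(\mathbf{a}_n, b_n, \mathbf{x})$ randomly generated based on any continuous sampling
distribution, $\lim_{n \to \infty} \|f - f_n\| = 0$ holds with probability 1 if the output parameters $\beta_i$
are determined by ordinary least squares to minimize $\left\| f(\mathbf{x}) - \sum_{i=1}^{n} \beta_i g_i(\mathbf{x}) \right\|$.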
Later, Huang et al. proposed EI-ELM (enhanced random search based
incremental learning machine) [35], observing that some of the hidden nodes
in such networks play a very minor role in the network output and thereby only increase the
complexity of the network. In EI-ELM, at each learning step several hidden
nodes are randomly generated, and among them the hidden node leading to the
largest decrease in residual error is added to the existing network. The output
weight is calculated as in I-ELM. The following theorem states the same.
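Theorem 6: Given an SLFN with any nonconstant piecewise continuous hidden
nodes $G(\mathbf{a}, b, \mathbf{x})$, if span$\{G(\mathbf{a}, b, \mathbf{x}) : (\mathbf{a}, b) \in \mathbf{R}^d \times \mathbf{R}\}$ is dense in $L^2$, then for any
continuous target function $f$, any randomly generated function sequence
$\{g_n\}$ and any positive integer $k$, $\lim_{n \to \infty} \|f - f_n^*\| = 0$ holds with probability 1 if

$$ \beta_n^* = \frac{\langle e_{n-1}^*, g_n^* \rangle}{\|g_n^*\|^2}, $$

where $f_n^* = \sum_{i=1}^{n} \beta_i^* g_i^*$, $e_n^* = f - f_n^*$ and

$$ g_n^* = \arg\min_{g_i:\ (n-1)k+1 \le i \le nk} \left\| (f - f_{n-1}^*) - \beta_n g_i \right\|. $$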
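The candidate selection step of EI-ELM can be sketched as follows: at each step $k$ random nodes are drawn, each gets the weight of Eq. (9), and the one leaving the smallest residual is kept. The sketch assumes a single output, sigmoid additive nodes and uniform sampling in $[-1, 1]$; `ei_elm` and the toy target are hypothetical.

```python
import numpy as np

def ei_elm(X, t, n_nodes, k, rng):
    """At each step draw k candidate nodes and keep the one leaving the smallest residual."""
    e = t.astype(float).copy()
    chosen = []
    for _ in range(n_nodes):
        A = rng.uniform(-1, 1, size=(k, X.shape[1]))    # k candidate weight vectors
        b = rng.uniform(-1, 1, size=k)                  # k candidate biases
        Hc = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))       # N x k candidate outputs
        betas = (e @ Hc) / np.sum(Hc ** 2, axis=0)      # Eq. (9) for every candidate
        residuals = np.linalg.norm(e[:, None] - betas * Hc, axis=0)
        j = int(np.argmin(residuals))                   # largest residual error decrease
        e = e - betas[j] * Hc[:, j]
        chosen.append((A[j], b[j], betas[j]))
    return chosen, e

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
t = np.sin(X).ravel()
chosen, e_ei = ei_elm(X, t, n_nodes=20, k=10, rng=rng)
print(np.linalg.norm(t), np.linalg.norm(e_ei))
```

With $k = 1$ the procedure reduces to plain I-ELM; larger $k$ trades extra candidate evaluations for a more compact network.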
Gradient-based algorithms cannot directly train neural networks with threshold
functions, as these are nondifferentiable. Hence most of the literature uses the sigmoid
function as an approximation to threshold functions. Lemma 1 and
Theorems 7 and 8 below, by Huang et al. [32], establish the use of threshold functions in extreme
learning machines.
Lemma 1: (Huang et al. [32]) An SLFN with $N$ hidden neurons with the
activation function $g(x) = 1/(1 + e^{-x})$ and with randomly chosen input weights
and hidden biases can learn $N$ distinct observations with any arbitrarily small
error.
Theorem 7: (Huang et al. [32]) For an SLFN with the activation function
$g(x) = 1/(1 + e^{-x})$ in the hidden layer, given any constant $\epsilon > 0$, there always exists
an integer $L \le N$ such that an SLFN with $L$ hidden neurons and with randomly
chosen input weights and hidden biases can learn $N$ distinct observations with a
training error less than $\epsilon$.
Theorem 8: (Huang et al. [32]) Suppose that the threshold activation function
$g(x) = 1$ for $x \ge 0$ and $g(x) = 0$ for $x < 0$
is used in the hidden layer. Given any nonzero constant $\epsilon > 0$,
there always exists an integer $L \le N$ such that an SLFN with $L$ such hidden
neurons and with randomly chosen input weights and hidden biases can learn $N$
distinct observations with a training error less than $\epsilon$.
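An online sequential learning algorithm (OS-ELM) [49] has been proposed by Liang et al.,
which can learn data one by one or chunk by chunk. First, it is necessary
to select the type of node (additive or RBF), the corresponding activation function
$g$, and the number of hidden neurons $L$. The learning is then initialized using a small
chunk of data $\aleph_0 = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N_0}$ from the given training set
$\aleph = \{(\mathbf{x}_i, \mathbf{t}_i) \mid \mathbf{x}_i \in \mathbf{R}^d, \mathbf{t}_i \in \mathbf{R}^m, i = 1, \ldots\}$,
$N_0 \ge L$. Then find the initial hidden layer output matrix $\mathbf{H}_0$
and the initial output weight $\boldsymbol{\beta}^{(0)} = \mathbf{P}_0 \mathbf{H}_0^T \mathbf{T}_0$, where $\mathbf{P}_0 = (\mathbf{H}_0^T \mathbf{H}_0)^{-1}$ and
$\mathbf{T}_0 = [\mathbf{t}_1, \ldots, \mathbf{t}_{N_0}]^T$.
Then, for each $(k+1)$th chunk of data,

$$ \aleph_{k+1} = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i = \left(\sum_{j=0}^{k} N_j\right) + 1}^{\sum_{j=0}^{k+1} N_j}, \tag{10} $$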
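where $N_{k+1}$ denotes the number of observations in the $(k+1)$th chunk, the
sequential learning phase is given as
1) Calculate the hidden layer output matrix $\mathbf{H}_{k+1}$ for the $(k+1)$th chunk.
2) Set $\mathbf{T}_{k+1} = \left[\mathbf{t}_{\left(\sum_{j=0}^{k} N_j\right)+1}, \ldots, \mathbf{t}_{\sum_{j=0}^{k+1} N_j}\right]^T$.
3) Calculate $\boldsymbol{\beta}^{(k+1)}$ using

$$ \mathbf{P}_{k+1} = \mathbf{P}_k - \mathbf{P}_k \mathbf{H}_{k+1}^T \left(\mathbf{I} + \mathbf{H}_{k+1} \mathbf{P}_k \mathbf{H}_{k+1}^T\right)^{-1} \mathbf{H}_{k+1} \mathbf{P}_k \tag{11} $$

$$ \boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \mathbf{P}_{k+1} \mathbf{H}_{k+1}^T \left(\mathbf{T}_{k+1} - \mathbf{H}_{k+1} \boldsymbol{\beta}^{(k)}\right) \tag{12} $$

4) Repeat the same procedure for the other chunks of new data.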
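The initialization and the sequential phase above can be sketched as two small functions implementing Eqs. (11) and (12). The sketch assumes sigmoid additive nodes with fixed random parameters and a toy regression target; `os_elm_init`, `os_elm_update`, `hidden` and the chunk sizes are hypothetical choices for this example.

```python
import numpy as np

def os_elm_init(H0, T0):
    """Initialization phase: P_0 = (H_0^T H_0)^{-1}, beta^(0) = P_0 H_0^T T_0."""
    P = np.linalg.inv(H0.T @ H0)
    return P, P @ H0.T @ T0

def os_elm_update(P, beta, H, T):
    """Sequential phase for one chunk, following Eqs. (11) and (12)."""
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P                   # Eq. (11)
    beta = beta + P @ H.T @ (T - H @ beta)        # Eq. (12)
    return P, beta

rng = np.random.default_rng(0)
d, L = 4, 5                                       # assumed input dimension and hidden nodes
A = rng.uniform(-1, 1, size=(L, d))               # fixed random hidden parameters
b = rng.uniform(-1, 1, size=L)

def hidden(X):
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

X = rng.normal(size=(60, d))
T = np.sin(X.sum(axis=1, keepdims=True))          # toy targets, N x 1
P, beta = os_elm_init(hidden(X[:20]), T[:20])     # initial chunk with N_0 = 20 >= L
for start in range(20, 60, 10):                   # four further chunks of 10 samples
    chunk = slice(start, start + 10)
    P, beta = os_elm_update(P, beta, hidden(X[chunk]), T[chunk])

beta_batch = np.linalg.pinv(hidden(X)) @ T        # batch solution of Eq. (8) on all data
print(np.allclose(beta, beta_batch, rtol=1e-5, atol=1e-6))
```

The final comparison checks that, up to numerical precision, the sequentially updated weights agree with the batch least-squares solution computed on all the data at once.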
Error Minimized ELM (EM-ELM), with automatic growth of hidden nodes
and fast incremental output weight learning, has been proposed by Feng et al.
[18], based on Lemma 2 and Theorem 9.
Lemma 2: (Feng et al. [18]) Given an SLFN, let $\mathbf{H}_1$
denote the $N \times L_0$ hidden layer output matrix of the SLFN with $L_0$
hidden nodes. If $L_1 - L_0$ new hidden nodes are added to the SLFN,
the new hidden layer output matrix of the SLFN becomes $\mathbf{H}_2 = [\mathbf{H}_1, \delta\mathbf{H}_1]$,
then

$$ E(\mathbf{H}_2) \le E(\mathbf{H}_1), $$

where $E(\mathbf{H}) = \min_{\boldsymbol{\beta}} \|\mathbf{H}\boldsymbol{\beta} - \mathbf{T}\|$ denotes the output error function of the SLFN.
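Theorem 9: (Feng et al. [18]) (Convergence Theorem): For a given set of $N$
distinct training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, given an arbitrary positive value $\epsilon$, there
exists a positive integer $k$ such that $E(\mathbf{H}_k) = \min_{\boldsymbol{\beta}} \|\mathbf{H}_k \boldsymbol{\beta} - \mathbf{T}\| < \epsilon$.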
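Given a set of training data $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, the maximum number of hidden
nodes $L_{\max}$, a small positive integer $L_0$ (the initial number of hidden nodes) and the expected learning accuracy $\epsilon$,
the recursive EM-ELM algorithm randomly adds $\delta L_k$ hidden nodes at the $k$th step (so that the total
number of hidden nodes is $L_{k+1} = L_k + \delta L_k$) until the learning error $E(\mathbf{H}_k)$ falls below $\epsilon$
(or the number of hidden nodes reaches $L_{\max}$), and the
output weights $\boldsymbol{\beta}_{k+1}$ are updated recursively by

$$ \boldsymbol{\beta}_{k+1} = \mathbf{H}_{k+1}^{\dagger} \mathbf{T} = \begin{bmatrix} \mathbf{U}_k \\ \mathbf{D}_k \end{bmatrix} \mathbf{T}, $$

where $\mathbf{D}_k = \left((\mathbf{I} - \mathbf{H}_k \mathbf{H}_k^{\dagger}) \delta\mathbf{H}_k\right)^{\dagger}$, $\mathbf{U}_k = \mathbf{H}_k^{\dagger}(\mathbf{I} - \delta\mathbf{H}_k \mathbf{D}_k)$, and $\mathbf{H}_{k+1} = [\mathbf{H}_k, \delta\mathbf{H}_k]$ is
the hidden layer output matrix with

$$ \delta\mathbf{H}_k = \begin{bmatrix} G(\mathbf{a}_{L_k+1}, b_{L_k+1}, \mathbf{x}_1) & \cdots & G(\mathbf{a}_{L_{k+1}}, b_{L_{k+1}}, \mathbf{x}_1) \\ \vdots & \ddots & \vdots \\ G(\mathbf{a}_{L_k+1}, b_{L_k+1}, \mathbf{x}_N) & \cdots & G(\mathbf{a}_{L_{k+1}}, b_{L_{k+1}}, \mathbf{x}_N) \end{bmatrix}_{N \times \delta L_k}. \tag{13} $$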
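The recursive pseudoinverse update used by EM-ELM can be sketched as below. The block formula is valid when the enlarged matrix $[\mathbf{H}_k, \delta\mathbf{H}_k]$ has full column rank, which is assumed here; sigmoid additive nodes, the toy data, and the name `em_elm_grow` are likewise assumptions made for illustration.

```python
import numpy as np

def em_elm_grow(H, H_pinv, dH):
    """Update pinv([H, dH]) from pinv(H) without recomputing it from scratch."""
    I = np.eye(H.shape[0])
    D = np.linalg.pinv((I - H @ H_pinv) @ dH)     # D_k
    U = H_pinv @ (I - dH @ D)                     # U_k
    return np.hstack([H, dH]), np.vstack([U, D])  # H_{k+1} and its pseudoinverse

rng = np.random.default_rng(0)
N, d = 40, 4                                      # assumed sample count and input dimension
X = rng.normal(size=(N, d))
T = np.sin(X.sum(axis=1, keepdims=True))          # toy targets

def random_nodes(n):
    """Outputs of n new random sigmoid nodes on all samples (N x n)."""
    A = rng.uniform(-1, 1, size=(n, d))
    b = rng.uniform(-1, 1, size=n)
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

H1 = random_nodes(2)                              # L_0 = 2 initial hidden nodes
H1_pinv = np.linalg.pinv(H1)
err1 = np.linalg.norm(H1 @ (H1_pinv @ T) - T)     # E(H_1)

H2, H2_pinv = em_elm_grow(H1, H1_pinv, random_nodes(2))   # add 2 more nodes
beta2 = H2_pinv @ T                               # updated output weights
err2 = np.linalg.norm(H2 @ beta2 - T)             # E(H_2)
print(err1, err2)
```

Only the pseudoinverse of an $N \times \delta L_k$ matrix is computed at each growth step, which is the source of the speed-up over recomputing $\mathbf{H}_{k+1}^{\dagger}$ directly.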
2.2.1 Demonstrating XOR classification
In order to demonstrate the working of ELM, an XOR problem (a 2-class problem)
with 4 instances containing 2 input attributes is solved. Table I shows the data
set for the problem. The random input weights and the output weights of an SLFN
with 3 hidden nodes generated by ELM are shown in Figure 1; the network is able to fully
classify the XOR problem.
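A small sketch of this experiment is given below. It assumes sigmoid additive nodes, one-hot targets for the two classes, and hidden parameters drawn uniformly from $[-4, 4]$; these choices and the name `train_xor_elm` are assumptions, not the settings behind Figure 1. Since a single random draw with only 3 hidden nodes does not always separate XOR, the sketch tries successive seeds and stops at the first draw that classifies all four instances.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs
labels = np.array([0, 1, 1, 0])                               # XOR classes
T = np.eye(2)[labels]                                         # one-hot targets (Y1, Y2)

def train_xor_elm(seed, n_hidden=3):
    """Draw random hidden parameters and solve for the output weights with Eq. (8)."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-4, 4, size=(n_hidden, 2))    # random input weights (assumed range)
    b = rng.uniform(-4, 4, size=n_hidden)         # random hidden biases (assumed range)
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))      # 4 x 3 hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                  # 3 x 2 output weights
    pred = np.argmax(H @ beta, axis=1)            # predicted class per instance
    return A, b, beta, pred

# Retry with new random draws until all four instances are classified correctly.
for seed in range(1000):
    A, b, beta, pred = train_xor_elm(seed)
    if np.array_equal(pred, labels):
        break

print("seed:", seed)
print("predicted classes:", pred)
```

The six entries of `A` play the role of the input weights and the six entries of `beta` the role of the output weights in a network of the shape shown in Figure 1.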
2.3. Extensions and Applications of ELM
A number of papers based on the ELM algorithm have appeared since its
introduction by Huang in 2003. A brief outline of some of these works is given
below.
In [28], ELM is extended to the case of radial basis function (RBF) networks,
which allows the centres and impact widths of the RBF kernels to be randomly
TABLE I
INPUTS AND OUTPUTS OF XOR
x1    x2    class
0     0     0
0     1     1
1     0     1
1     1     0
Fig. 1. An SLFN with one hidden layer of 3 nodes (H1 to H3) for solving the XOR problem, with inputs x1 and x2,
outputs Y1 and Y2, randomly generated input weights w1 to w6, and output weights O1 to O6 calculated using ELM.
generated and the output weights to be calculated as in ELM. It has been shown that this
approach can learn extremely fast and produce generalization performance close to that of SVM.
Fully complex extreme learning machines (C-ELM) have been suggested by Li
et al. [44], who extend the ELM algorithm from the real domain to the complex
domain and apply it to the nonlinear channel equalization problem.
Since a fuzzy inference system is equivalent to an SLFN, Rong et al. proposed
online sequential fuzzy extreme learning machines (OS-Fuzzy-ELM) [57], where
the antecedent parameters, namely the membership function parameters, of the
Takagi-Sugeno-Kang model are generated randomly and the consequent parameters are
determined analytically.
Amal Mohamad Aqlan [2] presents a hybrid Extreme Learning Machine with the
Levenberg-Marquardt algorithm using the AHP method, which provides better
generalization performance and a faster convergence rate.
ELM may need a higher number of hidden neurons due to the random
determination of the input weights and hidden biases. Hence, in E-ELM [77], the
input weights and hidden biases are determined using differential evolution.
Each chromosome is composed of input weights and hidden biases, and the fitness
of the chromosome is calculated using

$$ E = \sqrt{\frac{\sum_{j=1}^{N} \left\| \sum_{i=1}^{L} \beta_i\, g(\mathbf{a}_i \cdot \mathbf{x}_j + b_i) - \mathbf{t}_j \right\|_2^2}{m \times N}}. \tag{14} $$
Runxuan Zhang [75] implements multicategory classification using an Extreme
Learning Machine for microarray gene expression cancer diagnosis; it
provides good classification accuracy, lower training time and a much more compact
network compared to SVM-OVO and SANN.
Real-Coded Genetic Algorithm ELM (RCGA-ELM) is proposed in [65], which
selects the best number of hidden neurons and the corresponding input weights and
biases using two genetic operators, namely 'weight based' and 'network based',
for both crossover and mutation. Due to the high computational time of RCGA-ELM, in [65],
TABLE II
OTHER MAJOR APPLICATIONS USING ELM
Work                                              Journ./Conf.          Year
Robust Object Tracking [3]                        IEEE                  2007
Time Series Prediction [60], [61]                 PICDM                 2008
Optimal Pruned KNN [74]                           ICHIS                 2008
Reducing Effects of Outliers [36]                 JDCTA                 2008
Variable Selection Approach [55]                  ESTSP                 2008
Mental Tasks from EEG [48]                        IJNS                  2006
Building Regression Models [53]                   ESANN                 2008
Image Quality Assessment [64]                     Soft Comp.            2009
Text Classification [41]                          ICIAAI                2005
Land Cover Classification [56]                    ICEGITA               2008
Terrain Reconstruction [73]                       IEEE                  2006
Channel Equalization [46]                         ISNN                  2006
Predicting HLA-Peptide Binding [22]               ANN                   2006
Active Noise Control [76]                         ISNN                  2008
QoS Violation Application [11]                    Neural Process Lett   2008
ELM and SVM [70]                                  ICISIP                2006
Protein Secondary Structure Prediction [69]       Neurocomputing        2008
Multicategory Classification Method [62], [75]    Bioinformatics        2005
Melting Point of Organic Compounds [6]            ACSIECR               2008
Medical Image Annotation and Retrieval [59]       LNCS                  2005
Sparse-ELM (S-ELM) is presented, which searches for the best parameters of ELM
using a K-fold validation scheme with less computational time. Suresh et al. applied
both these algorithms to multi-category sparse data classification and compared
their performance.
Nanying Liang applied non-identity learning vector quantization to
evoked potential detection using a new algorithm, LVQ-ELM [47], which is a
combination of LVQ and ELM. The LVQ-ELM algorithm provides the best testing
accuracy using fewer hidden neurons compared to the original version of ELM.
Chul Kwak implements ELM-based classification for cardiac disorder
classification [40], using a segmentation algorithm on heart sound signals. It significantly
improves the classification accuracy in cardiac disorder categories compared to
HMM-, MLP- and SVM-based classifiers.
Dianhui Wang presents protein sequence classification [67] using ELM. It can
be used with many nonlinear activation functions and kernel functions, requires
less training time, and its classification accuracy is slightly better than that of BP.
Table II shows a list of other major application papers based on ELM. Table
III shows a list of other major extension papers based on ELM.
III. SIMULATIONS AND RESULTS
3.1. Classification of Fisher's Iris Dataset
In 1936, Sir Ronald Aylmer Fisher introduced Fisher's Iris data set. It is
sometimes called Anderson's Iris data set because Edgar Anderson collected the
data to quantify the geographic variation of Iris flowers in the Gaspe Peninsula.
The dataset [5] consists of 50 samples from each of three species of Iris flowers
(Iris setosa, Iris virginica and Iris versicolor). Four features were measured from
each sample: length of sepal, width of sepal, length of petal and width
TABLE III
OTHER MAJOR EXTENSIONS OF ELM
Work                                              Journ./Conf.          Year
Ensembling ELM [10]                               ISNN                  2007
Robust OS-ELM [23]                                ISNN                  2007
Improved OS-ELM [45]                              ISNN                  2007
Recursive C-ELM [50]                              ISNN                  2006
Robust Recursive C-ELM [51]                       ISNN                  2006
E-ELM based on PSO [72]                           ISNN                  2006
Improved Learning Algorithms for SLFN [20]        ICIC: AICTA           2008
A Priori Information in ELM [21]                  ISNN                  2006
Multi-Stage ELM [24]                              NCA                   2008
Extreme SVM [43]                                  PAKDD                 2008
Novel Algorithm for Feedforward NN [12]           ISNN                  2006
Fast Pruned-ELM [58]                              Neurocomputing        2008
Partial Lanczos ELM [66]                          Neurocomputing        2009
ELM-Bacterial Foraging [14]                       EESRI                 2007
TABLE IV
CLASSIFICATION ACCURACY OF IRIS DATA SET
Algorithm                                         Accuracy (%)
Chen-and-Fang method (2005) [9]                   97.33
Hong-and-Lee's method (1996) [38]                 96.67
Wu-and-Chen's method (1999) [71]                  96.28
Castro's method (1999) [7]                        96.72
Chang-and-Chen's method (2001) [15]               96.07
ANN (2008) [63]                                   94.87
Our simulation using ELM                          98.67
of petal. In our simulations, ELM with 25 hidden nodes is able to learn the data
within 1 minute on a Pentium dual-core machine (3.0 GHz) with 1 GB RAM. ELM
is able to achieve a testing accuracy of 98.67%. The performance comparison of
the ELM algorithm with other algorithms is shown in Table IV.
3.2. Classification of Liver Disorders
The data is obtained from blood tests which are thought to be sensitive to
liver disorders that might arise from excessive alcohol consumption. The BUPA dataset,
obtained from BUPA Medical Research Ltd (created by Richard S. Forsyth in 1990),
is used in our study for classification. It has 345 instances and 7 attributes, including
the class attribute. The first 5 variables are all blood tests and the sixth variable is
the number of alcohol units consumed per day. Table V describes the attributes. 200 and 145 samples
are used for training and testing, respectively. In our simulation, ELM with 3000
hidden nodes is able to learn the data within 1.2810 minutes on a Pentium dual-core
machine (3.0 GHz) with 1 GB RAM. ELM is able to achieve a testing accuracy of 76.50%.
The performance comparison of the ELM algorithm is shown in Table VI.
3.3. Classification of Lymphography Dataset - A real-life medical example
Lymphography is an X-ray study of lymph nodes and lymphatic vessels made
visible by the injection of a special dye. Classifying/predicting the Lymphography
TABLE V
DESCRIPTION OF BUPA ATTRIBUTES
Attribute     Description
mcv           mean corpuscular volume
alkphos       alkaline phosphatase
sgpt          alanine aminotransferase
sgot          aspartate aminotransferase
gammagt       gamma-glutamyl transpeptidase
drinks        number of half-pint equivalents of alcoholic beverages drunk per day
selector      field used to split data into two sets
TABLE VI
COMPARISON OF PERFORMANCE ON BUPA DATASET
Algorithm                                         Accuracy
S. Dehuri et al., MOPPSO technique (2009) [17]    70.3%
Our simulation using ELM                          76.5%
data into four classes (namely, normal, metastases, malign lymph, and fibrosis) is
one of the difficult tasks in machine learning. The data for our study is obtained
from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia,
provided by M. Zwitter and M. Soklic. The database consists of 148 instances
(normal - 2, metastases - 81, malign lymph - 61, fibrosis - 4) and 19 attributes,
including the class attribute. The attribute names and their possible values are
provided in Table VII.
In our simulations, ELM with 700 hidden nodes is able to learn the data within
4.55 minutes on a Pentium dual-core machine (3.0 GHz) with 1 GB RAM. ELM is
TABLE VII
LYMPHOGRAPHY - ATTRIBUTE NAMES AND THEIR VALUES
Attribute Name       Attribute Value
class                normal find, metastases, malign lymph, fibrosis
lymphatics           normal, arched, deformed, displaced
block of affere      no, yes
block of lymph. c    no, yes
block of lymph. s    no, yes
by pass              no, yes
extravasates         no, yes
regeneration         no, yes
early uptake         no, yes
lym.nodes dimin      0-3
lym.nodes enlar      1-4
changes in lym.      bean, oval, round
defect in node       no, lacunar, lac. marginal, lac. central
changes in node      no, lacunar, lac. margin, lac. central
changes in stru      no, grainy, drop-like, coarse, diluted, reticular, stripped, faint
special forms        no, chalices, vesicles
dislocation          no, yes
exclusion of no      no, yes
no. of nodes         0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, >=70