Informatica 29 (2005) 89–98
Traffic Accident Analysis Using Machine Learning Paradigms
Miao Chong1, Ajith Abraham2 and Marcin Paprzycki1, 3
1Computer Science Department, Oklahoma State University, USA, marcin@cs.okstate.edu
2School of Computer Science and Engineering, Chung-Ang University, Korea, ajith.abraham@ieee.org
3Computer Science, SWPS, Warszawa, Poland
Keywords: traffic accident, data mining, machine learning, hybrid system, decision trees, support vector machine
Received: December 20, 2004
Engineers and researchers in the automobile industry have tried to design and build safer automobiles,
but traffic accidents are unavoidable. Patterns involved in dangerous crashes could be detected if we
develop accurate prediction models capable of automatic classification of type of injury severity of
various traffic accidents. These behavioral and roadway accident patterns can be useful to develop
traffic safety control policies. We believe that to obtain the greatest possible accident reduction effects
with limited budgetary resources, it is important that measures be based on scientific and objective
surveys of the causes of accidents and severity of injuries. This paper summarizes the performance of
four machine learning paradigms applied to modeling the severity of injury that occurred during traffic
accidents. We considered neural networks trained using hybrid learning approaches, support vector
machines, decision trees and a concurrent hybrid model involving decision trees and neural networks.
Experimental results reveal that, among the machine learning paradigms considered, the hybrid decision
tree-neural network approach outperformed the individual approaches.
Povzetek (Summary): Four machine learning approaches are used to investigate the patterns of injuries in
traffic accidents.
1 Introduction
The costs of fatalities and injuries due to traffic accidents have a great impact on society. In recent years, researchers have paid increasing attention to determining the factors that significantly affect the severity of driver injuries caused by traffic accidents [29][30]. Researchers have employed several approaches to study this problem, including neural networks, nested logit formulations, log-linear models and fuzzy ARTMAP.

Applying data mining techniques to traffic accident records can help to understand the characteristics of driver behaviour, roadway conditions and weather conditions that are causally connected with different injury severities. This can help decision makers to formulate better traffic safety control policies. Roh et al. [22] illustrated how statistical methods based on directed graphs, constructed over data for the recent period, may be useful in modelling traffic fatalities by comparing, on out-of-sample forecasts, models specified using directed graphs with a model originally developed by Peltzman [23]. The directed graphs model outperformed Peltzman's model in root mean squared forecast error.

Ossenbruggen et al. [24] used a logistic regression model to identify statistically significant factors that predict the probabilities of crashes and injury crashes, with the aim of using these models to perform a risk assessment of a given region. These models were functions of factors that describe a site by its land use activity, roadside design, use of traffic control devices and traffic exposure. Their study illustrated that village sites are less hazardous than residential and shopping sites. Abdalla et al. [25] studied the relationship between casualty frequencies and the distance of the accidents from the zones of residence. As might have been anticipated, the casualty frequencies were higher nearer to the zones of residence, possibly due to higher exposure. The study revealed that the casualty rates amongst residents from areas classified as relatively deprived were significantly higher than those from relatively affluent areas.

Miaou et al. [26] studied the statistical properties of four regression models, two conventional linear regression models and two Poisson regression models, in terms of their ability to model vehicle accidents and highway geometric design relationships. Roadway and truck accident data from the Highway Safety Information System (HSIS) were employed to illustrate the use and the limitations of these models. It was demonstrated that the conventional linear regression models lack the distributional property to describe adequately random, discrete, nonnegative, and typically sporadic vehicle accident events on the road. The Poisson regression models, on the other hand, possess most of the desirable statistical properties in developing the relationships.

Abdelwahab et al. studied the 1997 accident data for the Central Florida area [2]. The analysis focused on vehicle accidents that occurred at signalized intersections. The injury severity was divided into three
classes: no injury, possible injury and disabling injury. They compared the performance of the Multi-layered Perceptron (MLP) and Fuzzy ARTMAP, and found that the MLP classification accuracy is higher than that of the Fuzzy ARTMAP. The Levenberg-Marquardt algorithm was used for the MLP training and achieved 65.6 and 60.4 percent classification accuracy for the training and testing phases, respectively. The Fuzzy ARTMAP achieved a classification accuracy of 56.1 percent.

Yang et al. used a neural network approach to detect safer driving patterns that have a lower chance of causing death and injury when a car crash occurs [17]. They performed the Cramer's V coefficient test [18] to identify significant variables that cause injury, in order to reduce the dimensions of the data. Then, they applied a data transformation method with a frequency-based scheme to transform categorical codes into numerical values. They used the Critical Analysis Reporting Environment (CARE) system, which was developed at the University of Alabama, with a Backpropagation (BP) neural network. They used the 1997 Alabama interstate alcohol-related data, and further studied the weights of the trained network to obtain a set of controllable cause variables that are likely to cause injury during a crash. The target variable in their study had two classes, injury and non-injury, where the injury class included fatalities. They found that by controlling a single variable (such as the driving speed or the light conditions) they could potentially reduce fatalities and injuries by up to 40%.

Sohn et al. applied data fusion, ensemble and clustering to improve the accuracy of individual classifiers for two categories of severity (bodily injury and property damage) of road traffic accidents [15]. The individual classifiers used were a neural network and a decision tree. They applied a clustering algorithm to the dataset to divide it into subsets, and then used each subset of data to train the classifiers. They found that classification based on clustering works better if the variation in observations is relatively large, as in Korean road traffic accident data.

Mussone et al. used neural networks to analyze vehicle accidents that occurred at intersections in Milan, Italy [12]. They chose a feed-forward MLP using BP learning. The model had 10 input nodes for eight variables (day or night, traffic flows circulating in the intersection, number of virtual conflict points, number of real conflict points, type of intersection, accident type, road surface condition, and weather conditions). The output node was called an accident index and was calculated as the ratio between the number of accidents for a given intersection and the number of accidents at the most dangerous intersection. Results showed that the highest accident index for the running over of pedestrians occurs at non-signalized intersections at nighttime.

Dia et al. used real-world data for developing a multi-layered MLP neural network freeway incident detection model [5]. They compared the performance of the neural network model and the incident detection model in operation on Melbourne's freeways. Results showed that the neural network model could provide faster and more reliable incident detection than the model that was in operation. They also found that failure to provide speed data at a station could significantly deteriorate model performance within that section of the freeway.

Shankar et al. applied a nested logit formulation for estimating accident severity likelihood conditioned on the occurrence of an accident [14]. They found that there is a greater probability of evident injury or disabling injury/fatality relative to no evident injury if at least one driver did not use a restraint system at the time of the accident.

Kim et al. developed a log-linear model to clarify the role of driver characteristics and behaviors in the causal sequence leading to more severe injuries. They found that alcohol or drug use and lack of seat belt use greatly increase the odds of more severe crashes and injuries [8].

Abdel-Aty et al. used the Fatality Analysis Reporting System (FARS) crash databases covering the period of 1975-2000 to analyze the effect of the increasing number of Light Truck Vehicle (LTV) registrations on fatal angle collision trends in the US [1]. They investigated the number of annual fatalities that resulted from angle collisions as well as the collision configuration (car-car, car-LTV, LTV-car, and LTV-LTV). Time series modeling results showed that fatalities in angle collisions will increase in the next 10 years, and that they are affected by the expected overall increase of the percentage of LTVs in traffic.

Bedard et al. applied a multivariate logistic regression to determine the independent contribution of driver, crash, and vehicle characteristics to drivers' fatality risk [3]. They found that increasing seatbelt use, reducing speed, and reducing the number and severity of driver-side impacts might prevent fatalities. Evanco conducted a multivariate population-based statistical analysis to determine the relationship between fatalities and accident notification times [6]. The analysis demonstrated that accident notification time is an important determinant of the number of fatalities for accidents on rural roadways.

Ossiander et al. used Poisson regression to analyze the association between the fatal crash rate (fatal crashes per vehicle mile traveled) and the speed limit increase [13]. They found that the speed limit increase was associated with a higher fatal crash rate and more deaths on freeways in Washington State.

Finally, researchers have studied the relationship between drivers' age, gender, vehicle mass, impact speed or driving speed measure and fatalities; the results of their work can be found in [4, 9, 10, 11, 16].

This paper investigates the application of neural networks, decision trees and a hybrid combination of decision tree and neural network to build models that could predict injury severity. The remaining parts of the paper are organized as follows. Section 2 gives more details about the problem and the pre-processing of the data used, followed, in Section 3, by a short description of the different machine learning paradigms used. Performance analysis is presented in Section 4, and finally some discussions and conclusions are given towards the end.
2 Accident Data Set

A. Description of the Dataset

This study used data from the National Automotive Sampling System (NASS) General Estimates System (GES) [21]. The GES datasets are intended to be a nationally representative probability sample from the annual estimated 6.4 million accident reports in the United States. The initial dataset for the study contained traffic accident records from 1995 to 2000, a total of 417,670 cases. According to the variable definitions for the GES dataset, this dataset has drivers' records only and does not include passengers' information. The total set includes labels of year, month, region, primary sampling unit, the number describing the police jurisdiction, case number, person number, vehicle number, vehicle make and model; inputs of drivers' age, gender, alcohol usage, restraint system, eject, vehicle body type, vehicle age, vehicle role, initial point of impact, manner of collision, rollover, roadway surface condition, light condition, travel speed and speed limit; and the output, injury severity. The injury severity has five classes: no injury, possible injury, non-incapacitating injury, incapacitating injury, and fatal injury. In the original dataset, 70.18% of the cases have an output of no injury, 16.07% of possible injury, 9.48% of non-incapacitating injury, 4.02% of incapacitating injury, and 0.25% of fatal injury.

Our task was to develop machine learning based intelligent models that could accurately classify the severity of injuries (5 categories). This can in turn lead to a greater understanding of the relationship between the factors of driver, vehicle, roadway and environment and driver injury severity. Accurate results of such data analysis could provide crucial information for road accident prevention policy. The records in the dataset are input/output pairs, with each record having an associated output. The output variable, the injury severity, is categorical and (as described above) has five classes. A supervised learning algorithm will try to map an input vector to the desired output class.

B. Data Preparation

When the input and output variables are considered there are no conflicts between the attributes, since each variable represents its own characteristics. Variables are already categorized and represented by numbers. The manner in which the collision occurred has 7 categories: non-collision, rear-end, head-on, rear-to-rear, angle, sideswipe same direction, and sideswipe opposite direction. For these 7 categories the distribution of fatal injury is as follows: 0.56% for non-collision, 0.08% for rear-end collision, 1.54% for head-on collision, 0.00% for rear-to-rear collision, 0.20% for angle collision, 0.08% for sideswipe same direction collision, and 0.49% for sideswipe opposite direction collision. Since head-on collision has the highest percentage of fatal injury, the dataset was narrowed down to head-on collisions only. Head-on collision has a total of 10,386 records, of which 160 records show the result as a fatal injury; all of these 160 records have the initial point of impact categorized as front.

The initial point of impact has 9 categories: no damage/non-collision, front, right side, left side, back, front right corner, front left corner, back right corner, and back left corner. Head-on collisions with front impact account for 10,251 records; this is 98.70% of the 10,386 head-on collision records. We therefore decided to focus on front impact only and removed the remaining 135 records. Travel speed and speed limit were not used in the model because too many records in the dataset have unknown values for them: for 67.68% of records the travel speed during the accident and the local speed limit were unknown. This means that the remaining input variables were: drivers' age, gender, alcohol usage, restraint system, eject, vehicle body type, vehicle role, vehicle age, rollover, road surface condition, and light condition. Table 1 summarizes the driver injury severity distribution for the head-on collision, front impact point dataset. From Table 1, it is immediately evident that alcohol usage, not using a seat belt, ejection of the driver, driver's age (>65), vehicle rollover, and lighting condition can be associated with higher percentages of fatal injury, incapacitating injury and non-incapacitating injury.

Vehicle ages of 37, 41, 46 and 56 years are each reported for only a single vehicle in the dataset; these four records were therefore deleted as clear outliers. After the preprocessing was completed, the final dataset used for modeling had 10,247 records. There were 5,171 (50.46%) records with no injury, 2,138 (20.86%) records with possible injury, 1,721 (16.80%) records with non-incapacitating injury, 1,057 (10.32%) records with incapacitating injury, and 160 (1.56%) records with fatal injury. We separated each output class and used a one-against-all approach. This approach selects one output class to be the positive class, and all the other classes are combined to be the negative class. We set the output value of the positive class to 1, and that of the (combined) negative classes to 0. We divided the datasets randomly into 60%, 20%, and 20% for training, cross-validation, and testing respectively.

To make sure that our data preparation is valid, we checked the correctness of attribute selection. There are several attribute selection techniques to find a minimum set of attributes so that the resulting probability distribution of the data classes is as close as possible to the original distribution of all attributes. To determine the best and worst attributes, we used the chi-squared (χ²) test to determine the dependence of input and output variables. The χ² test indicated that all the variables are significant (p-value < 0.05).

3 Machine Learning Paradigms

A. Artificial Neural Networks Using Hybrid Learning

A Multilayer Perceptron (MLP) is a feed-forward neural network with one or more hidden layers.
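As a rough illustration of such a network, the sketch below runs a single forward pass through an MLP with one hidden layer in plain Python. The layer sizes, weights, thresholds and input pattern are made-up values, and the choice of hyperbolic tangent hidden units with a logistic output unit is an assumption made for the example, not a description of the trained models.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(inputs, weights, thresholds, activation):
    """Return y_j = f(sum_i x_i * w_ij - theta_j) for every neuron j of a layer.

    weights[i][j] is the weight from input i to neuron j.
    """
    outputs = []
    for j, theta_j in enumerate(thresholds):
        net = sum(x_i * weights[i][j] for i, x_i in enumerate(inputs)) - theta_j
        outputs.append(activation(net))
    return outputs


def mlp_forward(x, w_hidden, theta_hidden, w_out, theta_out):
    """One forward pass: input layer -> hidden layer -> output layer."""
    hidden = layer_output(x, w_hidden, theta_hidden, math.tanh)
    return layer_output(hidden, w_out, theta_out, logistic)


# Hypothetical toy network: 3 inputs, 2 hidden neurons, 1 output neuron.
x = [1.0, 0.0, 1.0]
w_hidden = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
theta_hidden = [0.1, -0.2]
w_out = [[0.7], [-0.3]]
theta_out = [0.05]

y = mlp_forward(x, w_hidden, theta_hidden, w_out, theta_out)
print(y)  # a single value strictly between 0 and 1
```

With a one-against-all target coded as 1 or 0, an output of this kind could be thresholded (for instance at 0.5, again an assumption) to obtain a class decision.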
Table 1: Driver injury severity distribution

Factor                  No Injury           Possible Injury     Non-incapacitating  Incapacitating      Fatal               Total
Age
  0 (24 & under)        1629 (52.80%)       608 (19.71%)        505 (16.37%)        307 (9.95%)         36 (1.17%)          3085
  1 (25-64)             3171 (49.88%)       1362 (21.43%)       1075 (16.91%)       654 (10.29%)        95 (1.49%)          6357
  2 (65+)               373 (46.11%)        168 (20.77%)        143 (17.68%)        96 (11.87%)         29 (3.58%)          809
Gender
  0 (Female)            1749 (41.95%)       1072 (25.71%)       778 (18.66%)        507 (12.16%)        63 (1.51%)          4169
  1 (Male)              3424 (56.30%)       1066 (17.53%)       945 (15.54%)        550 (9.04%)         97 (1.59%)          6082
Eject
  0 (No Eject)          5171 (50.55%)       2137 (20.89%)       1719 (16.80%)       1047 (10.23%)       156 (1.52%)         10230
  1 (Eject)             2 (9.52%)           1 (4.76%)           4 (19.05%)          10 (47.62%)         4 (19.05%)          21
Alcohol
  0 (No Alcohol)        4997 (51.35%)       2067 (21.24%)       1600 (16.44%)       935 (9.61%)         133 (1.37%)         9732
  1 (Alcohol)           176 (33.91%)        71 (13.68%)         123 (23.70%)        122 (23.51%)        27 (5.20%)          519
Restraining System
  0 (Not Used)          337 (27.44%)        193 (15.72%)        336 (27.36%)        283 (23.05%)        79 (6.43%)          1228
  1 (Used)              4836 (53.60%)       1945 (21.56%)       1387 (15.37%)       774 (8.58%)         81 (0.90%)          9023
Body Type
  0 (Cars)              3408 (47.49%)       1600 (22.30%)       1272 (17.73%)       780 (10.87%)        116 (1.62%)         7176
  1 (SUV & Van)         747 (56.59%)        259 (19.62%)        189 (14.32%)        111 (8.41%)         14 (1.06%)          1320
  2 (Truck)             1018 (58.01%)       279 (15.90%)        262 (14.93%)        166 (9.46%)         30 (1.71%)          1755
Vehicle Role
  1 (Striking)          4742 (49.86%)       2011 (21.15%)       1636 (17.20%)       970 (10.20%)        151 (1.59%)         9510
  2 (Struck)            261 (72.70%)        54 (15.04%)         29 (8.08%)          15 (4.18%)          0 (0%)              359
  3 (Both)              170 (44.50%)        73 (19.11%)         58 (15.18%)         72 (18.85%)         9 (2.36%)           382
Rollover
  0 (No rollover)       5069 (50.78%)       2123 (20.85%)       1699 (16.69%)       1037 (10.19%)       152 (1.49%)         10180
  1 (Rollover)          4 (5.63%)           15 (21.13%)         24 (33.80%)         20 (28.17%)         8 (11.27%)          71
Road Surface Condition
  0 (Dry)               3467 (49.97%)       1404 (20.24%)       1190 (17.15%)       750 (10.81%)        127 (1.83%)         6938
  1 (Slippery)          1706 (51.49%)       734 (22.16%)        533 (16.09%)        307 (9.27%)         33 (1.00%)          3313
Light Condition
  0 (Daylight)          3613 (51.18%)       1487 (21.06%)       1174 (16.63%)       688 (9.75%)         98 (1.39%)          7060
  1 (Partial dark)      1139 (52.71%)       465 (21.52%)        348 (16.10%)        186 (8.61%)         23 (1.06%)          2161
  2 (Dark)              421 (40.87%)        186 (18.06%)        201 (19.51%)        183 (17.77%)        39 (3.79%)          1030
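The chi-squared test of independence used during data preparation can be illustrated on any single factor of Table 1. The sketch below is an illustration only: it applies the test to the two restraining-system rows of the table, using a hand-written statistic and a closed-form p-value that is valid only for an even number of degrees of freedom. The helper names are hypothetical, and the original analysis may well have used a statistics package instead.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_p_value_even_dof(stat, dof):
    """Upper-tail probability of the chi-squared distribution (even dof only)."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form used here requires a positive, even dof")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))


# Counts from Table 1, columns: no injury, possible, non-incapacitating,
# incapacitating, fatal.
restraint_not_used = [337, 193, 336, 283, 79]
restraint_used = [4836, 1945, 1387, 774, 81]

stat, dof = chi_squared([restraint_not_used, restraint_used])
p_value = chi2_p_value_even_dof(stat, dof)
print(f"chi2 = {stat:.1f}, dof = {dof}, significant at 0.05: {p_value < 0.05}")
```

A two-row factor against five severity classes gives (2 - 1)(5 - 1) = 4 degrees of freedom, which is why the even-dof shortcut is sufficient for this example.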
The network consists of an input layer of source neurons, at least one hidden layer of computational neurons, and an output layer of computational neurons. The input layer accepts input signals and redistributes these signals to all neurons in the hidden layer. The output layer accepts a stimulus pattern from the hidden layer and establishes the output pattern of the entire network. The MLP training phase works as follows: given a collection of training data $\{x_1(p), d_1(p)\}, \ldots, \{x_i(p), d_i(p)\}, \ldots, \{x_n(p), d_n(p)\}$, the objective is to obtain a set of weights that makes almost all the tuples in the training data classified correctly, or in other words, to map $x_1(p)$ to $d_1(p)$, ..., $x_i(p)$ to $d_i(p)$, and eventually $x_n(p)$ to $d_n(p)$. The algorithm starts by initializing all the weights ($w$) and threshold ($\theta$) levels of the network to small random numbers. Then the actual outputs of the neurons in the hidden layer are calculated as:

$$y_j(p) = f\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right],$$

where $n$ is the number of inputs of neuron $j$ in the hidden layer. Next, the actual outputs of the neurons in the output layer are calculated as:

$$y_k(p) = f\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right],$$
where $m$ is the number of inputs of neuron $k$ in the output layer. The weights are updated using the Backpropagation (BP) learning method with the error function:

$$E(w) = \sum_{p=1}^{P_T} \sum_{i=1}^{l} \left[d_i(p) - y_i(p)\right]^2,$$

where
$E(w)$ = error function to be minimized,
$w$ = weight vector,
$P_T$ = number of training patterns,
$l$ = number of output neurons,
$d_i(p)$ = desired output of neuron $i$ when pattern $p$ is introduced to the MLP, and
$y_i(p)$ = actual output of neuron $i$ when pattern $p$ is introduced to the MLP.

The objective of weight training is to change the weight vector $w$ so that the error function is minimized. By minimizing the error function, the actual output is driven closer to the desired output.

Empirical research [19] has shown that BP, when used for training neural networks, has the following problems:
• BP often gets trapped in a local minimum, mainly because of the random initialization of weights.
• BP usually generalizes quite well to detect the global features of the input, but after prolonged training the network will start to recognize individual input/output pairs rather than settling for weights that generally describe the mapping for the whole training set.

The second popular training algorithm for neural networks is the Scaled Conjugate Gradient Algorithm (SCGA). Moller [20] introduced it as a way of avoiding the complicated line search procedure of the conventional conjugate gradient algorithm (CGA). According to the SCGA, the Hessian matrix is approximated by

$$E''(w_k)\, p_k = \frac{E'(w_k + \sigma_k p_k) - E'(w_k)}{\sigma_k} + \lambda_k p_k,$$

where $E'$ and $E''$ are the first and second derivative information of the global error function $E(w_k)$. The other terms $w_k$, $p_k$, $\sigma_k$ and $\lambda_k$ represent the weights, the search direction, the parameter controlling the change in weight for the second derivative approximation, and the parameter for regulating the indefiniteness of the Hessian. In order to obtain a good quadratic approximation of $E$, a mechanism to raise and lower $\lambda_k$ is needed when the Hessian is positive definite. A detailed step-by-step description can be found in [20].

In order to minimize the above-mentioned problems resulting from BP training, we used a combination of BP and SCG for training.

B. Decision Trees

Decision trees are well-known algorithms for classification problems. The Classification and Regression Trees (CART) model consists of a hierarchy of univariate binary decisions. Each internal node in the tree specifies a binary test on a single variable, each branch represents an outcome of the test, and each leaf node represents class labels or a class distribution. CART operates by choosing the best variable for splitting the data into two groups at the root node, partitioning the data into two disjoint branches in such a way that the class labels in each branch are as homogeneous as possible; splitting is then recursively applied to each branch, and so forth.

If a dataset $T$ contains examples from $n$ classes, the gini index $gini(T)$ is defined as

$$gini(T) = 1 - \sum_{j=1}^{n} p_j^2,$$

where $p_j$ is the relative frequency of class $j$ in $T$ [31]. If dataset $T$ is split into two subsets $T_1$ and $T_2$ with sizes $N_1$ and $N_2$ (so that $N = N_1 + N_2$), the gini index of the split data is defined as

$$gini_{split}(T) = \frac{N_1}{N}\, gini(T_1) + \frac{N_2}{N}\, gini(T_2).$$

CART exhaustively searches for univariate splits. The attribute that provides the smallest $gini_{split}(T)$ is chosen to split the node. CART recursively expands the tree from a root node, and then gradually prunes back the large tree. An advantage of a decision tree is that the extraction of classification rules from it is very straightforward. More precisely, a decision tree can represent the knowledge in the form of if-then rules; one rule is created for each path from the root to a leaf node.

C. Support Vector Machines

The Support Vector Machine (SVM) is based on statistical learning theory [28]. SVMs have been successfully applied to a number of applications ranging from handwriting recognition, intrusion detection in computer networks, and text categorization to image classification, breast cancer diagnosis and prognosis, and bioinformatics. SVM involves two key techniques: one is mathematical programming and the other is kernel functions. Here, parameters are found by solving a quadratic programming problem with linear equality and inequality constraints, rather than by solving a non-convex, unconstrained optimization problem. SVMs are kernel-based learning algorithms in which only a fraction of the training examples are used in the solution (these are called the support vectors), and where the objective of learning is to maximize a margin around the decision surface. The flexibility of kernel functions allows the SVM to search a wide variety of hypothesis spaces. The basic idea of applying SVMs to pattern classification can be stated briefly as follows: first map the input vectors into a feature space (possibly with a higher dimension), either linearly or nonlinearly, depending on the selection of the kernel function; then, within the feature space, seek an optimized linear division, i.e. construct a hyperplane which separates the two classes.

For a set of $n$ training examples $(x_i, y_i)$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$, suppose there is a hyperplane which separates the positive from the negative examples. The points $x$ which lie on the hyperplane ($H_0$) satisfy $w \cdot x + b = 0$. The algorithm finds this hyperplane ($H_0$) and two other hyperplanes ($H_1$, $H_2$) parallel and equidistant to $H_0$:

$$H_1: w \cdot x_i + b = 1, \qquad H_2: w \cdot x_i + b = -1.$$
$H_1$ and $H_2$ are parallel and no training points fall between them. The support vector algorithm looks for the separating hyperplane and maximizes the distance between $H_1$ and $H_2$. So there will be some positive examples on $H_1$ and some negative examples on $H_2$. These examples are called support vectors. The distance between $H_1$ and $H_2$ is $2/\|w\|$; in order to maximize the distance, we should minimize $\|w\|^2 = w^T w$, subject to the constraints

$$y_i (w \cdot x_i + b) \ge 1, \quad \forall i.$$

Introducing Lagrangian multipliers $\alpha_1, \alpha_2, \ldots, \alpha_n \ge 0$, the learning task becomes

$$L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{n} \alpha_i \left[y_i (w \cdot x_i + b) - 1\right].$$

The above equation is for two classes that are linearly separable. When the two classes are not linearly separable, the SVM can transform the data points to another high dimensional space. A detailed description of the theory of SVMs for pattern recognition can be found in [32].

D. Hybrid Decision Tree-ANN (DTANN)

A hybrid intelligent system uses the approach of integrating different learning or decision-making models. Each learning model works in a different manner and exploits a different set of features. Integrating different learning models gives better performance than the individual learning or decision-making models by reducing their individual limitations and exploiting their different mechanisms. In a hierarchical hybrid intelligent system each layer provides some new information to the higher level [33]. The overall functioning of the system depends on the correct functionality of all the layers. Figure 1 illustrates the hybrid decision tree-ANN (DTANN) model for predicting drivers' injury severity. We used a concurrent hybrid model in which the traffic accident data are fed to the decision tree to generate the node information. Terminal nodes were numbered left to right starting with 1. All the dataset records were assigned to one of the terminal nodes, which represented the particular class or subset. The training data together with the node information were supplied for training the ANN. Figure 2 illustrates a decision tree structure with the node numbering. For the hybrid decision tree-ANN, we used the same hybrid learning algorithms and parameter settings as we used for the ANN (except for the number of hidden neurons). Experiments were performed with different numbers of hidden neurons, and the models with the highest classification accuracy for the output class were selected.

Fig. 1. Hybrid concurrent decision tree-ANN model for accident data

Fig. 2. Decision tree structure

4 Performance Analysis

A. Neural Networks

In the case of neural network based modeling, the hyperbolic tangent activation function was used in the hidden layer and the logistic activation function in the output layer. Models were trained with BP (100 epochs, learning rate 0.01) and SCGA (500 epochs) to minimize the Mean Squared Error (MSE). For each output class, we experimented with different numbers of hidden neurons, and report the model with the highest classification accuracy for the class. From the experiment results, for the no injury class the best model had 65 hidden neurons and achieved training and testing performance of 63.86% and 60.45% respectively. For the possible injury class, the best model had 65 hidden neurons, achieving training and testing performance of 59.34% and 57.58% respectively. For the non-incapacitating injury class, the best model had 75 hidden neurons, achieving training and testing performance of 58.71% and 56.80% respectively. For the incapacitating injury class, the best model had 60 hidden neurons, achieving training and testing performance of 63.40% and 63.36% respectively. Finally, for the fatal injury class, the best model had 45 hidden neurons, achieving training and testing performance of 78.61% and 78.17% respectively. These results are the summary of multiple experiments (for varying numbers of hidden neurons and for a number of attempts with random initial weight distributions, resulting in almost identical performance of the trained network) and are presented in Table 2.

B. Decision Trees

We experimented with a number of setups of decision tree parameters and report the best results obtained for our dataset. We trained each class with the Gini goodness of fit measure; the prior class probabilities parameter was set to equal, the stopping option for pruning was misclassification error, the minimum n per node was set to 5, the fraction of objects was 0.05, the maximum number of nodes was 1000, the maximum number of levels in the tree was 32, and the number of surrogates was 5. We used 10-fold cross-validation and generated comprehensive results. The cross-validation testing ensured that the patterns found will hold up when applied to new data.
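The Gini measure used to grow these trees can be illustrated with a small worked example. The sketch below is not the software used in the experiments: it simply evaluates the gini index and the gini index of a binary split on counts taken from Table 1, collapsed in one-against-all fashion to fatal versus non-fatal. The choice of the two candidate attributes (restraint use and gender) is made purely for illustration.

```python
def gini(class_counts):
    """gini(T) = 1 - sum_j p_j^2, with p_j the relative frequency of class j."""
    n = sum(class_counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in class_counts)


def gini_split(counts_t1, counts_t2):
    """gini_split(T) = N1/N * gini(T1) + N2/N * gini(T2)."""
    n1, n2 = sum(counts_t1), sum(counts_t2)
    n = n1 + n2
    return n1 / n * gini(counts_t1) + n2 / n * gini(counts_t2)


# [fatal, not fatal] counts derived from the Fatal and Total columns of Table 1.
restraint_not_used = [79, 1228 - 79]
restraint_used = [81, 9023 - 81]
female = [63, 4169 - 63]
male = [97, 6082 - 97]

split_on_restraint = gini_split(restraint_not_used, restraint_used)
split_on_gender = gini_split(female, male)

print(f"gini_split on restraint use: {split_on_restraint:.5f}")
print(f"gini_split on gender:        {split_on_gender:.5f}")
```

On these counts the restraint split yields the smaller value, so a CART-style search restricted to these two attributes would prefer it for the fatal injury model.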
Table 2. Neural network performance
(Train and Test columns give classification accuracy in %.)

No Injury               Possible Injury         Non-incapacitating      Incapacitating          Fatal Injury
Neurons  Train   Test   Neurons  Train   Test   Neurons  Train   Test   Neurons  Train   Test   Neurons  Train   Test
     60  63.57  59.67        65  59.34  57.58        60  57.88  55.25        60  63.40  63.36        45  77.26  75.17
     65  63.86  60.45        70  59.56  55.15        65  57.69  54.66        65  62.23  61.32        57  74.78  70.65
     70  63.93  60.25        75  58.88  57.29        75  58.71  56.80        75  61.06  61.52        65  69.81  69.73
     75  64.38  57.43        80  58.39  56.22        80  57.78  54.13        84  63.23  58.41        75  60.19  59.62
     80  63.64  58.89        95  60.07  55.93        85  57.83  55.59        90  59.32  59.08        80  74.33  71.77
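The per-class model selection can be replayed directly on the figures of Table 2. The sketch below assumes that "highest classification accuracy" refers to the test accuracy; that reading is an assumption, although it does reproduce the hidden-neuron counts quoted in the text for all five classes.

```python
# (hidden neurons, train accuracy %, test accuracy %) transcribed from Table 2.
table2 = {
    "No Injury": [(60, 63.57, 59.67), (65, 63.86, 60.45), (70, 63.93, 60.25),
                  (75, 64.38, 57.43), (80, 63.64, 58.89)],
    "Possible Injury": [(65, 59.34, 57.58), (70, 59.56, 55.15), (75, 58.88, 57.29),
                        (80, 58.39, 56.22), (95, 60.07, 55.93)],
    "Non-incapacitating": [(60, 57.88, 55.25), (65, 57.69, 54.66), (75, 58.71, 56.80),
                           (80, 57.78, 54.13), (85, 57.83, 55.59)],
    "Incapacitating": [(60, 63.40, 63.36), (65, 62.23, 61.32), (75, 61.06, 61.52),
                       (84, 63.23, 58.41), (90, 59.32, 59.08)],
    "Fatal Injury": [(45, 77.26, 75.17), (57, 74.78, 70.65), (65, 69.81, 69.73),
                     (75, 60.19, 59.62), (80, 74.33, 71.77)],
}


def best_by_test_accuracy(rows):
    """Pick the (neurons, train, test) row with the highest test accuracy."""
    return max(rows, key=lambda row: row[2])


best = {cls: best_by_test_accuracy(rows) for cls, rows in table2.items()}
for cls, (neurons, train, test) in best.items():
    print(f"{cls}: {neurons} hidden neurons (train {train}%, test {test}%)")
```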
Table 3: Performance of SVM using radial basis function kernel (accuracy in %)

                              g=0.0001   g=0.001    g=0.5      g=1.2      g=1.5      g=2        g=0.00001  g=0.0001   g=0.001
                              c=42.8758  c=4.6594   c=0.5      c=0.5      c=2        c=10       c=100      c=100      c=100
No injury            Class 0  59.76      59.80      57.95      57.65      53.62      54.12      57.34      59.76      60.46
                     Class 1  60.14      60.14      60.82      55.63      55.73      55.53      62.88      60.14      60.14
Possible injury      Class 0  100.00     100.00     100.00     99.88      95.33      95.58      100.00     100.00     100.00
                     Class 1  0.00       0.00       0.00       0.00       3.67       3.42       0.00       0.00       0.00
Non-incapacitating   Class 0  100.00     100.00     100.00     100.00     97.43      97.49      100.00     100.00     100.00
                     Class 1  0.00       0.00       0.00       0.00       3.21       2.92       0.00       0.00       0.00
Incapacitating       Class 0  100.00     100.00     100.00     99.89      98.06      98.11      100.00     100.00     100.00
                     Class 1  0.00       0.00       0.00       0.00       2.83       2.83       0.00       0.00       0.00
Fatal Injury         Class 0  100.00     100.00     100.00     100.00     99.95      99.95      100.00     100.00     100.00
                     Class 1  0.00       0.00       0.00       0.00       3.33       3.33       0.00       0.00       0.00
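Table 3 reports accuracy separately for Class 0 and Class 1. The sketch below shows how such per-class figures can be computed and why a 100.00 / 0.00 pair is consistent with a classifier that assigns every record to Class 0. It is illustrative only: the function name and the 95/5 class split are hypothetical and are not taken from the dataset.

```python
def per_class_accuracy(y_true, y_pred):
    """Return {class: percentage of that class's records predicted correctly}."""
    totals, correct = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (1 if truth == pred else 0)
    return {c: 100.0 * correct[c] / totals[c] for c in totals}


# Hypothetical imbalanced test set: 95 records of class 0, 5 of class 1.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always answers class 0.
always_zero = [0] * len(y_true)

result = per_class_accuracy(y_true, always_zero)
overall = 100.0 * sum(t == p for t, p in zip(y_true, always_zero)) / len(y_true)
```

In this toy case the overall accuracy looks high even though no Class 1 record is recognized, which is why the per-class breakdown is the more informative view.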
Table 4. Decision tree performance

Injury Class                 Accuracy (%)
No Injury                    67.54
Possible Injury              64.40
Non-incapacitating Injury    60.37
Incapacitating Injury        71.38
Fatal Injury                 89.46
The performance for the no injury, possible injury, non-incapacitating
injury, incapacitating injury and fatal injury models was 67.54%, 64.39%,
60.37%, 71.38%, and 89.46% respectively. Empirical results including
classification matrix are illustrated in Table 4. The developed decision
trees are depicted in Figures 3-7. Each of these trees has a completely
different structure and number of nodes and leaves. Note that the
information stored in the leaves of exactly these decision trees has been
used in developing the hybrid decision tree-neural network model.

Fig. 3: No injury tree structure
Fig. 5: Non-incapacitating injury tree structure
Fig. 6: Incapacitating injury tree structure
Fig. 7: Fatal injury tree structure
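The remark that leaf information from the decision trees is used in the hybrid model can be pictured with the following sketch, in which the leaf reached by a record is encoded and appended to the record's original attributes before they would be passed to a neural network. This is only one plausible way of passing leaf information on, stated here as an assumption; the toy tree, the attribute names `speed_limit` and `belt_used`, and both function names are hypothetical.

```python
def toy_tree_leaf(record):
    """Hypothetical two-level decision tree; returns the index of the leaf reached."""
    if record["speed_limit"] <= 40:
        return 0 if record["belt_used"] else 1
    return 2 if record["belt_used"] else 3


def augment_with_leaf(record, feature_names, n_leaves=4):
    """Original attribute values followed by a one-hot encoding of the tree leaf.

    The resulting vector is what a downstream neural network would receive
    in this assumed form of the hybrid scheme.
    """
    base = [float(record[name]) for name in feature_names]
    one_hot = [0.0] * n_leaves
    one_hot[toy_tree_leaf(record)] = 1.0
    return base + one_hot


example = {"speed_limit": 55, "belt_used": 0}
network_input = augment_with_leaf(example, ["speed_limit", "belt_used"])
```

The network then sees both the raw attributes and the tree's partition of the data, so it can refine the tree's decision instead of learning that partition from scratch.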
C. Support Vector Machines

In our experiments we used SVMlight [27] and selected the polynomial and
radial basis function kernels. For an unknown reason, the polynomial
kernel was not successful, and hence we focused only on the radial basis
function (RBF) kernel. Table 3 illustrates the SVM performance for the
different parameter settings and the accuracies obtained for each class.
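To make the role of the kernel parameter concrete, the sketch below evaluates the RBF kernel K(x, y) = exp(-g * ||x - y||^2) for several of the g values listed in Table 3. It assumes that g in Table 3 is the RBF gamma (as set by SVMlight's -g option) and that c is the training error trade-off (-c); the two sample points are hypothetical.

```python
import math


def rbf_kernel(x, y, g):
    """RBF kernel exp(-g * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)


# Two hypothetical feature vectors at squared distance 4.
x = [0.0, 0.0]
y = [2.0, 0.0]

# Kernel value for a selection of the g settings used in Table 3.
kernel_values = {g: rbf_kernel(x, y, g) for g in (0.00001, 0.0001, 0.001, 0.5, 1.2, 1.5, 2.0)}
```

With very small g the kernel is close to 1 for almost any pair of points, so all records look alike to the SVM; with larger g the similarity decays quickly with distance and the decision boundary can become more local.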
Fig. 4: Possible injury tree structure

D. Hybrid DT-ANN Approach

In the case of the hybrid approach, for the no injury class the best
model had 70 hidden neurons, with training and testing performance of
83.02% and 65.12% respectively. For the possible injury class, the best
model had 98 hidden neurons, with training and testing performance of
74.93% and 63.10% respectively. For the non-incapacitating injury class,
the best model had 109 hidden neurons, with training and testing
performance of 71.88% and 62.24% respectively. For the incapacitating
injury class, the best model had 102 hidden neurons, with training and
testing performance of 77.95% and 72.63% respectively. Finally, for the
fatal injury class, the best model had 76 hidden neurons, with training
and testing performance of 91.53% and 90.00% respectively. These are the
best models out of multiple experiments varying various parameters of the
ANN and the decision tree. Empirical results are presented in Table 5 and
the final comparison between ANN, DT and DTANN is graphically illustrated
in Figure 8. For all the output classes, the hybrid DTANN outperformed
the ANN. For the non-incapacitating injury, incapacitating injury, and
fatal injury classes, the hybrid DTANN outperformed both ANN and DT.

Fig. 8. Performance comparison of the different learning paradigms

Table 5. Test performance of DTANN

Injury type                  % Accuracy
No injury                    65.12
Possible injury              63.10
Non-incapacitating injury    62.24
Incapacitating injury        72.63
Fatal injury                 90.00

5. Concluding Remarks

In this paper, we analyzed the GES automobile accident data from 1995 to
2000 and investigated the performance of neural network, decision tree,
support vector machine and hybrid decision tree-neural network based
approaches to predicting drivers' injury severity in head-on front impact
point collisions. The classification accuracy obtained in our experiments
reveals that, for the non-incapacitating injury, the incapacitating
injury, and the fatal injury classes, the hybrid approach performed
better than neural networks, decision trees and support vector machines.
For the no injury and the possible injury classes, the hybrid approach
performed better than the neural network. The no injury and the possible
injury classes could be best modeled directly by decision trees.

Past research focused mainly on distinguishing between no-injury and
injury (including fatality) classes. We extended the research to possible
injury, non-incapacitating injury, incapacitating injury, and fatal
injury classes. Our experiments showed that the model for fatal and
non-fatal injury performed better than the models for the other classes.
The ability to predict fatal and non-fatal injury is very important since
drivers' fatality has the highest cost to society, both economically and
socially.

It is well known that one of the most important factors determining the
injury level is the actual speed at which the vehicle was travelling when
the accident happened. Unfortunately, our dataset does not provide enough
information on the actual speed, since the speed was unknown for 67.68%
of the data records. Had the speed been available, it is very likely that
it would have helped to improve the performance of the models studied in
this paper.

6. References

[1] Abdel-Aty, M., and Abdelwahab, H., Analysis and Prediction of Traffic
Fatalities Resulting From Angle Collisions Including the Effect of
Vehicles' Configuration and Compatibility. Accident Analysis and
Prevention, 2003.
[2] Abdelwahab, H. T. and Abdel-Aty, M. A., Development of Artificial
Neural Network Models to Predict Driver Injury Severity in Traffic
Accidents at Signalized Intersections. Transportation Research Record
1746, Paper No. 01-2234.
[3] Bedard, M., Guyatt, G. H., Stones, M. J., & Hireds, J. P., The
Independent Contribution of Driver, Crash, and Vehicle Characteristics to
Driver Fatalities. Accident Analysis and Prevention, Vol. 34, pp.
717-727, 2002.
[4] Buzeman, D. G., Viano, D. C., & Lovsund, P., Car Occupant Safety in
Frontal Crashes: A Parameter Study of Vehicle Mass, Impact Speed, and
Inherent Vehicle Protection. Accident Analysis and Prevention, Vol. 30,
No. 6, pp. 713-722, 1998.
[5] Dia, H., & Rose, G., Development and Evaluation of Neural Network
Freeway Incident Detection Models Using Field Data. Transportation
Research C, Vol. 5, No. 5, 1997, pp. 313-331.
[6] Evanco, W. M., The Potential Impact of Rural Mayday Systems on
Vehicular Crash Fatalities. Accident Analysis and Prevention, Vol. 31,
1999, pp. 455-462.
[7] Hand, D., Mannila, H., & Smyth, P., Principles of Data Mining. The MIT
Press, 2001.
[8] Kim, K., Nitz, L., Richardson, J., & Li, L., Personal and Behavioral
Predictors of Automobile Crash and Injury Severity. Accident Analysis and
Prevention, Vol. 27, No. 4, 1995, pp. 469-481.
[9] Kweon, Y. J., & Kockelman, D. M., Overall Injury Risk to Different
Drivers: Combining Exposure, Frequency, and Severity Models. Accident
Analysis and Prevention, Vol. 35, 2003, pp. 441-450.
[10] Martin, P. G., Crandall, J. R., & Pilkey, W. D., Injury Trends of
Passenger Car Drivers In the USA. Accident Analysis and Prevention, Vol.
32, 2000, pp. 541-557.
[11] Mayhew, D. R., Ferguson, S. A., Desmond, K. J., & Simpson, G. M.,
Trends In Fatal Crashes Involving Female Drivers, 1975-1998. Accident
Analysis and Prevention, Vol. 35, 2003, pp. 407-415.
[12] Mussone, L., Ferrari, A., & Oneta, M., An analysis of urban
collisions using an artificial intelligence model. Accident Analysis and
Prevention, Vol. 31, 1999, pp. 705-718.
[13] Ossiander, E. M., & Cummings, P., Freeway speed limits and Traffic
Fatalities in Washington State. Accident Analysis and Prevention, Vol.
34, 2002, pp. 13-18.
[14] Shankar, V., Mannering, F., & Barfield, W., Statistical Analysis of
Accident Severity on Rural Freeways. Accident Analysis and Prevention,
Vol. 28, No. 3, 1996, pp. 391-401.
[15] Sohn, S. Y., & Lee, S. H., Data Fusion, Ensemble and Clustering to
Improve the Classification Accuracy for the Severity of Road Traffic
Accidents in Korea. Safety Science, Vol. 4, issue 1, February 2003, pp.
1-14.
[16] Tavris, D. R., Kuhn, E. M., & Layde, P. M., Age and Gender Patterns
In Motor Vehicle Crash Injuries: Importance of Type of Crash and Occupant
Role. Accident Analysis and Prevention, Vol. 33, 2001, pp. 167-172.
[17] Yang, W. T., Chen, H. C., & Brown, D. B., Detecting Safer Driving
Patterns By A Neural Network Approach. ANNIE '99 for the Proceedings of
Smart Engineering System Design Neural Network, Evolutionary Programming,
Complex Systems and Data Mining, Vol. 9, pp. 839-844, Nov. 1999.
[18] Zembowicz, R. and Zytkow, J. M., 1996. From Contingency Tables to
Various Forms of Knowledge in Database. Advances in Knowledge Discovery
and Data Mining, editors, Fayyad, U. M., Piatetsky-Shapiro, G., Smyth,
P., Uthurusamy, R. AAAI Press/The MIT Press, pp. 329-349.
[19] Abraham, A., Meta-Learning Evolutionary Artificial Neural Networks,
Neurocomputing Journal, Elsevier Science, Netherlands, Vol. 56c, pp.
1-38, 2004.
[20] Moller, A.F., A Scaled Conjugate Gradient Algorithm for Fast
Supervised Learning, Neural Networks, Volume (6), pp. 525-533, 1993.
[21] National Center for Statistics and Analysis
http://www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/NASS.html
[22] Roh J.W., Bessler D.A. and Gilbert R.F., Traffic fatalities,
Peltzman's model, and directed graphs, Accident Analysis & Prevention,
Volume 31, Issues 1-2, pp. 55-61, 1998.
[23] Peltzman, S., The effects of automobile safety regulation. Journal of
Political Economy 83, pp. 677-725, 1975.
[24] Ossenbruggen, P.J., Pendharkar, J. and Ivan, J., Roadway safety in
rural and small urbanized areas. Accident Analysis and Prevention, Vol.
33, No. 4, pp. 485-498, 2001.
[25] Abdalla, I.M., Robert, R., Derek, B. and McGuicagan, D.R.D., An
investigation into the relationships between area social characteristics
and road accident casualties. Accident Analysis and Prevention, Vol. 29,
No. 5, pp. 583-593, 1997.
[26] Miaou, S.P. and Harry, L., Modeling vehicle accidents and highway
geometric design relationships. Accident Analysis and Prevention, Vol.
25, No. 6, pp. 689-709, 1993.
[27] SVMlight. http://www.cs.cornell.edu/People/tj/svm_light/. Access
date: May, 2003.
[28] Vapnik, V. N., The Nature of Statistical Learning Theory. Springer,
1995.
[29] Chong M., Abraham A., Paprzycki M., Traffic Accident Data Mining
Using Machine Learning Paradigms, Fourth International Conference on
Intelligent Systems Design and Applications (ISDA'04), Hungary, ISBN
9637154302, pp. 415-420, 2004.
[30] Chong M., Abraham A., Paprzycki M., Traffic Accident Analysis Using
Decision Trees and Neural Networks, IADIS International Conference on
Applied Computing, Portugal, IADIS Press, Nuno Guimarães and Pedro Isaías
(Eds.), ISBN: 9729894736, Volume 2, pp. 39-42, 2004.
[31] Eui-Hong (Sam) Han, Shashi Shekhar, Vipin Kumar, M. Ganesh, Jaideep
Srivastava, Search Framework for Mining Classification Decision Trees,
1996. umn.edu/dept/users/kumar/dmclass.ps
[32] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector
Machines, Cambridge University Press, 2000.
[33] Abraham, Intelligent Systems: Architectures and Perspectives, Recent
Advances in Intelligent Paradigms and Applications, Abraham A., Jain L.
and Kacprzyk J. (Eds.), Studies in Fuzziness and Soft Computing, Springer
Verlag Germany, Chapter 1, pp. 1-35, 2002.