Journal of Transportation Technologies, 2017, 7, 206-219
http://www.scirp.org/journal/jtts
ISSN Online: 2160-0481
ISSN Print: 2160-0473
Road Traffic Crash Data:
An Overview on Sources, Problems, and
Collection Methods
Azad Abdulhafedh
University of Missouri-Columbia, MO, USA
How to cite this paper: Abdulhafedh, A.
(2017) Road Traffic Crash Data: An Over-
view on Sources, Problems, and Collection
Methods. Journal of Transportation Tech-
nologies, 7, 206-219.
https://doi.org/10.4236/jtts.2017.72015
Received: December 19, 2016
Accepted: April 27, 2017
Published: April 30, 2017
Copyright © 2017 by author and
Scientific Research Publishing Inc.
This work is licensed under the Creative
Commons Attribution International
License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/
Open Access
Abstract
Road traffic crash data are useful tools to support the development,
implementation, and assessment of highway safety programs that aim to reduce
road traffic crashes. Collecting road traffic crash data aims at gaining a better
understanding of road traffic operational problems, locating hazardous road
sections, identifying risk factors, developing accurate diagnosis and remedial
measures, and evaluating the effectiveness of road safety programs. Furthermore,
the data can be used by many agencies and businesses, such as: law enforcement
agencies, to identify persons at fault in road traffic crashes; insurers, seeking
facts about traffic crash claims; road safety researchers, who need access to
reliable traffic crash databases; decision makers, to develop long-term, statewide
strategic plans for traffic and highway safety; and highway safety administrators,
to help educate the public. Given the practical importance of vehicle crash data,
this paper presents an overview of the sources, trends and problems associated
with road traffic crash data.
Keywords
Road Safety, Vehicle Crash Data, Over-Dispersion, Under-Dispersion,
Under-Reporting, FARS, NASS, HSIS
*PhD in Civil Engineering.
1. Introduction
Throughout the world, cars, buses, trucks, motorcycles, pedestrians, animals,
taxis and other categories of travelers share the roadways, contributing to
economic and social development in many countries. Yet each year, many vehicles
are involved in crashes that are responsible for millions of deaths and injuries.
Globally, every year, about 1.25 million people are killed in motor vehicle crashes
and approximately 50 million more are injured. Vehicular crashes are the
world’s leading cause of death for individuals between the ages of one and twen-
ty-nine [1]. Following current trends, about two million people could be ex-
pected to be killed in motor vehicle crashes each year by 2030 [1]. Currently,
road crashes are ranked as the ninth most serious cause of death in the world,
and without new initiatives to improve road safety, fatal crashes will likely rise to
the third place by the year 2020 [1]. In developed countries, road traffic death
rates have decreased since the 1960s because of successful interventions such as
seat belt safety laws, enforcement of speed limits, warnings about the dangers of
mixing alcohol consumption with driving, and safer design and use of roads and
vehicles. For example, road traffic fatalities have declined by about 25.0 percent
in the United States from 2005 to 2014 and the number of people injured has
decreased 13.0 percent from 2005 to 2014 [2]. In Canada, the number of road
traffic fatalities has declined by about 62.0 percent from 1990 to 2014, and the
number of injuries has declined by about 68.0 percent during the same period
[3]. However, traffic fatalities increased in developing countries from 1990
to 2014 (e.g., by 44.0 percent in Malaysia and about 243.0 percent in China) [1].
Developing countries bear a large share of the burden, accounting for 85.0 percent
of annual deaths and 90.0 percent of the disability-adjusted life years. More than
one-half of all road traffic deaths globally involve people ages 15 to 44, during
their most productive earning years. Moreover, the disability burden for this age
group accounts for about 60.0 percent of all disability-adjusted life years. The
costs and consequences of these losses are significant. Three-quarters of all poor
families who lost a member in a traffic crash reported a decrease in their stan-
dard of living, and about 61.0 percent reported having to borrow money to cover
expenses following their loss [4]. The World Bank estimates that road traffic in-
juries cost 2.0 percent to 3.0 percent of the Gross National Product of developing
countries, or twice the total amount of development aid received worldwide by
developing countries [5]. Crash-related fatalities and injuries can be prevented
or at least minimized through the joint involvement of the multiple sectors (e.g.,
transportation agencies, police, health departments, education institutions) that
oversee road safety, vehicles, and the drivers themselves. Effective interventions
include design of safer infrastructure and incorporation of road safety features
into land-use and transport planning; improvement of vehicle safety features;
improvement of post-crash care for victims of road crashes; and improvement of
driver behavior, for example by setting and enforcing laws relating to key risk
factors and raising public awareness [6]. In addition, vehicular crash data can assist with
the development of generalized theories concerning road safety. A range of basic
laws have been put forth to help explain the relationship between the occurrence
of road crashes and potential risk factors, such as: the universal law of learning,
which implies that the crash rate tends to decline as the number of kilometers
travelled increases; the law of rare events, which states that rare events, such as
environmental hazards, would have more effect on crash rates than regular
events; and the law of complexity, which implies that the more complex the traffic
situation road users encounter, the higher the probability of crash occurrence
[7]. Although transportation agencies often try to identify the most dangerous
road sites, and put great efforts into preventive measures, such as illumination
and policy enforcement, the annual number of traffic crashes has not yet signifi-
cantly decreased. For instance, 35,092 traffic fatalities were recorded in the US
during 2015, an increase of 7.2% as compared to the previous year [8]. The fatality
rate per 100 million vehicle miles traveled increased 3.7% from 2014 to 2015.
Thirty-five States had more motor vehicle fatalities in 2015 than in 2014. Given
this trend, it is imperative to gain a better understanding of crash data sources,
trends and problems.
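The exposure-based fatality rate quoted above follows from a simple ratio. The sketch below is only an illustration of that arithmetic, not the procedure used in [8]: the function names are hypothetical, and the VMT value in the usage line is a made-up placeholder, because the text reports the 2015 fatality count but not the underlying vehicle miles traveled.

```python
def fatality_rate_per_100m_vmt(fatalities: int, vmt_miles: float) -> float:
    """Return fatalities per 100 million vehicle miles traveled."""
    if vmt_miles <= 0:
        raise ValueError("vmt_miles must be positive")
    return fatalities / vmt_miles * 100_000_000


def percent_change(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old * 100.0


# 35,092 is the 2015 fatality count cited in the text.
# The VMT figure (3.0e12 miles) is an assumed placeholder, not a reported value.
example_rate = fatality_rate_per_100m_vmt(35_092, 3.0e12)
print(f"Illustrative rate: {example_rate:.2f} per 100 million VMT")
```

Comparing two years then amounts to calling percent_change on the two yearly rates.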
2. The Importance of Collecting Vehicular Crash Data
Vehicular crash data are used to respond to requests from Congress, federal
agencies, state and local governments, universities and research organizations,
highway safety communities, the media, and private citizens. Accurate data are
required to support the development, implementation, and assessment of highway
safety programs aimed at reducing crash tolls. An example of the practical
importance of collecting and maintaining vehicular crash data is the recent
emergence of crash data retrieval tools, commonly referred to as vehicle
black boxes. Based upon a rule imposed by the National Highway Traffic Safety
Administration (NHTSA), most vehicles manufactured and sold in North
America after 2012 are equipped with Event Data Recorders (EDRs) that collect,
store, and retrieve vehicle crash event data. The EDRs can help law enforcement
investigating vehicle crashes to recover crucial crash data parameters from a ve-
hicle that has been involved in a crash, including pre-crash data that will help
better understand important factors that led to the crash occurrence [9]. Anoth-
er practical example is the use of the Crash Outcome Data Evaluation System
(CODES), which is a program managed by NHTSA, to link crash records to in-
jury outcome records collected at the scene by emergency medical services.
CODES data have been used to address traffic safety issues in different ways,
such as examining whether the increased crash rates for teen drivers have
resulted in increased injuries to their passengers, and exploring the role of seat
belt use in preventing injuries and fatalities. CODES data have also been used to
inform and educate traffic safety decision-makers at federal, state, and local levels
in many circumstances, for instance, providing federal and state legislators with
CODES reports on the importance of seat belt use in preventing injuries and fa-
talities; delivering data to the state highway administrations to develop long-
term, statewide strategic plans for traffic and highway safety; and publishing
CODES fact sheets that can help educate the public [10].
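Linking crash records to injury outcome records, as described above for CODES, can be pictured as a join on fields that both record sets share. The text does not describe how CODES performs its linkage, so the sketch below is only a deliberately simple exact-match illustration; every field name (crash_id, ems_id, date, county, age, sex) is a hypothetical assumption.

```python
from collections import defaultdict


def link_records(crashes, outcomes, keys=("date", "county", "age", "sex")):
    """Pair crash records with outcome records that agree on all key fields.

    Only unambiguous matches (exactly one candidate) are kept.
    """
    index = defaultdict(list)
    for outcome in outcomes:
        index[tuple(outcome[k] for k in keys)].append(outcome)

    links = []
    for crash in crashes:
        candidates = index.get(tuple(crash[k] for k in keys), [])
        if len(candidates) == 1:
            links.append((crash["crash_id"], candidates[0]["ems_id"]))
    return links


# Hypothetical records, invented for illustration only.
crashes = [
    {"crash_id": "C1", "date": "2015-03-01", "county": "Boone", "age": 17, "sex": "F"},
    {"crash_id": "C2", "date": "2015-03-01", "county": "Boone", "age": 40, "sex": "M"},
]
outcomes = [
    {"ems_id": "E9", "date": "2015-03-01", "county": "Boone", "age": 17, "sex": "F"},
    {"ems_id": "E7", "date": "2015-03-02", "county": "Boone", "age": 40, "sex": "M"},
]
linked = link_records(crashes, outcomes)
```

Exact matching discards records with recording errors in any key field, which is one reason a production linkage would likely need a more tolerant approach.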
3. Road Traffic Data Collection Methods
Most studies of traffic related problems begin with the collection of data.
Generally, traffic data collection methods can be classified into two categories:
intrusive and non-intrusive methods. Intrusive methods typically involve a data
recorder and a sensor placed on or in the road [11]. The most common intrusive
devices are:
• Pneumatic road tubes: rubber tubes placed across the road lanes to detect ve-
hicles from pressure changes that are produced when a vehicle tire passes
over the tube. The pulse of air that is created is recorded and processed by a
counter located on the side of the road. The main drawback of this technolo-
gy is that it has limited lane coverage and its efficiency is subject to weather,
temperature and traffic conditions.
• Piezoelectric sensors: the sensors are placed in a groove along the roadway
surface of the lane(s) monitored. The principle is to convert mechanical energy
into electrical energy. The amplitude and frequency of the signal are directly
proportional to the degree of deformation.
• Magnetic loops: this is the most conventional technology used to collect traffic
data. The loops are embedded in roadways in a square formation that generates
a magnetic field. The information is then transmitted to a counting
device placed on the side of the road. Loops generally have a short life
expectancy because they can be damaged by heavy vehicles, but they are not
affected by bad weather conditions.
Non-intrusive techniques are based on remote observations ranging from
human observation to those based on new technologies [12]:
• Manual counts: Trained observers gather traffic data such as vehicle occu-
pancy rate, pedestrians and vehicle classifications that cannot be efficiently
obtained through automated counts. Equipment needs are rather basic with
the observers usually requiring only a tally sheet, mechanical and/or elec-
tronic counting devices.
• Passive and active infra-red sensors: the presence, speed and type of vehicles
can be detected based on the infrared energy radiating from the detection
area. The main drawbacks of this method are the sensor’s performance dur-
ing bad weather, and limited lane coverage.
• Passive magnetic sensors: magnetic sensors can be fixed under or on top of
the roadbed. The sensors record the number of vehicles, their type and speed.
However, in some operating conditions, the sensors have difficulty differen-
tiating between closely spaced vehicles.
• Microwave radar sensors: these sensors can detect moving vehicles and
record vehicle counts, speed and vehicle classification and are not usually
compromised by weather conditions.
• Ultrasonic and passive acoustic sensors: ultrasonic devices emit sound waves and
detect vehicles by measuring the time for the signal to return to the device,
whereas passive acoustic devices listen for the sound produced by passing vehicles.
The ultrasonic sensors can be placed directly over the lane or alongside the
road to collect vehicle counts, speed and classification data. However, the
collection ability of these sensors can be adversely affected by temperature or
bad weather.
• Video image detection: video cameras can be used to record vehicle numbers,
type and speed by means of different video techniques e.g. trip line and
tracking. Video detection systems can be sensitive to weather conditions.
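For the ultrasonic sensors in the list above, detection rests on a time-of-flight calculation: the one-way distance to the reflecting surface is half the round-trip time multiplied by the speed of sound. The sketch below illustrates this for an overhead sensor. The constant, the function names, and the height threshold are assumptions chosen for illustration, not values from any sensor specification.

```python
# Approximate speed of sound in dry air at about 20 degrees C (assumed constant).
SPEED_OF_SOUND_M_S = 343.0


def echo_distance_m(round_trip_s: float, speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """One-way distance to the reflecting surface from the round-trip echo time."""
    if round_trip_s < 0:
        raise ValueError("round_trip_s must be non-negative")
    return speed_m_s * round_trip_s / 2.0


def vehicle_present(round_trip_s: float, mount_height_m: float,
                    min_vehicle_height_m: float = 0.5) -> bool:
    """True when the echo comes from something well above the road surface.

    mount_height_m is the sensor height above the pavement (hypothetical setup);
    min_vehicle_height_m is an assumed threshold to ignore noise near the surface.
    """
    reflector_height = mount_height_m - echo_distance_m(round_trip_s)
    return reflector_height >= min_vehicle_height_m
```

Because the speed of sound varies with air temperature, a fixed constant like the one above is one way the temperature sensitivity noted in the list can arise.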
Floating Car Data (FCD) can be used to collect traffic data by locating
vehicles via mobile phones or GPS over the entire road network. Data such as car
location, speed and direction of travel can then be sent anonymously to a central
processing center. After being collected and extracted, useful information can be
redistributed to the drivers on the road [13].
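The central processing step for floating car data can be sketched as an aggregation of anonymous probe reports into a mean speed per road segment. This is a minimal illustration under stated assumptions: the report layout (segment identifier, speed in km/h) and the minimum-report threshold are hypothetical, and no vehicle identifier is stored.

```python
from collections import defaultdict
from statistics import mean


def segment_mean_speeds(reports, min_reports=2):
    """Average the reported speeds per road segment.

    reports: iterable of (segment_id, speed_kmh) tuples with no vehicle identity.
    Segments with fewer than min_reports observations are omitted.
    """
    by_segment = defaultdict(list)
    for segment_id, speed_kmh in reports:
        by_segment[segment_id].append(speed_kmh)
    return {
        segment_id: mean(speeds)
        for segment_id, speeds in by_segment.items()
        if len(speeds) >= min_reports
    }


# Hypothetical probe reports, invented for illustration only.
probe_reports = [("A", 50.0), ("A", 70.0), ("B", 30.0)]
speeds = segment_mean_speeds(probe_reports)
```

Dropping segments with too few reports is a design choice that trades coverage for more stable estimates.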
There are two important traffic measures that are widely used in modeling
traffic data, namely: the average annual daily traffic (AADT); and the vehicle
miles travelled (VMT). These two traffic variables, usually derived from
fixed-sensor measurements, play a key role in traffic crash analysis and policy
decisions [14]. AADT is the average (calculated over a year) number of vehicles
passing a point along a particular counting section each day. Thus, AADT
represents the vehicle flow over a road section (e.g. highway segment) on an av-
erage day of the year. Methods for calculating AADT are generally based on data
from two types of counts: permanent automatic traffic counts and short-period
traffic counts. A combination of these two measurements is generally used to
obtain an AADT estimate over a larger road network. In the US, the factoring
method is a common methodology used to estimate AADT. This method has
been adopted by many transportation agencies as a standard protocol consistent
with federal guidelines. The 2013 Traffic Monitoring Guide (TMG) serves as a
reference document that provides general guidance on the development of traffic
monitoring programs for highway agencies. In particular, the TMG provides
guidance on the collection of traffic volume, vehicle classification, and weight
information [15]. VMT refers to the distance travelled by vehicles. It is often
used as an indicator of traffic demand and for analyzing mobility patterns and
travel trends. It plays a key role in decisions on matters such as
air quality compliance, roadway pavement maintenance, and crash analysis.
There are four methods commonly used to calculate VMT [16]:
• Odometer readings (vehicle-based method): at regular vehicle inspections, the
average distance travelled by the vehicles is determined and then multiplied
by the number of road vehicles.
• Traffic counts (road-based method): for one considered link, the VMT is
calculated by multiplying the AADT by the length of the link. VMT for a
roadway can then be obtained by summing the VMT of each segment.
• Driver survey: questionnaires are sent to households with one or more cars,
soliciting information such as the number of miles driven by each vehicle during
the whole year and unit consumption.
• Fuel consumption: the volume of road traffic is estimated from information
about fuel supply and fuel consumption, as derived from estimates of miles
driven per gallon of fuel for typical types of vehicles.
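The factoring idea for AADT and the road-based VMT calculation described above can be combined in a short sketch. It assumes that a short-period daily count is scaled by adjustment factors derived from permanent counters; the particular factors shown (seasonal, day-of-week, axle correction), their values, and all function names are illustrative assumptions rather than the TMG procedure itself.

```python
def estimate_aadt(short_count_daily_volume: float,
                  seasonal_factor: float,
                  day_of_week_factor: float,
                  axle_factor: float = 1.0) -> float:
    """Scale a short-period daily count to an AADT estimate (assumed factor set)."""
    return (short_count_daily_volume * seasonal_factor
            * day_of_week_factor * axle_factor)


def link_daily_vmt(aadt: float, length_miles: float) -> float:
    """Daily VMT on one link: AADT multiplied by the link length."""
    return aadt * length_miles


def roadway_daily_vmt(links) -> float:
    """Sum the daily VMT over (aadt, length_miles) pairs for a roadway."""
    return sum(link_daily_vmt(aadt, length) for aadt, length in links)


# Hypothetical two-segment roadway, invented for illustration only.
roadway = [(10_000, 2.0), (5_000, 1.5)]
daily_vmt = roadway_daily_vmt(roadway)
annual_vmt = daily_vmt * 365  # AADT is a daily average, so scale by days per year
```

Because AADT is a daily average, the product with link length is a daily figure, and multiplying by 365 gives an annual total.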
4. Sources of Vehicular Crash Data
In the U.S., a variety of efforts have been made to collect, maintain and/or
distribute information on vehicular crashes. Some of the crash data sources that
are publicly available are listed below:
4.1. Fatality Analysis Reporting System (FARS)
FARS is an online database of fatal motor vehicle crashes that documents all
fatalities that have occurred within the 50 States since 1975. To qualify for FARS,
a crash must involve a motor vehicle traveling on a public trafficway and must
result in the death of a motorist or a non-motorist within 30 days of the crash.
FARS is administered by the National Center for Statistics and Analysis (NCSA)
within the National Highway Traffic Safety Administration (NHTSA). FARS
data are collected from each State’s government by trained state employees, who
are responsible for gathering, and transmitting their state’s data to NCSA in a
standard format. After the data file is created, quality checks are performed on
the data, and the electronic data are made available online to the public in
Statistical Analysis System (SAS) data files as well as Database Files (DBF). The main
SAS data files include: the Accident file, which contains information about crash
characteristics and environmental conditions at the time of the crash; the
Vehicle file, which contains information describing the in-transport motor vehicles
and the drivers of in-transport motor vehicles who are involved in the crash; the
Person file, which contains information describing all persons involved in the
crash including motorists and non-motorists (e.g., pedestrians); the Damage file,
which contains information about all areas on the vehicle that were damaged in
the crash; the Drimpair file, which contains information about physical impair-
ments of drivers of motor vehicles; the Factor file, which contains information
about vehicle circumstances that may have contributed to the crash; the Violatn
file, which contains information about violations that were charged to drivers;
and the Vindecode file, which contains vehicle descriptors based on the vehicle’s
VIN. The temporal coverage of FARS data includes variables such as the
time of the crash, the date, the month, and the year. The spatial coverage of
FARS data includes the latitude and longitude coordinates of each crash location.
The FARS data are generally complete, reliable, and publicly available online
[17]. However, one weakness of FARS is that, due to the system's complexity, data
cannot be downloaded for multiple years at a time, and when data are downloaded
from the FARS website, the user can obtain data for only one variable at a time.
In addition, because FARS covers fatal crashes only, it does not include
injury-only or property-damage-only crashes.
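The FARS inclusion criteria stated at the start of this subsection can be written as a simple predicate. This is a simplified sketch that works at day resolution; the function and parameter names are hypothetical and are not FARS variable names.

```python
from datetime import date
from typing import Optional


def qualifies_for_fars(crash_date: date,
                       death_date: Optional[date],
                       on_public_trafficway: bool,
                       involves_moving_motor_vehicle: bool) -> bool:
    """Apply the inclusion rule described in the text, at day resolution.

    A crash qualifies when a motor vehicle was traveling on a public trafficway
    and a motorist or non-motorist died within 30 days of the crash.
    """
    if death_date is None:
        return False
    if not (on_public_trafficway and involves_moving_motor_vehicle):
        return False
    days_to_death = (death_date - crash_date).days
    return 0 <= days_to_death <= 30


# Hypothetical dates, invented for illustration only.
within_window = qualifies_for_fars(date(2015, 1, 1), date(2015, 1, 31), True, True)
outside_window = qualifies_for_fars(date(2015, 1, 1), date(2015, 2, 1), True, True)
```

A death on day 31 or later, or a crash on private property, would fall outside the database under this rule.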
4.2. The NASS-GES
The National Automotive Sampling System (NASS)-General Estimates System
(GES) obtains its data from a representative crash sample selected from more
than five million police-reported crashes annually in the US. These crashes in-
clude those that result in a fatality or injury and those involving major property
damage as well. The data are obtained by NASS-GES data collectors in 60 geo-
graphic sites across the United States. These data collectors make visits to ap-
proximately 400 police agencies within the 60 sites, where they randomly sample
211
A. Abdulhafedh
212
about 50,000 crashes per year. NASS-GES data are made available to the public
in Statistical Analysis System (SAS) data files as well as Database Files (DBF).
The main SAS data files of NASS-GES are similar to the FARS files described
above. The temporal coverage of the NASS-GES data includes variables such as
time of the crash, the date, the month, and the year. The spatial coverage only
includes the land use of the crash location, without providing the latitude and
longitude of the crash location or the x, y coordinates. One weakness of NASS-GES
is that each sampled crash carries a weight that is used to produce overall
national estimates. These estimates are based on a probability sample of crashes
across the country, so they may differ from the true values and cannot provide
accurate state-level estimates, which decreases the reliability of the data.
Another weakness is that the NASS-GES data are obtained either directly from
the police accident report (PAR) or by interpreting the information provided in
the PAR through reviewing the crash diagram, or combinations of data elements
on the PAR. Because of this interpretation, an important portion of data can be
missing in the system [18].
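The weighting described above can be illustrated with a short sketch: each sampled crash stands for a number of crashes in the population, and a national estimate is the sum of the weights of the sampled crashes that meet a condition. The record layout, the weight values, and the function name are hypothetical assumptions used only for illustration.

```python
def weighted_estimate(sample, predicate=lambda crash: True) -> float:
    """Estimate a national total as the sum of weights of matching sampled crashes."""
    return sum(crash["weight"] for crash in sample if predicate(crash))


# Hypothetical sampled crashes with made-up weights.
sample = [
    {"weight": 120.0, "severity": "injury"},
    {"weight": 80.0, "severity": "property_damage"},
    {"weight": 100.0, "severity": "injury"},
]

total_crashes = weighted_estimate(sample)
injury_crashes = weighted_estimate(sample, lambda c: c["severity"] == "injury")
```

Because the weights are designed for the national sample, restricting the same sum to one state's records would not be expected to give a reliable state-level total, which is the limitation noted above.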
4.3. The NASS-CDS
The National Automotive Sampling System (NASS)-Crashworthiness Data System
(CDS) obtains its data from 24 geographic sites in the US. These data are
weighted to represent all police-reported motor vehicle crashes occurring in the
US during the year that involve light vehicles, such as passenger cars, SUVs, and
vans. The NASS-CDS files are available as a Statistical Analysis System (SAS)
dataset, and are similar to the FARS files. The NASS-CDS system provides
temporal coverage of data through variables such as time of the crash, the date,
the month, and the year. There is no spatial coverage within the NASS-CDS data, as
they provide neither the latitude and longitude of the crash location nor the x, y
coordinates. One weakness of the NASS-CDS data is that the data from these
crashes are weighted to produce national estimates, and cannot give
state-level estimates, which decreases the reliability of the data [19].
4.4. The State Data System (SDS)
The State Data System (SDS) is maintained by NHTSA's National Center for
Statistics and Analysis (NCSA), and only thirty-two states participate in
the system, including the state of Missouri. While FARS only has fatal
crash data, SDS provides data on injury and property-damage-only crashes as
well. In contrast to the data in NASS-GES, the SDS consists of census data
taken directly from police accident reports. The law enforcement agencies within
a state are the primary source of information on crashes occurring within that state.
All states have requirements for documenting fatal, injury or property damage
crashes (with damage above a certain dollar threshold). Each participating state
has its own reporting system. For instance, in the state of Missouri, the Missouri
Statewide Traffic Accident Records System (STARS) is managed by the Missouri
State Highway Patrol (MSHP), and all Missouri law enforcement agencies are
required by law to submit a Missouri Uniform Traffic Crash Report to STARS
whenever a traffic crash involves a death, a personal injury, or property
damage. STARS comprises many record files, such as the Crash and Personal
Severity file, which covers fatal, personal injury, and property damage crashes;
the Crash Circumstances file, which includes motorcycle crashes by year; the
Speed Involved Traffic Crash file; the Alcohol Involved Traffic Crash file; the
Young Driver Involved Traffic Crash file; and the Mature Driver Involved Traffic
Crash file. All files are provided in Excel and PDF formats, and are complete,
reliable, and available online to the public (MSHP 2016). The temporal coverage
of the SDS data includes variables such as time of the crash, the date, the month,
and the year. The spatial coverage includes the x, y coordinates for only some
crash locations. One weakness of the SDS data is that they do not provide the
comprehensive list of risk variables and details that exists in the FARS and
NASS-GES systems [20].
4.5. The Highway Safety Information System (HSIS)
The Highway Safety Information System (HSIS) is a highway data system
funded by the U.S. Federal Highway Administration (FHWA), with data voluntarily
provided to HSIS by the participating states, which are California, Washington,
Minnesota, Illinois, Ohio, Maine, and North Carolina. HSIS began operation
in 1987, and the participating states were selected based on the
availability, quantity, and quality of their data. HSIS supports the FHWA safety
research program, and can be accessed online by researchers, universities, and
safety professionals. The HSIS files are available in SAS format, and the main
files are four basic files, namely the Accident file, the Vehicle file, the
Occupant file, and the Roadway file. The temporal coverage of the HSIS data
includes variables such as time of the crash, date, month, and the year. The spatial
coverage only includes the section length and the milepost of the crash location,
without providing the latitude and longitude of the crash location or the x, y
coordinates. The HSIS data are generally complete with very few missing data,
reliable, and publicly available. One weakness of the HSIS data is that they do not
cover all states within the US; in addition, the main files must be merged in
order to get the required information [21].
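The merge mentioned above can be sketched as attaching roadway attributes to each crash by matching the route and checking whether the crash milepost falls inside a roadway section. This is a minimal illustration under assumptions: the field names (route, milepost, begin_mp, length, aadt) are hypothetical and do not correspond to actual HSIS variable names, which the text does not list.

```python
def attach_roadway(accidents, segments):
    """Attach roadway-section attributes to each accident by route and milepost.

    A section covers the half-open interval [begin_mp, begin_mp + length).
    Accidents with no matching section are returned unchanged.
    """
    merged = []
    for accident in accidents:
        for segment in segments:
            end_mp = segment["begin_mp"] + segment["length"]
            same_route = segment["route"] == accident["route"]
            if same_route and segment["begin_mp"] <= accident["milepost"] < end_mp:
                extra = {k: v for k, v in segment.items() if k != "route"}
                merged.append({**accident, **extra})
                break
        else:
            merged.append(dict(accident))
    return merged


# Hypothetical records, invented for illustration only.
segments = [
    {"route": "I-5", "begin_mp": 0.0, "length": 2.5, "aadt": 40_000},
    {"route": "I-5", "begin_mp": 2.5, "length": 1.5, "aadt": 38_000},
]
accidents = [
    {"case": 1, "route": "I-5", "milepost": 3.0},
    {"case": 2, "route": "SR-9", "milepost": 1.0},
]
merged_records = attach_roadway(accidents, segments)
```

The nested loop is adequate for a sketch; with large files, sorting the sections by route and milepost would likely be preferable.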
4.6. Data.Gov
Data.gov is an open online database of the US federal government that includes
metadata from all states and local governments describing their open data resources.
Data.gov began operation in 2009, and is managed and hosted by the U.S. General
Services Administration, Office of Citizen Services and Innovative Technologies.
It follows the Project Open Data schema, which includes fields such as
title, description, tags, publisher, etc. for every data set displayed on the website.
Different data topics are available, such as Agriculture, Health, Business, Cli-
mate, Energy, Finance, and Science. The transportation statistics series consists
of analyzed statistical information on motor fuel, vehicle crashes, motor vehicle