Road Crash Prediction Models: Different Statistical Modeling Approaches
Abstract
Keywords
1. Introduction
2. The Importance of Traffic Accidents Prediction Models
3. Factors Affecting Road Traffic Accidents
4. The Costs of Road Traffic Accidents
5. Literature Review
6. A Review on the Statistical Approaches of Road Crash Prediction Models
6.1. Multiple Linear Regression
6.2. Poisson Regression
6.3. Negative Binomial Regression Model (NB)
6.4. Poisson-Lognormal Regression Model
6.5. Zero Inflated Poisson and Negative Binomial Regression Models
6.6. Conway-Maxwell Poisson Regression Models
6.7. Random-Parameter Models
6.8. Artificial Neural Networks and Fuzzy Logic models
6.9. Logit and Probit Models
7. Conclusion
References
Journal of Transportation Technologies, 2017, 7, 190-205
http://www.scirp.org/journal/jtts
ISSN Online: 2160-0481
ISSN Print: 2160-0473

Road Crash Prediction Models: Different Statistical Modeling Approaches

Azad Abdulhafedh*
University of Missouri-Columbia, MO, USA
*PhD in Civil Engineering.

How to cite this paper: Abdulhafedh, A. (2017) Road Crash Prediction Models: Different Statistical Modeling Approaches. Journal of Transportation Technologies, 7, 190-205. https://doi.org/10.4236/jtts.2017.72014

Received: December 20, 2016
Accepted: April 27, 2017
Published: April 30, 2017

Copyright © 2017 by author and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/
Open Access

Abstract

Road crash prediction models are very useful tools in highway safety, given their potential for determining both the frequency of crash occurrence and the degree of crash severity. Crash frequency models predict the number of crashes that would occur on a specific road segment or intersection in a given time period, while crash severity models generally explore the relationship between crash injury severity and contributing factors such as driver behavior, vehicle characteristics, roadway geometry, and road-environment conditions. Effective interventions to reduce the crash toll include the design of safer infrastructure and the incorporation of road safety features into land-use and transportation planning; improvement of vehicle safety features; improvement of post-crash care for victims of road crashes; and improvement of driver behavior, such as setting and enforcing laws relating to key risk factors, and raising public awareness. Despite the great efforts that transportation agencies put into preventive measures, the annual number of traffic crashes has not yet significantly decreased. For instance, 35,092 traffic fatalities were recorded in the US in 2015, an increase of 7.2% compared to the previous year. Given this trend, this paper presents an overview of road crash prediction models used by transportation agencies and researchers to gain a better understanding of the techniques used in predicting road accidents and the risk factors that contribute to crash occurrence.

Keywords

Crash Prediction Models, Poisson, Negative Binomial, Zero-Inflated, Logit and Probit, Neural Networks

1. Introduction

Road traffic accidents are the world's leading cause of death for individuals between the ages of one and twenty-nine [1]. Throughout the world, cars, buses, trucks, motorcycles, pedestrians, animals, taxis, and other categories of travelers share the roadways, contributing to economic and social development in many countries. Yet each year, many vehicles are involved in crashes that are responsible for millions of deaths and injuries. Globally, every year, about 1.25 million people are killed in motor vehicle crashes and approximately 50 million more are injured. Following current trends, about two million people could be expected to be killed in motor vehicle crashes each year by 2030 [1]. Currently, road crashes are ranked as the ninth most serious cause of death in the world, and without new initiatives to improve road safety, fatal crashes will likely rise to third place by the year 2020 [1].

In developed countries, road traffic death rates have decreased since the 1960s because of successful interventions such as seat belt safety laws, enforcement of speed limits, warnings about the dangers of mixing alcohol consumption with driving, and safer design and use of roads and vehicles. For example, from 2005 to 2014 road traffic fatalities in the United States declined by about 25.0 percent and the number of people injured decreased by 13.0 percent [2]. In Canada, the number of road traffic fatalities declined by about 62.0 percent from 1990 to 2014, and the number of injuries declined by about 68.0 percent during the same period [3]. However, traffic fatalities increased in developing countries from 1990 to 2014 (e.g., by 44.0 percent in Malaysia and about 243.0 percent in China) [1]. Developing countries bear a large share of the burden, accounting for 85.0 percent of annual deaths and 90.0 percent of the disability-adjusted life years. More than one-half of all road traffic deaths globally involve people ages 15 to 44, during their most productive earning years.
Moreover, the disability burden for this age group accounts for about 60.0 percent of all disability-adjusted life years. The costs and consequences of these losses are significant. Three-quarters of all poor families who lost a member in a traffic crash reported a decrease in their standard of living, and about 61.0 percent reported having to borrow money to cover expenses following their loss [4]. The World Bank estimates that road traffic injuries cost 2.0 percent to 3.0 percent of the Gross National Product of developing countries, or twice the total amount of development aid received worldwide by developing countries [5].

Although transportation agencies often try to identify the most hazardous road sites and put great efforts into preventive measures, such as illumination and policy enforcement, the annual number of traffic crashes has not yet significantly decreased. For instance, 35,092 traffic fatalities were recorded in the US during 2015, an increase of 7.2% compared to the previous year [6]. The fatality rate per 100 million vehicle miles traveled (VMT) increased 3.7% from 2014 to 2015. Thirty-five States had more motor vehicle fatalities in 2015 than in 2014. Every month except November saw increases in fatalities from 2014 to 2015, and the highest increases occurred in July and September [6]. Given this trend, it is imperative to gain a better understanding of the risk factors that may be associated with traffic crashes. This paper aims to present an overview of road crash prediction models used by transportation agencies and researchers, to help in understanding the techniques used in predicting road accidents and the risk factors that contribute to crash occurrence.

2. The Importance of Traffic Accidents Prediction Models

Traffic accidents prediction models are very useful tools in highway safety, given their potential for determining both the frequency of accident occurrence and the contributing factors that could then be addressed by transportation policies. Vehicular crash data can be used to model both the frequency of crash occurrence and the degree of crash severity. Crash frequency refers to the prediction of the number of crashes that would occur on a specific road segment or intersection in a time period [7]. Crash severity methods generally explore the relationship between crash injury severity categories and contributing factors such as driver behavior, vehicle characteristics, roadway geometry, and road-environment conditions.

Traffic accident-related fatalities and injuries can be prevented, or at least minimized, by joint involvement from the multiple sectors (e.g., transportation agencies, police, health departments, education institutions) that oversee road safety, vehicles, and the drivers themselves. Effective interventions include the design of safer infrastructure and the incorporation of road safety features into land-use and transport planning; improvement of vehicle safety features; improvement of post-crash care for victims of road crashes; and improvement of driver behavior, such as setting and enforcing laws relating to key risk factors, and raising public awareness [8]. Transportation agencies and research institutions often seek to identify the most dangerous road sites, which requires modeling road crash data to determine both crash frequency and the degree of crash severity. In addition, traffic accidents prediction models can also assist with the development of generalized theories concerning road safety.
A range of basic laws have been put forth to help explain the relationship between the occurrence of road crashes and potential risk factors, such as: the universal law of learning, which implies that the crash rate tends to decline as the number of kilometers travelled increases; the law of rare events, which states that rare events, such as environmental hazards, would have more effect on crash rates than regular events; and the law of complexity, which implies that the more complex the traffic situation road users encounter, the higher the probability of crash occurrence [9].

3. Factors Affecting Road Traffic Accidents

A traffic accident may have many contributing factors, such as those related to driver behavior, road geometry, traffic volumes, vehicle, and environment. The influence of such variables on crash occurrence can vary significantly from case to case, but in general, both behavioral factors related to the driver's errors and non-behavioral factors related to road geometry, traffic flow conditions, vehicle, and environment are thought to significantly affect traffic crashes [10]. Research has revealed that there are generally six major groups of risk factors affecting traffic crash occurrence [11] [12] [13] [14] [15]:

1) Driver behavior: alcohol and drug use, reckless operation of the vehicle, failure to properly use occupant protection devices, the use of cell phones or texting, and fatigue.
2) Vehicle factors: vehicle type, and the engineering and safety design standards for vehicle performance. For example, the design of windshield glass and the location and durability of gas tanks can increase safety. Passenger protection systems in vehicles (e.g., air bags, safety belts), if used, can eliminate injuries or reduce their severity.
3) Roadway characteristics: road geometries and roadside conditions, such as well-designed curves and grades, wide lanes, adequate sight distance, clearly visible striping, flared guardrails, good quality shoulders, roadsides free of obstacles, well-located crash attenuation devices, and well-planned use of traffic signals.
4) Traffic volumes: average annual daily traffic (AADT) or the vehicle miles travelled (VMT). AADT is the average number of vehicles passing a point along a particular road section each day. Thus, AADT represents the vehicle flow over a road section on an average day of the year. VMT refers to the distance travelled by vehicles on roads. It is often used as an indicator of traffic demand and is commonly applied to evaluate mobility patterns and travel trends.
5) Environmental factors: weather conditions and light conditions.
6) Time factors: the season of the year, the month of the year, weekdays, and the hour of crash occurrence.

4. The Costs of Road Traffic Accidents

The highest cost of traffic crashes is the loss of human lives; however, society also bears the consequences of many other costs associated with motor vehicle crashes. Highway crashes currently cost the USA about $1,078.0 billion a year, approximately 5.0 percent higher than in 2000. Total costs include both economic costs and societal harm [16].
In the year 2010, 3.9 million people were injured and 32,999 killed in 13.6 million motor vehicle crashes in the US [2]. The economic costs of these crashes totaled $242.0 billion, including lost productivity, medical costs, legal and court costs, emergency service costs, insurance administration costs, congestion costs, property damage, and workplace losses. The $242.0 billion cost of motor vehicle crashes represents the equivalent of nearly $784.0 for each person living in the United States, and 1.6 percent of the $14.96 trillion U.S. Gross Domestic Product for 2010 [16]. When quality of life valuation is considered, the total value of societal harm from motor vehicle crashes in 2010 was $836.0 billion, roughly three and a half times the value measured by economic impacts alone. Lost market and household productivity accounted for $77.0 billion of the total $242.0 billion economic costs, while property damage accounted for $76.0 billion. Medical expenses totaled $23.0 billion. Congestion caused by crashes, including travel delay, excess fuel consumption, greenhouse gases, and criteria pollutants, accounted for $28.0 billion. Each fatality resulted in an average discounted lifetime cost of $1.4 million. Each critically injured survivor cost an average of $1.0 million [16].

5. Literature Review

Early crash analysis models were generally based on simple multiple linear regression methods assuming normally distributed errors. However, researchers soon discovered that crash occurrence could be better fitted with a Poisson distribution. Hence, a Poisson regression model based upon a generalized linear framework was soon adopted over conventional multiple linear regression techniques. Several such Poisson regression approaches for exploring the relationship between the risk factors and crash frequency have been proposed [15] [16] [17] [18] [19] [20]. However, Poisson regression approaches have one important constraint: the mean must be equal to the variance. If this constraint is violated, the standard errors estimated by the maximum likelihood method will be biased, and the test statistics derived from the model will be incorrect. Recent studies have shown that crash data are usually over-dispersed (the variance exceeds the mean), so applying the Poisson regression model could yield incorrect estimates of the likelihood of crash occurrence [7].

In efforts to overcome the problem of over-dispersion, researchers began to employ the Negative Binomial (NB) distribution (also called the Poisson-Gamma) instead of the Poisson distribution. The NB distribution relaxes the constraint that the mean equals the variance, and hence can accommodate over-dispersion in crash count data [7]. NB models have been widely used in crash frequency modeling [14] [15] [19] [21] [22] [23]. However, NB models have some limitations, such as the inability to handle under-dispersion of crash counts, which occurs when the mean of the crash counts is higher than the variance. Although rare, this phenomenon can arise when the sample size is very small, leading to erroneous parameter estimates [24] [25].
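
The over- and under-dispersion conditions described above can be checked before choosing between a Poisson and an NB model. The following is a minimal sketch, not a method taken from the cited studies: it compares the sample variance of a set of crash counts with their mean. The function name `dispersion_index` and the segment counts are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the variance-to-mean ratio of a sequence of crash counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    # variance() is the sample variance (divides by len(counts) - 1).
    return variance(counts) / m


# Hypothetical yearly crash counts on ten road segments (made-up numbers).
segment_counts = [0, 0, 1, 0, 3, 7, 0, 2, 12, 1]

ratio = dispersion_index(segment_counts)
if ratio > 1:
    label = "over-dispersed"    # variance exceeds the mean
elif ratio < 1:
    label = "under-dispersed"   # mean exceeds the variance
else:
    label = "equi-dispersed"    # the Poisson assumption holds exactly

print(f"mean={mean(segment_counts):.2f}, "
      f"variance={variance(segment_counts):.2f}, "
      f"ratio={ratio:.2f} -> {label}")
```

A ratio well above one suggests that the Poisson equality constraint is violated and that an NB model may be more appropriate; a formal decision would normally rest on a statistical test rather than on this raw ratio.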
To address the limitations of NB models, Poisson-lognormal models have been proposed, in which the error term is lognormal rather than gamma-distributed to better handle under-dispersed crash counts [21] [26] [27]. Another widely used type of crash prediction model is the zero-inflated Poisson and zero-inflated negative binomial models, which have been introduced mainly to deal with the over-dispersion problem caused by excessive zeroes (i.e., locations where no crashes are observed) in traffic count data. The zero-inflated models have shown great flexibility, although their applicability in crash prediction has been criticized because the assumption of a long-term mean equal to zero in the safe state could produce biased estimates [7] [22]. Generalized additive modeling approaches, which provide smoothing functions for the explanatory variables, have also been proposed. However, these models typically include more parameters than the traditional count models, and therefore their applicability to crash prediction has been very limited [28] [29]. Random-parameters models have been applied to account for the effect of unobserved heterogeneity from one roadway site to another; however, their application in practice has been very limited [30] [31] [32].

The finding that road crashes are poorly explained by linear functions of independent variables has encouraged the exploration of non-linear approximators such as fuzzy logic and neural networks. For example, a fuzzy logic approach was used for prediction of urban highway crash occurrence, and it was found that the use of fuzzy sets in crash prediction is indeed a viable approach [33]. Neural networks have been applied to highway safety applications as predictive tools, such as in driver behavior analysis, pavement maintenance, vehicle detection, traffic signal control, and vehicle emissions; however, their application to crash analysis has been limited [28] [34] [35]. For instance, an artificial neural network was utilized to analyze freeway crash frequency in Taiwan, and the results indicated that an artificial neural network can provide a consistent alternative method for analyzing crash frequency [36]. Also, a group of artificial neural networks was applied to model the non-linear relationships between injury severity levels and crash-related factors. The findings indicated that artificial neural network models can predict crashes more effectively than the traditional statistical methods [37].

In crash severity models, a wide variety of statistical approaches, such as the binary and the multinomial logit models, nested logit models, mixed logit models, and ordered probit models, have been investigated. For example, the ordered probit model was applied to predict crash severity on roadway sections, signalized intersections, and toll plazas in Florida [38]. A mixed logit model was applied to the injury outcome of crashes, using limited crash data, to investigate the proportion of crashes of each severity level on a specific roadway segment over a specified time period. Then, the number of crashes by severity level was determined without the need for detailed crash-specific data [39].
Also, a multinomial logistic regression (MNL) was applied to model the injury severity of different vehicle collision patterns on urban highways in Arkansas, and the researchers recommended the use of the MNL over other models [40].

6. A Review on the Statistical Approaches of Road Crash Prediction Models

There are different statistical approaches for modeling traffic crashes. The following approaches are some of the most commonly used methods.

6.1. Multiple Linear Regression

Early traffic accident models were based on the simple multiple linear regression approach assuming normally distributed errors. The general form of the linear crash prediction model can be expressed as follows:

$$ Y \mid \theta \sim \mathrm{Dist}(\theta) \quad \text{with} \quad \theta = f(X, \beta, \varepsilon) \tag{1} $$

where,
$Y$: the dependent variable (i.e. crash frequency),
$\theta$: the crash dataset,
$\mathrm{Dist}(\theta)$: the model distribution,
$X$: a vector representing different independent variables (i.e. risk factors),
$\beta$: a vector of regression coefficients,
$f(.)$: link function that relates $X$ and $Y$ together,
$\varepsilon$: the disturbance or error terms of the model.

6.2. Poisson Regression

Although multiple linear regression models have been widely applied, it has been found that crash occurrence can often be better fitted with a Poisson distribution. One frequent pitfall is to model crash data as continuous data by applying an ordinary least squares regression [41]. This approach is inappropriate because such regression models can produce predicted values that are non-integers and can also predict values that are negative, both of which are inconsistent with count data. In addition, many distributions of crash data are positively skewed, with many observations in the data set having a value of zero. The high number of zeros in the data set prevents the transformation of a skewed distribution into a normal one, which ordinary least squares regression assumes. An alternative is to use a Poisson distribution or one of its variants. Poisson distributions have a number of advantages over an ordinary normal distribution, including a skewed, discrete distribution and the restriction of predicted values to non-negative numbers [41]. Hence, generalized linear modeling variants of the Poisson regression model have been proposed to explore the relationship between the risk factors and traffic accidents [15] [17] [18] [19].

Poisson regression has been applied to a wide range of transportation count data, including crash frequency. A Poisson regression model is similar to an ordinary linear regression, with two exceptions. First, it assumes that the errors follow a Poisson (not normal) distribution. Second, rather than modeling the response variable $Y$ as a linear function of the regression coefficients, it models the natural log of the expected response, $\ln(E[Y])$, as a linear function of the coefficients [7]. The Poisson model can be expressed as follows:

$$ P(n_i) = \frac{\mathrm{EXP}(-\lambda_i)\,\lambda_i^{n_i}}{n_i!} \tag{2} $$

where,
$P(n_i)$: the probability of $n_i$ crashes occurring on highway segment $i$,
$n_i$: the number of crashes observed on segment $i$ per time period (such as a year),
$\lambda_i$: the expected crash frequency on road segment $i$ per time period (i.e. the mean of the distribution), which can be estimated as follows:

$$ \lambda_i = \mathrm{EXP}(\beta X_i) \tag{3} $$

where,
$X_i$: a vector of the independent variables (i.e. risk factors),
$\beta$: a vector of the estimates (coefficients) of the independent variables $X_i$.

This model is estimable by standard maximum likelihood methods, with the log likelihood (LL) function given as:

$$ LL(\beta) = \sum_{i=1}^{n} \left[ -\mathrm{EXP}(\beta X_i) + n_i\,\beta X_i - \mathrm{LN}(n_i!) \right] \tag{4} $$

One assumption of Poisson models is that the mean and the variance are equal, an assumption that is sometimes violated [7]. This can be dealt with by using a dispersion parameter if the difference is small, or by using a negative binomial regression model if the difference is large [42].

6.3. Negative Binomial Regression Model (NB)

To overcome the problem of over-dispersion, the Negative Binomial (NB) distribution (also called the Poisson-Gamma) has been investigated as an alternative to the Poisson distribution, given that it relaxes the condition that the mean equals the variance, and hence can take into account over-dispersion in crash count data [7]. As a result, NB models have been widely applied in crash frequency modeling [14] [15] [19] [21] [22] [23].

The NB model uses a Gamma probability distribution to relax the assumption that the mean equals the variance, and hence it can accommodate over-dispersion that may exist in crash count data [43]. A primary source of over-dispersion is the clustering of data, together with the possible omission of relevant independent variables influencing the Poisson rate across observations [44]. In order to obtain the NB model, the Poisson regression can be rewritten by adding an error term to its expected number of crashes, which becomes [7]:

$$ \lambda_i = \mathrm{EXP}(\beta X_i + \varepsilon_i) \tag{5} $$

where $\mathrm{EXP}(\varepsilon_i)$ is a gamma-distributed error with mean equal to one and variance equal to $\alpha$. The addition of this term allows the variance $\mathrm{VAR}(n_i)$ to differ from the mean $E(n_i)$, as shown in Eq. 6:

$$ \mathrm{VAR}(n_i) = E(n_i)\left[ 1 + \alpha E(n_i) \right] \tag{6} $$

The parameter $\alpha$ is called the over-dispersion parameter, and both $\alpha$ and $\beta$ can be estimated from the maximum likelihood function. When $\alpha$ is zero, the model reduces to Poisson regression; if $\alpha$ is found to be significantly different from zero, then NB regression can be used instead of the Poisson regression model to handle the over-dispersion in crash data. However, the NB model also has some limitations, such as its inability to handle under-dispersion of the count data, when the mean of the crash counts is higher than the variance [25] [44].

6.4. Poisson-Lognormal Regression Model

To address the limitations of the NB models, the Poisson-lognormal model was introduced, in which the error term is lognormal rather than gamma-distributed so as to better handle under-dispersed count data [21] [26] [27]. The Poisson-lognormal model is similar to the negative binomial model; however, the $\mathrm{EXP}(\varepsilon_i)$ term used in the model is lognormal rather than gamma-distributed. The Poisson-lognormal model provides more flexibility than the negative binomial model, but it does have some limitations, such as the complex estimation of its parameters, because the Poisson-lognormal distribution does not have a closed form [26].

6.5. Zero Inflated Poisson and Negative Binomial Regression Models

Another widely used crash frequency modeling approach is the zero-inflated
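
As a rough illustration of Eqs. (2) to (4) in Section 6.2 and Eq. (6) in Section 6.3, the sketch below evaluates the Poisson probability and log likelihood, estimates the coefficients by Newton-Raphson maximum likelihood, and computes the NB variance. It is a minimal example under stated assumptions, not the estimation procedure of any cited study: the function names, the restriction to exactly two coefficients (an intercept and one risk factor), and the six-segment data set are all hypothetical.

```python
import math


def linear_predictor(x, beta):
    """Return beta . X_i for one road segment."""
    return sum(b * v for b, v in zip(beta, x))


def poisson_pmf(n, lam):
    """Eq. (2): P(n_i) = EXP(-lambda_i) * lambda_i**n_i / n_i!"""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def poisson_log_likelihood(X, counts, beta):
    """Eq. (4): sum of -EXP(beta.X_i) + n_i * (beta.X_i) - LN(n_i!)."""
    total = 0.0
    for x, n in zip(X, counts):
        eta = linear_predictor(x, beta)
        total += -math.exp(eta) + n * eta - math.lgamma(n + 1)
    return total


def fit_poisson_two_params(X, counts, iterations=50, tol=1e-10):
    """Maximize Eq. (4) by Newton-Raphson for exactly two coefficients."""
    beta = [math.log(sum(counts) / len(counts)), 0.0]  # intercept-only start
    for _ in range(iterations):
        lam = [math.exp(linear_predictor(x, beta)) for x in X]  # Eq. (3)
        # Gradient of Eq. (4): sum of (n_i - lambda_i) * X_i
        g0 = sum((n - l) * x[0] for x, n, l in zip(X, counts, lam))
        g1 = sum((n - l) * x[1] for x, n, l in zip(X, counts, lam))
        # Negative Hessian: sum of lambda_i * X_i X_i^T (symmetric 2x2)
        a = sum(l * x[0] * x[0] for x, l in zip(X, lam))
        b = sum(l * x[0] * x[1] for x, l in zip(X, lam))
        d = sum(l * x[1] * x[1] for x, l in zip(X, lam))
        det = a * d - b * b
        step0 = (d * g0 - b * g1) / det
        step1 = (a * g1 - b * g0) / det
        beta = [beta[0] + step0, beta[1] + step1]
        if abs(step0) + abs(step1) < tol:
            break
    return beta


def nb_variance(expected, alpha):
    """Eq. (6): VAR(n_i) = E(n_i) * (1 + alpha * E(n_i))."""
    return expected * (1.0 + alpha * expected)


# Hypothetical data: each row is [1.0 (intercept), one scaled risk factor].
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
counts = [1, 1, 2, 4, 5, 9]

beta_hat = fit_poisson_two_params(X, counts)
fitted = [math.exp(linear_predictor(x, beta_hat)) for x in X]

print("estimated beta:", [round(b, 3) for b in beta_hat])
print("log likelihood:", round(poisson_log_likelihood(X, counts, beta_hat), 3))
print("NB variance for mean 2.0, alpha 0.5:", nb_variance(2.0, 0.5))
```

With `alpha` set to zero, `nb_variance` returns the mean itself, which mirrors the statement in Section 6.3 that the NB model reduces to Poisson regression when α is zero. In practice a statistical package would normally be used for estimation; the hand-written solver is kept here only so that each term of Eq. (4) is visible.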