Road accident data are the essential measure of road safety: they let us establish both the size and the nature of road safety problems, so an accident database is the foundation of road safety management. The data set is also what allows us to identify the factors behind road accidents. National accident data are often unreliable, either because they lack useful information or because they are not recorded in a consistent format, yet any analysis requires accurate data. Common causes of accidents identified in earlier work include roadway conditions, weather conditions, inadequate traffic signs or signals, vehicle defects and driver behavior. Researchers have analyzed these problems with techniques such as neural networks, fuzzy logic, data mining and machine learning.
Different parameters affect road accidents in different ways, and the main difficulty in the analysis is the dissimilarity between records, which is why segmentation is required. By analyzing the available information we can measure these dissimilarities, and the relationships between them can reveal the hidden patterns needed to analyze road accidents. Below is a simple web app that helps identify the cause of an accident using the XGBoost machine learning algorithm.
The goal of this research is to analyze Chicago car accident report data in order to classify the primary cause of an accident and to answer the following questions:
Q1 - What is the distribution of car accident causes?
Q2 - In which regions do the most car accidents occur?
Q3 - What effect do external factors have on the number of car crashes and of car crashes with injuries?
The effect of the time of day on car accidents.
The effect of the weather on car accidents.
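As a rough illustration of how Q2 and Q3 could be tabulated, here is a minimal pandas sketch. The column names (`BEAT_OF_OCCURRENCE`, `CRASH_HOUR`, `WEATHER_CONDITION`, `INJURIES_TOTAL`) are assumptions modeled on the City of Chicago traffic crashes export, and the six rows are invented toy data, not results from this analysis.

```python
import pandas as pd

# Invented toy rows; the column names are assumed, not taken from the project code.
crashes = pd.DataFrame({
    "BEAT_OF_OCCURRENCE": [1834, 1834, 111, 1834, 111, 2512],
    "CRASH_HOUR": [8, 17, 17, 23, 17, 8],
    "WEATHER_CONDITION": ["CLEAR", "RAIN", "CLEAR", "SNOW", "RAIN", "CLEAR"],
    "INJURIES_TOTAL": [0, 2, 0, 1, 0, 0],
})

# Q2: which police beats record the most crashes (sorted, largest first)
crashes_by_beat = crashes["BEAT_OF_OCCURRENCE"].value_counts()

# Q3: crash counts and injury-crash counts for each hour of the day
crashes["HAS_INJURY"] = crashes["INJURIES_TOTAL"] > 0
by_hour = crashes.groupby("CRASH_HOUR").agg(
    crashes=("HAS_INJURY", "size"),
    injury_crashes=("HAS_INJURY", "sum"),
)

# Q3: share of crashes with at least one injury under each weather condition
injury_rate_by_weather = crashes.groupby("WEATHER_CONDITION")["HAS_INJURY"].mean()

print(crashes_by_beat.head(3))
print(by_hour)
print(injury_rate_by_weather)
```

Using police beats as the "region" is only one possible choice; community areas or binned latitude/longitude would work the same way with a different grouping column.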
WHAT'S THE DISTRIBUTION OF THE CAUSES OF ACCIDENTS?
The most common crash type is Rear End, accounting for 30% of car crashes, followed by Sideswipe Same Direction at 16%.
The deadliest crash types, by share of fatalities, are Turning at 19% and Angle at 13%. Because they account for the most fatalities, I recommend focusing on them, for example with better, separate traffic signals for turning.
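The percentages above are proportions of categorical counts. A minimal sketch of how such shares could be computed, assuming a `FIRST_CRASH_TYPE` column and an `INJURIES_FATAL` count per crash (both assumed column names; the rows are invented for illustration):

```python
import pandas as pd

# Invented toy data; FIRST_CRASH_TYPE and INJURIES_FATAL are assumed column names.
crashes = pd.DataFrame({
    "FIRST_CRASH_TYPE": ["REAR END", "REAR END", "TURNING", "ANGLE",
                         "SIDESWIPE SAME DIRECTION", "TURNING", "REAR END", "ANGLE"],
    "INJURIES_FATAL": [0, 0, 1, 1, 0, 1, 0, 0],
})

# Share of all crashes by type (the kind of figure behind "Rear End: 30%")
share_of_crashes = crashes["FIRST_CRASH_TYPE"].value_counts(normalize=True)

# Share of fatal crashes by type (the kind of figure behind "Turning: 19%")
fatal = crashes[crashes["INJURIES_FATAL"] > 0]
share_of_fatal = fatal["FIRST_CRASH_TYPE"].value_counts(normalize=True)

print(share_of_crashes.round(3))
print(share_of_fatal.round(3))
```

This reads "deadliest" as a type's share of all fatal crashes. Dividing fatal crashes by total crashes within each type would instead give a per-type fatality rate, which can rank the types differently.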
Random Forest, XGBoost and LinearSVC classifiers were implemented after re-sampling with SMOTE, since the dataset was heavily imbalanced. They all gave roughly the same results, within about 5% of each other, so I opted for the XGBoost classifier, with PCA for dimensionality reduction. The features included were:
Driver’s Action, Driver’s Vision, Roadway Surface Condition, Device Condition, First Crash Type, Posted Speed Limit, Age, Physical Condition.
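To make the re-sampling step concrete, the sketch below implements the core SMOTE idea in NumPy: each synthetic minority-class row is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. This is an illustrative re-implementation, not the code used in the project; the function name `smote` and its parameters are hypothetical.

```python
import numpy as np

def smote(X_minority, n_new, k=5, seed=0):
    """Hypothetical helper: return n_new synthetic minority-class rows.

    Each row is a random point on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # cannot use more neighbours than there are other minority rows
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances between minority rows
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                  # random minority sample
        nb = neighbours[base, rng.integers(k)]  # one of its k nearest neighbours
        gap = rng.random()                      # position along the segment, in [0, 1)
        synthetic[i] = X[base] + gap * (X[nb] - X[base])
    return synthetic

# Toy usage: four minority points at the corners of a unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_rows = smote(X_min, n_new=6, k=2)
print(new_rows.shape)
```

With k=2 each corner's neighbours are the two adjacent corners, so every synthetic point lands on an edge of the square. In practice the imbalanced-learn package provides `SMOTE`, plus `SMOTENC` for data that mixes categorical and numeric columns like the features listed above. Either way, resampling should be applied to the training split only, so that synthetic rows do not leak into the evaluation data.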
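The model achieved a log loss of 12.5 and an accuracy of 64%. Log loss penalizes the model for assigning low probability to the true cause, so lower is better, and 12.5 is high: assigning equal probability to each of K possible causes would score only ln K. An accuracy of 64% means the model predicted the primary cause correctly for 64% of the accidents.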
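A small worked example shows why log loss and accuracy can diverge so sharply. For N predictions, log loss = -(1/N) * sum of log(p_i), where p_i is the probability the model gave to the true class of sample i. The sketch below uses toy numbers, a hypothetical helper `multiclass_log_loss`, and clipping at `eps=1e-15` (an assumption), and scores the same four predictions twice: once as probabilities and once as hard one-hot labels.

```python
import numpy as np

def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Hypothetical helper: mean negative log-probability of the true class.

    y_true: integer class labels, shape (n,); proba: shape (n, n_classes).
    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1 - eps)
    proba = proba / proba.sum(axis=1, keepdims=True)  # renormalise after clipping
    picked = proba[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))

# Toy data: 4 samples, 3 hypothetical causes
y_true = np.array([0, 1, 2, 1])
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.3, 0.2]])
predicted = proba.argmax(axis=1)
hard = np.eye(3)[predicted]  # one-hot encoding of the predicted labels

accuracy = (predicted == y_true).mean()
soft_loss = multiclass_log_loss(y_true, proba)
hard_loss = multiclass_log_loss(y_true, hard)
print(accuracy, round(soft_loss, 3), round(hard_loss, 2))
```

Both versions have 75% accuracy, but the hard labels give a log loss of about 8.6 instead of about 0.68, because each miss is charged -ln(1e-15) ≈ 34.5. By the same arithmetic, an error rate of about 36% scored with hard labels would give roughly 0.36 × 34.5 ≈ 12.4, close to the reported 12.5. The source does not say how log loss was computed, so this is only a possibility worth checking (for example, by scoring predicted probabilities rather than predicted labels).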
Its overall recall is 64%. For each cause, recall is the number of accidents the model correctly assigned to that cause divided by the number of accidents that actually had that cause.
Its overall precision is also 64%. For each cause, precision is the number of correct predictions of that cause divided by the total number of times the model predicted that cause.
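Both definitions can be read off a confusion matrix whose rows are the actual causes and whose columns are the predicted causes. The toy sketch below uses invented labels for three hypothetical causes, computes per-class precision and recall, and then the micro-averaged versions, which pool all predictions before dividing. In single-label multiclass classification every sample contributes exactly one prediction and one actual label, so micro-averaged precision and recall both equal accuracy; if "overall" above means micro-averaged (the source does not say), that would explain why all three figures are 64%.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    """Count matrix: rows = actual cause, columns = predicted cause."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Invented labels for three hypothetical causes (0, 1, 2)
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])
y_pred = np.array([0, 1, 0, 1, 2, 2, 2, 0, 2, 0])
cm = confusion(y_true, y_pred, 3)

tp = np.diag(cm)                            # correct predictions per cause
precision_per_class = tp / cm.sum(axis=0)   # correct / all predictions of that cause
recall_per_class = tp / cm.sum(axis=1)      # correct / all actual cases of that cause

# Micro-averaging pools every prediction; both denominators equal the sample count
micro_precision = tp.sum() / cm.sum(axis=0).sum()
micro_recall = tp.sum() / cm.sum(axis=1).sum()
accuracy = (y_true == y_pred).mean()

print(precision_per_class, recall_per_class)
print(micro_precision, micro_recall, accuracy)
```

In scikit-learn the same pooled figures come from `precision_score` and `recall_score` with `average="micro"`; `average="macro"` instead averages the per-class values and is more informative when some causes are rare.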