CN108717786B

CN108717786B - Traffic accident cause mining method based on universality meta-rule

Info

Publication number: CN108717786B
Application number: CN201810781739.9A
Authority: CN
Inventors: 曾维理; 赵子瑜; 李娟�; 任禹蒙; 孙煜时; 羊钊
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2018-07-17
Filing date: 2018-07-17
Publication date: 2022-06-17
Anticipated expiration: 2038-07-17
Also published as: CN108717786A

Abstract

The invention discloses a traffic accident cause mining method based on universal meta-rules. The method reads historical traffic accident information, preprocesses the data, and grades each accident record according to the road traffic accident classification standard. On this basis, association rule analysis is applied: reasonable thresholds for minimum support, minimum confidence and the frequent index are set; association rules are mined from multiple data sets with consistent thresholds; a binary table of data sets against strong association rules is built; a meta-rule set is extracted; the meta-rule set and the data sets are combined for secondary mining; and the meta-rules found in the multiple data sets are integrated to obtain output rules, presented in cell mode, that are composed of multiple meta-rules with universal characteristics. The method can uncover association information hidden in traditional association rules, screen out valuable rules, eliminate association rules that are not universal across multiple partitions, and provide decision support for traffic safety managers.

Description

Traffic accident cause mining method based on universality meta-rule

Technical Field

The invention belongs to the technical field of traffic safety, and particularly relates to a traffic accident cause mining method based on universal meta-rules.

Background

In recent years urban road traffic has developed rapidly, and the scale and density of urban road networks have increased greatly. Food delivery services, shared bicycles, shared cars and online ride-hailing have sprung up like bamboo shoots after spring rain. Alongside this growth, urban road traffic is under heavy pressure and traffic accidents are on the rise. Now that the recording of traffic accident data is largely complete, the main problem at the present stage is how to use the data effectively and find the underlying causes in a large volume of accident records. Analyzing the causes of accidents and finding the intrinsic regularities in the relations among all attributes of a traffic accident provides a basis for decision-makers, allows targeted measures, and reduces the probability of traffic accidents by removing, through human intervention and control, the conditions under which they occur.

With the development of artificial intelligence and big data technology, the concepts and methods of data mining have begun to be applied widely in the traffic field. Association rule mining analyzes the associations among different transaction attributes in a data set and is currently a mainstream means of mining traffic accident data. Association rule mining algorithms are also among the main representatives of unsupervised learning. They suit the strong randomness and uneven data distribution of traffic accidents and can reveal potential associations between transaction attributes, from which valuable association rules can be analyzed and reasonable decisions made.

The prior art has the following problems. First, to keep the association rules mined from traffic data readable, the thresholds are set relatively high; however, accidents of different severity occur with different probabilities, so if the whole traffic data set is mined directly with one consistent threshold, some hidden associations cannot be revealed. Second, in a traffic data set with many attributes, the antecedent and consequent of an association rule contain too many attributes with no logical structure among them, which makes analysis by decision-makers inconvenient. Third, association rules are usually mined from a single data set without considering the universality of the rules; some of the resulting rules apply only to the current data set and reflect the accident causes of a single data block, so the accident cause characteristics common to multiple blocks cannot be reflected.

Disclosure of Invention

In order to solve the technical problems described in the background, the invention aims to provide a traffic accident cause mining method based on universal meta-rules, which extracts meta-rules with universal characteristics and outputs rules in a cell mode composed of those meta-rules, so that the rules are both universal and easy to interpret.

In order to achieve the technical purpose, the technical scheme of the invention is as follows:

a traffic accident cause mining method based on a universal meta-rule comprises the following steps:

step one, data preparation

Step 1.1: reading traffic accident information of past years, and dividing it into 5 categories of traffic accident cause information, namely accident basic information, accident driver information, accident vehicle information, road condition information and environment information, wherein each category is described by multiple attributes;

step 1.2: performing data quality analysis on the read traffic accident information, and screening and reserving the attribute variables with qualified quality;

step 1.3: selecting attributes of the screened traffic accident information, and removing the attributes which are irrelevant or redundant to the mining task, wherein the attribute selection aims to find out a minimum attribute set and simultaneously ensures that the probability distribution of a data set is as close as possible to the original distribution obtained by utilizing all the attributes;

step 1.4: carrying out data cleaning on the traffic accident information obtained in the step 1.3, wherein the data cleaning comprises missing value processing and noise filtering; the missing value processing adopts a deleting method to delete the information of which the attribute missing degree exceeds a preset missing threshold value in the information of the causes of the 5 types of traffic accidents; the noise filtering adopts an outlier detection algorithm based on a statistical method, and outliers in the data are diagnosed and deleted;

step 1.5: clustering the continuously distributed attributes, and classifying each accident according to the road traffic accident classification standard;

step two, parameter selection

Step 2.1: the support and confidence of the rules are calculated according to the following methods:

the support of rule R: X ⇒ Y in the traffic data set T is as follows:

$$\mathrm{Support}(X \Rightarrow Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|T|} \qquad (1)$$

the confidence of rule R: X ⇒ Y in the traffic data set T is as follows:

$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|\{t \in T : X \subseteq t\}|} \qquad (2)$$

wherein t denotes one accident record in T and |·| denotes the number of records;

for rule R: X ⇒ Y, X is called the antecedent of the rule and Y is called the consequent of the rule; the support of rule R represents the probability that accident causes X and Y occur simultaneously, and the confidence of rule R represents the conditional probability that accident cause Y also occurs when accident cause X occurs; when the confidence of rule R is greater than a preset threshold, the occurrence of the X event is considered to induce the occurrence of the Y event, and the higher the confidence, the closer the relation between the X event and the Y event;

step 2.2: selecting the minimum support threshold S: after the traffic data sets of different administrative regions are separated by accident type, the support of the rules of each region is calculated according to formula (1); different minimum support thresholds are tried, and with the minimum support threshold as the abscissa and the number of association rules meeting that threshold as the ordinate, a support threshold selection trend graph is obtained for each region and each accident type, from which the minimum support threshold is selected;

step 2.3: selecting the minimum confidence threshold C: for the traffic accident data sets of each type, the support and confidence of the rules of each region are calculated according to formulas (1) and (2); different support and confidence thresholds are set for comparative analysis, and a bubble chart relating the threshold settings to the distribution of rules meeting them is obtained, in which the abscissa corresponds to the support threshold, the ordinate corresponds to the confidence threshold, and a larger bubble value indicates that more association rules are included; the selection ranges of the support and confidence thresholds are balanced accordingly;

step 2.4: selecting the frequent index threshold F: the frequent index is the index used to screen universal meta-rules across different traffic data sets; from the association rules mined separately in the different data sets, a multi-data-set association rule frequent index table is established, and the association rules meeting the frequent index threshold are taken as universal meta-rules, wherein every data set uses the same support and confidence thresholds when the association rules are mined, and the Boolean values 1 and 0 indicate that a rule exists or does not exist, respectively; the frequent index of rule R_i is defined as follows:

$$F(R_i) = \frac{1}{n}\sum_{j=1}^{n} p_{ij}$$

wherein p_ij is the indicator of rule R_i in data set T_j: if rule R_i exists in data set T_j, p_ij takes 1, otherwise p_ij takes 0; n is the number of data sets;

in order to obtain meta-rules that are universal across multiple regions and to ensure that they are analytically meaningful, the traffic accident data sets of each type in each region are screened for association, and the association rules that appear repeatedly across regions are selected; with the frequent index threshold as the abscissa and the number of strong association rules as the ordinate, a regional association trend graph of the association rules is obtained for each accident type, from which the frequent index threshold is selected;

step three, association rule mining based on meta-rules

Step 3.1: for multiple data sets T₁, T₂, …, T_i of the same format, setting a consistent minimum support S and minimum confidence C, and performing primary association rule mining to obtain the corresponding association rules R₁, R₂, …, R_i;

Step 3.2: according to the frequent index of each association rule across the different data sets, extracting meta-rules by screening with the frequent index threshold F, and establishing a meta-rule set;

step 3.3: combining the meta-rule set with the data sets T₁, T₂, …, T_i for secondary mining, and integrating the meta-rules of the multiple data sets;

step 3.4: outputting strong association rules according to the minimum support S and the minimum confidence C, deriving meta-rules based on the associated factors that cause each traffic accident type, and obtaining output rules, presented in cell mode, that are composed of multiple meta-rules with universal characteristics;

step four, rule analysis

qualitatively and quantitatively analyzing the relevance among the accident causes according to the strong association rules, composed of multiple meta-rules, generated for the different accident types, and providing a reference for decision-makers.

Further, in step 1.1, the attributes included in the 5 categories of traffic accident cause information are as follows:

the accident basic information includes: the type of the accident, the accident grade, the casualties, the direct property loss of the accident, the accident time and the accident site;

the information about drivers involved in the accident includes: the sex, age, occupation and driving age of the driver, the physical and psychological conditions when an accident occurs and the driving operation behaviors before and after the accident occur;

the accident vehicle information includes: vehicle type, vehicle safety condition, and vehicle drivability;

the road condition information includes: road grade, road surface form, road surface condition, safety facilities and line shape design;

the environment information includes: road traffic conditions, driving sight distance, weather conditions, signs and markings, and illumination at the time of the accident.

Further, in step 1.2, attribute variables whose proportion of non-empty values is greater than 70% are retained by a value analysis method.

Further, in step 1.4, the preset missing threshold is 30%.

Further, in step 1.5, the accident is classified as follows:

property loss accident: causes damage to vehicles, cargo or other property, and may be accompanied by minor injury to persons;

injury accident: causes serious or minor injury to the parties involved, and may be accompanied by property loss;

death accident: causes the death of a party involved, and may be accompanied by personal injury and property loss.

Further, in steps 2.2, 2.3 and 2.4, when the thresholds are selected, the readability of the rules should first be ensured by keeping the number of rules within a readable range, and the validity of the rules should also be ensured by making the rules contain as many attributes as possible.

The above technical scheme brings the following beneficial effects:

The method performs universality analysis on the association rules obtained by primary mining of different data sets, extracts the rules with universal characteristics as meta-rules, integrates the meta-rules of the multiple data sets through secondary mining, and outputs them in cell mode to obtain output rules composed of multiple meta-rules with universal characteristics. Compared with association rules obtained by traditional mining, the method has the following advantages: (1) association information hidden in traditional association rules is revealed through the universal meta-rules and integrated into the displayed mining result; (2) association rules that are not universal across multiple partitions are eliminated, and valuable universal rules are screened out; (3) the drawback that traditional association rules contain so many attributes that they hinder decision-making is overcome, the logical relations among the attributes in the antecedent and consequent of a rule are emphasized, and the association rules are expressed in a more readable form. The invention can not only uncover the causes of various traffic accidents but also find the associations among those causes, thereby helping traffic management departments identify the most important intervention factors and improving accident prevention.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The technical scheme of the invention is explained in detail in the following with the accompanying drawings.

The invention provides a traffic accident cause mining method based on a universal meta rule, which comprises the following specific steps as shown in figure 1.

Step one: data preparation

Step 1.1: reading in traffic accident information of past years, and dividing it into five aspects of traffic accident cause information, namely accident basic information, accident driver information, accident vehicle information, road condition information and environment information, so that the accident attributes are described from multiple angles:

1) The accident basic information includes the type of the accident, the accident grade, accident casualties (the number of injured, the number of dead, the number of seriously injured and the number of slightly injured), the direct property loss, the accident time and the accident site. This information constitutes a basic description of the road traffic accident itself.

2) The driver information is one of the important factors of an accident. It includes basic information such as the sex, age, occupation and years of driving experience of the driver, and auxiliary information such as the physical and psychological condition at the time of the accident and the driving operations before and after the accident.

3) The accident vehicle information includes the vehicle type, vehicle safety condition, vehicle drivability, and the like.

4) The road condition information covers the urban roads and highways where road traffic accidents occur, and includes road grade, road surface form, road surface condition, safety facilities, alignment design, and the like.

5) The environment information includes the road traffic conditions, driving sight distance, weather conditions, signs and markings, illumination and the like at the time of the accident, and directly or indirectly influences the occurrence of the traffic accident.

Step 1.2: carrying out data quality analysis on the traffic accident data. Through a value analysis method, attribute variables whose proportion of non-empty values is greater than 70% are brought into the variable system of the data sample for the next round.
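The screening in step 1.2 can be sketched as follows. This is a minimal illustration only: the record layout (a list of dictionaries), the attribute names, and the helper names `non_empty_ratio` and `retain_attributes` are assumptions made for the example and are not taken from the patent.

```python
from typing import Any


def non_empty_ratio(records: list[dict[str, Any]], attribute: str) -> float:
    """Share of records in which `attribute` is present and not None or ""."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return filled / len(records)


def retain_attributes(records, attributes, min_ratio=0.70):
    """Keep the attributes whose non-empty ratio is strictly greater than min_ratio."""
    return [a for a in attributes if non_empty_ratio(records, a) > min_ratio]


# Hypothetical accident records; None and "" stand for missing values.
records = [
    {"weather": "clear", "road_grade": "urban", "driver_age": 22},
    {"weather": "rain", "road_grade": None, "driver_age": 31},
    {"weather": "clear", "road_grade": None, "driver_age": 27},
    {"weather": "", "road_grade": "urban", "driver_age": 45},
]
kept = retain_attributes(records, ["weather", "road_grade", "driver_age"])
```

Here `road_grade` is filled in only half of the records and is dropped, while the other two attributes pass the 70% cut.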

Step 1.3: performing attribute selection on the screened traffic accident data, and removing attributes that may be irrelevant to the mining task or redundant. The goal of attribute selection is to find the minimum set of attributes while ensuring that the probability distribution of the data set is as close as possible to the original distribution obtained using all the attributes. This has the advantage of reducing the number of attributes that appear in the discovered patterns, making the patterns easier to understand.

Step 1.4: carrying out data cleaning on the retained accident cause information, mainly missing value processing and noise filtering. Missing value processing is performed first. Each record represents the information of an independent individual accident, and there is no obvious correlation between accidents, so in theory missing data cannot be filled in by later analysis. The data are therefore cleaned by deletion: accident records in which the degree of missing attributes across the 5 categories of traffic accident cause information exceeds 30% are eliminated, which improves data quality and mining value. Noise filtering is then performed with a statistics-based outlier detection algorithm. Because traffic accident records are independent individual information with no obvious correlation between accidents and with high randomness, in theory the data cannot be smoothed by regression analysis, so the diagnosed outliers are deleted.
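A minimal sketch of the two cleaning passes in step 1.4 is given below. The patent only states that the outlier detection is statistics-based; the z-score rule with a cut-off of 3 used here is an assumption chosen for illustration, as are the record layout and the function names.

```python
from statistics import mean, pstdev


def missing_degree(record, attributes):
    """Fraction of the listed attributes that are missing in one record."""
    missing = sum(1 for a in attributes if record.get(a) in (None, ""))
    return missing / len(attributes)


def drop_incomplete(records, attributes, max_missing=0.30):
    """Deletion method: discard records whose missing degree exceeds max_missing."""
    return [r for r in records if missing_degree(r, attributes) <= max_missing]


def drop_outliers(records, attribute, z_max=3.0):
    """Assumed z-score rule: discard records more than z_max population
    standard deviations away from the mean of a numeric attribute."""
    values = [r[attribute] for r in records if r.get(attribute) is not None]
    if len(values) < 2:
        return list(records)
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(records)
    return [
        r for r in records
        if r.get(attribute) is None or abs(r[attribute] - mu) / sigma <= z_max
    ]


attributes = ["weather", "driver_age", "road_grade", "property_loss"]
# Twelve ordinary hypothetical records ...
records = [
    {"weather": "clear", "driver_age": 30, "road_grade": "urban", "property_loss": 1000}
    for _ in range(12)
]
# ... one record with an extreme property loss ...
records.append(
    {"weather": "rain", "driver_age": 25, "road_grade": "urban", "property_loss": 500000}
)
# ... and one record with half of its attributes missing.
records.append(
    {"weather": None, "driver_age": None, "road_grade": "urban", "property_loss": 1200}
)

complete = drop_incomplete(records, attributes)
cleaned = drop_outliers(complete, "property_loss")
```

Both passes delete rather than repair records, which matches the reasoning in the text that independent accident records cannot be filled in or smoothed from one another.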

Step 1.5: carrying out data reduction on the continuous attributes. So that the characteristics of the various kinds of traffic accidents can be summarized by class in the subsequent data mining, and a specific class can be singled out for further analysis, the continuously distributed attributes are clustered. Meanwhile, so that the associated factors causing each kind of traffic accident can be presented conveniently and intuitively from the mining results, each accident is classified according to the road traffic accident classification standard, based on the number of deaths, the number of minor injuries, the number of serious injuries and the property loss recorded for it:

a) Property loss accident: damage to vehicles, cargo or other property, which may be accompanied by minor injury to persons;

b) Injury accident: serious or minor injury to the persons involved, which may be accompanied by property loss;

c) Death accident: death of a person involved, which may be accompanied by personal injury and property loss.
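The grading of a single accident record might be sketched as below. Note that the text allows minor injuries in both the property loss class and the injury class and does not state the boundary between them; this sketch assumes, purely for illustration, that any recorded injury makes the record an injury accident. The field names are hypothetical.

```python
def accident_type(record: dict) -> str:
    """Grade one accident record from its casualty counts.

    Assumption: any injury (serious or minor) yields an injury accident;
    the source text leaves the minor-injury boundary open.
    """
    if record.get("deaths", 0) > 0:
        return "death accident"
    if record.get("serious_injuries", 0) > 0 or record.get("minor_injuries", 0) > 0:
        return "injury accident"
    return "property loss accident"


examples = [
    {"deaths": 1, "serious_injuries": 0, "minor_injuries": 2, "property_loss": 8000},
    {"deaths": 0, "serious_injuries": 1, "minor_injuries": 0, "property_loss": 0},
    {"deaths": 0, "serious_injuries": 0, "minor_injuries": 0, "property_loss": 3000},
]
grades = [accident_type(r) for r in examples]
```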

Step two: parameter selection

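Step 2.1: the support and confidence of the rule are calculated according to the following formulas.

The support of rule R: X ⇒ Y in the traffic data set T is as follows:

$$\mathrm{Support}(X \Rightarrow Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|T|} \qquad (1)$$

The confidence of rule R: X ⇒ Y in the traffic data set T is as follows:

$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|\{t \in T : X \subseteq t\}|} \qquad (2)$$

wherein t denotes one accident record in T and |·| denotes the number of records.

For rule R: X ⇒ Y, X is called the antecedent of the rule and Y is called the consequent of the rule. The support of rule R represents the probability that accident causes X and Y occur at the same time, and the confidence of rule R represents the conditional probability that accident cause Y also occurs when accident cause X occurs. When the confidence of rule R is greater than a preset threshold, the occurrence of the X event is considered to induce the occurrence of the Y event, and the higher the confidence, the closer the relation between the X event and the Y event.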
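As a worked illustration of step 2.1, support can be computed as the share of records containing both X and Y, and confidence as the share of the records containing X that also contain Y. The sketch below assumes each accident record is encoded as a set of attribute values; the item names and the toy data are hypothetical.

```python
def support(transactions, antecedent, consequent):
    """Share of records that contain every item of X and of Y."""
    items = set(antecedent) | set(consequent)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of the records containing X that also contain Y."""
    x = set(antecedent)
    n_x = sum(1 for t in transactions if x <= t)
    if n_x == 0:
        return 0.0
    n_xy = sum(1 for t in transactions if (x | set(consequent)) <= t)
    return n_xy / n_x


# Five hypothetical accident records.
T = [
    {"rain", "no_safe_distance", "type1"},
    {"rain", "no_safe_distance", "type1"},
    {"rain", "lane_change", "type1"},
    {"clear", "lane_change", "type1"},
    {"clear", "no_safe_distance", "type2"},
]
s = support(T, {"rain"}, {"no_safe_distance"})     # 2 of 5 records
c = confidence(T, {"rain"}, {"no_safe_distance"})  # 2 of the 3 rain records
```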

Step 2.2: selecting the minimum support threshold S: after the traffic data sets of different administrative regions are separated by accident type, the support of the rules of each region is calculated according to formula (1); different minimum support thresholds are tried, and with the minimum support threshold as the abscissa and the number of association rules meeting that threshold as the ordinate, a support threshold selection trend graph is obtained for each region and each accident type, from which the minimum support threshold is selected.

Step 2.3: selecting the minimum confidence threshold C: for the traffic accident data sets of each type, the support and confidence of the rules of each region are calculated according to formulas (1) and (2); different support and confidence thresholds are set for comparative analysis, and a bubble chart relating the threshold settings to the distribution of rules meeting them is obtained, in which the abscissa corresponds to the support threshold, the ordinate corresponds to the confidence threshold, and a larger bubble value indicates that more association rules are included. The selection ranges of the support and confidence thresholds are balanced accordingly.
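The data behind the trend graph of step 2.2 and the bubble chart of step 2.3 can be sketched as simple rule counts over a grid of candidate thresholds. The list of (support, confidence) pairs below is hypothetical, and the function names are assumptions made for the example; plotting is left out.

```python
def count_rules(rules, min_support, min_confidence=0.0):
    """Number of rules whose support and confidence both meet the thresholds."""
    return sum(1 for s, c in rules if s >= min_support and c >= min_confidence)


def bubble_grid(rules, support_thresholds, confidence_thresholds):
    """Rule count for every (support threshold, confidence threshold) pair."""
    return {
        (s, c): count_rules(rules, s, c)
        for s in support_thresholds
        for c in confidence_thresholds
    }


# Hypothetical (support, confidence) values of rules mined in one region.
rules = [(0.45, 0.90), (0.32, 0.75), (0.30, 0.60), (0.12, 0.95), (0.08, 0.50)]

# Step 2.2: rule count against the minimum support threshold alone.
support_curve = [(s, count_rules(rules, s)) for s in (0.1, 0.2, 0.3, 0.4)]

# Step 2.3: rule count for each pair of thresholds (the bubble values).
grid = bubble_grid(rules, (0.1, 0.3), (0.7, 0.9))
```

Reading the counts against the thresholds shows where the number of rules drops into a readable range, which is the criterion the text gives for choosing S and C.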

Step 2.4: selecting the frequent index threshold F: the frequent index is the index used to screen meta-rules with universal characteristics across different traffic data sets. From the association rules mined separately in the different data sets, a multi-data-set association rule frequent index table is established, and the association rules meeting the frequent index threshold are taken as universal meta-rules, wherein every data set uses the same support and confidence thresholds when the association rules are mined, and the Boolean values 1 and 0 indicate that a rule exists or does not exist, respectively. The frequent index of rule R_i is defined as follows:

$$F(R_i) = \frac{1}{n}\sum_{j=1}^{n} p_{ij}$$

wherein p_ij is the indicator of rule R_i in data set T_j: if rule R_i exists in data set T_j, p_ij takes 1, otherwise p_ij takes 0; n is the number of data sets.
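The frequent index table can be sketched as a 0/1 matrix of rules against data sets. The sketch assumes that the frequent index is the fraction of data sets in which a rule appears, which agrees with the percentage threshold F used in the embodiment; rule labels and function names are hypothetical.

```python
def frequent_index_table(rule_sets):
    """Map each rule to its indicators [p_i1, ..., p_in] over the n data sets."""
    all_rules = set().union(*rule_sets)
    return {r: [1 if r in rs else 0 for rs in rule_sets] for r in all_rules}


def frequent_index(indicators):
    """Assumed definition: the fraction of data sets in which the rule exists."""
    return sum(indicators) / len(indicators)


def universal_meta_rules(rule_sets, min_f):
    """Rules whose frequent index meets the threshold F."""
    table = frequent_index_table(rule_sets)
    return {r for r, p in table.items() if frequent_index(p) >= min_f}


# Hypothetical rules "A", "B", "C" mined from four regional data sets
# with the same support and confidence thresholds.
rule_sets = [{"A", "B"}, {"A", "C"}, {"A", "B"}, {"C"}]
table = frequent_index_table(rule_sets)
meta_rules = universal_meta_rules(rule_sets, min_f=0.55)
```

Rule "A" appears in three of the four data sets and is kept; "B" and "C" each appear in two and fall below the threshold.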

In order to obtain meta-rules that are universal across multiple regions and to ensure that they are analytically meaningful, the traffic accident data sets of each type in each region are screened for association, and the association rules that appear repeatedly across regions are selected. With the frequent index threshold as the abscissa and the number of strong association rules as the ordinate, a regional association trend graph of the association rules is obtained for each accident type, from which the frequent index threshold is selected.

When the thresholds are selected, the readability of the rules should first be ensured by keeping the number of rules within a readable range (generally fewer than 200), and the validity of the rules should also be ensured by making the rules contain as many attributes as possible.

Step three: association rule mining based on meta-rules

Step 3.1: for multiple data sets T₁, T₂, …, T_i of the same format, a consistent minimum support S and minimum confidence C are set, and primary association rule mining is performed to obtain the corresponding association rules R₁, R₂, …, R_i.
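The primary mining of step 3.1 can be illustrated with a deliberately naive miner. The patent does not name a mining algorithm; the brute-force enumeration below is an assumption chosen for brevity and would be replaced by an efficient algorithm such as Apriori or FP-growth on real data. Item names and data are hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence, max_items=3):
    """Enumerate every itemset of 2..max_items items and keep the rules
    X => Y that meet both thresholds (brute force, for illustration only)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = set()
    for size in range(2, max_items + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            n_xy = count(itemset)
            if n_xy == 0 or n_xy / n < min_support:
                continue
            # Split the frequent itemset into antecedent X and consequent Y.
            for k in range(1, size):
                for ante in combinations(combo, k):
                    x = frozenset(ante)
                    if n_xy / count(x) >= min_confidence:
                        rules.add((x, itemset - x))
    return rules


# One hypothetical regional data set; the same S and C would be used for all.
T1 = [
    {"rain", "close_following", "type1"},
    {"rain", "close_following", "type1"},
    {"clear", "lane_change", "type1"},
    {"rain", "close_following", "type2"},
]
rules_1 = mine_rules(T1, min_support=0.5, min_confidence=0.7, max_items=2)
```

Running the same call on every data set T₁, T₂, … yields the per-data-set rule sets that the frequent index screening of step 3.2 compares.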

Step 3.2: according to the frequent index of each association rule across the different data sets, meta-rules are extracted by screening with the frequent index threshold F, and a meta-rule set is established.

Step 3.3: the meta-rule set is combined with the data sets T₁, T₂, …, T_i for secondary mining, and the meta-rules of the multiple data sets are integrated.
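The text does not spell out how the meta-rule set is combined with the data sets for secondary mining. One possible reading, sketched here purely as an assumption, is to add each satisfied meta-rule to a record as a single composite item, so that a second mining pass can place whole meta-rules in the antecedent of an output rule. All names and data are hypothetical.

```python
def augment(transactions, meta_rules):
    """Add one composite item per meta-rule to every record that contains
    both its antecedent and its consequent (assumed interpretation)."""
    augmented = []
    for t in transactions:
        cells = {("meta", x, y) for x, y in meta_rules if x <= t and y <= t}
        augmented.append(frozenset(t) | cells)
    return augmented


def support_of(itemset, transactions):
    """Share of records that contain every element of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# A hypothetical meta-rule {rain} => {close_following} and one data set.
meta_rules = [(frozenset({"rain"}), frozenset({"close_following"}))]
district = [
    {"rain", "close_following", "type1"},
    {"rain", "close_following", "type1"},
    {"rain", "lane_change", "type2"},
    {"clear", "lane_change", "type1"},
]
augmented = augment(district, meta_rules)

# Support and confidence of the nested rule (rain => close_following) => type1.
cell = ("meta", frozenset({"rain"}), frozenset({"close_following"}))
cell_support = support_of(frozenset({cell, "type1"}), augmented)
cell_confidence = cell_support / support_of(frozenset({cell}), augmented)
```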

Step 3.4: generating the association rules. Strong association rules are output according to the minimum support S and the minimum confidence C, and meta-rules are derived based on the associated factors that cause each traffic accident type, giving output rules, presented in cell mode, that are composed of multiple meta-rules with universal characteristics. The rule template is

$$\{(P_1, \dots, P_i) \Rightarrow (P_j, \dots, P_k)\} \Rightarrow Q(Y)$$

which represents that the occurrence of accident causes P₁, ..., P_i, P_j, ..., P_k induces the occurrence of accident cause Q(Y), and that, among the accident causes P₁, ..., P_i, P_j, ..., P_k, the occurrence of P₁, ..., P_i induces the occurrence of P_j, ..., P_k. Cell mode is chosen as the output form mainly for its wrapping character: an output rule can contain both meta-rules and single attributes, which together form the rule, so that the information enclosed in the output rule is more complete and easier to analyze. The association rules mined from the different data sets are screened and then presented as cell groups in the antecedent and consequent of the output rule; a cell group takes one of three forms, including attribute with attribute and attribute with rule. In practical application, by taking the association rules among the influencing factors into account, traffic accidents can be prevented by controlling only a few influencing factors.
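The wrapping character of the cell mode can be illustrated with a small nested data structure in which a rule may contain plain attributes, other rules, or both. The class name `Rule`, the rendering format and the placeholder labels P1 to P4 and Q are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule whose elements are attribute labels or nested Rule objects."""
    antecedent: tuple
    consequent: tuple


def render(element) -> str:
    """Render an attribute or a (possibly nested) rule as text."""
    if isinstance(element, Rule):
        left = ", ".join(render(e) for e in element.antecedent)
        right = ", ".join(render(e) for e in element.consequent)
        return f"({left} => {right})"
    return str(element)


def attributes(element) -> set:
    """Collect every plain attribute that occurs anywhere inside a rule."""
    if isinstance(element, Rule):
        found = set()
        for e in element.antecedent + element.consequent:
            found |= attributes(e)
        return found
    return {element}


# A meta-rule P1, P2 => P3 wrapped, together with attribute P4,
# in the antecedent of an output rule with consequent Q.
meta = Rule(("P1", "P2"), ("P3",))
output_rule = Rule((meta, "P4"), ("Q",))
text = render(output_rule)
```

The antecedent of `output_rule` is a cell group of the attribute-with-rule form: it holds one meta-rule and one single attribute side by side.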

Step four: rule analysis

According to the strong association rules, composed of multiple meta-rules, generated in step three for the different accident types, the relevance among the accident factors is analyzed qualitatively and quantitatively to provide a reference for decision-makers.

For example, taking the traffic accident data of Shenzhen city 2014-, with the thresholds S ≥ 30%, C ≥ 70% and F ≥ 55%, the association rule results under each traffic accident type are obtained, as shown in the following table:

TABLE 1 Shenzhen City traffic accident association rule results (part)

Analyzing the association rule results, the following suggestions can be made. In terms of weather, when the weather is clear, drivers are more likely to cause accidents by changing lanes at will; when it rains, failing to keep a safe distance from the vehicle in front and unsafe driving become the main traffic behaviors causing accidents. The period with frequent accidents is 17:00-19:59, and the accident area is a safe area. It is therefore suggested that, starting from weather, time and place, traffic broadcast reminders be issued in that period under the relevant weather conditions to strengthen drivers' safety awareness. On the driver side, drivers aged 19 to 23 and 30 to 35 are high-incidence groups for type 1 accidents (property loss accidents), but their association characteristics differ. Drivers aged 19 to 23 are very likely to cause type 1 accidents through driving behavior number 1225, i.e. other behaviors that hinder safe driving, and imperfect markings are the main cause of such behaviors. For drivers aged 30 to 35, the main illegal behavior in type 1 accidents is number 1094, i.e. failing to keep a safe distance from the vehicle in front, and the accident sites are mostly on ordinary urban roads. Considering that drivers aged 19 to 23 are new drivers who have just graduated from driving school, it is recommended that driving schools strengthen training in traffic safety awareness, and that management take account of differences among driver groups and focus on novices.

Most drivers aged 24 to 29 have accumulated 4 to 6 years of driving experience, which is the period when safety awareness is weakest. It is therefore suggested that safe-driving education be carried out when the first driver's license comes due for renewal, with safety awareness reinforced through real cases. Since the audience is very large, the education can be delivered through online quizzes, online videos and the like, and passing it can be made a condition for renewing the first driver's license.

The embodiments only illustrate the technical idea of the invention and do not limit its scope of protection; any modification made on the basis of the technical scheme in accordance with the technical idea of the invention falls within the scope of protection of the invention.

Claims

1. A traffic accident cause mining method based on a universal meta-rule is characterized by comprising the following steps:

step one, data preparation

step 1.4: carrying out data cleaning on the traffic accident information obtained in the step 1.3, wherein the data cleaning comprises missing value processing and noise filtering; the missing value processing adopts a deleting method to delete the information of which the attribute missing degree exceeds a preset missing threshold value in the information of the causes of the 5 types of traffic accidents; the noise filtering adopts an outlier detection algorithm based on a statistical method to diagnose outliers in the data and delete the outliers;

step two, parameter selection

the support of rule R: X ⇒ Y in the traffic data set T is as follows:

$$\mathrm{Support}(X \Rightarrow Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|T|} \qquad (1)$$

the confidence of rule R: X ⇒ Y in the traffic data set T is as follows:

$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|\{t \in T : X \subseteq t\}|} \qquad (2)$$

wherein t denotes one accident record in T and |·| denotes the number of records;

for rule R: X ⇒ Y, the confidence of rule R represents the conditional probability that accident cause Y also occurs when accident cause X occurs; when the confidence of rule R is greater than the preset threshold, the occurrence of the X event is considered to induce the occurrence of the Y event, and the higher the confidence, the closer the relation between the two;

step three, association rule mining based on meta-rules

Step 3.1: for a plurality of data sets T₁, T₂, …, T_i with the same format, setting a consistent minimum support S and minimum confidence C, and carrying out primary association rule mining to obtain the corresponding association rules R₁, R₂, …, R_i;

Step 3.2: according to the frequent index of each association rule across the different data sets, extracting meta-rules by screening with the frequent index threshold F, and establishing a meta-rule set;

step four, rule analysis

2. The method for mining the cause of the traffic accident based on the universal meta-rule as claimed in claim 1, wherein in step 1.1, the 5 categories of traffic accident cause information comprise the following attributes:

the road condition information includes: road grade, road surface form, road surface condition, safety facilities and linear design;

3. The mining method for traffic accident causes based on universal meta-rules according to claim 1, characterized in that in step 1.2, attribute variables with non-null percentage higher than 70% are retained by a value analysis method.

4. The method for mining the cause of the traffic accident based on the universal meta-rule as claimed in claim 1, wherein in step 1.4, the preset missing threshold is 30%.

5. The method for mining the cause of a traffic accident based on the universal meta-rule as claimed in claim 1, wherein in step 1.5, the accident is classified as follows:

6. The method as claimed in claim 1, wherein in steps 2.2, 2.3 and 2.4, when selecting the thresholds of each type, readability of the rules should be ensured first, so that the number of the rules is kept within a readable range, and validity of the rules should be ensured, so that the attributes included in the rules are as many as possible.