CN108717786B - Traffic accident cause mining method based on universality meta-rule - Google Patents

Info

Publication number
CN108717786B
Authority
CN
China
Prior art keywords
rules
accident
rule
association
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810781739.9A
Other languages
Chinese (zh)
Other versions
CN108717786A (en)
Inventor
曾维理
赵子瑜
李娟�
任禹蒙
孙煜时
羊钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN201810781739.9A priority Critical patent/CN108717786B/en
Publication of CN108717786A publication Critical patent/CN108717786A/en
Application granted granted Critical
Publication of CN108717786B publication Critical patent/CN108717786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G08G1/0129 Traffic data processing for creating historical data or processing based on historical data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention discloses a traffic accident cause mining method based on universality meta-rules. The method reads historical traffic accident information, preprocesses the data, and grades each accident record according to the road traffic accident classification standard. On this basis, association rule analysis is applied: reasonable thresholds for minimum support, minimum confidence and the frequent index are set, association rule mining with consistent thresholds is carried out on multiple data sets, a binary (0/1) table relating the data sets to the strong association rules is built, and a meta-rule set is extracted. The meta-rule set and the data sets are then combined for secondary mining, the meta-rules found in the multiple data sets are integrated, and output rules in cellular mode, composed of multiple rules with universal characteristics, are obtained. The method can uncover association information hidden in traditional association rules, screen out valuable rules, eliminate association rules that lack universality across multiple regions, and provide decision support for traffic safety managers.

Description

Traffic accident cause mining method based on universality meta-rule
Technical Field
The invention belongs to the technical field of traffic safety, and particularly relates to a traffic accident cause mining method based on universality meta-rules.
Background
In recent years, urban road traffic has developed rapidly and the scale and density of urban road networks have increased greatly. Food delivery services, shared bicycles, shared cars and online ride-hailing have sprung up like bamboo shoots after a spring rain. Behind this prosperity, urban road traffic is under heavy pressure and traffic accidents are on the rise. Now that traffic accident data recording is largely complete, the main problem at the current stage is how to use these data effectively and identify the underlying causes from the large volume of traffic accident data. Analyzing the causes of accidents and finding the intrinsic regularities in the relations among the attributes of traffic accidents provides a basis for decision-makers, allows targeted measures, and reduces the probability of traffic accidents by removing the conditions that lead to them through human intervention and control.
With the development of artificial intelligence and big data technology, data mining concepts and methods are being widely applied in the traffic field. Association rule mining analyzes the associations among different transaction attributes in a data set and is currently a mainstream means of mining traffic accident data. Association rule mining algorithms are also among the main representatives of unsupervised learning; they suit the strong randomness and uneven data distribution of traffic accidents, can reveal potential associations between transaction attributes, and thereby support the analysis of valuable association rules and reasonable decision-making.
The problems in the prior art are as follows. First, to keep the association rules mined from traffic data readable, thresholds are set relatively high; however, accidents of different severity occur with different probabilities, so if association rule mining with a single consistent threshold is applied directly to the whole traffic data set, some hidden associations are not revealed. Second, in a traffic data set with many attributes, the antecedent and consequent of an association rule contain too many attributes, with no logical relation expressed among them, which makes analysis by decision-makers inconvenient. Third, when association rules are mined from a single data set, the universality of the rules is often not considered; some of the resulting rules apply only to the current data set and reflect only the accident causes of a single region, so the accident cause characteristics common to multiple regions are not captured.
Disclosure of Invention
In order to solve the technical problems of the background art, the invention aims to provide a traffic accident cause mining method based on a universal meta-rule, which extracts the meta-rule with universal characteristics and outputs the rule in a cellular mode form formed by the meta-rule, thereby realizing the universality and easy interpretability of the rule.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
a traffic accident cause mining method based on a universal meta-rule comprises the following steps:
step one, data preparation
Step 1.1: reading historical traffic accident information from past years, and dividing it into 5 types of traffic accident cause information, namely accident basic information, accident driver information, accident vehicle information, road condition information and environment information, wherein each type of traffic accident cause information is described by multiple attributes;
step 1.2: performing data quality analysis on the read traffic accident information, and screening and reserving the attribute variables with qualified quality;
step 1.3: selecting attributes of the screened traffic accident information, and removing the attributes which are irrelevant or redundant to the mining task, wherein the attribute selection aims to find out a minimum attribute set and simultaneously ensures that the probability distribution of a data set is as close as possible to the original distribution obtained by utilizing all the attributes;
step 1.4: carrying out data cleaning on the traffic accident information obtained in step 1.3, wherein the data cleaning comprises missing value processing and noise filtering; the missing value processing uses deletion: accident records whose attribute missing degree across the 5 types of traffic accident cause information exceeds a preset missing threshold are deleted; the noise filtering adopts a statistics-based outlier detection algorithm, and the outliers diagnosed in the data are deleted;
step 1.5: clustering the continuously distributed attributes, and classifying each accident according to the road traffic accident classification standard;
step two, parameter selection
Step 2.1: the support and confidence of a rule are calculated as follows:
for a rule $R: X \Rightarrow Y$, the support in the traffic data set $T$ is:
$$\mathrm{supp}(R) = P(X \cup Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|T|} \qquad (1)$$
where $|T|$ is the number of accident records in $T$ and the numerator is the number of records that contain both $X$ and $Y$;
for a rule $R: X \Rightarrow Y$, the confidence in the traffic data set $T$ is:
$$\mathrm{conf}(R) = P(Y \mid X) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} \qquad (2)$$
where $\mathrm{supp}(X)$ is the proportion of records in $T$ that contain $X$;
for a rule $R: X \Rightarrow Y$, $X$ is called the antecedent of the rule and $Y$ is called the consequent of the rule; the support $\mathrm{supp}(R)$ of the rule is the probability that accident causes $X$ and $Y$ occur simultaneously, and the confidence $\mathrm{conf}(R)$ of the rule is the conditional probability that accident cause $Y$ occurs given that accident cause $X$ occurs; when the confidence of rule $R$ is greater than a preset threshold, the occurrence of event $Y$ is considered to be induced by the occurrence of event $X$, and the higher the confidence, the closer the relation between events $X$ and $Y$;
step 2.2: selecting a minimum support S threshold: after different types of accidents of traffic data sets of different administrative regions are distinguished, support degree calculation is carried out on rules of different regions according to a formula (1), and a relation graph of the number of association rules meeting a minimum support degree threshold value and the minimum support degree threshold value is obtained; selecting different minimum support threshold values, taking the minimum support threshold value as a horizontal coordinate, taking the number of association rules meeting the minimum support threshold value as a vertical coordinate, obtaining a support threshold value selection trend graph of each area of various accidents, and selecting the minimum support threshold value;
step 2.3: selecting the minimum confidence threshold C: for the traffic accident data sets of the different accident types, support and confidence are calculated for the rules of the different regions according to formulas (1) and (2); different support and confidence thresholds are set for comparative analysis, and a bubble chart relating the distribution of rules meeting the threshold conditions to the threshold settings is obtained, in which the abscissa corresponds to the support threshold, the ordinate corresponds to the confidence threshold, and a larger bubble indicates a larger number of association rules; the selection ranges of the support and confidence thresholds are balanced accordingly;
step 2.4: selecting the frequent index threshold F: the frequent index is used as the screening index for universality meta-rules across different traffic data sets; an association rule frequent index table based on multiple data sets is established from the association rules mined in the different data sets, and the association rules meeting the frequent index threshold are taken as universality meta-rules, wherein every data set uses consistent support and confidence thresholds when mining association rules, and the Boolean values 1 and 0 indicate that a rule exists or does not exist in a data set, respectively; the frequent index of rule $R_i$ is defined as follows:
$$F(R_i) = \frac{1}{n} \sum_{j=1}^{n} p_{ij}$$
where $p_{ij}$ is the judgment value of rule $R_i$ in data set $T_j$: if rule $R_i$ exists in data set $T_j$, $p_{ij}$ takes 1, otherwise $p_{ij}$ takes 0; $n$ is the number of data sets;
in order to obtain meta-rules that are universal across multiple regions and to ensure that these universality meta-rules are meaningful for analysis, association screening is carried out on the traffic accident data sets of the different accident types in each region, and the association rules that appear repeatedly across regions are screened out; taking the frequent index threshold as the abscissa and the number of strong association rules as the ordinate, a regional association trend graph of the association rules is obtained for each accident type, and the frequent index threshold is selected;
step three, association rule mining based on meta-rules
Step 3.1: for multiple data sets $T_1, T_2, \dots, T_i$ of the same format, setting a consistent minimum support S and minimum confidence C, and performing primary association rule mining to obtain the corresponding association rule sets $R_1, R_2, \dots, R_i$;
step 3.2: according to the frequent indexes of the association rules in the different data sets, extracting meta-rules by screening against the frequent index threshold F, and establishing a meta-rule set;
step 3.3: combining the meta-rule set with the data sets $T_1, T_2, \dots, T_i$, performing secondary mining, and integrating the meta-rules in the multiple data sets;
step 3.4: outputting strong association rules according to the minimum support S and the minimum confidence C, deriving meta-rules based on the associated factors causing each traffic accident type, and obtaining output rules in cellular mode composed of multiple meta-rules with universal characteristics;
step four, rule analysis
qualitatively and quantitatively analyzing the relevance among the accident causes according to the strong association rules, composed of multiple meta-rules, generated for the different accident types, and providing a reference basis for decision-makers.
Further, in step 1.1, the attributes included in the category 5 traffic accident cause information are as follows:
the accident basic information includes: the type of the accident, the accident grade, the casualties, the direct property loss of the accident, the accident time and the accident site;
the information about drivers involved in the accident includes: the sex, age, occupation and driving age of the driver, the physical and psychological conditions when an accident occurs and the driving operation behaviors before and after the accident occur;
the accident vehicle information includes: vehicle type, vehicle safety condition, and vehicle drivability;
the road condition information includes: road grade, road surface form, road surface condition, safety facilities and line shape design;
the environment information includes: road traffic conditions, driving sight distance, weather conditions, sign and line and illumination at the moment of occurrence.
Further, in step 1.2, attribute variables with a non-empty occupancy of greater than 70% are retained by a value analysis method.
Further, in step 1.4, the preset absence threshold is 30%.
Further, in step 1.5, the accident is classified as follows:
loss of property accident: causing damage to vehicles, cargo or other property items, or with minor injury to personnel;
injury accidents: cause the principal to be seriously injured or slightly injured, or can be accompanied with property loss;
death accident: causing death of the party or accompanied personal injury and property loss.
Further, in steps 2.2, 2.3 and 2.4, when selecting various thresholds, firstly, readability of the rules should be ensured, so that the number of the rules is kept within a readable range, and meanwhile, validity of the rules should be ensured, so that the attributes included in the rules are as many as possible.
The beneficial effects of the above technical scheme are as follows:
the method performs a universality analysis on the association rules obtained by primary mining of different data sets, extracts the rules with universal characteristics as meta-rules, integrates the meta-rules in the multiple data sets through secondary mining, and outputs them in cellular mode to obtain output rules composed of multiple meta-rules with universal characteristics. Compared with association rules obtained by traditional mining, the method has the following advantages: (1) association information hidden in traditional association rules is revealed through the universality meta-rules and integrated into the presented mining results; (2) association rules that lack universality across multiple regions are eliminated, and valuable universal rules are screened out; (3) the drawback that traditional association rules contain too many attributes to support decision-making is overcome, the logical relations among the attributes in the antecedent and consequent of a rule are emphasized, and the association rules are expressed in a more readable form. The invention can not only uncover the causes of various traffic accidents but also find the association relations among those causes, thereby helping traffic management departments identify the most important and most critical intervention factors and improving accident prevention.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
The invention provides a traffic accident cause mining method based on a universal meta rule, which comprises the following specific steps as shown in figure 1.
Step one: data preparation
Step 1.1: reading in traffic accident information of all the years, and dividing the traffic accident information into five aspects of traffic accident cause information according to accident basic information, accident-related driver information, accident vehicle information, road condition information and environment information to realize multi-angle description of accident attributes:
1) the accident basic information comprises information such as the type of the accident, the accident grade, accident casualties (the number of injured people, the number of dead people, the number of serious injury people and the number of light injury people), direct property loss of the accident, the accident time and the accident site. This information constitutes a basic description of the road traffic accident itself.
2) The driver information is one of important factors of an accident, and includes basic information such as sex, age, occupation, driving age and the like of the driver, and auxiliary information such as physical and psychological conditions when the accident occurs, driving operation behaviors before and after the accident occurs and the like.
3) The accident vehicle information includes a vehicle type, a vehicle safety condition, a vehicle drivability, and the like.
4) The road condition information covers the urban roads and highways where road traffic accidents occur, and includes road grade, road surface form, road surface condition, safety facilities, alignment design, etc.
5) The environmental information includes road traffic conditions, driving sight distance, weather conditions, mark lines, illumination and the like at the time of the accident, and the environmental information directly or indirectly influences the occurrence of the traffic accident.
Step 1.2: carrying out data quality analysis on the traffic accident data. Through value analysis, attribute variables whose non-empty ratio is greater than 70% are carried into the variable system of the data sample for the next round.
Step 1.3: performing attribute selection on the screened traffic accident data, and removing attributes that may be irrelevant to the mining task or redundant. The goal of attribute selection is to find the minimum set of attributes while ensuring that the probability distribution of the data set is as close as possible to the original distribution obtained using all the attributes. This has the advantage of reducing the number of attributes that appear in the discovered patterns, making the patterns easier to understand.
Step 1.4: carrying out data cleaning on the retained accident cause information, mainly missing value processing and noise filtering. Missing value processing is performed first. Each accident record is independent individual information and there is no obvious correlation between accidents, so in theory missing data cannot be filled in by later analysis. Deletion is therefore used to clean the data: accident records whose attribute missing degree across the 5 types of traffic accident cause information exceeds 30% are removed, which improves data quality and mining value. Noise filtering is then performed with a statistics-based outlier detection algorithm. Because traffic accident records are independent, show no obvious correlation with one another and are highly random, the data cannot in theory be smoothed by regression analysis, so the diagnosed outliers are deleted.
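The screening and cleaning in steps 1.2 and 1.4 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the record layout (a list of dictionaries with `None` for missing values) and the function names are hypothetical, and the 3-sigma z-score test is only one possible choice, since the text says "statistics-based outlier detection" without naming a specific algorithm.

```python
from statistics import mean, pstdev


def keep_attributes(records, min_non_empty=0.70):
    """Step 1.2: keep attributes whose non-empty ratio is greater than 70%."""
    names = {key for record in records for key in record}
    total = len(records)
    return sorted(
        name for name in names
        if sum(record.get(name) is not None for record in records) / total > min_non_empty
    )


def drop_sparse_records(records, attributes, max_missing=0.30):
    """Step 1.4: delete records missing more than 30% of the kept attributes."""
    kept = []
    for record in records:
        missing = sum(record.get(a) is None for a in attributes) / len(attributes)
        if missing <= max_missing:
            kept.append({a: record.get(a) for a in attributes})
    return kept


def drop_outliers(records, attribute, z_max=3.0):
    """Step 1.4: delete records whose numeric value is a z-score outlier.

    Assumption: a 3-sigma rule stands in for the unspecified
    statistics-based outlier detection algorithm.
    """
    values = [r[attribute] for r in records if r.get(attribute) is not None]
    if len(values) < 2:
        return list(records)
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(records)
    return [
        r for r in records
        if r.get(attribute) is None or abs(r[attribute] - mu) / sigma <= z_max
    ]
```

Deleting rather than imputing follows the reasoning in the text: the records are treated as independent, so neighbouring records are not used to fill in or smooth a value.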
Step 1.5: carrying out data reduction on the continuous attributes. So that the characteristics of the various traffic accidents can be summarized by class in the subsequent data mining, and a specific class can be singled out for further analysis, clustering is applied to the continuously distributed attributes. Meanwhile, so that the associated factors causing each kind of traffic accident can be presented conveniently and intuitively from the data mining results, each accident is classified according to the road traffic accident classification standard, based on the recorded number of deaths, number of slight injuries, number of serious injuries and property loss:
a) Property loss accident: damage to vehicles, cargo or other property, possibly accompanied by minor injury to personnel;
b) Injury accident: serious or slight injury to the persons involved, possibly accompanied by property loss;
c) Death accident: death of the persons involved, possibly accompanied by personal injury and property loss.
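The discretization and grading in step 1.5 can be sketched as below. Two simplifications are assumptions rather than part of the text: fixed age bands (borrowed from the age ranges that appear in the worked example later in the description) stand in for the clustering of continuous attributes, and the grading uses a plain precedence of deaths over injuries over property loss, because the classification standard's exact numeric criteria are not reproduced here. All names are hypothetical.

```python
from bisect import bisect_right

# Assumed band edges; the description clusters continuous attributes
# but does not list the resulting intervals.
AGE_EDGES = [19, 24, 30, 36]
AGE_LABELS = ["<19", "19-23", "24-29", "30-35", ">=36"]


def age_band(age):
    """Map a continuous age to a discrete band label."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


def classify_accident(deaths, serious_injuries, slight_injuries):
    """Grade one accident record from its casualty counts.

    Simplified precedence (an assumption): any death gives a death
    accident, otherwise any recorded injury gives an injury accident,
    otherwise the record is a property loss accident.
    """
    if deaths > 0:
        return "death"
    if serious_injuries > 0 or slight_injuries > 0:
        return "injury"
    return "property_loss"
```

After this step every record carries only categorical items (for example an age band and an accident class), which is the form that association rule mining expects.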
Step two: parameter selection
Step 2.1: the support and confidence of a rule are calculated according to the following formulas.
For a rule $R: X \Rightarrow Y$, the support in the traffic data set $T$ is:
$$\mathrm{supp}(R) = P(X \cup Y) = \frac{|\{t \in T : X \cup Y \subseteq t\}|}{|T|} \qquad (1)$$
where $|T|$ is the number of accident records in $T$ and the numerator is the number of records that contain both $X$ and $Y$.
For a rule $R: X \Rightarrow Y$, the confidence in the traffic data set $T$ is:
$$\mathrm{conf}(R) = P(Y \mid X) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)} \qquad (2)$$
where $\mathrm{supp}(X)$ is the proportion of records in $T$ that contain $X$.
For a rule $R: X \Rightarrow Y$, $X$ is called the antecedent of the rule and $Y$ is called the consequent of the rule. The support $\mathrm{supp}(R)$ of the rule is the probability that accident causes $X$ and $Y$ occur at the same time, and the confidence $\mathrm{conf}(R)$ of the rule is the conditional probability that accident cause $Y$ occurs given that accident cause $X$ occurs. When the confidence of rule $R$ is greater than a preset threshold, the occurrence of event $X$ is considered to induce the occurrence of event $Y$, and the higher the confidence, the closer the relation between events $X$ and $Y$.
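The two measures can be computed directly from their definitions. The sketch below assumes each accident record has already been turned into a set of `attribute=value` strings; that encoding and the function names are illustrative choices, not something the description prescribes.

```python
def support(transactions, itemset):
    """Share of records that contain every item in `itemset` (formula (1))."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Conditional share of records containing the consequent among those
    containing the antecedent (formula (2)); 0.0 if the antecedent never occurs."""
    s_x = support(transactions, antecedent)
    if s_x == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_x


# Toy records, invented for illustration only.
records = [
    frozenset({"weather=rain", "behavior=tailgating", "type=property_loss"}),
    frozenset({"weather=rain", "behavior=tailgating"}),
    frozenset({"weather=rain", "behavior=lane_change"}),
    frozenset({"weather=clear", "behavior=lane_change"}),
]
rule_support = support(records, {"weather=rain", "behavior=tailgating"})
rule_confidence = confidence(records, {"weather=rain"}, {"behavior=tailgating"})
```

In the toy data, rain and tailgating co-occur in 2 of 4 records, and rain occurs in 3 of 4, so the rule "rain implies tailgating" has support 2/4 and confidence 2/3.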
Step 2.2: selecting a minimum support S threshold: after different types of accidents of traffic data sets of different administrative regions are distinguished, support degree calculation is carried out on rules of different regions according to a formula (1), and a relation graph of the number of association rules meeting a minimum support degree threshold value and the minimum support degree threshold value is obtained; by selecting different minimum support threshold values, taking the minimum support threshold value as a horizontal coordinate and taking the number of association rules meeting the minimum support threshold value as a vertical coordinate, a support threshold value selection trend graph of each area of various accidents is obtained, and the minimum support threshold value is selected.
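The data behind the support threshold trend graph in step 2.2 can be produced by counting how many candidate rules survive each candidate threshold. The sketch below is an assumption-laden illustration: it enumerates rules by brute force with a single-item consequent (practical only for small item counts; a real system would use an Apriori- or FP-growth-style miner), and the function names are hypothetical.

```python
from itertools import combinations


def enumerate_rules(transactions, max_size=3):
    """Brute-force every rule X => y whose itemset X ∪ {y} has at most
    `max_size` items and occurs at least once.

    Returns tuples (antecedent, consequent_item, support, confidence).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            s_whole = supp(whole)
            if s_whole == 0:
                continue
            for y in combo:
                x = whole - {y}
                rules.append((x, y, s_whole, s_whole / supp(x)))
    return rules


def rule_count_curve(rules, support_thresholds):
    """For each candidate minimum support, count the rules that meet it.

    The (threshold, count) pairs are the x and y values of the trend graph.
    """
    return [(s, sum(rule[2] >= s for rule in rules)) for s in support_thresholds]
```

Running this once per region and accident type gives one curve per data set; the minimum support would then be read off where the curves flatten to a readable number of rules.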
Step 2.3: selecting the minimum confidence threshold C: for the traffic accident data sets of the different accident types, support and confidence are calculated for the rules of the different regions according to formulas (1) and (2); different support and confidence thresholds are set for comparative analysis, and a bubble chart relating the distribution of rules meeting the threshold conditions to the threshold settings is obtained, in which the abscissa corresponds to the support threshold, the ordinate corresponds to the confidence threshold, and a larger bubble indicates a larger number of association rules. The selection ranges of the support and confidence thresholds are balanced accordingly.
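The bubble chart in step 2.3 needs one rule count per (support threshold, confidence threshold) pair. A minimal sketch follows, assuming the rules have already been mined and reduced to (support, confidence) pairs; the function name and the sample values are invented for illustration, and plotting is left out.

```python
def bubble_grid(rule_measures, support_thresholds, confidence_thresholds):
    """Count the rules meeting each (support, confidence) threshold pair.

    `rule_measures` is a list of (support, confidence) pairs. The returned
    dictionary maps each threshold pair to its bubble size.
    """
    return {
        (s, c): sum(rs >= s and rc >= c for rs, rc in rule_measures)
        for s in support_thresholds
        for c in confidence_thresholds
    }


# Made-up measures for four rules.
measures = [(0.5, 0.9), (0.35, 0.75), (0.3, 0.6), (0.1, 0.95)]
grid = bubble_grid(measures, [0.2, 0.3], [0.7, 0.8])
```

Each dictionary key gives a bubble position (abscissa, ordinate) and each value gives its size, so the grid can be handed to any scatter-plot routine.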
Step 2.4: selecting the frequent index threshold F: the frequent index is used as the screening index for meta-rules with universal characteristics across different traffic data sets. An association rule frequent index table based on multiple data sets is established from the association rules mined in the different data sets, and the association rules meeting the frequent index threshold are taken as universality meta-rules, wherein every data set uses consistent support and confidence thresholds when mining association rules, and the Boolean values 1 and 0 indicate that a rule exists or does not exist in a data set, respectively. The frequent index of rule $R_i$ is defined as follows:
$$F(R_i) = \frac{1}{n} \sum_{j=1}^{n} p_{ij}$$
where $p_{ij}$ is the judgment value of rule $R_i$ in data set $T_j$: if rule $R_i$ exists in data set $T_j$, $p_{ij}$ takes 1, otherwise $p_{ij}$ takes 0; $n$ is the number of data sets.
In order to obtain meta-rules that are universal across multiple regions and to ensure that these universality meta-rules are meaningful for analysis, association screening is carried out on the traffic accident data sets of the different accident types in each region, and the association rules that appear repeatedly across regions are screened out. Taking the frequent index threshold as the abscissa and the number of strong association rules as the ordinate, a regional association trend graph of the association rules is obtained for each accident type, and the frequent index threshold is selected.
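The frequent index is the share of data sets in which a rule was mined, so it can be computed from the per-data-set rule sets alone. In the sketch below a rule is represented as an (antecedent, consequent) pair of frozensets and each data set contributes one Python set of such rules; this representation and the names are assumptions made for illustration.

```python
def frequent_index(rule, rule_sets):
    """Fraction of data sets whose mined rule set contains `rule`.

    `rule in rule_set` plays the role of the Boolean judgment value
    (1 if the rule exists in that data set, 0 otherwise).
    """
    return sum(rule in rule_set for rule_set in rule_sets) / len(rule_sets)


def universal_meta_rules(rule_sets, min_f):
    """Keep the rules whose frequent index meets the threshold `min_f`."""
    candidates = set().union(*rule_sets)
    return {rule for rule in candidates if frequent_index(rule, rule_sets) >= min_f}


# Three hypothetical regions and three hypothetical rules.
a = (frozenset({"weather=rain"}), frozenset({"behavior=tailgating"}))
b = (frozenset({"age=19-23"}), frozenset({"behavior=1225"}))
c = (frozenset({"weather=clear"}), frozenset({"behavior=lane_change"}))
region_rules = [{a, b, c}, {a, b}, {a}]
meta_rules = universal_meta_rules(region_rules, min_f=0.55)
```

With three regions, rule `a` appears in all three (index 1), `b` in two (index 2/3) and `c` in one (index 1/3), so a threshold of 0.55 keeps `a` and `b` as meta-rules.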
When selecting the various thresholds, the readability of the rules should be ensured first, so that the number of rules stays within a readable range (generally fewer than 200 rules); at the same time, the validity of the rules should be ensured, so that the rules contain as many attributes as possible.
Step three: association rule mining based on meta-rules
Step 3.1: for multiple data sets $T_1, T_2, \dots, T_i$ of the same format, a consistent minimum support S and minimum confidence C are set, and primary association rule mining is performed to obtain the corresponding association rule sets $R_1, R_2, \dots, R_i$.
Step 3.2: according to the frequent indexes of the association rules in the different data sets, meta-rules are extracted by screening against the frequent index threshold F, and a meta-rule set is established.
Step 3.3: the meta-rule set is combined with the data sets $T_1, T_2, \dots, T_i$ for secondary mining, and the meta-rules in the multiple data sets are integrated.
Step 3.4: generating association rules. Strong association rules are output according to the minimum support S and the minimum confidence C, meta-rules based on the associated factors causing each traffic accident type are derived, and output rules in cellular mode, composed of multiple meta-rules with universal characteristics, are obtained. The output follows the rule template
$$\big((P_1, \dots, P_i) \Rightarrow (P_j, \dots, P_k)\big) \Rightarrow Q(Y)$$
which represents that the occurrence of accident causes $P_1, \dots, P_i, P_j, \dots, P_k$ induces the occurrence of the accident cause $Q(Y)$ and that, among the accident causes $P_1, \dots, P_i, P_j, \dots, P_k$, the occurrence of $P_1, \dots, P_i$ induces the occurrence of $P_j, \dots, P_k$. The cellular mode is chosen as the output mode mainly because of its enveloping structure: an output rule can contain both meta-rules and single attributes, which together form the output rule, so that the information it carries is more complete and more amenable to analysis. The association rules mined from the different data sets are screened and then presented as cell groups in the antecedent and consequent of the association rules; the cell groups take three forms, among them attribute with attribute and attribute with rule. In practical application, by considering the association rules among the influencing factors, traffic accidents can be prevented by controlling only a small number of influencing factors.
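One way to read the secondary mining and cellular output of steps 3.3 and 3.4 is sketched below. The description does not spell out the procedure, so this is an interpretation rather than the patented algorithm: each meta-rule `X => Y` is treated as a single cell, and the nested rule `(X => Y) => Q` is scored by how often records that contain the whole cell also carry the accident type `Q`. Function names and the toy records are hypothetical.

```python
def cell_rules(transactions, meta_rules, target, min_s, min_c):
    """Score each nested rule (X => Y) => target against one data set.

    Support is the share of records containing X, Y and the target;
    confidence is that share relative to records containing X and Y.
    Only rules meeting both thresholds are returned.
    """
    n = len(transactions)
    kept = []
    for x, y in meta_rules:
        cell = x | y
        n_cell = sum(cell <= t for t in transactions)
        if n_cell == 0:
            continue
        n_all = sum(cell <= t and target in t for t in transactions)
        s, c = n_all / n, n_all / n_cell
        if s >= min_s and c >= min_c:
            kept.append(((x, y), target, s, c))
    return kept


def render(cell_rule):
    """Format a nested rule as text in the cellular template."""
    (x, y), target, _, _ = cell_rule
    return f"({' & '.join(sorted(x))} => {' & '.join(sorted(y))}) => {target}"


# Toy records, invented for illustration only.
records = [
    frozenset({"age=19-23", "behavior=1225", "type=1"}),
    frozenset({"age=19-23", "behavior=1225", "type=1"}),
    frozenset({"age=19-23", "behavior=1225", "type=2"}),
    frozenset({"age=30-35", "behavior=1094", "type=1"}),
]
meta = [(frozenset({"age=19-23"}), frozenset({"behavior=1225"}))]
found = cell_rules(records, meta, "type=1", min_s=0.3, min_c=0.6)
```

Keeping the meta-rule as a nested pair, rather than flattening it into one long antecedent, preserves the inner implication that the cellular template is meant to show.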
Step four: rule analysis
The relevance among the accident factors is analyzed qualitatively and quantitatively according to the strong association rules, composed of multiple meta-rules, generated in step three for the different accident types, providing a reference basis for decision-makers.
For example, using the traffic accident data of Shenzhen city (2014-) with the thresholds S ≥ 30%, C ≥ 70% and F ≥ 55%, the association rule results under each traffic accident type are obtained, as shown in the following table:
Table 1. Shenzhen city traffic accident association rule results (partial)
Figure BDA0001732765330000111
Analysis of the association rule results supports the following suggestions. In terms of weather, when the weather is clear, drivers are more likely to cause accidents through arbitrary lane changes; when it rains, failing to keep a safe distance from the vehicle ahead and unsafe driving become the main accident-causing behaviors. The accident-prone time period is 17:00-19:59, and the accident area is a safe area. Traffic broadcast reminders can therefore be issued according to weather, time period, and location to strengthen drivers' safety awareness. On the driver side, drivers aged 19 to 23 and 30 to 35 are high-incidence groups for type 1 accidents (property loss accidents), but their association characteristics differ. For drivers aged 19 to 23, type 1 accidents are most likely caused by driving behavior code 1225, i.e., other behaviors that hinder safe driving, and imperfect road markings are the main cause of such behaviors. For drivers aged 30 to 35, the main violation in type 1 accidents is code 1094, i.e., failing to keep a safe distance from the vehicle ahead, and the accident sites are mostly ordinary urban roads. Since drivers aged 19 to 23 are mostly novices who have just graduated from driving school, it is recommended that driving schools strengthen training in traffic safety awareness, and that management take group differences among drivers into account, with emphasis on novice drivers.
Most drivers aged 24 to 29 have accumulated 4 to 6 years of driving experience, which is the period when safety awareness is weakest. It is therefore suggested that traffic safety education be carried out when the first driver's license is due for renewal, using actual cases to reinforce safety awareness. Because the audience is very large, the education can be delivered through online quizzes, online videos, and similar means, and passing the safety education can be made a condition for renewing the first driver's license.
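The threshold screening used in the Shenzhen example (S ≥ 30%, C ≥ 70%, F ≥ 55%) can be illustrated with a minimal sketch. The `RuleStats` record, its field names, and the three candidate rules with their numbers are hypothetical and are not results from the patent; only the three threshold values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleStats:
    """Hypothetical record for one mined rule; field names are illustrative."""
    name: str
    support: float         # S, as a fraction
    confidence: float      # C, as a fraction
    frequent_index: float  # F, as a fraction


# Threshold values quoted in the Shenzhen example.
MIN_SUPPORT = 0.30
MIN_CONFIDENCE = 0.70
MIN_FREQUENT_INDEX = 0.55


def passes_thresholds(rule: RuleStats) -> bool:
    """A rule is kept only if it meets all three thresholds."""
    return (
        rule.support >= MIN_SUPPORT
        and rule.confidence >= MIN_CONFIDENCE
        and rule.frequent_index >= MIN_FREQUENT_INDEX
    )


# Made-up values for illustration only.
candidates = [
    RuleStats("rule_a", 0.35, 0.80, 0.60),  # meets all three thresholds
    RuleStats("rule_b", 0.28, 0.90, 0.75),  # support too low
    RuleStats("rule_c", 0.40, 0.72, 0.50),  # frequent index too low
]

kept = [r.name for r in candidates if passes_thresholds(r)]
print(kept)
```

Only rules that satisfy all three conditions together survive, which is why a rule with high confidence but low support is still dropped.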
The embodiments are only for illustrating the technical idea of the present invention, and the technical idea of the present invention is not limited thereto, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the scope of the present invention.

Claims (6)

1. A traffic accident cause mining method based on a universal meta-rule is characterized by comprising the following steps:
Step one, data preparation
Step 1.1: reading traffic accident information from past years and dividing it into 5 types of traffic accident cause information, namely accident basic information, accident driver information, accident vehicle information, road condition information and environment information, wherein each type of traffic accident cause information is described by multiple attributes;
Step 1.2: performing data quality analysis on the read traffic accident information, and screening and retaining the attribute variables of qualified quality;
Step 1.3: performing attribute selection on the screened traffic accident information to remove attributes that are irrelevant or redundant to the mining task, wherein the aim of attribute selection is to find a minimum attribute set while keeping the probability distribution of the data set as close as possible to the original distribution obtained using all attributes;
Step 1.4: performing data cleaning on the traffic accident information obtained in step 1.3, wherein the data cleaning comprises missing value processing and noise filtering; the missing value processing uses a deletion method to delete, from the 5 types of traffic accident cause information, the information whose attribute missing degree exceeds a preset missing threshold; the noise filtering uses an outlier detection algorithm based on a statistical method to identify outliers in the data and delete them;
Step 1.5: clustering the continuously distributed attributes, and classifying each accident according to the road traffic accident classification standard;
Step two, parameter selection
Step 2.1: the support and confidence of the rules are calculated as follows:
the support of rule $R: X \Rightarrow Y$ in the traffic data set $T$ is:
$$S(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{|T|} \qquad (1)$$
wherein $\sigma(X \cup Y)$ is the number of records in $T$ that contain both $X$ and $Y$, and $|T|$ is the total number of records in $T$;
the confidence of rule $R: X \Rightarrow Y$ in the traffic data set $T$ is:
$$C(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)} \qquad (2)$$
wherein $\sigma(X)$ is the number of records in $T$ that contain $X$;
for rule $R: X \Rightarrow Y$, X is called the antecedent of the rule and Y is called the consequent of the rule; the support of rule R, $S(X \Rightarrow Y)$, is the probability that accident causes X and Y occur simultaneously, and the confidence of rule R, $C(X \Rightarrow Y)$, is the conditional probability that accident cause Y occurs given that accident cause X occurs; when the confidence of rule R is greater than the preset threshold, the occurrence of event X is considered to induce the occurrence of event Y, and the higher the confidence, the closer the relation between the two;
Step 2.2: selecting a minimum support S threshold: after distinguishing the different accident types in the traffic data sets of different administrative regions, calculating the support of the rules in each region according to formula (1), and obtaining the relation between the minimum support threshold and the number of association rules meeting it; trying different minimum support thresholds, plotting the minimum support threshold on the abscissa and the number of association rules meeting it on the ordinate to obtain a support threshold selection trend graph for each region and each accident type, and selecting the minimum support threshold;
Step 2.3: selecting a minimum confidence C threshold: for the traffic accident data sets of different types, calculating the support and confidence of the rules in each region according to formulas (1) and (2), setting different support and confidence thresholds for comparative analysis, and obtaining a bubble chart relating the threshold settings to the distribution of rules that meet them, so as to balance the selection ranges of the support and confidence thresholds, wherein the abscissa corresponds to the support threshold, the ordinate corresponds to the confidence threshold, and a larger bubble indicates a larger number of association rules;
Step 2.4: selecting a frequent index F threshold: the index used to screen universal meta-rules across different traffic data sets is the frequent index; an association rule frequent index table based on multiple data sets is established from the association rules mined in each data set separately, and the association rules meeting the frequent index threshold are taken as the universal meta-rules, wherein every data set uses the same support and confidence thresholds when mining association rules, the Boolean values 1 and 0 indicate that a rule exists or does not exist respectively, and the frequent index of rule $R_i$ is defined as follows:
$$F(R_i) = \frac{1}{n} \sum_{j=1}^{n} p_{ij}$$
wherein $p_{ij}$ is the judgment value of rule $R_i$ in data set $T_j$: if rule $R_i$ exists in data set $T_j$, then $p_{ij}$ takes 1, otherwise $p_{ij}$ takes 0; $n$ is the number of data sets;
in order to obtain meta-rules that are universal across multiple regions and to ensure that the obtained universal meta-rules have analytical significance, association screening is performed on the traffic accident data sets of different types in each region, and the association rules that appear repeatedly across regions are screened out; regional association trend graphs of the association rules for each accident type are obtained by taking the frequent index threshold as the abscissa and the number of strong association rules as the ordinate, and the frequent index threshold is selected;
Step three, association rule mining based on meta-rules
Step 3.1: for a plurality of data sets $T_1, T_2, \ldots, T_i$ with the same format, setting a consistent minimum support S and minimum confidence C, and performing a first round of association rule mining to obtain the corresponding association rules $R_1, R_2, \ldots, R_i$;
Step 3.2: according to the frequent index of each association rule across the different data sets, extracting meta-rules by screening with the frequent index F threshold, and establishing a meta-rule set;
Step 3.3: combining the meta-rule set with the data sets $T_1, T_2, \ldots, T_i$ to perform secondary mining, and integrating the meta-rules across the multiple data sets;
Step 3.4: outputting strong association rules according to the minimum support S and the minimum confidence C, deriving meta-rules based on the association factors that lead to each traffic accident type, and obtaining output rules, output in cell form, that are composed of multiple meta-rules with universal characteristics;
Step four, rule analysis
qualitatively and quantitatively analyzing the relevance among accident causes according to the strong association rules composed of the generated multiple meta-rules under different accident types, and providing a reference basis for decision makers.
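Steps 2.1, 2.4, 3.1 and 3.2 of claim 1 can be sketched in a few lines, under stated assumptions. The sketch takes support as the fraction of records containing both X and Y, confidence as the conditional fraction given X, and the frequent index as the share of data sets in which a rule is strong. Candidate rules are supplied by hand because the claim does not name a rule generation algorithm; the function names, the item labels, and the three toy "regions" are all hypothetical, and the secondary mining of step 3.3 is not covered.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Record = FrozenSet[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent X, consequent Y)


def support(dataset: Sequence[Record], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Fraction of records that contain every item of X and of Y."""
    both = x | y
    return sum(1 for t in dataset if both <= t) / len(dataset)


def confidence(dataset: Sequence[Record], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Among records containing X, the fraction that also contain Y."""
    n_x = sum(1 for t in dataset if x <= t)
    if n_x == 0:
        return 0.0
    return sum(1 for t in dataset if (x | y) <= t) / n_x


def strong_rules(dataset: Sequence[Record], candidates: Sequence[Rule],
                 min_s: float, min_c: float) -> Set[Rule]:
    """Step 3.1 for one data set: keep candidates meeting both thresholds."""
    return {(x, y) for x, y in candidates
            if support(dataset, x, y) >= min_s and confidence(dataset, x, y) >= min_c}


def frequent_index(rule: Rule, rule_sets: Sequence[Set[Rule]]) -> float:
    """Share of data sets in which the rule is present (p_ij = 1)."""
    return sum(1 for rules in rule_sets if rule in rules) / len(rule_sets)


def meta_rules(datasets: Sequence[Sequence[Record]], candidates: Sequence[Rule],
               min_s: float, min_c: float, min_f: float) -> Set[Rule]:
    """Step 3.2: rules whose frequent index meets the F threshold."""
    rule_sets: List[Set[Rule]] = [strong_rules(d, candidates, min_s, min_c) for d in datasets]
    mined = set().union(*rule_sets)
    return {r for r in mined if frequent_index(r, rule_sets) >= min_f}


# Toy data, invented for illustration; not taken from the patent.
RAIN, CLOSE, URBAN = "weather=rain", "behavior=close_following", "road=urban"
region_1 = [frozenset({RAIN, CLOSE})] * 3 + [frozenset({URBAN})]
region_2 = [frozenset({RAIN, CLOSE})] * 2 + [frozenset({RAIN}), frozenset({URBAN})]
region_3 = [frozenset({RAIN, CLOSE})] + [frozenset({RAIN, URBAN})] * 3

rule_close: Rule = (frozenset({RAIN}), frozenset({CLOSE}))
rule_urban: Rule = (frozenset({RAIN}), frozenset({URBAN}))

result = meta_rules([region_1, region_2, region_3], [rule_close, rule_urban],
                    min_s=0.3, min_c=0.6, min_f=0.55)
print(result == {rule_close})
```

In this toy input `rule_close` is strong in two of the three regions, so its frequent index is 2/3 and it is kept, while `rule_urban` is strong in only one region and is screened out.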
2. The method for mining the cause of the traffic accident based on the universal meta-rule as claimed in claim 1, wherein in step 1.1, the category 5 traffic accident cause information comprises the following attributes:
the accident basic information includes: the type of the accident, the accident grade, the casualties, the direct property loss of the accident, the accident time and the accident site;
the information about drivers involved in the accident includes: the sex, age, occupation and driving age of the driver, the physical and psychological conditions when an accident occurs and the driving operation behaviors before and after the accident occur;
the accident vehicle information includes: vehicle type, vehicle safety condition, and vehicle drivability;
the road condition information includes: road grade, road surface form, road surface condition, safety facilities and linear design;
the environment information includes: road traffic conditions, driving sight distance, weather conditions, signs and markings, and illumination at the time of the accident.
3. The mining method for traffic accident causes based on universal meta-rules according to claim 1, characterized in that in step 1.2, attribute variables with non-null percentage higher than 70% are retained by a value analysis method.
4. The method for mining the cause of the traffic accident based on the universal meta-rule as claimed in claim 1, wherein the predetermined loss threshold is 30% in step 1.4.
5. The method for mining the cause of a traffic accident based on the universal meta-rule as claimed in claim 1, wherein in step 1.5, the accident is classified as follows:
property loss accident: causing damage to vehicles, cargo or other property, possibly accompanied by minor injury to persons;
injury accident: causing serious or slight injury to a party, possibly accompanied by property loss;
death accident: causing the death of a party, possibly accompanied by personal injury and property loss.
6. The method as claimed in claim 1, wherein in steps 2.2, 2.3 and 2.4, when selecting the thresholds of each type, readability of the rules should be ensured first, so that the number of the rules is kept within a readable range, and validity of the rules should be ensured, so that the attributes included in the rules are as many as possible.
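The trade-off in claim 6, together with the trend graph of step 2.2, can be sketched as a simple sweep over candidate thresholds. This is one possible reading, not the patent's stated procedure: it counts how many rules meet each minimum support threshold and then picks the lowest threshold whose rule count is still readable, on the assumption that lower thresholds retain more rules and attributes. The `max_readable` limit, the function names, and the support values are hypothetical.

```python
from typing import Dict, Optional, Sequence


def rules_per_threshold(rule_supports: Sequence[float],
                        thresholds: Sequence[float]) -> Dict[float, int]:
    """For each candidate threshold, count the rules whose support meets it."""
    return {t: sum(1 for s in rule_supports if s >= t) for t in thresholds}


def pick_threshold(counts: Dict[float, int], max_readable: int) -> Optional[float]:
    """Lowest threshold that keeps the rule count within the readable limit.

    Returns None if no candidate threshold is strict enough.
    """
    for t in sorted(counts):
        if counts[t] <= max_readable:
            return t
    return None


# Made-up support values for eight rules; illustration only.
supports = [0.10, 0.15, 0.22, 0.31, 0.35, 0.48, 0.52, 0.70]
candidate_thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]

counts = rules_per_threshold(supports, candidate_thresholds)
chosen = pick_threshold(counts, max_readable=5)
print(counts)
print(chosen)
```

Plotting `counts` with the threshold on the abscissa and the rule count on the ordinate gives the kind of trend graph described in step 2.2; the same sweep could be repeated for the confidence and frequent index thresholds.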
CN201810781739.9A 2018-07-17 2018-07-17 Traffic accident cause mining method based on universality meta-rule Active CN108717786B (en)

Publications (2)

Publication Number Publication Date
CN108717786A CN108717786A (en) 2018-10-30
CN108717786B true CN108717786B (en) 2022-06-17

Family

ID=63914019

Country Status (1)

Country Link
CN (1) CN108717786B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant