CN113808392B - Method for optimizing traffic accident data under multi-source data structure - Google Patents
- Publication number
- CN113808392B (application No. CN202110975201.3A)
- Authority
- CN
- China
- Prior art keywords
- traffic
- road section
- data
- road
- source
- Legal status
- Active
Classifications
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125—Traffic data processing
- G08G1/0129—Traffic data processing for creating historical data or processing based on historical data
- G08G1/0137—Measuring and analyzing of parameters relative to traffic conditions for specific applications
Abstract
The invention discloses a method for optimizing traffic accident data under a multi-source data structure, comprising the following steps: (1) collecting multi-source traffic data; (2) constructing a generative model that matches the distributional form of each type of multi-source data; (3) balancing the traffic accident data structure; and (4) verifying and evaluating the optimized data. The method first collects and aggregates multi-source traffic accident data and determines the distributional form of each traffic data type, then constructs an accident data generation model based on these distributional forms, and finally verifies and evaluates the optimized data set with a road safety analysis model. The method substantially reduces the influence of an unbalanced traffic accident data structure on the safety analysis model and yields accurate and reliable traffic safety evaluation results.
Description
Technical Field
The invention relates to a method for optimizing traffic accident data under a multi-source data structure, and belongs to the technical field of traffic data structures.
Background
In recent years, building road safety accident analysis models has become a research hotspot in traffic safety; however, model performance depends largely on the quality of the traffic accident data structure. Because traffic accidents, and serious accidents in particular, are low-probability events, accident data are often unbalanced: accident samples are far fewer than zero-accident samples (the excess-zeros phenomenon). At present, most work in both the research literature and patent applications relies on traditional statistical analysis models, such as the zero-inflated Poisson regression model and bootstrap resampling. With the development of advanced data mining techniques, upsampling and downsampling techniques have begun to be used to balance and optimize data structures, for example the synthetic minority oversampling technique (SMOTE) and generative adversarial networks (GANs).
However, these methods usually assign a single common likelihood function to all variables when generating a new data set and ignore the heterogeneity among variables, which degrades model fit and the identification of safety factors. Therefore, to ensure effective data generation and obtain accurate and reliable safety evaluation results, a likelihood function matching the distributional form of each variable must be constructed separately, and these likelihoods are used to generate a new data set that balances the accident data structure.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for optimizing traffic accident data under a multi-source data structure that substantially reduces the influence of an unbalanced traffic accident data structure on the safety analysis model and yields accurate and reliable traffic safety evaluation results.
The invention adopts the following technical scheme for solving the technical problems:
a method for optimizing traffic accident data under a multi-source data structure comprises the following steps:
step 1, collecting multi-source traffic data, namely acquiring multi-source traffic safety influence factor data;
step 2, constructing a generative model that matches the distributional form of the multi-source traffic data, namely constructing a distribution function for each influencing factor acquired in step 1;
step 3, performing proliferation (oversampling) optimization on the multi-source traffic data acquired in step 1 based on the generative model constructed in step 2, so that the ratio of accident samples to zero-accident samples in the processed multi-source traffic data is 1:4.
As a further scheme of the invention, the method for optimizing traffic accident data further comprises step 4: constructing a traffic safety analysis model and verifying the proliferation optimization result according to the model's goodness-of-fit indices.
As a preferred scheme of the invention, the multi-source traffic safety influencing factors in step 1 include: the total number N of annual traffic accidents on the road section, the road section length L, the road section average daily traffic volume Q, the road section average speed V, the road section traffic node density S, the road grade A, the road width W, the number of lanes K, and the presence of a bus lane B.
As a preferred scheme of the invention, the specific process of the step 2 is as follows:
dividing the multi-source traffic safety influencing factors into count variables, real-valued variables, categorical variables and ordinal variables;
the count variable is the total number N of annual traffic accidents on the road section, and its distribution function is constructed according to formula (1):
$$p(N=G)=\frac{\lambda^{G}e^{-\lambda}}{G!} \qquad (1)$$
where p(N = G) denotes the probability that G accidents occur on the road section, λ denotes the average number of accidents per unit time or unit area, and G is a natural number;
the real-valued variables comprise the road section length L, the road section average daily traffic volume Q, the road section traffic node density S and the road width W, and their distribution function is constructed according to formula (2):
$$p(Z=J)=\mathcal{N}\left(J\mid\mu(I),\sigma(I)^{2}\right)=\frac{1}{\sqrt{2\pi\sigma(I)^{2}}}\exp\left(-\frac{\left(J-\mu(I)\right)^{2}}{2\sigma(I)^{2}}\right) \qquad (2)$$
where Z denotes a real-valued variable, p(Z = J) denotes the probability (density) that the real-valued variable takes the value J, 𝒩 denotes the normal distribution function, μ(I) and σ(I)² are the mean and variance of the Gaussian distribution, and I denotes the actual observed value of the real-valued variable;
the categorical variables comprise the road grade A, the number of lanes K and the presence of a bus lane B, and their distribution function is constructed as in formula (3):
$$p(H=C)=\frac{\exp\left(\pi_{C}(F)\right)}{\sum_{q=1}^{U}\exp\left(\pi_{q}(F)\right)} \qquad (3)$$
where H denotes a categorical variable, p(H = C) denotes the probability that the categorical variable takes the value C, π_C(F) and π_q(F) are parameters of the multinomial logit model, F denotes the actual observed value of the categorical variable, and U is a natural number (the number of categories);
the ordinal variable is the road section average speed V, and its distribution function is constructed according to formulas (4) and (5):
$$p(V=R)=p(V\le R)-p(V\le R-1) \qquad (4)$$
where p(V = R) denotes the probability that the average speed takes the value R, p(V ≤ R) denotes the probability that it is less than or equal to R, p(V ≤ R−1) denotes the probability that it is less than or equal to R−1, R is a natural number, ω_R(E) denotes the segment threshold corresponding to the value R, ψ_V(E) is a model parameter, and E is the actual observed value of the ordinal variable.
As a preferred embodiment of the present invention, the traffic safety analysis model is represented by the following formulas (6) and (7):
$$\ln(N)=\theta+\theta_{1}L+\theta_{2}Q+\theta_{3}V+\theta_{4}S+\theta_{5}A+\theta_{6}W+\theta_{7}K+\theta_{8}B \qquad (6)$$
$$\mathrm{AIC}=-2\ln(Y)+2T,\qquad \mathrm{BIC}=\ln(n)\,T-2\ln(Y) \qquad (7)$$
where N denotes the total number of annual traffic accidents on the road section, L the road section length, Q the road section average daily traffic volume, V the road section average speed, S the road section traffic node density, A the road grade, W the road width, K the number of lanes, and B the presence of a bus lane; θ, θ_1, θ_2, θ_3, θ_4, θ_5, θ_6, θ_7 and θ_8 are the coefficients of the traffic safety analysis model; AIC denotes the Akaike information criterion, BIC denotes the Bayesian information criterion, Y denotes the maximum likelihood value, T denotes the number of model parameters, and n is the number of observed samples.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. The invention determines the distributional form of each traffic data type, constructs an accident data generation model based on these forms, and verifies and evaluates the optimized data set with a road safety analysis model, thereby substantially reducing the influence of an unbalanced traffic accident data structure on the safety analysis model and making traffic safety evaluation results more accurate and reliable.
2. The invention constructs, for each type of variable data, a likelihood function matching its own distribution, which ensures effective data generation and accurate, reliable safety evaluation results.
Drawings
FIG. 1 is a flow chart of a method of optimizing traffic accident data in a multi-source data structure in accordance with the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1, the method for optimizing traffic accident data under a multi-source data structure provided by the present invention includes the following steps:
step 1, collecting multi-source traffic data: the following multi-source traffic safety influencing factors are obtained through field surveys and inquiries to the relevant traffic authorities: the total number N of annual traffic accidents on the road section, the road section length L, the road section average daily traffic volume Q, the road section average speed V, the road section traffic node density S, the road grade A, the road width W, the number of lanes K, and the presence of a bus lane B;
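For illustration, one road-section record from step 1 can be held in a small data structure that also notes which of the four variable types used in step 2 each factor belongs to. This is a minimal sketch: the class name `RoadSegment`, the dictionary `VARIABLE_TYPES` and the integer coding of A, K, B and V are hypothetical, and the source gives no units.

```python
from dataclasses import dataclass

# Variable-type assignment used by step 2: count, real-valued,
# categorical and ordinal. Keys mirror the symbols in the text.
VARIABLE_TYPES = {
    "N": "count",        # total annual traffic accidents on the road section
    "L": "real",         # road section length
    "Q": "real",         # average daily traffic volume
    "S": "real",         # traffic node density
    "W": "real",         # road width
    "A": "categorical",  # road grade (hypothetical integer code)
    "K": "categorical",  # number of lanes
    "B": "categorical",  # bus lane present (1) or absent (0), hypothetical coding
    "V": "ordinal",      # average speed, as a 1-based speed segment index
}


@dataclass
class RoadSegment:
    """One hypothetical road-section record; units are not specified in the source."""
    N: int
    L: float
    Q: float
    V: int
    S: float
    A: int
    W: float
    K: int
    B: int

    def is_accident_sample(self) -> bool:
        # Step 3 calls N != 0 an accident sample and N == 0 a zero-accident sample.
        return self.N != 0
```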
step 2, constructing a generative model that matches the distributional form of the multi-source data: a suitable distribution function is constructed for each factor from step 1, as follows:
the count variable (total number N of annual traffic accidents), as in formula (1):
$$p(N=G)=\frac{\lambda^{G}e^{-\lambda}}{G!} \qquad (1)$$
where p(N = G) denotes the probability that G accidents occur on the road section, and λ denotes the average number of accidents per unit time or unit area.
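A minimal Python sketch of the count-variable model in formula (1), assuming a plain Poisson distribution whose λ is estimated as the sample mean of the observed counts (its maximum-likelihood estimate). The function names are hypothetical. Because a λ fitted to accident samples can still produce draws of 0, the helper `sample_accident_count` redraws until it gets a non-zero count (a zero-truncated variant); this is an assumption, as the source does not say how such draws are handled.

```python
import math
import random


def poisson_pmf(g: int, lam: float) -> float:
    """p(N = g) = lam**g * exp(-lam) / g!, as in formula (1)."""
    if g < 0:
        return 0.0
    if lam == 0.0:
        return 1.0 if g == 0 else 0.0
    # Evaluate in log space so large g does not overflow the factorial.
    return math.exp(g * math.log(lam) - lam - math.lgamma(g + 1))


def fit_lambda(counts) -> float:
    """Maximum-likelihood estimate of lambda: the mean of the observed counts."""
    return sum(counts) / len(counts)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """One Poisson draw by Knuth's multiplication method (fine for small lambda)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_accident_count(lam: float, rng: random.Random, max_tries: int = 1000) -> int:
    """ASSUMPTION: reject zero draws so a synthetic accident sample keeps N != 0."""
    for _ in range(max_tries):
        g = sample_poisson(lam, rng)
        if g > 0:
            return g
    raise ValueError("lambda too small to draw a non-zero count")
```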
the real-valued variables (road section length L, road width W, road section traffic node density S, and road section average daily traffic volume Q), as in formula (2):
$$p(Z=J)=\mathcal{N}\left(J\mid\mu(I),\sigma(I)^{2}\right)=\frac{1}{\sqrt{2\pi\sigma(I)^{2}}}\exp\left(-\frac{\left(J-\mu(I)\right)^{2}}{2\sigma(I)^{2}}\right) \qquad (2)$$
where Z denotes a real-valued variable in the invention, p(Z = J) denotes the probability (density) that the variable takes the value J, 𝒩 denotes the normal distribution function, μ(I) and σ(I)² are the mean and variance of the Gaussian distribution, and I denotes the actual observed value of the variable.
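The Gaussian model in formula (2) can be sketched as follows, assuming μ and σ² are simply the sample mean and maximum-likelihood variance of the observed values I; the source writes them as functions μ(I) and σ(I)² without saying how they are computed. The helper names are hypothetical.

```python
import math
import random
import statistics


def fit_gaussian(values):
    """Return (mu, sigma2): sample mean and maximum-likelihood (population) variance."""
    mu = statistics.fmean(values)
    sigma2 = statistics.pvariance(values, mu)
    return mu, sigma2


def gaussian_pdf(j: float, mu: float, sigma2: float) -> float:
    """Normal density N(j | mu, sigma2), as in formula (2)."""
    return math.exp(-(j - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def sample_gaussian(mu: float, sigma2: float, rng: random.Random) -> float:
    """Draw one synthetic real-valued observation; gauss() takes a standard deviation."""
    return rng.gauss(mu, math.sqrt(sigma2))
```

Since L, Q, S and W are non-negative in practice, a Gaussian draw can occasionally be negative; the source does not say whether such values are clipped or redrawn.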
the categorical variables (road grade A, presence of a bus lane B, number of lanes K), as in formula (3):
$$p(H=C)=\frac{\exp\left(\pi_{C}(F)\right)}{\sum_{q=1}^{U}\exp\left(\pi_{q}(F)\right)} \qquad (3)$$
where H denotes a categorical variable in the invention, p(H = C) denotes the probability that the variable takes the value C, π_C(F) and π_q(F) denote parameters of the multinomial logit model, F denotes the actual observed value of each variable, and U is a natural number.
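A sketch of the multinomial logit in formula (3): the category probabilities are a softmax over the parameters π_q. How the source obtains π_q(F) from the observed values F is not stated, so `scores_from_frequencies`, which sets π_q to the log of add-one-smoothed category counts (making the softmax reproduce the smoothed observed frequencies), is a hypothetical stand-in. Categories are coded 0 to U−1.

```python
import math
import random


def multinomial_logit_probs(scores):
    """p(H = C) = exp(pi_C) / sum_q exp(pi_q) over the U categories (softmax)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_from_frequencies(observed, num_categories):
    """HYPOTHETICAL parameterization: pi_q = log(add-one smoothed count of category q)."""
    counts = [1] * num_categories
    for c in observed:
        counts[c] += 1
    return [math.log(c) for c in counts]


def sample_category(scores, rng: random.Random) -> int:
    """Draw one category index according to the multinomial logit probabilities."""
    probs = multinomial_logit_probs(scores)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```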
the ordinal variable (road section average speed V), as in formulas (4) and (5):
$$p(V=R)=p(V\le R)-p(V\le R-1) \qquad (4)$$
where p(V = R) denotes the probability that the average speed takes the value R, p(V ≤ R) denotes the probability that it is less than or equal to R, p(V ≤ R−1) denotes the probability that it is less than or equal to R−1, R is a natural number, ω_R(E) denotes the segment threshold corresponding to the value R (for example, an observed value of 0.8 lies in the segment [0, 1], whose threshold is 1), ψ_V(E) is a model parameter, and E is the actual observed value of the ordinal variable.
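A sketch of the ordinal model built on formula (4). Formula (5), which gives the cumulative probability p(V ≤ R), is not reproduced in this text, so the code assumes a logistic (ordered-logit) link, p(V ≤ R) = 1 / (1 + exp(−(ω_R − ψ_V))), with M − 1 increasing thresholds for M ordered segments. `segment_index` is a hypothetical helper that maps an observed speed to its 1-based segment, in the spirit of the [0, 1] example in the text.

```python
import bisect
import math
import random


def cumulative_prob(r, thresholds, psi):
    """p(V <= r) under an ASSUMED logistic link: sigmoid(omega_r - psi).

    Segments are r = 1..M with M - 1 increasing thresholds; p(V <= 0) = 0, p(V <= M) = 1.
    """
    m = len(thresholds) + 1
    if r <= 0:
        return 0.0
    if r >= m:
        return 1.0
    return 1.0 / (1.0 + math.exp(-(thresholds[r - 1] - psi)))


def ordinal_probs(thresholds, psi):
    """p(V = r) = p(V <= r) - p(V <= r - 1), formula (4), for r = 1..M."""
    m = len(thresholds) + 1
    return [cumulative_prob(r, thresholds, psi) - cumulative_prob(r - 1, thresholds, psi)
            for r in range(1, m + 1)]


def sample_ordinal(thresholds, psi, rng: random.Random) -> int:
    """Draw one 1-based segment index."""
    probs = ordinal_probs(thresholds, psi)
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


def segment_index(value, upper_bounds):
    """HYPOTHETICAL discretization: 1-based index of the first segment whose upper bound >= value."""
    return bisect.bisect_left(upper_bounds, value) + 1
```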
step 3, balancing the traffic accident data structure: the variables are proliferated using the distribution functions from step 2 and combined with the original observed data to balance the traffic accident data structure; the recommended ratio of accident samples (N ≠ 0) to zero-accident samples (N = 0) is 1:4;
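Step 3 can be sketched as keeping every original record and appending generated accident records until the accident to zero-accident ratio reaches 1:4. In this sketch the `generate` callback, which stands for drawing each variable from its own fitted distribution, and the round-up rule for fractional targets are assumptions; the source specifies neither. The function names are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def synthetic_needed(n_accident: int, n_zero: int, acc_part: int = 1, zero_part: int = 4) -> int:
    """Extra accident samples needed so accident : zero-accident reaches acc_part : zero_part.

    Uses exact integer ceiling division; when the target is fractional it is rounded up
    (an assumption), so the achieved ratio can slightly exceed the target.
    """
    target = -((-n_zero * acc_part) // zero_part)  # ceil(n_zero * acc_part / zero_part)
    return max(0, target - n_accident)


def balance(records: List[Record],
            generate: Callable[[List[Record]], Record],
            acc_part: int = 1, zero_part: int = 4) -> List[Record]:
    """Keep all original records and append generated accident records (N != 0)."""
    accidents = [r for r in records if r["N"] != 0]
    zeros = [r for r in records if r["N"] == 0]
    n_extra = synthetic_needed(len(accidents), len(zeros), acc_part, zero_part)
    # generate() is where per-variable samplers (Poisson, Gaussian, logit, ordinal) plug in.
    extra = [generate(accidents) for _ in range(n_extra)]
    return records + extra
```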
step 4, verifying and evaluating the optimized data: a traffic safety analysis model is constructed on the balanced traffic accident data, and the data optimization result is evaluated with the model's goodness-of-fit indices (AIC and BIC), as shown in formulas (6) and (7):
$$\ln(N)=\theta+\theta_{1}L+\theta_{2}Q+\theta_{3}V+\theta_{4}S+\theta_{5}A+\theta_{6}W+\theta_{7}K+\theta_{8}B \qquad (6)$$
$$\mathrm{AIC}=-2\ln(Y)+2T,\qquad \mathrm{BIC}=\ln(n)\,T-2\ln(Y) \qquad (7)$$
where Y represents the maximum likelihood value, T represents the number of parameters (9 in the present invention), and n is the number of observation samples.
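The two fit indices in formula (7) are simple functions of the maximized log-likelihood ln(Y). The sketch below (hypothetical function names) computes them and encodes the usual reading that lower values indicate a better trade-off between fit and complexity.

```python
import math


def aic(log_likelihood: float, num_params: int) -> float:
    """AIC = -2 ln(Y) + 2T, where ln(Y) is the maximized log-likelihood."""
    return -2.0 * log_likelihood + 2.0 * num_params


def bic(log_likelihood: float, num_params: int, num_obs: int) -> float:
    """BIC = ln(n) T - 2 ln(Y)."""
    return math.log(num_obs) * num_params - 2.0 * log_likelihood


def augmented_fits_better(aic_o: float, bic_o: float, aic_a: float, bic_a: float) -> bool:
    """True when both criteria are lower for the model fitted on the augmented data."""
    return aic_a < aic_o and bic_a < bic_o
```

For example, with T = 9 and n = 100, a model whose maximized log-likelihood is ln(Y) = −150 has AIC = 318.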
The present invention will be described with reference to specific examples.
1) Multi-source traffic data acquisition: multi-source data are collected through field surveys and inquiries to the relevant departments. Assume that samples n_1 to n_10 are accident samples and n_11 to n_100 are zero-accident samples; the ratio of accident samples to zero-accident samples in the original data is therefore 1:9, so the data structure is unbalanced, as shown in Table 1.
TABLE 1 statistical table for sample data collection
2) Proliferation of accident sample data: according to the literature, a 1:4 ratio of accident samples to zero-accident samples ensures the validity of the safety analysis model and the interpretability of the variables. The accident samples are therefore proliferated with the generative model from step 2, which matches the distributional form of each variable, until the ratio of non-zero-accident samples to zero-accident samples in the analysis data is 1:4.
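The arithmetic behind this step, for the assumed data in Table 1, is shown below. Because 90/4 is not an integer, an exact 1:4 ratio is unattainable; rounding the target up is an assumption, since the source does not specify a rounding rule.

$$\frac{10}{90}=\frac{1}{9},\qquad n_{\mathrm{acc}}^{*}=\frac{90}{4}=22.5\;\Rightarrow\;\lceil 22.5\rceil=23,\qquad n_{\mathrm{syn}}=23-10=13,\qquad \frac{23}{90}\approx\frac{1}{3.91}$$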
3) Constructing the safety analysis models: a safety analysis model is constructed separately for the original traffic accident data and for the proliferation-optimized traffic accident data:
safety analysis model based on the original traffic accident data (ratio 1:9):
$$\ln(N_{oi})=\theta_{oi}+\theta_{o1}L_{oi}+\theta_{o2}Q_{oi}+\theta_{o3}V_{oi}+\theta_{o4}S_{oi}+\theta_{o5}A_{oi}+\theta_{o6}W_{oi}+\theta_{o7}K_{oi}+\theta_{o8}B_{oi}$$
$$\mathrm{AIC}_{o}=-2\ln(Y_{o})+2T_{o},\qquad \mathrm{BIC}_{o}=\ln(n_{o})\,T_{o}-2\ln(Y_{o})$$
safety analysis model based on the proliferation-optimized traffic accident data (ratio 1:4):
$$\ln(N_{ai})=\theta_{ai}+\theta_{a1}L_{ai}+\theta_{a2}Q_{ai}+\theta_{a3}V_{ai}+\theta_{a4}S_{ai}+\theta_{a5}A_{ai}+\theta_{a6}W_{ai}+\theta_{a7}K_{ai}+\theta_{a8}B_{ai}$$
$$\mathrm{AIC}_{a}=-2\ln(Y_{a})+2T_{a},\qquad \mathrm{BIC}_{a}=\ln(n_{a})\,T_{a}-2\ln(Y_{a})$$
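Formula (6) gives only the log-linear predictor ln(N). A common way to fit it is Poisson regression by Newton-Raphson (iteratively reweighted least squares). The sketch below assumes that Poisson error structure (the source does not name one, and a negative binomial model is another frequent choice for accident counts), uses NumPy, and returns the maximized log-likelihood ln(Y) needed for AIC and BIC. Function names are hypothetical; categorical factors are entered as single numeric columns, as in formula (6).

```python
import math

import numpy as np


def fit_poisson_loglinear(X, y, iters: int = 50, tol: float = 1e-10):
    """Fit ln E[N] = theta + X @ (theta_1..theta_k) by Newton-Raphson.

    ASSUMPTION: N is Poisson given the factors. Returns (coefficients with the
    intercept first, maximized log-likelihood ln(Y)).
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = math.log(max(y.mean(), 1e-8))  # start from the intercept-only fit
    for _ in range(iters):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                # score vector
        hess = X.T @ (X * mu[:, None])       # Fisher information X^T diag(mu) X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    # Poisson log-likelihood including the -ln(y!) term.
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])
    loglik = float(np.sum(y * np.log(mu) - mu - log_fact))
    return beta, loglik


def information_criteria(loglik: float, num_params: int, num_obs: int):
    """Return (AIC, BIC) as in formula (7)."""
    return -2.0 * loglik + 2.0 * num_params, math.log(num_obs) * num_params - 2.0 * loglik
```

Fitting once on the original records and once on the augmented records, with T = 9 parameters (intercept plus eight factors), gives the AIC_o, BIC_o and AIC_a, BIC_a pairs compared in the next step.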
4) Verification and evaluation of the optimized data: because this case is based on assumed data, only the decision rule is given. If AIC_a < AIC_o and BIC_a < BIC_o, the safety analysis model based on the proliferation-optimized traffic accident data fits better than the model based on the original traffic accident data; otherwise, it fits worse.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.
Claims (3)
1. A method for optimizing traffic accident data under a multi-source data structure is characterized by comprising the following steps:
step 1, collecting multi-source traffic data, namely acquiring multi-source traffic safety influence factor data;
the multi-source traffic safety influencing factors comprise: the total number N of annual traffic accidents on the road section, the road section length L, the road section average daily traffic volume Q, the road section average speed V, the road section traffic node density S, the road grade A, the road width W, the number of lanes K, and the presence of a bus lane B;
step 2, constructing a generative model that matches the distributional form of the multi-source traffic data, namely constructing a distribution function for each influencing factor acquired in step 1; the specific process is as follows:
dividing the multi-source traffic safety influencing factors into count variables, real-valued variables, categorical variables and ordinal variables;
the count variable is the total number N of annual traffic accidents on the road section, and its distribution function is constructed according to formula (1):
$$p(N=G)=\frac{\lambda^{G}e^{-\lambda}}{G!} \qquad (1)$$
where p(N = G) denotes the probability that G accidents occur on the road section, λ denotes the average number of accidents per unit time or unit area, and G is a natural number;
the real-valued variables comprise the road section length L, the road section average daily traffic volume Q, the road section traffic node density S and the road width W, and their distribution function is constructed according to formula (2):
$$p(Z=J)=\mathcal{N}\left(J\mid\mu(I),\sigma(I)^{2}\right)=\frac{1}{\sqrt{2\pi\sigma(I)^{2}}}\exp\left(-\frac{\left(J-\mu(I)\right)^{2}}{2\sigma(I)^{2}}\right) \qquad (2)$$
where Z denotes a real-valued variable, p(Z = J) denotes the probability (density) that the real-valued variable takes the value J, 𝒩 denotes the normal distribution function, μ(I) and σ(I)² are the mean and variance of the Gaussian distribution, and I denotes the actual observed value of the real-valued variable;
the categorical variables comprise the road grade A, the number of lanes K and the presence of a bus lane B, and their distribution function is constructed as in formula (3):
$$p(H=C)=\frac{\exp\left(\pi_{C}(F)\right)}{\sum_{q=1}^{U}\exp\left(\pi_{q}(F)\right)} \qquad (3)$$
where H denotes a categorical variable, p(H = C) denotes the probability that the categorical variable takes the value C, π_C(F) and π_q(F) are parameters of the multinomial logit model, F denotes the actual observed value of the categorical variable, and U is a natural number (the number of categories);
the ordinal variable is the road section average speed V, and its distribution function is constructed according to formulas (4) and (5):
$$p(V=R)=p(V\le R)-p(V\le R-1) \qquad (4)$$
where p(V = R) denotes the probability that the average speed takes the value R, p(V ≤ R) denotes the probability that it is less than or equal to R, p(V ≤ R−1) denotes the probability that it is less than or equal to R−1, R is a natural number, ω_R(E) denotes the segment threshold corresponding to the value R, ψ_V(E) is a model parameter, and E is the actual observed value of the ordinal variable;
step 3, performing proliferation (oversampling) optimization on the multi-source traffic data acquired in step 1 based on the generative model constructed in step 2, so that the ratio of accident samples to zero-accident samples in the processed multi-source traffic data is 1:4.
2. The method for optimizing traffic accident data under the multi-source data structure of claim 1, wherein the method further comprises step 4: constructing a traffic safety analysis model and verifying the proliferation optimization result according to the model's goodness-of-fit indices.
3. The method for optimizing traffic accident data under the multi-source data structure of claim 2, wherein the traffic safety analysis model is as shown in formulas (6) and (7):
$$\ln(N)=\theta+\theta_{1}L+\theta_{2}Q+\theta_{3}V+\theta_{4}S+\theta_{5}A+\theta_{6}W+\theta_{7}K+\theta_{8}B \qquad (6)$$
$$\mathrm{AIC}=-2\ln(Y)+2T,\qquad \mathrm{BIC}=\ln(n)\,T-2\ln(Y) \qquad (7)$$
where N denotes the total number of annual traffic accidents on the road section, L the road section length, Q the road section average daily traffic volume, V the road section average speed, S the road section traffic node density, A the road grade, W the road width, K the number of lanes, and B the presence of a bus lane; θ, θ_1, θ_2, θ_3, θ_4, θ_5, θ_6, θ_7 and θ_8 are the coefficients of the traffic safety analysis model; AIC denotes the Akaike information criterion, BIC denotes the Bayesian information criterion, Y denotes the maximum likelihood value, T denotes the number of model parameters, and n is the number of observed samples.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110975201.3A CN113808392B (en) | 2021-08-24 | 2021-08-24 | Method for optimizing traffic accident data under multi-source data structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113808392A CN113808392A (en) | 2021-12-17 |
CN113808392B true CN113808392B (en) | 2022-04-01 |
Family
ID=78941653
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110975201.3A Active CN113808392B (en) | 2021-08-24 | 2021-08-24 | Method for optimizing traffic accident data under multi-source data structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113808392B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114639237B (en) * | 2022-02-21 | 2023-02-14 | 东南大学 | Method for analyzing influence effect after implementation of traffic safety management standard |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105046957A (en) * | 2015-07-02 | 2015-11-11 | 清华大学 | Balanced sampling method for accident analysis and safety assessment |
CN112487961A (en) * | 2020-11-27 | 2021-03-12 | 鹏城实验室 | Traffic accident detection method, storage medium and equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6662141B2 (en) * | 1995-01-13 | 2003-12-09 | Alan R. Kaub | Traffic safety prediction model |
- 2021-08-24: application CN202110975201.3A filed in CN; CN113808392B, status: Active
Non-Patent Citations (1)
Title |
---|
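Analysis of traffic accident characteristics and safety evaluation of urban expressways; Pei Yulong et al.; Journal of Transport Information and Safety (交通信息与安全); 2009-04-20 (No. 02); pp. 87-90 * |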
Also Published As
Publication number | Publication date |
---|---|
CN113808392A (en) | 2021-12-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102831440B (en) | Method and device for decision tree based wide-area remote sensing image classification | |
CN107610469A (en) | A kind of day dimension regional traffic index forecasting method for considering multifactor impact | |
CN108665093B (en) | Deep learning-based expressway traffic accident severity prediction method | |
CN107644057B (en) | Absolute imbalance text classification method based on transfer learning | |
CN108388651A (en) | A kind of file classification method based on the kernel of graph and convolutional neural networks | |
CN102879677A (en) | Intelligent fault diagnosis method based on rough Bayesian network classifier | |
CN113808392B (en) | Method for optimizing traffic accident data under multi-source data structure | |
CN110263666A (en) | A kind of motion detection method based on asymmetric multithread | |
US20220084396A1 (en) | Method for extracting road capacity based on traffic big data | |
CN106023592A (en) | Traffic jam detection method based on GPS data | |
CN105809193A (en) | Illegal operation vehicle recognition method based on Kmeans algorithm | |
CN111159243A (en) | User type identification method, device, equipment and storage medium | |
CN110084534A (en) | A kind of driving risks and assumptions quantization method based on driving behavior portrait | |
CN104252556B (en) | A kind of river classification system | |
CN114299742B (en) | Speed limit information dynamic identification and update recommendation method for expressway | |
CN106126637A (en) | A kind of vehicles classification recognition methods and device | |
CN108710967B (en) | Expressway traffic accident severity prediction method based on data fusion and support vector machine | |
CN111444286B (en) | Long-distance traffic node relevance mining method based on trajectory data | |
CN111599170B (en) | Traffic running state classification method based on time sequence traffic network diagram | |
CN112149922A (en) | Method for predicting severity of accident in exit and entrance area of down-link of highway tunnel | |
CN105337842A (en) | Method for filtering junk mail irrelevant to contents | |
CN114978931B (en) | Network traffic prediction method and device based on manifold learning and storage medium | |
CN116051841A (en) | Roadside ground object multistage clustering segmentation algorithm based on vehicle-mounted LiDAR point cloud | |
JP2023164240A (en) | Method for designing vehicle speed/slope compound operation condition cycle test | |
CN114170796A (en) | Algorithm improved congestion propagation analysis method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |