CN113808392B - Method for optimizing traffic accident data under multi-source data structure - Google Patents

Publication number
CN113808392B
Authority
CN
China
Prior art keywords
traffic
road section
data
road
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110975201.3A
Other languages
Chinese (zh)
Other versions
CN113808392A (en)
Inventor
郭延永
刘攀
丁红亮
马景峰
李清韵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202110975201.3A
Publication of CN113808392A
Application granted
Publication of CN113808392B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/01Detecting movement of traffic to be counted or controlled
    • G08G1/0104Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125Traffic data processing
    • G08G1/0129Traffic data processing for creating historical data or processing based on historical data
    • G08G1/0137Measuring and analyzing of parameters relative to traffic conditions for specific applications

Abstract

The invention discloses a method for optimizing traffic accident data under a multi-source data structure, which comprises the following steps: (1) collecting multi-source traffic data; (2) constructing a generation model that conforms to the distribution form of the multi-source data; (3) balancing the traffic accident data structure; and (4) verifying and evaluating the optimized data. The method first collects and summarizes multi-source traffic accident data and determines the distribution form of each traffic data type, then constructs an accident data generation model based on those distribution forms, and finally verifies and evaluates the optimized data set with a road safety analysis model. The method can greatly reduce the influence of an unbalanced traffic accident data structure on the safety analysis model and obtain accurate and reliable traffic safety evaluation results.

Description

Method for optimizing traffic accident data under multi-source data structure
Technical Field
The invention relates to a method for optimizing traffic accident data under a multi-source data structure, and belongs to the technical field of traffic data structures.
Background
In recent years, the construction of road safety accident analysis models has become a research hotspot in the field of traffic safety. However, the performance of such a model depends to a great extent on the validity of the traffic accident data structure. Because a traffic accident, and especially a serious accident, is a small-probability event, the accident data structure is often unbalanced: the number of accident samples is far smaller than the number of zero-accident samples (the excess-zero phenomenon). At present, in both scientific research and patent applications, most work relies on traditional statistical techniques, such as the zero-inflated Poisson regression model and bootstrap resampling. With the development of advanced data mining technologies, upsampling and downsampling techniques have begun to be used to balance the data structure, such as the synthetic minority oversampling technique (SMOTE) and generative adversarial networks (GAN).
However, these methods usually assign a common likelihood function to all variables when generating a new data set and ignore the heterogeneity among different variables, which affects the fitting of the model and the identification of safety factors. Therefore, to ensure that the generated data are valid and that the safety evaluation results are accurate and reliable, a likelihood function conforming to the distribution form of each variable needs to be constructed separately for the different variables when generating the new data set, so that the accident data structure is balanced.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a method for optimizing traffic accident data under a multi-source data structure that can greatly reduce the influence of an unbalanced traffic accident data structure on the safety analysis model and obtain accurate and reliable traffic safety evaluation results.
The invention adopts the following technical scheme for solving the technical problems:
a method for optimizing traffic accident data under a multi-source data structure comprises the following steps:
step 1, collecting multi-source traffic data, namely acquiring multi-source traffic safety influence factor data;
step 2, constructing a generation model which accords with the form distribution of the multi-source traffic data, namely constructing a distribution form function for each influence factor acquired in the step 1;
step 3, performing proliferation optimization processing on the multi-source traffic data acquired in step 1 based on the generation model constructed in step 2, so that the ratio of the number of accident samples to the number of zero-accident samples in the processed multi-source traffic data is 1:4.
As a further scheme of the invention, the method for optimizing the traffic accident data further comprises a step 4 of constructing a traffic safety analysis model and verifying the proliferation optimization result according to the fitting indexes of the model.
As a preferred scheme of the present invention, the multi-source traffic safety influence factors in step 1 include: the total annual number of traffic accidents on the road section N, the road section length L, the road section daily average traffic volume Q, the road section average speed V, the road section traffic node density S, the road grade A, the road width W, the number of lanes K, and the presence or absence of a bus lane B.
As a preferred scheme of the invention, the specific process of the step 2 is as follows:
dividing the multi-source traffic safety influence factors into counting variables, real-value variables, classification variables and ordered variables;
the counting variable comprises the total number N of the annual traffic accidents of the road section, and the distribution form function of the total number of the annual traffic accidents of the road section is constructed according to the formula (1):
$$p(N = G) = \frac{\lambda^{G} e^{-\lambda}}{G!} \qquad (1)$$
wherein p(N = G) represents the probability that G accidents occur on the road section, λ represents the average number of accidents per unit time or unit area, and G is a natural number;
the real-value variables comprise a road section length L, a road section daily average traffic quantity Q, a road section traffic node density S and a road width W, and the distribution form function of the real-value variables is constructed according to the formula (2):
$$p(Z = J) = \mathcal{N}\left(J \mid \mu(I), \sigma(I)^{2}\right), \quad J \text{ takes continuous values} \qquad (2)$$
wherein Z represents a real-valued variable, p(Z = J) represents the probability that the real-valued variable takes the value J, $\mathcal{N}$ represents the normal distribution function, μ(I) and σ(I)^2 are respectively the mean and variance of the Gaussian distribution, and I represents the actual observed value of the real-valued variable;
the classification variables comprise road grade A, road lane number K and the existence of a bus lane B, and the distribution form function of the constructed classification variables is as shown in formula (3):
$$p(H = C) = \frac{\exp\left(\pi_{C}(F)\right)}{\sum_{q=1}^{U} \exp\left(\pi_{q}(F)\right)} \qquad (3)$$
wherein H denotes a classification variable, p(H = C) denotes the probability that the classification variable takes the value C, $\pi_{C}(F)$ and $\pi_{q}(F)$ denote parameters of the multinomial Logit model, F denotes the actual observed value of the classification variable, and U is a natural number;
the ordered variables comprise a road section average speed V, and the distribution form function of the road section average speed is constructed according to the following formulas (4) and (5):
p(V=R)=p(V≤R)-p(V≤R-1) (4)
[Formula (5) is an equation image in the original publication and is not reproduced here.]
wherein p(V = R) represents the probability that the average speed takes the value R, p(V ≤ R) represents the probability that the average speed is less than or equal to R, p(V ≤ R-1) represents the probability that the average speed is less than or equal to R-1, R is a natural number, $\omega_{R}(E)$ indicates the segment threshold corresponding to the average speed value R, $\psi_{V}(E)$ is a model parameter, and E is the actual observed value of the ordered variable.
As a preferred embodiment of the present invention, the traffic safety analysis model is represented by the following formulas (6) and (7):
$$\ln(N) = \theta + \theta_{1} L + \theta_{2} Q + \theta_{3} V + \theta_{4} S + \theta_{5} A + \theta_{6} W + \theta_{7} K + \theta_{8} B \qquad (6)$$
$$\mathrm{AIC} = -2\ln(Y) + 2T, \quad \mathrm{BIC} = \ln(n)\,T - 2\ln(Y) \qquad (7)$$
wherein N represents the total annual number of traffic accidents on the road section, L represents the road section length, Q represents the road section daily average traffic volume, V represents the road section average speed, S represents the road section traffic node density, A represents the road grade, W represents the road width, K represents the number of lanes, B represents the presence or absence of a bus lane, θ, θ_1, θ_2, θ_3, θ_4, θ_5, θ_6, θ_7 and θ_8 are the coefficients of the traffic safety analysis model, AIC represents the Akaike information criterion, BIC represents the Bayesian information criterion, Y represents the maximum likelihood value, T represents the number of influencing factors, and n is the number of observation samples.
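As an illustration of how formula (6) is read, the following minimal sketch evaluates the log-linear relation for one road section. The function name `expected_accidents`, the coefficient values and the road-section values are all hypothetical and are not taken from the patent; they only show that N is recovered by exponentiating the linear predictor.

```python
import math

def expected_accidents(theta0, coeffs, factors):
    """Return N from ln(N) = theta0 + sum(theta_k * x_k), the form of formula (6)."""
    linear = theta0 + sum(coeffs[name] * factors[name] for name in coeffs)
    return math.exp(linear)

# Hypothetical coefficients theta_1..theta_8 (illustration only, not fitted values).
coeffs = {"L": 0.10, "Q": 0.00002, "V": 0.01, "S": 0.05,
          "A": -0.20, "W": -0.02, "K": 0.03, "B": -0.10}
# Hypothetical road section: length, daily volume, speed, node density,
# grade, width, number of lanes, bus lane present (1) or absent (0).
section = {"L": 2.0, "Q": 30000, "V": 50, "S": 4,
           "A": 1, "W": 15, "K": 4, "B": 1}

n_hat = expected_accidents(-1.0, coeffs, section)
print(f"expected annual accidents: {n_hat:.3f}")
```

In practice the coefficients would be estimated by fitting the model to the collected data; the sketch only covers the evaluation step.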
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. the invention provides a method for optimizing traffic accident data under a multi-source data structure, which respectively determines the distribution form of each traffic data type, constructs an accident data generation model based on the data distribution form, and verifies and evaluates the optimized data set based on a road safety analysis model, thereby greatly reducing the influence of an unbalanced traffic accident data structure on the safety analysis model and enabling the traffic safety evaluation result to be more accurate and reliable.
2. The invention constructs the likelihood functions which accord with respective distribution aiming at different variable data, thereby ensuring the effectiveness of data generation and ensuring the acquisition of accurate and reliable safety evaluation results.
Drawings
FIG. 1 is a flow chart of a method of optimizing traffic accident data in a multi-source data structure in accordance with the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1, the method for optimizing traffic accident data under a multi-source data structure provided by the present invention includes the following steps:
step 1, collecting multi-source traffic data: the following multi-source traffic safety influence factors are obtained through field surveys and inquiries with the relevant traffic departments: the total annual number of traffic accidents on the road section N, the road section length L, the road section daily average traffic volume Q, the road section average speed V, the road section traffic node density S, the road grade A, the road width W, the number of lanes K, and the presence or absence of a bus lane B;
step 2, constructing a generation model which accords with the form distribution of the multi-source data, and constructing a suitable distribution form function for each factor in the step 1 specifically as follows:
counting variable (total number of annual traffic accidents N), as in formula (1):
$$p(N = G) = \frac{\lambda^{G} e^{-\lambda}}{G!} \qquad (1)$$
where p(N = G) represents the probability that G accidents occur on the road section, and λ represents the average number of accidents per unit time or unit area.
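The counting variable can be illustrated with a short sketch, assuming formula (1) is the Poisson probability mass function, which the definition of λ suggests. The function names `poisson_pmf` and `sample_poisson` and the value λ = 1.5 are hypothetical; the sampler uses Knuth's multiplication method, a standard technique that the patent does not prescribe.

```python
import math
import random

def poisson_pmf(g, lam):
    """Probability that exactly g accidents occur when the mean count is lam."""
    return lam ** g * math.exp(-lam) / math.factorial(g)

def sample_poisson(lam, rng):
    """Draw one accident count with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

# Hypothetical mean accident count per road section per year.
lam = 1.5
rng = random.Random(0)
draws = [sample_poisson(lam, rng) for _ in range(10000)]
print("sample mean:", sum(draws) / len(draws))
```

The sample mean should lie close to λ, which is a quick sanity check on any generator used for the accident counts.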
The real-valued variables (the road section length L, the road width W, the road section traffic node density S, and the road section daily average traffic volume Q) are as follows:
$$p(Z = J) = \mathcal{N}\left(J \mid \mu(I), \sigma(I)^{2}\right), \quad J \text{ takes continuous values} \qquad (2)$$
where Z represents a real-valued variable in the present invention, p(Z = J) represents the probability that the variable takes the value J, $\mathcal{N}$ represents the normal distribution function, μ(I) and σ(I)^2 are respectively the mean and variance of the Gaussian distribution, and I represents the actual observed value of the variable.
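A minimal sketch of the real-valued case follows. It assumes, as a simplification not stated in the patent, that the mean and variance are estimated directly from the observed values of one variable; the function names and the sample road-section lengths are hypothetical.

```python
import math
import random
import statistics

def normal_pdf(j, mean, variance):
    """Density of the normal distribution N(mean, variance) evaluated at j."""
    return math.exp(-((j - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def generate_real_values(observed, count, rng):
    """Draw `count` new values from a Gaussian fitted to the observed values."""
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed)
    return [rng.gauss(mean, std) for _ in range(count)]

# Hypothetical observed road-section lengths in km (illustration only).
observed_lengths = [1.2, 0.8, 2.5, 1.9, 1.4]
rng = random.Random(42)
new_lengths = generate_real_values(observed_lengths, 3, rng)
print(new_lengths)
```

A Gaussian can produce negative values, so a real generator for lengths, widths or traffic volumes would probably need truncation or a transformation; that choice is outside what the text specifies.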
The classification variables (road grade A, presence or absence of bus lane B, road lane number K) are as follows:
$$p(H = C) = \frac{\exp\left(\pi_{C}(F)\right)}{\sum_{q=1}^{U} \exp\left(\pi_{q}(F)\right)} \qquad (3)$$
where H denotes a classification variable in the present invention, p(H = C) denotes the probability that the variable takes the value C, $\pi_{C}(F)$ and $\pi_{q}(F)$ represent parameters of the multinomial Logit model, F represents the actual observed value of each variable, and U is a natural number.
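The multinomial (polynomial) Logit model named above turns one score per category into category probabilities. The sketch below assumes the standard softmax form of that model; the function name `multinomial_logit_probs`, the road-grade labels and the score values are hypothetical.

```python
import math

def multinomial_logit_probs(scores):
    """Map per-category scores to probabilities with the multinomial logit (softmax) form."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    weights = {category: math.exp(score - peak) for category, score in scores.items()}
    total = sum(weights.values())
    return {category: weight / total for category, weight in weights.items()}

# Hypothetical scores for three road grades (illustration only).
scores = {"expressway": 0.2, "arterial": 1.0, "local": -0.5}
probs = multinomial_logit_probs(scores)
print(probs)
```

A new category value can then be drawn from `probs`, for example with `random.choices(list(probs), weights=list(probs.values()))`.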
The ordered variable (road section average speed V) is as shown in formulas (4) and (5):
p(V=R)=p(V≤R)-p(V≤R-1) (4)
[Formula (5) is an equation image in the original publication and is not reproduced here.]
wherein p(V = R) represents the probability that the average speed takes the value R, p(V ≤ R) represents the probability that the average speed is less than or equal to R, p(V ≤ R-1) represents the probability that the average speed is less than or equal to R-1, R is a natural number, and $\omega_{R}(E)$ indicates the segment threshold corresponding to the average speed value R; for example, if R is 0.8, the segment in which R lies is [0, 1] and the threshold is 1. $\psi_{V}(E)$ is a model parameter, and E is the actual observed value of the ordered variable.
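Formula (4) obtains each category probability as the difference of two cumulative probabilities. The sketch below shows that differencing step. Because formula (5) is not available in this text, the cumulative probability is computed with a logistic link of the threshold minus the model parameter; that link, the function names, and the numeric thresholds are assumptions made only for illustration.

```python
import math

def cumulative_prob(threshold, psi):
    """ASSUMPTION: logistic link for p(V <= R); the patent's formula (5) may differ."""
    return 1.0 / (1.0 + math.exp(-(threshold - psi)))

def ordered_probs(thresholds, psi):
    """Apply formula (4): p(V = R) = p(V <= R) - p(V <= R - 1).

    `thresholds` must be increasing; the last category has cumulative probability 1.
    """
    cumulative = [cumulative_prob(t, psi) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs

# Hypothetical thresholds separating four speed segments, and a hypothetical psi.
probs = ordered_probs([-1.0, 0.0, 1.5], psi=0.2)
print(probs)
```

Whatever link formula (5) actually uses, the differencing in `ordered_probs` stays the same as long as the cumulative probabilities are non-decreasing.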
Step 3, balancing the traffic accident data structure: the samples generated by the proliferation processing of each variable in step 2 are combined with the original observation data to balance the traffic accident data structure; the recommended ratio of accident samples (N ≠ 0) to zero-accident samples (N = 0) is 1:4;
Step 4, verifying and evaluating the optimized data: a traffic safety analysis model is constructed to verify the balanced traffic accident data structure, and the data optimization result is evaluated according to the fitting indexes (AIC and BIC) of the model, as shown in formulas (6) and (7):
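$$\ln(N) = \theta + \theta_{1} L + \theta_{2} Q + \theta_{3} V + \theta_{4} S + \theta_{5} A + \theta_{6} W + \theta_{7} K + \theta_{8} B \qquad (6)$$
$$\mathrm{AIC} = -2\ln(Y) + 2T, \quad \mathrm{BIC} = \ln(n)\,T - 2\ln(Y) \qquad (7)$$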
where Y represents the maximum likelihood value, T represents the number of parameters (9 in the present invention), and n is the number of observation samples.
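Formula (7) can be evaluated directly once the log of the maximum likelihood value is known. The sketch below uses T = 9 as stated above; the log-likelihood of -120.0 and the sample size of 100 are hypothetical inputs, and the function names are illustrative.

```python
import math

def aic(log_likelihood, num_params):
    """AIC = -2 ln(Y) + 2T, where log_likelihood is ln(Y)."""
    return -2.0 * log_likelihood + 2.0 * num_params

def bic(log_likelihood, num_params, num_samples):
    """BIC = ln(n) T - 2 ln(Y), where log_likelihood is ln(Y)."""
    return math.log(num_samples) * num_params - 2.0 * log_likelihood

# Hypothetical fit: ln(Y) = -120.0 with T = 9 parameters and n = 100 samples.
print("AIC:", aic(-120.0, 9))
print("BIC:", bic(-120.0, 9, 100))
```

Passing ln(Y) rather than Y avoids numerical underflow, since likelihood values for realistic sample sizes are extremely small.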
The present invention will be described with reference to specific examples.
1) Multi-source traffic data acquisition: multi-source data are collected through field surveys and inquiries with the relevant departments. Assume that n_1 to n_10 are accident samples and n_11 to n_100 are zero-accident samples, so the ratio of accident samples to zero-accident samples in the original data is 1:9 and the data structure is unbalanced, as shown in Table 1.
Table 1. Sample data collection statistics
[Table 1 is an image in the original publication and is not reproduced here.]
2) Proliferating the accident sample data: according to the literature, a ratio of accident samples to zero-accident samples of 1:4 ensures the validity of the safety analysis model and the interpretability of the variables. The accident samples are therefore proliferated with the generation model of step 2 of the invention, which conforms to the distribution form of each variable, so that the ratio of non-zero accident samples to zero-accident samples in the analysis data is 1:4.
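For the example data above (10 accident samples and 90 zero-accident samples), the number of accident samples to generate follows from simple arithmetic. The sketch below is one way to compute it; the function name `accident_samples_to_add` and the choice to round up are assumptions, since an exact 1:4 ratio would need 22.5 accident samples.

```python
import math

def accident_samples_to_add(n_accident, n_zero, zero_per_accident=4):
    """Number of generated accident samples needed to reach at least 1 : zero_per_accident."""
    target = math.ceil(n_zero / zero_per_accident)
    return max(0, target - n_accident)

# Example from the text: 10 accident samples, 90 zero-accident samples (ratio 1:9).
extra = accident_samples_to_add(10, 90)
print(extra, "generated accident samples give", 10 + extra, "accident vs 90 zero-accident samples")
```

Rounding up yields 23 accident samples against 90 zero-accident samples, slightly above 1:4; rounding down to 22 would fall slightly below it.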
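3) Constructing the safety analysis models: a safety analysis model is constructed for the original traffic accident data and for the traffic accident data after proliferation optimization, respectively, as follows:
Safety analysis model based on the original traffic accident data (ratio 1:9):
$$\ln(N_{oi}) = \theta_{oi} + \theta_{o1} L_{oi} + \theta_{o2} Q_{oi} + \theta_{o3} V_{oi} + \theta_{o4} S_{oi} + \theta_{o5} A_{oi} + \theta_{o6} W_{oi} + \theta_{o7} K_{oi} + \theta_{o8} B_{oi}$$
$$\mathrm{AIC}_{o} = -2\ln(Y_{o}) + 2T_{o}, \quad \mathrm{BIC}_{o} = \ln(n_{o})\,T_{o} - 2\ln(Y_{o})$$
Safety analysis model based on the traffic accident data after proliferation optimization (ratio 1:4):
$$\ln(N_{ai}) = \theta_{ai} + \theta_{a1} L_{ai} + \theta_{a2} Q_{ai} + \theta_{a3} V_{ai} + \theta_{a4} S_{ai} + \theta_{a5} A_{ai} + \theta_{a6} W_{ai} + \theta_{a7} K_{ai} + \theta_{a8} B_{ai}$$
$$\mathrm{AIC}_{a} = -2\ln(Y_{a}) + 2T_{a}, \quad \mathrm{BIC}_{a} = \ln(n_{a})\,T_{a} - 2\ln(Y_{a})$$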
4) Verifying and evaluating the optimized data: because this example uses assumed data, only the comparison rule is given. Smaller AIC and BIC values indicate a better fit, so if AIC_a < AIC_o and BIC_a < BIC_o, the safety analysis model based on the traffic accident data after proliferation optimization is superior in model fitting to the safety analysis model based on the original traffic accident data; otherwise, it is inferior.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (3)

1. A method for optimizing traffic accident data under a multi-source data structure is characterized by comprising the following steps:
step 1, collecting multi-source traffic data, namely acquiring multi-source traffic safety influence factor data;
the multi-source traffic safety influence factors comprise: the total annual number of traffic accidents on the road section N, the road section length L, the road section daily average traffic volume Q, the road section average speed V, the road section traffic node density S, the road grade A, the road width W, the number of lanes K, and the presence or absence of a bus lane B;
step 2, constructing a generation model which accords with the form distribution of the multi-source traffic data, namely constructing a distribution form function for each influence factor acquired in the step 1; the specific process is as follows:
dividing the multi-source traffic safety influence factors into counting variables, real-value variables, classification variables and ordered variables;
the counting variable comprises the total number N of the annual traffic accidents of the road section, and the distribution form function of the total number of the annual traffic accidents of the road section is constructed according to the formula (1):
$$p(N = G) = \frac{\lambda^{G} e^{-\lambda}}{G!} \qquad (1)$$
wherein p(N = G) represents the probability that G accidents occur on the road section, λ represents the average number of accidents per unit time or unit area, and G is a natural number;
the real-value variables comprise a road section length L, a road section daily average traffic quantity Q, a road section traffic node density S and a road width W, and the distribution form function of the real-value variables is constructed according to the formula (2):
$$p(Z = J) = \mathcal{N}\left(J \mid \mu(I), \sigma(I)^{2}\right) \qquad (2)$$
wherein Z represents a real-valued variable, p(Z = J) represents the probability that the real-valued variable takes the value J, $\mathcal{N}$ represents the normal distribution function, μ(I) and σ(I)^2 are respectively the mean and variance of the Gaussian distribution, and I represents the actual observed value of the real-valued variable;
the classification variables comprise road grade A, road lane number K and the existence of a bus lane B, and the distribution form function of the constructed classification variables is as shown in formula (3):
$$p(H = C) = \frac{\exp\left(\pi_{C}(F)\right)}{\sum_{q=1}^{U} \exp\left(\pi_{q}(F)\right)} \qquad (3)$$
wherein H denotes a classification variable, p(H = C) denotes the probability that the classification variable takes the value C, $\pi_{C}(F)$ and $\pi_{q}(F)$ denote parameters of the multinomial Logit model, F denotes the actual observed value of the classification variable, and U is a natural number;
the ordered variables comprise a road section average speed V, and the distribution form function of the road section average speed is constructed according to the following formulas (4) and (5):
p(V=R)=p(V≤R)-p(V≤R-1) (4)
[Formula (5) is an equation image in the original publication and is not reproduced here.]
wherein p(V = R) represents the probability that the average speed takes the value R, p(V ≤ R) represents the probability that the average speed is less than or equal to R, p(V ≤ R-1) represents the probability that the average speed is less than or equal to R-1, R is a natural number, $\omega_{R}(E)$ indicates the segment threshold corresponding to the average speed value R, $\psi_{V}(E)$ is a model parameter, and E is the actual observed value of the ordered variable;
step 3, carrying out proliferation optimization processing on the multi-source traffic data acquired in step 1 based on the generation model constructed in step 2, so that the ratio of the number of accident samples to the number of zero-accident samples in the processed multi-source traffic data is 1:4.
2. The method for optimizing traffic accident data under the multi-source data structure of claim 1, wherein the method for optimizing traffic accident data further comprises a step 4 of constructing a traffic safety analysis model and verifying the proliferation optimization result according to the fitting indexes of the model.
3. The method for optimizing traffic accident data under the multi-source data structure of claim 2, wherein the traffic safety analysis model is as shown in formulas (6) and (7):
$$\ln(N) = \theta + \theta_{1} L + \theta_{2} Q + \theta_{3} V + \theta_{4} S + \theta_{5} A + \theta_{6} W + \theta_{7} K + \theta_{8} B \qquad (6)$$
$$\mathrm{AIC} = -2\ln(Y) + 2T, \quad \mathrm{BIC} = \ln(n)\,T - 2\ln(Y) \qquad (7)$$
wherein N represents the total annual number of traffic accidents on the road section, L represents the road section length, Q represents the road section daily average traffic volume, V represents the road section average speed, S represents the road section traffic node density, A represents the road grade, W represents the road width, K represents the number of lanes, B represents the presence or absence of a bus lane, θ, θ_1, θ_2, θ_3, θ_4, θ_5, θ_6, θ_7 and θ_8 are the coefficients of the traffic safety analysis model, AIC represents the Akaike information criterion, BIC represents the Bayesian information criterion, Y represents the maximum likelihood value, T represents the number of influencing factors, and n is the number of observation samples.
CN202110975201.3A 2021-08-24 2021-08-24 Method for optimizing traffic accident data under multi-source data structure Active CN113808392B (en)

Publications (2)

Publication Number Publication Date
CN113808392A CN113808392A (en) 2021-12-17
CN113808392B true CN113808392B (en) 2022-04-01

Family

ID=78941653

Country Status (1)

Country Link
CN (1) CN113808392B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114639237B (en) * 2022-02-21 2023-02-14 Southeast University Method for analyzing influence effect after implementation of traffic safety management standard

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046957A (en) * 2015-07-02 2015-11-11 Tsinghua University Balanced sampling method for accident analysis and safety assessment
CN112487961A (en) * 2020-11-27 2021-03-12 Peng Cheng Laboratory Traffic accident detection method, storage medium and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6662141B2 (en) * 1995-01-13 2003-12-09 Alan R. Kaub Traffic safety prediction model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
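Analysis of traffic accident characteristics and safety evaluation of urban expressways; Pei Yulong et al.; Journal of Transport Information and Safety (《交通信息与安全》); 2009-04-20 (No. 02); 87-90 *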

Similar Documents

Publication Publication Date Title
CN102831440B (en) Method and device for decision tree based wide-area remote sensing image classification
CN107610469A (en) A kind of day dimension regional traffic index forecasting method for considering multifactor impact
CN108665093B (en) Deep learning-based expressway traffic accident severity prediction method
CN107644057B (en) Absolute imbalance text classification method based on transfer learning
CN108388651A (en) A kind of file classification method based on the kernel of graph and convolutional neural networks
CN102879677A (en) Intelligent fault diagnosis method based on rough Bayesian network classifier
CN113808392B (en) Method for optimizing traffic accident data under multi-source data structure
CN110263666A (en) A kind of motion detection method based on asymmetric multithread
US20220084396A1 (en) Method for extracting road capacity based on traffic big data
CN106023592A (en) Traffic jam detection method based on GPS data
CN105809193A (en) Illegal operation vehicle recognition method based on Kmeans algorithm
CN111159243A (en) User type identification method, device, equipment and storage medium
CN110084534A (en) A kind of driving risks and assumptions quantization method based on driving behavior portrait
CN104252556B (en) A kind of river classification system
CN114299742B (en) Speed limit information dynamic identification and update recommendation method for expressway
CN106126637A (en) A kind of vehicles classification recognition methods and device
CN108710967B (en) Expressway traffic accident severity prediction method based on data fusion and support vector machine
CN111444286B (en) Long-distance traffic node relevance mining method based on trajectory data
CN111599170B (en) Traffic running state classification method based on time sequence traffic network diagram
CN112149922A (en) Method for predicting severity of accident in exit and entrance area of down-link of highway tunnel
CN105337842A (en) Method for filtering junk mail irrelevant to contents
CN114978931B (en) Network traffic prediction method and device based on manifold learning and storage medium
CN116051841A (en) Roadside ground object multistage clustering segmentation algorithm based on vehicle-mounted LiDAR point cloud
JP2023164240A (en) Method for designing vehicle speed/slope compound operation condition cycle test
CN114170796A (en) Algorithm improved congestion propagation analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant