CN110390816A - A state discrimination method based on multi-model fusion
- Publication number
- CN110390816A (application CN201910650794.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24143—Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G1/00—Traffic control systems for road vehicles
- G08G1/01—Detecting movement of traffic to be counted or controlled
- G08G1/0104—Measuring and analyzing of parameters relative to traffic conditions
- G08G1/0125—Traffic data processing
- G08G1/0133—Traffic data processing for classifying traffic situation
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Probability & Statistics with Applications (AREA)
- Traffic Control Systems (AREA)
Abstract
The present invention relates to a state discrimination method based on multi-model fusion, comprising the following steps: data preprocessing, in which the acquired traffic flow data are preprocessed; feature selection, in which a relevant feature subset is selected and the data dimensionality is reduced by removing irrelevant and redundant features; multi-feature clustering, in which the traffic flow data are divided by analyzing multi-dimensional features; and real-time classification, in which the traffic flow data are classified to judge the real-time traffic state. The method can discriminate the real-time state of a path in the current network topology and provides a theoretical basis and a technical route for path weight determination and subsequent path planning applications. Compared with the traditional single-feature threshold discrimination method, accuracy and effectiveness are improved; in addition, the feature selection method removes irrelevant features and improves discrimination precision.
Description
Technical Field
The invention relates to a traffic flow state discrimination method, in particular to a state discrimination method based on multi-model fusion.
Background
In recent years, the rapid development of the urban economy has pushed urban traffic demand toward saturation. Traffic congestion has become the foremost problem in urban transportation: urban roads are often in a queuing or even congested state, which seriously affects the willingness and efficiency of people's travel. As an important component of the intelligent transportation system, vehicle route guidance can efficiently provide travelers with real-time services such as navigation, positioning and geographic information, and guide them from origin to destination. The chosen path planning strategy directly determines the quality of the driving route that route guidance offers the traveler. The path planning technology in a vehicle route guidance system must provide accurate path search results according to dynamic traffic demand, and the results must be recalculated in real time as traffic information changes, so that the planned path does not become invalid. Optimal path planning uses intelligent devices such as GPS (Global Positioning System) receivers and sensors to acquire the real-time running state of the road network, analyzes the reachability between the origin node and the destination node, searches for reachable paths between them, applies an optimality rule such as lowest fuel consumption or congestion avoidance, selects candidate schemes according to that rule, and presents the screened results to the user for selection.
Vehicle route guidance uses optimal path finding to show travelers the current optimal planning scheme according to their real-time requirements. Because route guidance must provide an optimal route based on the current running state of the road network, it places great demands on the timeliness of the path finding algorithm. Traditional path planning algorithms based on graph theory must traverse too many nodes and store too much intermediate data, so they are difficult to apply to large-scale, complex network topologies.
From the perspective of economic development, the traffic congestion caused by the oversaturation of urban traffic has become an unavoidable problem in urban construction. The utilization rate of the road network is difficult to improve effectively, and the mechanism for allocating traffic resources is disordered. Judging the state of traffic flow data on the basis of the real-time road network structure can effectively improve the utilization rate of the current road network, reduce traffic accidents, make decision management more scientific and intelligent, and improve resource allocation capability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a state discrimination method based on multi-model fusion, which is used for discriminating the real-time state of a path in the current network topology and providing a theoretical basis and a technical route for path weight determination and subsequent path planning application.
The purpose of the invention is realized by the following technical scheme: a state discrimination method based on multi-model fusion comprises the following steps:
data preprocessing, namely preprocessing the acquired traffic flow data;
selecting characteristics, namely selecting a relevant characteristic subset to reduce data dimensionality by removing irrelevant and redundant characteristics;
multi-feature clustering, which is to divide traffic flow data by analyzing multi-dimensional features;
real-time classification, namely classifying the traffic flow data to judge the real-time traffic state.
The data preprocessing steps are as follows:
judging whether the acquired traffic flow data has abnormal data or not, and processing the abnormal data;
and carrying out data standardization processing on the data.
The specific steps of judging whether the acquired traffic flow data has abnormal data and processing the abnormal data are as follows:
judging whether the fluctuation range of the data value range is in a reasonable range or not;
if the data value range exceeds a reasonable range, indicating that the data has obvious errors, and processing the error data;
if the data value range fluctuates within a reasonable range, the data is normal.
The specific steps for carrying out data standardization processing on the data are as follows:
traversing all feature vectors of traffic flow data to obtain a maximum value;
traversing all feature vectors of traffic flow data to obtain a minimum value;
and carrying out normalization processing on the feature vectors.
The specific steps of selecting the relevant feature subset and reducing the data dimension by removing irrelevant and redundant features are as follows:
calculating the correlation between different feature vectors and known classes in the training set;
determining different weights of different characteristics according to different correlations;
and deleting the characteristic that the weight value is smaller than the threshold value.
The multi-feature clustering is characterized in that the traffic flow data are divided by multi-dimensional feature analysis, and the method comprises the following specific steps:
The first step: initially, let S = 1, and apply the K-Means clustering algorithm to the initial m data items to calculate k level-S centroids.
The second step: repeat the first step until m level-S centroids are obtained.
The third step: apply the K-Means clustering algorithm to the m level-S centroids to obtain k level-(S+1) centroids.
The fourth step: repeat the third step until m level-(S+1) centroids are obtained, wherein S + 1 is
The fifth step: repeat the above steps, that is, whenever m level-S centroids have been obtained, cluster them with the K-Means algorithm to obtain k level-(S+1) centroids, until the final k centroids are obtained.
The real-time classification for classifying the traffic flow data to judge the real-time traffic state comprises the following steps:
The first step: draw samples from the sample set at random with replacement, selecting m random samples in total;
The second step: from the feature set obtained by feature selection, randomly select n features and build a CART decision tree model;
The third step: repeat the first and second steps k times to generate k CART decision trees, each with its own independent decision criterion;
The fourth step: input the traffic flow data into each decision tree and, from the decisions of the trees, finally determine the category to which the data belong.
The invention has the following advantages: a state discrimination method based on multi-model fusion can discriminate the real-time state of a path in the current network topology, and provides a theoretical basis and a technical route for path weight determination and subsequent path planning application; compared with the traditional single-feature threshold value discrimination method, the accuracy and the effectiveness are improved, meanwhile, the feature selection method can remove some irrelevant features, and the discrimination precision is improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of error data determination;
FIG. 3 is a flow chart of the random forest algorithm;
FIG. 4 is a characteristic weight diagram of traffic flow data in an embodiment;
FIG. 5 is a graph comparing the accuracy of different models in the examples.
Detailed Description
The invention will be further described with reference to the accompanying drawings, but the scope of the invention is not limited to the following.
As shown in fig. 1, a method for discriminating states based on multi-model fusion includes the following steps:
data preprocessing, namely preprocessing the acquired traffic flow data;
Further, raw data cannot be completely correct after acquisition, transmission and storage, and inevitably contain defects such as inconsistent data types, missing data and redundant data. If the raw data are used directly without processing, low-quality data flow into the algorithm model and seriously damage its learning process. Conversely, proper preprocessing of the data can markedly improve the quality and reliability of the model's decisions.
Feature selection, namely selecting a relevant feature subset to reduce data dimensionality by removing irrelevant and redundant features;
Further, as data scale and complexity rise exponentially, solving the problem often requires a very large algorithm structure, and algorithm complexity and response time increase sharply. This is unacceptable for such data, especially for stream data, which arrive continuously and without bound over time. Therefore, selecting effective data features (variables) is crucial during model construction.
Multi-feature clustering, namely dividing the traffic flow data by analyzing multi-dimensional features;
Furthermore, different features describe different aspects of the problem to be solved; on the surface they appear independent, but in reality they are deeply related. Multi-feature clustering analyzes the multi-dimensional features and uses the similarity principle to divide the instances into several clearly distinct subsets. In short, cluster analysis places the objects to be classified in a multi-dimensional space, distinguishes them by their differences, and assigns objects with the same attributes to the same class and objects with different attributes to different classes. This achieves high cohesion within classes and low coupling between classes; that is, objects in the same class are highly similar, and objects in different classes differ greatly.
Real-time classification, namely classifying the traffic flow data to judge the real-time traffic state.
Furthermore, because of the particular nature of stream data, the real-time requirements on the model algorithm are extremely high. It is therefore important to establish a real-time classifier for the continuously generated stream data. The classifier must respond quickly to data flowing into the model and complete the classification within a limited time, so as to avoid large-scale data queuing and blocking caused by excessive computational complexity.
The data preprocessing steps are as follows:
judging whether the acquired traffic flow data has abnormal data or not, and processing the abnormal data;
and carrying out data standardization processing on the data.
As shown in fig. 2, the specific steps of determining whether there is abnormal data in the collected traffic flow data and processing the abnormal data are as follows:
judging whether the fluctuation range of the data value range is in a reasonable range or not;
Further, a fluctuation of the data value between 50% and 150% is regarded as lying within the reasonable range.
If the data value range exceeds a reasonable range, indicating that the data has obvious errors, and processing the error data;
if the data value range fluctuates within a reasonable range, the data is normal.
Further, the error data mainly include two types: data errors and missing data. A data error is an unexpected value caused by a format error during data acquisition or storage, such as a negative number in the traffic flow data. Missing data arise when a device is interrupted during acquisition, so that some values are clearly absent, such as traffic density data that were not acquired.
Further, when the data contain obvious errors, error or exception processing is needed. If the error data amount to less than 5% of the correct data, they are negligible and can be deleted directly. If the error data amount to more than 5% of the correct data, they need to be corrected, and the invention fills them with adjacent data or with the arithmetic mean over a period of time.
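The rule above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the function name `clean_series`, the bounds `low` and `high`, and the use of `None` to mark a missing reading are assumptions introduced here, and the filling step uses the mean of the nearest valid neighbours as one possible reading of "adjacent data".

```python
from statistics import mean


def clean_series(values, low, high, delete_ratio=0.05):
    """Handle erroneous readings in one traffic-flow feature column.

    values: list of floats, with None marking a missing reading (assumption).
    low, high: hypothetical bounds of the reasonable value range.
    delete_ratio: 5% threshold taken from the description.
    """
    def is_bad(v):
        # A reading is erroneous if it is missing or outside the range.
        return v is None or not (low <= v <= high)

    bad = [i for i, v in enumerate(values) if is_bad(v)]
    if not bad:
        return list(values)

    good_count = len(values) - len(bad)
    # Few errors relative to the correct data: delete them directly.
    if good_count > 0 and len(bad) / good_count < delete_ratio:
        return [v for v in values if not is_bad(v)]

    # Otherwise fill each bad reading with the mean of its nearest
    # valid neighbours on the left and on the right.
    cleaned = list(values)
    for i in bad:
        left = next((values[j] for j in range(i - 1, -1, -1)
                     if not is_bad(values[j])), None)
        right = next((values[j] for j in range(i + 1, len(values))
                      if not is_bad(values[j])), None)
        neighbours = [v for v in (left, right) if v is not None]
        cleaned[i] = mean(neighbours) if neighbours else None
    return cleaned
```

For example, `clean_series([0.25, 1.5, 0.75], 0.0, 1.0)` fills the out-of-range middle reading from its two neighbours, while a single bad reading among thirty valid ones is simply dropped.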
The specific steps for carrying out data standardization processing on the data are as follows:
traversing all feature vectors of traffic flow data to obtain a maximum value Max;
traversing all feature vectors of traffic flow data to obtain a minimum Min;
and carrying out normalization processing on the feature vectors.
Further, the normalization formula is as follows:
$$x_{0-1} = \frac{x - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}$$
where $x_{0-1}$ is the normalized feature vector, $x$ is the feature vector, $\mathrm{Min}$ is the minimum value of the feature vector, and $\mathrm{Max}$ is the maximum value of the feature vector.
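A minimal sketch of this min-max normalization for a single feature column is given below. The function name `min_max_normalize` is hypothetical, and mapping a constant column to 0 is an assumption made here to avoid division by zero; the description does not cover that case.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] using its Min and Max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]
```

Applied to `[10, 20, 30]` this yields `[0.0, 0.5, 1.0]`, so every feature ends up on the same 0 to 1 scale regardless of its original unit.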
The specific steps of selecting the relevant feature subset and reducing the data dimension by removing irrelevant and redundant features are as follows:
calculating the correlation between different feature vectors and known classes in the training set;
determining different weights of different characteristics according to different correlations;
and deleting the characteristic that the weight value is smaller than the threshold value.
Specifically, a sample S is randomly selected from the training set T; then the k nearest samples $H_k$ of S are found in the sample set of the same class as S, and k nearest samples $M_k$ are found in each sample set of a class different from S. The weight of each feature is updated according to the following formula.
$$W(A) = W(A) - \mathrm{similarity}_H(A) + \mathrm{difference}_M(A)$$
Wherein,
$M_j(C)$ represents the j-th nearest sample in class C, and diff(A, S, R) represents the difference between sample S and sample R on feature A, which is calculated as follows:
It can be seen that the second of the above formulas essentially calculates, for a given feature, the sum of the distances from the sample S to its nearest same-class samples $H_k$; the third formula calculates, for that feature, the sum of the distances from the sample S to its nearest different-class samples $M_k$. According to the update rule of the first formula, when the sum of the distances from S to its nearest same-class samples $H_k$ on a feature is smaller than the sum of the distances to its nearest different-class samples $M_k$, the weight of the feature is increased, that is, the feature helps to separate same-class samples from different-class samples. Conversely, when the sum of the distances to $H_k$ is greater than the sum of the distances to $M_k$, the weight is reduced, that is, the feature has a negative effect on separating same-class and different-class samples. Because the selection of the sample S is random, the selection is repeated n times and the average weight of each feature is taken as its final weight. If the weight of a feature is greater than 0.5, the correlation between the feature and the problem to be solved is high; otherwise the correlation is low. In particular, if the weight of a feature is smaller than a threshold, the feature has almost no relationship with the problem to be solved and can be removed directly from the multi-dimensional feature vector group, thereby achieving the purpose of feature selection.
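The weight update W(A) = W(A) - similarityH(A) + differenceM(A) can be illustrated with the following Relief-style sketch. Several points are assumptions made for illustration and are not stated in the description: features are taken to be min-max normalized, diff(A, S, R) is taken to be the absolute difference on feature A, neighbours are found with the L1 distance, the distances are averaged over the neighbours, and all classes different from that of S are pooled rather than handled class by class. The names `relief_weights` and `select_features` are hypothetical.

```python
import random


def relief_weights(X, y, k=3, n_iter=30, seed=0):
    """Relief-style feature weights, averaged over n_iter random samples.

    X: list of feature rows, assumed normalized to [0, 1].
    y: list of class labels, one per row.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_iter):
        s = rng.randrange(len(X))  # randomly selected sample S

        def dist(i):
            # L1 distance between sample S and sample i (assumption).
            return sum(abs(a - b) for a, b in zip(X[s], X[i]))

        # k nearest same-class samples (H) and different-class samples (M).
        same = sorted((i for i in range(len(X))
                       if i != s and y[i] == y[s]), key=dist)[:k]
        other = sorted((i for i in range(len(X))
                        if y[i] != y[s]), key=dist)[:k]
        if not same or not other:
            continue
        for a in range(n_features):
            similarity = sum(abs(X[s][a] - X[i][a]) for i in same) / len(same)
            difference = sum(abs(X[s][a] - X[i][a]) for i in other) / len(other)
            # W(A) = W(A) - similarity + difference, averaged over n_iter runs.
            weights[a] += (difference - similarity) / n_iter
    return weights


def select_features(weights, threshold):
    """Keep the indices of features whose weight reaches the threshold."""
    return [a for a, w in enumerate(weights) if w >= threshold]
```

In a toy data set where feature 0 separates the two classes and feature 1 is constant, feature 0 receives a clearly positive weight and feature 1 a weight of zero, so only feature 0 survives the threshold.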
The multi-feature clustering is characterized in that the traffic flow data are divided by multi-dimensional feature analysis, and the method comprises the following specific steps:
The first step: initially, let S = 1, and apply the K-Means clustering algorithm to the initial m data items to calculate k level-S centroids.
The second step: repeat the first step until m level-S centroids are obtained.
The third step: apply the K-Means clustering algorithm to the m level-S centroids to obtain k level-(S+1) centroids.
The fourth step: repeat the third step until m level-(S+1) centroids are obtained, wherein S + 1 is
The fifth step: repeat the above steps, that is, whenever m level-S centroids have been obtained, cluster them with the K-Means algorithm to obtain k level-(S+1) centroids, until the final k centroids are obtained.
Furthermore, the invention performs multi-feature cluster analysis based on the STREAM algorithm. The STREAM algorithm builds on the K-Means algorithm and introduces a sliding window mechanism to solve the problems of stream data clustering.
Built on the K-Means algorithm, the STREAM algorithm is used to cluster the features of the stream data. Its underlying algorithm is K-Means, and a batch processing mechanism is added at the upper layer to address concept drift in stream data. The K-Means clustering algorithm is briefly analyzed below.
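The batch-wise, level-by-level scheme described in the five steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the STREAM algorithm as published: centroids are not weighted by the number of points they summarize, the final reduction simply clusters whatever remains in all buffers, and m is assumed to be at least k. The helper `kmeans` and the function `stream_cluster` are names introduced here.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain K-Means on a list of equal-length tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest centroid (squared Euclidean).
            j = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids


def stream_cluster(stream, k, m):
    """Reduce every m items at level S to k centroids at level S + 1."""
    levels = [[]]  # levels[0] buffers raw points, levels[S] level-S centroids
    for point in stream:
        levels[0].append(tuple(point))
        s = 0
        while len(levels[s]) >= m:
            if s + 1 == len(levels):
                levels.append([])
            levels[s + 1].extend(kmeans(levels[s], k))
            levels[s] = []
            s += 1
    # Final step (assumption): cluster everything still buffered into k centroids.
    leftovers = [c for level in levels for c in level]
    return kmeans(leftovers, k) if len(leftovers) > k else leftovers
```

Because each level only ever holds about m items, memory use stays bounded no matter how long the stream runs, which is the practical reason for processing the data in batches.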
The K-Means algorithm divides different classes according to the distribution similarity of data points in the multi-dimensional feature space. Specifically, k objects are randomly acquired from the dataset and are considered as initial centroids of k clusters; and distributing the other objects to the nearest cluster according to the Euclidean distance between the other objects and the centroid of each cluster, recalculating the centroid of each cluster, and repeating the process iteratively until the distortion function is converged to obtain k fixed and invariable centroids. Specifically, the algorithm flow is as follows:
1. Randomly select k objects from the data set as the initial centroids $\mu_1, \mu_2, \dots, \mu_k$ of the k clusters;
2. Calculate the Euclidean distance between each object and each cluster centroid, and reassign each object to the cluster at minimum distance, according to the following formula;
$$C^{(i)} = \arg\min_j \lVert x^{(i)} - \mu_j \rVert^2$$
where $C^{(i)}$ is the category to which the i-th data object belongs, $x^{(i)}$ is the i-th data object, and $\mu_j$ is the j-th cluster center.
3. Update the centroids $\mu_1, \mu_2, \dots, \mu_k$ of the k clusters according to the following formula, where m is the number of data objects:
$$\mu_j = \frac{\sum_{i=1}^{m} 1\{C^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{C^{(i)} = j\}}$$
4. Repeat steps 2-3 until the distortion function in the following formula converges, giving k unchanged centroids;
$$J(c, \mu) = \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{C^{(i)}} \rVert^2$$
where $J(c, \mu)$ is the distortion function and $\mu_{C^{(i)}}$ is the center, after clustering is completed, of the cluster to which the i-th data object belongs.
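Steps 1 to 4 can be sketched in plain Python as below. The function names `sq_dist` and `k_means`, the tolerance `tol`, and the iteration cap `max_iter` are assumptions introduced for the example; the distortion is computed as the unnormalized sum of squared distances, which is one common form of the distortion function.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, tol=1e-9, max_iter=100, seed=0):
    """K-Means on a list of tuples; returns (centroids, labels, distortion)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: random initial centroids
    previous = float("inf")
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Step 3: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        # Step 4: stop once the distortion J no longer decreases.
        distortion = sum(sq_dist(p, centroids[c])
                         for p, c in zip(points, labels))
        if previous - distortion < tol:
            break
        previous = distortion
    return centroids, labels, distortion
```

On the one-dimensional points 0, 1, 10 and 11 with k = 2, the centroids settle at 0.5 and 10.5 whichever two points are drawn initially.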
As shown in fig. 3, the real-time classification is based on a decision tree theory, a random forest model is established, and the classification obtained by the multi-feature clustering module is used as a training set to classify the real-time traffic flow and judge the real-time traffic state.
The real-time classification for classifying the traffic flow data to judge the real-time traffic state comprises the following steps:
The first step: draw samples from the sample set at random with replacement, selecting m random samples in total;
The second step: from the feature set obtained by feature selection, randomly select n features and build a CART decision tree model;
The third step: repeat the first and second steps k times to generate k CART decision trees, each with its own independent decision criterion;
The fourth step: input the traffic flow data into each decision tree and, from the decisions of the trees, finally determine the category to which the data belong.
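The four steps can be sketched with a small pure-Python random forest. This is a minimal illustration under assumptions that the description does not state: splits are scored with the Gini impurity and placed at midpoints between observed values, tree depth is capped by a hypothetical `max_depth`, and the final category is chosen by majority vote over the trees. Class labels must not be tuples, since tuples are used here to represent internal nodes.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_cart(rows, labels, features, depth=0, max_depth=5):
    """Grow a CART classification tree using only the given feature indices."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold at the midpoint
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_cart([rows[i] for i in left], [labels[i] for i in left],
                       features, depth + 1, max_depth),
            build_cart([rows[i] for i in right], [labels[i] for i in right],
                       features, depth + 1, max_depth))


def tree_predict(node, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, k, m, n, seed=0):
    """Build k trees, each from m bootstrap samples and n random features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(k):
        idx = [rng.randrange(len(rows)) for _ in range(m)]  # step 1
        feats = rng.sample(range(len(rows[0])), n)          # step 2
        forest.append(build_cart([rows[i] for i in idx],
                                 [labels[i] for i in idx], feats))
    return forest                                           # step 3


def forest_predict(forest, row):
    """Step 4: majority vote over all trees (assumed aggregation rule)."""
    votes = Counter(tree_predict(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

Sampling with replacement and restricting each tree to a random feature subset make the trees differ from one another, so an error made by a single tree is usually outvoted by the others.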
Taking the traffic flow data of the expressway in Sichuan province as an example, for convenience of discussion, the original data is simply processed in advance, and 287 pieces of traffic flow data acquired by the expressway in Sichuan province are shown in table 1, wherein the acquisition period is 5 min. Volume in the table is a flow field; speed is a vehicle Speed field; density is a traffic Density field; occupancy is an Occupancy field; queue is a queuing time length field.
TABLE 1 traffic flow data for a certain highway in Sichuan province
It can be observed that the traffic flow data in the table above are not completely correct and still contain significant errors. Normally, occupancy should lie between 0 and 1. When the road is completely clear, vehicles do not need to stay on the road and occupancy is at its minimum of 0; when the road is seriously congested, vehicles must wait on the road and occupancy reaches its peak of 1. However, the occupancy of several records in the table exceeds 1, so these records are classified as erroneous data. In addition, in some records the traffic density is 0 even though vehicles are present on the road, which is clearly unreasonable. The erroneous data therefore need to be corrected.
A traffic parameter data model is established based on a traffic flow theory. The correction process for the erroneous data is accomplished using the following formula.
After error data correction, normalization processing needs to be performed on the features. The processed data are shown in Table 2, with a total of 274 correct data.
TABLE 2 traffic flow data after preprocessing
It can be found that through the data normalization processing, the value ranges of all traffic flow characteristics are distributed between 0 and 1, and the errors of the models caused by different expression forms among the characteristics are eliminated.
After the traffic flow data is preprocessed, the traffic flow characteristics are analyzed, the traffic characteristics beneficial to solving the traffic state are selected, and the characteristics which do not help or are redundant to solving the traffic state are removed, so that the precision and the reliability of a subsequent model are improved. Before feature selection, the invention contacts experts in the traffic field to manually judge the traffic states corresponding to a small part of traffic flow data, and the invention also uses the part of data as reference to select traffic features.
As can be seen from table 1, the characteristics included in the traffic flow data are mainly traffic Volume (Volume), travel Speed (Speed), traffic Density (Density), occupancy (occupancy), and Queue length (Queue).
As shown in fig. 4, in the feature selection process using the method provided by the present invention, considering that the algorithm may select a random sample S during the operation process, which may cause a certain difference in the result weight, the present invention adopts a method of averaging in multiple experiments, and performs 30 experiments in total, and summarizes the operation results each time to obtain the average value of each weight. In the figure, Q represents the flow rate, V represents the travel speed, P represents the traffic density, O represents the occupancy, and L represents the queuing time.
The average weight of each feature is shown in table 3;
TABLE 3 mean weight of features
In the table, Q represents flow, V represents travel speed, P represents traffic density, O represents occupancy, and L represents queuing time.
According to the feature selection algorithm, the weight values of the 3 features of the flow, the travel speed and the occupancy are all larger than 10%, and the weight values of the features of the traffic density and the queuing time are all smaller than 5%, so that the traffic state can be mapped by the 3 traffic flow features of the flow, the travel speed and the occupancy on the expressway. The traffic density and the queuing length have low correlation with the traffic state, and the two low correlation characteristics should be removed.
According to traffic flow theory, the invention defines four traffic state grades: smooth, slow running, congestion and serious congestion.
The road traffic status results output by the real-time classifier are shown in table 4.
TABLE 4 traffic status discrimination
The data are randomly divided into a test set and a training set in a 1:4 ratio, and a real-time classification model is established to judge the road traffic state. The invention uses the accuracy index to evaluate the algorithm.
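A random 1:4 division can be sketched as below. This is an illustrative sketch only: it assumes that one part goes to the test set and four parts go to the training set, and the function name and fixed seed are hypothetical.

```python
import random

def split_one_to_four(samples, seed=0):
    """Shuffle a copy of the samples and return (training set, test set) in a 4:1 ratio."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 5  # one part in five is held out for testing
    return shuffled[n_test:], shuffled[:n_test]

train_set, test_set = split_one_to_four(range(100))
print(len(train_set), len(test_set))
```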
Accuracy=(TP+TN)/(P+N)
Here TP is the number of true positives, i.e., the number of instances (samples) that are actually positive and are classified as positive by the classifier, and TN is the number of true negatives, i.e., the number of instances that are actually negative and are classified as negative by the classifier. Instances that are actually negative but are classified as positive (false positives, FP) do not count toward the numerator. P + N is the total number of samples.
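As a worked illustration of the accuracy formula, the sketch below counts the four outcomes for a small binary example and evaluates (TP+TN)/(P+N). The label vectors are made up for illustration, and the function name is hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary labelling."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)  # len(y_true) is P + N
print(tp, tn, fp, fn, accuracy)  # 3 3 1 1 0.75
```

With more than two traffic state grades, the same ratio reduces to the number of correctly classified samples divided by the total number of samples.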
As shown in Fig. 5, real-time classification models are established using the multi-model fusion algorithm provided by the invention, the traditional clustering algorithm and the single-feature threshold discrimination algorithm, respectively. Each model discriminates the real-time traffic state of an expressway in Sichuan Province, its accuracy is calculated, and the quality of the models is evaluated according to the accuracy.
According to the experimental results, the accuracy of the state discrimination algorithm provided by the invention reaches about 94%, which makes it more accurate and effective than the traditional single-feature threshold discrimination method. In addition, the feature selection method provided by the invention removes irrelevant features and thereby improves the discrimination accuracy.
The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the forms disclosed herein, which should not be regarded as excluding other embodiments; the invention may be used in various other combinations, modifications and environments, and may be changed within the scope of the inventive concept described herein, through the above teachings or through the skill or knowledge of the related art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention fall within the protection scope of the appended claims.
Claims (10)
1. A state discrimination method based on multi-model fusion, characterized in that the method comprises the following steps:
data preprocessing, namely preprocessing the acquired traffic flow data;
feature selection, namely selecting a relevant feature subset and reducing the data dimensionality by removing irrelevant and redundant features;
multi-feature clustering, namely dividing the traffic flow data by analyzing multi-dimensional features;
real-time classification, namely classifying the traffic flow data to judge the real-time traffic state.
2. The method according to claim 1, characterized in that the data preprocessing comprises the following steps:
judging whether the acquired traffic flow data contain abnormal data, and processing the abnormal data;
performing data standardization on the data.
3. The method according to claim 2, characterized in that judging whether the acquired traffic flow data contain abnormal data and processing the abnormal data comprises the following specific steps:
judging whether the data values fluctuate within a reasonable range;
if the data values exceed the reasonable range, the data contain obvious errors, and the erroneous data are processed;
if the data values fluctuate within the reasonable range, the data are normal.
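The range check in this claim can be illustrated with a short sketch. The bounds, units, field names and function name below are all hypothetical, and how erroneous records are subsequently processed is not specified here, so the sketch only separates normal records from abnormal ones.

```python
# Hypothetical plausibility bounds per feature (assumed units: veh/h, km/h, fraction).
BOUNDS = {"volume": (0.0, 3000.0), "speed": (0.0, 200.0), "occupancy": (0.0, 1.0)}

def out_of_range_fields(record, bounds=BOUNDS):
    """Return the names of the fields whose values fall outside their reasonable range."""
    return [name for name, (low, high) in bounds.items()
            if not (low <= record[name] <= high)]

records = [
    {"volume": 820.0, "speed": 96.0, "occupancy": 0.12},
    {"volume": 640.0, "speed": -5.0, "occupancy": 0.20},  # negative speed
    {"volume": 700.0, "speed": 80.0, "occupancy": 1.70},  # occupancy above 100%
]

normal = [r for r in records if not out_of_range_fields(r)]
abnormal = [r for r in records if out_of_range_fields(r)]
print(len(normal), len(abnormal))  # 1 2
```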
4. The method according to claim 2, characterized in that the data standardization comprises the following specific steps:
traversing all feature vectors of the traffic flow data to obtain the maximum value;
traversing all feature vectors of the traffic flow data to obtain the minimum value;
normalizing the feature vectors.
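The standardization steps of this claim correspond to the usual min-max normalization, x' = (x - min) / (max - min). The sketch below assumes that the maximum and minimum are taken per feature (column) and that a constant column is mapped to 0; both choices and the function name are assumptions.

```python
def min_max_normalize(rows):
    """Scale every feature (column) of the rows to the interval [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]    # traverse to obtain the minimum
    highs = [max(col) for col in columns]   # traverse to obtain the maximum
    return [
        [(value - low) / (high - low) if high > low else 0.0
         for value, low, high in zip(row, lows, highs)]
        for row in rows
    ]

raw = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
scaled = min_max_normalize(raw)
print(scaled)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```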
5. The method according to claim 1, characterized in that selecting the relevant feature subset and reducing the data dimensionality by removing irrelevant and redundant features comprises the following specific steps:
calculating the correlation between the different feature vectors and the known classes in the training set;
determining the weight of each feature according to its correlation;
deleting the features whose weight is smaller than the threshold value.
6. The method according to claim 1, characterized in that the multi-feature clustering, which divides the traffic flow data by analyzing multi-dimensional features, comprises the following specific steps:
the first step: initially, let S = 1, and calculate k level-S centroids for the initial m data points using the K-Means clustering algorithm.
7. The second step: repeat the first step until m level-S centroids are obtained.
8. The third step: apply the K-Means clustering algorithm to the m level-S centroids to obtain k level-(S+1) centroids.
9. The fourth step: repeat the third step until m level-(S+1) centroids are obtained, then set S = S + 1.
The fifth step: repeat the above steps, that is, each time m level-S centroids are obtained, cluster them with the K-Means algorithm to obtain k level-(S+1) centroids, until the final k centroids are obtained.
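The level-by-level clustering in the five steps above can be sketched as follows. This is a simplified batch reading of the steps, not a definitive implementation: it assumes k < m, processes a complete list of points rather than a live stream, uses a plain Lloyd-iteration K-Means with a fixed iteration count, and passes an incomplete final batch with fewer than k points through unchanged. All function names and the synthetic data are hypothetical.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd iterations; returns k centroids as tuples."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            groups[nearest].append(p)
        centroids = [
            tuple(sum(col) / len(group) for col in zip(*group)) if group else centroids[i]
            for i, group in enumerate(groups)
        ]
    return centroids

def hierarchical_kmeans(data, k, m, seed=0):
    """Reduce batches of m points to k centroids, level by level, until k centroids remain."""
    if not k < m:
        raise ValueError("this sketch assumes k < m")
    rng = random.Random(seed)
    level = list(data)
    while len(level) > m:
        next_level = []
        for start in range(0, len(level), m):
            batch = level[start:start + m]
            # Each batch of level-S points yields k level-(S+1) centroids.
            next_level.extend(kmeans(batch, k, rng) if len(batch) >= k else batch)
        level = next_level
    return kmeans(level, k, rng)

# Synthetic two-dimensional points scattered around three centres.
data_rng = random.Random(2)
centres = [(0.1, 0.1), (0.5, 0.9), (0.9, 0.2)]
data = [(cx + data_rng.uniform(-0.05, 0.05), cy + data_rng.uniform(-0.05, 0.05))
        for _ in range(100) for cx, cy in centres]

final_centroids = hierarchical_kmeans(data, k=3, m=30)
print(final_centroids)
```

Because each level replaces m points with k centroids, no single K-Means call ever processes more than m points, which bounds the cost of each clustering step regardless of the total data volume.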
10. The method according to claim 1, characterized in that the real-time classification, which classifies the traffic flow data to judge the real-time traffic state, comprises the following steps:
the first step: randomly drawing m samples from the sample set with replacement;
the second step: randomly selecting n features from the feature set obtained by feature selection, and establishing a CART decision tree model;
the third step: repeating the first and second steps k times to generate k CART decision trees, each of which has an independent decision criterion;
the fourth step: inputting the traffic flow data into each tree for a decision, and finally determining the category to which the features belong.
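The four steps of this claim can be sketched as a small bagged ensemble of CART-style trees. The sketch makes several assumptions that the claim does not state: splits are chosen by Gini impurity, the tree depth is capped, the final category is decided by majority vote, and class labels are not tuples (tuples are reserved for internal nodes). The function names and the synthetic normalized (flow, speed, occupancy) rows are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_cart(rows, labels, features, depth=0, max_depth=5):
    """Grow a binary tree; a leaf is a class label, a node is (feature, threshold, left, right)."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split: fall back to the majority label
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_cart([rows[i] for i in left], [labels[i] for i in left],
                       features, depth + 1, max_depth),
            build_cart([rows[i] for i in right], [labels[i] for i in right],
                       features, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, m, n, k, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(k):
        idx = [rng.randrange(len(rows)) for _ in range(m)]  # m samples with replacement
        feats = rng.sample(all_features, n)                 # n features for this tree
        forest.append(build_cart([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]  # majority vote (assumed)

# Synthetic normalised rows: (flow, speed, occupancy).
smooth = [(0.10 + 0.02 * i, 0.80 + 0.01 * i, 0.05 + 0.01 * i) for i in range(10)]
congested = [(0.70 + 0.02 * i, 0.10 + 0.02 * i, 0.70 + 0.02 * i) for i in range(10)]
rows = smooth + congested
labels = ["smooth"] * 10 + ["congestion"] * 10

forest = fit_forest(rows, labels, m=20, n=2, k=15)
print(predict_forest(forest, (0.05, 0.95, 0.02)), predict_forest(forest, (0.95, 0.05, 0.95)))
```

Drawing the n features once per tree follows the wording of the second step; many random forest implementations instead draw a fresh feature subset at every split, which is a different design choice.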
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650794.9A CN110390816A (en) | 2019-07-18 | 2019-07-18 | A kind of condition discrimination method based on multi-model fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910650794.9A CN110390816A (en) | 2019-07-18 | 2019-07-18 | A kind of condition discrimination method based on multi-model fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390816A (en) | 2019-10-29 |
Family
ID=68285143
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910650794.9A Pending CN110390816A (en) | 2019-07-18 | 2019-07-18 | A kind of condition discrimination method based on multi-model fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390816A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177346A (en) * | 2019-12-19 | 2020-05-19 | 爱驰汽车有限公司 | Man-machine interaction method and device, electronic equipment and storage medium |
CN111192456A (en) * | 2020-01-14 | 2020-05-22 | 泉州市益典信息科技有限公司 | Road traffic operation situation multi-time scale prediction method |
CN111230872A (en) * | 2020-01-31 | 2020-06-05 | 武汉大学 | Object delivery intention recognition system and method based on multiple sensors |
CN111599170A (en) * | 2020-04-13 | 2020-08-28 | 浙江工业大学 | Traffic running state classification method based on time sequence traffic network diagram |
CN113029227A (en) * | 2021-02-02 | 2021-06-25 | 中船第九设计研究院工程有限公司 | State monitoring system for moving hydraulic trolley |
CN113971216A (en) * | 2021-10-22 | 2022-01-25 | 北京百度网讯科技有限公司 | Data processing method and device, electronic equipment and memory |
CN114897074A (en) * | 2022-05-13 | 2022-08-12 | 北京纪新泰富机电技术股份有限公司 | Method and device for determining running state of equipment, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102592453A (en) * | 2012-02-27 | 2012-07-18 | 东南大学 | Real-time traffic condition judging method based on time window |
CN102609612A (en) * | 2011-12-31 | 2012-07-25 | 电子科技大学 | Data fusion method for calibration of multi-parameter instruments |
CN108492557A (en) * | 2018-03-23 | 2018-09-04 | 四川高路交通信息工程有限公司 | Highway jam level judgment method based on multi-model fusion |
US20190069808A1 (en) * | 2016-05-10 | 2019-03-07 | David Andrew Clifton | Method of determining the frequency of a periodic physiological process of a subject, and a device and system for determining the frequency of a periodic physiological process of a subject |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609612A (en) * | 2011-12-31 | 2012-07-25 | 电子科技大学 | Data fusion method for calibration of multi-parameter instruments |
CN102592453A (en) * | 2012-02-27 | 2012-07-18 | 东南大学 | Real-time traffic condition judging method based on time window |
US20190069808A1 (en) * | 2016-05-10 | 2019-03-07 | David Andrew Clifton | Method of determining the frequency of a periodic physiological process of a subject, and a device and system for determining the frequency of a periodic physiological process of a subject |
CN108492557A (en) * | 2018-03-23 | 2018-09-04 | 四川高路交通信息工程有限公司 | Highway jam level judgment method based on multi-model fusion |
Non-Patent Citations (4)
Title |
---|
冯勇: "基于云模型的城市快速路交通状态识别方法研究", 《中国优秀硕士学位论文全文数据库(电子期刊)》 * |
张钰等: "基于分类与回归算法(CART)的城市道路交通状态阈值划分研究", 《黑龙江交通科技》 * |
张静萱: "基于特征选择的城市快速路实时交通事故风险预测", 《中国优秀硕士学位论文全文数据库(电子期刊)》 * |
李晓璐: "基于多源信息处理技术的交通状态判别研究", 《中国优秀硕士学位论文全文数据库(电子期刊)》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177346A (en) * | 2019-12-19 | 2020-05-19 | 爱驰汽车有限公司 | Man-machine interaction method and device, electronic equipment and storage medium |
CN111192456A (en) * | 2020-01-14 | 2020-05-22 | 泉州市益典信息科技有限公司 | Road traffic operation situation multi-time scale prediction method |
CN111230872A (en) * | 2020-01-31 | 2020-06-05 | 武汉大学 | Object delivery intention recognition system and method based on multiple sensors |
CN111230872B (en) * | 2020-01-31 | 2021-07-20 | 武汉大学 | Object delivery intention recognition system and method based on multiple sensors |
CN111599170A (en) * | 2020-04-13 | 2020-08-28 | 浙江工业大学 | Traffic running state classification method based on time sequence traffic network diagram |
CN111599170B (en) * | 2020-04-13 | 2021-12-17 | 浙江工业大学 | Traffic running state classification method based on time sequence traffic network diagram |
CN113029227A (en) * | 2021-02-02 | 2021-06-25 | 中船第九设计研究院工程有限公司 | State monitoring system for moving hydraulic trolley |
CN113971216A (en) * | 2021-10-22 | 2022-01-25 | 北京百度网讯科技有限公司 | Data processing method and device, electronic equipment and memory |
CN113971216B (en) * | 2021-10-22 | 2023-02-03 | 北京百度网讯科技有限公司 | Data processing method and device, electronic equipment and memory |
CN114897074A (en) * | 2022-05-13 | 2022-08-12 | 北京纪新泰富机电技术股份有限公司 | Method and device for determining running state of equipment, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191029 |