CN110162744B - Tensor-based internet of vehicles data loss multiple estimation method - Google Patents

Tensor-based internet of vehicles data loss multiple estimation method

Info

Publication number
CN110162744B
Authority
CN
China
Prior art keywords
data
tensor
algorithm
interpolation
missing
Prior art date
Legal status
Active
Application number
CN201910421687.9A
Other languages
Chinese (zh)
Other versions
CN110162744A (en)
Inventor
张德干
张婷
吴昊
高瑾馨
颜浩然
Current Assignee
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Priority date
Filing date
Publication date
Application filed by Tianjin University of Technology filed Critical Tianjin University of Technology
Priority to CN201910421687.9A priority Critical patent/CN110162744B/en
Publication of CN110162744A publication Critical patent/CN110162744A/en
Application granted granted Critical
Publication of CN110162744B publication Critical patent/CN110162744B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F17/18 - Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks


Abstract

Aiming at the problem of missing data in the Internet of Vehicles, the invention provides a new tensor-based multiple estimation method for missing Internet of Vehicles data, Integrated Bayesian Tensor Decomposition (IBTD), belonging to the field of the Internet of Vehicles. In the data model construction stage, the algorithm uses the random sampling principle to repeatedly extract from the missing data and generate data subsets, which are then interpolated with an optimized Bayesian tensor decomposition algorithm. An integration (ensemble) idea is introduced: the error results of the multiple interpolations are analyzed and sorted, and, taking space-time complexity into account, the optimal result is obtained by preferential averaging. The performance of the proposed model is evaluated by mean absolute percentage error (MAPE) and root mean square error (RMSE). Experimental results show that the new method can effectively interpolate traffic data sets with different missing rates and obtain good interpolation results.

Description

Tensor-based internet of vehicles data missing multiple estimation method
Technical Field
The invention belongs to the field of internet of vehicles, and particularly relates to a novel tensor-based internet of vehicles data missing estimation method.
Background
The Internet of Vehicles aims to construct an intelligent transportation network. With the rapid development of modern sensing, communication, computer and information technology, intelligent transportation systems (ITS) are gradually becoming widespread. The traffic information acquisition system is an important component of an ITS: by acquiring comprehensive, rich and real-time traffic information, urban road traffic conditions and their patterns of change can be grasped, providing a scientific basis for urban traffic planning and decision-making.
The data required by Internet of Vehicles applications should have high spatial and temporal resolution for purposes such as modeling, traffic management, prediction and route guidance; in reality, however, large amounts of missing and low-quality data often appear. Missing data has wide-ranging effects. If incomplete data are collected into the database, the actually acquired data volume differs from the estimated volume and the accuracy of the final calculation is reduced. Some data are incomplete yet treated by the system as complete, which leads to data processing errors. Some algorithms or systems operate on the assumption of an ideal, complete data set, and may halt the calculation when the data set turns out to be incomplete.
For handling missing data, scholars at home and abroad usually adopt one of two typical approaches. The first is to delete outright the segments of the data set that contain missing values, and to use only the completely collected segments for traffic flow prediction and similar applications. The second is to use an algorithm to complete the incomplete data. Both approaches have advantages and disadvantages. The first is undoubtedly the most direct, but it cannot make full use of all the data information; in particular, when the deleted spatio-temporal nodes carry important information, deleting them greatly reduces the accuracy of traffic flow prediction and other applications. The second approach has therefore gradually gained wide attention and research: vector-, matrix- and tensor-based data repair methods have been proposed in succession, and scholars have optimized and compared them from many angles. Under severe data loss this approach is consistently more applicable than the first, but it still has shortcomings, for example errors introduced during repair can degrade overall performance. To address these problems, a new tensor-based estimation method for missing Internet of Vehicles data, Integrated Bayesian Tensor Decomposition (IBTD), is proposed.
Disclosure of Invention
Compared with traditional missing data interpolation methods, the IBTD algorithm combines a tensor model that better represents the spatio-temporal correlations of the data with data subsets generated by random extraction, integrating the two advantages, and it also innovates on the Bayesian tensor algorithm used as the repair algorithm. The traditional Bayesian tensor algorithm sets only one hyper-parameter and only one conjugate prior; the proposed algorithm sets two hyper-parameters and places two conjugate priors, and the model converges rapidly by continuously updating the parameters. The new tensor-based estimation method for missing Internet of Vehicles data, Integrated Bayesian Tensor Decomposition (IBTD), can effectively repair missing traffic data under a tensor model with high temporal and spatial correlation, and shows better interpolation performance than traditional methods.
The new tensor-based multiple estimation method for missing Internet of Vehicles data is characterized by mainly comprising the following steps:
1, constructing the model, including the basic idea of the tensor model, the basic principle of Bayesian tensor decomposition, the new sampling strategy and the preferred ordering mechanism;
2, running the tensor-based estimation algorithm for missing Internet of Vehicles data, including the algorithm design and a theoretical analysis of its complexity;
3, carrying out experimental testing and comparative analysis.
The tensor-based estimation algorithm for missing Internet of Vehicles data comprises the following steps:
2.1, generating a third-order tensor data model from the traffic data in road-segment by day by flow form, used to evaluate the performance of the algorithm;
2.2, from the generated missing tensor data, obtaining incomplete random tensor data sets that differ from the original missing tensor data by means of a random sampling algorithm, calling the new sampling strategy algorithm;
2.3, interpolating the generated incomplete random tensor data sets with a Bayesian tensor decomposition algorithm, calling the Bayesian tensor decomposition algorithm;
2.4, bubble-sorting and preferring the error parameters of all interpolation results, taking the arithmetic mean of the preferred interpolation data to obtain repaired data closer to the original data, and calling the bubble sorting mechanism algorithm.
Further, the experimental testing and comparative analysis include:
3.1, representing the acquired data as a tensor;
3.2, repairing the data according to the missing-data conditions, and analyzing the advantages of the new algorithm over existing algorithms.
The advantages and positive effects of the invention are:
compared with traditional missing data interpolation methods, the proposed IBTD algorithm combines a tensor model that better expresses the spatio-temporal correlations of the data with data subsets generated by random extraction, integrating the two advantages, and it also innovates on the Bayesian tensor algorithm that serves as the base repair algorithm. The traditional Bayesian tensor algorithm sets only one hyper-parameter and only one conjugate prior; the proposed algorithm sets two hyper-parameters and places two conjugate priors, and the model converges rapidly by continuously updating the parameters. The new tensor-based estimation method for missing Internet of Vehicles data, Integrated Bayesian Tensor Decomposition (IBTD), can effectively repair missing traffic data under a tensor model with high temporal and spatial correlation, and has better interpolation performance than traditional methods.
Drawings
FIG. 1 is the third-order tensor model of the traffic data;
FIG. 2 is the third-order tensor CP decomposition model;
FIG. 3 is the Bayesian tensor decomposition probabilistic graphical model;
FIG. 4 is a schematic diagram of Bagging random sampling;
FIG. 5 shows the relationship between the number of integrations, elapsed time and RMSE error;
FIG. 6 shows the relationship between the preferred number of results and RMSE;
FIG. 7 is the local road network map;
FIG. 8 is the structure of the local road network;
FIG. 9 shows the traffic flow trends of different road segments on 22 September;
FIG. 10 shows the traffic flow trends of road segment 1 on different dates;
FIG. 11 shows the two data missing types;
FIG. 12 shows the root mean square error of data repair under random missing conditions;
FIG. 13 shows the mean absolute percentage error of data repair under random missing conditions;
FIG. 14 shows the root mean square error of data repair under structural missing conditions;
FIG. 15 shows the mean absolute percentage error of data repair under structural missing conditions;
FIG. 16 compares the repaired data with the missing data;
FIG. 17 compares the repaired data with the actual data.
Detailed Description
Step one, model construction:
the basic idea of the tensor model is:
FIG. 2 shows the model of the third-order tensor CP decomposition. The main idea of CP decomposition is that a higher-order tensor can be regarded as being built from a number of factor matrices, one per mode, and subsequent calculations can then be carried out with the decomposed factor matrices.
A k-order tensor T ∈ R^{n_1 × n_2 × … × n_k} is established, where n_l denotes the dimension along the l-th mode (l ∈ {1, 2, …, k}). For the constructed tensor T, an element is indexed by i = (i_1, i_2, …, i_k). Following the basic idea of CP decomposition, the constructed tensor can be approximated by a low-rank structure, as follows:
T ≈ Σ_{j=1}^{r} a_j^(1) ∘ a_j^(2) ∘ … ∘ a_j^(k)   (1)
where a_j^(l) ∈ R^{n_l} is the j-th column of the l-th factor matrix A^(l) ∈ R^{n_l × r}, the symbol ∘ denotes the outer product, and r is the CP rank of the tensor T. Analyzed from the viewpoint of each element, equation (1) is equivalent to
T_{i_1 i_2 … i_k} ≈ Σ_{j=1}^{r} Π_{m=1}^{k} A^(m)(i_m, j)   (2)
where A^(m)(i_m, j) is the value in row i_m and column j of the m-th factor matrix A^(m).
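For illustration, the following minimal Python/numpy sketch reconstructs a third-order tensor from its factor matrices exactly as in equation (2); the sizes, rank and random factor matrices are placeholder assumptions, not values taken from the patent.

import numpy as np

# Minimal sketch of CP reconstruction, equation (2); sizes, rank and the
# random factor matrices are illustrative placeholders only.
n1, n2, n3, r = 13, 32, 180, 4          # third-order tensor dimensions and CP rank
rng = np.random.default_rng(0)
A1 = rng.random((n1, r))                # factor matrix A^(1), shape n1 x r
A2 = rng.random((n2, r))                # factor matrix A^(2), shape n2 x r
A3 = rng.random((n3, r))                # factor matrix A^(3), shape n3 x r

# T_hat[i1, i2, i3] = sum_j A1[i1, j] * A2[i2, j] * A3[i3, j]
T_hat = np.einsum('ij,kj,lj->ikl', A1, A2, A3)
print(T_hat.shape)                      # (13, 32, 180)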
The basic principle of Bayesian tensor decomposition:
FIG. 3 shows the Bayesian tensor decomposition probabilistic graphical model. Considering the characteristics of traffic data, a higher-order tensor increases accuracy to a certain extent, but the gain does not justify the added complexity, so the third-order tensor is mainly taken as the example.
Here we denote by Ω the index set of the observed elements, and then introduce a full Bayesian model for the data generation process.
First, we assume that the noise term of each observed element T_i (i ∈ Ω) follows an independent Gaussian distribution:
T_i ~ N( Σ_{j=1}^{r} Π_{m=1}^{k} A^(m)(i_m, j), τ^{-1} ),  i ∈ Ω   (3)
where N(·) denotes a Gaussian distribution and τ is the precision, a parameter shared by all elements.
In order to estimate the factor matrices through Bayesian inference, conjugate priors, which are also multivariate Gaussian, need to be placed. Since both parameters of the Gaussian distribution are unknown, flexible prior distributions are placed on the factor matrices A^(l) and on the precision τ in order to model the tensor data properly. For the factor matrices, the prior distribution of the row vectors is assumed to be multivariate Gaussian, as in equation (4):
A^(l)(i_l, :) ~ N(η_l, Λ_l^{-1}),  i_l = 1, 2, …, n_l   (4)
To enhance the robustness of the model, and unlike the traditional Bayesian setting, we place two conjugate-prior hyper-parameters, η_l and Λ_l. The pair (η_l, Λ_l) (l = 1, 2, …, k) obeys a Gaussian-Wishart distribution:
(η_l, Λ_l) ~ GW(η_0, β_0, W_0, ν_0)   (5)
The prior distribution of (η_l, Λ_l) (l = 1, 2, …, k) is given in equation (6):
p(η_l, Λ_l | η_0, β_0, W_0, ν_0) = N(η_l | η_0, (β_0 Λ_l)^{-1}) × W(Λ_l | W_0, ν_0)   (6)
In this distribution, W(·) is the Wishart distribution with ν_0 degrees of freedom and an r × r scale matrix W_0:
W(Λ_l | W_0, ν_0) ∝ |Λ_l|^{(ν_0 − r − 1)/2} exp( −(1/2) tr(W_0^{-1} Λ_l) )   (7)
where tr(·), the trace of a square matrix, is the sum of the elements on its main diagonal.
Under the Gaussian assumption in equation (3), the precision parameter τ captures the noise level in the data; τ follows a Gamma distribution:
τ ~ Gamma(x_0, y_0)   (8)
where x_0 and y_0 are the shape and rate parameters, respectively. If a random variable x ~ Gamma(a, b), then its density is
p(x | a, b) = (b^a / Γ(a)) x^{a−1} e^{−b x}   (9)
the goal of the factor matrix is to captureObserved value T i (i ∈ Ω) and a hyperparameter μ l Sum-sigma l (l =1,2,3). Tensor observed for a given part
Figure BDA0002066196410000066
We first define an index tensor B with the same size as T, if i belongs to omega, each element B i Is 1, otherwise is 0. By sampling all one by one
Figure BDA0002066196410000067
To update the factor matrix a 1 . Consider the gaussian assumption in equation (3). The likelihood function can be written as
Figure BDA0002066196410000068
(symbol)
Figure BDA0002066196410000071
Representing the Hadamard product. The combination of equations (10) and (4) gives the posterior distribution, which can also be written in the form of a multivariate gaussian.
Figure BDA0002066196410000072
The posterior parameters are more novel:
Figure BDA0002066196410000073
Figure BDA0002066196410000074
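The numpy sketch below illustrates the posterior update of one row of A^(1) as reconstructed in equations (12) and (13); the random test data, the current hyper-parameter values and the variable names are illustrative assumptions rather than part of the patent.

import numpy as np

# Sketch of the Gibbs update for one row A1[i1, :] following equations (12)-(13).
# All data below are random placeholders; eta1/Lambda1 are the current
# Gaussian-Wishart hyper-parameters and tau is the current noise precision.
rng = np.random.default_rng(1)
n1, n2, n3, r = 13, 32, 180, 4
A1, A2, A3 = rng.random((n1, r)), rng.random((n2, r)), rng.random((n3, r))
T = rng.random((n1, n2, n3))
B = (rng.random((n1, n2, n3)) < 0.7).astype(float)   # indicator tensor of observed entries
eta1, Lambda1, tau = np.zeros(r), np.eye(r), 1.0

i1 = 0
q = A2[:, None, :] * A3[None, :, :]                  # q[i2, i3, :] = A2[i2, :] * A3[i3, :] (Hadamard)
w = B[i1][..., None] * q                             # keep only observed entries of slice i1
Lambda_post = Lambda1 + tau * np.einsum('abr,abs->rs', w, q)            # equation (12)
rhs = Lambda1 @ eta1 + tau * np.einsum('ab,abr->r', B[i1] * T[i1], q)   # equation (13)
eta_post = np.linalg.solve(Lambda_post, rhs)
A1[i1] = rng.multivariate_normal(eta_post, np.linalg.inv(Lambda_post))  # draw the new row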
bayesian posterior probability in internet of vehicles is proportional to product of prior probability and likelihood
Bayesian is about the conditional probability and edge probability of random events a and B.
Figure BDA0002066196410000075
Where L (A | B) is the probability that A will occur in the case that B will occur. According to the definition of conditional probability, the probability of event A occurring under the condition that event B occurs is
Figure BDA0002066196410000076
Likewise, the probability of event B occurring under the conditions of event A occurring
Figure BDA0002066196410000077
By combining and collating the two equations, the following can be obtained:
P(A|B)P(B)=P(A∩B)=P(B|A)P(A) (17)
then, the two sides of the above equation are divided by P (B), if P (B) is non-zero, we can obtain the formula expression of Bayes' theorem:
Figure BDA0002066196410000078
in addition, the first and second substrates are,
Figure BDA0002066196410000079
also sometimes referred to as standard likelihood (standarddiscardlikehood), bayes' rule can be expressed as:
posterior probability = standard likelihood a prior probability, i.e. bayes posterior probability is proportional to the product of prior probability and likelihood.
It should be noted that in equations (12) and (13) only the two terms T_i and B_i contain the index i_1, so all row vectors of A^(1) can be sampled in parallel. The posterior distributions of the rows of A^(2) and A^(3) can be written out by an entirely similar derivation.
The likelihood of the factor matrix A^(1) can be decomposed into the product of the conditional distributions of its n_1 row vectors:
p(A^(1) | η_1, Λ_1) = Π_{i_1=1}^{n_1} N( A^(1)(i_1, :) | η_1, Λ_1^{-1} )   (19)
Given the likelihood term in equation (19) and the Gaussian-Wishart prior in equation (6), the joint posterior distribution of the hyper-parameters η_1 and Λ_1 is again Gaussian-Wishart:
(η_1, Λ_1) | A^(1) ~ GW(η_0*, β_0*, W_0*, ν_0*)   (20)
The total likelihood function is given by
p(T_Ω | A^(1), A^(2), A^(3), τ) = Π_{i∈Ω} N( T_i | Σ_{j=1}^{r} Π_{m=1}^{3} A^(m)(i_m, j), τ^{-1} )   (21)
Combining the likelihood term in equation (21) with the prior in equation (8) gives the posterior of τ, which is again a Gamma distribution parameterized by x_0* and y_0*:
τ | T_Ω ~ Gamma(x_0*, y_0*)   (22)
where:
x_0* = x_0 + |Ω| / 2,   y_0* = y_0 + (1/2) Σ_{i∈Ω} ( T_i − Σ_{j=1}^{r} Π_{m=1}^{3} A^(m)(i_m, j) )^2   (23)
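The precision update of equations (22) and (23) can be sketched in the same illustrative setting (random placeholder data; the weak hyper-parameter values are an assumption):

import numpy as np

# Sketch of the Gamma posterior update for the noise precision tau (equations (22)-(23)).
rng = np.random.default_rng(2)
n1, n2, n3, r = 13, 32, 180, 4
A1, A2, A3 = rng.random((n1, r)), rng.random((n2, r)), rng.random((n3, r))
T = rng.random((n1, n2, n3))
B = rng.random((n1, n2, n3)) < 0.7                   # boolean mask of observed entries
x0, y0 = 1e-6, 1e-6                                  # weak shape/rate hyper-parameters (assumption)

T_hat = np.einsum('ij,kj,lj->ikl', A1, A2, A3)       # CP reconstruction, equation (2)
x_post = x0 + 0.5 * B.sum()                          # updated shape, equation (23)
y_post = y0 + 0.5 * np.sum((T - T_hat)[B] ** 2)      # updated rate, equation (23)
tau = rng.gamma(shape=x_post, scale=1.0 / y_post)    # tau ~ Gamma(x0*, y0*) in rate form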
sampling a new strategy:
the ensemble learning actually solves a single prediction problem by establishing a combination of several models. The working principle of the system is that a plurality of weak learners are independently trained and learned and make prediction judgment. The multiple prediction results are finally combined to form single prediction, and the combined forming but prediction method can obtain the result which is better than the prediction result of any one single learner.
In an integrated algorithm, a bagging method is to train data on a randomly generated data subset of an initial data training set through a plurality of homogeneous or heterogeneous black box estimators, and then obtain a final prediction result by performing certain data processing on prediction results of all weak learners. The method reduces the data prediction variance of the weak learner by adopting a means of randomly extracting to generate random data subsets in the construction of the data model. In most cases, the bagging method provides a very simple way to improve on a single model without modifying the underlying algorithm. Because the bagging method can reduce overfitting, the performance is good when the bag-based sparse clustering algorithm is used on a strong classifier and a complex model (for example, full-scaled decision trees), and the algorithm mainly utilizes the random extraction idea of bagging to obtain a plurality of random data subsets for integration to obtain an optimal result. The Bagging random sampling principle is shown in figure 4.
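As a concrete illustration of the random-extraction idea, the sketch below draws bagging-style subsets from the observed entries of an already-incomplete tensor by sampling indices with replacement; the function name and the subset count are assumptions made for illustration.

import numpy as np

def bagging_subsets(mask, n_subsets=40, seed=0):
    """Bagging-style sub-masks: sample the observed indices with replacement.

    mask: boolean array, True where a tensor entry is observed. For each subset,
    |Omega| indices are drawn with replacement from the observed index set; the
    distinct drawn indices (about 63% of Omega on average) form the new mask.
    """
    rng = np.random.default_rng(seed)
    obs_idx = np.flatnonzero(mask.ravel())
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        sub = np.zeros(mask.size, dtype=bool)
        sub[np.unique(drawn)] = True
        subsets.append(sub.reshape(mask.shape))
    return subsets

# Example: a 13 x 32 x 180 traffic tensor with 30% of its entries already missing.
rng = np.random.default_rng(1)
observed = rng.random((13, 32, 180)) < 0.7
subs = bagging_subsets(observed, n_subsets=5)
print([int(s.sum()) for s in subs])                  # each subset keeps ~63% of the observed entries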
A preferred ordering mechanism:
because the proposed IBTD aims at finding out the optimal interpolation result, after the result of the integrated interpolation is obtained, all the results need to be subjected to data processing, the data are sequenced according to interpolation errors, and finally a plurality of data interpolation results are taken, the interpolation data of the missing part is subjected to average processing, so that the final interpolation result is obtained, and the interpolation performance is improved.
In order to integrate time consumption and performance, 10, 20, \8230, 100 times are respectively integrated, time consumption and corresponding error performance are calculated, a relation between integration times and consumption time and RMSE errors shown in figure 5 is obtained, through trend trends of two curves, it can be seen that time consumption is continuously increased along with increase of the integration times, but change from RMSE to later period is not obvious, which shows that the error is reduced due to increase of the integration times, but the time consumption problem is considered, and the cost performance is not high, so 40 times are adopted as the integration times of the experiment group.
After the number of integrations is chosen, the error results of the 40 integration runs are sorted from small to large, and the first 5, 10, …, 40 results are taken and averaged. Taking the first 5 results as an example, the 5 repaired tensors with the smallest errors are taken out, the 5 values for each repaired entry are averaged to obtain a new complete repaired data set, and this is compared with the original data to obtain the error. This yields the relationship between the preferred number of results and RMSE shown in FIG. 6.
FIG. 6 shows that as the number of selected results increases, the RMSE decreases continuously, but the decrease flattens beyond about 10 results. Considering the complexity, the first 10 optimal results are selected for preferential averaging in the experiments.
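A minimal sketch of the preferential-averaging step follows: the ensemble's candidate reconstructions are ranked by RMSE and the best k are averaged. The function name, the use of argsort for ranking (the patent's own mechanism is the bubble sort described next) and the evaluation against the known entries are illustrative assumptions.

import numpy as np

def prefer_and_average(candidates, reference, mask, k=10):
    """Average the k candidate reconstructions with the smallest RMSE.

    candidates: list of completed tensors produced by the ensemble members.
    reference:  tensor holding the known values.
    mask:       boolean array, True where `reference` is known.
    """
    def rmse(est):
        return float(np.sqrt(np.mean((est[mask] - reference[mask]) ** 2)))

    errors = [rmse(c) for c in candidates]
    best = np.argsort(errors)[:k]                    # indices of the k best candidates
    return np.mean([candidates[i] for i in best], axis=0)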
After comparing and analyzing the performance of various existing sorting mechanisms, a bubble sorting mechanism is adopted to sort the interpolation results (a runnable sketch is given after this list). The bubble sorting mechanism is as follows:
1) Compare adjacent elements. If the first is larger than the second, swap the two of them;
2) Do the same for each pair of adjacent elements, from the first pair to the last pair, so that the last element ends up being the largest;
3) Repeat the above steps for all elements except the last one;
4) Repeat steps 1-3 until sorting is complete.
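A runnable version of the bubble sorting mechanism described above is given below; pairing each error with the index of its interpolation result is an illustrative addition so that the best results can be located afterwards.

def bubble_sort_errors(errors):
    """Bubble-sort a list of interpolation errors in ascending order.

    Returns (error, original index) pairs so the best interpolation
    results can be identified after sorting.
    """
    items = [(e, i) for i, e in enumerate(errors)]
    n = len(items)
    for outer in range(n - 1):                       # repeat steps 1-3 until sorted
        for j in range(n - 1 - outer):               # compare each pair of adjacent elements
            if items[j][0] > items[j + 1][0]:        # if the first is larger, swap the two
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

print(bubble_sort_errors([3.2, 1.5, 2.8, 0.9]))      # [(0.9, 3), (1.5, 1), (2.8, 2), (3.2, 0)]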
Step two, algorithm design and analysis:
step 2.1 tensor-based vehicle networking data loss estimation algorithm:
based on the model, strategy and mechanism designed above, we design the following tensor-based vehicle networking data loss estimation algorithm, which we call Integrated Bayesian tensor decomposition algorithm (IBTD), and the main steps of the algorithm are as follows algorithm 1: .
1) And generating a third-order tensor data model by the traffic data in the form of road section day flow. In generating the tensor model, two cases are used: random deletion and non-random deletion, and two different deletion data models are established for evaluating the performance of the algorithm.
2) And obtaining an incomplete random tensor data set which is different from the original missing tensor data by using the generated missing tensor data through a random sampling algorithm. The random sampling algorithm has the replaced extracted data based on the independent property of weak learners in the lemma 1Bagging strategy, and generates a data set for subsequent model training. Here algorithm 2 is invoked.
3) The generated incomplete random tensor data set is interpolated through a Bayes tensor decomposition algorithm, posterior distribution can be deduced through a priori and likelihood functions according to the principle that the posterior probability is in direct proportion to the product of the prior probability and the likelihood degree through the placed flexible prior parameters, and then the hyper-parameters are continuously updated until convergence. By using the integration idea, the cyclic interpolation will also obtain different interpolation results each time due to different initial data sets of each interpolation. Here algorithm 3 is invoked.
4) And performing bubble sorting and preference on the error parameters of all interpolation results, and performing arithmetic mean processing on the preferred interpolation data to obtain the repair data closer to the original data. Here algorithm 4 is invoked.
Step 2.2 New sampling strategy algorithm (Algorithm 2):
Algorithm description:
Input: the samples T = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, the Bayesian tensor algorithm as base repairer, and the number of base-repairer iterations Q.
Output: the final repairer f(x).
1) For q = 1, 2, …, Q:
a) Perform the q-th random sampling of the training set, drawing m times with replacement, to obtain a sampling set T_q containing m samples;
b) Use the sampling set T_q to train the q-th weak learner G_q(x).
2) Take the preferential arithmetic mean of the interpolation results obtained by the Q base repairers; this value is the final model output.
Step 2.3 Bayesian tensor decomposition algorithm (Algorithm 3):
Algorithm description:
1) Compute the conjugate-prior hyper-parameters (η_l, Λ_l) according to equation (20) to obtain the prior distribution;
2) Compute the posterior parameters (η*_{i_l}, Λ*_{i_l}) of the factor-matrix row vectors according to equations (12) and (13) to obtain the posterior distribution;
3) Compute (x_0*, y_0*) according to equations (22) and (23) to obtain the Gamma distribution of the precision parameter;
4) Repeat the parameter updates until convergence.
Step 2.4 bubble sort mechanism algorithm (algorithm 4):
Algorithm description:
1) Comparing adjacent elements and swapping two of them if the first is larger than the second;
2) Doing the same for each pair of adjacent elements;
3) The above steps are repeated for all elements except the last one.
Pseudo-code description of the IBTD algorithm:
The pseudo-code of Algorithm 1 (the IBTD algorithm) described above appears as an image in the original filing.
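Because the pseudo-code appears only as an image in the original filing, the Python sketch below merely outlines the IBTD flow following the four steps listed in Step 2.1; bayesian_cp_impute is a hypothetical stand-in for the Bayesian tensor decomposition base repairer (Algorithm 3), and all parameter values are illustrative assumptions.

import numpy as np

def bayesian_cp_impute(tensor, mask):
    """Hypothetical base repairer: returns a completed tensor.

    A real implementation would run the Gibbs updates of equations (12)-(13)
    and (22)-(23); a trivial mean fill stands in here so the pipeline runs.
    """
    filled = tensor.copy()
    filled[~mask] = tensor[mask].mean()
    return filled

def ibtd(tensor, mask, n_subsets=40, top_k=10, seed=0):
    """Illustrative sketch of the Integrated Bayesian Tensor Decomposition flow."""
    rng = np.random.default_rng(seed)
    obs_idx = np.flatnonzero(mask.ravel())
    candidates, errors = [], []
    for _ in range(n_subsets):
        # Step 2: bagging-style subset of the observed entries (with replacement).
        drawn = np.unique(rng.choice(obs_idx, size=obs_idx.size, replace=True))
        sub = np.zeros(mask.size, dtype=bool)
        sub[drawn] = True
        sub = sub.reshape(mask.shape)
        # Step 3: interpolate the subset with the base repairer.
        est = bayesian_cp_impute(tensor, sub)
        # Error measured on observed entries left out of this subset.
        held_out = mask & ~sub
        errors.append(float(np.sqrt(np.mean((est[held_out] - tensor[held_out]) ** 2))))
        candidates.append(est)
    # Step 4: sort the errors, keep the top_k best candidates and average them.
    best = np.argsort(errors)[:top_k]
    return np.mean([candidates[i] for i in best], axis=0)

# Example on random placeholder data with 30% of the entries missing.
rng = np.random.default_rng(1)
data = rng.random((13, 32, 180))
observed = rng.random(data.shape) < 0.7
print(ibtd(data, observed).shape)                    # (13, 32, 180)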
step three, experimental testing and comparative analysis:
step 3.1 speed data:
taking traffic flow data of a local road network (see the attached figure 7) as a research object, as shown in figure 8, the road network comprises four intersections (with the numbers of 12001, 11605, 12701 and 12700), wherein the road section to be detected (with the number of 8) is about 400 meters long, and a section of a far greater route between a Jiayu road and a Wanjiali road in the east-to-west direction is located. The second-order upstream of the road section to be measured comprises road sections 1,2,3, 4, 5 and 6, and the second-order downstream comprises road sections 7, 9, 10, 11, 12 and 13.
Tensor representation of the data:
The data set used is traffic flow data from a certain city, collected in real time by the loop detectors in each lane at each intersection of the target area; assuming the road segments have no entrances or exits, the flow at an intersection approach equals the flow of the corresponding segment. The traffic data are output by the SCATS Traffic Reporter system, and each detector collects 180 flow records per day. The database contains data for the 13 road segments over the 32 days from 17 September to 18 October 2013. From the acquired data, the historical database is constructed as a three-dimensional tensor of size 13 × 32 × 180, which can simultaneously exploit spatio-temporal information from several modes such as the spatial mode and the day mode; as shown in FIG. 1, the three dimensions represent the 13 related road segments, the 32 days and the 180 flow records per day.
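As an illustration of how such a tensor can be built, the sketch below fills a 13 × 32 × 180 (road segment × day × time-of-day) array from flat flow records; the record layout and the synthetic values are assumptions, since the original data files are not described at that level of detail.

import numpy as np

# Illustrative construction of the road-segment x day x time-of-day tensor.
# Assume `records` holds rows of (segment_id, day_index, slot_index, flow),
# with 13 segments, 32 days and 180 time slots per day.
n_segments, n_days, n_slots = 13, 32, 180
rng = np.random.default_rng(0)
records = np.column_stack([
    np.repeat(np.arange(n_segments), n_days * n_slots),
    np.tile(np.repeat(np.arange(n_days), n_slots), n_segments),
    np.tile(np.arange(n_slots), n_segments * n_days),
    rng.integers(0, 500, n_segments * n_days * n_slots),
])

tensor = np.zeros((n_segments, n_days, n_slots))
seg, day, slot, flow = records.T
tensor[seg, day, slot] = flow
print(tensor.shape)                                  # (13, 32, 180)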
FIG. 9 reflects the traffic flow trends of the different road segments on 22 September. Although the flow magnitudes differ between segments, the overall trends are strongly similar, which is related to the spatial splitting of traffic flow among the segments. FIG. 10 reflects the flow trend of road segment 1 on different dates; the fluctuations are similar because traffic flow is necessarily linked to travel patterns, and travel has a certain periodicity, with morning and evening peaks on every day except special cases such as major holidays.
The constructed three-dimensional road-segment by day by flow tensor makes full use of the various spatio-temporal correlations in the data, so the traffic flow data can be repaired better.
We create test data sets by randomly deleting a certain number of entries, thus dividing the original data into two groups: observed (Ω) and missing (removed). For these "missing" entries the corresponding ground truth is available, which allows us to evaluate the interpolation performance of the model directly by imputing the deleted entries.
Data missing conditions:
In the experiments we evaluate the performance of the different models on different data under different missing rates and different missing scenarios (random and non-random missing). For random missing, 10%, 20%, …, 90% of the data (at 10% intervals) are randomly removed from the 32-day history of the 13 road segments, and the removed entries are set to 0. For structural missing, the same amounts of data are removed from the same historical data, assuming that the missing data occur on different road segments within the same time period. Taking 30% missing as an example, FIG. 11 shows the two missing-data forms, random missing and structural missing.
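The two missing patterns of FIG. 11 can be mimicked as follows; which segment-day fibres are dropped in the structural case is an illustrative assumption, since the patent only states that the missing data fall on different road segments in the same time period.

import numpy as np

def random_missing(shape, miss_rate, seed=0):
    """Observation mask with `miss_rate` of all entries randomly marked missing."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) >= miss_rate

def structural_missing(shape, miss_rate, seed=0):
    """Observation mask where whole (segment, day) fibres are missing,
    until roughly `miss_rate` of all entries are gone."""
    n_seg, n_day, n_slot = shape
    rng = np.random.default_rng(seed)
    mask = np.ones(shape, dtype=bool)
    n_fibres = int(round(miss_rate * n_seg * n_day))
    for p in rng.choice(n_seg * n_day, size=n_fibres, replace=False):
        mask[p // n_day, p % n_day, :] = False       # drop the whole day on that segment
    return mask

obs_rand = random_missing((13, 32, 180), 0.3)
obs_struct = structural_missing((13, 32, 180), 0.3)
print(round(1 - obs_rand.mean(), 2), round(1 - obs_struct.mean(), 2))   # both about 0.3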
Step 3.2 data repair results and analysis:
Two error measures are used to evaluate the repair of the missing data, the mean absolute percentage error (MAPE) and the root mean square error (RMSE). The errors are calculated as follows:
MAPE = (1/n) Σ_{i=1}^{n} | (y_i − ŷ_i) / y_i | × 100%,   RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2 )   (24)
where y_i is the true value of the i-th removed entry and ŷ_i is its repaired value.
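A direct implementation of the two error measures of equation (24), evaluated on the deliberately removed entries; the function names are illustrative.

import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (entries with a zero true value are skipped)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    nz = y_true != 0
    return float(np.mean(np.abs((y_true[nz] - y_pred[nz]) / y_true[nz]))) * 100.0

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Example: evaluate a repaired tensor `est` against the original `truth`
# on the boolean mask `removed` of deliberately deleted entries:
#   print(mape(truth[removed], est[removed]), rmse(truth[removed], est[removed]))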
In the numerical experiments, we compared the IBTD model with four other interpolation methods based on tensor decomposition: Bayesian Gaussian CP decomposition (BGCP), classical CP decomposition, the CP-WOPT algorithm and Bayesian principal component analysis (BPCA).
The RMSE and MAPE results calculated according to equation (24) are shown in FIG. 12 and FIG. 13. When the missing type is random missing, the proposed IBTD algorithm steadily reduces the repair error, and its error is relatively small. As the missing percentage increases, the errors of all five algorithms grow, but the growth for CP decomposition is too large, which shows that the CP decomposition algorithm is more sensitive to the amount of missing data and its interpolation stability is poor. CP-WOPT interpolates better than classical CP decomposition, and the BPCA algorithm performs slightly worse than CP-WOPT. Bayesian tensor decomposition has roughly the same growth trend as IBTD, but the error of IBTD is smaller and its interpolation effect is more satisfactory.
As can be seen from FIG. 14 and FIG. 15, the overall error rates of the five algorithms all increase when the missing type is structural missing, which indicates that structural missing has a greater influence on data interpolation than random missing. Compared with the other algorithms, the proposed IBTD shows a more stable error rate as the missing percentage increases, reflecting the benefit of its integration-and-preference principle.
To show the repair effect of IBTD more intuitively, FIGS. 16 and 17 present road segment 1 on 22 September with 50% of the data missing, where the missing entries are set to 0: FIG. 16 compares the repaired data with the missing data, and FIG. 17 compares the repaired data with the real data. These figures show that although the data loss is severe and very little information is known, the data repaired by IBTD reflect the fluctuation of the traffic flow quite completely, and the repair accuracy is high.

Claims (4)

1. A new tensor-based multiple estimation method for missing Internet of Vehicles data, characterized by mainly comprising the following steps:
1, constructing the model, including the tensor model, Bayesian tensor decomposition, the new sampling strategy and the preferred ordering mechanism;
2, running the tensor-based estimation algorithm for missing Internet of Vehicles data, including the algorithm design and a theoretical analysis of its complexity;
3, carrying out experimental testing and comparative analysis;
the tensor-based estimation algorithm for missing Internet of Vehicles data comprises the following steps:
2.1, generating a third-order tensor data model from the traffic data in road-segment by day by flow form, used to evaluate the performance of the algorithm;
2.2, from the generated missing tensor data, obtaining incomplete random tensor data sets that differ from the original missing tensor data by means of a random sampling algorithm, calling the new sampling strategy algorithm;
2.3, interpolating the generated incomplete random tensor data sets with a Bayesian tensor decomposition algorithm, calling the Bayesian tensor decomposition algorithm;
2.4, bubble-sorting and preferring the error parameters of all interpolation results, taking the arithmetic mean of the preferred interpolation data to obtain repaired data closer to the original data, and calling the bubble sorting mechanism algorithm;
the new sampling strategy is: solving a single prediction problem by building and combining several models, the working principle being that multiple weak learners are trained independently and each makes a prediction, and the multiple prediction results are finally combined into a single prediction;
the preferred ordering mechanism is: after obtaining the results of the integrated interpolation, processing all the results, sorting them by interpolation error, finally taking a number of interpolation results and averaging the interpolated values of the missing part to obtain the final interpolation result, thereby improving interpolation performance.
2. The new tensor-based multiple estimation method for missing Internet of Vehicles data according to claim 1, characterized in that the bubble sorting mechanism algorithm of step 2.4 is:
Algorithm description:
1) Comparing adjacent elements and swapping two of them if the first is larger than the second;
2) Doing the same for each pair of adjacent elements;
3) The above steps are repeated for all elements except the last one.
3. The new tensor-based multiple estimation method for missing Internet of Vehicles data according to claim 1, characterized in that the experimental testing and comparative analysis comprise:
3.1, representing the acquired data as a tensor;
3.2, repairing the data according to the missing-data conditions, and analyzing the advantages of the new algorithm over existing algorithms.
4. The new tensor-based multiple estimation method for missing Internet of Vehicles data according to claim 1, characterized in that the tensor data model of step 2.1 includes two cases, random missing and non-random missing, and two different missing-data models are established.
CN201910421687.9A 2019-05-21 2019-05-21 Tensor-based internet of vehicles data loss multiple estimation method Active CN110162744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910421687.9A CN110162744B (en) 2019-05-21 2019-05-21 Tensor-based internet of vehicles data loss multiple estimation method


Publications (2)

Publication Number Publication Date
CN110162744A CN110162744A (en) 2019-08-23
CN110162744B true CN110162744B (en) 2023-01-17

Family

ID=67631546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910421687.9A Active CN110162744B (en) 2019-05-21 2019-05-21 Tensor-based internet of vehicles data loss multiple estimation method

Country Status (1)

Country Link
CN (1) CN110162744B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766066B (en) * 2019-10-18 2023-06-23 天津理工大学 Tensor heterogeneous integrated vehicle networking missing data estimation method based on FNN
CN110837888A (en) * 2019-11-13 2020-02-25 大连理工大学 Traffic missing data completion method based on bidirectional cyclic neural network
CN113256977B (en) * 2021-05-13 2022-06-14 福州大学 Traffic data processing method based on image tensor decomposition
CN113378931A (en) * 2021-06-11 2021-09-10 北京航空航天大学 Intelligent roadside multi-source data fusion method based on Bayesian tensor decomposition
CN114330145B (en) * 2022-03-01 2022-07-12 北京蚂蚁云金融信息服务有限公司 Method and device for analyzing sequence based on probability map model
CN114841888B (en) * 2022-05-16 2023-03-28 电子科技大学 Visual data completion method based on low-rank tensor ring decomposition and factor prior


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104932863A (en) * 2015-06-26 2015-09-23 厦门大学 High-dimensional exponential signal data completion method
CN105679022A (en) * 2016-02-04 2016-06-15 北京工业大学 Multi-source traffic data complementing method based on low rank
CN107577649A (en) * 2017-09-26 2018-01-12 广州供电局有限公司 The interpolation processing method and device of missing data
CN107564288A (en) * 2017-10-10 2018-01-09 福州大学 A kind of urban traffic flow Forecasting Methodology based on tensor filling
CN107992536A (en) * 2017-11-23 2018-05-04 中山大学 Urban transportation missing data complementing method based on tensor resolution

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"概率张量分解综述";史加荣,张安银;《陕西理工大学学报(自然科学版)》;20180831;第34卷(第4期);全文 *

Also Published As

Publication number Publication date
CN110162744A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110162744B (en) Tensor-based internet of vehicles data loss multiple estimation method
Chen et al. A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation
Do et al. An effective spatial-temporal attention based neural network for traffic flow prediction
CN110827544B (en) Short-term traffic flow control method based on graph convolution recurrent neural network
CN111612243B (en) Traffic speed prediction method, system and storage medium
CN111199343A (en) Multi-model fusion tobacco market supervision abnormal data mining method
CN112216108A (en) Traffic prediction method based on attribute-enhanced space-time graph convolution model
Yu et al. A low rank dynamic mode decomposition model for short-term traffic flow prediction
Xu et al. Graph partitioning and graph neural network based hierarchical graph matching for graph similarity computation
CN113762595B (en) Traffic time prediction model training method, traffic time prediction method and equipment
Shahzad et al. Missing data imputation using genetic algorithm for supervised learning
Zheng et al. Hybrid deep learning models for traffic prediction in large-scale road networks
CN115206092A (en) Traffic prediction method of BiLSTM and LightGBM model based on attention mechanism
James Citywide estimation of travel time distributions with Bayesian deep graph learning
CN115862324A (en) Space-time synchronization graph convolution neural network for intelligent traffic and traffic prediction method
Chen et al. Traffic flow prediction with parallel data
Prabowo et al. Traffic forecasting on new roads unseen in the training data using spatial contrastive pre-training
Li et al. A two-stream graph convolutional neural network for dynamic traffic flow forecasting
Zhong et al. Estimating link flows in road networks with synthetic trajectory data generation: Inverse reinforcement learning approach
Giang et al. Adaptive Spatial Complex Fuzzy Inference Systems With Complex Fuzzy Measures
Bornholdt Genetic algorithm dynamics on a rugged landscape
Zhao et al. STCGAT: A spatio-temporal causal graph attention network for traffic flow prediction in intelligent transportation systems
Shin et al. Missing value imputation model based on adversarial autoencoder using spatiotemporal feature extraction
Wati et al. Particle swarm optimization comparison on decision tree and Naive Bayes for pandemic graduation classification
Han et al. An Urban Traffic Flow Prediction Approach Integrating External Factors Based on Deep Learning and Knowledge Graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant