CN114445252A - Data completion method and device, electronic equipment and storage medium


Info

Publication number
CN114445252A
Authority
CN
China
Prior art keywords
data
matrix
data set
attention
completion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111346757.2A
Other languages
Chinese (zh)
Inventor
余剑峤
张舒昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN202111346757.2A
Publication of CN114445252A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Traffic Control Systems (AREA)

Abstract

The embodiment of the invention provides a data completion method and device, an electronic device and a storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: acquiring an original data set; performing data preprocessing on the original data set to classify it into an incomplete data set and a historical data set; extracting temporal features from the incomplete data set to obtain a first time series; extracting temporal features from the historical data set to obtain a second time series; performing multi-head attention calculation on the first time series to obtain a first output matrix; performing multi-head attention calculation on the second time series to obtain a second output matrix; and fusing the first output matrix and the second output matrix to obtain the completion data. The method can complete missing traffic data and achieves a good completion effect under different data-missing conditions.

Description

Data completion method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data completion method and device, electronic equipment and a storage medium.
Background
Traffic data is a collection of road characteristics (such as traffic flow and speed) over a period of time and is an important component of the Intelligent Transportation System (ITS). On the basis of such road characteristic data, traffic departments can carry out reasonable and effective traffic control, and enterprises can provide more accurate and reliable services. In practice, however, traffic data sets often have missing values due to sensor failures, regional power outages, extreme weather, and the like. Therefore, how to provide a data completion method to complete the missing traffic data is a technical problem that urgently needs to be solved.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. To this end, the invention provides a data completion method and device, an electronic device and a storage medium, which can complete missing traffic data on the basis of historical traffic data and achieve a good completion effect under different data-missing conditions.
In order to achieve the above object, a first aspect of an embodiment of the present invention provides a data completion method, including:
acquiring an original data set;
performing data preprocessing on the original data set to classify the original data set into an incomplete data set and a historical data set;
extracting time characteristics of the incomplete data set to obtain a first time sequence;
extracting time characteristics of the historical data set to obtain a second time sequence;
performing multi-head attention mechanism calculation on the first time sequence to obtain a first output matrix;
performing multi-head attention mechanism calculation on the second time sequence to obtain a second output matrix;
and performing fusion processing on the first output matrix and the second output matrix to obtain the completion data.
In some embodiments of the present invention, the performing temporal feature extraction on the incomplete data set to obtain a first time series includes:
inputting the incomplete data set into a preset long-short term memory completion network;
extracting observation data in the incomplete data set at a preset time period;
calculating missing data of the preset time period according to the observation data and the prediction data of the time period preceding the preset time period;
and obtaining the first time sequence according to the missing data and the observation data.
In some embodiments of the present invention, the time feature extracting the historical data set to obtain a second time series includes:
inputting the historical data set into a preset historical data processing network;
calculating historical average data of the historical data set through the historical data processing network, and taking the historical average data as the historical data of the preset time period;
and obtaining the second time sequence according to the historical average data.
In some embodiments of the invention, the performing multi-head attention mechanism calculation on the first time series to obtain a first output matrix includes:
inputting the first temporal sequence into a first multi-headed attention layer;
converting the first time series into a first attention matrix by the first multi-head attention layer;
converting the first attention matrix into a first probability matrix through a preset function;
performing attention mechanism calculation on the first probability matrix to obtain a first characteristic matrix;
and performing dimension reduction processing on the first characteristic matrix to obtain a first output matrix.
In some embodiments of the present invention, the performing multi-head attention mechanism calculation on the second time series to obtain a second output matrix includes:
inputting the second time series to a second multi-headed attention layer;
converting, by the second multi-head attention layer, the second time series into a second attention matrix;
converting the second attention matrix into a second probability matrix through a preset function;
performing attention mechanism calculation on the second probability matrix to obtain a second feature matrix;
and performing dimensionality reduction on the second feature matrix to obtain a second output matrix.
In some embodiments of the present invention, the fusing the first output matrix and the second output matrix to obtain the completion data includes:
splicing the first output matrix and the second output matrix to obtain a spliced matrix;
performing attention mechanism calculation on the spliced matrix through a single-head attention layer to obtain a target matrix;
and performing feature extraction on the target matrix through a linear layer to obtain the completion data.
In some embodiments of the present invention, the performing attention mechanism calculation on the spliced matrix through a single-head attention layer to obtain a target matrix includes:
inputting the spliced matrix into the single-head attention layer;
converting the spliced matrix into an attention matrix through the single-head attention layer;
converting the attention matrix into a probability matrix through a preset function;
and performing attention mechanism calculation on the probability matrix to obtain the target matrix.
To achieve the above object, a second aspect of an embodiment of the present invention provides a data completion apparatus, including:
the original data set acquisition module is used for acquiring an original data set;
the original data set preprocessing module is used for performing data preprocessing on the original data set so as to classify the original data set into an incomplete data set and a historical data set;
the first time sequence extraction module is used for extracting time characteristics of the incomplete data set to obtain a first time sequence;
the second time sequence extraction module is used for extracting time characteristics of the historical data set to obtain a second time sequence;
the first output matrix calculation module is used for performing multi-head attention mechanism calculation on the first time sequence to obtain a first output matrix;
the second output matrix calculation module is used for performing multi-head attention mechanism calculation on the second time sequence to obtain a second output matrix;
and the data fusion module is used for carrying out fusion processing on the first output matrix and the second output matrix to obtain the completion data.
To achieve the above object, a third aspect of an embodiment of the present invention provides an electronic apparatus, including:
at least one memory;
at least one processor;
at least one program;
the at least one program is stored in the memory, and the at least one processor executes the at least one program to implement the data completion method of the present invention as described in the first aspect above.
To achieve the above object, a fourth aspect of the present invention proposes a storage medium which is a computer-readable storage medium storing computer-executable instructions for causing a computer to execute:
a method of data completion as described in the first aspect above.
According to the data completion method and device, electronic device and storage medium, the original data set is obtained and preprocessed so as to classify it into an incomplete data set and a historical data set; temporal feature extraction is performed on the incomplete data set to obtain a first time series and on the historical data set to obtain a second time series; multi-head attention calculation is performed on the first time series to obtain a first output matrix and on the second time series to obtain a second output matrix; and finally the first output matrix and the second output matrix are fused to obtain the completion data. The technical scheme provided by the embodiment of the invention addresses the data loss that arises because the collection of traffic data is often affected by factors such as communication errors, sensor failures and storage loss; by solving the traffic data missingness problem with an attention-recurrent neural network, the historical data can be fully utilized and the traffic data completion effect effectively improved.
Drawings
FIG. 1 is a schematic diagram of a completely random missing of traffic data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a non-random loss of traffic data according to an embodiment of the present invention;
FIG. 3 is a chart of the traffic speed of East Sanlihe Road, Beijing, at 10-minute intervals on different dates, according to an embodiment of the present invention;
FIG. 4 is a flow chart of a data completion method according to an embodiment of the present invention;
FIG. 5 is a flowchart of step S130 in FIG. 4;
FIG. 6 is a flowchart of step S140 in FIG. 4;
FIG. 7 is a flowchart of step S150 in FIG. 4;
FIG. 8 is a flowchart of step S160 in FIG. 4;
FIG. 9 is a flowchart of step S170 in FIG. 4;
FIG. 10 is a flowchart of step S620 in FIG. 9;
FIG. 11 is a block diagram of the overall framework of the attention-recurrent neural network for traffic data completion provided by an embodiment of the present invention;
FIG. 12 is a schematic diagram of the Beijing, China dataset provided by an embodiment of the present invention;
FIG. 13 is a schematic diagram of the District 5, California dataset provided by an embodiment of the present invention;
FIG. 14 is a schematic diagram of the Hong Kong, China dataset provided by an embodiment of the present invention;
fig. 15 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more, and "a plurality" means two or more; terms such as "greater than", "less than" and "exceeding" are understood as excluding the stated number, while "above", "below" and "within" are understood as including it. Descriptions referring to "first" and "second" are only for distinguishing technical features and are not to be understood as indicating or implying relative importance, implicitly indicating the number of technical features indicated, or implicitly indicating the precedence of the technical features indicated, nor do they describe a particular order or sequence.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
First, several terms referred to in the present application are explained:
reinforcement Learning (Reinforcement Learning): reinforcement learning, also known as refit learning, evaluation learning or reinforcement learning, is one of the paradigms and methodologies of machine learning, and is used for describing and solving the problem that an Agent (Agent) achieves the maximum return or achieves a specific target through a learning strategy in the interaction process with the environment; learning the mapping from environmental states to behaviors enables the behavior selected by the agent to receive the greatest reward from the environment, making the external environment optimal for the evaluation of the learning system in some sense (or the overall system's performance). If a certain action strategy of an Agent results in a positive reward (reinforcement signal) of the environment, the Agent later generates this action strategyThe tendency is intensified. The goal of the Agent is to find the optimal policy at each discrete state to maximize the desired discount reward sum. Reinforcement learning regards learning as a heuristic evaluation process, an Agent selects an action for an environment, the state of the environment changes after receiving the action, and a reinforcement signal (reward or punishment) is generated and fed back to the Agent, the Agent selects the next action according to the reinforcement signal and the current state of the environment, the selection principle is that reinforcement learning is different from supervised learning in connection insights learning, and the reinforcement signal provided by the environment in reinforcement learning is an evaluation (usually a scalar signal) of the action of the Agent on the quality of the generated action, rather than telling the Agent how to generate correct action. Since the external environment provides little information, agents must learn on their own experience. In this way, agents gain knowledge in the context of action-by-action evaluation, improving the action scheme to accommodate the increased probability that the context is being enhanced (rewarded). The action selected affects not only the immediate enhancement value, but also the state at the next moment in the environment and the final enhancement value. The goal of reinforcement learning is to dynamically adjust the parameters to achieve reinforcement signal maximization. In so-called reinforcement learning, an Agent of an Agent acts as a learning system, obtains information s of a current State (State) of an external environment, takes a tentative action u on the environment, and obtains an evaluation r of the action fed back by the environment and a new environment State. If an action u by the agent results in a positive reward (immediate reward) to the environment, then the tendency of the agent to generate this action at a later time is heightened; conversely, the tendency of the agent to generate this action will be diminished. In the repeated interaction of the control behavior of the learning system and the state and evaluation of the environmental feedback, the mapping strategy from the state to the action is continuously modified in a learning mode so as to achieve the aim of optimizing the system performance. 
Reinforcement learning comprises two types, value-based and policy-based. Value-based methods learn a value function and derive the policy from it, determining the action $a_t$ indirectly from the learned values; the action-value estimates in value-based methods eventually converge to their corresponding values (usually distinct finite numbers), so a deterministic policy is usually obtained. Policy-based methods learn a policy function directly and can produce the probability $\pi_\theta(a|s)$ of each action; policy-based methods generally do not converge to deterministic values, and they apply to continuous action spaces, where actions can be selected from a Gaussian distribution instead of computing the probability of each action.
RNN (Recurrent Neural Networks): an RNN consists of an input layer, a hidden layer and an output layer. Recurrent neurons not only produce predictions but also pass the hidden state $s_{t-1}$ of one time step on to the next neuron. The output layer is a fully connected layer with $o_t = g(V s_t)$, where $g$ is the activation function, $V$ is the network weight matrix of the output layer, and $s_t$ is the hidden state; that is, each of its nodes is connected to every node of the hidden layer. The current output is computed from the hidden layer, and the current hidden state is related not only to the input but also to the previous hidden state: $s_t = f(U x_t + W s_{t-1})$, where $U$ is the network weight matrix of the input $x$, $W$ is the network weight matrix applied to the previous hidden state, and $f$ is the activation function; the hidden layer is the recurrent layer.
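A minimal illustration of this recurrence (hypothetical shapes; this is not code from the patent) implementing $s_t = f(Ux_t + Ws_{t-1})$ and $o_t = g(Vs_t)$:

```python
import numpy as np

def rnn_forward(X, U, W, V, f=np.tanh, g=lambda z: z):
    """Plain RNN over a sequence.

    X: (T, d_in) inputs; U: (d_h, d_in); W: (d_h, d_h); V: (d_out, d_h).
    Implements s_t = f(U x_t + W s_{t-1}) and o_t = g(V s_t).
    """
    s = np.zeros(U.shape[0])          # initial hidden state s_0
    outputs = []
    for x_t in X:
        s = f(U @ x_t + W @ s)        # hidden state carries the previous step
        outputs.append(g(V @ s))      # output layer reads the hidden state
    return np.stack(outputs)

# usage: 10 time steps, 4 input features, 8 hidden units, 4 outputs
rng = np.random.default_rng(0)
out = rnn_forward(rng.normal(size=(10, 4)),
                  0.1 * rng.normal(size=(8, 4)),
                  0.1 * rng.normal(size=(8, 8)),
                  0.1 * rng.normal(size=(4, 8)))
print(out.shape)  # (10, 4)
```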
Long Short-Term Memory network (LSTM): a special kind of RNN. LSTM performs well on many problems and is now widely used. By design, LSTM explicitly avoids the long-term dependency problem. All recurrent neural networks take the form of a chain of repeating neural network modules; in a standard RNN the repeating module has a very simple structure, e.g. a single tanh layer. LSTM uses "gate" structures to remove or add information to the cell state. A gate is a way of selectively letting information through, consisting of a sigmoid neural network layer and a pointwise multiplication operation. The sigmoid layer outputs a value between 0 and 1 describing how much of each component should pass: 0 means "let nothing through" and 1 means "let everything through". An LSTM has three gates protecting and controlling the cell state: the input gate, the output gate and the forget gate.
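For reference, the three gates described above are conventionally written as follows (the standard textbook formulation, not notation taken from this patent), with $\sigma$ the sigmoid layer and $\odot$ the pointwise multiplication:

$$
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
\tilde{c}_t &= \tanh(W_c\,[h_{t-1}, x_t] + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$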
Attention Mechanism: the attention mechanism stems from the study of human vision. In cognitive science, because of bottlenecks in information processing, humans selectively focus on a portion of all available information while ignoring the rest; this mechanism is commonly referred to as attention. A neural attention mechanism enables a neural network to concentrate on a subset of its inputs by selecting particular inputs, and attention can be applied to any type of input regardless of its shape. Where computing power is limited, the attention mechanism is a resource allocation scheme and the primary means of solving the information-overload problem, allocating computing resources to the more important tasks. The Multi-head Attention Mechanism uses multiple queries to select multiple pieces of information from the input in parallel, with each head focusing on a different part of the input information.
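As a small illustration, the following sketch runs multi-head self-attention over a day of traffic readings using PyTorch's built-in layer; the shapes (288 five-minute steps, 64 features, 4 heads) are assumed values for illustration, not parameters from the patent:

```python
import torch
import torch.nn as nn

# one day of 5-minute readings: batch 1, T = 288 steps, 64-dim features
x = torch.randn(1, 288, 64)

# 4 heads, each attending to a different 16-dim subspace of the features
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

# self-attention: queries, keys and values all come from the same sequence
out, attn = mha(x, x, x)
print(out.shape)   # torch.Size([1, 288, 64])
print(attn.shape)  # torch.Size([1, 288, 288]); weights averaged over heads
```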
Methods based on statistical learning: traditional statistical learning applies simple statistical operations to complete incomplete data, such as linear interpolation, historical averaging, and filling missing values with the last observation; however, this type of method can only handle cases where the data distribution is relatively simple. Some researchers have proposed computing a missing value from the data around the missing position in the feature matrix, such as the classical K-nearest neighbor (KNN) algorithm, which fills a gap with the average of the K neighboring positions around it. The autoregressive integrated moving average model (ARIMA) and its variants fill in missing values by predicting the missing data from historical data; however, these methods do not make efficient use of features collected after the miss occurs. Constraint-based methods establish completion rules from the overall characteristics of the data set; however, such methods are only applicable to univariate data and are ineffective in most practical scenarios involving multivariate data.
Methods based on matrix factorization: Matrix Factorization finds correlations in the data by decomposing and reconstructing the data matrix, and completes the missing values accordingly. Temporal Regularized Matrix Factorization (TRMF) is a time-series completion method that introduces regularization and scalable matrix factorization on top of matrix decomposition. In addition, one technical solution introduces Probabilistic Principal Component Analysis (PPCA) into matrix factorization, assuming that the latent features of the observed data follow a Gaussian distribution, and performs data completion on that basis. More complex low-rank tensor completion algorithms have also been employed to recover missing data. A Bayesian Gaussian CANDECOMP/PARAFAC (BGCP) tensor decomposition method has been proposed to convert the original data matrix into a higher-dimensional tensor and then describe and recover the incomplete matrix. However, methods based on matrix factorization require inputs of a specific shape, which limits their applications.
Generative Adversarial Network (GAN): a deep learning model, and one of the most promising methods of recent years for unsupervised learning on complex distributions. The framework contains at least two modules, a generative model (G) and a discriminative model (D), whose mutual game-playing produces good outputs. The original GAN theory does not require G and D to be neural networks, only functions that can fit the corresponding generation and discrimination; in practice, deep neural networks are generally used for both. An excellent GAN application requires a good training method; otherwise the output may be unsatisfactory due to the freedom of the neural network model.
Historical Average (HA): the HA calculates an average value for the corresponding time period of each road segment in one day using the complete data of the last five days to fill in missing values.
K-Nearest Neighbors algorithm (KNN): KNN is a nonparametric interpolation method that fills a missing value with the average of its K nearest neighbor points; the embodiment of the invention uses the 4 nearest neighbor points in the calculation.
Bayesian Gaussian tensor decomposition (BGCP): BGCP is a Bayesian tensor decomposition model that adopts Markov chain Monte Carlo to model the latent factors, i.e., a low-rank structure.
Stacked Denoising Autoencoder (DSAE): DSAE receives a time-series feature matrix containing observations and noise, performs dimensionality reduction and restoration to extract the implicit features of the data, and outputs a completion result.
Bayesian Temporal Matrix Factorization (BTMF): BTMF uses a gaussian vector autoregressive process to model timing dependencies to complement time series data.
Parallel generative adversarial completion network (PGAN): PGAN is a GAN-based data completion method for generating missing traffic data, in which both the generator and the discriminator consist of linear layers.
Bidirectional Recurrent Imputation for Time Series (BRITS): BRITS feeds the time series with missing values into two RNNs, one forward and one backward, and estimates the missing time-series values by combining the outputs of the two RNNs.
Completely random missing and non-random missing: the present invention considers two common patterns of data missingness, namely missing completely at random (MCAR) and missing not at random (MNAR), as shown in fig. 1 and 2 respectively, where the gray boxes are missing data and the white boxes are observed data. In MCAR, the missing data is scattered randomly across the time series, whereas in MNAR, missing data points occur at consecutive time points. MNAR is the more challenging problem because the neighborhood information needed to recover a single missing point is itself absent.
Traffic data is a collection of road characteristics (such as traffic flow and speed) over a period of time and is an important component of the Intelligent Transportation System (ITS). On the basis of such data, traffic departments can carry out reasonable and effective traffic control, and enterprises can provide more accurate and reliable services. Recently, deep-learning-based algorithms have been widely applied in this field.
However, most deep-learning-based methods depend heavily on the quality of the data. In practice, traffic data sets often have missing values due to sensor failures, regional power outages, extreme weather, and the like. Therefore, the problem of missing traffic data needs to be solved.
To handle missing data, the most straightforward approach is to discard the incompletely observed data. However, such operations cause the loss of temporal or spatial information. To address this, the incomplete data can instead be estimated using an appropriate data completion method, which estimates the missing values by analyzing the dependencies or distribution of the data. An appropriate data completion method can accurately recover the missing data, thereby avoiding performance degradation in the various downstream data mining algorithms of an intelligent transportation system.
In addition, compared with conventional time-series data such as stock indices and medical device data, traffic data has strong periodicity and volatility. Fig. 3 shows the road speed of East Sanlihe Road in Beijing, China over five days. Traffic speeds on several consecutive days follow a similar overall pattern; for example, road speeds are low during the morning and evening peaks of the day. At the same time, traffic speed differs from day to day due to various factors such as weather, accidents and the date, and the speed at each time point is highly correlated with the road speed at the preceding and following times. Therefore, how to combine the observed data before and after different time points with the overall historical data is important for handling missing data.
Some technical solutions design algorithms based on statistical methods such as KNN and ARIMA to complete the incomplete data; however, these methods are effective only for data with relatively simple distributions. Based on matrix factorization, some technical solutions introduce probability models and Gaussian distributions, but such methods are only suitable for low-rank data and struggle with traffic data influenced by many factors; in addition, they place specific requirements on the format of the input data, limiting their application scenarios. In recent years, deep learning has been widely applied to the data completion problem. For example, some solutions propose BRITS, with a bidirectional recurrent neural network structure, to complete data using the observations before and after the missing position; others propose E2GAN, which generates the lost data with a generative adversarial network. However, these studies often suffer from heavy computation or slow convergence on large data sets. While existing methods have produced acceptable results on the incomplete traffic data completion problem, some challenges remain: some existing methods are overly complex, and many models are difficult to train; existing methods consider only the time dependencies within the incomplete time-series data and do not exploit the inherent periodicity of traffic data; and the problem of how to better extract time-series features from incomplete data remains largely open.
To solve the above problems and fill this research gap, the present invention provides a new data completion model called the Attention-Driven Recurrent Imputation Network (ADRIN). Unlike previous models, the present invention extracts features both from the incomplete input and from historical averages, to account for the volatility and periodicity of traffic data. ADRIN adopts a Long Short-Term Memory network for imputation (LSTM-I) to receive the incomplete time-series input, and adopts a multi-head attention network to model the features of the completed time series. In addition, the present invention applies a multi-head attention network to the historical data to extract features related to the historical information. The outputs of the two modules are then passed through a fusion module containing an attention layer and a linear layer to obtain the completion result.
Based on this, the embodiment of the invention provides a data completion method, a data completion device, an electronic device and a storage medium, which can realize completion of traffic data.
Embodiments of the present invention provide a data completion method, an apparatus, an electronic device, and a storage medium, which are described in detail with reference to the following embodiments, and first describe the data completion method in the embodiments of the present invention.
The embodiment of the invention can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The embodiment of the invention provides a data completion method relating to the technical field of artificial intelligence. The data completion method provided by the embodiment of the invention can be applied to a terminal, to a server, or to software running in a terminal or server. In some embodiments, the terminal may be a smartphone, a tablet, a laptop, a desktop computer, a smart watch, or the like; the server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms; the software may be an application implementing the data completion method, but is not limited to the above forms.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Fig. 4 is an alternative flowchart of a data completion method according to an embodiment of the present invention, and the method in fig. 4 may include, but is not limited to, steps S110 to S170.
Step S110, acquiring an original data set;
step S120, carrying out data preprocessing on the original data set so as to classify the original data set into a incomplete data set and a historical data set;
step S130, extracting time characteristics of the incomplete data set to obtain a first time sequence;
step S140, extracting time characteristics of the historical data set to obtain a second time sequence;
step S150, performing multi-head attention mechanism calculation on the first time sequence to obtain a first output matrix;
step S160, performing multi-head attention mechanism calculation on the second time sequence to obtain a second output matrix;
step S170, performing fusion processing on the first output matrix and the second output matrix to obtain the completion data.
in practice, the collection of traffic data is often affected by communication errors, sensor failures, storage loss and other factors to cause data loss, so that the collected data is damaged, and the effectiveness of downstream applications is seriously affected. However, the existing completion method only estimates the missing data from the observation data with the defect, and neglects to utilize the historical data. The invention solves the problem of traffic data loss by using the attention circulation neural network, and can effectively improve the traffic data completion effect.
In step S110 of some embodiments, a raw data set is obtained; different data sources may be set according to which regions' traffic speed data is to be collected. For example, three real-world traffic speed data sets are used, including but not limited to traffic data from Beijing, China; California, USA; and Hong Kong, China. The Beijing traffic data comprises the average speed of 1368 roads in Beijing from 00:00 on January 1, 2019 to June 30, 2019. The California traffic data comprises records from 144 sensor stations in District 5 of California from 00:00 on January 1, 2013 to 23:55 on June 30, 2013. The Hong Kong traffic data comprises the average road speeds of major roads in Hong Kong from 10:00 on March 10, 2021 to 23:50 on July 31, 2021.
In step S120 of some embodiments, the raw data set is preprocessed to classify it into an incomplete data set and a historical data set. The acquired data is sampled at time intervals such as 5 or 10 minutes; the complete data is taken as the historical data set, and the incomplete data as the incomplete data set.
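A minimal sketch of this classification step, under the assumption that missing readings are stored as NaN (the patent does not specify a storage convention):

```python
import numpy as np

def split_by_completeness(daily_matrices):
    """Partition per-day (n_roads, T) matrices into a historical set
    (fully observed days) and an incomplete set (days with NaN gaps)."""
    historical, incomplete = [], []
    for day in daily_matrices:
        (incomplete if np.isnan(day).any() else historical).append(day)
    return historical, incomplete

# usage: two toy days, the second with a missing reading
full = np.ones((3, 4))
gappy = full.copy(); gappy[1, 2] = np.nan
hist, inc = split_by_completeness([full, gappy])
print(len(hist), len(inc))  # 1 1
```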
In step S130 of some embodiments, temporal feature extraction is performed on the incomplete data set to obtain the first time series. For example, temporal features can be extracted from the incomplete data set through an improved long short-term memory completion network; they may also be extracted through a parallel generative adversarial completion network, a recurrent neural network, a bidirectional recurrent neural network, and the like, but not limited thereto.
In step S140 of some embodiments, a time feature extraction is performed on the historical data set to obtain a second time series. For example, the historical data set may be time feature extracted through a historical averaging network. The time feature extraction may also be performed on the historical data set by a K-nearest neighbor algorithm, linear interpolation, matrix decomposition-based method, and the like, but is not limited thereto.
In step S150 of some embodiments, multi-head attention calculation is performed on the first time series, obtained in step S130, to produce the first output matrix. Although the improved long short-term memory completion network can estimate missing data step by step, its ability to capture time dependencies is limited for longer time-series data, especially traffic data, which often has hundreds of time periods in a day. By applying the multi-head attention mechanism, the heads can learn in several subspaces respectively and capture the temporal features of long time-series data.
In step S160 of some embodiments, multi-head attention calculation is performed on the second time series, obtained in step S140, to produce the second output matrix. The historical-average network cannot capture the temporal ordering of longer time-series data; to use the historical data more effectively, the multi-head attention mechanism is applied to the historical data to obtain the second output matrix.
In step S170 of some embodiments, the first output matrix and the second output matrix are fused to obtain the completion data. The incomplete data set and the historical data set each pass through multi-head attention to produce an output matrix, and the two output matrices are fused in the fusion processing module to calculate the final completion result.
The data completion method provided by the above embodiment addresses the traffic data completion problem with an attention-recurrent neural network (ADRIN) structure. Compared with existing deep-learning-based imputation methods, the attention-recurrent neural network exploits the distinctive periodicity and volatility of traffic data, extracting features from the incomplete input and from the historical average to fill in the missing values. It has a long short-term memory completion network (LSTM-I) that accepts input with missing values, while applying a multi-head attention mechanism to extract temporal features from the historical average and from the LSTM-I output respectively. The outputs of the multi-head attention mechanisms are fed into a fusion module comprising an attention layer and a fully connected layer to obtain the completed result. This overcomes the inability of the prior art to use historical data for completion; the embodiment of the invention can solve the traffic data missingness problem and effectively improve the traffic data completion effect.
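As an illustration of this architecture, the following is a minimal PyTorch sketch of the forward pass; the module sizes and names, and the use of a plain LSTM in place of the patent's LSTM-I, are assumptions for illustration, not the patented implementation:

```python
import torch
import torch.nn as nn

class ADRINSketch(nn.Module):
    """Illustrative skeleton only: an LSTM branch for the incomplete
    input (standing in for LSTM-I), a multi-head attention branch for
    the historical average, and an attention+linear fusion head."""

    def __init__(self, n_roads: int, heads: int = 4):
        super().__init__()
        self.lstm_i = nn.LSTM(n_roads, n_roads, batch_first=True)
        self.attn_x = nn.MultiheadAttention(n_roads, heads, batch_first=True)
        self.attn_h = nn.MultiheadAttention(n_roads, heads, batch_first=True)
        self.fuse = nn.MultiheadAttention(2 * n_roads, 1, batch_first=True)
        self.linear = nn.Linear(2 * n_roads, n_roads)

    def forward(self, x_incomplete, x_hist_avg):
        seq, _ = self.lstm_i(x_incomplete)        # preliminary recovery
        h, _ = self.attn_x(seq, seq, seq)         # features of recovered series
        h_star, _ = self.attn_h(x_hist_avg, x_hist_avg, x_hist_avg)
        cat = torch.cat([h, h_star], dim=-1)      # splice H and H*
        fused, _ = self.fuse(cat, cat, cat)       # single-head attention
        return self.linear(fused)                 # completion result

model = ADRINSketch(n_roads=16)
x = torch.randn(1, 288, 16)   # incomplete input (gaps pre-filled with 0)
h = torch.randn(1, 288, 16)   # five-day historical average
print(model(x, h).shape)      # torch.Size([1, 288, 16])
```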
Referring to fig. 5, in some embodiments, step S130 may include, but is not limited to including, steps S210 to S240;
step S210, inputting the incomplete data set into a preset long-term and short-term memory completion network;
step S220, extracting observation data in a preset time period in the incomplete data set;
step S230, calculating missing data of a preset time interval according to the observation data and the prediction data of the previous time interval of the preset time interval;
step S240, a first time series is obtained according to the missing data and the observation data.
Specifically, in step S210 of some embodiments, the incomplete data set is input into the long short-term memory completion network. In step S220 of some embodiments, the observation data in a preset time period of the incomplete data set is extracted; the completeness of the data differs across time periods in the incomplete data set. For example (including but not limited to): in a data set covering the period from 8:00 to 20:00, the data from 10:00 to 12:00 is incomplete while the data in the other periods is complete, so 10:00 to 12:00 is taken as the preset time period. In step S230 of some embodiments, the missing data of the preset time period is calculated according to the observation data and the prediction data of the time period preceding the preset time period. For example (including but not limited to): if the current period is 10:00 to 12:00 on August 23, then, owing to the periodicity of traffic data, the missing data of the preset time period can be calculated with reference to the 10:00-to-12:00 period on August 22; meanwhile, by fitting multiple aspects of the data, the long short-term memory completion network computes the missing data of the preset time period. In step S240 of some embodiments, the first time series is obtained from the missing data and the observation data: completing the incomplete data set with the missing data estimated for the preset time period yields the first time series. Through the data completion method provided by the above embodiment, the reconstructed LSTM-I uses the predicted value of the previous time segment to estimate the missing value of the current time segment; for each time segment, LSTM-I recovers the features using the estimated values and the observed values, resulting in the first time series.
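A minimal sketch of this per-step recovery rule, assuming the mask convention used later in this description (1 marks a missing point) and a hypothetical one-step predictor `predict_step` standing in for the LSTM-I cell:

```python
import numpy as np

def lstm_i_recover(x, m, predict_step):
    """x: (T, n) series with arbitrary filler at missing points;
    m: (T, n) mask, 1 = missing; predict_step: maps the previous
    recovered step to a prediction for the current one (a stand-in
    for the LSTM-I cell)."""
    recovered = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        x_hat = predict_step(prev)                       # estimate from t-1
        recovered[t] = m[t] * x_hat + (1 - m[t]) * x[t]  # keep observations
        prev = recovered[t]
    return recovered

# usage with a trivial persistence predictor: the gap inherits step t-1
x = np.array([[1.0], [np.nan], [3.0]])
m = np.isnan(x).astype(float)
print(lstm_i_recover(np.nan_to_num(x), m, lambda prev: prev).ravel())
# [1. 1. 3.]
```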
Referring to fig. 6, in some embodiments, step S140 may include, but is not limited to including, steps S310 to S330;
step S310, inputting a historical data set into a preset historical data processing network;
step S320, calculating historical average data of a historical data set through a historical data processing network, and taking the historical average data as historical data of a preset time period;
step S330, a second time series is obtained according to the historical average data.
Specifically, in step S310 of some embodiments, the historical data set is input into a preset historical data processing network. In step S320 of some embodiments, the historical average data of the historical data set is calculated by the historical data processing network and used as the historical data of the preset time period; the average value of the corresponding time period of each road segment in one day is calculated using the complete data of the last five days to fill in the missing values. In step S330, the second time series is obtained according to the historical average data. In the data completion method provided by the above embodiment, owing to the periodicity of traffic data, historical data plays an important role in completing traffic data; calculating the historical average of the historical data set reduces errors and interference, preparing for the subsequent fusion processing.
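A minimal sketch of the historical-average computation described above, assuming five complete per-day matrices (consistent with the five-day window mentioned here):

```python
import numpy as np

def historical_average(last_five_days):
    """last_five_days: five complete (n_roads, T) matrices for the days
    preceding the input; returns the per-slot (n_roads, T) average used
    as the input of the historical branch."""
    return np.stack(last_five_days).mean(axis=0)

# usage: five toy days for 3 roads sampled at 4 time slots
days = [np.full((3, 4), v) for v in (50.0, 52.0, 48.0, 51.0, 49.0)]
print(historical_average(days)[0])  # [50. 50. 50. 50.]
```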
Referring to fig. 7, in some embodiments, step S150 may include, but is not limited to including, steps S410 to S450;
step S410, inputting a first time sequence into a first multi-head attention layer;
step S420, converting the first time series into a first attention matrix through a first multi-head attention layer;
step S430, converting the first attention moment matrix into a first probability matrix through a preset function;
step S440, performing attention mechanism calculation on the first probability matrix to obtain a first feature matrix;
step S450, the dimension reduction processing is carried out on the first characteristic matrix to obtain a first output matrix.
Specifically, in step S410 of some embodiments, the first time series is input into the first multi-head attention layer. In step S420 of some embodiments, the first time series is converted into a first attention matrix through the first multi-head attention layer by matrix operations within the multi-head attention mechanism. In step S430 of some embodiments, the first attention matrix is converted into a first probability matrix through a preset function, namely the softmax function, so that the probabilities in each column sum to 1. In step S440 of some embodiments, attention mechanism calculation is performed on the first probability matrix to obtain a first feature matrix; the multi-head attention calculation captures richer feature relationships. In step S450 of some embodiments, dimension reduction is performed on the first feature matrix to obtain the first output matrix: compared with a single-head attention mechanism, the multi-head calculation yields multiple attention output matrices Z, which are concatenated and reduced in dimension through a fully connected layer, producing a first output matrix with the same shape as the input time series. In the data completion method provided by the above embodiment, the first time series is input into the first multi-head attention layer, and the first output matrix is finally obtained through matrix operations, attention calculation and dimension reduction. Because the LSTM structure can only model time-series data in a single direction, it is difficult for it to incorporate the observations after the missing data when performing a completion task; with the multi-head attention mechanism, incomplete data can be completed by combining the observations before and after the missing position over the whole time series.
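A from-scratch sketch of steps S410 to S450 (projection to an attention matrix, softmax to a probability matrix, attention calculation, head concatenation and dimension reduction); all shapes and weight initializations are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (T, d). Steps S410-S450: project to per-head Q, K, V, form the
    attention matrix, turn it into a probability matrix with softmax,
    apply it to V, then merge the heads and reduce dimension via w_o."""
    T, d = x.shape
    d_h = d // n_heads
    q = (x @ w_q).view(T, n_heads, d_h).transpose(0, 1)  # (heads, T, d_h)
    k = (x @ w_k).view(T, n_heads, d_h).transpose(0, 1)
    v = (x @ w_v).view(T, n_heads, d_h).transpose(0, 1)
    scores = q @ k.transpose(-2, -1) / d_h ** 0.5        # attention matrices
    probs = F.softmax(scores, dim=-1)                    # rows sum to 1
    z = probs @ v                                        # per-head features Z
    z = z.transpose(0, 1).reshape(T, d)                  # concatenate heads
    return z @ w_o                                       # back to input shape

# usage with illustrative sizes: 288 steps, 64 features, 4 heads
torch.manual_seed(0)
ws = [0.05 * torch.randn(64, 64) for _ in range(4)]
print(multi_head_attention(torch.randn(288, 64), *ws, n_heads=4).shape)
# torch.Size([288, 64])
```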
Referring to fig. 8, in some embodiments, step S160 may include, but is not limited to including, steps S510 to S550;
step S510, inputting the second time series into a second multi-head attention layer;
step S520, converting the second time series into a second attention matrix through a second multi-head attention layer;
step S530, converting the second attention matrix into a second probability matrix through a preset function;
step S540, performing attention mechanism calculation on the second probability matrix to obtain a second characteristic matrix;
and step S550, performing dimension reduction processing on the second feature matrix to obtain a second output matrix.
Specifically, in step S510 of some embodiments, the second time series is input into the second multi-head attention layer. In step S520 of some embodiments, the second time series is converted into a second attention matrix through the second multi-head attention layer. In step S530 of some embodiments, the second attention matrix is converted into a second probability matrix through a preset function, namely the softmax function, so that the probabilities in each column sum to 1. In step S540 of some embodiments, attention mechanism calculation is performed on the second probability matrix to obtain a second feature matrix; the multi-head attention calculation captures richer feature relationships. In step S550 of some embodiments, dimension reduction is performed on the second feature matrix to obtain the second output matrix. In the data completion method provided by the above embodiment, the second time series is input into the second multi-head attention layer, and the second output matrix is finally obtained through matrix operations, attention calculation and dimension reduction. The output of LSTM-I can be regarded as a preliminary completion result, but the model capacity of LSTM-I is limited and so is its completion effect; the embodiment of the invention therefore combines it with the multi-head attention mechanism, which has stronger time-series modeling capacity, to complete the data using the observations before and after the missing positions, and, considering the periodicity of traffic data, adds a historical data module (HA). To use the historical data more effectively, the output distributions of the left and right parts of the model should be similar at the fusion module: if the distributions were inconsistent and directly spliced before being input into the fusion module, the distribution gap would be too large for the parameters to learn, whereas similar distributions make the model parameters easier to learn.
Referring to fig. 9, in some embodiments, step S170 may include, but is not limited to including, steps S610 through S630;
step S610, splicing the first output matrix and the second output matrix to obtain a spliced matrix;
step S620, calculating an attention mechanism of the spliced matrix through the single-head attention layer to obtain a target matrix;
and step S630, performing feature extraction on the target matrix through the linear layer to obtain the completion data.
In step S610 of some embodiments, the first output matrix and the second output matrix are spliced to obtain a spliced matrix: the incomplete data set and the historical data set each pass through multi-head attention to produce an output matrix, and the two output matrices are spliced together.
In step S620 of some embodiments, attention mechanism calculation is performed on the spliced matrix through the single-head attention layer to obtain the target matrix: the spliced matrix is input into the single-head attention layer, which produces its output matrix.
In step S630 of some embodiments, feature extraction is performed on the target matrix through the linear layer to obtain the completion data. After the single-head attention calculation, the result is input into the linear layer for feature extraction; a fully connected neural network is used in the linear layer to obtain the completion data, as shown in equation (1):
$$\hat{X} = W_l\,[H \,\Vert\, H^*] + b_l \tag{1}$$

where $[H \,\Vert\, H^*]$ denotes the splice of the implicit states $H$ and $H^*$, $W_l$ and $b_l$ are the parameters of the linear layer, and $\hat{X}$ is the final completion result of the attention-recurrent neural network.
According to the data completion method provided by the above embodiment, the completion data is finally calculated by the fusion processing module. The upper half of the attention-recurrent neural network produces the first output matrix, which processes the incomplete data set, and the second output matrix, which processes the historical data set; these output matrices contain the features of the incomplete data set and the historical data set, and the fusion processing module processes the first output matrix and the second output matrix to obtain the final completion data.
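A minimal sketch of the final linear projection of equation (1) under assumed dimensions; in the full module, the single-head attention layer of step S620 precedes this projection:

```python
import torch
import torch.nn as nn

d = 64                                 # per-branch feature width (assumed)
linear = nn.Linear(2 * d, d)           # W_l and b_l of Eq. (1)

H, H_star = torch.randn(288, d), torch.randn(288, d)  # branch outputs
x_hat = linear(torch.cat([H, H_star], dim=-1))        # splice, then project
print(x_hat.shape)  # torch.Size([288, 64])
```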
Referring to FIG. 10, in some embodiments, step S620 may include, but is not limited to including, steps S710 through S740:
Step S710, inputting the splicing matrix into a single-head attention layer;
step S720, converting the spliced matrix into an attention matrix through a single-head attention layer;
step S730, converting the attention matrix into a probability matrix through a preset function;
and step S740, performing attention mechanism calculation on the probability matrix to obtain a target matrix.
In step S710 of some embodiments, the spliced matrix, i.e. the splicing of the first output matrix and the second output matrix, is input to the single-head attention layer. In step S720 of some embodiments, the spliced matrix is converted into an attention matrix by the single-head attention layer. In step S730 of some embodiments, the attention matrix is converted into a probability matrix by a preset function, here the softmax function, so that the probabilities of each column sum to 1. In step S740 of some embodiments, attention mechanism calculation is performed on the probability matrix, and the target matrix is obtained through the single-head attention mechanism.
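For illustration, the following is a minimal PyTorch sketch of steps S710 through S740; the function and parameter names are assumptions for exposition, not the exact implementation of the embodiment, and the column-wise softmax follows the convention stated above:

```python
import torch
import torch.nn.functional as F

def single_head_attention(splice, w_q, w_k, w_v):
    """Sketch of steps S710-S740; w_q, w_k, w_v are assumed parameter matrices."""
    q, k, v = splice @ w_q, splice @ w_k, splice @ w_v
    attn = q @ k.T                  # step S720: attention matrix
    prob = F.softmax(attn, dim=0)   # step S730: each column sums to 1
    return prob @ v                 # step S740: target matrix

# toy usage: T=4 time steps, feature size 6
T, d = 4, 6
splice = torch.randn(T, d)                     # spliced matrix (step S710)
w = [torch.randn(d, d) for _ in range(3)]
target = single_head_attention(splice, *w)
```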
In a specific application scenario, the goal of traffic speed data completion is to predict missing data using the known incomplete traffic speed data. Consider the real road speed data $Y \in \mathbb{R}^{n \times T}$. The embodiment of the invention takes as input a feature matrix with missing data points, $X^{(?)} \in \mathbb{R}^{n \times T}$, where $n$ represents the number of nodes, such as sensor stations or road segments, $T$ represents the number of time steps in a day, and $x_{ij}$ represents the observed data point of node $i$ at the $j$-th time step. The embodiment of the invention additionally defines a mask matrix (also called a mark matrix) $M \in \{0,1\}^{n \times T}$, as shown in equation (2):

$$M_{ij} = \begin{cases} 1, & x_{ij}\ \text{is missing} \\ 0, & \text{otherwise} \end{cases} \quad (2)$$
For ease of understanding, the following is an example of an incomplete traffic data matrix and its corresponding mask matrix, with entries shown symbolically and a question mark marking missing values:

$$X^{(?)} = \begin{pmatrix} x_{11} & x_{12} & ? & x_{14} & x_{15} \\ x_{21} & x_{22} & x_{23} & x_{24} & ? \\ x_{31} & ? & x_{33} & x_{34} & x_{35} \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}$$

It can be seen that in the feature matrix the data values at positions (1,3), (2,5) and (3,2) are missing, and the missing data is represented by a question mark; the corresponding entries of the mask matrix are 1, while all observed data points have a value of 0. The purpose of missing data completion is to recover the missing data points from the existing observed data values.
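As an illustration of equation (2), the following NumPy sketch builds a mask matrix for the example above, assuming (purely for illustration) that missing records are encoded as NaN in the raw feature matrix:

```python
import numpy as np

# toy feature matrix: 3 nodes x 5 time steps, NaN marks missing data
x = np.array([
    [42.0, 40.5, np.nan, 39.8, 41.2],
    [55.1, 54.3, 53.0, 52.7, np.nan],
    [33.4, np.nan, 35.0, 34.1, 33.9],
])

# mask matrix per equation (2): 1 at missing positions, 0 at observed positions
m = np.isnan(x).astype(np.float32)
print(m)
# [[0. 0. 1. 0. 0.]
#  [0. 0. 0. 0. 1.]
#  [0. 1. 0. 0. 0.]]
```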
The difference between the completion result $\hat{X} \in \mathbb{R}^{n \times T}$ and the ground truth $Y$ should be minimized, where $\hat{X}$ is the completion result.
The framework of the attention-recurrent neural network (ADRIN) is depicted in fig. 11. The embodiment of the present invention constructs two main data processing streams according to the time-series characteristics of the traffic speed data, as shown in the left and right parts of fig. 11. The left module focuses on extracting temporal features from the incomplete input $X^{(?)}$. Considering that traffic data has strong periodic correlation, the embodiment of the invention additionally constructs a module on the right, which receives the historical average data matrix of the five days preceding the input, $X^{(a)}$, as its input. The outputs of these two parts are two implicit feature matrices, denoted as $H$ and $H^*$ respectively: the former contains the timing information extracted from the incomplete time periods, while the latter contains periodic timing information. Finally, the outputs of the two modules are processed by a fusion module to obtain the completed data $\hat{X}$.
In particular, embodiments of the present invention customize several advanced neural network methods for time-series traffic speed data and integrate them into ADRIN. First, the embodiment improves the conventional LSTM and proposes a long short-term memory completion network (LSTM-I) that can accept inputs with missing data. LSTM-I estimates missing values by forward prediction to generate a predicted feature matrix $\tilde{X}$. In addition, the embodiment adopts a multi-head attention mechanism in the time-series data processing to extract the features of $\tilde{X}$ and of the historical data $X^{(a)}$ respectively, obtaining the implicit feature matrices $H$ and $H^*$.
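The two-stream data flow of fig. 11 can be summarized in the following PyTorch sketch. The module choices (nn.LSTM, nn.MultiheadAttention) and all names are illustrative stand-ins for LSTM-I, the multi-head attention layers and the fusion module, not the exact layers of the embodiment:

```python
import torch
import torch.nn as nn

class ADRIN(nn.Module):
    """Sketch of the two-stream completion network of fig. 11 (names assumed)."""
    def __init__(self, n_roads, d_hidden=168, n_heads=8):
        super().__init__()
        self.lstm_i = nn.LSTM(n_roads, d_hidden, batch_first=True)  # stand-in for LSTM-I
        self.pred = nn.Linear(d_hidden, n_roads)                    # LSTM-I prediction head
        self.mha_left = nn.MultiheadAttention(n_roads, n_heads, batch_first=True)
        self.mha_right = nn.MultiheadAttention(n_roads, n_heads, batch_first=True)
        self.fuse_attn = nn.MultiheadAttention(2 * n_roads, 1, batch_first=True)  # single head
        self.fuse_lin = nn.Linear(2 * n_roads, n_roads)

    def forward(self, x_missing, mask, x_hist):
        # left stream: preliminary completion from the incomplete input
        h_seq, _ = self.lstm_i(x_missing)
        x_tilde = self.pred(h_seq)
        x_hat0 = mask * x_tilde + (1 - mask) * x_missing   # eq. (3): keep observations
        h, _ = self.mha_left(x_hat0, x_hat0, x_hat0)       # implicit features H
        # right stream: periodic features from the historical average
        h_star, _ = self.mha_right(x_hist, x_hist, x_hist) # implicit features H*
        # fusion: single-head attention over the splice, then a linear layer
        z = torch.cat([h, h_star], dim=-1)
        z, _ = self.fuse_attn(z, z, z)
        return self.fuse_lin(z), x_tilde

# usage sketch: batch of 2 days, T=288 steps, n=168 roads
n, T = 168, 288
x = torch.randn(2, T, n)
m = (torch.rand(2, T, n) < 0.3).float()       # 1 marks a missing point
model = ADRIN(n)
x_hat, x_tilde = model(x * (1 - m), m, torch.randn(2, T, n))
```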
In a specific application scenario, LSTM has achieved notable results in modeling time-series data, particularly for time-series prediction. Compared to ordinary RNNs, LSTM can mitigate vanishing and exploding gradients during learning. However, for existing LSTM networks the incoming time-series data must be complete, a requirement that is not realistic in traffic data scenarios. Therefore, the embodiment of the invention reconstructs the existing LSTM network and proposes LSTM-I, which is specially designed to process inputs with missing data points.
As shown in fig. 11, at time point $t$, $x_t^{(?)}$ represents an incomplete observation and $\tilde{x}_t$ is a predicted value. When the input $x_t^{(?)}$ contains missing data, the embodiment of the present invention concatenates $x_t^{(?)}$ and $\tilde{x}_t$ as the current input. Specifically, the reconstructed LSTM-I uses the predicted value of the previous time segment to estimate the missing value of the current time segment. For each time segment, LSTM-I recovers the features using the estimated values and the observed values, as shown in equation (3):

$$\hat{x}_t = m_t \odot \tilde{x}_t + (1 - m_t) \odot x_t^{(?)} \quad (3)$$

where $m_t$ denotes the mask vector marking the missing positions of the $t$-th time step (1 at missing positions), and $\odot$ denotes the Hadamard product.
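Equation (3) is an element-wise blend of prediction and observation; a minimal sketch with assumed tensor names:

```python
import torch

def recover_step(x_obs_t, x_pred_t, m_t):
    """Eq. (3): take the prediction where data is missing (m_t == 1),
    keep the observation where it is present (m_t == 0)."""
    return m_t * x_pred_t + (1.0 - m_t) * x_obs_t

# toy check: the second entry is missing and is filled from the prediction
x_obs  = torch.tensor([50.0, 0.0, 47.5])   # missing value zero-filled
x_pred = torch.tensor([49.1, 48.2, 47.0])
m      = torch.tensor([0.0, 1.0, 0.0])
print(recover_step(x_obs, x_pred, m))       # tensor([50.0000, 48.2000, 47.5000])
```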
For each layer of LSTM-I, LSTM units with shared parameters are used for the computation. The $t$-th LSTM unit takes as input the cell state $c_{t-1}$ and hidden feature $h_{t-1}$ of the previous time point, together with the current input $\hat{x}_t$. There are three gating units in the LSTM unit, i.e. the input gate $i_t$, the forget gate $f_t$ and the output gate $o_t$; they decide whether to add information to or remove information from the cell state. The gates adaptively save the input information to the current storage state and compute the hidden feature $h_t \in \mathbb{R}^d$, where $d$ is the size of the hidden feature output by the LSTM unit. The whole calculation process of the LSTM unit is shown in formula (4):
$$\begin{aligned} i_t &= \sigma\big(W_i[\hat{x}_t, h_{t-1}] + b_i\big) \\ f_t &= \sigma\big(W_f[\hat{x}_t, h_{t-1}] + b_f\big) \\ o_t &= \sigma\big(W_o[\hat{x}_t, h_{t-1}] + b_o\big) \\ g_t &= \tanh\big(W_g[\hat{x}_t, h_{t-1}] + b_g\big) \\ c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\ h_t &= o_t \odot \tanh(c_t) \end{aligned} \quad (4)$$

where $g_t$ is the cell input activation vector, with $W_g$ and $b_g$ its weight matrix and bias parameter; $W_i$, $W_f$ and $W_o$ are the weight matrices of the input, forget and output gates, and $b_i$, $b_f$ and $b_o$ are the bias parameters of the corresponding gates; $c_t$ is the state of the memory cell; and $\sigma$ denotes the sigmoid activation function. In addition, the predicted value for the next time step, $\tilde{x}_{t+1}$, is calculated from the hidden feature $h_t$, i.e. $\tilde{x}_{t+1} = W_x h_t + b_x$, where $W_x$ and $b_x$ are the corresponding weight and bias matrices.
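Combining equations (3) and (4) with the prediction head $\tilde{x}_{t+1} = W_x h_t + b_x$, one LSTM-I layer might be unrolled as follows; the zero-filling of missing inputs, the zero initial states and the returned quantity are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LSTMImputer(nn.Module):
    """Sketch of one LSTM-I layer for a single sequence; batching omitted."""
    def __init__(self, n_roads, d_hidden=168):
        super().__init__()
        self.cell = nn.LSTMCell(n_roads, d_hidden)   # shared-parameter unit, eq. (4)
        self.w_x = nn.Linear(d_hidden, n_roads)      # x_pred = W_x h + b_x

    def forward(self, x, mask):
        # x, mask: (T, n_roads); mask == 1 marks missing entries (zero-filled in x)
        T, n = x.shape
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        x_pred = torch.zeros(1, n)                   # no prediction before the first step
        outputs = []
        for t in range(T):
            # eq. (3): blend the previous prediction into the missing positions
            x_hat = mask[t] * x_pred + (1 - mask[t]) * x[t]
            h, c = self.cell(x_hat, (h, c))          # eq. (4)
            x_pred = self.w_x(h)                     # prediction for the next step
            outputs.append(x_hat)
        return torch.stack(outputs, dim=0).squeeze(1)  # recovered feature matrix

# usage: T=288 time steps, n=168 roads
imputer = LSTMImputer(168)
x_tilde = imputer(torch.randn(288, 168), (torch.rand(288, 168) < 0.3).float())
```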
In a particular application scenario, while LSTM-I is able to progressively estimate missing data, its ability to capture temporal dependencies over longer sequences is limited, especially since traffic data often has hundreds of time periods in a day. Thus, embodiments of the present invention apply a multi-head attention mechanism to further extract the features of the LSTM-I output $\tilde{X}$ and of the historical average $X^{(a)}$. The multi-head operation learns separately in several subspaces, combines all obtained results, performs dimension reduction through a fully connected layer, and outputs a matrix with the same shape as the input time series.
For the calculation of the attention mechanism, the embodiment of the present invention defines the time-series input as $S \in \mathbb{R}^{T \times n}$ (corresponding to the transposes of $\tilde{X}$ and $X^{(a)}$). In the calculation process, three components are defined, namely the query $Q$, the key $K$ and the value $V$; their definitions are shown in formula (5):

$$Q = SW_Q, \quad K = SW_K, \quad V = SW_V \quad (5)$$

where $W_Q$, $W_K$ and $W_V$ are the parameter matrices of the corresponding parts. The calculation process of the attention matrix $E$ is shown in equation (6):

$$E = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) \quad (6)$$

where $d_k$ is the dimension of the key vectors. Here, the attention matrix is converted into a probability matrix using the softmax function, and the probabilities of each column sum to 1. Thus, $E_{ij}$ indicates the influence of the $i$-th time point on the $j$-th time point. The output of the attention mechanism is represented by $Z$, calculated as shown in equation (7):

$$Z = EV \quad (7)$$
For the calculation of the multi-head attention mechanism, the standard paradigm is followed: multiple attention mechanisms compute their respective outputs $Z_i$, $i = 1, \ldots, h$, where $h$ is the number of attention heads. This allows separate learning in different attention subspaces derived from the above equations, which can capture richer feature relationships. Finally, the embodiment concatenates all $Z_i$ and feeds the result to a linear layer to obtain the final result, as expressed in formula (8):

$$\mathrm{Output} = \mathrm{concat}(Z_1, \ldots, Z_h)\, W_c \quad (8)$$

where Output is the final output of the multi-head self-attention layer (MSL), corresponding to the hidden states $H$ and $H^*$, and $W_c$ is the parameter weight of the linear layer.
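A compact sketch of the MSL of equations (5)-(8) follows; the head-splitting layout is an assumption in line with standard practice, and the column-wise softmax follows the convention stated above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadLayer(nn.Module):
    """Sketch of the multi-head attention layer (MSL), eqs. (5)-(8)."""
    def __init__(self, n_feat, n_heads=8):
        super().__init__()
        assert n_feat % n_heads == 0
        self.h, self.d_k = n_heads, n_feat // n_heads
        self.w_q = nn.Linear(n_feat, n_feat, bias=False)  # eq. (5): Q = S W_Q
        self.w_k = nn.Linear(n_feat, n_feat, bias=False)  #          K = S W_K
        self.w_v = nn.Linear(n_feat, n_feat, bias=False)  #          V = S W_V
        self.w_c = nn.Linear(n_feat, n_feat, bias=False)  # eq. (8): concat(Z_i) W_c

    def forward(self, s):
        # s: (T, n_feat) time-series input
        T = s.shape[0]
        # project and split into h attention subspaces of size d_k
        q = self.w_q(s).view(T, self.h, self.d_k).transpose(0, 1)
        k = self.w_k(s).view(T, self.h, self.d_k).transpose(0, 1)
        v = self.w_v(s).view(T, self.h, self.d_k).transpose(0, 1)
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        e = F.softmax(scores, dim=-2)          # eq. (6): each column sums to 1
        z = e @ v                              # eq. (7): Z = EV, per head
        z = z.transpose(0, 1).reshape(T, -1)   # concatenate the h heads
        return self.w_c(z)                     # eq. (8): linear output layer

# usage: T=288 time steps, n=168 features, h=8 heads
msl = MultiHeadLayer(168, 8)
out = msl(torch.randn(288, 168))   # same shape as the input time series
```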
In a specific application scenario, in order to combine the outputs of the two modules and calculate the final completion result, the embodiment of the present invention designs a fusion module comprising an attention layer and a linear layer. The attention layer is a single-head version of the MSL, whose input is the splicing of the two implicit states. After the attention layer calculation, the embodiment uses a fully connected neural network in the linear layer to obtain the completion result, as shown in formula (9):

$$\hat{h} = \mathrm{Attention}\big([H; H^*]\big), \qquad \hat{X} = W_l \hat{h} + b_l \quad (9)$$

where $\mathrm{Attention}(\cdot)$ represents the attention calculation of equations (5)-(7) introduced for the multi-head attention mechanism, $W_l$ and $b_l$ are the parameters of the linear layer, and $\hat{X}$ is the final completion result of ADRIN.
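A sketch of the fusion module of equation (9), using a single-head attention layer followed by a fully connected layer; the class and argument names are assumed:

```python
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Sketch of the fusion module: single-head attention over the spliced
    implicit states, followed by a fully connected layer, eq. (9)."""
    def __init__(self, n_feat):
        super().__init__()
        self.attn = nn.MultiheadAttention(2 * n_feat, num_heads=1)  # single-head MSL
        self.w_l = nn.Linear(2 * n_feat, n_feat)                    # W_l, b_l

    def forward(self, h, h_star):
        z = torch.cat([h, h_star], dim=-1).unsqueeze(1)  # splice the implicit states
        z, _ = self.attn(z, z, z)                        # attention layer
        return self.w_l(z).squeeze(1)                    # final completion X-hat

# usage: H and H* each of shape (T, n)
fusion = FusionModule(168)
x_hat = fusion(torch.randn(288, 168), torch.randn(288, 168))
```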
In a specific application scenario, to better train the different modules of ADRIN, embodiments of the present invention define respective loss functions for LSTM-I, the fusion module, and the final output. Considering the ground-truth road speed data $Y$ and the estimated output $\hat{X}$, the embodiment defines a masking loss function $\mathcal{L}_m$, whose expression is shown in formula (10):

$$\mathcal{L}_m\big(\hat{X}, Y\big) = \frac{\sum_{i,j} M_{ij}\,\big|\hat{X}_{ij} - Y_{ij}\big|}{\sum_{i,j} M_{ij}} \quad (10)$$

where $M$ is the mask matrix defined in the above embodiments to represent missing data points.
To ensure that $\tilde{X}$ (i.e., the output of LSTM-I) is similar to $X^{(?)}$, and to speed up the convergence of LSTM-I, the embodiment defines the loss function $\mathcal{L}_{\mathrm{LSTM}}$ as shown in formula (11), applying the masking loss over the observed positions, i.e. with $1 - M$ in place of $M$:

$$\mathcal{L}_{\mathrm{LSTM}} = \mathcal{L}_m\big(\tilde{X},\, X^{(?)}\big) \quad (11)$$
Combining the two sub-loss functions, the embodiment of the invention obtains the final loss function $\mathcal{L}$, whose expression is shown in formula (12):

$$\mathcal{L} = \mathcal{L}_m\big(\hat{X}, Y\big) + \mathcal{L}_{\mathrm{LSTM}} \quad (12)$$
In a specific application scenario, the embodiment of the present invention adopts three real-world traffic speed data sets: NavInfo Beijing (BJ), the California highway Performance Measurement System (PeMSD5), and Hong Kong traffic speed (HK). Specifically, as shown in fig. 12, the Beijing data set is provided by the NavInfo traffic index platform and contains the average speed of 1368 roads in Beijing, China from 00:00 on January 1, 2019 to 23:55 on June 30, 2019; the sampling interval of the records is 5 minutes. In order to minimize the influence of missing data while preserving the complexity of the data set, the embodiment of the present invention uses only the road speed data whose overall missing rate is below 5%, i.e. 168 roads in total. In addition, the embodiment applies linear interpolation to fill the missing data, records their positions, and removes them during the evaluation phase. As shown in fig. 13, the PeMSD5 data set contains traffic speed data collected by the California Department of Transportation measurement system. The data set covers 144 sensor stations in District 5 of California, USA, recorded from 00:00 on January 1, 2013 to 23:55 on June 30, 2013. It is worth mentioning that the PeMSD5 records are collected by sensors on highways, unlike the average urban road speeds in the Beijing and Hong Kong data sets. As shown in fig. 14, the Hong Kong data set contains the average link speeds of major roads in Hong Kong, China from 00:00 on March 10, 2021 to 23:50 on July 31, 2021. Records from remote areas such as Tuen Mun and Sha Tin remain unchanged for long periods; therefore, the embodiment of the invention only adopts the road speeds on Hong Kong Island, comprising 84 roads, with a sampling interval of 10 minutes. A summary of these data sets is given in Table 1 below; the figures depict the acquisition positions of some of the roads or sensors in the three data sets.
TABLE 1
                     BJ           PeMSD5       HK
Number of roads      168          144          84
Number of days       181          181          144
Time interval        5 min        5 min        10 min
Average speed        36.70 km/h   54.47 mi/h   51.24 km/h
Standard deviation   11.52 km/h   7.32 mi/h    20.19 km/h
Z-score normalization is used in the present invention to preprocess the data. For cross-validation, following prior work, the invention divides each of the three data sets chronologically into two non-overlapping subsets, a training set and a test set: the first 80% of all samples are training data and the remaining 20% are test data. In addition, a data augmentation method is adopted in the training stage: for each sample Y in the training set, ten missing inputs X(?) are randomly generated. Adam is used as the optimizer with an initial learning rate of 0.001; the number of training epochs is set to 200 and the batch size to 20. The number of heads h in the multi-head attention mechanism is set to 8, and the feature dimension d in LSTM-I is set to 168. The model is implemented in PyTorch, and the hardware configuration for the experiments includes an NVIDIA RTX 2080Ti GPU and a Xeon Silver 4210 CPU.
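The training configuration above (z-score normalization, chronological 80/20 split, ten random missing inputs per sample, Adam with learning rate 0.001) might be set up as follows; the uniform sampling of missing positions is an assumption:

```python
import torch

def make_missing_inputs(y, miss_rate, n_aug=10):
    """For one complete sample Y, draw n_aug random masks and zero-fill X(?)."""
    pairs = []
    for _ in range(n_aug):
        m = (torch.rand_like(y) < miss_rate).float()  # 1 marks an artificial gap
        pairs.append((y * (1 - m), m))
    return pairs

# z-score normalization and chronological 80/20 split (toy tensor shapes)
data = torch.randn(181, 288, 168)             # days x time steps x roads
data = (data - data.mean()) / data.std()
n_train = int(0.8 * len(data))
train_set, test_set = data[:n_train], data[n_train:]

# reported settings: Adam, lr=0.001, 200 epochs, batch size 20, h=8, d=168
# model = ADRIN(n_roads=168)                         # as sketched earlier
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```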
In a specific application scenario, in order to fully evaluate the model, the embodiment of the present invention performs experiments under two missing-data modes: missing completely at random (MCAR) and missing not at random (MNAR). Furthermore, to verify the validity of the model under different conditions, the embodiment compares ADRIN with existing methods at data missing rates from 10% to 90%. The compared methods are drawn from several families of completion approaches, some of which are considered state-of-the-art methods for missing data completion.
Among these methods, the matrix-factorization-based methods (such as TRMF and BTMF) strictly require the input to be shaped as day × time × road. The embodiment of the invention uniformly sets the input of the deep learning methods, such as DSAE, PGAN, BRITS and ADRIN, to one day of missing data, i.e. time × road. In addition, since it is difficult to ensure the convergence of PGAN, the embodiment sets the learning rates of its generator and discriminator in {0.00001, 0.0001, 0.001, 0.01} and applies a grid search to obtain the best result. The hyper-parameters of the other models remain unchanged.
Table 2 shows the completion accuracy (RMSE/MAPE (%)) on the Beijing (BJ) data set, Table 3 shows the completion accuracy on the California District 5 (PeMSD5) data set, and Table 4 shows the completion accuracy on the Hong Kong (HK) data set; the embodiment of the present invention selects the root mean square error (RMSE) and the mean absolute percentage error (MAPE) as evaluation indexes. The three tables summarize the completion results under MCAR and MNAR, respectively, with missing rates between 10% and 90% on the three data sets. The experimental results show that the proposed ADRIN achieves the lowest RMSE and MAPE values in most cases, surpassing all other approaches. On complex data sets, such as the urban road speed data of Beijing and Hong Kong, the results are very prominent, while on the simpler highway speed data set of California District 5, ADRIN achieves results close to the state of the art under MCAR. This is because ADRIN can extract temporal dependencies from both the incomplete input and the historical average. In particular, statistical-learning-based methods such as HA and KNN produce completion results significantly worse than those of ADRIN, because such methods rely only on simple sample distributions within the data set to fill missing values and cannot extract temporal correlations, resulting in poor performance. Furthermore, when large-scale missing data exist, the missing points make the KNN method difficult to apply effectively.
TABLE 2: Completion accuracy (RMSE/MAPE (%)) on the Beijing (BJ) data set under MCAR and MNAR at missing rates from 10% to 90%.

TABLE 3: Completion accuracy (RMSE/MAPE (%)) on the PeMSD5 data set under MCAR and MNAR at missing rates from 10% to 90%.

TABLE 4: Completion accuracy (RMSE/MAPE (%)) on the Hong Kong (HK) data set under MCAR and MNAR at missing rates from 10% to 90%.
Considering the deep-learning-based approaches, the proposed ADRIN outperforms the advanced DSAE, PGAN and BRITS. Owing to its superior ability to model the temporal dependence of the missing input and the historical data separately, ADRIN achieves significant improvements over the other methods whether or not the missing data is completely random, even at high missing rates. For example, compared to other deep-learning-based methods such as BRITS on the Beijing data set, ADRIN reduces MAPE by 7.16% and 7.77% at a 90% missing rate and by 6.72% and 7.64% at a 10% missing rate under the two missing modes, respectively. This indicates that, thanks to the incorporation of historical features, ADRIN achieves a more significant improvement at high missing rates than at low ones; the same phenomenon can be observed on the Hong Kong data set. Furthermore, since DSAE and PGAN model features only with linear modules, they can produce acceptable results only on the PeMSD5 data set, which has a simple data distribution. The proposed ADRIN, however, takes the temporal dependency into account and can cope with missing data on urban roads and expressways alike.
Unlike the matrix-factorization-based methods, the deep learning methods have more advantages in modeling complex data such as urban road speed, and these models obtain better completion effects overall. However, deep learning models depend heavily on data quality, and a high missing rate makes it difficult to capture the correlation of temporal features. Taking MCAR as an example, BTMF achieves the best performance at the 90% missing rate on the PeMSD5 data set, because highway speeds are more simply distributed and easier to learn than urban road speeds. In addition, the matrix-factorization-based approaches are robust across all missing rates, because these methods estimate missing values from the overall data distribution. For example, on the Beijing data set, the absolute differences of the MAPE results of ADRIN across all missing rates are 0.60% (MCAR) and 0.43% (MNAR), while those of BGCP are 0.21% (MCAR) and 0.26% (MNAR).
Finally, comparing the completion results of MCAR and MNAR, it can be seen that all methods obtain lower MAPE under MCAR than under MNAR. Modeling the feature dependence of the data is challenging under MNAR because of the large number of consecutive missing blocks, so MNAR yields correspondingly poorer performance than MCAR. That is, at the same missing rate all models perform better under MCAR, because the observations are more evenly spread and therefore represent the overall data more accurately. Under MNAR, however, ADRIN achieves more pronounced completion results at high missing rates than DSAE and BRITS, since ADRIN uses the historical average data.
An embodiment of the present invention further provides a data completion apparatus, which can implement the data completion method, and the apparatus includes:
the original data set acquisition module is used for acquiring an original data set;
the system comprises an original data set preprocessing module, a data preprocessing module and a data processing module, wherein the original data set preprocessing module is used for preprocessing data of an original data set so as to classify the original data set into a defective data set and a historical data set;
the first time sequence extraction module is used for extracting time characteristics of the incomplete data set to obtain a first time sequence;
the second time sequence extraction module is used for extracting time characteristics of the historical data set to obtain a second time sequence;
the first output matrix calculation module is used for performing multi-head attention mechanism calculation on the first time sequence to obtain a first output matrix;
the second output matrix calculation module is used for performing multi-head attention mechanism calculation on the second time sequence to obtain a second output matrix;
and the data fusion module is used for carrying out fusion processing on the first output matrix and the second output matrix to obtain the completion data.
The specific implementation of the data completion apparatus of this embodiment is substantially the same as the specific implementation of the data completion method, and is not described herein again.
An embodiment of the present disclosure further provides an electronic device, including:
at least one memory;
at least one processor;
at least one program;
The programs are stored in the memory, and the processor executes at least one of the programs to implement the data completion method of the present invention described above. The electronic device can be any intelligent terminal, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a vehicle-mounted computer and the like.
Referring to fig. 15, fig. 15 illustrates a hardware structure of an electronic device according to another embodiment, where the electronic device includes:
the processor 1501 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute a relevant program to implement the technical solution provided in the embodiment of the present invention;
the memory 1502 may be implemented in the form of a ROM (read only memory), a static memory device, a dynamic memory device, or a RAM (random access memory). The memory 1502 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present disclosure is implemented by software or firmware, the relevant program codes are stored in the memory 1502 and called by the processor 1501 to execute the data completion method of the embodiments of the present disclosure;
an input/output interface 1503 for realizing information input and output;
the communication interface 1504 is used for realizing communication interaction between the device and other devices, and can realize communication in a wired mode (such as USB, network cable and the like) or in a wireless mode (such as a mobile network, Wi-Fi, Bluetooth and the like);
a bus 1505 that transfers information between various components of the device (e.g., the processor 1501, memory 1502, input/output interface 1503, and communication interface 1504);
wherein the processor 1501, the memory 1502, the input/output interface 1503, and the communication interface 1504 are communicatively connected to each other within the device via a bus 1505.
The embodiment of the present disclosure also provides a storage medium, which is a computer-readable storage medium, and the computer-readable storage medium stores computer-executable instructions, where the computer-executable instructions are used to enable a computer to execute the data completion method.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiment described in the embodiment of the present invention is for more clearly illustrating the technical solution of the embodiment of the present invention, and does not constitute a limitation to the technical solution provided in the embodiment of the present invention, and it is known to a person skilled in the art that, with the evolution of the technology and the occurrence of a new application scenario, the technical solution provided in the embodiment of the present invention is also applicable to similar technical problems.
It will be appreciated by those skilled in the art that the solutions shown in fig. 4 to 10 do not constitute a limitation of the embodiments of the present invention, and may include more or less steps than those shown, or combine some steps, or different steps.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes multiple instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing programs, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and are not intended to limit the scope of the embodiments of the invention. Any modifications, equivalents, and improvements within the scope and spirit of the embodiments of the present invention may occur to those skilled in the art and are intended to be within the scope of the embodiments of the present invention.

Claims (10)

1. A data completion method, comprising:
acquiring an original data set;
performing data preprocessing on the original data set to classify the original data set into a incomplete data set and a historical data set;
extracting time characteristics of the incomplete data set to obtain a first time sequence;
extracting time characteristics of the historical data set to obtain a second time sequence;
performing multi-head attention mechanism calculation on the first time sequence to obtain a first output matrix;
performing multi-head attention mechanism calculation on the second time sequence to obtain a second output matrix;
and performing fusion processing on the first output matrix and the second output matrix to obtain the completion data.
2. The data completion method according to claim 1, wherein said performing temporal feature extraction on the incomplete data set to obtain a first time series comprises:
inputting the incomplete data set into a preset long-term and short-term memory completion network;
extracting observation data in the incomplete data set at a preset time period;
calculating missing data of the preset time interval according to the observation data and the prediction data of the previous time interval of the preset time interval;
and obtaining the first time sequence according to the missing data and the observation data.
3. The data completion method according to claim 1, wherein said performing temporal feature extraction on said historical data set to obtain a second time series comprises:
inputting the historical data set into a preset historical data processing network;
calculating historical average data of the historical data set through the historical data processing network, and taking the historical average data as the historical data of the preset time period;
and obtaining the second time sequence according to the historical average data.
4. The data completion method according to claim 1, wherein said performing a multi-head attention mechanism calculation on the first time series to obtain a first output matrix comprises:
inputting the first temporal sequence into a first multi-headed attention layer;
converting the first time series into a first attention matrix by the first multi-head attention layer;
converting the first attention matrix into a first probability matrix through a preset function;
performing attention mechanism calculation on the first probability matrix to obtain a first characteristic matrix;
and performing dimensionality reduction on the first feature matrix to obtain a first output matrix.
5. The data completion method according to claim 1, wherein said performing a multi-head attention mechanism calculation on the second time series to obtain a second output matrix comprises:
inputting the second time series to a second multi-headed attention layer;
converting, by the second multi-head attention layer, the second time series into a second attention matrix;
converting the second attention matrix into a second probability matrix through a preset function;
performing attention mechanism calculation on the second probability matrix to obtain a second feature matrix;
and performing dimensionality reduction on the second feature matrix to obtain a second output matrix.
6. The data completion method according to any one of claims 1 to 5, wherein the fusing the first output matrix and the second output matrix to obtain the completion data comprises:
splicing the first output matrix and the second output matrix to obtain a spliced matrix;
calculating an attention mechanism of the spliced matrix through a single-head attention layer to obtain a target matrix;
and performing feature extraction on the target matrix through a linear layer to obtain the completion data.
7. The data completion method according to claim 6, wherein the calculating an attention mechanism of the spliced matrix through a single-head attention layer to obtain a target matrix comprises:
inputting the spliced matrix into the single-head attention layer;
converting the spliced matrix into an attention matrix by the single-head attention layer;
converting the attention matrix into a probability matrix through a preset function;
and carrying out attention mechanism calculation on the probability matrix to obtain the target matrix.
8. A data complementing device, comprising:
the original data set acquisition module is used for acquiring an original data set;
the system comprises an original data set preprocessing module, a data preprocessing module and a data processing module, wherein the original data set preprocessing module is used for performing data preprocessing on the original data set so as to classify the original data set into a defective data set and a historical data set;
the first time sequence extraction module is used for extracting time characteristics of the incomplete data set to obtain a first time sequence;
the second time sequence extraction module is used for extracting time characteristics of the historical data set to obtain a second time sequence;
the first output matrix calculation module is used for performing multi-head attention mechanism calculation on the first time sequence to obtain a first output matrix;
the second output matrix calculation module is used for performing multi-head attention mechanism calculation on the second time sequence to obtain a second output matrix;
and the data fusion module is used for carrying out fusion processing on the first output matrix and the second output matrix to obtain the completion data.
9. An electronic device, comprising:
at least one memory;
at least one processor;
at least one program;
the programs are stored in a memory, and a processor executes the at least one program to implement:
the data completion method according to any one of claims 1 to 7.
10. A storage medium that is a computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform:
the data completion method according to any one of claims 1 to 7.
CN202111346757.2A 2021-11-15 2021-11-15 Data completion method and device, electronic equipment and storage medium Pending CN114445252A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111346757.2A CN114445252A (en) 2021-11-15 2021-11-15 Data completion method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111346757.2A CN114445252A (en) 2021-11-15 2021-11-15 Data completion method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114445252A true CN114445252A (en) 2022-05-06

Family

ID=81364846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111346757.2A Pending CN114445252A (en) 2021-11-15 2021-11-15 Data completion method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114445252A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115376309A (en) * 2022-06-29 2022-11-22 华南理工大学 Missing traffic data restoration method based on multi-view time matrix decomposition
CN115376309B (en) * 2022-06-29 2024-04-26 华南理工大学 Missing traffic data restoration method based on multi-view time matrix decomposition
CN115186047A (en) * 2022-07-15 2022-10-14 百度在线网络技术(北京)有限公司 Traffic flow dynamic graph reconstruction method, related device and computer program product
CN115186047B (en) * 2022-07-15 2023-07-18 百度在线网络技术(北京)有限公司 Traffic flow dynamic diagram reconstruction method, related device and computer program product
CN116244281A (en) * 2022-09-28 2023-06-09 北京百度网讯科技有限公司 Lane traffic flow data complement and model training method and device thereof
CN116244281B (en) * 2022-09-28 2023-11-21 北京百度网讯科技有限公司 Lane traffic flow data complement and model training method and device thereof
WO2024082533A1 (en) * 2022-10-17 2024-04-25 京东城市(北京)数字科技有限公司 Training method and apparatus for spatio-temporal data processing model, spatio-temporal data processing method and apparatus, and medium
WO2024087129A1 (en) * 2022-10-24 2024-05-02 大连理工大学 Generative adversarial multi-head attention neural network self-learning method for aero-engine data reconstruction
CN116028882A (en) * 2023-03-29 2023-04-28 深圳市傲天科技股份有限公司 User labeling and classifying method, device, equipment and storage medium
CN116028882B (en) * 2023-03-29 2023-06-02 深圳市傲天科技股份有限公司 User labeling and classifying method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Chen et al. A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation
Choi et al. TrajGAIL: Generating urban vehicle trajectories using generative adversarial imitation learning
CN114445252A (en) Data completion method and device, electronic equipment and storage medium
Lin et al. Short-term load forecasting based on LSTM networks considering attention mechanism
Li et al. Estimation of missing values in heterogeneous traffic data: Application of multimodal deep learning model
Corizzo et al. Multi-aspect renewable energy forecasting
Corizzo et al. Anomaly detection and repair for accurate predictions in geo-distributed big data
Yu et al. Short term wind power prediction for regional wind farms based on spatial-temporal characteristic distribution
Ren et al. TBSM: A traffic burst-sensitive model for short-term prediction under special events
Yang et al. Real-time spatiotemporal prediction and imputation of traffic status based on LSTM and Graph Laplacian regularized matrix factorization
Liu et al. Pristi: A conditional diffusion framework for spatiotemporal imputation
Huang et al. Learning multiaspect traffic couplings by multirelational graph attention networks for traffic prediction
CN115440032A (en) Long-term and short-term public traffic flow prediction method
Zhang et al. Speed prediction based on a traffic factor state network model
Ai et al. Short-term wind speed forecasting based on two-stage preprocessing method, sparrow search algorithm and long short-term memory neural network
Zhang et al. Attention-driven recurrent imputation for traffic speed
Klopries et al. Extracting interpretable features for time series analysis: A bag-of-functions approach
Wei et al. Deterministic ship roll forecasting model based on multi-objective data fusion and multi-layer error correction
Yin et al. Spatiotemporal dynamic graph convolutional network for traffic speed forecasting
Li et al. Location and time embedded feature representation for spatiotemporal traffic prediction
Guo et al. Learning and integration of adaptive hybrid graph structures for multivariate time series forecasting
Prabowo et al. Traffic forecasting on new roads unseen in the training data using spatial contrastive pre-training
Huang et al. Robust spatial temporal imputation based on spatio-temporal generative adversarial nets
Zhang et al. Data imputation in iot using spatio-temporal variational auto-encoder
Hou et al. MISSII: missing information imputation for traffic data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination