CN113298131A - Attention mechanism-based time sequence data missing value interpolation method - Google Patents

Attention mechanism-based time sequence data missing value interpolation method

Info

Publication number
CN113298131A
Authority
CN
China
Prior art keywords
time sequence
sequence data
data
value
complete
Prior art date
Legal status
Granted
Application number
CN202110533285.5A
Other languages
Chinese (zh)
Other versions
CN113298131B (en)
Inventor
Ji Wei (季微)
Jin Bobin (金博斌)
Li Yun (李云)
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date: 2021-05-17
Filing date: 2021-05-17
Publication date: 2021-08-24
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202110533285.5A
Publication of CN113298131A
Application granted
Publication of CN113298131B
Legal status: Active

Classifications

    • G06F18/10 Pattern recognition: Pre-processing; Data cleansing
    • G06N3/045 Neural networks: Architecture: Combinations of networks
    • G06N3/047 Neural networks: Architecture: Probabilistic or stochastic networks
    • G06N3/084 Neural networks: Learning methods: Backpropagation, e.g. using gradient descent


Abstract

The invention discloses an attention mechanism-based method for interpolating missing values in time series data, comprising the following steps: acquiring time series data with missing values; inputting the time series data with missing values into a trained generator, and acquiring the interpolated time series data. Training the generator comprises: inputting the time series data with missing values into the generator, and acquiring complete time series data based on an attention mechanism; and inputting the time series data with missing values together with the complete time series data into a discriminator, and adversarially training the discriminator and the generator based on the loss functions. The invention can generate new time series data that follow the distribution of the original data set. The attention mechanism strengthens the expression of important features and weakens that of unimportant ones, improving processing efficiency. The method thereby improves both the accuracy and the efficiency of missing-value interpolation for time series.

Description

Attention mechanism-based time sequence data missing value interpolation method
Technical Field
The invention relates to a method for interpolating missing values in time series data based on an attention mechanism, and belongs to the technical field of computer science.
Background
In recent years, with the development of artificial intelligence technology, time series data appear ever more frequently in daily life. A time series is a sequence of values of the same statistical indicator arranged in order of occurrence; it reflects how the state of objects and behaviors changes and develops over time. Common time series include medical data, such as a diabetic patient's blood glucose readings over the course of a day, as well as quantities such as website visits and road traffic volume at different times.
Acquired data often contain gaps caused by instability of, or interference with, the acquisition equipment. Missing time series data create real difficulties for analytical modeling and practical application. For example, when predicting future weather from historical weather records, missing historical data degrade the accuracy of the prediction. An accurate and effective method for interpolating the missing values of an incomplete data set is therefore needed, so as to obtain a complete data set that approaches the real data as closely as possible.
In recent years, deep learning has achieved great success and is rapidly becoming the leading technology in the field of artificial intelligence. A deep learning prediction model must be trained, and its parameters optimized, on a complete data set in order to learn the historical patterns of the data. The missing parts of a data set often carry part of those historical patterns; because they leave the model incompletely driven, the finally trained parameters differ substantially from the optimal ones. Technology that handles missing values in time series data effectively therefore has great research significance and practical value.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an attention mechanism-based method for interpolating missing values in time series data. It addresses the following technical problem: in the conventional generative adversarial network structure, the input of the generator is a random vector; when that random vector is used directly to fill missing values in time series data, a large amount of time is spent searching for the optimal input vector for each individual series, which greatly reduces the efficiency of missing-value interpolation.
To achieve this purpose, the invention adopts the following technical scheme.
The invention provides an attention mechanism-based method for interpolating missing values in time series data, comprising the following steps:
acquiring time series data with missing values;
inputting the time series data with missing values into a trained generator, and acquiring the interpolated time series data;
wherein training the generator comprises:
inputting the time series data with missing values into the generator, and acquiring complete time series data based on an attention mechanism;
inputting the time series data with missing values and the complete time series data into a discriminator, and adversarially training the discriminator and the generator based on the loss functions.
Further, the generator is a denoising autoencoder comprising an encoder unit and a decoder unit.
Further, acquiring the complete time series data comprises:
the encoder unit outputs hidden vectors H of the original time series data x from the input original time series data x, of preset length m and containing missing values, together with a random noise vector η;
the decoder unit, combining an attention mechanism with the original time series data x and the hidden vectors H, interpolates the missing values of x to acquire the complete time series data x̂.
Further, interpolating the missing values of the original time series data x to obtain the complete time series data x̂ comprises the following steps:
according to the time series value S_{n−1} at the (n−1)-th moment and the hidden vectors H at all moments, obtaining a weight factor α_i for each moment i via the attention mechanism;
computing a weighted average of all hidden vectors H according to the weight factor of each moment;
substituting the result of the weighted average into a hyperbolic tangent function to obtain the complete value S_n of the time series at the n-th moment;
feeding S_n back as the input for the (n+1)-th moment and repeating the above steps, so that the complete values S at all moments are computed in turn;
acquiring the complete time series data x̂ from the complete values S at all moments;
where H = {H_1, H_2, H_3, …, H_i, …, H_m}, H_i is the hidden vector at moment i; α = {α_1, α_2, α_3, …, α_i, …, α_m}, α_i is the weight factor at moment i; S = {S_1, S_2, S_3, …, S_n, …, S_m}, S_n is the complete value of the time series at the n-th moment; and S_0 = H_m, i.e. the initial input vector of the decoder is S_0.
Further, obtaining the weight factor α for each moment comprises:
K_i = W_k · H_i
Q_{n−1} = W_q · S_{n−1}
where K_i is the i-th key value in the attention mechanism and H_i is the hidden vector at the i-th moment; Q_{n−1} is the (n−1)-th query value in the attention mechanism and S_{n−1} is the time series value at the (n−1)-th moment; W_k and W_q are parameter matrices learned from the training data, whose initial values come from randomly initialized parameter matrices and which are updated through the loss functions of the generative adversarial network and the back-propagation algorithm.
Let:
e_i = K_i^T · Q_{n−1}
where K_i^T is the transpose of the matrix K_i; then:
α_i = softmax(e_i) = exp(e_i) / Σ_{j=1}^{m} exp(e_j)
where α_i is the weight factor at the i-th moment, and the softmax function is a normalized exponential function that maps the input values e_1, …, e_m to positive outputs between 0 and 1 whose sum is 1.
Further, substituting the result of the weighted average into the hyperbolic tangent function to obtain the complete value S_n of the time series at the n-th moment comprises:
the result of the weighted average is:
C_{n−1} = α_1·H_1 + α_2·H_2 + … + α_i·H_i + … + α_m·H_m
Let:
S_n = tanh(ω_n · C_{n−1} + b_n)
where tanh is the hyperbolic tangent function, with expression:
tanh(z) = (e^z − e^{−z}) / (e^z + e^{−z})
and ω_n and b_n are parameters learned from the training data, obtained from randomly initialized parameters and updated through the loss functions of the generative adversarial network and the back-propagation algorithm.
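For reference, the formulas above combine into a single decoding recursion; the block below merely restates them in LaTeX, with the score name e_{n,i} introduced only to label the quantity fed to softmax:

```latex
% One decoding step at moment n, restated from the formulas above.
\begin{aligned}
  K_i &= W_k H_i, \qquad Q_{n-1} = W_q S_{n-1}, \qquad S_0 = H_m \\
  e_{n,i} &= K_i^{\top} Q_{n-1}, \qquad
  \alpha_i = \frac{\exp(e_{n,i})}{\sum_{j=1}^{m} \exp(e_{n,j})} \\
  C_{n-1} &= \sum_{i=1}^{m} \alpha_i H_i, \qquad
  S_n = \tanh\!\left(\omega_n C_{n-1} + b_n\right)
\end{aligned}
```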
Further, adversarially training the discriminator and the generator comprises:
training the discriminator with the loss function:
L_D = −( log D(x) + log(1 − D(x̂)) )
where D(x) is the probability value with which the discriminator judges the input original time series data x to be real, and D(x̂) is the probability value with which the discriminator judges the input complete time series data x̂ to be real;
training the generator with the generator loss function [the formula appears only as an image in the original];
and repeating many times, stopping training when the probability output by the discriminator approaches 0.5.
Compared with the prior art, the invention has the following beneficial effects.
The invention provides an attention mechanism-based method for filling missing values in time series data that combines a generative adversarial network with an attention mechanism. Through adversarial training, the method can generate new time series data that follow the distribution of the original data set. The attention mechanism strengthens the expression of important features and weakens that of unimportant ones, improving processing efficiency. The method thereby improves both the accuracy and the efficiency of missing-value interpolation for time series.
Drawings
FIG. 1 is a detailed structural diagram of the generator of the generative adversarial network in an embodiment of the invention;
FIG. 2 is a detailed structural diagram of the discriminator of the generative adversarial network in an embodiment of the invention;
FIG. 3 is a flowchart of missing-value interpolation for time series data in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples serve only to illustrate the technical solutions of the invention more clearly and do not limit its protection scope.
The invention takes the data set KDD CUP 2018 Dataset (KDD for short) as an example to describe the missing-value interpolation steps in detail. The data set is a meteorological data set from the KDD Cup 2018 challenge, containing historical meteorological data for Beijing collected by weather stations at different locations across the city. Data from 11 of the stations are selected; each record comprises hourly weather and air-quality measurements from 1 January 2017 to 30 December 2017. In total, 12 attributes are recorded, including PM2.5, PM10, carbon monoxide, and temperature.
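Before the steps below, such records must be arranged into fixed-length windows with missing-value masks. The following is a minimal preprocessing sketch, assuming a hypothetical per-station file kdd_station.csv and an arbitrary choice of 4 of the 12 attributes; the patent does not prescribe this pipeline.

```python
# Minimal preprocessing sketch (assumptions: hypothetical file name and columns;
# the patent does not prescribe this exact pipeline).
import numpy as np
import pandas as pd

df = pd.read_csv("kdd_station.csv", parse_dates=["utc_time"])  # hypothetical file
features = ["PM2.5", "PM10", "CO", "temperature"]              # 4 of the 12 attributes

values = df[features].to_numpy(dtype=np.float32)   # shape (T, d), NaN where missing
mask = ~np.isnan(values)                           # True where a value was observed
values = np.nan_to_num(values, nan=0.0)            # zero-fill placeholders for the model

m = 48  # window length (the preset length m of one input series)
windows = [(values[t:t + m], mask[t:t + m])
           for t in range(0, len(values) - m + 1, m)]
print(len(windows), windows[0][0].shape)           # e.g. number of windows, (48, 4)
```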
As shown in FIG. 3:
step 1: a generator for generating the countermeasure network is constructed according to the figure 1, wherein the generator is a noise reduction self-encoder and comprises two parts of an encoder and a decoder. Inputting an original time sequence X (with missing data, X in figure 1) with length m and a random noise vector eta into a noise reduction self-encoder part of an encoder to output a hidden vector HmI.e. by
Hm=Encoder(x+η)
Both the encoder and decoder are constructed of Recurrent Neural Network (RNN) units. In order to have a certain difference with original data, random noise eta is added in the process of reconstructing data by a noise reduction self-encoder unit so as to reduce the occurrence of an overfitting phenomenon and reconstruct more accurate data.
Step 1.1: the original time series with the deletions are input into the encoder shown in fig. 1 in chronological order.
Step 1.2: random noise η is added to the original time series data x (with a deletion).
Step 1.3: the encoding process is done by the encoder portion of the noise reduction self-encoder,
the length of the original time series is m. The encoder has m total RNN hidden layer outputs, which are respectively H1,H2,H3,...,Hm. In fig. 1, a time sequence with a time length of 4 and an attribute number of 4 is illustrated, each row represents an attribute, each column represents a time, so that the time is in a 4 × 4 matrix form, and the last hidden layer output of RNN is H4
Step 2: by step 1 we get the output of each hidden layer of the encoder part RNN unit, but finally we input into the decoder part only the output H of the last hidden layerm. Order S0=HmI.e. the initial input vector of the decoder is S0It contains the information of the original time column entered. The decoder of the noise reduction self-encoder outputs complete interpolated time series data step by step according to the time sequence. The following explains the decomposition of data interpolation into sub-steps at each time.
Step 2.1: first we want to make an initial time t1For data interpolation, we need to calculate the initial input vector S of the decoder0And the implicit vector H output by the encoder at each moment1,H2,H3,...,HmThe correlation of the values, the weight obtained is denoted as alphai,1≤i≤m。α1To alphamAre real numbers between 0 and 1. The following is a detailed description1To alphamThe calculation procedure of (1).
Step 2.1.1: calculating parameters according to the output of each hidden layer in the step 1
Ki=Wk·Hi,i=1,2,3,...,m
Q0=Wq·S0
Wherein, KiI.e. the key value, Q, in the attention mechanism0I.e. the query value inside the attention mechanism, the subscript representing the number one. "." denotes a multiplication operation. WkAnd WqIs a parameter matrix obtained by learning from training data, and a parameter matrix W is initialized randomlykAnd WqThen, an update is performed, and the parameter matrix is updated by generating a loss function of the countermeasure network and a back propagation algorithm, which is introduced in the subsequent step 3.
Order to
Figure BDA0003068725750000071
Where "T" denotes a transpose operation,
Figure BDA0003068725750000072
i.e. the matrix KiThe transposed matrix of (2).
S0The weight of the hidden state m-1 times before the RNN unit of the encoder is
Figure BDA0003068725750000073
Wherein, the softmax function is also called as normalization index function and is input
Figure BDA0003068725750000074
To
Figure BDA0003068725750000075
The values are mapped to positive numbers between 0 and 1 and the sum of the output result values adds up to 1.
Step 2.1.2: Using the weight factors α_i obtained in Step 2.1.1, take the weighted average of all the historical hidden vectors H of the encoder from Step 1:
C_0 = α_1·H_1 + … + α_m·H_m
Further, let
S_1 = tanh(ω_1 · C_0 + b_1)
where tanh is the hyperbolic tangent function, defined by
tanh(z) = (e^z − e^{−z}) / (e^z + e^{−z})
and ω_1 and b_1 are parameters learned from the training data: they are randomly initialized and then updated through the loss functions of the generative adversarial network and the back-propagation algorithm in Step 3.
This yields the output S_1 of the first unit of the decoder RNN, i.e. the complete data value at the initial moment t_1. The data interpolation at the initial moment t_1 is now complete, and the process moves on to the interpolation at the next moment t_2.
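The single decoding step just described can be sketched as follows, continuing the PyTorch assumption. The parameter names W_k, W_q, omega, and b mirror the symbols above, but their shapes are illustrative, and the projection of S_n from hidden space back to the data attributes is omitted.

```python
# One attention decoding step S_{n-1} -> S_n (sketch; parameter shapes illustrative).
import torch

def decode_step(S_prev, H, W_k, W_q, omega, b):
    # S_prev: (hidden,); H: (m, hidden); W_k, W_q: (d_attn, hidden)
    K = H @ W_k.T                        # K_i = W_k · H_i for all i, shape (m, d_attn)
    Q = W_q @ S_prev                     # Q_{n-1} = W_q · S_{n-1}, shape (d_attn,)
    e = K @ Q                            # scores e_i = K_i^T · Q_{n-1}, shape (m,)
    alpha = torch.softmax(e, dim=0)      # weight factors, positive, summing to 1
    C = alpha @ H                        # C_{n-1} = sum_i alpha_i H_i, shape (hidden,)
    return torch.tanh(omega @ C + b)     # S_n = tanh(omega_n · C_{n-1} + b_n)

m, hidden, d_attn = 48, 32, 16
H = torch.randn(m, hidden)                            # encoder hidden vectors H_1..H_m
S0 = H[-1]                                            # S_0 = H_m
W_k, W_q = torch.randn(d_attn, hidden), torch.randn(d_attn, hidden)
omega, b = torch.randn(hidden, hidden), torch.randn(hidden)
S1 = decode_step(S0, H, W_k, W_q, omega, b)           # complete value at moment t_1
print(S1.shape)                                       # torch.Size([32])
```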
Step 2.2: recalculating the Current State S1Weights to m H states of the encoder, denoted as α1,α2,…,αmIt is worth noting that although the weight α has been previously calculated, this time since the S state is from S0Update to S1Therefore, the new weight α representing the current state S cannot be directly used1And m H encoders. The new calculation method of alpha differs from the previous one in that here the parameter Q is Q1Namely:
Ki=Wk·Hi,i=1,2,3,...,m
Q1=Wq·S1
Figure BDA0003068725750000083
Figure BDA0003068725750000084
calculating vector C by formula1
C1=α1H1+…+αmHm
By the formula:
Figure BDA0003068725750000091
obtaining an output S of the second unit of the decoder RNN2Instant t2The complete data value of. At this point, time t2The data interpolation of (2) is completed, and then the data interpolation at the subsequent time is performed.
Step 2.3, the steps are carried out in sequence according to the method, and the parameter Q is updated at each moment according to the formula in the step 2.2i,αi,CiCalculating the time t by the updated parameteriIs output SiInstant tiAt the last time tmStop, tmThe output of the time is SmI.e., the time series data value of time m. Finally, complete interpolated time series data can be obtained.
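Stacking that step over all m moments gives the full decoding loop, sketched below under the same assumptions and reusing decode_step, H, W_k, W_q, m, and hidden from the previous sketch.

```python
# Full decoding loop sketch: S_0 = H_m, then S_1 .. S_m in chronological order,
# reusing decode_step and the tensors defined in the previous sketch.
import torch

def decode_sequence(H, W_k, W_q, omegas, bs):
    m = H.shape[0]
    S = H[-1]                            # S_0 = H_m, the decoder's initial input
    outputs = []
    for n in range(1, m + 1):            # moments t_1 .. t_m
        S = decode_step(S, H, W_k, W_q, omegas[n - 1], bs[n - 1])
        outputs.append(S)                # S_n, the complete value at moment t_n
    return torch.stack(outputs)          # (m, hidden): the interpolated series

omegas = [torch.randn(hidden, hidden) for _ in range(m)]  # per-moment omega_n
bs = [torch.randn(hidden) for _ in range(m)]              # per-moment b_n
x_hat = decode_sequence(H, W_k, W_q, omegas, bs)
print(x_hat.shape)                                        # torch.Size([48, 32])
```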
Step 3: Input the complete time series generated by the generator, together with the original time series, into the discriminator, whose structure is shown in FIG. 2. The discriminator outputs a probability value representing the probability that the generated series is a real series. The interpolated complete time series data are obtained through adversarial training of the discriminator and the generator. The specific training method is to fix the generator first and train the discriminator with the loss function:
L_D = −( log D(x) + log(1 − D(x̂)) )
where D(x) is the probability value with which the discriminator judges the input real original (missing) time series data to be true ("P true" in FIG. 2), x̂ denotes the generated fake complete time series data, and D(x̂) is the probability value with which the discriminator judges the input fake complete time series data to be true. Then train the generator with the generator loss function [the formula appears only as an image in the original], where λ is a hyper-parameter, and repeat the training many times until the probability output by the discriminator approaches 0.5.
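The alternating training can be sketched as follows, assuming binary cross-entropy adversarial losses. Because the patent's generator loss formula appears only as an image, the λ-weighted masked reconstruction term here is an inference, not the patent's stated formula.

```python
# Adversarial training sketch (assumptions: BCE adversarial losses; the
# lambda-weighted masked reconstruction term in the generator loss is inferred,
# since the patent shows that formula only as an image).
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, x, mask, lam=1.0):
    # x: (batch, m, d) zero-filled series; mask: 1 where observed, 0 where missing.
    x_hat = G(x)                                       # generator's complete series

    # --- discriminator phase: generator fixed ---
    opt_D.zero_grad()
    d_real, d_fake = D(x), D(x_hat.detach())           # D(x), D(x_hat) in (0, 1)
    loss_D = -(torch.log(d_real + 1e-8) + torch.log(1 - d_fake + 1e-8)).mean()
    loss_D.backward()
    opt_D.step()

    # --- generator phase: adversarial term + assumed reconstruction term ---
    opt_G.zero_grad()
    adv = -torch.log(D(x_hat) + 1e-8).mean()           # fool the discriminator
    rec = F.mse_loss(x_hat * mask, x * mask)           # fit the observed entries
    loss_G = adv + lam * rec
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item(), d_fake.mean().item()  # stop near 0.5
```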
In summary, the time series missing-value interpolation method provided by the invention uses a generative adversarial network as its basic framework. In the conventional generative adversarial network structure, the input of the generator is a random vector; when that vector is used directly to fill missing values in time series data, a large amount of time is spent searching for the optimal input vector for each series, which greatly reduces the efficiency of missing-value interpolation.
The method provided by the invention abandons the step, found in the traditional generative adversarial network framework, of searching for the optimal input vector for every series. Instead, a denoising autoencoder obtains a low-dimensional feature-expression vector of the time series data, and the complete time series is then reconstructed from that vector, saving a large amount of training time.
The generator of the generative adversarial network used in the invention is a denoising autoencoder: a neural network, trained with the back-propagation algorithm, whose output is made to match its input. It comprises an encoder, which compresses the input into a latent-space representation, and a decoder, which reconstructs the output from that representation. Much as the human eye can recognize an object even when part of its outline lies outside the field of view, a denoising autoencoder learns low-dimensional feature-expression vectors of the input data and can also repair lost data. Precisely because its input is by nature incomplete, it applies naturally to missing-value filling algorithms.
The encoder part of the denoising autoencoder automatically generates a corresponding low-dimensional vector for each original time series with missing values, and that vector serves as the input of the decoder part. The decoder interpolates the time series in chronological order, moving to the next moment once the interpolation at the current moment is complete. Based on the attention mechanism, weights are assigned automatically across the whole network; the weighted average of all the encoder's hidden vectors, formed with those weights, makes the low-dimensional feature vector output by the encoder better suited to the data interpolation at the current moment. This alleviates the forgetting of information in long time series and improves interpolation accuracy.
The complete time series generated by the generator and the time series with missing values from the original data set are input together into the discriminator. After the adversarial training of the generator and the discriminator is complete, the generator can be regarded as able to produce, from the original time series data x, new samples that follow the distribution of the original data set, namely the original time series with its missing values interpolated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (7)

1. An attention mechanism-based method for interpolating missing values in time series data, characterized by comprising the following steps:
acquiring time series data with missing values;
inputting the time series data with missing values into a trained generator, and acquiring the interpolated time series data;
wherein training the generator comprises:
inputting the time series data with missing values into the generator, and acquiring complete time series data based on an attention mechanism;
inputting the time series data with missing values and the complete time series data into a discriminator, and adversarially training the discriminator and the generator based on the loss functions.
2. The method according to claim 1, wherein the generator is a denoising autoencoder comprising an encoder unit and a decoder unit.
3. The method according to claim 2, wherein acquiring the complete time series data comprises:
the encoder unit outputs hidden vectors H of the original time series data x from the input original time series data x, of preset length m and containing missing values, together with a random noise vector η;
the decoder unit, combining an attention mechanism with the original time series data x and the hidden vectors H, interpolates the missing values of x to acquire the complete time series data x̂.
4. The method according to claim 3, wherein interpolating the missing values of the original time series data x to obtain the complete time series data x̂ comprises the following steps:
according to the time series value S_{n−1} at the (n−1)-th moment and the hidden vectors H at all moments, obtaining a weight factor α_i for each moment i via the attention mechanism;
computing a weighted average of all hidden vectors H according to the weight factor of each moment;
substituting the result of the weighted average into a hyperbolic tangent function to obtain the complete value S_n of the time series at the n-th moment;
feeding S_n back as the input for the (n+1)-th moment and repeating the above steps, so that the complete values S at all moments are computed in turn;
acquiring the complete time series data x̂ from the complete values S at all moments;
where H = {H_1, H_2, H_3, …, H_i, …, H_m}, H_i is the hidden vector at moment i; α = {α_1, α_2, α_3, …, α_i, …, α_m}, α_i is the weight factor at moment i; S = {S_1, S_2, S_3, …, S_n, …, S_m}, S_n is the complete value of the time series at the n-th moment; and S_0 = H_m, i.e. the initial input vector of the decoder is S_0.
5. The method according to claim 4, wherein obtaining the weight factor α for each moment comprises:
K_i = W_k · H_i
Q_{n−1} = W_q · S_{n−1}
where K_i is the i-th key value in the attention mechanism and H_i is the hidden vector at the i-th moment; Q_{n−1} is the (n−1)-th query value in the attention mechanism and S_{n−1} is the time series value at the (n−1)-th moment; W_k and W_q are parameter matrices learned from the training data, whose initial values come from randomly initialized parameter matrices and which are updated through the loss functions of the generative adversarial network and the back-propagation algorithm;
letting:
e_i = K_i^T · Q_{n−1}
where K_i^T is the transpose of the matrix K_i, then:
α_i = softmax(e_i) = exp(e_i) / Σ_{j=1}^{m} exp(e_j)
where α_i is the weight factor at the i-th moment, and the softmax function is a normalized exponential function that maps the input values e_1, …, e_m to positive outputs between 0 and 1 whose sum is 1.
6. The method according to claim 4, wherein substituting the result of the weighted average into the hyperbolic tangent function to obtain the complete value S_n of the time series at the n-th moment comprises:
the result of the weighted average is:
C_{n−1} = α_1·H_1 + α_2·H_2 + … + α_i·H_i + … + α_m·H_m
letting:
S_n = tanh(ω_n · C_{n−1} + b_n)
where tanh is the hyperbolic tangent function, with expression:
tanh(z) = (e^z − e^{−z}) / (e^z + e^{−z})
and ω_n and b_n are parameters learned from the training data, obtained from randomly initialized parameters and updated through the loss functions of the generative adversarial network and the back-propagation algorithm.
7. The method according to claim 3, wherein adversarially training the discriminator and the generator comprises:
training the discriminator with the loss function:
L_D = −( log D(x) + log(1 − D(x̂)) )
where D(x) is the probability value with which the discriminator judges the input original time series data x to be real, and D(x̂) is the probability value with which the discriminator judges the input complete time series data x̂ to be real;
training the generator with the generator loss function [the formula appears only as an image in the original];
and repeating many times, stopping training when the probability output by the discriminator approaches 0.5.
CN202110533285.5A 2021-05-17 2021-05-17 Attention mechanism-based time sequence data missing value interpolation method Active CN113298131B (en)

Priority Applications (1)

Application Number: CN202110533285.5A (granted as CN113298131B); Priority Date: 2021-05-17; Filing Date: 2021-05-17; Title: Attention mechanism-based time sequence data missing value interpolation method

Applications Claiming Priority (1)

Application Number: CN202110533285.5A (granted as CN113298131B); Priority Date: 2021-05-17; Filing Date: 2021-05-17; Title: Attention mechanism-based time sequence data missing value interpolation method

Publications (2)

Publication Number: CN113298131A; Publication Date: 2021-08-24
Publication Number: CN113298131B; Publication Date: 2022-08-05

Family

ID=77322360

Family Applications (1)

Application Number: CN202110533285.5A (Active; granted as CN113298131B); Priority Date: 2021-05-17; Filing Date: 2021-05-17; Title: Attention mechanism-based time sequence data missing value interpolation method

Country Status (1)

Country Link
CN (1) CN113298131B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090558A (en) * 2018-01-03 2018-05-29 华南理工大学 A kind of automatic complementing method of time series missing values based on shot and long term memory network
CN112465150A (en) * 2020-12-02 2021-03-09 南开大学 Real data enhancement-based multi-element time sequence data filling method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469189A (en) * 2021-09-02 2021-10-01 国网江西省电力有限公司供电服务管理中心 Method, system and device for filling missing values of power utilization acquisition data
WO2024087129A1 (en) * 2022-10-24 2024-05-02 大连理工大学 Generative adversarial multi-head attention neural network self-learning method for aero-engine data reconstruction

Also Published As

Publication Number: CN113298131B (en); Publication Date: 2022-08-05


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB03: Change of inventor or designer information (Inventor after: Ji Wei, Jin Bobin, Li Yun; Inventor before: Ji Wei, Jin Bobin, Li Yun)
GR01: Patent grant