CN112417000B

CN112417000B - Time sequence missing value filling method based on bidirectional cyclic codec neural network

Info

Publication number: CN112417000B
Application number: CN202011295072.5A
Authority: CN
Inventors: 邬惠峰; 丘嘉晨; 孙丹枫
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2022-01-07
Anticipated expiration: 2040-11-18
Also published as: CN112417000A

Abstract

The invention provides a time sequence missing value filling method based on a bidirectional cyclic codec neural network. The method combines a self-encoder and a recurrent neural network, and can realize modeling of a time sequence containing a missing value; the method measures the difference between a filling sequence and a label sequence through two training losses, and reversely updates an encoder and a decoder in an asynchronous mode; the method amplifies the reaction of the network to missing data by coordinating the gating units. The method solves the problems that the general method can not correctly model the time-space relationship of the time sequence containing the missing value and the filling effect is sensitive to the change of the missing rate.

Description

Time sequence missing value filling method based on bidirectional cyclic codec neural network

Technical Field

The invention relates to the field of artificial intelligence, in particular to a time sequence missing value filling method based on a bidirectional cyclic codec neural network.

Background

In application tasks of the multidimensional time series of the industrial Internet of things, such as context recognition, predictive maintenance, anomaly detection and the like, a complete time series is a precondition for smooth execution of the tasks. However, a large number of device accesses and environmental instability problems lead to missing values that are prevalent in the multidimensional time series of the industrial internet of things. The existing multidimensional time series filling means comprise mean filling, cluster filling, regression filling and the like. The mean filling effect depends on the difference between data points, the filling precision is not high, and large deviation is easily caused particularly when continuous missing occurs. A clustering filling-based method, such as a fuzzy C-means method, cannot model a space-time relationship, and the filling effect is greatly influenced by the deletion rate. With a regression filling-based method, such as recurrent neural network regression, the robustness of the training process to the missing values is poor, and the spatiotemporal relationship in the time sequence containing the missing values cannot be correctly modeled. Aiming at missing value filling of a multidimensional time sequence, a filling method which can model a space-time relation in the time sequence containing the missing value and has a filling effect insensitive to the change of the missing rate is urgently needed.

Disclosure of Invention

The invention aims to provide a time sequence missing value filling method based on a bidirectional cyclic codec neural network. The method can overcome the defects that the time-space relationship in the time sequence containing the missing value is difficult to be correctly modeled by the existing time sequence filling method, and the filling effect is greatly influenced by the change of the missing rate.

In order to achieve the purpose, the technical scheme of the invention is as follows:

a time sequence missing value filling method based on a bidirectional cyclic codec neural network can effectively model the space-time relationship in a time sequence containing missing values and improve filling performance and stability. The method comprises the following steps:

step S1: and taking a continuous time sequence with a fixed length and without missing values from the historical database as a tag sequence, and taking the tag sequence as an input sequence after artificially creating missing points.

Step S2: and inputting the input sequence into a neural network of a bidirectional cyclic coder-decoder to obtain an output sequence.

Step S3: and calculating the training loss of the output sequence and the label sequence of the bidirectional cyclic codec neural network, and reversely updating the neural network.

Step S4: after the neural network model is trained, the time sequence containing the missing value is input into the neural network of the bidirectional cyclic codec, and the obtained output sequence is the filled time sequence.

In step S1, the tag data is represented as X ═ { X ═ X¹,x²,…,x^t,…,x^T}，

Where T represents the time series length and D represents the number of data attributes. The method for artificially creating the missing points comprises the following steps:

in the formula (I), the compound is shown in the specification,

a value representing the d-th attribute at the t-th instant in the time series of artificial deletions,

representing the value of the d-th attribute at the t-th time of the tag data.

Is a value other than 0 or 1,

to represent

In the absence of any of the above-described agents,

to represent

Are not deleted. By using

To measure the deletion degree of the time series.

In step S2, the input sequence is input into the bidirectional cyclic codec neural network to obtain a calculation process of the output sequence, which includes the following steps:

step S21: obtaining an input sequence

Wherein T represents the time series length and D represents the number of data attributes;

step S21: calculating the average value of each attribute in the input time sequence

The calculation method comprises the following steps:

wherein the content of the first and second substances,

is a value other than 0 or 1,

to represent

In the absence of any of the above-described agents,

to represent

Is not deleted;

step S22: from T-1 to T-T, the output of the encoder network is obtained by iterating the following equation:

wherein, W_s，γ，b_sFor learnable network weights, W_sIs D × N in shape_hThe shape of gamma is 1X 1, b_sIs in the shape of 1 x D,

shape N of_h×N_h. Represents a matrix product,. indicates a Hadamard product,. sigma. indicates a sigmoid function. In particular, it is possible to use, for example,

is an all 0 matrix. f. of_GR(.) represents a cyclic body function containing coordinated gating cells. Recording s at each time t^tS ═ S¹,s²,…s^t,…,s^T}，

Where T represents the time series length and D represents the number of data attributes. Finally, the encoder network is marked as

Wherein theta is_eRepresenting all trainable weights in the encoder network,

represents an input sequence;

step S23: from T to 1, the output of the decoder network is obtained by iterating the following equation:

wherein, W_y，b_yIs a learnable network weight, W_yIs D × N in shape_h，b_yIs in the shape of 1 x D,

shape N of_h×N_h。

Representing a standard round function for the decoder network. Record y at each time t^tValue of (a), Y ═ Y¹,y²,…,y^t,…,y^T}，

Where T represents the time series length and D represents the number of data attributes. Order to

Wherein

Representing a non-linear function. Finally, the encoder network is marked as

Wherein theta is_dRepresents all trainable weights in the decoder network, S represents the input sequence from the encoder;

step S24: the output of the bi-directional cyclic codec neural network is

Where T represents the time series length and D represents the number of data attributes. Calculated by the following formula

In step S22, a cyclic function including a coordinated gating unit is expressed by the following formula:

wherein, W_gIs a learnable network weight W_gIs in the shape of (N)_h+D)×N_h，[.]The splicing operation of the matrix is represented.

Standard loop body functions for the encoder network are shown, such as the LSTM loop body and the GRU loop body.

In step S3, the procedure of calculating the training loss and updating the network in the reverse direction includes the following steps:

step S31: obtaining tag data X ═ { X ═ X¹,x²,…,x^t,…,x^T}，

Where T represents the time series length and D represents the number of data attributes. Calculation encoder

Loss of power

And decoder

Loss of power

The expression is as follows:

step S32: step S22 is executed to obtain S ═ S¹,s²,…,s^T}, calculating the gradient

Updating theta by gradient descent method_e。

Step S33: step S22 is executed to obtain S ═ S¹,s²,…,s^T}，

Step S34: step S23 is executed to obtain

Calculating gradients

Updating theta by gradient descent method_d。

Step S35: steps S31-S34 are performed until

And (6) converging.

In step S4, in the practical application process, the time-series missing value padding process is:

step S41: splitting the time series to be filled into N parts { XR }according to fixed time series length T₁,XR₂,…,XR_i,…}。

Step S42: from i-1 to i-N, respectively

Step S43: obtaining a filled time series

Compared with the prior art, the invention has the following technical effects:

1. due to the use of the automatic encoder network and the recurrent neural network and the iterative process described in the step S22, the invention can correctly model the spatiotemporal relationship in the time sequence containing the missing value.

2. The invention uses the coordinated gate control unit, amplifies the response of the neural network to the missing data, and can reduce the influence of the change of the missing rate on the filling effect.

3. The invention can weaken gradient explosion and diffusion phenomena in the training process and improve the interpretability of the encoder network and the decoder network due to the asynchronous reverse updating process of the encoder network and the decoder network.

4. The invention does not limit the characteristics of the time sequence to be filled and introduce prior knowledge about the time sequence to be filled, and is suitable for rapid deployment and application in a scene.

Drawings

Fig. 1 is a schematic view of a scene of filling missing values in a multidimensional time series of a sensor according to an embodiment of the present application;

fig. 2 is a schematic diagram of a bi-directional cyclic codec based neural network according to an embodiment of the present application.

Fig. 3 is a schematic diagram of a cyclic network of coordinated gate units according to an embodiment of the present application.

Detailed Description

All terms relating to artificial intelligence used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

The invention provides a time sequence missing value filling method based on a neural network of a bidirectional cyclic coder-decoder, which is used for filling a sensor time sequence. The embodiment provided by the application is a scene schematic diagram of filling missing values of a multidimensional time series of a sensor as shown in fig. 1. The acquisition equipment is connected with a plurality of monitoring equipment, the data are acquired at short intervals and uploaded to the server at long intervals, and the data received by the server can be regarded as a time sequence with equal length. If the time sequence is not lost, the server directly stores the sequence into a historical database; if the data contains missing data, the server inputs the sequence into a filling module, and the filling module stores the filled result into a historical database. The monitoring device may be a sensor or a controller, among others. The collecting device can be a gateway, a field controller and the like, and the collected data can be sensor reading or controller signals and the like. Illustratively, an embedded programmable logic controller (ePLC) is widely used in various industrial fields due to its easy programming and high reliability, and is suitable as an acquisition device. The server may be a stand-alone server or a service in a cloud service platform. It should be noted that the technical solution of the present application may be applied to the above application scenarios, is not applicable to limiting the present solution, and may also be applied to other scenarios requiring time-series missing value padding.

The technical solution of the embodiments of the present application will be described in detail below by taking a specific base station implementation as an example. A base station is an important infrastructure for mobile communication, in which a large number of sensors record the operating state of the base station. And selecting a total of 7 sensors from the base station, wherein the sensors are respectively a first ballast current, a first ballast temperature, a second ballast current, a second ballast temperature, a third ballast current, a third ballast temperature and a base station environment temperature, and the acquisition period is 30 minutes. The historical database stored a total of 17520 pieces of data from 2018, month 2 to 2019, month 2, each piece of data contained 7 attributes consisting of 7 sensor readings and contained 18.7% of the data attribute missing. The uploading period of the acquisition equipment is 7 hours, namely 14 pieces of data.

In this embodiment, the method for filling missing values in a time series based on a neural network of a bi-directional cyclic codec includes the following steps:

step S1: and (3) taking a continuous time sequence which does not contain the missing value and has the length of 14 out of the historical database as a tag sequence, and taking the tag sequence after artificially creating the missing point as an input sequence.

In the present embodiment, in the step S1, the tag data is represented as X ═ { X ═ X¹,x²,…,x^t,…,x¹⁴}，

The method for artificially creating the missing points comprises the following steps:

in the formula (I), the compound is shown in the specification,

a value indicating the d-th attribute at the t-th time in the tag sequence indicates a Hadamard product.

Is a value other than 0 or 1,

to represent

In the absence of any of the above-described agents,

to represent

Is not deleted;

in this embodiment, in step S2, the training data is input into the computation process of the bi-directional cyclic codec neural network to obtain the output sequence, and the steps are as follows:

step S21: obtaining training sequences

The calculation method comprises the following steps:

wherein the content of the first and second substances,

is a value other than 0 or 1,

to represent

In the absence of any of the above-described agents,

to represent

Are not deleted.

Step S22: from t 1 to t 14, the output of the encoder network is obtained by iterating the following equation:

wherein, W_s，γ，b_sFor learnable network weights, W_sIs 7X 64, gamma is 1X 1, b_sThe shape of (a) is 1 x 7,

the shape of (1) is 64 × 64. Denotes a matrix product,. indicates a Hadamard product,. sigma. indicates a sigmoid function. In particular, it is possible to use, for example,

is an all 0 matrix. f. of_GR(.) represents a cyclic body function containing coordinated gating cells. Recording s at each time t^tS ═ S¹,s²,…s^t,…,s¹⁴}，

Finally, the encoder network is marked as

Wherein theta is_eRepresenting all trainable weights in the encoder network.

Step S23: from t-14 to t-1, the output of the decoder network is obtained by iterating the following equation:

wherein, W_y，b_yIs a learnable network weight, W_yIs 7X 64, b_yThe shape of (a) is 1 x 7,

the shape of (1) is 64 × 64.

Representing a standard round function for the decoder network. In this embodiment, a GRU loop body is used. Recording the value of yt at each instant t, Y ═ Y¹,y²,…y^t,…,y¹⁴}，

Order to

Wherein

Finally, the encoder network is marked as

Wherein theta is_dRepresenting all trainable weights in the decoder network and S represents the input sequence from the encoder.

Step S24: the output of the bi-directional cyclic codec neural network is

Calculated by the following formula

In this embodiment, in step S22, the cyclic function including the coordinated gating unit is expressed by the following formula:

wherein, W_gIs a learnable network weight, W_gIs 71X 64 [.]The splicing operation of the matrix is represented.

A standard loop body function for the encoder network is shown, in this embodiment, a GRU loop body is used.

In this embodiment, in the step S3, the process of calculating the training loss and updating the network in the reverse direction includes the following steps:

step S31: obtaining tag data X ═ { X ═ X¹,x²,…,x^t _,…,x¹⁴}，

Calculation encoder

Loss of power

And decoder

Loss of power

The expression is as follows:

step S32: step S22 is executed to obtain S ═ S¹,s²,…,s¹⁴}, calculating the gradient

Updating theta by gradient descent method_e。

Step S33: step S22 is executed to obtain S ═ S¹,s²,…,s¹⁴}，

Step S34: step S23 is executed to obtain

Calculating gradients

Updating theta by gradient descent method_d。

Step S35: steps S31-S34 are performed until

And (6) converging.

In this embodiment, in the step S4, in the practical application process, the time-series missing value padding process is:

step S41: the 1400-length time series to be filled is split into 100 parts { XR } according to a fixed time series length 14₁,XR₂,…,XR_i,…,XR₁₀₀}。

Step S42: from i to 1 to i to 100, respectively

Step S43: obtaining a filled time series

Specifically, the experimental comparison process and the results of this example with other time series filling methods containing deletion values are as follows:

1. three experiments were performed with deletion rates of 10%, 20%, and 30%.

2. We compare the filling performance of this embodiment with other time series filling methods, Bidirectiontemporal filling for time series (BRITS), Linear memory vector temporal network (LIME-RNN).

3. The difference in filling performance we denote by Mean Relative Error (MRE):

wherein x represents a sequence without missing values,

indicates the sequence filled after x makes a deletion.

Table 1 shows the filling accuracy of the present invention compared to other time series filling methods with missing values for different missing cases.

TABLE 1

Rate of absence	LIME-RNN	BRITS	This example
				30％	19.11％	15.91％	11.97％
20％	14.28％	13.26％	10.94％
				10％	11.85％	11.73％	9.94％

The table data shows that the time series missing value filling method based on the bidirectional cyclic codec neural network provided by the embodiment can obtain higher filling precision than that of the existing method under the conditions of the missing rate of 10%, 20% and 30%, is less affected by the change of the missing rate, and has certain reference value and practical economic benefit.

The above description of the embodiments is only intended to facilitate the understanding of the method of the invention and its core idea. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A time sequence missing value filling method based on a bidirectional cyclic codec neural network is characterized by comprising the following steps:

step S1: taking a continuous time sequence with a fixed length and without a missing value from a historical database as a tag sequence, and taking the tag sequence as an input sequence after a missing point is artificially generated by the tag sequence;

step S2: inputting the input sequence into a neural network of a bidirectional cyclic coder-decoder to obtain an output sequence;

step S3: calculating the training loss of the output sequence and the label sequence of the bidirectional cyclic codec neural network, and reversely updating the neural network;

step S4: after training the neural network model, inputting the time sequence containing the missing value into a bidirectional cyclic codec neural network, wherein the obtained output sequence is the filled time sequence;

wherein, the step S2 further includes the following steps:

step S21: obtaining an input sequence

Where T represents the time series length, D represents the number of data attributes,

represents the d-th attribute;

representing the mean of T input sequences of length, calculating the mean of each attribute therein

The method comprises the following steps:

wherein the content of the first and second substances,

is m^tCorresponds to the d-th component of

Is a value other than 0 or 1,

to represent

In the absence of any of the above-described agents,

to represent

Is not deleted;

shape N of_h×N_h(ii) a Represents a matrix product, which indicates a Hadamard product;

is a matrix of all 0; f. of_GR(.) represents a cyclic body function containing coordinated gating cells; recording s at each time t^tS ═ S¹,s²,…s^t,…,s^T}，

Wherein T represents the time series length and D represents the number of data attributes; finally, the encoder network is marked as

Wherein theta is_eRepresenting all trainable weights in the encoder network,

represents an input sequence;

shape N of_h×N_h；

Represents a standard cyclic function for the decoder network; record y at each time t^tValue of (a), Y ═ Y¹,y²,…,y^t,…,y^T}，

Wherein T represents the time series length and D represents the number of data attributes; order to

Wherein

Representing a non-linear function; finally, the encoder network is marked as

step S24: the output of the bi-directional cyclic codec neural network is

Wherein T represents the time series length and D represents the number of data attributes; calculated by the following formula

In step S22, the cyclic function of the coordinated gating cell is expressed by the following formula:

wherein, W_gIs a learnable network weight having a shape of (N)_h+D)×N_h，[.]Representing a splicing operation of the matrix, and sigma (.) representing a sigmoid function;