WO2019208998A1 - GRU-based cell structure design robust to missing data and noise in time series data in recurrent neural network - Google Patents

GRU-based cell structure design robust to missing data and noise in time series data in recurrent neural network Download PDF

Info

Publication number
WO2019208998A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
noise
network model
time series
time
Prior art date
Application number
PCT/KR2019/004873
Other languages
French (fr)
Korean (ko)
Inventor
오혜연
박성준
박정국
Original Assignee
한국과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술원 filed Critical 한국과학기술원
Publication of WO2019208998A1 publication Critical patent/WO2019208998A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

Provided is a recurrent artificial neural network model capable of simultaneously imputing missing values and reducing noise in time series data in accordance with the problem being predicted, the model comprising, in a single cell structure, all of the steps of: (a) reducing noise in the time series data by a weighted average method using a learnable noise reduction filter; (b) imputing missing values; and (c) storing, in a hidden state vector, the information that must be memorized at the present time, through GRU computation. In addition, in configuring the recurrent artificial neural network model, the invention is characterized in that, in step (a), the weight parameters for noise reduction contained in the cell structure are learned so as to be optimized for the task while the model is trained to fit the prediction task. By this method, a recurrent artificial neural network model that simultaneously performs missing value imputation and noise reduction on time series data without separate preprocessing can be used for various machine learning tasks.

Description

GRU-based Cell Structure Design Robust to Missing Data and Noise in Time Series Data in Recurrent Neural Networks
The present invention relates to a new recurrent neural network model based on the Gated Recurrent Unit (GRU), in which a weighted average filter is added to the internal structure of the cell used in designing a recurrent neural network model and the filter's parameters are trained together with the neural network model, so that the model can learn from time series input data containing missing values and noise and perform prediction tasks.
In general, when time series data are used with a classifier such as a neural network model, the missing entries and noise contained in the data are handled in a preprocessing step. Missing data are imputed using the global mean or a weighted mean, or using algorithms such as linear regression or support vector machine-based regression. Noise in the data is likewise mitigated through methods such as moving average filters, wavelet filters, and fuzzy logic.
However, these missing-data and noise handling techniques have the limitation that they are applied independently of the neural network model's target task. Data that have already been preprocessed are not modified while the neural network model is trained on them, so neither the structure of the model nor the characteristics of the target task can be effectively reflected in the missing-data and noise handling.
To overcome this limitation, approaches have been proposed that modify the cell structure inside the neural network model so as to impute missing data and mitigate noise. Performing data preprocessing through the cell structure has the advantage that the parameters of the preprocessing functions can be learned together during the training of the neural network model. However, no cell structure has yet been proposed that performs missing-data imputation and noise mitigation simultaneously, and room remains for improving the accuracy of the target task.
SUMMARY OF THE INVENTION
An object of the present invention is to solve the problems described above by using GRU-based cells with a built-in weighted average filter when designing a recurrent neural network model, and by learning the filter parameters present in each cell together when training the network, thereby providing a method that makes it possible to train a recurrent neural network model on time series data containing missing data and noise without separate preprocessing.
In particular, the cell structure presented in the present invention can impute missing values while simultaneously mitigating noise through a learnable, flexible weighted average filter, and the invention provides a learning algorithm that trains the parameters of the noise mitigation filter together, tailored to the problem to be predicted.
To achieve the above object, the present invention provides a recurrent artificial neural network model capable of simultaneously imputing missing values and mitigating noise in time series data according to the problem to be predicted, characterized in that a single cell structure comprises all of the steps of (a) mitigating noise by a weighted average method using a noise mitigation filter learnable from the time series data, (b) imputing missing values, and (c) storing the information to be remembered at the current time step in a hidden state vector through a GRU operation.
Further, in constructing the recurrent artificial neural network model, the invention is characterized in that, in step (a), the weight parameters for noise mitigation contained in the cell structure are learned so as to be optimized for the task while the recurrent artificial neural network model is trained to fit the prediction task.
As described above, when a recurrent neural network is constructed from the missing-data- and noise-robust GRU cells of the present invention, the performance of any recurrent neural network operating on time series data containing missing data and noise is improved on its target task.
FIG. 1 is a block diagram of a conventional GRU cell structure.
FIG. 2 is a block diagram of the cell structure presented in the present invention.
DETAILED DESCRIPTION
Hereinafter, specific details for carrying out the present invention are described with reference to the drawings. The cell structure presented by the present invention is as follows.

1. Each time step t of a time series of length T consists of an N-dimensional vector representing N features, and may contain missing values and noise.
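For illustration only (this does not form part of the disclosure), the observation mask $m_t^d$ and the time gap $\delta_t^d$ used in the steps below can be derived from such a series as in the following minimal sketch, assuming NumPy, unit time steps, and NaN-encoded missing entries:

```python
import numpy as np

def mask_and_delta(x: np.ndarray):
    """x: (T, N) time series with missing entries encoded as NaN.
    Returns the observation mask m (1 = observed, 0 = missing) and, per
    dimension, the time gap delta since the last observation."""
    T, N = x.shape
    m = (~np.isnan(x)).astype(float)
    delta = np.zeros((T, N))
    for t in range(1, T):
        # the gap is 1 one step after an observation, otherwise it accumulates
        delta[t] = np.where(m[t - 1] == 1, 1.0, delta[t - 1] + 1.0)
    return m, delta
```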
2. Noise is removed using a denoising layer. The value the layer produces, $\tilde{x}_t$, is the noise-removed version of the most recently observed input values as of time step t.

2.1. If the value $x_t^d$ of the d-th dimension of the N-dimensional vector at time step t was observed, that value is used as-is. Conversely, if $x_t^d$ was not observed, the most recently observed value in that dimension, ${x'}_{t-1}^d$, is used. Here $m_t^d$ is a mask indicating whether $x_t^d$ was observed, taking the value 1 if it was observed and 0 otherwise:

$$ {x'}_t^d = m_t^d\, x_t^d + (1 - m_t^d)\, {x'}_{t-1}^d $$
2.2. $\tilde{x}_t^d$ is the weighted average of the ${x'}^d$ values from time step t−k to time step t, i.e., of the most recently computed, missing-value-imputed values ${x'}_{t-i}^d$ per time step, where $w$ is a learnable parameter giving the weight of each time step in the window:

$$ \tilde{x}_t^d = \sum_{i=0}^{k} w_i\, {x'}_{t-i}^d $$
That is, the value $\tilde{x}_t^d$ produced by the denoising layer is the noise-removed version of the most recently observed values as of time step t.
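As an illustrative sketch only (the patent does not prescribe an implementation), the denoising layer of step 2 could be written in PyTorch as follows; the class name `DenoisingLayer`, the softmax normalization of the weights $w$, and the zero initialization of the last observed values are assumptions:

```python
import torch
import torch.nn as nn

class DenoisingLayer(nn.Module):
    """Sketch of step 2: forward-fill each feature with its last observed
    value x'_t (step 2.1), then take a learnable weighted average over the
    k most recent x' values (step 2.2)."""

    def __init__(self, n_features: int, k: int):
        super().__init__()
        self.k = k
        self.w = nn.Parameter(torch.full((k,), 1.0 / k))  # per-time-step weights

    def forward(self, x: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
        # x, m: (T, N) values and observation mask (1 = observed, 0 = missing)
        T, N = x.shape
        x_last = torch.zeros(N)      # x' before the first observation (assumption)
        history, out = [], []
        for t in range(T):
            x_last = torch.where(m[t] > 0, x[t], x_last)   # x'_t, step 2.1
            history.append(x_last)
            window = torch.stack(history[-self.k:])        # x'_{t-k+1} .. x'_t
            w = torch.softmax(self.w[-window.shape[0]:], dim=0)
            out.append((w.unsqueeze(1) * window).sum(dim=0))  # x~_t, step 2.2
        return torch.stack(out)
```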
3. Missing values are imputed based on the noise-removed values produced by the preceding layer. The value the layer produces is $\hat{x}_t^d$, in which the noise has been removed and missing values have been accounted for. In detail:

$$ \hat{x}_t^d = m_t^d\, \tilde{x}_t^d + (1 - m_t^d)\left[(1 - \gamma_t^d)\, {\tilde{x}'}_t^d + \gamma_t^d\, \bar{x}^d\right] $$
3.1. If the value $x_t^d$ of the d-th dimension of the N-dimensional vector at time step t was observed, the noise-removed value $\tilde{x}_t^d$ is used as-is.
3.2. If the value $x_t^d$ was not observed, ${\tilde{x}'}_t^d$ with the decay rate $\gamma_t^d$ applied is used. That is, an exponential decay rate is applied in proportion to the time $\delta_t^d$ that has elapsed from the current time step t back to the time step at which $x^d$ was last observed, for example:

$$ \gamma_t^d = 1 - \exp\left(-\max\left(0,\; w_\gamma\, \delta_t^d + b_\gamma\right)\right) $$
The decay rate $\gamma_t^d$ increases in proportion to $\delta_t^d$, where $w_\gamma$ and $b_\gamma$ are parameters that can be learned from the data to determine the decay rate. The decay rate takes a value between 0 and 1: the closer it is to 1, the more the overall mean or an arbitrary constant $\bar{x}^d$ is used in place of the most recently observed value ${\tilde{x}'}_t^d$, so that every input value converges to $\bar{x}^d$ when no observation has been provided for a sufficiently long time.
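The imputation of step 3 can likewise be sketched as below, again for illustration only: the exact decay form $1 - \exp(-\max(0, w_\gamma \delta + b_\gamma))$ is an assumption consistent with the description (a rate in [0, 1) that grows with $\delta$), and treating $\bar{x}$ as a learnable constant is also an assumption:

```python
import torch
import torch.nn as nn

class DecayImputation(nn.Module):
    """Sketch of step 3: keep the denoised value where x_t^d was observed;
    where it is missing, blend the last denoised observation toward the
    constant x_bar, shifting more weight to x_bar as the gap delta grows."""

    def __init__(self, n_features: int):
        super().__init__()
        self.w_gamma = nn.Parameter(torch.zeros(n_features))
        self.b_gamma = nn.Parameter(torch.zeros(n_features))
        self.x_bar = nn.Parameter(torch.zeros(n_features))  # overall mean / constant

    def forward(self, x_tilde, x_tilde_last, m, delta):
        # gamma in [0, 1), increasing with delta (functional form assumed)
        gamma = 1.0 - torch.exp(-torch.relu(self.w_gamma * delta + self.b_gamma))
        imputed = (1.0 - gamma) * x_tilde_last + gamma * self.x_bar
        return m * x_tilde + (1.0 - m) * imputed            # x^_t, step 3
```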
4. The GRU operation is performed on the value $\hat{x}_t$, in which missing values and noise have been handled:

$$ z_t = \sigma(W_z\, \hat{x}_t + U_z\, h_{t-1} + b_z) $$
$$ r_t = \sigma(W_r\, \hat{x}_t + U_r\, h_{t-1} + b_r) $$
$$ \tilde{h}_t = \tanh(W_h\, \hat{x}_t + U_h\,(r_t \odot h_{t-1}) + b_h) $$
$$ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t $$
The input vector $\hat{x}_t$ at time step t is used to compute the reset gate $r_t$ and the update gate $z_t$. Each gate is a value between 0 and 1 computed from the previous hidden state $h_{t-1}$ and the input: $r_t$ indicates how much of the previous hidden state $h_{t-1}$ to reflect when computing the candidate hidden state $\tilde{h}_t$, and $z_t$ indicates how much of $h_{t-1}$ to reflect when computing the current hidden state $h_t$.
5. The value finally obtained from the cell structure of the present invention is the hidden state $h_t$ at the current time step. It combines the information processed up to the previous time step, $h_{t-1}$, with the raw data of the current time step, and is a vector representation of the information that must be remembered at this time step in order to perform the task on the time series data.
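For completeness, the GRU update of steps 4 and 5 applied to the processed input $\hat{x}_t$ is the standard GRU cell; the following sketch uses parameter names following the notation above:

```python
import torch
import torch.nn as nn

class RobustGRUCell(nn.Module):
    """Sketch of steps 4-5: one standard GRU step on the denoised,
    imputed input x^_t, returning the hidden state h_t."""

    def __init__(self, n_features: int, hidden_size: int):
        super().__init__()
        self.W_z = nn.Linear(n_features, hidden_size)              # includes b_z
        self.U_z = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_r = nn.Linear(n_features, hidden_size)              # includes b_r
        self.U_r = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_h = nn.Linear(n_features, hidden_size)              # includes b_h
        self.U_h = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x_hat: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        z = torch.sigmoid(self.W_z(x_hat) + self.U_z(h_prev))        # update gate
        r = torch.sigmoid(self.W_r(x_hat) + self.U_r(h_prev))        # reset gate
        h_cand = torch.tanh(self.W_h(x_hat) + self.U_h(r * h_prev))  # candidate
        return (1 - z) * h_prev + z * h_cand                         # h_t
```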
The present invention thus provides a recurrent artificial neural network model capable of simultaneously imputing missing values and mitigating noise in time series data according to the problem to be predicted, characterized in that a single cell structure comprises all of the steps of (a) mitigating noise by a weighted average method using a noise mitigation filter learnable from the time series data, (b) imputing missing values, and (c) storing the information to be remembered at the current time step in a hidden state vector through a GRU operation. Further, in constructing the recurrent artificial neural network model, the invention is characterized in that, in step (a), the weight parameters for noise mitigation contained in the cell structure are learned to be optimized for the task while the model is trained to fit the prediction task. By this method, a recurrent artificial neural network model that simultaneously performs missing value imputation and noise mitigation on time series data without separate preprocessing can be used for a variety of machine learning tasks.
The notation used above is as follows.

$x_t^d$: the value of the d-th dimension of the N-dimensional vector at time step t.

${x'}_t^d$: the most recently observed value of the d-th dimension of the N-dimensional vector as of time step t. If this value was last observed at time step t−1 and the value at time step t ($x_t^d$) is missing, then ${x'}_t^d = x_{t-1}^d$ (when the decay rate is not considered).

$m_t^d$: a mask value indicating whether the value of the d-th dimension of the N-dimensional vector at time step t is missing; it is 1 if $x_t^d$ is not missing and 0 if $x_t^d$ is missing.

$\gamma_t^d$: the decay rate, between 0 and 1, that determines how much of the most recently observed value ${\tilde{x}'}_t^d$ is reflected in $\hat{x}_t^d$ when a value is missing.

$\delta_t^d$: the input value used to determine the decay rate, representing the time difference between $x_t^d$ and ${x'}_t^d$, i.e., the distance from the current time step to the last time step at which a value was observed. For example, if the current time step is t and the value was observed at time step t−1, then $\delta_t^d$ is 1.

$w_\gamma$: a learnable parameter multiplied by $\delta_t^d$ to determine the decay rate.

$b_\gamma$: a learnable parameter added to $w_\gamma \delta_t^d$ to determine the decay rate.

$w$: in the denoising layer, the weights multiplied by each ${x'}_{t-i}^d$ to compute $\tilde{x}_t^d$.

$\tilde{x}_t^d$: the value of the d-th dimension of the N-dimensional vector at time step t with missing values imputed and noise removed, obtained by multiplying the ${x'}^d$ value of each time step by the weights $w$.

$\tilde{h}_t$: the candidate hidden state; a candidate for the hidden state at the current time step, generated using the input that arrived at the current time step.

$h_t$: the hidden state at the current time step; a vector representation of the information that must be remembered at the current time step, computed from the hidden state of the previous time step and the candidate hidden state of the current time step.

$z_t$: the update gate used in the GRU operation; a value between 0 and 1 obtained by applying the sigmoid activation function to the sum of the inner product of the parameter $U_z$ with the previous hidden state $h_{t-1}$, the inner product of $W_z$ with the input $\hat{x}_t$, and $b_z$. It indicates how much of the previous hidden state $h_{t-1}$ to reflect when computing the current hidden state $h_t$.

$r_t$: the reset gate used in the GRU operation; a value between 0 and 1 obtained by applying the sigmoid activation function to the sum of the inner product of the parameter $U_r$ with the previous hidden state $h_{t-1}$, the inner product of $W_r$ with the input $\hat{x}_t$, and $b_r$. It indicates how much of the previous hidden state $h_{t-1}$ to reflect when computing the candidate hidden state $\tilde{h}_t$.
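Continuing the illustrative sketches above (all class and variable names come from those sketches, not from the patent), a whole sequence might be pushed through the cell as follows; for brevity the most recent denoised observation ${\tilde{x}'}_t$ is approximated here by the denoised value itself:

```python
import torch

T, N, H, K = 50, 8, 32, 5        # toy sizes: steps, features, hidden, window
x = torch.randn(T, N)            # toy series (missing entries zero-filled)
m = torch.ones(T, N)             # observation mask from preprocessing
delta = torch.zeros(T, N)        # time gaps from preprocessing

denoise, impute, cell = DenoisingLayer(N, K), DecayImputation(N), RobustGRUCell(N, H)

x_tilde = denoise(x, m)          # step 2
h = torch.zeros(H)
for t in range(T):
    x_hat = impute(x_tilde[t], x_tilde[t], m[t], delta[t])  # step 3
    h = cell(x_hat, h)                                      # steps 4-5
# h now holds h_T, the hidden state fed to the downstream prediction task
```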

Claims (1)

  1. A recurrent artificial neural network model capable of simultaneously imputing missing values and mitigating noise in time series data according to the problem to be predicted, wherein a single cell structure comprises all of the steps of:
    (a) mitigating noise by a weighted average method using a noise mitigation filter learnable from the time series data;
    (b) imputing missing values; and
    (c) storing the information to be remembered at the current time step in a hidden state vector through a GRU operation.
PCT/KR2019/004873 2018-04-27 2019-04-23 Gru-based cell structure design robust to missing data and noise in time series data in recurrent neural network WO2019208998A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020180048801A KR102310490B1 (en) 2018-04-27 2018-04-27 The design of GRU-based cell structure robust to missing value and noise of time-series data in recurrent neural network
KR10-2018-0048801 2018-04-27

Publications (1)

Publication Number Publication Date
WO2019208998A1

Family

ID=68293629

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/004873 WO2019208998A1 (en) 2018-04-27 2019-04-23 Gru-based cell structure design robust to missing data and noise in time series data in recurrent neural network

Country Status (2)

Country Link
KR (1) KR102310490B1 (en)
WO (1) WO2019208998A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111030889A (en) * 2019-12-24 2020-04-17 国网河北省电力有限公司信息通信分公司 Network traffic prediction method based on GRU model
CN111338385A (en) * 2020-01-22 2020-06-26 北京工业大学 Vehicle following method based on fusion of GRU network model and Gipps model
CN111931849A (en) * 2020-08-11 2020-11-13 北京中水科水电科技开发有限公司 Hydroelectric generating set operation data trend early warning method
CN112967816A (en) * 2021-04-26 2021-06-15 四川大学华西医院 Computer equipment and system for acute pancreatitis organ failure prediction
CN116861347A (en) * 2023-05-22 2023-10-10 青岛海洋地质研究所 Magnetic force abnormal data calculation method based on deep learning model

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102443586B1 (en) 2020-06-29 2022-09-15 세종대학교산학협력단 Method and server for predicting missing data
CN112561118B (en) * 2020-10-29 2022-09-02 北京水慧智能科技有限责任公司 Municipal pipe network water flow prediction method based on GRU neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5408424A (en) * 1993-05-28 1995-04-18 Lo; James T. Optimal filtering by recurrent neural networks
US20150238148A1 (en) * 2013-10-17 2015-08-27 Siemens Aktiengesellschaft Method and system for anatomical object detection using marginal space deep neural networks
US9349105B2 (en) * 2013-12-18 2016-05-24 International Business Machines Corporation Machine learning with incomplete data sets

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102449837B1 (en) 2015-02-23 2022-09-30 삼성전자주식회사 Neural network training method and apparatus, and recognizing method
KR102399548B1 (en) 2016-07-13 2022-05-19 삼성전자주식회사 Method for neural network and apparatus perform same method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5408424A (en) * 1993-05-28 1995-04-18 Lo; James T. Optimal filtering by recurrent neural networks
US20150238148A1 (en) * 2013-10-17 2015-08-27 Siemens Aktiengesellschaft Method and system for anatomical object detection using marginal space deep neural networks
US9349105B2 (en) * 2013-12-18 2016-05-24 International Business Machines Corporation Machine learning with incomplete data sets

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JO BEYEONG SEUNG: "MICROSOFTWARE THE CHECK POINT. LEARN & TRY & SHARE", vol. 391, 29 January 2018 (2018-01-29), pages 1 - 204 *
WEI WEI: "A Generic Neural Network Approach for Filling Missing data in Data Mining", 2003 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, CONFERENCE THEME - SYSTEM SECURITY AND ASSURANCE, 8 October 2003 (2003-10-08), pages 862 - 867, XP010666852, DOI: 10.1109/ICSMC.2003.1243923 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111030889A (en) * 2019-12-24 2020-04-17 国网河北省电力有限公司信息通信分公司 Network traffic prediction method based on GRU model
CN111030889B (en) * 2019-12-24 2022-11-01 国网河北省电力有限公司信息通信分公司 Network traffic prediction method based on GRU model
CN111338385A (en) * 2020-01-22 2020-06-26 北京工业大学 Vehicle following method based on fusion of GRU network model and Gipps model
CN111931849A (en) * 2020-08-11 2020-11-13 北京中水科水电科技开发有限公司 Hydroelectric generating set operation data trend early warning method
CN111931849B (en) * 2020-08-11 2023-11-17 北京中水科水电科技开发有限公司 Hydropower unit operation data trend early warning method
CN112967816A (en) * 2021-04-26 2021-06-15 四川大学华西医院 Computer equipment and system for acute pancreatitis organ failure prediction
CN112967816B (en) * 2021-04-26 2023-08-15 四川大学华西医院 Acute pancreatitis organ failure prediction method, computer equipment and system
CN116861347A (en) * 2023-05-22 2023-10-10 青岛海洋地质研究所 Magnetic force abnormal data calculation method based on deep learning model

Also Published As

Publication number Publication date
KR102310490B1 (en) 2021-10-08
KR20190124846A (en) 2019-11-06

Similar Documents

Publication Publication Date Title
WO2019208998A1 (en) Gru-based cell structure design robust to missing data and noise in time series data in recurrent neural network
Sihwail et al. Improved harris hawks optimization using elite opposition-based learning and novel search mechanism for feature selection
Ruck et al. Comparative analysis of backpropagation and the extended Kalman filter for training multilayer perceptrons
US5636326A (en) Method for operating an optimal weight pruning apparatus for designing artificial neural networks
JP7366274B2 (en) Adaptive search method and device for neural networks
CN113449864B (en) Feedback type impulse neural network model training method for image data classification
KR20180045635A (en) Device and method to reduce neural network
CN108683614B (en) Virtual reality equipment cluster bandwidth allocation device based on threshold residual error network
CN108122048B (en) Transportation path scheduling method and system
CN110942142B (en) Neural network training and face detection method, device, equipment and storage medium
Khan et al. Artificial neural network (ANNs)
US5107442A (en) Adaptive neural network image processing system
Geerts et al. Probabilistic successor representations with Kalman temporal differences
Abu Doush et al. Archive-based coronavirus herd immunity algorithm for optimizing weights in neural networks
CN113407820A (en) Model training method, related system and storage medium
CN115599296A (en) Automatic node expansion method and system for distributed storage system
Zhang et al. Learning efficient sparse structures in speech recognition
CN115220818A (en) Real-time dependency task unloading method based on deep reinforcement learning
CN113240430A (en) Mobile payment verification method and device
CN114943330A (en) Neural network model training method, device, equipment and storage medium
WO2002080563A2 (en) Scalable expandable system and method for optimizing a random system of algorithms for image quality
Ding et al. Adaptive training of radial basis function networks using particle swarm optimization algorithm
Abdulhameed et al. Potentials of reinforcement learning in contemporary scenarios
CN112734048A (en) Reinforced learning method
Lee Embedding Differentiable Sparsity into Deep Neural Network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19792732

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19792732

Country of ref document: EP

Kind code of ref document: A1