CN111832704A - Design method of convolution input type nested recurrent neural network - Google Patents

Design method of convolution input type nested recurrent neural network Download PDF

Info

Publication number
CN111832704A
Authority
CN
China
Prior art keywords
unit
input
convolution
data
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010611409.2A
Other languages
Chinese (zh)
Inventor
张萌
曹晗翔
范津安
张倩茹
朱佳蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202010611409.2A
Publication of CN111832704A
Legal status: Pending (Current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a design method of a convolution input type nested recurrent neural network, which comprises the following steps: combining the input data at the current time with the output data at the previous time and processing the combination with a convolution operation; splitting the convolution result into equal parts that serve as the gating units of the original long short-term memory network cell; performing a convolution operation on the input inside the inner nested cell, followed by the same gating calculations as in a long short-term memory network cell, to obtain the output of the inner nested cell; and using the output of the inner nested cell as the memory cell value of the outer cell, with the final output value of the whole cell obtained through the output gate. The invention combines a nested recurrent neural network with convolutional input, which not only improves the model's ability to fit long-range dependencies in the data, but also extracts local correlations among features and reduces a certain number of parameters; the resulting recurrent neural network achieves higher accuracy with fewer parameters than a general recurrent neural network.

Description

Design method of convolution input type nested recurrent neural network
Technical Field
The invention belongs to the technical field of data processing, relates to a method for designing a recurrent neural network unit, and particularly relates to a method for designing a convolution input type nested recurrent neural network.
Background
With the development of network technology and hardware in the Internet of Things era, the numbers of users and interconnected devices have grown explosively. In 2017 the number of Internet of Things devices exceeded 7.5 billion, surpassing the world's total population for the first time, and by 2020 it is expected to grow to more than 30 billion. However, the application of Internet of Things devices is still at an early stage, and due to their sheer number and simple structure, many potential safety hazards remain, the most important of which is the security problem of malicious intrusion from the Internet.
Recurrent neural networks are widely used for time-series prediction. Intrusion attacks from the network have a certain temporal extent, and DDoS attacks in particular can last up to several minutes, which provides a practical basis for modeling attack data with a recurrent neural network. In fact, thanks to the excellent robustness and fitting ability of the LSTM (long short-term memory network) and its variants, they have been widely applied to time-series prediction tasks and have achieved far better results than traditional machine learning.
With the proposal of the long short-term memory (LSTM) network, a classic structure of the recurrent neural network (RNN), the robustness of the RNN structure improved and it became easier to train. Since the rise of deep learning and the rapid improvement of hardware computing capability in 2012, RNNs have again become a popular research field, and in intrusion detection, deep-learning-based RNNs have also achieved notable results.
The gate mechanism is the earliest and most classical line of research. Compared with a basic RNN unit, the long short-term memory network (LSTM) enhances the memory of long-time-step information, alleviates the gradient vanishing problem, and lays the foundation for large-scale application of RNNs. Subsequently, the simplified Gated Recurrent Unit (GRU) merged the 4 gates of the long short-term memory network into 2, greatly reducing the number of parameters with almost no loss of accuracy and making hardware deployment of RNNs feasible. Nested LSTMs conceived a nested LSTM structure that replaces the original memory cell C_t with another complete LSTM unit, greatly enhancing long-term memory capability. The Convolutional LSTM replaces the matrix operations on the input and gates of the original LSTM with two-dimensional convolutions, improving the memory of spatio-temporal information and reducing the number of parameters. The Grid LSTM improves on the widely used stacked LSTM structure and conceives an LSTM structure with a definable number of dimensions that can be used to model high-dimensional, complex systems.
Related researchers have also made many improvements to how the RNN is unrolled over time. The CW-RNN replaces the hidden layer connected at every time step with several groups of hidden units having different recursion periods, so that longer-range dependencies are handled by hidden groups that recur fewer times with longer periods, reducing the difficulty of learning information with long dependencies. The Dilated RNN proposes skip connections that can be freely combined with different RNN units, reducing network parameters and improving training efficiency.
For the optimization problems encountered in training, several weight initialization methods have been developed. The Unitary RNN, aiming at the gradient vanishing and explosion problems of RNNs, provides a unitary initialization and parameterization of the weight matrix: through a special parameterization trick the weight matrix is expressed in a complex-valued yet computationally efficient form, and the parameter matrix remains unitary after each update. Kanai S [Preventing gradient explosions in gated recurrent units [C]// Advances in Neural Information Processing Systems. 2017: 435-.] proposed constraining the dynamics of the GRU so that gradient explosions are prevented during training.
RNNs also find application in network intrusion detection. Kim J [Applying recurrent neural network to intrusion detection with Hessian free optimization [C]// International Workshop on Information Security Applications, Cham, 2015: 357-369], to solve some problems existing in RNNs, used a Hessian-free optimization method to avoid the influence of gradient explosion and vanishing on training; the work also gives a detailed description of the preprocessing of the KDD CUP 99 data set, including different processing methods for symbolic, Boolean, percentage and numeric features, while discarding features in the original data that are irrelevant to attacks. Kim T-Y [Web traffic anomaly detection using C-LSTM neural networks [J]. 2018, 106(66-76)] proposed a network traffic detection method using C-LSTM, in which a CNN extracts high-dimensional features from a large amount of data and an LSTM is then trained as the classifier, achieving 98.7% accuracy on the Yahoo S5 web data set, superior to comparable LSTM, CNN and GRU structures. Another study [J]. 2017, 5(21954-61)] takes the RNN as the key part of an intrusion detection system, performs two-class and multi-class prediction on the NSL-KDD data set, and gives detailed experimental results showing that the method is superior to traditional algorithms.
However, existing recurrent neural networks show poor sensitivity to long time sequences and low test accuracy when modeling network intrusion data, so a new technical scheme is needed to solve the problem.
Disclosure of Invention
The purpose of the invention is as follows: in order to solve the problems of poor sensitivity to long time sequences and low test accuracy that arise when existing recurrent neural networks model network intrusion data, a design method of a convolution input type nested recurrent neural network is provided. On this basis, the number of parameters used is reduced to a certain extent, local correlations among features are taken into account, and the test accuracy is improved.
The technical scheme is as follows: in order to achieve the above object, the present invention provides a method for designing a convolution-input type nested recurrent neural network, comprising the steps of:
s1: performing data combination and convolution operation processing on the current moment input data and the last moment output data;
s2: equally splitting the result after convolution, taking the result as each gate control unit in the original long and short term memory network unit, and entering an outer layer unit after gate control calculation operation; processing the previous time reservation information and the current time input as the input of the inner-layer nested unit;
s3: performing convolution operation in the inner nested unit as input, and performing gating calculation operation the same as that of the long-term and short-term memory network unit to obtain the output of the inner nested unit;
s4: the output of the inner nested unit is used as the memory unit value of the outer unit, and the final output value of the whole unit is obtained through the output gate.
Further, step S1 specifically comprises: combining the input data at the current time with the output data at the previous time, and selecting a convolution kernel to perform a one-dimensional convolution on each piece of numerically processed data in the combined data.
Further, the convolution operations in step S1 and step S3 specifically comprise: taking each piece of data as input and expanding the output to a number of channels equal to the number of hidden-layer units.
Further, the two input parts in step S1 and step S2 are concatenated along the channel dimension of the original data.
Further, the combination of the information retained from the previous time and the current-time input information processed in step S2 is equivalent to the memory cell of a single-layer long short-term memory network cell.
Further, the gating calculation operations in step S2 and step S3 are calculated using an activation function.
Further, the activation function includes a sigmoid function and a tanh function.
The method of the invention replaces the memory cell in the long short-term memory network cell with another long short-term memory cell that is essentially the same as the outer one, forming a nested structure; the matrix multiplication applied to the data entering the memory cell is replaced with a one-dimensional convolution operation, and the data output dimension is expanded accordingly. For network intrusion data, recurrent neural networks have become an important and effective method, but their accuracy, number of parameters and computational complexity remain bottlenecks for application in this scenario. The invention combines a nested recurrent neural network with convolutional input, which not only improves the model's ability to fit long-range dependencies in the data, but also extracts local correlations among features and reduces a certain number of parameters; the resulting recurrent neural network achieves higher accuracy with fewer parameters than a general recurrent neural network.
Has the advantages that: compared with the prior art, the traditional long short-term memory network cell is improved into a new structure that takes a convolution as input and nests one long short-term memory cell inside another. The nested structure makes the model more accurate when fitting long time-sequence data, and the convolutional input effectively extracts local correlations among individual features. This solves the problems of poor sensitivity to long time sequences and low test accuracy when a conventional recurrent neural network models network intrusion data; compared with the traditional long short-term memory network cell, the new cell structure is clearly improved and can effectively increase the test accuracy.
Drawings
FIG. 1 is a diagram of the structure of the internal units of a convolutional input nested recurrent neural network (C-NLSTM);
FIG. 2 is a schematic diagram of a first time step outer layer convolution;
FIG. 3 is a schematic diagram of the convolution in the remaining cases;
FIG. 4 is a schematic diagram of the memory cell of the long short-term memory network;
FIG. 5 is a schematic diagram of a time expansion of a long-short term memory network.
Detailed Description
The invention is further elucidated with reference to the drawings and the embodiments.
The invention designs a convolution input type nested recurrent neural network (C-NLSTM), and the design method comprises the following steps:
(1) combining the input data at the current moment and the output data at the previous moment, and selecting a proper convolution kernel for performing 1-dimensional convolution on each piece of data subjected to numerical processing in the data;
(2) equally splitting the result after convolution, taking the parts as the gating units of the original long short-term memory network cell, and entering the outer-layer cell; the information retained from the previous time and the current-time input are processed as the input of the inner nested cell.
(3) And performing similar convolution operation in the inner nested unit as input, and performing the same gating calculation operation as the long-short term memory network unit to obtain the output of the inner nested unit.
(4) The output of the inner nested unit is used as the memory unit value of the outer unit, and the final output value of the whole unit is obtained through the output gate.
(5) Unrolling the convolution input type nested recurrent neural network unit formed by steps (1) to (4) over the specified number of time steps, then training and predicting.
The convolution operations in steps (1) and (3) specifically comprise: taking each piece of data as input and expanding the output to a number of channels equal to the number of hidden-layer units.
The two input parts in steps (1) and (2) are concatenated along the channel dimension of the original data.
The combination of the information retained from the previous time and the current-time input information processed in step (2) is equivalent to the memory cell of a single-layer long short-term memory network cell.
In this embodiment, the method is practically applied to obtain the convolution-input nested recurrent neural network shown in fig. 1, and the specific design process is as follows:
1. First, data enters from the outer-layer cell: the current input x_t and the output of the previous time h_{t-1} are spliced along the channel dimension of the original data to form the actual input of the unit. Assuming that t consecutive pieces of data are correlated and each piece of data has i features, the inputs required by the model are the current outer-layer input x_t and the previous output h_{t-1}. At the initial time step t = 0, the previous output h_{t-1} does not exist and the outer-layer input x_t has only one channel, so the input shape is written as [i, 1]; at the remaining time steps t = 1, 2, ..., the current input x_t has size [i, h] and the previous output h_{t-1} has size [i, h], so the input size of the model is [i, 2×h].
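As a minimal sketch of this splicing step, assuming illustrative values i = 41 features and h = 64 hidden units, and plain NumPy arrays in place of the framework tensors used in the actual implementation:

```python
import numpy as np

i, h = 41, 64                        # assumed feature count and hidden-unit count (illustrative)

# Time step t = 0: no previous output exists, the outer input has a single channel.
x_0 = np.random.randn(i, 1)          # shape [i, 1]
cell_input_0 = x_0                   # input to the first outer convolution, shape [i, 1]

# Later time steps: current input and previous output both have h channels,
# and are concatenated along the channel dimension.
x_t = np.random.randn(i, h)          # shape [i, h]
h_prev = np.random.randn(i, h)       # previous output, shape [i, h]
cell_input_t = np.concatenate([x_t, h_prev], axis=1)   # shape [i, 2*h]

print(cell_input_0.shape, cell_input_t.shape)          # (41, 1) (41, 128)
```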
2. Next, the outer-layer convolution operation is performed, as shown in fig. 2. When the time step t = 0, the outer one-dimensional convolution applies a one-dimensional convolution kernel [k, 1, 4×h] to the input data, where the kernel size is k, the number of input channels is 1 and the number of output channels is 4×h. A zero-padding operation is applied to the original input matrix so that the number of features after convolution equals the number of features i before convolution, and the convolution output is of size [i, 4×h].
When the time step t is 1 or later, the convolution is calculated as shown in fig. 3. The outer convolution now uses a one-dimensional convolution kernel [k, 2×h, 4×h], where the kernel size is k, the number of input channels is 2×h and the number of output channels is 4×h. Zero padding is again applied to the original input so that the number of features after convolution equals the number of features i before convolution, and the convolution output is of size [i, 4×h].
The convolution result is then split into 4 equal parts along the channel dimension, giving the 4 gating units of the outer-layer cell, Z_i, Z_f, Z_g and Z_o, each of shape [i, h].
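A minimal NumPy sketch of this outer convolution and four-way split, assuming zero ("same") padding, an odd illustrative kernel size k = 3, and random weights; the helper conv1d_same is a naive stand-in for the framework's one-dimensional convolution:

```python
import numpy as np

def conv1d_same(x, w):
    """Naive 1-D convolution with zero padding (odd kernel size) that preserves
    the feature length. x: [i, c_in], w: [k, c_in, c_out] -> [i, c_out]."""
    k = w.shape[0]
    pad = k // 2
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.tensordot(x_pad[p:p + k], w, axes=([0, 1], [0, 1]))
                     for p in range(x.shape[0])])

i, h, k = 41, 64, 3                              # illustrative sizes
x_in = np.random.randn(i, 2 * h)                 # concatenated [x_t, h_{t-1}]
w_outer = np.random.randn(k, 2 * h, 4 * h)       # kernel [k, 2*h, 4*h]

conv_out = conv1d_same(x_in, w_outer)            # [i, 4*h]
Z_i, Z_f, Z_g, Z_o = np.split(conv_out, 4, axis=1)   # four gates, each [i, h]
print(conv_out.shape, Z_i.shape)                 # (41, 256) (41, 64)
```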
The basic principle of a long short-term memory (LSTM) network cell is formulated as follows:
Z_f = σ(W_f x_t + U_f h_{t-1} + b_f)
Z_i = σ(W_i x_t + U_i h_{t-1} + b_i)
Z_o = σ(W_o x_t + U_o h_{t-1} + b_o)
C_t = Z_f ⊙ C_{t-1} + Z_i ⊙ tanh(W_c x_t + U_c h_{t-1} + b_c)
h_t = Z_o ⊙ tanh(C_t)
O_t = f(W_o h_t + b_o)
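For reference, a small NumPy sketch of these standard LSTM equations in the matrix form above (all sizes are illustrative, the weights are random, and the final output projection O_t is omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.
    W, U, b hold the stacked parameters of the f, i, o and candidate (c) gates."""
    z = x_t @ W + h_prev @ U + b                 # [4*hidden]
    z_f, z_i, z_o, z_c = np.split(z, 4)
    f = sigmoid(z_f)                             # forget gate
    i = sigmoid(z_i)                             # input gate
    o = sigmoid(z_o)                             # output gate
    c_t = f * c_prev + i * np.tanh(z_c)          # new memory cell
    h_t = o * np.tanh(c_t)                       # new hidden state
    return h_t, c_t

in_dim, hidden = 41, 64                          # illustrative sizes
W = np.random.randn(in_dim, 4 * hidden) * 0.1
U = np.random.randn(hidden, 4 * hidden) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(np.random.randn(in_dim), h, c, W, U, b)
print(h.shape, c.shape)                          # (64,) (64,)
```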
The outer-layer input data are processed according to the formula of the long short-term memory network cell:
C_t = σ_f(Z_f) ⊙ C_{t-1} + σ_i(Z_i) ⊙ σ_c(Z_g)
where σ_f, σ_i and σ_c are activation functions, taken here as the sigmoid and tanh functions respectively:
σ(x) = 1 / (1 + e^(-x))
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
After the sigmoid activation, the gating units Z_f and Z_i are scaled to matrices whose values all lie in [0, 1]; these perform a weighted selection of the stored value C_{t-1} from the previous time and the processed input σ_c(Z_g) at the current time, realizing partial forgetting of the previously stored value and gating of the current input. This operation does not affect the data shape, and the output C_t is still of size [i, h].
Then, rather than summing these two terms into the outer memory cell C_t, they are taken out as the inputs of the inner-layer unit, namely the inner-layer input x̃_t and the inner-layer hidden state h̃_{t-1}, as shown in the following equations:
x̃_t = σ_i(Z_i) ⊙ σ_c(Z_g)
h̃_{t-1} = σ_f(Z_f) ⊙ C_{t-1}
Here the activation functions σ_f, σ_i and σ_c are the same as before, and the inner-layer inputs x̃_t and h̃_{t-1} are still of size [i, h].
After entering the inner-layer unit, x̃_t and h̃_{t-1} are spliced along the channel dimension to form the input data of the inner-layer unit, with shape [i, 2×h].
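A small NumPy sketch of how the inner-layer input could be formed from the outer gates, under the same illustrative sizes as above (σ_f and σ_i as sigmoid, σ_c as tanh; the gate values are random placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

i, h = 41, 64                                    # illustrative sizes
Z_i, Z_f, Z_g = (np.random.randn(i, h) for _ in range(3))   # outer gates, each [i, h]
C_prev = np.random.randn(i, h)                   # outer memory cell from the previous step

x_inner = sigmoid(Z_i) * np.tanh(Z_g)            # processed current-time input
h_inner_prev = sigmoid(Z_f) * C_prev             # retained previous-time information
inner_input = np.concatenate([x_inner, h_inner_prev], axis=1)   # [i, 2*h]
print(inner_input.shape)                         # (41, 128)
```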
3. The one-dimensional convolution of the inner-layer input data is the same as the outer-layer one-dimensional convolution for time steps t ≥ 1. As shown in fig. 3, the input data are convolved with a one-dimensional convolution kernel [k, 2×h, 4×h], where the kernel size is k, the number of input channels is 2×h and the number of output channels is 4×h. Zero padding is applied to the original input so that the number of features after convolution equals the number of features i before convolution, and the convolution result is of size [i, 4×h].
The convolution result is then equally divided along the channel dimension into the 4 gating units of the inner-layer cell, Z̃_i, Z̃_f, Z̃_g and Z̃_o, each of shape [i, h].
The inner-layer input data are processed according to the formulas of the long short-term memory network cell:
C̃_t = σ_f(Z̃_f) ⊙ C̃_{t-1} + σ_i(Z̃_i) ⊙ σ_c(Z̃_g)
h̃_t = σ_o(Z̃_o) ⊙ σ_h(C̃_t)
where σ_f, σ_i, σ_c, σ_o and σ_h are activation functions, taken here as the sigmoid and tanh functions respectively:
σ(x) = 1 / (1 + e^(-x))
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Similarly, after the sigmoid activation, the gating units Z̃_f, Z̃_i and Z̃_o are scaled to matrices whose values all lie in [0, 1]; these perform a weighted selection of the inner stored value C̃_{t-1} from the previous time, the processed input σ_c(Z̃_g) at the current time, and the current-time output, realizing partial forgetting of the previously stored value and gating of the current input and output. This operation does not affect the data shape; the outputs C̃_t and h̃_t are still of size [i, h].
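A matching NumPy sketch of the inner-layer gating (the inner gates Z̃ are the four channel-wise splits of the inner convolution output; all sizes and values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

i, h = 41, 64
inner_conv_out = np.random.randn(i, 4 * h)       # stand-in for the inner convolution result
Zt_i, Zt_f, Zt_g, Zt_o = np.split(inner_conv_out, 4, axis=1)
C_inner_prev = np.zeros((i, h))                  # inner memory cell from the previous step

C_inner = sigmoid(Zt_f) * C_inner_prev + sigmoid(Zt_i) * np.tanh(Zt_g)
h_inner = sigmoid(Zt_o) * np.tanh(C_inner)       # inner output, later used as the outer C_t
print(C_inner.shape, h_inner.shape)              # (41, 64) (41, 64)
```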
4. The output h̃_t of the inner-layer unit is obtained and used as the memory cell C_t of the original outer-layer cell; returning to the outer-layer unit, the output is processed as follows:
C_t = h̃_t
O_t = σ_o(Z_o)
h_t = O_t ⊙ σ_h(C_t)
where σ_o and σ_h are activation functions, taken here as the sigmoid and tanh functions respectively:
σ(x) = 1 / (1 + e^(-x))
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Since these calculations do not change the original shape, C_t and h_t are both of size [i, h], which matches the initial assumption that the outer-layer output size is [i, h].
From this, the final output h_t of the whole unit and the internal stored value C_t are obtained, completing the forward calculation of the complete unit at the current time step t. The obtained h_t is then spliced with the input x_{t+1} of time step t+1 to form the model input of time step t+1, and the forward calculation of time step t+1 is performed; this continues until the time step reaches its maximum value.
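Putting the pieces together, the following self-contained NumPy sketch performs one forward step of the cell described above (steps 1 to 4). It is an illustrative reconstruction under the assumptions used earlier (naive "same"-padded one-dimensional convolution, odd kernel size, random weights, illustrative sizes), not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_same(x, w):
    """Naive 1-D convolution with zero padding (odd kernel size) that preserves
    the feature length. x: [i, c_in], w: [k, c_in, c_out] -> [i, c_out]."""
    k = w.shape[0]
    pad = k // 2
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.tensordot(x_pad[p:p + k], w, axes=([0, 1], [0, 1]))
                     for p in range(x.shape[0])])

def cnlstm_step(x_t, h_prev, c_prev, c_inner_prev, w_outer, w_inner):
    """One forward step of the convolution-input nested cell (illustrative sketch)."""
    # Step 1: splice the current input and the previous output, outer 1-D convolution.
    outer_in = np.concatenate([x_t, h_prev], axis=1)                  # [i, 2*h]
    Z_i, Z_f, Z_g, Z_o = np.split(conv1d_same(outer_in, w_outer), 4, axis=1)

    # Step 2: form the inner-layer input from the processed current input and
    # the retained previous-time information (the gated previous outer memory).
    x_inner = sigmoid(Z_i) * np.tanh(Z_g)
    h_inner_prev = sigmoid(Z_f) * c_prev
    inner_in = np.concatenate([x_inner, h_inner_prev], axis=1)        # [i, 2*h]

    # Step 3: inner 1-D convolution followed by LSTM-style gating.
    Zt_i, Zt_f, Zt_g, Zt_o = np.split(conv1d_same(inner_in, w_inner), 4, axis=1)
    c_inner = sigmoid(Zt_f) * c_inner_prev + sigmoid(Zt_i) * np.tanh(Zt_g)
    h_inner = sigmoid(Zt_o) * np.tanh(c_inner)

    # Step 4: the inner output becomes the outer memory cell; apply the output gate.
    c_t = h_inner
    h_t = sigmoid(Z_o) * np.tanh(c_t)
    return h_t, c_t, c_inner

i, h, k = 41, 64, 3                                                   # illustrative sizes
w_outer = np.random.randn(k, 2 * h, 4 * h) * 0.05
w_inner = np.random.randn(k, 2 * h, 4 * h) * 0.05
h_t, c_t, c_inner = cnlstm_step(np.random.randn(i, h), np.zeros((i, h)),
                                np.zeros((i, h)), np.zeros((i, h)), w_outer, w_inner)
print(h_t.shape, c_t.shape)                                           # (41, 64) (41, 64)
```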
In this embodiment, the memory cell of the long short-term memory network is shown schematically in FIG. 4: the inputs x_t and h_{t-1} are multiplied by a weight matrix of size (feature dimension × number of hidden-layer units) to obtain a matrix; this result is split along the hidden-unit dimension into four gating matrices that control the gating of each content, and the memory cell C_t and the final output O_t are obtained according to the long short-term memory network formulas.
Referring to fig. 5, the recurrent neural network unit is unrolled over the specified number of time steps. "Unit" here is the general term for a long short-term memory network cell, and can also refer to the unit designed in this invention. During the computation of time step t, the inputs of a unit are the current input x_t and the output h_{t-1} of the previous time step, the memory cell values are C_t and C_{t-1}, and the output at the current time is h_t. After unrolling by the specified number of time steps, the output h_{t-1} of the previous time step and x_t form the input of the current time step; the value computed from this input and the previous memory cell value C_{t-1} are combined in certain proportions to obtain the current memory cell value C_t; from the current memory cell value, the current output h_t is computed. In the computation of time step t+1, x_{t+1} and the output h_t of time step t are spliced to form the unit input, and the remaining computation is the same as at time step t.
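A short sketch of this unrolling, assuming a generic step function with an (h, C) state; the toy_step below is a trivial stand-in, and a real cell such as the cnlstm_step sketched earlier (with its weights bound, e.g. via functools.partial, and its extra inner state) could be substituted:

```python
import numpy as np

def unroll(step_fn, inputs, init_state):
    """Run a recurrent cell over a sequence. inputs: [T, ...]; returns per-step outputs."""
    state = init_state
    outputs = []
    for x_t in inputs:                      # t = 0, 1, ..., T-1
        h_t, *rest = step_fn(x_t, *state)
        state = (h_t, *rest)                # the output h_t feeds the next time step
        outputs.append(h_t)
    return np.stack(outputs), state

# Trivial stand-in cell: h_t = tanh(x_t + h_{t-1}), c_t = c_{t-1} + x_t
def toy_step(x_t, h_prev, c_prev):
    c_t = c_prev + x_t
    h_t = np.tanh(x_t + h_prev)
    return h_t, c_t

T, dim = 10, 8                              # illustrative sequence length and feature size
xs = np.random.randn(T, dim)
outs, final_state = unroll(toy_step, xs, (np.zeros(dim), np.zeros(dim)))
print(outs.shape)                           # (10, 8)
```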
To verify the effectiveness of the method of the invention, comparative tests were performed using the method and classical LSTM under the same conditions.
The experiments use the TensorFlow open-source framework. The C-NLSTM unit is implemented by ourselves, with four input hyper-parameters: the number of nested layers, the input shape, the number of hidden-layer units and the size of the one-dimensional convolution kernel. The LSTM cell is the BasicLSTMCell from the tf.nn.rnn_cell library in the TensorFlow framework.
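As a rough sketch of this experimental wiring, using the current Keras API rather than the tf.nn.rnn_cell interface mentioned above, and a standard LSTMCell as a stand-in for the self-implemented C-NLSTM cell; the layer sizes and class count are illustrative assumptions, not the settings of Table 1:

```python
import tensorflow as tf

# Illustrative hyper-parameters; a custom C-NLSTM cell exposing (nested layers,
# input shape, hidden units, 1-D kernel size) would replace the stand-in cell below.
timesteps, features, hidden_units = 10, 122, 64

cell = tf.keras.layers.LSTMCell(hidden_units)        # stand-in recurrent cell
model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.RNN(cell),                       # unrolls the cell over the time steps
    tf.keras.layers.Dense(2, activation="softmax"),  # final fully connected classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```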
The data are one-hot encoded (all distinct values in each column are traversed, each distinct value becomes a class; samples belonging to that class are labeled 1 and samples not belonging to it are labeled 0). The data are then fed into the C-NLSTM or LSTM unit, the returned tensor is obtained, and a fully connected layer is finally added for the final classification.
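A minimal sketch of this one-hot encoding on a toy symbolic column (pandas is used here for brevity; the column name and values are illustrative, and the actual KDD99 preprocessing applies the same idea to every symbolic column):

```python
import pandas as pd

# Toy example: one symbolic column with three distinct values.
df = pd.DataFrame({"protocol_type": ["tcp", "udp", "icmp", "tcp"]})

# Each distinct value becomes its own column: 1 if the row belongs to that class, else 0.
one_hot = pd.get_dummies(df["protocol_type"], prefix="protocol_type").astype(int)
print(one_hot)
#    protocol_type_icmp  protocol_type_tcp  protocol_type_udp
# 0                   0                  1                  0
# 1                   0                  0                  1
# ...
```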
Table 1 shows the comparison between C-NLSTM units and general LSTM units on the KDD99 data set. To ensure a valid comparison, the two methods use the same hyper-parameters and training epochs; the specific settings and results are shown in Table 1.
TABLE 1
(Table 1 is provided as an image in the original publication; it lists the shared hyper-parameter settings and the test results of the two units.)
It can be seen that, under identical hyper-parameters and software/hardware conditions, the accuracy of C-NLSTM on the KDD99 data set reaches 98.46%, an improvement over the 97.33% accuracy of LSTM. The new unit also has a stronger ability to fit long-range dependent data, can effectively extract correlations between adjacent features, and is suitable for learning on large data sets.

Claims (7)

1. A design method of a convolution input type nested recurrent neural network is characterized in that: the method comprises the following steps:
s1: performing data combination and convolution operation processing on the current moment input data and the last moment output data;
s2: equally splitting the result after convolution, taking the result as each gate control unit in the original long and short term memory network unit, and entering an outer layer unit after gate control calculation operation; processing the previous time reservation information and the current time input as the input of the inner-layer nested unit;
s3: performing convolution operation in the inner nested unit as input, and performing gating calculation operation the same as that of the long-term and short-term memory network unit to obtain the output of the inner nested unit;
s4: the output of the inner nested unit is used as the memory unit value of the outer unit, and the final output value of the whole unit is obtained through the output gate.
2. The method of claim 1, wherein the method further comprises: the step S1 specifically includes: and combining the current moment input data and the last moment output data, and selecting a convolution kernel to perform one-dimensional convolution on each piece of data subjected to numerical processing in the combined data.
3. The method of claim 1, wherein the method further comprises: the convolution operations in step S1 and step S3 are specifically: and taking each piece of data as input, and amplifying output into the number of channels of the number of hidden layer units.
4. The method of claim 1, wherein the method further comprises: the two input parts in the step S1 and the step S2 are spliced in the channel number dimension of the original data.
5. The method of claim 1, wherein the method further comprises: the combination of the last-time retained information and the current-time input information processed in step S2 is equal to the memory cells constituting the single-layer long-short term memory network cell.
6. The method of claim 1, wherein the method further comprises: the gating calculation operations in the steps S2 and S3 are calculated using an activation function.
7. The method of claim 6, wherein the method further comprises: the activation function includes a sigmoid function and a tanh function.
CN202010611409.2A 2020-06-30 2020-06-30 Design method of convolution input type nested recurrent neural network Pending CN111832704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010611409.2A CN111832704A (en) 2020-06-30 2020-06-30 Design method of convolution input type nested recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010611409.2A CN111832704A (en) 2020-06-30 2020-06-30 Design method of convolution input type nested recurrent neural network

Publications (1)

Publication Number Publication Date
CN111832704A true CN111832704A (en) 2020-10-27

Family

ID=72900825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010611409.2A Pending CN111832704A (en) 2020-06-30 2020-06-30 Design method of convolution input type nested recurrent neural network

Country Status (1)

Country Link
CN (1) CN111832704A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372724A (en) * 2016-08-31 2017-02-01 西安西拓电气股份有限公司 Artificial neural network algorithm
CN107451655A (en) * 2017-07-17 2017-12-08 上海电机学院 One kind integrates the fuzzy neural network algorithm of " classification and cluster "
CN107480774A (en) * 2017-08-11 2017-12-15 山东师范大学 Dynamic neural network model training method and device based on integrated study
CN110503182A (en) * 2018-05-18 2019-11-26 杭州海康威视数字技术股份有限公司 Network layer operation method and device in deep neural network
CN110874625A (en) * 2018-08-31 2020-03-10 杭州海康威视数字技术股份有限公司 Deep neural network quantification method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613597A (en) * 2020-11-30 2021-04-06 河南汇祥通信设备有限公司 Automatic identification convolutional neural network model for risks of comprehensive pipe gallery and construction method
CN112613597B (en) * 2020-11-30 2023-06-30 河南汇祥通信设备有限公司 Comprehensive pipe rack risk automatic identification convolutional neural network model and construction method
CN112990598A (en) * 2021-03-31 2021-06-18 浙江禹贡信息科技有限公司 Reservoir water level time sequence prediction method and system

Similar Documents

Publication Publication Date Title
Jia et al. Label distribution learning with label correlations on local samples
CN112070277B (en) Medicine-target interaction prediction method based on hypergraph neural network
CN108764460A (en) A kind of Time Series Forecasting Methods based on time convolution sum LSTM
CN106294618A (en) Searching method and device
CN106815244A (en) Text vector method for expressing and device
Xu et al. GenExp: Multi-objective pruning for deep neural network based on genetic algorithm
CN111832704A (en) Design method of convolution input type nested recurrent neural network
Harrag et al. Improving Arabic Text Categorization Using Neural Network with SVD.
Wang et al. Channel pruning via lookahead search guided reinforcement learning
CN113239354A (en) Malicious code detection method and system based on recurrent neural network
CN107579821A (en) Password dictionary generation method and computer-readable recording medium
Zheng Network intrusion detection model based on convolutional neural network
Kumar et al. Wind speed prediction using deep learning-LSTM and GRU
CN115099461A (en) Solar radiation prediction method and system based on double-branch feature extraction
CN116542701A (en) Carbon price prediction method and system based on CNN-LSTM combination model
CN111144500A (en) Differential privacy deep learning classification method based on analytic Gaussian mechanism
Wang et al. Graph interpolating activation improves both natural and robust accuracies in data-efficient deep learning
CN111353534A (en) Graph data category prediction method based on adaptive fractional order gradient
Qamar et al. Artificial neural networks: An overview
CN116318845B (en) DGA domain name detection method under unbalanced proportion condition of positive and negative samples
CN109977194A (en) Text similarity computing method, system, equipment and medium based on unsupervised learning
Lin et al. Network security situation prediction based on combining 3D-CNNs and Bi-GRUs
Wen et al. Graph regularized and feature aware matrix factorization for robust incomplete multi-view clustering
Tirumala Exploring neural network layers for knowledge discovery
CN114881172A (en) Software vulnerability automatic classification method based on weighted word vector and neural network

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201027)