CN110610158A

CN110610158A - Human body posture identification method and system based on convolution and gated cyclic neural network

Info

Publication number: CN110610158A
Application number: CN201910869443.7A
Authority: CN
Inventors: 张雷; 王震宇; 王焜; 滕起
Original assignee: Nanjing Normal University
Current assignee: Nanjing Normal University
Priority date: 2019-09-16
Filing date: 2019-09-16
Publication date: 2019-12-24

Abstract

The invention discloses a human body posture recognition method and system based on a convolutional and gated recurrent neural network. The method comprises the following steps: (1) collecting sensor data and recording the corresponding action categories; (2) preprocessing the sensor data and dividing it into training samples and test samples; (3) training a convolutional and gated recurrent neural network model with the training samples and adjusting the model parameters as required; (4) porting the trained model to a mobile intelligent terminal; and (5) preprocessing sensor data acquired in real time on the mobile intelligent terminal and inputting the preprocessed data into the trained model to obtain a human body posture recognition result. By using a convolutional and gated recurrent neural network, the method achieves high recognition accuracy and recognizes many action categories; compared with video or image recognition methods, it can effectively protect user privacy.

Description

Human body posture identification method and system based on convolution and gated cyclic neural network

Technical Field

The invention relates to the technical field of wearable intelligent monitoring within artificial intelligence, and in particular to a human body posture recognition method and system based on a convolutional and gated recurrent neural network.

Background

Human body posture recognition technology is widely applied in fields such as virtual reality, mobile games, medical care, human-computer interaction and image recognition. Posture recognition techniques generally fall into two types: non-wearable and wearable. Non-wearable techniques, as the name implies, are those in which the recognition device does not come into direct contact with the human body, such as image recognition. Compared with non-wearable techniques, wearable posture recognition is not confined to a particular space and therefore has greater potential in research and application. Because human postures are diverse and individuals perform actions differently, building a posture recognition model with high recognition accuracy remains an active research topic.

Generally, to maintain high recognition accuracy, multiple sensor devices are placed on various joints of the human body. Although this approach reveals the acceleration characteristics of various motions intuitively, requiring a user to carry multiple sensors is inconvenient in practice. How to achieve high-accuracy human posture recognition with fewer sensors, or even a single set, is a very practical research problem.

Human body posture recognition using the built-in sensors of smartphones or smart watches has already been studied and applied extensively at home and abroad, and most smart bands, smart watches and mobile phones on the market today include posture recognition applications (apps). Most of these methods are threshold detection methods, that is, motion types are classified by checking whether raw or processed sensor data is above or below a preset threshold. Such methods are computationally simple and occupy little memory on the mobile device, but they have obvious drawbacks: accuracy varies widely between products, and the types of actions that can be recognized are very limited. This is partly due to differences in technical capability between developers at different companies and, more importantly, due to the inherent limitations of such methods: the more action categories that need to be recognized, the more complex the algorithm becomes to build.

Disclosure of Invention

Purpose of the invention: to solve the problems in the prior art, the invention provides a multi-class human body posture recognition method based on a convolutional and gated recurrent neural network that has high recognition accuracy and can recognize multiple action types.

Technical scheme: the invention provides a human body posture recognition method based on a convolutional and gated recurrent neural network, comprising the following steps:

(1) collecting sensor data and recording corresponding action types;

(2) preprocessing the sensor data, and dividing the data into a training sample and a testing sample;

(3) training a convolutional and gated recurrent neural network model with the training samples, testing the accuracy of the model with the test samples, and adjusting the model parameters as required;

(4) porting the trained convolutional and gated recurrent neural network model to a mobile intelligent terminal;

(5) preprocessing sensor data acquired in real time on the mobile intelligent terminal, and inputting the preprocessed data into the trained convolutional and gated recurrent neural network model to obtain a human body posture recognition result.

Further, comprising:

in the step (1), the sensor data includes nine-axis sensor data including six-axis linear acceleration signals and three-axis angular velocity signals, wherein the six-axis linear acceleration signals are divided into three-axis human body acceleration signals and three-axis gravity acceleration signals.

Further, comprising:

in step (1), recording the corresponding action categories comprises first determining the sampling frequency of the intelligent terminal, acquiring sensor data of different action categories at that sampling frequency, and producing waveform diagrams corresponding to the different action categories from the sensor data, wherein the action categories comprise walking, going upstairs, going downstairs, sitting, standing and lying.

Further, comprising:

in step (3), the convolutional and gated recurrent neural network model comprises, in sequence, an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a first recurrent layer, a second recurrent layer and an output layer. The input data is the preprocessed sensor data. The first, second and third convolutional layers have 16, 32 and 64 convolution kernels respectively; each one-dimensional convolution kernel has a length of 8 and a stride of 1, with zero padding. The first and second pooling layers each have a filter length of 2 and a stride of 2, with zero padding, and use a max pooling strategy. The first and second recurrent layers both use GRUs as recurrent units, with 128 recurrent neurons.

Further, comprising:

in the step (2), the model parameters to be adjusted comprise the number of gated recurrent unit neurons, the loss function and the convolution kernels.

A human body posture recognition system based on a convolutional and gated recurrent neural network, comprising: an acquisition module for acquiring sensor data and recording the corresponding action categories;

a preprocessing module for preprocessing the sensor data and dividing the data into training samples and test samples;

a training module for training the convolutional and gated recurrent neural network model with the training samples, testing the accuracy of the model with the test samples, and adjusting the model parameters as required;

a result recognition module for porting the trained convolutional and gated recurrent neural network model to the mobile intelligent terminal, preprocessing sensor data acquired in real time on the mobile intelligent terminal, and inputting the preprocessed data into the trained model to obtain a human body posture recognition result.

Further, comprising:

in the preprocessing module, the sensor data comprises nine-axis sensor data including six-axis linear acceleration signals and three-axis angular velocity signals, wherein the six-axis linear acceleration signals are divided into three-axis human body acceleration signals and three-axis gravity acceleration signals.

Further, comprising:

in the preprocessing module, recording the corresponding action categories comprises first determining the sampling frequency of the intelligent terminal, acquiring sensor data of different action categories at that sampling frequency, and producing waveform diagrams corresponding to the different action categories from the sensor data, wherein the action categories comprise walking, going upstairs, going downstairs, sitting, standing and lying.

Further, comprising:

in the training module, the convolutional and gated recurrent neural network model comprises, in sequence, an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a first recurrent layer, a second recurrent layer and an output layer. The input data is the preprocessed sensor data. The first, second and third convolutional layers have 16, 32 and 64 convolution kernels respectively; each one-dimensional convolution kernel has a length of 8 and a stride of 1, with zero padding. The first and second pooling layers each have a filter length of 2 and a stride of 2, with zero padding, and use a max pooling strategy. The first and second recurrent layers both use GRUs as recurrent units, with 128 recurrent neurons.

Further, comprising:

in the training module, the model parameters to be adjusted comprise the number of gated recurrent unit neurons, the loss function and the convolution kernels.

Beneficial effects: compared with the prior art, the invention has the following notable advantages: 1. by using a convolutional and gated recurrent neural network recognition method, the recognition accuracy is high and many action categories can be recognized; 2. the number of recognizable actions is extensible, and the extension is simple and easy for developers to carry out; 3. compared with video or image recognition methods, the method can effectively protect user privacy; 4. the method can be applied to commonly used Android smartphones and smart watches, and has good portability.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic diagram of the present invention;

FIGS. 3a, 3b, 3c, 3d, 3e and 3f are schematic partial sensor data waveforms corresponding to the six actions of walking, going upstairs, going downstairs, sitting, standing and lying respectively;

FIG. 4a is a schematic of the structure of LSTM and FIG. 4b is a schematic of the structure of GRU;

FIG. 5 is a graph of cross entropy (cross_entropy) as a function of the number of training steps;

FIG. 6 is a confusion matrix diagram of a trained model.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

Deep learning has good prospects in pattern recognition. It originated from the study of artificial neural networks. Among deep learning models, the Convolutional Neural Network (CNN) is a Deep Neural Network (DNN) that can act as a feature extractor: by stacking multiple convolutional layers, a CNN progressively abstracts the features of the input data. Compared with traditional methods, a CNN is more efficient for building pattern recognition classifiers, is easy to extend, and can recognize more action categories.

The Long Short-Term Memory (LSTM) unit is a more complex recurrent unit for Recurrent Neural Networks (RNNs), originally proposed by Hochreiter et al. to solve the problem of RNNs performing poorly on long sequence data. The LSTM differs from an ordinary recurrent neural network in that it can learn and preserve long-term information through dedicated gates. The three gates of the LSTM are the forget gate, the input gate and the output gate. The Gated Recurrent Unit (GRU) is a further modified variant of the LSTM, originally proposed by Cho et al. Like the LSTM, the GRU has gates, but it combines the forget gate and the input gate into a single update gate and makes some other changes, such as merging the cell state and the hidden state. The GRU has been reported to outperform the LSTM on video, speech and text data sets in terms of convergence speed, parameter updates and generalization.

Research on GRUs currently focuses mainly on fields such as speech modeling and document modeling. GRUs have not been used in the field of human motion recognition based on sensor data. Therefore, for the problem of human motion recognition based on sensor data, the invention provides a new DeepConvGRU network model: the network combines a CNN and GRUs, can automatically extract features and model temporal dependencies, and converges faster.

A multi-class human body posture identification method based on convolution and gated cyclic neural network comprises the following steps:

step 1, acquiring data of an accelerometer and a gyroscope of mobile intelligent terminal equipment under the condition of supervision and recording by a third party, attaching an action category label in advance, and using the action category label as a sample when a human body posture recognition model is trained;

the data is nine-axis sensor data comprising six-axis linear acceleration signals and three-axis angular velocity signals, wherein the six-axis linear acceleration signals are divided into three-axis human body acceleration signals and three-axis gravity acceleration signals.

Step 2, preprocessing the accelerometer and gyroscope data, including filtering and normalizing the data and arranging it into the input format of the convolutional and gated recurrent neural network, and dividing the data into two sets: training samples and test samples;

all collected data are first labeled and then divided into training samples and test samples in a certain proportion. In the invention, m% of the data is used as training samples and n% as test samples, where m% + n% = 100%, 70% ≤ m% ≤ 90% and 10% ≤ n% ≤ 30%.

Step 3, training the convolutional and gated recurrent neural network with the training samples, testing its accuracy with the test samples, and adjusting it as required, specifically comprising:

3.1, building a multilayer convolutional and gated recurrent neural network model;

3.2, feeding in the training samples to adjust the parameters of the model so as to obtain a model with high accuracy, wherein the parameter adjustment comprises adjusting the number of gated recurrent unit neurons, the loss function and the convolution kernels.

Step 4, porting the trained convolutional and gated recurrent neural network model (the human body posture recognition model) to a mobile intelligent terminal to realize real-time posture recognition on the terminal;

Step 5, acquiring accelerometer and gyroscope data with the mobile intelligent terminal, preprocessing the data, and inputting the preprocessed data into the trained convolutional and gated recurrent neural network model to obtain a human body posture recognition result.

The human body posture recognition model is obtained by training the convolutional and gated recurrent neural network structure on the preset training set, and can recognize six action postures: walking, going upstairs, going downstairs, sitting, standing and lying.

FIG. 1 is a flow chart of the processing. After the accelerometer and gyroscope time series of human movement are acquired from the intelligent mobile terminal, they are integrated and preprocessed and then input to the initial convolutional and gated recurrent neural network for model training. The trained model that meets the design requirements is exported to the mobile terminal, so that human actions can be recognized offline on the mobile intelligent terminal.

FIG. 2 is a structural diagram of the convolutional and gated recurrent neural network, which mainly includes 3 convolutional layers, 2 pooling layers, 2 recurrent layers and 1 output layer. The inputs are the preprocessed accelerometer and gyroscope data.

The number of convolutional layers and the number of neurons are chosen so that the matrix output by the convolutional layers is well suited as input to the recurrent layers. The recurrent layers of the invention use gated recurrent units (GRUs) instead of LSTM units. Because the GRU has a simpler structure than the LSTM, it performs better in terms of the number of training steps required and convergence time. In addition, GRUs require fewer computing resources than LSTMs, which makes later porting to the mobile terminal more convenient.

The sampling frequency is set within 25-50 Hz; in this embodiment, the sampling frequency of the intelligent acquisition terminal is set to 50 Hz. Partial motion sensor data waveforms acquired at this frequency are shown in FIGS. 3a, 3b, 3c, 3d, 3e and 3f. One motion sample is defined every 2.56 seconds, i.e. every 128 consecutive sets of readings form one sample (128 / 50 Hz = 2.56 s). Of course, the sampling frequency can be set to another appropriate value according to actual requirements.
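The segmentation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment_windows`, the use of non-overlapping windows (`step=128`) and the placeholder recording of zeros are all assumptions made for the example.

```python
import numpy as np

def segment_windows(signal, window_len=128, step=128):
    """Cut a (num_readings, num_channels) array into fixed-length windows.

    step == window_len gives non-overlapping windows (an assumption here).
    """
    signal = np.asarray(signal)
    num_windows = (signal.shape[0] - window_len) // step + 1
    if num_windows <= 0:
        return np.empty((0, window_len, signal.shape[1]))
    return np.stack(
        [signal[i * step : i * step + window_len] for i in range(num_windows)]
    )

sampling_hz = 50          # sampling frequency used in the embodiment
window_len = 128          # readings per motion sample
window_seconds = window_len / sampling_hz  # duration covered by one sample

# Placeholder recording: 10 s of nine-axis readings filled with zeros.
recording = np.zeros((10 * sampling_hz, 9))
windows = segment_windows(recording, window_len)
print(window_seconds, windows.shape)
```

Readings left over at the end of a recording that do not fill a whole window are dropped in this sketch.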

To train the convolutional and gated recurrent neural network, the invention divides the collected samples into two sets: training samples and test samples. The training samples are used as input to the network for model training, and the test samples are used to evaluate the recognition accuracy. In this embodiment, preferably 70% of the data set is used as the training set and 30% as the test set.
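A 70/30 division of this kind might look like the sketch below. The function name `split_train_test`, the random shuffling and the fixed seed are assumptions for illustration; the text only specifies the proportions, with the training share between 70% and 90%.

```python
import numpy as np

def split_train_test(samples, labels, train_fraction=0.7, seed=0):
    """Shuffle labeled samples and split them into training and test sets."""
    if not 0.7 <= train_fraction <= 0.9:
        raise ValueError("train_fraction is expected to lie in [0.7, 0.9]")
    samples = np.asarray(samples)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)          # shuffling is an assumption
    order = rng.permutation(len(samples))
    n_train = int(round(train_fraction * len(samples)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return samples[train_idx], labels[train_idx], samples[test_idx], labels[test_idx]

# Placeholder data: 100 samples of shape (128, 9) with six action labels.
x = np.zeros((100, 128, 9))
y = np.arange(100) % 6
x_train, y_train, x_test, y_test = split_train_test(x, y)
print(x_train.shape, x_test.shape)
```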

As input to the convolutional and gated recurrent neural network, this example arranges the nine-axis sensor data, i.e., the accelerometer and gyroscope data, into the size (128, 9), representing length and depth respectively, to suit the training of the network. Each input sample is thus a 128×9 matrix of nine-axis sensor data comprising six-axis linear acceleration signals and three-axis angular velocity signals, wherein the six-axis linear acceleration signals are divided into three-axis human body acceleration signals and three-axis gravity acceleration signals. Of course, other appropriate sizes can be set according to actual requirements, and no limitation is imposed here.
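The text does not say which normalization is applied to the (128, 9) samples. The sketch below assumes per-channel standardization (zero mean, unit variance) with statistics taken from the training windows; the function names and the random placeholder data are hypothetical.

```python
import numpy as np

def fit_channel_stats(train_windows):
    """Per-channel mean and standard deviation over all training windows."""
    mean = train_windows.mean(axis=(0, 1))
    std = train_windows.std(axis=(0, 1))
    return mean, np.where(std == 0, 1.0, std)  # avoid division by zero

def normalize(windows, mean, std):
    """Standardize each of the nine channels; shape stays (N, 128, 9)."""
    return (windows - mean) / std

# Placeholder training windows drawn at random, for illustration only.
rng = np.random.default_rng(0)
train_windows = rng.normal(loc=5.0, scale=3.0, size=(50, 128, 9))
mean, std = fit_channel_stats(train_windows)
normalized = normalize(train_windows, mean, std)
print(normalized.shape)
```

The same `mean` and `std` would then be reused for the test samples and for data acquired on the terminal, so that all inputs share one scale.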

In a convolutional layer, multiple convolution kernels perform convolution operations on the input information, namely the preprocessed accelerometer and gyroscope data, to generate multiple feature maps. A feature map is a feature matrix extracted from the original data by the convolutional layer, not an actual picture, so compared with video or image recognition methods, the method can effectively protect user privacy.

Its mathematical model can be described as:

$$a_j^{l+1} = f\left(\sum_{i=1}^{M^l} K_{ij}^{l} * a_i^{l}\right) \tag{1}$$

where $a_j^{l}$ is the j-th feature map of the l-th layer, $f$ is a non-linear activation function, $M^l$ is the number of feature maps in the l-th layer, and $K_{ij}^{l}$ is the convolution kernel that maps the i-th feature map of layer l to the j-th feature map of layer (l+1) through the convolution operation $*$. The non-linear activation is usually the Rectified Linear Unit (ReLU), which has the advantages of reducing the interdependence among parameters and mitigating overfitting. The ReLU is formulated as follows:

$$f(x) = \max(0, x) \tag{2}$$

where $x$ represents the output value of the convolution operation and $f(x)$ is the activation value of $x$.
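As an illustration of a one-dimensional convolutional layer followed by ReLU, the sketch below sums the kernel responses over all input feature maps and applies the activation. It is written under assumptions: the names `conv1d_same` and `relu` are hypothetical, there is no bias term, the random weights are placeholders, and the sliding product is computed without flipping the kernel, as deep learning libraries commonly do.

```python
import numpy as np

def relu(x):
    """ReLU activation: negative values become zero."""
    return np.maximum(x, 0.0)

def conv1d_same(feature_maps, kernels):
    """Zero-padded 1-D convolution with stride 1, followed by ReLU.

    feature_maps: (length, in_maps); kernels: (kernel_len, in_maps, out_maps).
    The output keeps the input length: (length, out_maps).
    """
    length, _ = feature_maps.shape
    kernel_len, _, out_maps = kernels.shape
    pad_total = kernel_len - 1
    pad_left = pad_total // 2
    padded = np.pad(feature_maps, ((pad_left, pad_total - pad_left), (0, 0)))
    out = np.zeros((length, out_maps))
    for t in range(length):
        window = padded[t : t + kernel_len]  # (kernel_len, in_maps)
        # Sum over kernel positions and over all input feature maps.
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return relu(out)

rng = np.random.default_rng(0)
sample = rng.normal(size=(128, 9))               # one placeholder input sample
kernels = rng.normal(scale=0.1, size=(8, 9, 16)) # 16 kernels of length 8
activations = conv1d_same(sample, kernels)
print(activations.shape)
```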

A convolutional neural network differs from an ordinary neural network in that it includes a feature extractor composed of convolutional layers and sub-sampling layers. In a convolutional layer, each neuron is connected to only some of its neighboring neurons. A convolutional layer of a CNN usually contains several feature maps, each composed of neurons arranged in a rectangle; the neurons of the same feature map share weights, and the shared weights are the convolution kernel. The convolution kernel is generally initialized as a matrix of small random values and learns reasonable weights during network training. The direct benefit of sharing weights (convolution kernels) is fewer connections between the layers of the network and a lower risk of overfitting.

The convolution part of the invention only requires the length and the number of the convolution kernels to be set. Both are empirical values with no fixed selection method. In the example of the invention, the convolution kernel length is 8 and the numbers of convolution kernels in the 3 convolutional layers are 16, 32 and 64 respectively; these values are for reference only.

A pooling layer is placed between successive convolutional layers to compress the amount of data and parameters and reduce overfitting. The common pooling modes are max pooling and average pooling: average pooling takes the mean of all feature points in the window, whereas max pooling takes the largest feature point in the window. To extract the most salient features in the feature map, the network structure adopts a max pooling strategy.
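Max pooling with a window of 2 and a stride of 2 can be sketched as below. The function name `max_pool1d` is hypothetical. For the even lengths that occur in this network (128 and 64) no padding is needed; for odd lengths this sketch pads with negative infinity so that padding never wins the maximum, which is an assumption of the example.

```python
import numpy as np

def max_pool1d(feature_maps, pool_len=2, stride=2):
    """Max pooling along the time axis of a (length, maps) array."""
    feature_maps = np.asarray(feature_maps, dtype=float)
    length, _ = feature_maps.shape
    out_len = -(-length // stride)  # ceiling division
    pad = max((out_len - 1) * stride + pool_len - length, 0)
    padded = np.pad(feature_maps, ((0, pad), (0, 0)), constant_values=-np.inf)
    return np.stack(
        [padded[i * stride : i * stride + pool_len].max(axis=0) for i in range(out_len)]
    )

x = np.array([[1, 5], [3, 2], [0, 4], [7, 6]])  # 4 time steps, 2 feature maps
pooled = max_pool1d(x)
print(pooled)  # each output row is the maximum over two consecutive time steps
```

Pooling halves the sequence length, so a (128, 16) output of the first convolutional layer becomes (64, 16).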

The essential feature of a recurrent neural network is that its processing units have both internal feedback connections and feedforward connections. From a systems point of view, it is a feedback dynamic system that exhibits dynamic process characteristics during computation, and it has stronger dynamic behavior and computing capability than a feedforward neural network. Recurrent neural networks have become one of the important research subjects of neural network experts worldwide. The GRU differs from a conventional RNN mainly in that it adds to the algorithm a "processor" that determines whether information is useful; the structure that plays this role is called a cell. Two gates, called the update gate and the reset gate, are placed in a cell.

Equations (3)-(6) describe how the GRU updates the cell state in one time step t. Assume that the hidden state (i.e., output) of the GRU at time t is $h_t$. It can be expressed in terms of the hidden state at the previous moment, $h_{t-1}$, and the candidate hidden state $\tilde{h}_t$:

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{3}$$

where the update gate $z_t$ determines the degree to which the hidden state is updated. The update gate is calculated by:

$$z_t = \sigma(W_z x_t + U_z h_{t-1}) \tag{4}$$

This linear sum of the old hidden state and the candidate hidden state is computed in a manner very similar to the LSTM, but the GRU has no mechanism to control how much of its cell state is exposed, i.e. it outputs the complete cell state at each step.

The candidate hidden state is calculated as follows:

$$\tilde{h}_t = \tanh\left(W x_t + U (r_t \odot h_{t-1})\right) \tag{5}$$

where $r_t$ is the set of reset gates and $\odot$ denotes the element-wise product. When the reset gate is closed (i.e. when $r_t$ is close to 0), the cell forgets the previous state.

The reset gate $r_t$ is calculated in a manner similar to the update gate:

$$r_t = \sigma(W_r x_t + U_r h_{t-1}) \tag{6}$$
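One GRU time step can be sketched in a few lines. The gate computations follow equations (4) and (6) as given; the hidden-state update and the candidate state use the commonly published GRU form, which is an assumption about equations (3) and (5). The function name `gru_step`, the parameter dictionary and the random placeholder weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """Advance the GRU hidden state by one time step (no bias terms)."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)         # update gate, eq. (4)
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)         # reset gate, eq. (6)
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r_t * h_prev))  # candidate state, eq. (5)
    return (1.0 - z_t) * h_prev + z_t * h_cand                # new hidden state, eq. (3)

input_dim, hidden_dim = 64, 128  # 64 feature maps in, 128 GRU neurons
rng = np.random.default_rng(0)
params = {
    name: rng.normal(scale=0.1, size=(hidden_dim, input_dim if name.startswith("W") else hidden_dim))
    for name in ("W_z", "U_z", "W_r", "U_r", "W", "U")
}

# Run the cell over a placeholder sequence of 32 time steps.
sequence = rng.normal(size=(32, input_dim))
h = np.zeros(hidden_dim)
for x_t in sequence:
    h = gru_step(x_t, h, params)
print(h.shape)
```

Because the new state is a gated mix of the previous state and a tanh output, every component of `h` stays within [-1, 1] when the initial state is zero.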

FIGS. 4a and 4b illustrate the structures of the LSTM and the GRU respectively, from which the structural differences between the two can be clearly seen.

The recurrent part of the invention uses 2 GRU recurrent layers, and only the number of GRU neurons needs to be set. The number of GRU neurons is an empirical value with no fixed selection method; in the example of the invention it is 128, and this value is for reference only.

The final experimental parameters of the model are as follows: the numbers of convolution kernels in the 3 convolutional layers are 16, 32 and 64 respectively, each one-dimensional convolution kernel has a length of 8 and a stride of 1, and zero padding is used; the 2 pooling layers are placed between the 3 convolutional layers, each pooling filter has a length of 2 and a stride of 2, zero padding is used, and a max pooling strategy is adopted; the 2 recurrent layers both use GRUs as recurrent units, with 128 recurrent neurons; the model is trained with the Adam optimization method, with the learning rate set to 0.001.
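With these parameters, the tensor shape at each layer can be traced by simple arithmetic, as in the sketch below. It rests on assumptions: zero padding is taken to mean that a stride-1 convolution preserves the sequence length, the second GRU layer is taken to pass only its last hidden state to the output layer, and the output size of 6 comes from the six action categories. The function name `trace_shapes` is hypothetical.

```python
import math

def trace_shapes(length=128, channels=9, conv_filters=(16, 32, 64),
                 gru_units=128, num_classes=6):
    """Return (layer name, output shape) pairs for the described network."""
    shapes = [("input", (length, channels))]
    for index, filters in enumerate(conv_filters, start=1):
        # Stride 1 with zero padding keeps the sequence length unchanged.
        shapes.append((f"conv{index}", (length, filters)))
        if index < len(conv_filters):
            # Pooling with window 2 and stride 2 halves the length.
            length = math.ceil(length / 2)
            shapes.append((f"pool{index}", (length, filters)))
    shapes.append(("gru1", (length, gru_units)))  # full sequence to the next GRU
    shapes.append(("gru2", (gru_units,)))         # last hidden state only (assumed)
    shapes.append(("output", (num_classes,)))
    return shapes

for name, shape in trace_shapes():
    print(f"{name:7s} {shape}")
```

Under these assumptions the recurrent layers see a sequence of 32 time steps with 64 features each, a quarter of the original 128 time steps.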

If the amount of training data is not large enough, the data needs to be reused. At each training step, 1000 samples are input into the neural network and the training loss is measured; the recognition accuracy is measured every 50 steps. To compare the convergence speed of using GRUs versus LSTMs in the recurrent layers, the convergence curves of the two models, plotted as training loss against the number of training steps, are drawn together, as shown in FIG. 5. DeepConvLSTM is the neural network model that uses LSTMs in the recurrent layers, and DeepConvGRU is the model that uses GRUs; apart from the recurrent units, the numbers of layers and neurons are the same.
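The batching schedule described here, 1000 samples per step with reuse of the data and an accuracy check every 50 steps, could be organized as in the sketch below. The generator name `batch_indices`, the reshuffling on each pass, the total of 200 steps and the placeholder data size of 7000 are assumptions; the actual loss and accuracy computations are left as comments.

```python
import numpy as np

def batch_indices(num_samples, batch_size=1000, num_steps=200, seed=0):
    """Yield (step, sample indices); reshuffle and reuse data when it runs out."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)
    cursor = 0
    for step in range(1, num_steps + 1):
        if cursor + batch_size > num_samples:
            order = rng.permutation(num_samples)  # start a new pass over the data
            cursor = 0
        yield step, order[cursor : cursor + batch_size]
        cursor += batch_size

eval_steps = []
for step, idx in batch_indices(num_samples=7000, num_steps=200):
    # The training loss for the batch selected by idx would be measured here.
    if step % 50 == 0:
        eval_steps.append(step)  # recognition accuracy would be measured here
print(eval_steps)
```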

As shown in FIG. 6, when the trained convolutional and gated recurrent neural network meets the design requirements, the model can be exported to the mobile intelligent terminal for use. If the trained network does not meet the design requirements, the number of neurons in each hidden layer needs to be modified; the appropriate number must be found through repeated testing. If modifying the number of neurons in each hidden layer has little effect on the recognition accuracy, it is recommended to add hidden layers or increase the number of training samples.
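FIG. 6 is described as a confusion matrix of the trained model. The sketch below shows one way such a matrix and the overall accuracy could be computed from test predictions. The label order in `ACTIONS`, the function name and the example predictions are made up for illustration and are not results from the invention.

```python
import numpy as np

ACTIONS = ["walking", "going upstairs", "going downstairs", "sitting", "standing", "lying"]

def confusion_matrix(true_labels, predicted_labels, num_classes=len(ACTIONS)):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix

# Illustrative labels only: two of the eight samples are misclassified.
true = [0, 0, 1, 2, 3, 3, 4, 5]
pred = [0, 1, 1, 2, 3, 4, 4, 5]
cm = confusion_matrix(true, pred)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(accuracy)
```

Off-diagonal entries show which actions are confused with each other, which can guide whether to adjust the model or collect more samples for those categories.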

On the basis of the above recognition method, the application also provides a human body posture recognition system based on a convolutional and gated recurrent neural network, comprising: an acquisition module for acquiring sensor data and recording the corresponding action categories;

Further, comprising:

It should be noted that the human body gesture recognition device in the embodiment of the present invention may be specifically integrated in an intelligent mobile terminal, and the intelligent terminal may be specifically a terminal such as a smart phone or a smart watch, which is not limited herein.

Thus, the human body posture recognition device in the embodiment of the invention acquires the sensor data of the intelligent terminal, preprocesses the acquired sensor data, and inputs the preprocessed data into the trained human body posture recognition model to obtain the human body posture recognition result. Because the human body posture recognition model is obtained by training the convolutional and gated recurrent neural network on the preset training set, the human body posture can be recognized by inputting the preprocessed sensor data into the trained model, which realizes human body posture recognition based on sensor data by non-visual means.

Claims

1. A human body posture recognition method based on a convolutional and gated recurrent neural network, characterized by comprising the following steps:

(1) collecting sensor data and recording corresponding action types;

2. The method for recognizing human body posture based on convolution and gated cyclic neural network as claimed in claim 1, wherein in the step (1), the sensor data comprises nine-axis sensor data including six-axis linear acceleration signal and three-axis angular velocity signal, wherein the six-axis linear acceleration signal is divided into three-axis human body acceleration signal and three-axis gravity acceleration signal.

3. The method for recognizing the human body posture based on the convolution and gated cyclic neural network as claimed in claim 2, wherein in the step (1), the recording of the corresponding action category comprises firstly determining a sampling frequency of the intelligent terminal, collecting sensor data of different action categories at the sampling frequency, and making a waveform diagram according with the different action categories according to the sensor data, wherein the action categories comprise walking, going upstairs, going downstairs, sitting, standing and lying.

4. The human body posture recognition method based on a convolutional and gated recurrent neural network according to claim 1, characterized in that in step (3), the convolutional and gated recurrent neural network model comprises, in sequence, an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a first recurrent layer, a second recurrent layer and an output layer; the input data is the preprocessed sensor data; the numbers of convolution kernels of the first, second and third convolutional layers are 16, 32 and 64 respectively; each one-dimensional convolution kernel has a length of 8 and a stride of 1, with zero padding; the first and second pooling layers each have a filter length of 2 and a stride of 2, with zero padding, and use a max pooling strategy; and the first and second recurrent layers both use GRUs as recurrent units, with 128 recurrent neurons.

5. The method of claim 4, wherein in the step (2), the model parameters to be adjusted comprise the number of gated recurrent unit neurons, the loss function and the convolution kernels.

6. A human body posture recognition system based on a convolution and gated recurrent neural network is characterized by comprising: the acquisition module is used for acquiring sensor data and recording corresponding action types;

7. The system of claim 6, wherein in the preprocessing module, the sensor data comprises nine-axis sensor data including six-axis linear acceleration signals and three-axis angular velocity signals, wherein the six-axis linear acceleration signals are divided into three-axis human acceleration signals and three-axis gravitational acceleration signals.

8. The system of claim 7, wherein in the preprocessing module, recording the corresponding action categories comprises first determining the sampling frequency of the intelligent terminal, collecting sensor data of different action categories at the sampling frequency, and producing waveform diagrams corresponding to the different action categories from the sensor data, wherein the action categories comprise walking, going upstairs, going downstairs, sitting, standing and lying.

9. The system of claim 6, characterized in that in the training module, the convolutional and gated recurrent neural network model comprises, in sequence, an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a first recurrent layer, a second recurrent layer and an output layer; the input data is the preprocessed sensor data; the numbers of convolution kernels of the first, second and third convolutional layers are 16, 32 and 64 respectively; each one-dimensional convolution kernel has a length of 8 and a stride of 1, with zero padding; the first and second pooling layers each have a filter length of 2 and a stride of 2, with zero padding, and use a max pooling strategy; and the first and second recurrent layers both use GRUs as recurrent units, with 128 recurrent neurons.

10. The system according to claim 9, wherein in the training module, the model parameters to be adjusted comprise the number of gated recurrent unit neurons, the loss function and the convolution kernels.