CN111444820B - Gesture recognition method based on imaging radar

Info

Publication number: CN111444820B
Authority: CN (China)
Prior art keywords: neural network, layer, matrix, neurons, convolution kernel
Legal status: Active
Application number: CN202010215230.5A
Other languages: Chinese (zh)
Other versions: CN111444820A
Inventors: 张雷, 张博, 吴沫君
Current assignee: Tsinghua University
Original assignee: Tsinghua University
Application filed by Tsinghua University; priority to CN202010215230.5A
Publication of CN111444820A; application granted; publication of CN111444820B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a gesture recognition method based on an imaging radar, belonging to the field of human-computer interaction. The method uses an imaging radar as the hardware carrier and combines a self-encoding technique with a recurrent neural network to recognize the user's dynamic gestures with high accuracy. The method can be applied to different imaging radars. Compared with gesture recognition methods based on camera equipment, the module implemented by this method is lighter, and recognition of the gesture is not affected by the light intensity in the environment; because no camera video needs to be recorded, user privacy is not leaked. The method can be applied in many scenarios such as smart home appliance control and smart car cockpit control.

Description

Gesture recognition method based on imaging radar
Technical Field
The invention relates to a gesture recognition method based on an imaging radar, and belongs to the technical field of man-machine interaction.
Background
In recent years, gesture recognition has been a research hotspot in human-computer interaction. Traditional gesture recognition methods perform recognition on images collected by a camera. Camera images preserve hand information clearly, but they are large and contain much data that is useless for gesture recognition. Processing camera images in real time not only demands a high hardware computing speed, but the recognition results are also affected by ambient light, so such methods cannot achieve high recognition accuracy in many situations. Moreover, a camera captures images of the user, which easily raises privacy concerns. Some methods implement gesture recognition using ultrasonic radar and machine learning, but because the resolution of ultrasonic radar is particularly low, it is difficult to achieve a high gesture recognition rate.
Improving recognition accuracy is an important goal of gesture recognition. The resolution of an imaging radar is higher than that of ultrasonic radar, its imaging result is not affected by ambient light, and its detection range can penetrate object obstacles. When an imaging radar is used for hand recognition, the signal data volume is small and the requirement on hardware computing speed is low. If an effective method for processing imaging radar data can be found, high-precision gesture recognition can be realized, which has wide application prospects.
Disclosure of Invention
The invention aims to provide a gesture recognition method based on an imaging radar, so as to solve the problems in the prior art that the dynamic gesture recognition rate is low and the recognition result is strongly affected by ambient light, and to improve the accuracy of gesture recognition.
The invention provides a gesture recognition method based on an imaging radar, which comprises the following steps:
(1) collecting radar images of the dynamic gestures to be recognized and forming a matrix A from the collected images, wherein A is an (N*S*T)*(W*H) matrix; the matrix A contains N dynamic gestures, each dynamic gesture has S sample sequences, each sample sequence consists of T radar images, and each radar image contains W*H pixels, where W is the width of the radar image and H is the height of the radar image;
(2) collecting radar images of arbitrary gestures to form a matrix B, wherein B is an M*(W*H) matrix; the matrix B contains M radar images, each radar image contains W*H pixels, and W and H are the same as in step (1), W being the width of the radar image and H the height of the radar image;
(3) constructing a self-encoding-decoding neural network E, specifically comprising the following steps:
(3-1) the first layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W1 is an Lw1*Pw1*Qw1 matrix, where Lw1 is the number of channels of the convolution kernel, Pw1 is the width of the convolution kernel, and Qw1 is the height of the convolution kernel;
(3-2) the second layer of the self-encoding-decoding neural network E is a pooling neural network whose convolution kernel weight W2 is a 2*2 matrix;
(3-3) the third layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W3 is an Lw3*Pw3*Qw3 matrix, where Lw3 is the number of channels of the convolution kernel, Pw3 is the width of the convolution kernel, and Qw3 is the height of the convolution kernel;
(3-4) the fourth layer of the self-encoding-decoding neural network E is a pooling neural network whose convolution kernel W4 is a 2*2 matrix;
(3-5) the fifth layer of the self-encoding-decoding neural network E is a fully connected neural network with F5 neurons, whose neuron weight is W5;
(3-6) the sixth layer of the self-encoding-decoding neural network E is a fully connected neural network with F6 neurons, whose neuron weight is W6;
(3-7) the seventh layer of the self-encoding-decoding neural network E is an upsampling neural network whose convolution kernel W7 is a 2*2 matrix;
(3-8) the eighth layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W8 is an Lw8*Pw8*Qw8 matrix, where Lw8 is the number of channels of the convolution kernel, Pw8 is the width of the convolution kernel, and Qw8 is the height of the convolution kernel;
(3-9) the ninth layer of the self-encoding-decoding neural network E is an upsampling neural network whose convolution kernel W9 is a 2*2 matrix;
(3-10) the tenth layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W10 is an Lw10*Pw10*Qw10 matrix, where Lw10 is the number of channels of the convolution kernel, Pw10 is the width of the convolution kernel, and Qw10 is the height of the convolution kernel;
(3-11) the self-encoding-decoding neural network E is obtained according to steps (3-1) to (3-10);
(4) inputting the matrix B of arbitrary-gesture radar images collected in step (2) into the self-encoding-decoding neural network E of step (3-11), whose output is the self-encoded-decoded radar image E(B);
(5) with the reconstruction error ||B - E(B)||^2 as the loss function, training the self-encoding-decoding neural network E of step (3-11) by a gradient descent method to obtain the trained matrices W1', W2', ..., W10';
(6) constructing a feature extraction neural network C using the matrices W1', W2', ..., W5' trained in step (5), specifically comprising the following steps:
(6-1) the first layer of the feature extraction neural network C is a convolutional neural network with convolution kernel W1';
(6-2) the second layer of the feature extraction neural network C is a pooling neural network with convolution kernel W2';
(6-3) the third layer of the feature extraction neural network C is a convolutional neural network with convolution kernel W3';
(6-4) the fourth layer of the feature extraction neural network C is a pooling neural network with convolution kernel W4';
(6-5) the fifth layer of the feature extraction neural network C is a fully connected neural network with F5 neurons, whose neuron weight is W5';
(6-6) the feature extraction neural network C is obtained according to steps (6-1) to (6-5);
(7) inputting the matrix A of step (1) into the feature extraction neural network C of step (6-6) to obtain a feature matrix CM, wherein CM is an (N*S*T)*F5 matrix containing the features of the N dynamic gestures of step (1); that is, each dynamic gesture has S sample sequences, each sample sequence has T radar images (T being the number of radar images per sequence in step (1)), and F5 features are extracted from each radar image, F5 being the number of neurons in the fifth layer of the feature extraction neural network C;
(8) constructing a recurrent neural network RN, specifically comprising the following steps:
(8-1) the first layer of the recurrent neural network RN is a long short-term memory (LSTM) neural network layer with R neurons, whose neuron weight is Wifco;
(8-2) the second layer of the recurrent neural network RN is a SoftMax classification layer with N+1 neurons, where N is the number of dynamic gesture types in step (1), and its neuron weight is Ws;
(8-3) the recurrent neural network RN is obtained according to steps (8-1) and (8-2);
(9) inputting the feature matrix CM of step (7) into the recurrent neural network RN of step (8), which outputs a predicted dynamic gesture classification result RE;
(10) with maximizing the accuracy of the dynamic gesture classification result RE of step (9) as the training objective, training the weights Wifco and Ws of the recurrent neural network RN of step (8) by a gradient descent method to obtain the trained matrices Wifco' and Ws';
(11) using the weights W1', W2', ..., W5' trained in step (5) and the weights Wifco' and Ws' trained in step (10), constructing a gesture recognition neural network GR, specifically comprising the following steps:
(11-1) the first layer of the gesture recognition neural network GR is a convolutional neural network with convolution kernel W1';
(11-2) the second layer of the gesture recognition neural network GR is a pooling neural network with convolution kernel W2';
(11-3) the third layer of the gesture recognition neural network GR is a convolutional neural network with convolution kernel W3';
(11-4) the fourth layer of the gesture recognition neural network GR is a pooling neural network with convolution kernel W4';
(11-5) the fifth layer of the gesture recognition neural network GR is a fully connected neural network with F5 neurons, i.e. the number of neurons of the feature extraction neural network C in step (6), and its neuron weight is W5';
(11-6) the sixth layer of the gesture recognition neural network GR is a long short-term memory (LSTM) neural network layer with R neurons, i.e. the number of neurons of the recurrent neural network RN in step (8), and its neuron weight is Wifco';
(11-7) the seventh layer of the gesture recognition neural network GR is a SoftMax classification layer with N+1 neurons, whose neuron weight is Ws', i.e. the matrix Ws' trained in step (10);
(11-8) the gesture recognition neural network GR is obtained according to steps (11-1) to (11-7);
(12) acquiring imaging radar images of the target to be recognized in real time, wherein every T consecutive images form a sequence I; the sequence I serves as the real-time input of the gesture recognition neural network GR of step (11), and the output of the gesture recognition neural network GR is the gesture of the target to be recognized, thereby realizing gesture recognition based on the imaging radar.
The imaging radar-based gesture recognition method provided by the invention has the following advantages:
In the gesture recognition method based on an imaging radar, the imaging radar serves as the hardware carrier, and a self-encoding technique is combined with a recurrent neural network to recognize the user's dynamic gestures with high accuracy. The method can be applied to different imaging radars. Compared with gesture recognition methods based on camera equipment, the module implemented by this method is lighter, and recognition of the gesture is not affected by the light intensity in the environment; because no camera video needs to be recorded, user privacy is not leaked. The method can be applied in many scenarios such as smart home appliance control and smart car cockpit control.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
Detailed Description
The flow chart of the imaging radar-based gesture recognition method provided by the invention is shown in FIG. 1; the method comprises the following steps:
(1) collecting radar images of the dynamic gestures to be recognized and forming a matrix A from the collected images, wherein A is an (N*S*T)*(W*H) matrix; the matrix A contains N dynamic gestures, each dynamic gesture has S sample sequences, each sample sequence consists of T radar images, and each radar image contains W*H pixels, where W is the width of the radar image and H is the height of the radar image;
(2) collecting radar images of arbitrary gestures to form a matrix B, wherein B is an M*(W*H) matrix; the matrix B contains M radar images, each radar image contains W*H pixels, and W and H are the same as in step (1), W being the width of the radar image and H the height of the radar image;
(3) constructing a self-encoding-decoding neural network E, specifically comprising the following steps:
(3-1) the first layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W1 is an Lw1*Pw1*Qw1 matrix, where Lw1 is the number of channels of the convolution kernel, Pw1 is the width of the convolution kernel, and Qw1 is the height of the convolution kernel;
(3-2) the second layer of the self-encoding-decoding neural network E is a pooling neural network whose convolution kernel weight W2 is a 2*2 matrix;
(3-3) the third layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W3 is an Lw3*Pw3*Qw3 matrix, where Lw3 is the number of channels of the convolution kernel, Pw3 is the width of the convolution kernel, and Qw3 is the height of the convolution kernel;
(3-4) the fourth layer of the self-encoding-decoding neural network E is a pooling neural network whose convolution kernel W4 is a 2*2 matrix;
(3-5) the fifth layer of the self-encoding-decoding neural network E is a fully connected neural network with F5 neurons, whose neuron weight is W5;
(3-6) the sixth layer of the self-encoding-decoding neural network E is a fully connected neural network with F6 neurons, whose neuron weight is W6;
(3-7) the seventh layer of the self-encoding-decoding neural network E is an upsampling neural network whose convolution kernel W7 is a 2*2 matrix;
(3-8) the eighth layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W8 is an Lw8*Pw8*Qw8 matrix, where Lw8 is the number of channels of the convolution kernel, Pw8 is the width of the convolution kernel, and Qw8 is the height of the convolution kernel;
(3-9) the ninth layer of the self-encoding-decoding neural network E is an upsampling neural network whose convolution kernel W9 is a 2*2 matrix;
(3-10) the tenth layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W10 is an Lw10*Pw10*Qw10 matrix, where Lw10 is the number of channels of the convolution kernel, Pw10 is the width of the convolution kernel, and Qw10 is the height of the convolution kernel;
(3-11) the self-encoding-decoding neural network E is obtained according to steps (3-1) to (3-10);
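By way of a non-limiting illustration, the encoder-decoder structure of steps (3-1) to (3-10) could be realized, for example, in Python with the PyTorch library; the framework, kernel sizes, channel counts, activation functions and image size used below are assumptions for illustration only and are not prescribed by the method.

```python
import torch
import torch.nn as nn

class AutoEncoderE(nn.Module):
    """Illustrative sketch of the self-encoding-decoding network E (layers (3-1)-(3-10))."""
    def __init__(self, W=32, H=32, Lw1=8, Lw3=16, F5=64):
        super().__init__()
        flat = (W // 4) * (H // 4) * Lw3                                # size after two 2*2 poolings
        # Encoder: conv (W1) -> pool (W2) -> conv (W3) -> pool (W4) -> fully connected (W5)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, Lw1, kernel_size=3, padding=1), nn.ReLU(),    # layer 1: kernel W1
            nn.MaxPool2d(2),                                            # layer 2: kernel W2 (2*2)
            nn.Conv2d(Lw1, Lw3, kernel_size=3, padding=1), nn.ReLU(),  # layer 3: kernel W3
            nn.MaxPool2d(2),                                            # layer 4: kernel W4 (2*2)
            nn.Flatten(),
            nn.Linear(flat, F5), nn.ReLU(),                             # layer 5: F5 neurons, weight W5
        )
        # Decoder: fully connected (W6) -> upsample (W7) -> conv (W8) -> upsample (W9) -> conv (W10)
        self.decoder_fc = nn.Linear(F5, flat)                           # layer 6: F6 neurons, weight W6
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2),                                # layer 7: kernel W7 (2*2)
            nn.Conv2d(Lw3, Lw1, kernel_size=3, padding=1), nn.ReLU(),  # layer 8: kernel W8
            nn.Upsample(scale_factor=2),                                # layer 9: kernel W9 (2*2)
            nn.Conv2d(Lw1, 1, kernel_size=3, padding=1),                # layer 10: kernel W10
        )
        self.Lw3, self.W, self.H = Lw3, W, H

    def forward(self, x):                      # x: (batch, 1, H, W) radar images
        z = self.encoder(x)                    # F5-dimensional code per image
        y = self.decoder_fc(z).view(-1, self.Lw3, self.H // 4, self.W // 4)
        return self.decoder(y)                 # reconstructed radar image E(B)
```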
(4) inputting the matrix B of arbitrary-gesture radar images collected in step (2) into the self-encoding-decoding neural network E of step (3-11), whose output is the self-encoded-decoded radar image E(B);
(5) with the reconstruction error ||B - E(B)||^2 as the loss function, training the self-encoding-decoding neural network E of step (3-11) by a gradient descent method to obtain the trained matrices W1', W2', ..., W10';
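A minimal sketch of the training of step (5) is given below, assuming PyTorch and a plain SGD optimizer (the optimizer, learning rate and epoch count are illustrative assumptions); the loss is the reconstruction error ||B - E(B)||^2 between the input images and the decoded output.

```python
import torch

def train_autoencoder(model, B, epochs=50, lr=1e-3):
    """Train E on the arbitrary-gesture radar images B (tensor of shape (M, 1, H, W))."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # plain gradient descent
    loss_fn = torch.nn.MSELoss(reduction="sum")              # squared reconstruction error
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(B), B)                          # ||B - E(B)||^2
        loss.backward()
        optimizer.step()
    return model                                             # weights W1', ..., W10' are now trained
```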
(6) constructing a feature extraction neural network C using the matrices W1', W2', ..., W5' trained in step (5), specifically comprising the following steps:
(6-1) the first layer of the feature extraction neural network C is a convolutional neural network with convolution kernel W1';
(6-2) the second layer of the feature extraction neural network C is a pooling neural network with convolution kernel W2';
(6-3) the third layer of the feature extraction neural network C is a convolutional neural network with convolution kernel W3';
(6-4) the fourth layer of the feature extraction neural network C is a pooling neural network with convolution kernel W4';
(6-5) the fifth layer of the feature extraction neural network C is a fully connected neural network with F5 neurons, whose neuron weight is W5';
(6-6) the feature extraction neural network C is obtained according to steps (6-1) to (6-5);
(7) inputting the matrix A of step (1) into the feature extraction neural network C of step (6-6) to obtain a feature matrix CM, wherein CM is an (N*S*T)*F5 matrix containing the features of the N dynamic gestures of step (1); that is, each dynamic gesture has S sample sequences, each sample sequence has T radar images (T being the number of radar images per sequence in step (1)), and F5 features are extracted from each radar image, F5 being the number of neurons in the fifth layer of the feature extraction neural network C;
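Steps (6) and (7) can be illustrated by reusing the trained encoder half of E as the feature extraction network C and passing every radar image of A through it; the sketch below assumes the AutoEncoderE class from the earlier illustration and is not a prescribed implementation.

```python
import torch

def extract_features(trained_E, A):
    """Apply the feature extraction network C (the trained encoder of E) to the matrix A.

    A is a tensor of shape (N*S*T, 1, H, W); the result CM has shape (N*S*T, F5),
    i.e. F5 features per radar image, following the notation of steps (1) and (7).
    """
    C = trained_E.encoder            # layers holding the trained kernels W1'-W5'
    with torch.no_grad():
        CM = C(A)
    return CM
```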
(8) constructing a recurrent neural network RN, specifically comprising the following steps:
(8-1) the first layer of the recurrent neural network RN is a long short-term memory (LSTM) neural network layer with R neurons, whose neuron weight is Wifco;
(8-2) the second layer of the recurrent neural network RN is a SoftMax classification layer with N+1 neurons, where N is the number of dynamic gesture types in step (1), and its neuron weight is Ws;
(8-3) the recurrent neural network RN is obtained according to steps (8-1) and (8-2);
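An illustrative sketch of the recurrent network RN of step (8) is shown below, again assuming PyTorch; the value of R and the use of the last time step of the LSTM output for classification are assumptions not fixed by the method.

```python
import torch
import torch.nn as nn

class RecurrentRN(nn.Module):
    """Sketch of RN: an LSTM layer (weights Wifco) followed by a SoftMax layer (weights Ws)."""
    def __init__(self, F5=64, R=128, N=8):
        super().__init__()
        self.lstm = nn.LSTM(input_size=F5, hidden_size=R, batch_first=True)  # layer 1: R neurons
        self.classifier = nn.Linear(R, N + 1)                                 # layer 2: N+1 neurons

    def forward(self, cm_seq):                      # cm_seq: (batch, T, F5) feature sequences
        out, _ = self.lstm(cm_seq)
        logits = self.classifier(out[:, -1, :])     # classify from the last time step
        return torch.softmax(logits, dim=-1)        # predicted gesture distribution RE
```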
(9) inputting the feature matrix CM of step (7) into the recurrent neural network RN of step (8), which outputs a predicted dynamic gesture classification result RE;
(10) with maximizing the accuracy of the dynamic gesture classification result RE of step (9) as the training objective, training the weights Wifco and Ws of the recurrent neural network RN of step (8) by a gradient descent method to obtain the trained matrices Wifco' and Ws';
(11) using the weights W1', W2', ..., W5' trained in step (5) and the weights Wifco' and Ws' trained in step (10), constructing a gesture recognition neural network GR, specifically comprising the following steps:
(11-1) the first layer of the gesture recognition neural network GR is a convolutional neural network with convolution kernel W1';
(11-2) the second layer of the gesture recognition neural network GR is a pooling neural network with convolution kernel W2';
(11-3) the third layer of the gesture recognition neural network GR is a convolutional neural network with convolution kernel W3';
(11-4) the fourth layer of the gesture recognition neural network GR is a pooling neural network with convolution kernel W4';
(11-5) the fifth layer of the gesture recognition neural network GR is a fully connected neural network with F5 neurons, i.e. the number of neurons of the feature extraction neural network C in step (6), and its neuron weight is W5';
(11-6) the sixth layer of the gesture recognition neural network GR is a long short-term memory (LSTM) neural network layer with R neurons, i.e. the number of neurons of the recurrent neural network RN in step (8), and its neuron weight is Wifco';
(11-7) the seventh layer of the gesture recognition neural network GR is a SoftMax classification layer with N+1 neurons, whose neuron weight is Ws', i.e. the matrix Ws' trained in step (10);
(11-8) the gesture recognition neural network GR is obtained according to steps (11-1) to (11-7);
(12) acquiring imaging radar images of the target to be recognized in real time, wherein every T consecutive images form a sequence I; the sequence I serves as the real-time input of the gesture recognition neural network GR of step (11), and the output of the gesture recognition neural network GR is the gesture of the target to be recognized, thereby realizing gesture recognition based on the imaging radar.
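The real-time recognition of steps (11) and (12) can be sketched as follows, assuming the encoder C and recurrent network RN from the earlier illustrations; the buffering strategy for the latest T radar frames is an illustrative assumption rather than part of the method.

```python
import torch

def recognize_gesture(encoder_C, recurrent_RN, image_buffer, T):
    """Classify the latest T radar frames; image_buffer holds (1, H, W) tensors."""
    if len(image_buffer) < T:
        return None                                  # not enough frames collected yet
    I = torch.stack(image_buffer[-T:])               # sequence I of the last T images
    with torch.no_grad():
        feats = encoder_C(I).unsqueeze(0)            # (1, T, F5) feature sequence
        probs = recurrent_RN(feats)                  # SoftMax output over the N+1 classes
    return int(probs.argmax(dim=-1))                 # index of the recognized gesture
```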

Claims (1)

1. A gesture recognition method based on an imaging radar, characterized by comprising the following steps:
(1) collecting radar images of the dynamic gestures to be recognized and forming a matrix A from the collected images, wherein A is an (N*S*T)*(W*H) matrix; the matrix A contains N dynamic gestures, each dynamic gesture has S sample sequences, each sample sequence consists of T radar images, and each radar image contains W*H pixels, where W is the width of the radar image and H is the height of the radar image;
(2) collecting radar images of arbitrary gestures to form a matrix B, wherein B is an M*(W*H) matrix; the matrix B contains M radar images, each radar image contains W*H pixels, and W and H are the same as in step (1), W being the width of the radar image and H the height of the radar image;
(3) constructing a self-encoding-decoding neural network E, specifically comprising the following steps:
(3-1) the first layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W1 is an Lw1*Pw1*Qw1 matrix, where Lw1 is the number of channels of the convolution kernel, Pw1 is the width of the convolution kernel, and Qw1 is the height of the convolution kernel;
(3-2) the second layer of the self-encoding-decoding neural network E is a pooling neural network whose convolution kernel weight W2 is a 2*2 matrix;
(3-3) the third layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W3 is an Lw3*Pw3*Qw3 matrix, where Lw3 is the number of channels of the convolution kernel, Pw3 is the width of the convolution kernel, and Qw3 is the height of the convolution kernel;
(3-4) the fourth layer of the self-encoding-decoding neural network E is a pooling neural network whose convolution kernel W4 is a 2*2 matrix;
(3-5) the fifth layer of the self-encoding-decoding neural network E is a fully connected neural network with F5 neurons, whose neuron weight is W5;
(3-6) the sixth layer of the self-encoding-decoding neural network E is a fully connected neural network with F6 neurons, whose neuron weight is W6;
(3-7) the seventh layer of the self-encoding-decoding neural network E is an upsampling neural network whose convolution kernel W7 is a 2*2 matrix;
(3-8) the eighth layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W8 is an Lw8*Pw8*Qw8 matrix, where Lw8 is the number of channels of the convolution kernel, Pw8 is the width of the convolution kernel, and Qw8 is the height of the convolution kernel;
(3-9) the ninth layer of the self-encoding-decoding neural network E is an upsampling neural network whose convolution kernel W9 is a 2*2 matrix;
(3-10) the tenth layer of the self-encoding-decoding neural network E is a convolutional neural network whose convolution kernel weight W10 is an Lw10*Pw10*Qw10 matrix, where Lw10 is the number of channels of the convolution kernel, Pw10 is the width of the convolution kernel, and Qw10 is the height of the convolution kernel;
(3-11) the self-encoding-decoding neural network E is obtained according to steps (3-1) to (3-10);
(4) inputting the matrix B of arbitrary-gesture radar images collected in step (2) into the self-encoding-decoding neural network E of step (3-11), whose output is the self-encoded-decoded radar image E(B);
(5) with the reconstruction error ||B - E(B)||^2 as the loss function, training the self-encoding-decoding neural network E of step (3-11) by a gradient descent method to obtain the trained matrices W1', W2', ..., W10';
(6) constructing a feature extraction neural network C using the matrices W1', W2', ..., W5' trained in step (5), specifically comprising the following steps:
(6-1) the first layer of the feature extraction neural network C is a convolutional neural network with convolution kernel W1';
(6-2) the second layer of the feature extraction neural network C is a pooling neural network with convolution kernel W2';
(6-3) the third layer of the feature extraction neural network C is a convolutional neural network with convolution kernel W3';
(6-4) the fourth layer of the feature extraction neural network C is a pooling neural network with convolution kernel W4';
(6-5) the fifth layer of the feature extraction neural network C is a fully connected neural network with F5 neurons, whose neuron weight is W5';
(6-6) the feature extraction neural network C is obtained according to steps (6-1) to (6-5);
(7) inputting the matrix A of step (1) into the feature extraction neural network C of step (6-6) to obtain a feature matrix CM, wherein CM is an (N*S*T)*F5 matrix containing the features of the N dynamic gestures of step (1); that is, each dynamic gesture has S sample sequences, each sample sequence has T radar images (T being the number of radar images per sequence in step (1)), and F5 features are extracted from each radar image, F5 being the number of neurons in the fifth layer of the feature extraction neural network C;
(8) constructing a recurrent neural network RN, specifically comprising the following steps:
(8-1) the first layer of the recurrent neural network RN is a long short-term memory (LSTM) neural network layer with R neurons, whose neuron weight is Wifco;
(8-2) the second layer of the recurrent neural network RN is a SoftMax classification layer with N+1 neurons, where N is the number of dynamic gesture types in step (1), and its neuron weight is Ws;
(8-3) the recurrent neural network RN is obtained according to steps (8-1) and (8-2);
(9) inputting the feature matrix CM of step (7) into the recurrent neural network RN of step (8), which outputs a predicted dynamic gesture classification result RE;
(10) with maximizing the accuracy of the dynamic gesture classification result RE of step (9) as the training objective, training the weights Wifco and Ws of the recurrent neural network RN of step (8) by a gradient descent method to obtain the trained matrices Wifco' and Ws';
(11) using the weights W1', W2', ..., W5' trained in step (5) and the weights Wifco' and Ws' trained in step (10), constructing a gesture recognition neural network GR, specifically comprising the following steps:
(11-1) the first layer of the gesture recognition neural network GR is a convolutional neural network with convolution kernel W1';
(11-2) the second layer of the gesture recognition neural network GR is a pooling neural network with convolution kernel W2';
(11-3) the third layer of the gesture recognition neural network GR is a convolutional neural network with convolution kernel W3';
(11-4) the fourth layer of the gesture recognition neural network GR is a pooling neural network with convolution kernel W4';
(11-5) the fifth layer of the gesture recognition neural network GR is a fully connected neural network with F5 neurons, i.e. the number of neurons of the feature extraction neural network C in step (6), and its neuron weight is W5';
(11-6) the sixth layer of the gesture recognition neural network GR is a long short-term memory (LSTM) neural network layer with R neurons, i.e. the number of neurons of the recurrent neural network RN in step (8), and its neuron weight is Wifco';
(11-7) the seventh layer of the gesture recognition neural network GR is a SoftMax classification layer with N+1 neurons, whose neuron weight is Ws', i.e. the matrix Ws' trained in step (10);
(11-8) the gesture recognition neural network GR is obtained according to steps (11-1) to (11-7);
(12) acquiring imaging radar images of the target to be recognized in real time, wherein every T consecutive images form a sequence I; the sequence I serves as the real-time input of the gesture recognition neural network GR of step (11), and the output of the gesture recognition neural network GR is the gesture of the target to be recognized, thereby realizing gesture recognition based on the imaging radar.
CN202010215230.5A 2020-03-24 2020-03-24 Gesture recognition method based on imaging radar Active CN111444820B (en)

Priority Applications (1)

CN202010215230.5A (CN111444820B): priority date 2020-03-24; filing date 2020-03-24; title: Gesture recognition method based on imaging radar

Applications Claiming Priority (1)

CN202010215230.5A (CN111444820B): priority date 2020-03-24; filing date 2020-03-24; title: Gesture recognition method based on imaging radar

Publications (2)

CN111444820A: published 2020-07-24
CN111444820B: published 2021-06-04

Family

ID=71629507

Family Applications (1)

CN202010215230.5A (CN111444820B): Active; priority date 2020-03-24; filing date 2020-03-24

Country Status (1)

CN: CN111444820B

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3467707B1 (en) * 2017-10-07 2024-03-13 Tata Consultancy Services Limited System and method for deep learning based hand gesture recognition in first person view
EP3727145B1 (en) * 2017-12-22 2024-01-24 ResMed Sensor Technologies Limited Apparatus, system, and method for physiological sensing in vehicles
CN108509910B (en) * 2018-04-02 2021-09-28 重庆邮电大学 Deep learning gesture recognition method based on FMCW radar signals
CN110569823B (en) * 2019-09-18 2023-04-18 西安工业大学 Sign language identification and skeleton generation method based on RNN

Also Published As

Publication number Publication date
CN111444820A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN108257158B (en) Target prediction and tracking method based on recurrent neural network
CN110348288B (en) Gesture recognition method based on 77GHz millimeter wave radar signal
CN108519812B (en) Three-dimensional micro Doppler gesture recognition method based on convolutional neural network
CN111461037B (en) End-to-end gesture recognition method based on FMCW radar
CN111476058B (en) Gesture recognition method based on millimeter wave radar
CN111695457A (en) Human body posture estimation method based on weak supervision mechanism
CN111813222B (en) Terahertz radar-based fine dynamic gesture recognition method
CN111157988A (en) Gesture radar signal processing method based on RDTM and ATM fusion
Tang et al. Human activity recognition based on mixed CNN with radar multi-spectrogram
CN116824629A (en) High-robustness gesture recognition method based on millimeter wave radar
Yu et al. The multi-level classification and regression network for visual tracking via residual channel attention
CN116206359A (en) Human gait recognition method based on millimeter wave radar and dynamic sampling neural network
CN113537120B (en) Complex convolution neural network target identification method based on complex coordinate attention
Li et al. Supervised domain adaptation for few-shot radar-based human activity recognition
CN112801928B (en) Attention mechanism-based millimeter wave radar and visual sensor fusion method
Zhang et al. Riddle: Real-time interacting with hand description via millimeter-wave sensor
CN111444820B (en) Gesture recognition method based on imaging radar
Qin et al. Dense sampling and detail enhancement network: Improved small object detection based on dense sampling and detail enhancement
CN116311353A (en) Intensive pedestrian multi-target tracking method based on feature fusion, computer equipment and storage medium
CN114067359B (en) Pedestrian detection method integrating human body key points and visible part attention characteristics
CN115713672A (en) Target detection method based on two-way parallel attention mechanism
CN115294656A (en) FMCW radar-based hand key point tracking method
Luo et al. EdgeActNet: Edge Intelligence-enabled Human Activity Recognition using Radar Point Cloud
CN116563313B (en) Remote sensing image soybean planting region segmentation method based on gating and attention fusion
Yan et al. Object Detection Method Based On Improved SSD Algorithm For Smart Grid

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant