CN109508670B - Static gesture recognition method based on infrared camera - Google Patents
Static gesture recognition method based on infrared camera
- Publication number
- CN109508670B, CN201811341659.8A, CN201811341659A
- Authority
- CN
- China
- Prior art keywords
- training
- neural network
- image
- convolutional neural
- infrared
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
- G06V40/113—Recognition of static hand signs
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
Abstract
The invention provides a static gesture recognition method based on an infrared camera, belonging to the field of image processing and gesture recognition for infrared (IR) images. The method mainly comprises the following steps: preprocessing an infrared image; constructing a convolutional neural network and extracting features from the infrared gesture image; and outputting the final gesture classification according to the classification weight proportions. Unlike a traditional visible-light camera, an infrared camera does not depend on ambient light, so the static gesture recognition method provided by the invention can effectively and accurately extract gesture features under no-light and weak-light conditions, under varying illumination across scenes, and under background noise interference; it classifies and recognizes gestures accurately, outputs the correct expected result, and has good algorithm robustness.
Description
Technical Field
The invention relates to a static gesture recognition technology based on an infrared camera, and belongs to the technical field of image processing and gesture recognition.
Background
As human-computer interaction becomes more frequent and diverse, users' demands for simple and convenient interaction keep rising, and the limitations of the traditional mouse-and-keyboard interaction mode have become increasingly apparent. Non-contact gesture recognition provides a more natural and direct human-computer interface, with simple operation and high flexibility. In recent years, with the development of sensors, accuracy and portability have greatly improved, and gesture recognition has reached the stage of practical usability.
Gesture recognition based on an ordinary camera depends on ambient light: once light is insufficient or absent, recognition accuracy drops or recognition fails entirely. Likewise, against a complex or near-skin-color background, an ordinary camera extracts gesture features poorly, which affects the final recognition result. By comparison, gesture recognition based on an infrared camera has a much wider range of application scenarios.
Compared with traditional machine learning algorithms, a model based on a convolutional neural network dispenses with manual feature engineering and solves the problem in an end-to-end learning fashion. Meanwhile, compared with a traditional fully connected neural network, weight sharing reduces the number of parameters and embodies the idea of local perception of the image.
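The parameter saving from weight sharing can be made concrete with a quick back-of-the-envelope comparison; the layer sizes below are illustrative assumptions, not figures taken from the invention:

```python
# Rough parameter count: one fully connected layer over a 64x64 image
# versus one 3x3 convolutional layer with shared weights.
h, w, units = 64, 64, 256
dense_params = h * w * units + units           # every pixel-to-unit weight, plus biases

k, in_ch, out_ch = 3, 1, 32
conv_params = k * k * in_ch * out_ch + out_ch  # one shared 3x3 kernel per output channel, plus biases

print(dense_params)  # 1048832
print(conv_params)   # 320
```

The convolutional layer uses several orders of magnitude fewer parameters because each small kernel is reused at every image position, which is exactly the local-perception idea mentioned above.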
Disclosure of Invention
To solve the above problems, the invention discloses a static gesture recognition method based on an infrared camera. It overcomes the defects of the prior art that gesture recognition fails in environments with insufficient light or a complex background and that traditional learning algorithms recognize slowly with limited detection accuracy, so that gestures can be recognized quickly, accurately, and efficiently in a variety of environments.
In order to achieve the purpose, the invention provides the following technical scheme: a static gesture recognition method based on an infrared camera comprises the following steps:
step 1, training a convolutional neural network;
step 2, acquiring an infrared image, reading infrared image data from the infrared camera;
step 3, preprocessing the image, scaling the image to match the input of the convolutional neural network, and normalizing the image data;
and step 4, recognizing the static gesture: the convolutional neural network extracts gesture features and a normalized exponential function yields the recognition result.
Further, the training of the convolutional neural network in step 1 includes the following steps:
step 1-1, building a convolutional neural network;
step 1-2, making a training sample set and a testing sample set;
and 1-3, training the constructed convolutional neural network by using the sample set.
Further, the structure of the convolutional neural network built in step 1-1 is as follows:
The network is based on the AlexNet model proposed in 2012. The structure has 8 layers: the first 5 are convolutional layers and the last 3 are fully connected layers, and the output of the last fully connected layer is passed to a softmax layer corresponding to the different classification labels. On the basis of the original model, small convolution kernels and Bottleneck operations replace the original model's large-kernel convolutions, reducing the amount of computation and improving model efficiency.
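As an illustration only, a network of this shape (5 convolutional layers with a 1×1 bottleneck standing in for a large-kernel convolution, 3 fully connected layers, softmax output) might be sketched in Keras roughly as follows; all filter counts, kernel sizes, and the input resolution are assumptions, since the patent does not specify them:

```python
import numpy as np
from tensorflow.keras import layers, models

def build_gesture_net(input_shape=(64, 64, 1), num_classes=10):
    """AlexNet-style sketch: 5 convolutional layers (with a 1x1
    bottleneck replacing a large kernel), then 3 fully connected
    layers ending in a softmax over the gesture classes."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),   # conv 1
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),   # conv 2
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 1, activation="relu"),                   # conv 3: 1x1 bottleneck
        layers.Conv2D(128, 3, padding="same", activation="relu"),  # conv 4
        layers.Conv2D(128, 3, padding="same", activation="relu"),  # conv 5
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),                      # fc 1
        layers.Dense(128, activation="relu"),                      # fc 2
        layers.Dense(num_classes, activation="softmax"),           # fc 3 + softmax
    ])
```

The 1×1 bottleneck reduces the channel count before the following 3×3 convolutions, which is one common way to trade a single expensive large-kernel layer for cheaper small-kernel ones.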
Further, the step 1-2 of making the training sample set and the testing sample set includes the following steps:
step 1-2-1, acquiring infrared gesture images of 10 different people at different angles under a single background;
step 1-2-2, performing data amplification on an original image by using an ImageDataGenerator tool kit built in a keras framework to avoid training overfitting caused by a small data set;
step 1-2-3, randomly shuffling the sample set to improve the model's prediction performance on the test set;
and 1-2-4, converting the amplified training sample set and the test sample set into an IDX data format so as to be convenient for storing and reading the sample sets.
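Step 1-2-2 above names Keras's ImageDataGenerator; a hedged sketch of that augmentation step is given below. The specific transforms (rotation, shifts, zoom) and their ranges are assumptions, as the patent does not enumerate them:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters here are illustrative assumptions.
datagen = ImageDataGenerator(
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # horizontal translations
    height_shift_range=0.1,  # vertical translations
    zoom_range=0.1,          # mild random zoom
)

# A toy stand-in for a batch of single-channel infrared gesture images.
x = np.random.default_rng(0).random((4, 64, 64, 1)).astype("float32")
batch = next(datagen.flow(x, batch_size=4, shuffle=True, seed=0))
```

Each call to the iterator yields a freshly perturbed batch, so a small captured sample set can be expanded indefinitely during training.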
Further, the training of the constructed convolutional neural network with the sample set in step 1-3 comprises the following training techniques:
initializing the network with Xavier initialization, by the formula
W ~ U[−√(6/(n_i + n_{i+1})), √(6/(n_i + n_{i+1}))] (1)
where n_i indicates the number of neurons in layer i and n_{i+1} the number of neurons in layer i+1, so that the parameters are initialized within this range;
adopting the RMSProp model optimization algorithm, by the formulas
S_dw = β·S_dw + (1 − β)·dw² (2)
S_db = β·S_db + (1 − β)·db² (3)
where w and b denote the parameters to be solved, dw and db the parameter gradients, S_dw and S_db the exponentially weighted averages of the squared differential terms, β the decay rate, α the learning rate, and ε a small number (e.g. 10⁻⁸) added in actual operation to prevent the numerical instability caused by a too-small denominator; the parameters are then updated as w ← w − α·dw/(√S_dw + ε) and b ← b − α·db/(√S_db + ε), which updates the network and accelerates learning;
adopting a random inactivation (dropout) network regularization algorithm, avoiding overfitting of the training result by setting the dropout probability; and adopting a batch normalization operation that normalizes the data of each intermediate layer of the network to alleviate gradient dispersion;
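The Xavier initialization and the RMSProp updates of formulas (2) and (3) can be written out in NumPy as follows; this is a sketch of the standard formulas rather than the patent's exact implementation, and the default β, α, and ε values are common choices, not values taken from the text:

```python
import numpy as np

def xavier_init(n_in, n_out, rng=np.random.default_rng(0)):
    """Xavier uniform initialization: W ~ U[-limit, limit] with
    limit = sqrt(6 / (n_i + n_{i+1})), keeping activation variance
    roughly stable across layers."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def rmsprop_step(w, dw, s_dw, beta=0.9, alpha=0.001, eps=1e-8):
    """One RMSProp update:
       S_dw = beta * S_dw + (1 - beta) * dw^2
       w    = w - alpha * dw / (sqrt(S_dw) + eps)
    The squared-gradient average damps oscillation on steep directions."""
    s_dw = beta * s_dw + (1 - beta) * dw ** 2
    w = w - alpha * dw / (np.sqrt(s_dw) + eps)
    return w, s_dw
```

The same `rmsprop_step` applies unchanged to the bias term b with db and S_db.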
further, the image preprocessing in step 3 includes the following steps:
the image is scaled down proportionally and the border is filled with 0-value pixels so that the image size matches the input of the convolutional neural network;
the 0–255 grey-scale data of the image to be detected are normalized to between −1 and 1, which eliminates the adverse effect of singular sample data, improves recognition accuracy, accelerates model convergence, and increases training speed.
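A minimal NumPy sketch of this preprocessing follows: proportional downscaling (nearest-neighbour, for illustration), zero-pixel border fill, and normalization of the 0–255 grey levels to [−1, 1]. The 64×64 target size is an assumption, since the patent does not state the network's input resolution:

```python
import numpy as np

def preprocess(img, target=64):
    """Scale a grey-scale image down proportionally, zero-pad it to
    target x target, and map grey levels from [0, 255] to [-1, 1]."""
    h, w = img.shape
    scale = target / max(h, w)
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    # Nearest-neighbour resize via index sampling (illustrative only).
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    small = img[rows][:, cols]
    canvas = np.zeros((target, target), dtype=np.float32)  # 0-pixel border fill
    top, left = (target - nh) // 2, (target - nw) // 2
    canvas[top:top + nh, left:left + nw] = small
    return canvas / 127.5 - 1.0  # [0, 255] -> [-1, 1]
```

In practice a library resize (e.g. with interpolation) would replace the index-sampling step; the padding and normalization would stay the same.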
Further, the static gesture recognition in step 4 includes the following steps:
extracting the features of the infrared gesture image to be detected through convolution and pooling calculations, based on the model trained in step 1;
applying the normalized exponential (softmax) function
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k)
where z_j denotes the j-th element of the input vector and the denominator sums over all K elements; the function compresses a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector with each element in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the largest proportion is selected and output as the recognition result.
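The normalized exponential step and the final label selection can be sketched in NumPy as follows; the label names are placeholders:

```python
import numpy as np

def softmax(z):
    """Normalized exponential: sigma(z)_j = exp(z_j) / sum_k exp(z_k).
    Shifting by max(z) avoids overflow without changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify(logits, labels):
    """Return the label whose softmax proportion is largest,
    together with the full proportion vector."""
    p = softmax(logits)
    return labels[int(np.argmax(p))], p
```

Every output lies in (0, 1) and the vector sums to 1, so each entry can be read directly as the proportion assigned to that gesture label.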
Compared with the prior art, the invention has the following advantages and beneficial effects:
the static gesture recognition based on the infrared camera provided by the invention can realize rapid, accurate and efficient gesture recognition in an environment with insufficient light or a complex background. Compared with the traditional learning algorithm which is low in identification speed and low in detection accuracy, the convolutional neural network-based learning algorithm saves artificial characteristic engineering, reduces parameters and is good in algorithm robustness.
Drawings
FIG. 1 is a diagram of 10 gesture classifications provided by the embodiments of the present invention;
FIG. 2 is a flowchart of a static gesture recognition method based on an infrared camera according to the present invention;
FIG. 3 is a training process of convolutional neural network in step 1 of the present invention;
FIG. 4 is the structure of convolutional neural network in step 1 of the present invention;
FIG. 5 is the visualization result of the weights of the first two convolutional layers of the convolutional neural network in step 1 of the present invention.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
This embodiment realizes the classification and recognition of 10 gestures; the gesture categories are shown in fig. 1. The static gesture recognition method based on an infrared camera is described below on the basis of this embodiment; the specific steps, shown in fig. 2, comprise the following:
step 1, training of convolutional neural network, as shown in FIG. 3
The method comprises the following steps:
step 1-1, building a convolutional neural network, wherein the network structure is shown in FIG. 4;
step 1-2, making a training sample set and a test sample set
The method comprises the following four steps: acquiring infrared gesture images of 10 different people at different angles against a single background; performing data amplification on the original images with the ImageDataGenerator toolkit built into the Keras framework; randomly shuffling the sample set; and converting the amplified training sample set and test sample set into the IDX data format;
step 1-3, training the constructed convolutional neural network with the sample set; the weight visualization result of the convolutional layers is shown in FIG. 5.
The method comprises the following four training techniques:
initializing the network with Xavier initialization, by the formula
W ~ U[−√(6/(n_i + n_{i+1})), √(6/(n_i + n_{i+1}))] (1)
where n_i indicates the number of neurons in layer i and n_{i+1} the number of neurons in layer i+1, so that the parameters are initialized within this range;
adopting the RMSProp model optimization algorithm, by the formulas
S_dw = β·S_dw + (1 − β)·dw² (2)
S_db = β·S_db + (1 − β)·db² (3)
where w and b denote the parameters to be solved, dw and db the parameter gradients, S_dw and S_db the exponentially weighted averages of the squared differential terms, β the decay rate, α the learning rate, and ε a small number (e.g. 10⁻⁸) added in actual operation to prevent the numerical instability caused by a too-small denominator; the parameters are then updated as w ← w − α·dw/(√S_dw + ε) and b ← b − α·db/(√S_db + ε), which updates the network and accelerates learning;
adopting a random inactivation (dropout) network regularization algorithm, avoiding overfitting of the training result by setting the dropout probability;
and adopting a batch normalization operation that normalizes the data of each intermediate layer of the network to alleviate gradient dispersion.
Step 2, infrared image acquisition: reading the infrared image data of the infrared camera.
Step 3, image preprocessing
The method comprises the following steps:
the image is scaled down proportionally and the border is filled with 0-value pixels so that the image size matches the input of the convolutional neural network;
the 0–255 grey-scale data of the image to be detected are normalized to between −1 and 1, which eliminates the adverse effect of singular sample data, improves recognition accuracy, accelerates model convergence, and increases training speed.
Step 4, static gesture recognition
The method comprises the following steps:
extracting the features of the infrared gesture image to be detected through convolution and pooling calculations, based on the trained model;
applying the normalized exponential (softmax) function
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k)
where z_j denotes the j-th element of the input vector and the denominator sums over all K elements; the function compresses a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector with each element in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the largest proportion is selected and output as the recognition result.
Claims (5)
1. A static gesture recognition method based on an infrared camera is characterized by comprising the following steps:
step 1, training a convolutional neural network: building a convolutional neural network, making a training sample set and a test sample set, and training the built convolutional neural network with the training sample set;
step 2, acquiring an infrared image: reading infrared image data of an infrared camera;
step 3, image preprocessing: the size of the scaled image is matched with the input of the convolutional neural network, and the image data is normalized;
step 4, static gesture recognition: extracting gesture features with the convolutional neural network and obtaining the recognition result through a normalized exponential function;
the structure of the convolutional neural network built in step 1 is an AlexNet model; the network structure comprises 8 layers, the first 5 being convolutional layers and the last 3 fully connected layers, with the output of the last fully connected layer passed to a softmax layer corresponding to the different classification labels;
on the basis of the original model, small convolution kernels and Bottleneck operations are adopted to replace the original model's large-kernel convolution operations;
the step 1 of making the training sample set and the test sample set comprises the following steps:
step 1-1, acquiring infrared gesture images of 10 different people at different angles against a single background;
step 1-2, performing data amplification on the original images with the ImageDataGenerator toolkit built into the Keras framework to avoid the training overfitting caused by a small data set;
step 1-3, randomly shuffling the sample set to improve the model's prediction performance on the test set;
step 1-4, converting the amplified training sample set and test sample set into the IDX data format so that the sample sets are convenient to store and read;
the step 1 of training the constructed convolutional neural network by using the training sample set comprises the following training steps:
initializing a network by using Xavier; by the formula
In the formula, niIndicates the number of neurons in layer i, ni+1Indicates the number of neurons in layer i +1
Initializing parameters to the range so as to meet the condition that the variance of the activation value and the variance of the state gradient of each layer in the propagation process is consistent;
updating the network by using RMSProp model optimization algorithm, and passing through a formula
Sdw=βSdw+{1-β}dw2 (2)
Sdb=βSdb+{1-β}db2 (3)
In the formula, w and b represent parameters to be solved, dw and db represent parameter gradients, Sdw and Sdb represent the squaring of differential terms, and alpha represents a learning rate;
e is a value added in actual operation, in order to prevent the value instability caused by too small denominator; squaring the differential term, and then performing gradient updating by using the square root to reduce the swing on the path reaching the minimum value and accelerate the learning speed;
a random inactivation network regularization algorithm is adopted to avoid overfitting of a training result, and partial neurons are reserved by setting random inactivation probability to form a small-scale network;
the gradient dispersion is relieved by batch normalization operation, so that the data of each layer in the middle of the network are normalized to relieve the gradient dispersion;
the training speed is accelerated by adopting a packet convolution method.
2. The infrared camera-based static gesture recognition method of claim 1, wherein in step 3 the scaled image size is matched to the input of the convolutional neural network: the image is scaled down proportionally and the border is filled with 0-value pixels so that the image size matches the network input.
3. The infrared camera based static gesture recognition method according to claim 1, wherein in the step 3, the image data is normalized, and the gray scale data of the image to be detected from 0 to 255 is normalized to-1 to 1, so as to eliminate adverse effects caused by singular sample data, improve recognition accuracy, accelerate model convergence and improve training speed.
4. The method for recognizing the static gesture based on the infrared camera according to claim 1, wherein the gesture features are extracted by using a convolutional neural network in the step 4, and the features of the infrared gesture image to be detected are extracted through convolution and pooling calculation based on the model trained in the step 1.
5. The method according to claim 1, wherein the normalized exponential function in step 4 is the formula
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k)
where z_j denotes the j-th element of the input vector and the denominator sums over all K elements; the function compresses a K-dimensional vector containing arbitrary real numbers into another K-dimensional real vector with each element in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the largest proportion is selected and output as the recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811341659.8A CN109508670B (en) | 2018-11-12 | 2018-11-12 | Static gesture recognition method based on infrared camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109508670A CN109508670A (en) | 2019-03-22 |
CN109508670B true CN109508670B (en) | 2021-10-12 |
Family
ID=65748160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811341659.8A Active CN109508670B (en) | 2018-11-12 | 2018-11-12 | Static gesture recognition method based on infrared camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508670B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245578A (en) * | 2019-05-24 | 2019-09-17 | 北京大学 | A kind of gesture identification method using quanta particle swarm optimization optimization neural network weight |
KR20220010885A (en) | 2020-07-20 | 2022-01-27 | 에스케이하이닉스 주식회사 | Apparatus for recognizing motion by using ToF sensor, and method for operating the same |
CN115471917B (en) * | 2022-09-29 | 2024-02-27 | 中国电子科技集团公司信息科学研究院 | Gesture detection and recognition system and method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679491A (en) * | 2017-09-29 | 2018-02-09 | 华中师范大学 | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data |
CN108334814A (en) * | 2018-01-11 | 2018-07-27 | 浙江工业大学 | A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis |
CN108537794A (en) * | 2018-04-19 | 2018-09-14 | 上海联影医疗科技有限公司 | Medical image processing method, device and computer readable storage medium |
CN108537147A (en) * | 2018-03-22 | 2018-09-14 | 东华大学 | A kind of gesture identification method based on deep learning |
CN108734273A (en) * | 2018-04-17 | 2018-11-02 | 同济大学 | A kind of SQRT Activiation methods applied to neural network |
CN109196518A (en) * | 2018-08-23 | 2019-01-11 | 合刃科技(深圳)有限公司 | A kind of gesture identification method and device based on high light spectrum image-forming |
Non-Patent Citations (1)
Title |
---|
Research and Implementation of Infrared Image Recognition Based on Convolutional Neural Networks; Zheng Hui; China Master's Theses Full-text Database, Information Science and Technology; 2018-06-15; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN109508670A (en) | 2019-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11417148B2 (en) | Human face image classification method and apparatus, and server | |
CN109359608B (en) | Face recognition method based on deep learning model | |
CN109190442B (en) | Rapid face detection method based on deep cascade convolution neural network | |
CN108108764B (en) | Visual SLAM loop detection method based on random forest | |
CN109101938B (en) | Multi-label age estimation method based on convolutional neural network | |
CN108665005B (en) | Method for improving CNN-based image recognition performance by using DCGAN | |
CN111753828B (en) | Natural scene horizontal character detection method based on deep convolutional neural network | |
CN109508670B (en) | Static gesture recognition method based on infrared camera | |
CN111832546B (en) | Lightweight natural scene text recognition method | |
CN111460980B (en) | Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion | |
CN108961675A (en) | Fall detection method based on convolutional neural networks | |
CN112580590A (en) | Finger vein identification method based on multi-semantic feature fusion network | |
CN106372624B (en) | Face recognition method and system | |
Ku et al. | Face recognition based on mtcnn and convolutional neural network | |
CN110543906B (en) | Automatic skin recognition method based on Mask R-CNN model | |
CN111428557A (en) | Method and device for automatically checking handwritten signature based on neural network model | |
CN111401156B (en) | Image identification method based on Gabor convolution neural network | |
CN109034066A (en) | Building identification method based on multi-feature fusion | |
CN113011253B (en) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN113065426B (en) | Gesture image feature fusion method based on channel perception | |
CN112364974B (en) | YOLOv3 algorithm based on activation function improvement | |
CN112883931A (en) | Real-time true and false motion judgment method based on long and short term memory network | |
Xie et al. | Research on MTCNN face recognition system in low computing power scenarios | |
CN110136098B (en) | Cable sequence detection method based on deep learning | |
Hsia et al. | A fast face detection method for illumination variant condition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||