CN109508670B - Static gesture recognition method based on infrared camera - Google Patents
Static gesture recognition method based on infrared camera
- Publication number
- CN109508670B, CN201811341659.8A, CN201811341659A
- Authority
- CN
- China
- Prior art keywords
- training
- neural network
- image
- convolutional neural
- infrared
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
- G06V40/113—Recognition of static hand signs
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
Abstract
The invention provides a static gesture recognition method based on an infrared camera, belonging to the field of image processing and gesture recognition for infrared (IR) images. The method mainly comprises the following steps: preprocessing an infrared image; constructing a convolutional neural network and extracting features from the infrared gesture image; and outputting the final gesture classification according to the classification weight proportions. Unlike a traditional visible-light camera, an infrared camera does not depend on ambient light, so the static gesture recognition method provided by the invention can effectively and accurately extract gesture features under no-light and weak-light conditions, under varying illumination across scenes, and under background noise interference; it classifies and recognizes gestures accurately, outputs the correct expected result, and has good algorithm robustness.
Description
Technical Field
The invention relates to a static gesture recognition technology based on an infrared camera, and belongs to the technical field of image processing and gesture recognition.
Background
As human-computer interaction becomes more frequent and diverse, users' demands for simple and convenient interaction keep rising, and the limitations of the traditional mouse-and-keyboard interaction mode have become increasingly apparent. Non-contact gesture recognition provides a more natural and direct human-computer interface, with simple operation and high flexibility. In recent years, with the development of sensors, accuracy and portability have greatly improved, and gesture recognition has reached the stage of practical usability.
Gesture recognition based on an ordinary camera depends on ambient light: once light is insufficient or absent, recognition accuracy drops or recognition fails entirely. Likewise, against a complex or near-skin-color background, an ordinary camera extracts gesture features poorly, which affects the final recognition result. By comparison, gesture recognition based on an infrared camera has a much wider range of application scenarios.
Compared with traditional machine learning algorithms, a model based on a convolutional neural network dispenses with manual feature engineering and solves the problem in an end-to-end learning fashion. Meanwhile, compared with a traditional fully connected neural network, weight sharing reduces the number of parameters and embodies the idea of local perception of the image.
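The parameter saving from weight sharing can be made concrete with a quick back-of-the-envelope comparison; the layer sizes below are illustrative assumptions, not figures taken from the invention:

```python
# Rough parameter count: one fully connected layer over a 64x64 image
# versus one 3x3 convolutional layer with shared weights.
h, w, units = 64, 64, 256
dense_params = h * w * units + units           # every pixel-to-unit weight, plus biases

k, in_ch, out_ch = 3, 1, 32
conv_params = k * k * in_ch * out_ch + out_ch  # one shared 3x3 kernel per output channel, plus biases

print(dense_params)  # 1048832
print(conv_params)   # 320
```

The convolutional layer uses several orders of magnitude fewer parameters because each small kernel is reused at every image position, which is exactly the local-perception idea mentioned above.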
Disclosure of Invention
To solve the above problems, the invention discloses a static gesture recognition method based on an infrared camera. It overcomes the defects of the prior art that gesture recognition fails in environments with insufficient light or a complex background and that traditional learning algorithms recognize slowly with limited detection accuracy, so that gestures can be recognized quickly, accurately, and efficiently in a variety of environments.
In order to achieve the purpose, the invention provides the following technical scheme: a static gesture recognition method based on an infrared camera comprises the following steps:
step 1, training a convolutional neural network;
step 2, acquiring an infrared image, reading infrared image data from the infrared camera;
step 3, preprocessing the image, scaling the image to match the input of the convolutional neural network, and normalizing the image data;
and step 4, recognizing the static gesture: the convolutional neural network extracts gesture features and a normalized exponential function yields the recognition result.
Further, the training of the convolutional neural network in step 1 includes the following steps:
step 1-1, building a convolutional neural network;
step 1-2, making a training sample set and a testing sample set;
and 1-3, training the constructed convolutional neural network by using the sample set.
Further, the structure of the convolutional neural network built in step 1-1 is as follows:
The network is based on the AlexNet model proposed in 2012. The structure has 8 layers: the first 5 are convolutional layers and the last 3 are fully connected layers, and the output of the last fully connected layer is passed to a softmax layer corresponding to the different classification labels. On the basis of the original model, small convolution kernels and Bottleneck operations replace the original model's large-kernel convolutions, reducing the amount of computation and improving model efficiency.
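As an illustration only, a network of this shape (5 convolutional layers with a 1×1 bottleneck standing in for a large-kernel convolution, 3 fully connected layers, softmax output) might be sketched in Keras roughly as follows; all filter counts, kernel sizes, and the input resolution are assumptions, since the patent does not specify them:

```python
import numpy as np
from tensorflow.keras import layers, models

def build_gesture_net(input_shape=(64, 64, 1), num_classes=10):
    """AlexNet-style sketch: 5 convolutional layers (with a 1x1
    bottleneck replacing a large kernel), then 3 fully connected
    layers ending in a softmax over the gesture classes."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),   # conv 1
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),   # conv 2
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 1, activation="relu"),                   # conv 3: 1x1 bottleneck
        layers.Conv2D(128, 3, padding="same", activation="relu"),  # conv 4
        layers.Conv2D(128, 3, padding="same", activation="relu"),  # conv 5
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),                      # fc 1
        layers.Dense(128, activation="relu"),                      # fc 2
        layers.Dense(num_classes, activation="softmax"),           # fc 3 + softmax
    ])
```

The 1×1 bottleneck reduces the channel count before the following 3×3 convolutions, which is one common way to trade a single expensive large-kernel layer for cheaper small-kernel ones.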
Further, the step 1-2 of making the training sample set and the testing sample set includes the following steps:
step 1-2-1, acquiring infrared gesture images of 10 different people at different angles under a single background;
step 1-2-2, performing data amplification on an original image by using an ImageDataGenerator tool kit built in a keras framework to avoid training overfitting caused by a small data set;
step 1-2-3, randomly shuffling the sample set to improve the model's prediction performance on the test set;
and 1-2-4, converting the amplified training sample set and the test sample set into an IDX data format so as to be convenient for storing and reading the sample sets.
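Step 1-2-2 above names Keras's ImageDataGenerator; a hedged sketch of that augmentation step is given below. The specific transforms (rotation, shifts, zoom) and their ranges are assumptions, as the patent does not enumerate them:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters here are illustrative assumptions.
datagen = ImageDataGenerator(
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # horizontal translations
    height_shift_range=0.1,  # vertical translations
    zoom_range=0.1,          # mild random zoom
)

# A toy stand-in for a batch of single-channel infrared gesture images.
x = np.random.default_rng(0).random((4, 64, 64, 1)).astype("float32")
batch = next(datagen.flow(x, batch_size=4, shuffle=True, seed=0))
```

Each call to the iterator yields a freshly perturbed batch, so a small captured sample set can be expanded indefinitely during training.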
Further, the training of the constructed convolutional neural network with the sample set in step 1-3 comprises the following training techniques:
initializing the network with Xavier initialization, by the formula
W ~ U[−√(6/(n_i + n_{i+1})), √(6/(n_i + n_{i+1}))] (1)
where n_i indicates the number of neurons in layer i and n_{i+1} the number of neurons in layer i+1, so that the parameters are initialized within this range;
adopting the RMSProp model optimization algorithm, by the formulas
S_dw = β·S_dw + (1 − β)·dw² (2)
S_db = β·S_db + (1 − β)·db² (3)
where w and b denote the parameters to be solved, dw and db the parameter gradients, S_dw and S_db the exponentially weighted averages of the squared differential terms, β the decay rate, α the learning rate, and ε a small number (e.g. 10⁻⁸) added in actual operation to prevent the numerical instability caused by a too-small denominator; the parameters are then updated as w ← w − α·dw/(√S_dw + ε) and b ← b − α·db/(√S_db + ε), which updates the network and accelerates learning;
adopting a random inactivation (dropout) network regularization algorithm, avoiding overfitting of the training result by setting the dropout probability; and adopting a batch normalization operation that normalizes the data of each intermediate layer of the network to alleviate gradient dispersion;
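The Xavier initialization and the RMSProp updates of formulas (2) and (3) can be written out in NumPy as follows; this is a sketch of the standard formulas rather than the patent's exact implementation, and the default β, α, and ε values are common choices, not values taken from the text:

```python
import numpy as np

def xavier_init(n_in, n_out, rng=np.random.default_rng(0)):
    """Xavier uniform initialization: W ~ U[-limit, limit] with
    limit = sqrt(6 / (n_i + n_{i+1})), keeping activation variance
    roughly stable across layers."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def rmsprop_step(w, dw, s_dw, beta=0.9, alpha=0.001, eps=1e-8):
    """One RMSProp update:
       S_dw = beta * S_dw + (1 - beta) * dw^2
       w    = w - alpha * dw / (sqrt(S_dw) + eps)
    The squared-gradient average damps oscillation on steep directions."""
    s_dw = beta * s_dw + (1 - beta) * dw ** 2
    w = w - alpha * dw / (np.sqrt(s_dw) + eps)
    return w, s_dw
```

The same `rmsprop_step` applies unchanged to the bias term b with db and S_db.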
further, the image preprocessing in step 3 includes the following steps:
the image is scaled down proportionally and the border is filled with 0-value pixels so that the image size matches the input of the convolutional neural network;
the 0–255 grey-scale data of the image to be detected are normalized to between −1 and 1, which eliminates the adverse effect of singular sample data, improves recognition accuracy, accelerates model convergence, and increases training speed.
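A minimal NumPy sketch of this preprocessing follows: proportional downscaling (nearest-neighbour, for illustration), zero-pixel border fill, and normalization of the 0–255 grey levels to [−1, 1]. The 64×64 target size is an assumption, since the patent does not state the network's input resolution:

```python
import numpy as np

def preprocess(img, target=64):
    """Scale a grey-scale image down proportionally, zero-pad it to
    target x target, and map grey levels from [0, 255] to [-1, 1]."""
    h, w = img.shape
    scale = target / max(h, w)
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    # Nearest-neighbour resize via index sampling (illustrative only).
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    small = img[rows][:, cols]
    canvas = np.zeros((target, target), dtype=np.float32)  # 0-pixel border fill
    top, left = (target - nh) // 2, (target - nw) // 2
    canvas[top:top + nh, left:left + nw] = small
    return canvas / 127.5 - 1.0  # [0, 255] -> [-1, 1]
```

In practice a library resize (e.g. with interpolation) would replace the index-sampling step; the padding and normalization would stay the same.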
Further, the static gesture recognition in step 4 includes the following steps:
extracting the features of the infrared gesture image to be detected through convolution and pooling calculations, based on the model trained in step 1;
applying the normalized exponential (softmax) function
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k)
where z_j denotes the j-th element of the input vector and the denominator sums over all K elements; the function compresses a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector with each element in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the largest proportion is selected and output as the recognition result.
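The normalized exponential step and the final label selection can be sketched in NumPy as follows; the label names are placeholders:

```python
import numpy as np

def softmax(z):
    """Normalized exponential: sigma(z)_j = exp(z_j) / sum_k exp(z_k).
    Shifting by max(z) avoids overflow without changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def classify(logits, labels):
    """Return the label whose softmax proportion is largest,
    together with the full proportion vector."""
    p = softmax(logits)
    return labels[int(np.argmax(p))], p
```

Every output lies in (0, 1) and the vector sums to 1, so each entry can be read directly as the proportion assigned to that gesture label.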
Compared with the prior art, the invention has the following advantages and beneficial effects:
the static gesture recognition based on the infrared camera provided by the invention can realize rapid, accurate and efficient gesture recognition in an environment with insufficient light or a complex background. Compared with the traditional learning algorithm which is low in identification speed and low in detection accuracy, the convolutional neural network-based learning algorithm saves artificial characteristic engineering, reduces parameters and is good in algorithm robustness.
Drawings
FIG. 1 is a diagram of 10 gesture classifications provided by the embodiments of the present invention;
FIG. 2 is a flowchart of a static gesture recognition method based on an infrared camera according to the present invention;
FIG. 3 is a training process of convolutional neural network in step 1 of the present invention;
FIG. 4 is the structure of convolutional neural network in step 1 of the present invention;
FIG. 5 is the visualization result of the weights of the first two convolutional layers of the convolutional neural network in step 1 of the present invention.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
This embodiment realizes the classification and recognition of 10 gestures; the gesture categories are shown in fig. 1. The static gesture recognition method based on an infrared camera is described below on the basis of this embodiment; the specific steps, shown in fig. 2, comprise the following:
step 1, training of convolutional neural network, as shown in FIG. 3
The method comprises the following steps:
step 1-1, building a convolutional neural network, wherein the network structure is shown in FIG. 4;
step 1-2, making a training sample set and a test sample set
The method comprises the following four steps: acquiring infrared gesture images of 10 different people at different angles against a single background; performing data amplification on the original images with the ImageDataGenerator toolkit built into the Keras framework; randomly shuffling the sample set; and converting the amplified training sample set and test sample set into the IDX data format;
step 1-3, training the constructed convolutional neural network with the sample set; the weight visualization result of the convolutional layers is shown in FIG. 5.
The method comprises the following four training techniques:
initializing the network with Xavier initialization, by the formula
W ~ U[−√(6/(n_i + n_{i+1})), √(6/(n_i + n_{i+1}))] (1)
where n_i indicates the number of neurons in layer i and n_{i+1} the number of neurons in layer i+1, so that the parameters are initialized within this range;
adopting the RMSProp model optimization algorithm, by the formulas
S_dw = β·S_dw + (1 − β)·dw² (2)
S_db = β·S_db + (1 − β)·db² (3)
where w and b denote the parameters to be solved, dw and db the parameter gradients, S_dw and S_db the exponentially weighted averages of the squared differential terms, β the decay rate, α the learning rate, and ε a small number (e.g. 10⁻⁸) added in actual operation to prevent the numerical instability caused by a too-small denominator; the parameters are then updated as w ← w − α·dw/(√S_dw + ε) and b ← b − α·db/(√S_db + ε), which updates the network and accelerates learning;
adopting a random inactivation (dropout) network regularization algorithm, avoiding overfitting of the training result by setting the dropout probability;
and adopting a batch normalization operation that normalizes the data of each intermediate layer of the network to alleviate gradient dispersion.
Step 2, infrared image acquisition: reading the infrared image data of the infrared camera.
Step 3, image preprocessing
The method comprises the following steps:
the image is scaled down proportionally and the border is filled with 0-value pixels so that the image size matches the input of the convolutional neural network;
the 0–255 grey-scale data of the image to be detected are normalized to between −1 and 1, which eliminates the adverse effect of singular sample data, improves recognition accuracy, accelerates model convergence, and increases training speed.
Step 4, static gesture recognition
The method comprises the following steps:
extracting the features of the infrared gesture image to be detected through convolution and pooling calculations, based on the trained model;
applying the normalized exponential (softmax) function
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k)
where z_j denotes the j-th element of the input vector and the denominator sums over all K elements; the function compresses a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector with each element in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the largest proportion is selected and output as the recognition result.
Claims (5)
1. A static gesture recognition method based on an infrared camera is characterized by comprising the following steps:
step 1, training a convolutional neural network: building a convolutional neural network, making a training sample set and a test sample set, and training the built convolutional neural network with the training sample set;
step 2, acquiring an infrared image: reading infrared image data of an infrared camera;
step 3, image preprocessing: the size of the scaled image is matched with the input of the convolutional neural network, and the image data is normalized;
step 4, static gesture recognition: extracting gesture features with the convolutional neural network and obtaining the recognition result through a normalized exponential function;
the structure of the convolutional neural network built in step 1 is an AlexNet model; the network structure comprises 8 layers, the first 5 being convolutional layers and the last 3 fully connected layers, with the output of the last fully connected layer passed to a softmax layer corresponding to the different classification labels;
on the basis of the original model, small convolution kernels and Bottleneck operations are adopted to replace the original model's large-kernel convolution operations;
the step 1 of making the training sample set and the test sample set comprises the following steps:
step 1-1, acquiring infrared gesture images of 10 different people at different angles against a single background;
step 1-2, performing data amplification on the original images with the ImageDataGenerator toolkit built into the Keras framework to avoid the training overfitting caused by a small data set;
step 1-3, randomly shuffling the sample set to improve the model's prediction performance on the test set;
step 1-4, converting the amplified training sample set and test sample set into the IDX data format so that the sample sets are convenient to store and read;
the step 1 of training the constructed convolutional neural network by using the training sample set comprises the following training steps:
initializing a network by using Xavier; by the formula
In the formula, niIndicates the number of neurons in layer i, ni+1Indicates the number of neurons in layer i +1
Initializing parameters to the range so as to meet the condition that the variance of the activation value and the variance of the state gradient of each layer in the propagation process is consistent;
updating the network by using RMSProp model optimization algorithm, and passing through a formula
Sdw=βSdw+{1-β}dw2 (2)
Sdb=βSdb+{1-β}db2 (3)
In the formula, w and b represent parameters to be solved, dw and db represent parameter gradients, Sdw and Sdb represent the squaring of differential terms, and alpha represents a learning rate;
e is a value added in actual operation, in order to prevent the value instability caused by too small denominator; squaring the differential term, and then performing gradient updating by using the square root to reduce the swing on the path reaching the minimum value and accelerate the learning speed;
a random inactivation network regularization algorithm is adopted to avoid overfitting of a training result, and partial neurons are reserved by setting random inactivation probability to form a small-scale network;
the gradient dispersion is relieved by batch normalization operation, so that the data of each layer in the middle of the network are normalized to relieve the gradient dispersion;
the training speed is accelerated by adopting a packet convolution method.
2. The infrared camera-based static gesture recognition method of claim 1, wherein in step 3 the scaled image size is matched to the input of the convolutional neural network: the image is scaled down proportionally and the border is filled with 0-value pixels so that the image size matches the network input.
3. The infrared camera based static gesture recognition method according to claim 1, wherein in the step 3, the image data is normalized, and the gray scale data of the image to be detected from 0 to 255 is normalized to-1 to 1, so as to eliminate adverse effects caused by singular sample data, improve recognition accuracy, accelerate model convergence and improve training speed.
4. The method for recognizing the static gesture based on the infrared camera according to claim 1, wherein the gesture features are extracted by using a convolutional neural network in the step 4, and the features of the infrared gesture image to be detected are extracted through convolution and pooling calculation based on the model trained in the step 1.
5. The method according to claim 1, wherein the normalized exponential function in step 4 is the formula
σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k)
where z_j denotes the j-th element of the input vector and the denominator sums over all K elements; the function compresses a K-dimensional vector containing arbitrary real numbers into another K-dimensional real vector with each element in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the largest proportion is selected and output as the recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811341659.8A CN109508670B (en) | 2018-11-12 | 2018-11-12 | Static gesture recognition method based on infrared camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109508670A CN109508670A (en) | 2019-03-22 |
CN109508670B true CN109508670B (en) | 2021-10-12 |
Family
ID=65748160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811341659.8A Active CN109508670B (en) | 2018-11-12 | 2018-11-12 | Static gesture recognition method based on infrared camera |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508670B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245578A (en) * | 2019-05-24 | 2019-09-17 | 北京大学 | A kind of gesture identification method using quanta particle swarm optimization optimization neural network weight |
KR20220010885A (en) | 2020-07-20 | 2022-01-27 | 에스케이하이닉스 주식회사 | Apparatus for recognizing motion by using ToF sensor, and method for operating the same |
CN115471917B (en) * | 2022-09-29 | 2024-02-27 | 中国电子科技集团公司信息科学研究院 | Gesture detection and recognition system and method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679491A (en) * | 2017-09-29 | 2018-02-09 | 华中师范大学 | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data |
CN108334814A (en) * | 2018-01-11 | 2018-07-27 | 浙江工业大学 | A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis |
CN108537794A (en) * | 2018-04-19 | 2018-09-14 | 上海联影医疗科技有限公司 | Medical image processing method, device and computer readable storage medium |
CN108537147A (en) * | 2018-03-22 | 2018-09-14 | 东华大学 | A kind of gesture identification method based on deep learning |
CN108734273A (en) * | 2018-04-17 | 2018-11-02 | 同济大学 | A kind of SQRT Activiation methods applied to neural network |
CN109196518A (en) * | 2018-08-23 | 2019-01-11 | 合刃科技(深圳)有限公司 | A kind of gesture identification method and device based on high light spectrum image-forming |
Non-Patent Citations (1)
Title |
---|
Research and Implementation of Infrared Image Recognition Based on Convolutional Neural Networks; Zheng Hui; China Master's Theses Full-text Database, Information Science and Technology; 2018-06-15; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN109508670A (en) | 2019-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11417148B2 (en) | Human face image classification method and apparatus, and server | |
CN109359608B (en) | Face recognition method based on deep learning model | |
CN109190442B (en) | Rapid face detection method based on deep cascade convolution neural network | |
CN108108764B (en) | Visual SLAM loop detection method based on random forest | |
CN109101938B (en) | Multi-label age estimation method based on convolutional neural network | |
CN108665005B (en) | Method for improving CNN-based image recognition performance by using DCGAN | |
CN111753828B (en) | Natural scene horizontal character detection method based on deep convolutional neural network | |
CN109508670B (en) | Static gesture recognition method based on infrared camera | |
CN111832546B (en) | Lightweight natural scene text recognition method | |
CN111460980B (en) | Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion | |
CN108961675A (en) | Fall detection method based on convolutional neural networks | |
CN112580590A (en) | Finger vein identification method based on multi-semantic feature fusion network | |
CN106372624B (en) | Face recognition method and system | |
Ku et al. | Face recognition based on mtcnn and convolutional neural network | |
CN110543906B (en) | Automatic skin recognition method based on Mask R-CNN model | |
CN111428557A (en) | Method and device for automatically checking handwritten signature based on neural network model | |
CN111401156B (en) | Image identification method based on Gabor convolution neural network | |
CN109034066A (en) | Building identification method based on multi-feature fusion | |
CN113011253B (en) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN113065426B (en) | Gesture image feature fusion method based on channel perception | |
CN112364974B (en) | YOLOv3 algorithm based on activation function improvement | |
CN112883931A (en) | Real-time true and false motion judgment method based on long and short term memory network | |
Xie et al. | Research on MTCNN face recognition system in low computing power scenarios | |
CN110136098B (en) | Cable sequence detection method based on deep learning | |
Hsia et al. | A fast face detection method for illumination variant condition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||