CN109508670B - Static gesture recognition method based on infrared camera - Google Patents

Static gesture recognition method based on infrared camera

Info

Publication number
CN109508670B
CN109508670B
Authority
CN
China
Prior art keywords
training
neural network
image
convolutional neural
infrared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811341659.8A
Other languages
Chinese (zh)
Other versions
CN109508670A (en)
Inventor
金展翌
张�雄
樊兆雯
仲雪飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201811341659.8A priority Critical patent/CN109508670B/en
Publication of CN109508670A publication Critical patent/CN109508670A/en
Application granted granted Critical
Publication of CN109508670B publication Critical patent/CN109508670B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G06V10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143 Sensing or illuminating at different wavelengths

Abstract

The invention provides a static gesture recognition method based on an infrared camera, belonging to the field of image processing and gesture recognition on infrared (IR) images. The method mainly comprises the following steps: preprocessing an infrared image; constructing a convolutional neural network and extracting features from the infrared gesture image; and outputting the final gesture class according to the classification weight ratio. Because an infrared camera, unlike a traditional visible-light camera, does not depend on ambient light, the proposed method can effectively and accurately extract gesture features under no-light and weak-light conditions, under varying scene illumination and under background noise interference, classify and recognize gestures accurately, output the correct expected result, and exhibit good algorithm robustness.

Description

Static gesture recognition method based on infrared camera
Technical Field
The invention relates to a static gesture recognition technology based on an infrared camera, and belongs to the technical field of image processing and gesture recognition.
Background
As human-computer interaction becomes more frequent and diverse, the demand for simple and convenient interaction keeps growing, and the limitations of traditional mouse-and-keyboard interaction become increasingly apparent. Non-contact gesture recognition provides a more natural and direct human-computer interface that is simple to operate and highly flexible. In recent years, advances in sensors have greatly improved accuracy and portability, bringing gesture recognition to a practically usable stage.
Gesture recognition based on an ordinary camera depends on ambient light; once the light is insufficient or absent, recognition accuracy drops or recognition fails entirely. Moreover, against a complex or near-skin-color background an ordinary camera extracts gesture features poorly, which affects the final recognition result. By comparison, gesture recognition based on an infrared camera has a much wider range of application scenarios.
Compared with traditional machine-learning algorithms, a model based on a convolutional neural network dispenses with manual feature engineering and solves the problem in an end-to-end manner. Compared with a traditional fully connected neural network, weight sharing reduces the number of parameters and embodies the idea of local perception of the image.
Disclosure of Invention
To solve the above problems, the invention discloses a static gesture recognition method based on an infrared camera. It overcomes the shortcomings of the prior art, in which gesture recognition fails under insufficient light or against a complex background and traditional learning algorithms are slow and insufficiently accurate, and enables fast, accurate and efficient gesture recognition in a variety of environments.
To achieve this purpose, the invention provides the following technical solution. A static gesture recognition method based on an infrared camera comprises the following steps:
step 1, training a convolutional neural network;
step 2, acquiring an infrared image, and reading infrared image data of an infrared camera;
step 3, preprocessing the image: scaling the image so that its size matches the input of the convolutional neural network, and normalizing the image data;
and step 4, recognizing the static gesture: the convolutional neural network extracts gesture features and the normalized exponential (softmax) function yields the recognition result.
Further, the training of the convolutional neural network in step 1 includes the following steps:
step 1-1, building a convolutional neural network;
step 1-2, making a training sample set and a testing sample set;
and 1-3, training the constructed convolutional neural network by using the sample set.
Further, the structure of the convolutional neural network is built in the step 1-1:
based on the AlexNet model proposed in 2012, the network structure has 8 layers, the front 5 layers are convolution layers, the rear 3 layers are full connection layers, and the output of the last full connection layer is transmitted to the softmax layer and corresponds to different classification labels. And on the basis of the original model, a small convolution kernel and Bottleneeck operation are adopted to replace a large convolution kernel operation of the original model, so that the calculated amount is reduced, and the model efficiency is improved.
Further, making the training sample set and the test sample set in step 1-2 includes the following steps:
step 1-2-1, acquiring infrared gesture images of 10 different people at different angles against a single background;
step 1-2-2, performing data augmentation on the original images with the ImageDataGenerator toolkit built into the keras framework, to avoid the overfitting that a small data set would cause during training (see the augmentation sketch after this list);
step 1-2-3, randomly shuffling the sample set to improve the model's prediction results on the test set;
and step 1-2-4, converting the augmented training and test sample sets into the IDX data format for convenient storage and reading of the sample sets.
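As a hedged sketch of steps 1-2-2 and 1-2-3, the snippet below augments an array of infrared gesture images with the ImageDataGenerator toolkit of the keras framework and then randomly shuffles the combined sample set; the augmentation parameters, the number of augmented copies and the array names are assumptions, not values specified by the patent.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Mild geometric augmentation; the parameter values are placeholders.
datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1,
                             fill_mode='nearest')

def augment_and_shuffle(images, labels, copies=4, seed=42):
    """images: (N, H, W, 1) uint8 infrared frames; labels: (N,) gesture ids."""
    aug_x, aug_y = [images.astype('float32')], [labels]
    flow = datagen.flow(images, labels, batch_size=len(images),
                        shuffle=False, seed=seed)
    for _ in range(copies):                     # step 1-2-2: augmented copies
        x_batch, y_batch = next(flow)
        aug_x.append(x_batch)
        aug_y.append(y_batch)
    x, y = np.concatenate(aug_x), np.concatenate(aug_y)
    order = np.random.default_rng(seed).permutation(len(x))  # step 1-2-3: shuffle
    return x[order], y[order]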
Further, the training of the constructed convolutional neural network by using the sample set in the steps 1-3 comprises the following training techniques:
initializing the network with Xavier by formula
Figure BDA0001862725580000021
In the formula, niIndicates the number of neurons in layer i, ni+1Indicates the number of neurons in layer i +1
Initializing parameters to be within the range;
adopting the RMSProp optimization algorithm, with the formulas
$S_{dw} = \beta S_{dw} + (1-\beta)\,dw^2$ (2)
$S_{db} = \beta S_{db} + (1-\beta)\,db^2$ (3)
$w := w - \alpha \dfrac{dw}{\sqrt{S_{dw}} + \epsilon}$ (4)
$b := b - \alpha \dfrac{db}{\sqrt{S_{db}} + \epsilon}$ (5)
where $w$ and $b$ are the parameters to be solved, $dw$ and $db$ their gradients, $S_{dw}$ and $S_{db}$ the squared-gradient terms, $\alpha$ the learning rate, and $\epsilon$ a small number (e.g. $10^{-8}$) added in practice to prevent the numerical instability caused by a denominator that is too small; the network is updated in this way and learning is accelerated;
by adopting a random inactivation network regularization algorithm, overfitting of a training result is avoided by setting random inactivation probability; the gradient dispersion is relieved by batch normalization operation, so that the data of each layer in the middle of the network are normalized to relieve the gradient dispersion;
further, the image preprocessing in step 3 includes the following steps:
the image size is reduced in an equal proportion, and the boundary is filled with 0 pixel, so that the image size is matched with the input of the convolutional neural network;
the gray data of the image to be detected 0-255 are normalized to be between-1 and 1, so that adverse effects caused by singular sample data are eliminated, the recognition accuracy is improved, the model convergence is accelerated, and the training speed is increased.
Further, the static gesture recognition in step 4 includes the following steps:
based on the model trained in step 1, the features of the infrared gesture image to be recognized are extracted through convolution and pooling computations;
the recognition result is then obtained through the normalized exponential (softmax) function
$\sigma(z)_j = \dfrac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}},\quad j = 1,\dots,K$ (6)
where $z_j$ is the j-th element of the input vector and the denominator sums over all $K$ elements; the function compresses a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector whose elements all lie in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the maximum proportion is selected and output as the recognition result.
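A minimal NumPy sketch of the normalized exponential function of equation (6) and of the final label selection is shown below; the logits are invented example values, not outputs of the patented network.

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))        # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.2, 0.3, 4.1, -0.7, 2.0, 0.1, 0.0, -1.3, 0.5, 1.1])
probs = softmax(logits)              # K-dimensional vector in (0, 1), summing to 1
label = int(np.argmax(probs))        # label with the largest proportion
print(label, float(probs[label]))    # gesture class index and its proportion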
Compared with the prior art, the invention has the following advantages and beneficial effects:
the static gesture recognition based on the infrared camera provided by the invention can realize rapid, accurate and efficient gesture recognition in an environment with insufficient light or a complex background. Compared with the traditional learning algorithm which is low in identification speed and low in detection accuracy, the convolutional neural network-based learning algorithm saves artificial characteristic engineering, reduces parameters and is good in algorithm robustness.
Drawings
FIG. 1 is a diagram of 10 gesture classifications provided by the embodiments of the present invention;
FIG. 2 is a flowchart of a static gesture recognition method based on an infrared camera according to the present invention;
FIG. 3 shows the training process of the convolutional neural network in step 1 of the present invention;
FIG. 4 shows the structure of the convolutional neural network in step 1 of the present invention;
FIG. 5 shows the visualization of the weights of the first two convolutional layers of the convolutional neural network in step 1 of the present invention.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
This embodiment realizes the classification and recognition of 10 gestures; the gesture categories are shown in FIG. 1. The static gesture recognition method based on an infrared camera is described below on the basis of this embodiment; the specific steps are shown in FIG. 2, and the method includes the following steps:
step 1, training of convolutional neural network, as shown in FIG. 3
The method comprises the following steps:
step 1-1, building a convolutional neural network, wherein the network structure is shown in FIG. 4;
step 1-2, making a training sample set and a test sample set
The method comprises the following four steps: acquiring infrared gesture images of 10 different people at different angles against a single background; performing data augmentation on the original images with the ImageDataGenerator toolkit built into the keras framework; randomly shuffling the sample set; and converting the augmented training and test sample sets into the IDX data format (a minimal IDX-writing sketch follows);
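As a hedged illustration of the IDX conversion mentioned above, the snippet below writes image and label arrays in the MNIST-style IDX format (magic number, big-endian dimension sizes, then raw bytes); the file names and array names are placeholders.

import struct
import numpy as np

def write_idx_images(path, images):
    """images: (N, H, W) uint8 array written as an idx3-ubyte file."""
    images = np.ascontiguousarray(images, dtype=np.uint8)
    n, h, w = images.shape
    with open(path, 'wb') as f:
        f.write(struct.pack('>IIII', 0x00000803, n, h, w))   # magic for 3-D ubyte data
        f.write(images.tobytes())

def write_idx_labels(path, labels):
    """labels: (N,) uint8 array written as an idx1-ubyte file."""
    labels = np.ascontiguousarray(labels, dtype=np.uint8)
    with open(path, 'wb') as f:
        f.write(struct.pack('>II', 0x00000801, len(labels)))  # magic for 1-D ubyte data
        f.write(labels.tobytes())

# write_idx_images('train-images.idx3-ubyte', x_train)
# write_idx_labels('train-labels.idx1-ubyte', y_train)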
step 1-3, training the constructed convolutional neural network with the sample set; the visualization of the convolutional-layer weights is shown in FIG. 5.
The method comprises the following four training techniques:
initializing the network with Xavier by formula
Figure BDA0001862725580000041
In the formula, niIndicates the number of neurons in layer i, ni+1Indicates the number of neurons in layer i +1
Initializing parameters to be within the range;
adopting the RMSProp optimization algorithm, with the formulas
$S_{dw} = \beta S_{dw} + (1-\beta)\,dw^2$ (2)
$S_{db} = \beta S_{db} + (1-\beta)\,db^2$ (3)
$w := w - \alpha \dfrac{dw}{\sqrt{S_{dw}} + \epsilon}$ (4)
$b := b - \alpha \dfrac{db}{\sqrt{S_{db}} + \epsilon}$ (5)
where $w$ and $b$ are the parameters to be solved, $dw$ and $db$ their gradients, $S_{dw}$ and $S_{db}$ the squared-gradient terms, $\alpha$ the learning rate, and $\epsilon$ a small number (e.g. $10^{-8}$) added in practice to prevent the numerical instability caused by a denominator that is too small; the network is updated in this way and learning is accelerated;
by adopting a random inactivation network regularization algorithm, overfitting of a training result is avoided by setting random inactivation probability;
and (3) adopting batch normalization operation to relieve gradient dispersion, and normalizing the data of each layer in the middle of the network so as to relieve the gradient dispersion.
Step 2, acquiring infrared image
The infrared image data are read from the infrared camera.
Step 3, image preprocessing
The method comprises the following steps:
the image size is reduced in an equal proportion, and the boundary is filled with 0 pixel, so that the image size is matched with the input of the convolutional neural network;
the gray data of the image to be detected 0-255 are normalized to be between-1 and 1, so that adverse effects caused by singular sample data are eliminated, the recognition accuracy is improved, the model convergence is accelerated, and the training speed is increased.
Step 4, static gesture recognition
The method comprises the following steps:
based on the trained model, the features of the infrared gesture image to be recognized are extracted through convolution and pooling computations;
the recognition result is then obtained through the normalized exponential (softmax) function
$\sigma(z)_j = \dfrac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}},\quad j = 1,\dots,K$ (6)
where $z_j$ is the j-th element of the input vector and the denominator sums over all $K$ elements; the function compresses a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector whose elements all lie in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the maximum proportion is selected and output as the recognition result.

Claims (5)

1. A static gesture recognition method based on an infrared camera is characterized by comprising the following steps:
step 1, training a convolutional neural network: building a convolutional neural network, making a training sample set and a test sample set, and training the built convolutional neural network with the training sample set;
step 2, acquiring an infrared image: reading infrared image data of an infrared camera;
step 3, image preprocessing: scaling the image so that its size matches the input of the convolutional neural network, and normalizing the image data;
step 4, static gesture recognition: the convolutional neural network extracts gesture features, and the normalized exponential function yields the recognition result;
the convolutional neural network built in step 1 has the structure of an AlexNet model: the network comprises 8 layers, the first 5 are convolutional layers and the last 3 are fully connected layers, and the output of the last fully connected layer is passed to a softmax layer corresponding to the different classification labels;
on the basis of the original model, small convolution kernels and Bottleneck operations replace the original large-kernel convolutions;
the step 1 of making the training sample set and testing the training sample set comprises the following steps:
step 1-1, acquiring infrared gesture images of 10 different people at different angles under a single background;
step 1-2, performing data amplification on an original image by using an ImageDataGenerator kit built in a keras framework to avoid training and fitting of a small data set;
step 1-3, randomly disordering the sample set to improve the prediction result of the model in the test set;
step 1-4, converting the amplified training sample set and the test training sample set into an IDX data format so as to be convenient for storing and reading the sample set;
the step 1 of training the constructed convolutional neural network by using the training sample set comprises the following training steps:
initializing the network with Xavier initialization, by the formula
$W \sim U\!\left[-\sqrt{\dfrac{6}{n_i+n_{i+1}}},\ \sqrt{\dfrac{6}{n_i+n_{i+1}}}\right]$ (1)
where $n_i$ denotes the number of neurons in layer $i$ and $n_{i+1}$ the number of neurons in layer $i+1$;
the parameters are initialized within this range so that the variance of the activations and the variance of the state gradients remain consistent across layers during propagation;
updating the network with the RMSProp optimization algorithm, through the formulas
$S_{dw} = \beta S_{dw} + (1-\beta)\,dw^2$ (2)
$S_{db} = \beta S_{db} + (1-\beta)\,db^2$ (3)
$w := w - \alpha \dfrac{dw}{\sqrt{S_{dw}} + \epsilon}$ (4)
$b := b - \alpha \dfrac{db}{\sqrt{S_{db}} + \epsilon}$ (5)
where $w$ and $b$ are the parameters to be solved, $dw$ and $db$ their gradients, $S_{dw}$ and $S_{db}$ the squared-gradient terms, and $\alpha$ the learning rate;
$\epsilon$ is a value added in practice to prevent the numerical instability caused by a denominator that is too small; the gradient terms are squared and the square root is used in the update, which reduces the oscillation on the path to the minimum and accelerates learning;
adopting dropout (random-inactivation) regularization to avoid overfitting of the training result, in which part of the neurons are retained according to the set dropout probability, forming a smaller-scale network;
applying batch normalization so that the data of each intermediate layer of the network are normalized, alleviating gradient vanishing;
and adopting grouped convolution to accelerate training.
2. The infrared-camera-based static gesture recognition method according to claim 1, wherein in step 3 the image is scaled down proportionally and the border is filled with 0-valued pixels, so that the image size matches the input of the convolutional neural network.
3. The infrared-camera-based static gesture recognition method according to claim 1, wherein in step 3 the image data are normalized by mapping the 0-255 grey values of the image to be recognized to the range -1 to 1, so as to eliminate the adverse effect of singular sample data, improve recognition accuracy, accelerate model convergence and speed up training.
4. The infrared-camera-based static gesture recognition method according to claim 1, wherein in step 4 the gesture features are extracted by the convolutional neural network: based on the model trained in step 1, the features of the infrared gesture image to be recognized are extracted through convolution and pooling computations.
5. The infrared-camera-based static gesture recognition method according to claim 1, wherein in step 4 the recognition result is obtained through the normalized exponential (softmax) function
$\sigma(z)_j = \dfrac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}},\quad j = 1,\dots,K$
where $z_j$ denotes the j-th element of the input vector and the denominator sums over all $K$ elements; the function compresses a K-dimensional vector containing arbitrary real numbers into another K-dimensional real vector whose elements lie in the range (0, 1), the proportion of each classification label is calculated, and the label corresponding to the maximum proportion is selected and output as the recognition result.
CN201811341659.8A 2018-11-12 2018-11-12 Static gesture recognition method based on infrared camera Active CN109508670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811341659.8A CN109508670B (en) 2018-11-12 2018-11-12 Static gesture recognition method based on infrared camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811341659.8A CN109508670B (en) 2018-11-12 2018-11-12 Static gesture recognition method based on infrared camera

Publications (2)

Publication Number Publication Date
CN109508670A CN109508670A (en) 2019-03-22
CN109508670B true CN109508670B (en) 2021-10-12

Family

ID=65748160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811341659.8A Active CN109508670B (en) 2018-11-12 2018-11-12 Static gesture recognition method based on infrared camera

Country Status (1)

Country Link
CN (1) CN109508670B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245578A (en) * 2019-05-24 2019-09-17 北京大学 A kind of gesture identification method using quanta particle swarm optimization optimization neural network weight
KR20220010885A (en) 2020-07-20 2022-01-27 에스케이하이닉스 주식회사 Apparatus for recognizing motion by using ToF sensor, and method for operating the same
CN115471917B (en) * 2022-09-29 2024-02-27 中国电子科技集团公司信息科学研究院 Gesture detection and recognition system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679491A (en) * 2017-09-29 2018-02-09 华中师范大学 A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data
CN108334814A (en) * 2018-01-11 2018-07-27 浙江工业大学 A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis
CN108537794A (en) * 2018-04-19 2018-09-14 上海联影医疗科技有限公司 Medical image processing method, device and computer readable storage medium
CN108537147A (en) * 2018-03-22 2018-09-14 东华大学 A kind of gesture identification method based on deep learning
CN108734273A (en) * 2018-04-17 2018-11-02 同济大学 A kind of SQRT Activiation methods applied to neural network
CN109196518A (en) * 2018-08-23 2019-01-11 合刃科技(深圳)有限公司 A kind of gesture identification method and device based on high light spectrum image-forming

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679491A (en) * 2017-09-29 2018-02-09 华中师范大学 A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data
CN108334814A (en) * 2018-01-11 2018-07-27 浙江工业大学 A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis
CN108537147A (en) * 2018-03-22 2018-09-14 东华大学 A kind of gesture identification method based on deep learning
CN108734273A (en) * 2018-04-17 2018-11-02 同济大学 A kind of SQRT Activiation methods applied to neural network
CN108537794A (en) * 2018-04-19 2018-09-14 上海联影医疗科技有限公司 Medical image processing method, device and computer readable storage medium
CN109196518A (en) * 2018-08-23 2019-01-11 合刃科技(深圳)有限公司 A kind of gesture identification method and device based on high light spectrum image-forming

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of Infrared Image Recognition Based on Convolutional Neural Networks; Zheng Hui; China Master's Theses Full-text Database, Information Science and Technology; 20180615; full text *

Also Published As

Publication number Publication date
CN109508670A (en) 2019-03-22

Similar Documents

Publication Publication Date Title
US11417148B2 (en) Human face image classification method and apparatus, and server
CN109359608B (en) Face recognition method based on deep learning model
CN109190442B (en) Rapid face detection method based on deep cascade convolution neural network
CN108108764B (en) Visual SLAM loop detection method based on random forest
CN109101938B (en) Multi-label age estimation method based on convolutional neural network
CN108665005B (en) Method for improving CNN-based image recognition performance by using DCGAN
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN109508670B (en) Static gesture recognition method based on infrared camera
CN111832546B (en) Lightweight natural scene text recognition method
CN111460980B (en) Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion
CN108961675A (en) Fall detection method based on convolutional neural networks
CN112580590A (en) Finger vein identification method based on multi-semantic feature fusion network
CN106372624B (en) Face recognition method and system
Ku et al. Face recognition based on mtcnn and convolutional neural network
CN110543906B (en) Automatic skin recognition method based on Mask R-CNN model
CN111428557A (en) Method and device for automatically checking handwritten signature based on neural network model
CN111401156B (en) Image identification method based on Gabor convolution neural network
CN109034066A (en) Building identification method based on multi-feature fusion
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN113065426B (en) Gesture image feature fusion method based on channel perception
CN112364974B (en) YOLOv3 algorithm based on activation function improvement
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
Xie et al. Research on MTCNN face recognition system in low computing power scenarios
CN110136098B (en) Cable sequence detection method based on deep learning
Hsia et al. A fast face detection method for illumination variant condition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant