CN113052112B - Gesture motion recognition interaction system and method based on hybrid neural network - Google Patents

Gesture motion recognition interaction system and method based on hybrid neural network

Info

Publication number
CN113052112B
CN113052112B (application CN202110361015.0A; also published as CN113052112A)
Authority
CN
China
Prior art keywords
gesture
neural network
video
cnn
data set
Prior art date
Legal status
Active
Application number
CN202110361015.0A
Other languages
Chinese (zh)
Other versions
CN113052112A (en)
Inventor
王立军 (Wang Lijun)
于霄洋 (Yu Xiaoyang)
李争平 (Li Zhengping)
Current Assignee
North China University of Technology
Original Assignee
North China University of Technology
Priority date
Filing date
Publication date
Application filed by North China University of Technology
Priority to CN202110361015.0A
Publication of CN113052112A
Application granted
Publication of CN113052112B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a projection gesture motion recognition interaction method and system based on a 3D CNN and RNN hybrid neural network. The invention obtains depth information of the hand, improves recognition accuracy, and achieves state-of-the-art performance on a self-built data set; by combining a 3D CNN with an RNN, the fused network substantially outperforms the previous CNN+RNN approach.

Description

Gesture motion recognition interaction system and method based on hybrid neural network
Technical Field
The invention belongs to the technical field of image recognition, and relates to a gesture motion recognition interaction system and method based on a hybrid neural network.
Background
In recent years, with the rise of artificial intelligence, machine learning and deep learning have swept through computing, and human-machine interaction has become a major research topic in machine vision. Intelligent devices with human-machine interaction functions are developing rapidly in the market, and gestures, the form of interaction people use most in daily life, have already been applied in many smart devices.
Gestures and hand postures are a common form of human communication, so it is natural for humans to use the same form to interact with machines. For example, gesture-based interaction can improve the comfort and safety of automobiles; simple gesture interaction makes controlling a smart home more convenient; and gesture recognition with high accuracy lets VR/AR applications run more smoothly.
Gesture recognition divides into static and dynamic gesture recognition. Static gesture recognition is trained on still pictures, while dynamic gesture recognition is trained on dynamic hand motion, i.e., movements of the hand detected in real-time video; the task of gesture recognition is to interpret the meaning of the hand's actions. In today's gesture recognition systems, many researchers have proposed techniques based on depth cameras, color cameras, distance sensors, wearable inertial sensors and other sensor modalities. However, much computer-vision-based gesture recognition handles only static gestures, which makes the interaction unnatural. In real human-computer interaction systems, automatic detection and classification of dynamic gestures is challenging because (1) people vary greatly in how they perform gestures, which complicates detection and classification, and (2) the system must work online to avoid a significant delay between performing a gesture and its classification.
Disclosure of Invention
To solve the above problems, the invention provides a projection gesture motion recognition interaction method and system based on a 3D CNN and RNN hybrid neural network. First, depth image videos, color image videos and infrared image videos of the hand are acquired with a depth camera and converted to a uniform format; the video files are then grouped and fed into a 3D CNN (three-dimensional convolutional neural network) to learn the motion in the videos and output image features; the features are then trained cyclically with an RNN (recurrent neural network); finally, the recognition result is output.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a projection gesture motion recognition method based on a 3D CNN and RNN hybrid neural network comprises the following steps:
step one, image video dataset acquisition
Collecting hand data by using a depth camera, and creating a data set;
at model input, the three-channel RGB input is converted into a six-channel RGB+HSV input, where H, S and V denote hue, saturation and value (brightness), computed as follows:
max=max(R/255,G/255,B/255) (1)
min=min(R/255,G/255,B/255) (2)
$$H=\begin{cases}0^{\circ}, & \max=\min\\ 60^{\circ}\times\frac{G/255-B/255}{\max-\min}\ \bmod\ 360^{\circ}, & \max=R/255\\ 60^{\circ}\times\frac{B/255-R/255}{\max-\min}+120^{\circ}, & \max=G/255\\ 60^{\circ}\times\frac{R/255-G/255}{\max-\min}+240^{\circ}, & \max=B/255\end{cases}\qquad(3)$$

$$S=\begin{cases}0, & \max=0\\ \frac{\max-\min}{\max}, & \text{otherwise}\end{cases}\qquad(4)$$
V=max (5)
wherein R, G and B are the red, green and blue component values of each frame of image;
and secondly, performing video learning on video data in the data set by adopting a three-dimensional convolutional neural network, and outputting image characteristics.
And thirdly, performing cyclic training on the image features output in the second step by adopting a recurrent neural network.
Further, step one comprises the following sub-steps:
1) Using a depth camera, shoot 10 segments each of depth video, color video and infrared video per gesture scene; 10 gesture operations are preset in the data set: gesture A, gesture B, gesture C, gesture D, gesture E, gesture F, gesture G, gesture H, gesture I and gesture J;
2) Adjust the video sizes so that they share a uniform size;
3) Put the videos obtained in the previous step into separate folders and generate gesture label files;
4) Integrate the folders to complete creation of the data set.
Further, the three-dimensional convolutional neural network in step two performs the following operations:
it samples frames from the video, extracting 7 frames per second as network input; 5 channels of information are extracted from each frame, where the gray, gradient-x and gradient-y channels are obtained directly from each individual frame, and the optflow-x and optflow-y channels are extracted using the information of two consecutive frames;
taking the output of the previous layer as input, convolution is performed on the 5 input channels with 3D convolution kernels of size 7×7×3, this layer using two different 3D convolution kernels;
a max pooling operation is performed, the number of feature maps after downsampling remaining unchanged;
the two previously divided groups of feature maps are each convolved with 7×6×3 kernels, the 3D CNN applying three different convolution kernels to each group in order to increase the number of feature maps;
sampling is then carried out: each feature map is downsampled with a 3×3 kernel and convolved with a 7×4 2D convolution kernel.
The projection gesture motion recognition system based on the 3D CNN and RNN hybrid neural network comprises an image video data set acquisition module, a three-dimensional convolutional neural network and a recurrent neural network. The image video data set acquisition module collects hand data with a depth camera; the three-dimensional convolutional neural network performs video learning on the video data in the data set and outputs image features; the recurrent neural network performs cyclic training on the image features output by the three-dimensional convolutional neural network.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention collects hand data with a TOF depth camera; compared with the RGB video used by most common gesture data sets, the depth images and IR video also capture the depth of the hand. The invention introduces a new, challenging multi-modal dynamic gesture data set captured with depth, color and stereo infrared sensors, and converts the three-channel RGB model input into a six-channel RGB+HSV input, which improves recognition accuracy and underpins the gesture recognition control scheme. The invention achieves state-of-the-art performance on the self-built data set: the fused 3D CNN and RNN hybrid network substantially outperforms the previous CNN+RNN approach. Through simple interaction operations, the invention makes gesture recognition more effective and simpler, with a clearly visible recognition improvement.
Drawings
Fig. 1 is a schematic flow chart of a gesture motion recognition interaction method based on a hybrid neural network.
Fig. 2 is a schematic diagram of an image video dataset acquisition step.
Fig. 3 is a schematic diagram of a convolution operation performed by the 3D CNN on an image sequence (video) using a 3D convolution kernel.
Fig. 4 is a schematic diagram of a 3D CNN architecture.
Fig. 5 is a schematic diagram of a simple recurrent neural network structure.
Fig. 6 is a schematic diagram of the input-output principle of the recurrent neural network.
Detailed Description
The technical scheme provided by the present invention will be described in detail with reference to the following specific examples, and it should be understood that the following specific examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.
The gesture motion recognition interaction method based on the hybrid neural network provided by the invention, shown in fig. 1, comprises the following steps:
step one, image video dataset acquisition
Compared with the RGB video used by most common gesture data sets, depth images and IR video also capture the depth of the hand, so a TOF depth camera is used to collect the hand data. The acquisition procedure, shown in fig. 2, comprises the following steps:
1) A depth camera is used to capture 10 segments each of depth video, color video and infrared video per gesture scene. 10 gesture operations are preset in the data set: gesture A, gesture B, gesture C, gesture D, gesture E, gesture F, gesture G, gesture H, gesture I and gesture J.
2) The video sizes are adjusted so that they share a uniform size, in this example 640×420.
3) The videos obtained in the previous step are put into separate folders and gesture label files are generated.
4) The folders are integrated to complete creation of the data set (a hypothetical layout is sketched below).
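As an illustration of steps 3) and 4), the following minimal Python sketch walks a hypothetical folder-per-gesture layout and generates a label file; the directory name gesture_dataset, the .avi extension, the modality-in-filename convention and the labels.csv output are all assumptions made here, not details from the patent:

```python
import csv
from pathlib import Path

# Hypothetical layout: one folder per gesture (gesture_A ... gesture_J),
# each holding the depth/color/IR clips for that gesture.
DATASET_ROOT = Path("gesture_dataset")
GESTURES = [f"gesture_{c}" for c in "ABCDEFGHIJ"]  # the 10 preset gestures

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["video_path", "modality", "label"])
    for label, gesture in enumerate(GESTURES):
        for video in sorted((DATASET_ROOT / gesture).glob("*.avi")):
            # modality (depth/color/ir) assumed encoded in the file name
            modality = video.stem.split("_")[0]
            writer.writerow([str(video), modality, label])
```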
To enhance model recognition accuracy, the invention converts the three-channel RGB model input into a six-channel RGB+HSV input. HSV stands for Hue, Saturation and Value, computed as follows:
max=max(R/255,G/255,B/255) (1)
min=min(R/255,G/255,B/255) (2)
$$H=\begin{cases}0^{\circ}, & \max=\min\\ 60^{\circ}\times\frac{G/255-B/255}{\max-\min}\ \bmod\ 360^{\circ}, & \max=R/255\\ 60^{\circ}\times\frac{B/255-R/255}{\max-\min}+120^{\circ}, & \max=G/255\\ 60^{\circ}\times\frac{R/255-G/255}{\max-\min}+240^{\circ}, & \max=B/255\end{cases}\qquad(3)$$

$$S=\begin{cases}0, & \max=0\\ \frac{\max-\min}{\max}, & \text{otherwise}\end{cases}\qquad(4)$$
V=max (5)
Where R, G and B are the red, green and blue component values of each frame of image.
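For reference, a minimal numpy sketch of equations (1)-(5) follows, stacking the HSV channels onto the RGB channels to form the six-channel input; the function name and the scaling of hue to [0, 1] are choices made here, not specified by the patent:

```python
import numpy as np

def rgb_to_rgbhsv(frame: np.ndarray) -> np.ndarray:
    """Turn an H x W x 3 uint8 RGB frame into the six-channel RGB+HSV input,
    following equations (1)-(5) above."""
    rgb = frame.astype(np.float32) / 255.0            # R/255, G/255, B/255
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx = rgb.max(axis=-1)                             # eq. (1)
    mn = rgb.min(axis=-1)                             # eq. (2)
    delta = mx - mn

    # Hue, eq. (3): piecewise on which channel attains the maximum.
    h = np.zeros_like(mx)
    gray = delta == 0
    r_max = ~gray & (mx == r)
    g_max = ~gray & (mx == g) & ~r_max
    b_max = ~gray & ~r_max & ~g_max
    h[r_max] = (60.0 * (g - b)[r_max] / delta[r_max]) % 360.0
    h[g_max] = 60.0 * (b - r)[g_max] / delta[g_max] + 120.0
    h[b_max] = 60.0 * (r - g)[b_max] / delta[b_max] + 240.0

    # Saturation, eq. (4), and value, eq. (5).
    s = np.where(mx > 0.0, delta / np.where(mx > 0.0, mx, 1.0), 0.0)
    v = mx

    hsv = np.stack([h / 360.0, s, v], axis=-1)        # hue scaled to [0, 1]
    return np.concatenate([rgb, hsv], axis=-1)        # H x W x 6 float array

# Example: convert one random 60x40 frame.
frame = np.random.randint(0, 256, (60, 40, 3), np.uint8)
print(rgb_to_rgbhsv(frame).shape)                     # (60, 40, 6)
```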
Step two, video learning on the video data in the data set with a 3D CNN (three-dimensional convolutional neural network)
A conventional 2D CNN recognizes each video frame independently, applying 2D convolution kernels without considering inter-frame motion along the time dimension. A 3D CNN better captures spatial and temporal feature information in video; as shown in fig. 3, it convolves the image sequence (video) with 3D convolution kernels.
In fig. 3 the temporal extent of the convolution is 3, i.e., the convolution operates on three consecutive frames: 3D convolution stacks several consecutive frames into a cube and applies a 3D kernel within that cube. In this structure, each feature map in a convolutional layer is connected to multiple adjacent consecutive frames of the previous layer, thereby capturing motion information.
3D CNNs are well suited to spatiotemporal feature learning. Compared with 2D CNNs, they model temporal information better through 3D convolution and 3D pooling: in a 3D CNN these operations act over space and time, whereas in a 2D CNN they act over space only, so 3D convolution preserves the temporal information of the input signal. The 3D CNN structure is shown in fig. 4.
The 3D CNN samples the video frames, extracting 7 frames of 60×40 images per second as network input. From each frame, 5 channels of information are extracted: gray, gradient-x and gradient-y are obtained directly from each individual frame, while the two optical-flow channels (optflow-x and optflow-y) require the information of two consecutive frames. The H1 layer therefore has 7+7+7+6+6 = 33 feature maps, each still of size 60×40.
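A small OpenCV sketch of these five hardwired channels for one frame pair follows; the patent does not name specific gradient or optical-flow operators, so Sobel gradients and Farneback optical flow are illustrative assumptions:

```python
import cv2
import numpy as np

# Sketch of the five hardwired H1 channels for one (previous, current) pair.
prev = np.random.randint(0, 256, (60, 40), np.uint8)   # previous 60x40 frame
cur = np.random.randint(0, 256, (60, 40), np.uint8)    # current 60x40 frame

gray = cur.astype(np.float32)                # channel 1: gray
grad_x = cv2.Sobel(cur, cv2.CV_32F, 1, 0)    # channel 2: gradient-x
grad_y = cv2.Sobel(cur, cv2.CV_32F, 0, 1)    # channel 3: gradient-y
flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
optflow_x, optflow_y = flow[..., 0], flow[..., 1]      # channels 4 and 5
print(gray.shape, grad_x.shape, flow.shape)            # (60, 40) ... (60, 40, 2)
```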
The output of that layer is then taken as input, and each of the 5 input channels is convolved with 3D convolution kernels of size 7×7×3 (7×7 spatial, 3 temporal). To increase the number of feature maps, two different 3D kernels are used at this layer, so the C2 layer has ((7−3)+1)×3 + ((6−3)+1)×2 = 23 feature maps per kernel, i.e., 23×2 in total, each of size ((60−7)+1)×((40−7)+1) = 54×34.
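This shape arithmetic can be checked with a short PyTorch sketch (the framework is an assumption; the patent names none). Conv3d takes input of shape (batch, channels, frames, height, width), so the 7×7×3 kernel of the text corresponds to kernel_size=(3, 7, 7) here; the 2×2 spatial max pooling described next is previewed as well:

```python
import torch
import torch.nn as nn

clip = torch.randn(1, 1, 7, 60, 40)              # 7 frames of 60x40, one channel
c2 = nn.Conv3d(1, 2, kernel_size=(3, 7, 7))      # two different 3D kernels
s3 = nn.MaxPool3d(kernel_size=(1, 2, 2))         # downsample space, keep time
x = c2(clip)
print(x.shape)      # torch.Size([1, 2, 5, 54, 34]): (7-3+1) frames of 54x34
print(s3(x).shape)  # torch.Size([1, 2, 5, 27, 17]): 27x17 after 2x2 pooling
```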
next, a max mapping operation is performed, and the number of characteristic maps after downsampling remains unchanged, so that the number of characteristic maps of the S3 layer remains as follows: 23 x 2, the size of the feature maps is: ((54/2) × (34/2) =27×17.
Next, the two previously divided groups of feature maps are each convolved with 7×6×3 kernels; to increase the number of feature maps, the 3D CNN applies three different convolution kernels to each group. The C4 layer thus has 13×3×2 = 13×6 feature maps, each of size ((27−7)+1)×((17−6)+1) = 21×12.
Next comes the sampling step: a 3×3 kernel downsamples each feature map to size 7×4. Each feature map is then convolved with a 7×4 2D convolution kernel, reducing every map to size 1×1.
The present invention proposes a network combining the 3D CNN with connectionist temporal classification (CTC). CTC enables gesture classification based on the core phase of the gesture without explicit pre-segmentation, addressing the problems of low gesture-detection accuracy and severe delay, which are key elements of gesture interaction.
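As a hedged sketch of how CTC-based classification without pre-segmentation can look, the following uses torch.nn.CTCLoss with class 0 as the blank ("no gesture") label; the blank index, sizes and framework are standard illustrative choices, not details taken from the patent:

```python
import torch
import torch.nn as nn

# Per-timestep class scores over a gesture stream; the blank class lets the
# loss align the gesture label with its core phase, no pre-segmentation needed.
T, B, C = 20, 1, 11                        # timesteps, batch, 10 gestures + blank
log_probs = torch.randn(T, B, C, requires_grad=True).log_softmax(2)
targets = torch.tensor([[3]])              # one gesture (class 3) in the stream
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.tensor([T]),
                           target_lengths=torch.tensor([1]))
loss.backward()                            # gradients flow back to the scores
print(float(loss))
```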
And outputting image characteristics after the action learning of the video is performed through the steps.
Step three, cyclic training of the image features output in step two with an RNN (recurrent neural network)
An RNN (Recurrent Neural Network) is a type of neural network for processing sequence data. Time-series data are data collected at different points in time, reflecting how the state of something changes over time.
A simple recurrent neural network, shown in fig. 5, consists of an input layer, a hidden layer and an output layer. In the figure, x is a vector holding the values of the input layer; s is a vector holding the values of the hidden layer; U is the weight matrix from the input layer to the hidden layer; o is a vector holding the values of the output layer; and V is the weight matrix from the hidden layer to the output layer.
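In the notation of fig. 5 (with a hidden-to-hidden weight matrix W implied by the recurrence), a minimal numpy sketch of one pass over a sequence might look as follows; all sizes are illustrative:

```python
import numpy as np

# Simple recurrent network of fig. 5:
#   s_t = tanh(U x_t + W s_{t-1}),  o = softmax(V s_T)
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 32, 64, 10                  # feature, hidden, class sizes
U = rng.normal(scale=0.1, size=(n_hid, n_in))    # input -> hidden weights
W = rng.normal(scale=0.1, size=(n_hid, n_hid))   # hidden -> hidden weights
V = rng.normal(scale=0.1, size=(n_out, n_hid))   # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

s = np.zeros(n_hid)                              # initial hidden state
for x in rng.normal(size=(7, n_in)):             # a 7-step feature sequence
    s = np.tanh(U @ x + W @ s)                   # hidden state carries history
o = softmax(V @ s)                               # class scores from final state
print(o.shape, o.sum())                          # (10,) 1.0
```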
Fig. 6 illustrates the input-output principle of the recurrent neural network: X is the data input, and the hidden state h extracts features and produces the output y, which is passed on to the next step, so that every earlier step is represented in the later ones. The recurrent network processes a sequence by traversing its elements while keeping a state that encodes what has been seen so far: in effect, the RNN is a for loop that reuses the results of the previous iteration of the loop. This structure lets us process the sequence data extracted by the 3D CNN efficiently.
In this step, the RNN performs the cyclic training and finally outputs the recognition result.
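Putting steps two and three together, a minimal PyTorch sketch of a hybrid 3D CNN + RNN classifier over the six-channel clips might look as follows; the layer sizes and the use of nn.RNN are illustrative assumptions under the structure described above, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class Hybrid3DCNNRNN(nn.Module):
    """Minimal sketch of the hybrid pipeline: a small 3D CNN extracts
    spatiotemporal features from each clip of frames, and an RNN aggregates
    the clip features over time before classifying into the 10 gestures."""

    def __init__(self, in_channels=6, num_classes=10, hidden=128):
        super().__init__()
        self.cnn3d = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=(3, 7, 7)), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5)), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                   # -> (N, 32, 1, 1, 1)
        )
        self.rnn = nn.RNN(input_size=32, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):
        # clips: (batch, steps, channels, frames, height, width)
        b, t = clips.shape[:2]
        feats = self.cnn3d(clips.flatten(0, 1)).flatten(1)   # (b*t, 32)
        out, _ = self.rnn(feats.view(b, t, -1))              # (b, t, hidden)
        return self.fc(out[:, -1])                           # gesture logits

model = Hybrid3DCNNRNN()
logits = model(torch.randn(2, 4, 6, 7, 60, 40))   # 2 samples, 4 clips of 7 frames
print(logits.shape)                               # torch.Size([2, 10])
```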
The invention also provides a gesture motion recognition interaction system based on the hybrid neural network, comprising an image video data set acquisition module, a three-dimensional convolutional neural network and a recurrent neural network. The acquisition module collects hand data with a depth camera, implementing step one above; the three-dimensional convolutional neural network performs video learning on the video data in the data set and outputs image features, implementing step two; and the recurrent neural network performs cyclic training on those features, implementing step three.
The technical means disclosed by the invention are not limited to those of the embodiment above, but include any technical scheme formed by combining the disclosed technical features. It should be noted that modifications and adaptations apparent to those skilled in the art without departing from the principles of the invention also fall within its scope.

Claims (3)

1. A projection gesture motion recognition method based on a 3D CNN and RNN hybrid neural network, characterized by comprising the following steps:
step one, image video dataset acquisition
Collecting hand data by using a depth camera, and creating a data set;
at model input, the three-channel RGB input is converted into a six-channel RGB+HSV input, where H, S and V denote hue, saturation and value (brightness), computed as follows:
max = max(R/255,G/255,B/255) (1)
min = min(R/255,G/255,B/255) (2)
$$H=\begin{cases}0^{\circ}, & \max=\min\\ 60^{\circ}\times\frac{G/255-B/255}{\max-\min}\ \bmod\ 360^{\circ}, & \max=R/255\\ 60^{\circ}\times\frac{B/255-R/255}{\max-\min}+120^{\circ}, & \max=G/255\\ 60^{\circ}\times\frac{R/255-G/255}{\max-\min}+240^{\circ}, & \max=B/255\end{cases}\qquad(3)$$

$$S=\begin{cases}0, & \max=0\\ \frac{\max-\min}{\max}, & \text{otherwise}\end{cases}\qquad(4)$$
V = max (5)
wherein R, G and B are the red, green and blue component values of each frame of image;
step two, video learning is carried out on the video data in the gesture action data set with a three-dimensional convolutional neural network, and image features are output;
the three-dimensional convolutional neural network performs the following operations:
it samples frames from the video, extracting 7 frames per second as network input; 5 channels of information are extracted from each frame, wherein the gray, gradient-x and gradient-y channels are obtained directly from each individual frame, and the optflow-x and optflow-y channels are extracted using the information of two consecutive frames;
taking the output of the previous layer as input, convolution is performed on the 5 input channels with 3D convolution kernels of size 7×7×3, this layer using two different 3D convolution kernels;
a max pooling operation is performed, the number of feature maps after downsampling remaining unchanged;
the two previously divided groups of feature maps are each convolved with 7×6×3 kernels, the 3D CNN applying three different convolution kernels to each group in order to increase the number of feature maps;
sampling is carried out: each feature map is downsampled with a 3×3 kernel and then convolved with a 7×4 2D convolution kernel;
and thirdly, performing cyclic training on the image features output in the second step by adopting a recurrent neural network, and finally outputting gesture motion recognition results.
2. The projection gesture motion recognition method based on the 3D CNN and RNN hybrid neural network according to claim 1, wherein step one comprises the following sub-steps:
1) using a depth camera, shoot 10 segments each of depth video, color video and infrared video per gesture scene; 10 gesture operations are preset in the data set: gesture A, gesture B, gesture C, gesture D, gesture E, gesture F, gesture G, gesture H, gesture I and gesture J;
2) adjusting the video sizes so that they share a uniform size;
3) putting the videos obtained in the previous step into separate folders and generating gesture label files;
4) integrating the folders to complete creation of the data set.
3. A projection gesture motion recognition system based on the 3D CNN and RNN hybrid neural network, characterized by comprising an image video data set acquisition module, a three-dimensional convolutional neural network and a recurrent neural network, the system implementing the projection gesture motion recognition method based on the 3D CNN and RNN hybrid neural network according to any one of claims 1-2; the image video data set acquisition module collects hand data with a depth camera; the three-dimensional convolutional neural network performs video learning on the video data in the data set and outputs image features; and the recurrent neural network performs cyclic training on the image features output by the three-dimensional convolutional neural network.
CN202110361015.0A 2021-04-02 2021-04-02 Gesture motion recognition interaction system and method based on hybrid neural network Active CN113052112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110361015.0A CN113052112B (en) 2021-04-02 2021-04-02 Gesture motion recognition interaction system and method based on hybrid neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110361015.0A CN113052112B (en) 2021-04-02 2021-04-02 Gesture motion recognition interaction system and method based on hybrid neural network

Publications (2)

Publication Number Publication Date
CN113052112A CN113052112A (en) 2021-06-29
CN113052112B (en) 2023-06-02

Family

ID=76517207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110361015.0A Active CN113052112B (en) 2021-04-02 2021-04-02 Gesture motion recognition interaction system and method based on hybrid neural network

Country Status (1)

Country Link
CN (1) CN113052112B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11126835B2 (en) * 2019-02-21 2021-09-21 Tata Consultancy Services Limited Hand detection in first person view

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010144050A1 (en) * 2009-06-08 2010-12-16 Agency For Science, Technology And Research Method and system for gesture based manipulation of a 3-dimensional image of object
CN107590432A (en) * 2017-07-27 2018-01-16 Beijing Union University Gesture recognition method based on a recurrent three-dimensional convolutional neural network
CN107679491A (en) * 2017-09-29 2018-02-09 Central China Normal University 3D convolutional neural network sign language recognition method fusing multi-modal data
CN108334814A (en) * 2018-01-11 2018-07-27 Zhejiang University of Technology AR system gesture recognition method based on a convolutional neural network combined with analysis of users' habitual behaviour
US10304208B1 (en) * 2018-02-12 2019-05-28 Avodah Labs, Inc. Automated gesture identification using neural networks
CN109344701A (en) * 2018-08-23 2019-02-15 Wuhan Chang'e Medical Anti-Aging Robot Co., Ltd. Kinect-based dynamic gesture recognition method
CN110532912A (en) * 2019-08-19 2019-12-03 Hefei University Sign language translation implementation method and device
CN211293894U (en) * 2019-11-27 2020-08-18 South China University of Technology In-air handwriting interaction device
CN111079581A (en) * 2019-12-03 2020-04-28 Guangzhou Jiubang Century Technology Co., Ltd. Method and device for identifying human skin
CN111079641A (en) * 2019-12-13 2020-04-28 iFlytek Co., Ltd. Answering content identification method, related device and readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
3D Convolutional Neural Networks for Human Action Recognition; Shuiwang Ji et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231 *
Aviation Medical Simulation Training Based on Interactive Technology; Lin Jiang et al.; 2021 IEEE 4th International Conference on Computer and Communication Engineering Technology (CCET), pp. 387-391 *
DeepFinger: A Cascade Convolutional Neuron Network Approach to Finger Key Point Detection in Egocentric Vision with Mobile Camera; Yichao Huang et al.; 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 2944-2949 *
Research on Video Gesture Recognition Based on Improved CNN+RNN; Ding Xiaoxue; China Masters' Theses Full-text Database, Information Science and Technology, no. 07, I138-1139 *

Also Published As

Publication number Publication date
CN113052112A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN109344701B (en) Kinect-based dynamic gesture recognition method
Anwar et al. Image colorization: A survey and dataset
WO2021098261A1 (en) Target detection method and apparatus
Deng et al. MVF-Net: A multi-view fusion network for event-based object classification
Agarwal et al. Anubhav: recognizing emotions through facial expression
CN108363973B (en) Unconstrained 3D expression migration method
CN113343950B (en) Video behavior identification method based on multi-feature fusion
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN112487981A (en) MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
CN110942037A (en) Action recognition method for video analysis
CN113673584A (en) Image detection method and related device
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN114821764A (en) Gesture image recognition method and system based on KCF tracking detection
TW202145065A (en) Image processing method, electronic device and computer-readable storage medium
CN115328319A (en) Intelligent control method and device based on light-weight gesture recognition
CN104408444A (en) Human body action recognition method and device
Wani et al. Deep learning-based video action recognition: a review
US20240161461A1 (en) Object detection method, object detection apparatus, and object detection system
Le et al. Facial detection in low light environments using OpenCV
CN111881803B (en) Face recognition method based on improved YOLOv3
CN113052112B (en) Gesture motion recognition interaction system and method based on hybrid neural network
Howe et al. Comparison of hand segmentation methodologies for hand gesture recognition
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
Meshram et al. Convolution Neural Network based Hand Gesture Recognition System
CN115423982A (en) Desktop curling three-dimensional detection method based on image and depth

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant