CN113052112B - Gesture motion recognition interaction system and method based on hybrid neural network - Google Patents
- Publication number
- CN113052112B CN113052112B CN202110361015.0A CN202110361015A CN113052112B CN 113052112 B CN113052112 B CN 113052112B CN 202110361015 A CN202110361015 A CN 202110361015A CN 113052112 B CN113052112 B CN 113052112B
- Authority
- CN
- China
- Prior art keywords
- gesture
- neural network
- video
- cnn
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
Abstract
The invention discloses a projection gesture motion recognition interaction method and system based on a hybrid 3D CNN and RNN neural network. By capturing depth as well as color imagery, the invention obtains depth information about the hand and improves recognition accuracy, achieving state-of-the-art performance on a self-built data set. The fusion of the 3D CNN and RNN networks yields a marked improvement over previous CNN+RNN algorithms.
Description
Technical Field
The invention belongs to the technical field of image recognition, and relates to a gesture motion recognition interaction system and method based on a hybrid neural network.
Background
In recent years, with the rise of artificial intelligence, machine learning and deep learning have swept through computing. Human-machine interaction has become a major research topic in machine vision, and intelligent devices with human-machine interaction capabilities are developing rapidly in the market. Gestures, the most commonly used form of human interaction in daily life, have already been applied in many smart devices.
Gestures and hand poses are a common form of human communication, so it is natural for humans to interact with machines in the same way. For example, simple gesture-based interaction can improve the comfort and safety of automobiles; it makes controlling smart-home devices more convenient; and gesture recognition with high accuracy lets VR/AR applications run more smoothly.
Gesture recognition is classified into static and dynamic gesture recognition. Static gesture recognition is trained on still images, while dynamic gesture recognition is trained on hand motion, i.e., movements of the hand detected in real-time video. Gesture recognition means interpreting the actions of a human hand. Researchers have proposed a variety of gesture recognition techniques based on depth cameras, color cameras, distance sensors, wearable inertial sensors, and other sensor modalities. Some computer-vision-based approaches handle only static gestures, which makes the interaction unnatural. In real human-computer interaction systems, automatic detection and classification of dynamic gestures is challenging because (1) people differ greatly in how they perform gestures, and (2) the system must work online to avoid a significant delay between performing a gesture and classifying it.
Disclosure of Invention
To solve these problems, the invention provides a projection gesture motion recognition interaction method and system based on a hybrid 3D CNN and RNN neural network. First, depth, color and infrared videos of the hand are acquired with a depth camera and converted to a uniform format. The video files are then grouped and fed into a 3D CNN (three-dimensional convolutional neural network) to learn the motion in the videos and output image features. These features are then trained recurrently with an RNN (recurrent neural network), and finally the recognition result is output.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a projection gesture motion recognition method based on a 3D CNN and RNN hybrid neural network comprises the following steps:
step one, image video dataset acquisition
Collecting hand data by using a depth camera, and creating a data set;
when the model is input, the RGB three-channel model input is converted into an RGB+HSV six-channel model input, wherein H, S and V represent hue, saturation and value (brightness) respectively, and the expressions are as follows:

max = max(R/255, G/255, B/255) (1)

min = min(R/255, G/255, B/255) (2)

H = 60°×(G/255−B/255)/(max−min) when max = R/255 (plus 360° if the result is negative); H = 60°×(2+(B/255−R/255)/(max−min)) when max = G/255; H = 60°×(4+(R/255−G/255)/(max−min)) when max = B/255; H = 0 when max = min (3)

S = (max−min)/max, with S = 0 when max = 0 (4)

V = max (5)

wherein R, G and B are the red, green and blue component values of each frame of image;
and secondly, performing video learning on video data in the data set by adopting a three-dimensional convolutional neural network, and outputting image characteristics.
And thirdly, performing cyclic training on the image features output in the second step by adopting a recurrent neural network.
Further, the first step includes the following sub-steps:
1) Use a depth camera to shoot 10 segments each of depth video, color video and infrared video in every gesture scene; 10 gesture operations are preset in the data set, namely: gesture A, gesture B, gesture C, gesture D, gesture E, gesture F, gesture G, gesture H, gesture I and gesture J;
2) Adjusting the video sizes to maintain a uniform size;
3) And putting the video obtained in the previous step into different folders to generate a gesture label file.
4) And integrating the folders to complete the creation of the data set.
Further, the three-dimensional convolutional neural network in the second step performs the following operations:
the three-dimensional convolutional neural network samples the video at 7 frames per second as network input; five channels of information are extracted from each frame, wherein the gray, gradient-x and gradient-y channels are obtained directly by operating on each frame, and the optflow-x and optflow-y channels are extracted using the information of two consecutive frames;

taking the output of the previous layer as input, a convolution operation is performed on the 5 input channels using 3D convolution kernels of size 7×7×3, this layer using two different 3D convolution kernels;

a max pooling operation is performed, with the number of feature maps after downsampling remaining unchanged;

the two previously divided groups of feature maps are each convolved with 7×6×3 kernels, and to increase the number of feature maps the 3D CNN applies three different convolution kernels to each of the two groups;

downsampling is then performed on each feature map with a 3×3 kernel, and each feature map is finally convolved with a 7×4 2D convolution kernel.
The projection gesture motion recognition system based on the 3D CNN and RNN hybrid neural network comprises an image video data set acquisition module, a three-dimensional convolution neural network and a recurrent neural network; the image video data set acquisition module is used for acquiring hand data by adopting a depth camera; the three-dimensional convolutional neural network is used for carrying out video learning on video data in the data set to output image characteristics; the recurrent neural network is used for carrying out cyclic training on the image characteristics output by the three-dimensional convolutional neural network.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention adopts a TOF depth camera to collect hand data; compared with the RGB video used by most common gesture data sets, the depth and IR video additionally provide depth information about the hand. The invention introduces a new and challenging multi-modal dynamic gesture data set captured with depth, color and stereo-infrared sensors, and converts the RGB three-channel model input into an RGB+HSV six-channel input, which improves recognition accuracy and supports a reliable gesture-recognition control scheme. The invention achieves state-of-the-art performance on the self-built data set; the fusion of the 3D CNN and RNN networks yields a marked improvement over previous CNN+RNN algorithms. Through simple interaction operations, the invention makes gesture recognition more effective and simpler, with a clear recognition effect.
Drawings
Fig. 1 is a schematic flow chart of a gesture motion recognition interaction method based on a hybrid neural network.
Fig. 2 is a schematic diagram of an image video dataset acquisition step.
Fig. 3 is a schematic diagram of a convolution operation performed by the 3D CNN on an image sequence (video) using a 3D convolution kernel.
Fig. 4 is a schematic diagram of a 3D CNN architecture.
Fig. 5 is a schematic diagram of a simple recurrent neural network structure.
Fig. 6 is a schematic diagram of the input-output principle of the recurrent neural network.
Detailed Description
The technical scheme provided by the present invention will be described in detail with reference to the following specific examples, and it should be understood that the following specific examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.
The gesture motion recognition interaction method based on the hybrid neural network provided by the invention comprises the following steps, as shown in fig. 1:
step one, image video dataset acquisition
Compared with the RGB video used by most common gesture data sets, depth and IR video additionally provide depth information about the hand. A TOF depth camera is therefore used to collect the hand data; the collection procedure is shown in fig. 2 and consists of the following steps:
1) A depth camera was used to capture 10 segments of depth video, color video, infrared video in each gesture scene. 10 gesture operations are preset in the dataset, and the gesture operations are respectively as follows: gesture a, gesture B, gesture C, gesture D, gesture E, gesture F, gesture G, gesture H, gesture I, gesture J.
2) The videos are resized to a uniform size, in this example 640×420.
3) And putting the video obtained in the previous step into different folders to generate a gesture label file.
4) And integrating the folders to complete the creation of the data set.
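As a concrete illustration of steps 3) and 4), the sketch below organizes clips into one folder per gesture and writes a gesture-label file. The directory layout and `labels.txt` name are hypothetical, not taken from the patent:

```python
import os
import tempfile

# Hypothetical layout: one sub-folder per gesture (A..J), plus a labels.txt
# file mapping each gesture name to a numeric class index.
gestures = [chr(c) for c in range(ord("A"), ord("J") + 1)]  # the 10 preset gestures

root = tempfile.mkdtemp(prefix="gesture_dataset_")
with open(os.path.join(root, "labels.txt"), "w") as fh:
    for idx, name in enumerate(gestures):
        # depth / color / infrared clips for this gesture would be copied here
        os.makedirs(os.path.join(root, name), exist_ok=True)
        fh.write(f"{name} {idx}\n")  # one line per gesture label
```

A training loader would then walk `root`, reading each clip together with the label of its containing folder.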
To improve recognition accuracy, the invention converts the RGB three-channel model input into an RGB+HSV six-channel model input. H, S and V stand for hue, saturation and value (brightness), respectively:
max = max(R/255, G/255, B/255) (1)

min = min(R/255, G/255, B/255) (2)

H = 60°×(G/255−B/255)/(max−min) when max = R/255 (plus 360° if the result is negative); H = 60°×(2+(B/255−R/255)/(max−min)) when max = G/255; H = 60°×(4+(R/255−G/255)/(max−min)) when max = B/255; H = 0 when max = min (3)

S = (max−min)/max, with S = 0 when max = 0 (4)

V = max (5)

where R, G and B are the red, green and blue component values of each frame of image.
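A minimal NumPy sketch of this six-channel conversion is given below. The function name and vectorization are my own; the formulas are the standard RGB-to-HSV conversion, with hue expressed in [0, 1) rather than degrees:

```python
import numpy as np

def rgb_to_rgbhsv(frame):
    """Convert an (H, W, 3) uint8 RGB frame into a six-channel RGB+HSV
    array (all channels scaled to [0, 1]) following equations (1)-(5)."""
    rgb = frame.astype(np.float64) / 255.0               # R/255, G/255, B/255
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx = rgb.max(axis=-1)                                # (1) max
    mn = rgb.min(axis=-1)                                # (2) min
    delta = mx - mn
    safe = np.where(delta > 0, delta, 1.0)               # avoid division by zero
    # (3) hue, here normalized to [0, 1) instead of degrees
    is_r = (delta > 0) & (mx == r)
    is_g = (delta > 0) & (mx == g) & ~is_r
    is_b = (delta > 0) & (mx == b) & ~is_r & ~is_g
    h = np.zeros_like(mx)
    h = np.where(is_r, ((g - b) / safe % 6.0) / 6.0, h)
    h = np.where(is_g, ((b - r) / safe + 2.0) / 6.0, h)
    h = np.where(is_b, ((r - g) / safe + 4.0) / 6.0, h)
    s = np.where(mx > 0, delta / np.where(mx > 0, mx, 1.0), 0.0)  # (4)
    v = mx                                               # (5) V = max
    return np.concatenate([rgb, np.stack([h, s, v], axis=-1)], axis=-1)
```

Applying this per frame turns each RGB video frame into the six-channel model input described above.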
Step two, video learning on the video data in the data set using a 3D CNN (three-dimensional convolutional neural network)
A conventional 2D CNN recognizes each frame of the video separately, performing convolution with 2D kernels and ignoring inter-frame motion information along the time dimension. A 3D CNN can better capture temporal and spatial feature information in video: as shown in fig. 3, the 3D CNN convolves the image sequence (video) with a 3D convolution kernel.
In fig. 3, the time dimension of the convolution is 3, i.e., the convolution operates on three consecutive frames: 3D convolution stacks several consecutive frames into a cube and applies a 3D kernel within that cube. In this structure, each feature map in a convolutional layer is connected to multiple adjacent consecutive frames in the previous layer, thereby capturing motion information.
3D CNNs are well suited to spatiotemporal feature learning. Compared with 2D CNNs, they model temporal information better through 3D convolution and 3D pooling: in a 3D CNN these operations are performed spatiotemporally, whereas in a 2D CNN they are performed only spatially, so 3D convolution preserves the temporal information of the input signal. The 3D CNN structure is shown in fig. 4.
The 3D CNN samples the video at 7 frames per second, each frame a 60×40 image, as network input. Five channels are extracted per frame: gray, gradient-x and gradient-y are obtained directly by operating on each frame, while optflow-x and optflow-y are extracted from pairs of consecutive frames, so 7 frames yield only 6 maps per flow direction. The H1 layer therefore has 7+7+7+6+6 = 33 feature maps, each still of size 60×40.
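The hardwired H1 layer can be sketched as follows. Note that a real optical-flow routine is replaced here by simple temporal differencing purely to keep the sketch self-contained, so the two flow channels are only stand-ins:

```python
import numpy as np

def hardwired_h1(frames):
    """frames: (T, H, W) grayscale clip. Returns the list of H1 feature maps:
    T gray maps, T gradient-x maps, T gradient-y maps, and T-1 maps for each
    optical-flow direction (approximated by frame differencing)."""
    gray = [f.astype(np.float64) for f in frames]
    grad_x = [np.gradient(f.astype(np.float64), axis=1) for f in frames]
    grad_y = [np.gradient(f.astype(np.float64), axis=0) for f in frames]
    # Stand-in for optflow-x / optflow-y: each flow map needs two consecutive
    # frames, which is why 7 input frames yield only 6 maps per direction.
    flow = [frames[t + 1].astype(np.float64) - frames[t].astype(np.float64)
            for t in range(len(frames) - 1)]
    return gray + grad_x + grad_y + flow + flow

clip = np.random.default_rng(0).integers(0, 256, size=(7, 60, 40))
maps = hardwired_h1(clip)   # 7 + 7 + 7 + 6 + 6 = 33 maps, each 60x40
```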
The output of this layer is then taken as input, and the 5 input channels are each convolved with 3D kernels of size 7×7×3. To increase the number of feature maps, two different 3D kernels are used at this layer, so the C2 layer has ((7−3)+1)×3 + ((6−3)+1)×2 = 23 feature maps per kernel, i.e., 23×2 in total, each of size ((60−7)+1)×((40−7)+1) = 54×34.
next, a max mapping operation is performed, and the number of characteristic maps after downsampling remains unchanged, so that the number of characteristic maps of the S3 layer remains as follows: 23 x 2, the size of the feature maps is: ((54/2) × (34/2) =27×17.
Next, the two previously divided groups of feature maps are each convolved with 7×6×3 kernels; to increase the number of feature maps, the 3D CNN applies three different kernels to each of the two groups. The C4 layer thus has 13×3×2 = 13×6 feature maps, each of size ((27−7)+1)×((17−6)+1) = 21×12.
A further downsampling with a 3×3 kernel then reduces each feature map to size 7×4. Finally, each 7×4 feature map is convolved with a 7×4 2D kernel, reducing it to 1×1.
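The shape arithmetic of the whole pipeline above can be checked mechanically. The helper below just applies the "valid" convolution rule used throughout (output = input − kernel + 1); all numbers come from the layer walk-through above:

```python
def valid(size, kernel):
    # 'valid' convolution/pooling: output length = input - kernel + 1
    return size - kernel + 1

# H1: hardwired channels from 7 input frames of 60x40
h1_maps = 7 + 7 + 7 + 6 + 6                              # 33

# C2: 7x7x3 3D kernels (temporal extent 3), two different kernels
c2_maps_per_kernel = valid(7, 3) * 3 + valid(6, 3) * 2   # 5*3 + 4*2 = 23
c2_size = (valid(60, 7), valid(40, 7))                   # (54, 34)

# S3: 2x2 max pooling, map count unchanged
s3_size = (c2_size[0] // 2, c2_size[1] // 2)             # (27, 17)

# C4: 7x6x3 kernels, three different kernels per group
c4_size = (valid(s3_size[0], 7), valid(s3_size[1], 6))   # (21, 12)

# S5: 3x3 downsampling
s5_size = (c4_size[0] // 3, c4_size[1] // 3)             # (7, 4)

# C6: 7x4 2D kernels collapse every map to a single value
c6_size = (valid(s5_size[0], 7), valid(s5_size[1], 4))   # (1, 1)
```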
The invention proposes a network combining the 3D CNN with connectionist temporal classification (CTC). CTC allows a gesture to be classified from its core (nucleus) phase without explicit pre-segmentation. This addresses both low detection accuracy and severe latency, which are key requirements for gesture interaction.
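CTC itself is a training-time loss, but its decoding rule is easy to illustrate: per-frame predictions are collapsed by merging consecutive repeats and dropping the blank symbol, which is what lets the network commit to a gesture label from the gesture's core phase alone. A minimal sketch (the choice of blank index is an assumption):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence CTC-style:
    merge consecutive repeats, then drop the blank symbol."""
    decoded = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            decoded.append(lab)
        prev = lab
    return decoded

# e.g. with blank=0: [0, 3, 3, 0, 0, 3, 5, 5, 0] decodes to [3, 3, 5]
```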
Through the above steps, the motion in the video is learned and image features are output.
Step three, recurrent training of the image features output in step two using an RNN (recurrent neural network)
An RNN (recurrent neural network) is a type of neural network for processing sequence data. Time-series data are data collected at different points in time; they reflect how the state of something changes over time.
A simple recurrent neural network, shown in fig. 5, consists of an input layer, a hidden layer and an output layer. In the figure, x is a vector holding the values of the input layer; s is a vector holding the values of the hidden layer; U is the weight matrix from the input layer to the hidden layer; o is a vector holding the values of the output layer; and V is the weight matrix from the hidden layer to the output layer.
Fig. 6 shows the input-output principle of the recurrent neural network: X is the data input, the hidden state h extracts features and produces the output y, and h is passed on to the next time step, so that information from every earlier step is represented at the next one. The recurrent network processes a sequence by traversing its elements while maintaining a state that encodes what has been seen so far. An RNN is, in effect, a for loop that reuses the results computed in the previous iteration. This structure lets us efficiently process the sequence data extracted by the 3D CNN.
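That for-loop view of the RNN, with input weights U, recurrent weights W and output weights V as in fig. 5, can be sketched in a few lines of NumPy. All sizes here are illustrative, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_h, D_out, T = 33, 64, 10, 7   # e.g. 33 features per step, 10 gestures

U = rng.standard_normal((D_h, D_in)) * 0.1   # input  -> hidden
W = rng.standard_normal((D_h, D_h)) * 0.1    # hidden -> hidden (recurrence)
V = rng.standard_normal((D_out, D_h)) * 0.1  # hidden -> output

def rnn_forward(xs):
    """xs: (T, D_in) sequence of per-timestep features; returns (T, D_out)."""
    h = np.zeros(D_h)
    ys = []
    for x in xs:                    # the "for loop" over sequence elements
        h = np.tanh(U @ x + W @ h)  # state carries information seen so far
        ys.append(V @ h)
    return np.stack(ys)

ys = rnn_forward(rng.standard_normal((T, D_in)))
```

Training such a network (backpropagation through time, or an LSTM/GRU variant) is omitted; the point is only the reuse of the previous iteration's hidden state.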
In this step, the RNN performs the recurrent training, and finally the recognition result is output.
The invention also provides a gesture motion recognition interaction system based on the hybrid neural network, comprising an image video data set acquisition module, a three-dimensional convolutional neural network and a recurrent neural network. The acquisition module collects hand data with a depth camera, implementing step one; the three-dimensional convolutional neural network performs video learning on the data set and outputs image features, implementing step two; and the recurrent neural network recurrently trains on those image features, implementing step three.
The technical means disclosed by the invention are not limited to those disclosed in the embodiment above, but also include technical solutions formed by any combination of the technical features. It should be noted that modifications and adaptations may occur to those skilled in the art without departing from the principles of the invention, and such modifications are intended to fall within its scope.
Claims (3)
1. The projection gesture motion recognition method based on the 3D CNN and RNN hybrid neural network is characterized by comprising the following steps of:
step one, image video dataset acquisition
Collecting hand data by using a depth camera, and creating a data set;
when the model is input, the RGB three-channel model input is converted into an RGB+HSV six-channel model input, wherein H, S and V represent hue, saturation and value (brightness) respectively, and the expressions are as follows:

max = max(R/255, G/255, B/255) (1)

min = min(R/255, G/255, B/255) (2)

H = 60°×(G/255−B/255)/(max−min) when max = R/255 (plus 360° if the result is negative); H = 60°×(2+(B/255−R/255)/(max−min)) when max = G/255; H = 60°×(4+(R/255−G/255)/(max−min)) when max = B/255; H = 0 when max = min (3)

S = (max−min)/max, with S = 0 when max = 0 (4)

V = max (5)

wherein R, G and B are the red, green and blue component values of each frame of image;
secondly, video learning is carried out on video data in the gesture action data set by adopting a three-dimensional convolutional neural network, and image features are output;
the three-dimensional convolutional neural network performs the following operations:
the three-dimensional convolutional neural network samples the video at 7 frames per second as network input; five channels of information are extracted from each frame, wherein the gray, gradient-x and gradient-y channels are obtained directly by operating on each frame, and the optflow-x and optflow-y channels are extracted using the information of two consecutive frames;

taking the output of the previous layer as input, a convolution operation is performed on the 5 input channels using 3D convolution kernels of size 7×7×3, this layer using two different 3D convolution kernels;

a max pooling operation is performed, with the number of feature maps after downsampling remaining unchanged;

the two previously divided groups of feature maps are each convolved with 7×6×3 kernels, and to increase the number of feature maps the 3D CNN applies three different convolution kernels to each of the two groups;

downsampling is performed on each feature map with a 3×3 kernel, and each feature map is convolved with a 7×4 2D convolution kernel;
and thirdly, performing cyclic training on the image features output in the second step by adopting a recurrent neural network, and finally outputting gesture motion recognition results.
2. The method for recognizing the projected gesture motion based on the 3D CNN and RNN hybrid neural network according to claim 1, wherein the first step comprises the following sub-steps:
1) Using a depth camera to shoot 10 sections of depth video, color video and infrared video in each gesture scene, presetting 10 gesture operations in a data set, wherein the gesture operations are as follows: gesture a, gesture B, gesture C, gesture D, gesture E, gesture F, gesture G, gesture H, gesture I, gesture J;
2) Adjusting the video sizes to maintain a uniform size;
3) Putting the video obtained in the previous step into different folders to generate a gesture label file;
4) And integrating the folders to complete the creation of the data set.
3. The projection gesture motion recognition system based on the 3D CNN and RNN hybrid neural network is characterized by comprising an image video data set acquisition module, a three-dimensional convolution neural network and a recurrent neural network, wherein the projection gesture motion recognition system based on the 3D CNN and RNN hybrid neural network is used for realizing the projection gesture motion recognition method based on the 3D CNN and RNN hybrid neural network according to any one of claims 1-2; the image video data set acquisition module is used for acquiring hand data by adopting a depth camera; the three-dimensional convolutional neural network is used for carrying out video learning on video data in the data set to output image characteristics; the recurrent neural network is used for carrying out cyclic training on the image characteristics output by the three-dimensional convolutional neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110361015.0A CN113052112B (en) | 2021-04-02 | 2021-04-02 | Gesture motion recognition interaction system and method based on hybrid neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113052112A CN113052112A (en) | 2021-06-29 |
CN113052112B true CN113052112B (en) | 2023-06-02 |
Family
ID=76517207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110361015.0A Active CN113052112B (en) | 2021-04-02 | 2021-04-02 | Gesture motion recognition interaction system and method based on hybrid neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052112B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010144050A1 (en) * | 2009-06-08 | 2010-12-16 | Agency For Science, Technology And Research | Method and system for gesture based manipulation of a 3-dimensional image of object |
CN107590432A (en) * | 2017-07-27 | 2018-01-16 | 北京联合大学 | A kind of gesture identification method based on circulating three-dimensional convolutional neural networks |
CN107679491A (en) * | 2017-09-29 | 2018-02-09 | 华中师范大学 | A kind of 3D convolutional neural networks sign Language Recognition Methods for merging multi-modal data |
CN108334814A (en) * | 2018-01-11 | 2018-07-27 | 浙江工业大学 | A kind of AR system gesture identification methods based on convolutional neural networks combination user's habituation behavioural analysis |
CN109344701A (en) * | 2018-08-23 | 2019-02-15 | 武汉嫦娥医学抗衰机器人股份有限公司 | A kind of dynamic gesture identification method based on Kinect |
US10304208B1 (en) * | 2018-02-12 | 2019-05-28 | Avodah Labs, Inc. | Automated gesture identification using neural networks |
CN110532912A (en) * | 2019-08-19 | 2019-12-03 | 合肥学院 | A kind of sign language interpreter implementation method and device |
CN111079641A (en) * | 2019-12-13 | 2020-04-28 | 科大讯飞股份有限公司 | Answering content identification method, related device and readable storage medium |
CN111079581A (en) * | 2019-12-03 | 2020-04-28 | 广州久邦世纪科技有限公司 | Method and device for identifying human skin |
CN211293894U (en) * | 2019-11-27 | 2020-08-18 | 华南理工大学 | Hand-written interaction device in air |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11126835B2 (en) * | 2019-02-21 | 2021-09-21 | Tata Consultancy Services Limited | Hand detection in first person view |
- 2021-04-02: CN application CN202110361015.0A filed (patent CN113052112B, status: active)
Non-Patent Citations (4)
Title |
---|
3D Convolutional Neural Networks for Human Action Recognition;Shuiwang Ji等;《IEEE Transactions on Pattern Analysis and Machine Intelligence》;第35卷(第1期);221-231 * |
Aviation Medical Simulation Training Based on Interactive Technology;Lin Jiang等;《2021 IEEE 4th International Conference on Computer and Communication Engineering Technology (CCET)》;387-391 * |
DeepFinger: A Cascade Convolutional Neuron Network Approach to Finger Key Point Detection in Egocentric Vision with Mobile Camera;Yichao Huang等;《2015 IEEE International Conference on Systems, Man, and Cybernetics》;2944-2949 * |
基于改进CNN+RNN的视频手势识别研究;丁小雪;《中国优秀硕士学位论文全文数据库 信息科技辑》(第07期);I138-1139 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344701B (en) | Kinect-based dynamic gesture recognition method | |
Anwar et al. | Image colorization: A survey and dataset | |
WO2021098261A1 (en) | Target detection method and apparatus | |
Deng et al. | MVF-Net: A multi-view fusion network for event-based object classification | |
Agarwal et al. | Anubhav: recognizing emotions through facial expression | |
CN108363973B (en) | Unconstrained 3D expression migration method | |
CN113343950B (en) | Video behavior identification method based on multi-feature fusion | |
CN110032932B (en) | Human body posture identification method based on video processing and decision tree set threshold | |
CN112487981A (en) | MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation | |
CN110942037A (en) | Action recognition method for video analysis | |
CN113673584A (en) | Image detection method and related device | |
CN111833360B (en) | Image processing method, device, equipment and computer readable storage medium | |
CN114821764A (en) | Gesture image recognition method and system based on KCF tracking detection | |
TW202145065A (en) | Image processing method, electronic device and computer-readable storage medium | |
CN115328319A (en) | Intelligent control method and device based on light-weight gesture recognition | |
CN104408444A (en) | Human body action recognition method and device | |
Wani et al. | Deep learning-based video action recognition: a review | |
US20240161461A1 (en) | Object detection method, object detection apparatus, and object detection system | |
Le et al. | Facial detection in low light environments using OpenCV | |
CN111881803B (en) | Face recognition method based on improved YOLOv3 | |
CN113052112B (en) | Gesture motion recognition interaction system and method based on hybrid neural network | |
Howe et al. | Comparison of hand segmentation methodologies for hand gesture recognition | |
Özyurt et al. | A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function | |
Meshram et al. | Convolution Neural Network based Hand Gesture Recognition System | |
CN115423982A (en) | Desktop curling three-dimensional detection method based on image and depth |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||