CN107808143A - Dynamic gesture identification method based on computer vision - Google Patents
Publication of application CN107808143A (application CN201711102008.9A); authority: CN (China). Legal status: Granted.
Principal classification: G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language.
Abstract
The invention discloses a dynamic gesture recognition method based on computer vision, which solves the problem of dynamically recognizing gestures against complex backgrounds. The implementation steps are: collect a gesture data set and annotate it manually; cluster the annotated ground-truth boxes to obtain the prior boxes used for training; build an end-to-end convolutional neural network that simultaneously predicts position, size and class; train the network to obtain its weights; load the weights into the network for recognition; input a gesture image; process the resulting position coordinates and class information with non-maximum suppression; obtain the final recognition-result image; and record the recognition information in real time to obtain the dynamic-gesture interpretation result. The invention overcomes the defect of prior-art gesture recognition, in which hand detection and classification are performed in separate steps; it greatly simplifies the recognition process, improves accuracy and recognition speed, strengthens the robustness of the recognition system, and realizes the interpretation of dynamic gestures.
Description
Technical field
The invention belongs to the technical field of image processing and further relates to target recognition in images; specifically, it is a dynamic gesture recognition method based on computer vision. It can be used to detect the position and recognize the state of gestures in images, so as to provide more accurate information for downstream applications of gesture recognition such as sign-language translation and game interaction.
Background technology
In recent years, with the development of computer vision, machine learning and related disciplines, human-computer interaction technology (human-computer interaction) has gradually shifted from being "computer-centered" to being "human-centered". Natural user interfaces that use the human body itself as the communication platform provide the operator with a more intuitive and comfortable interactive experience, and include face recognition, gesture recognition, body-posture recognition and the like. Among these, gesture, as a natural and intuitive means of communication in daily life, has excellent application prospects: controlling smart devices in virtual reality with predefined gestures; translating sign language to solve the communication problems of the deaf and mute; and automatic recognition of traffic-police gestures by driverless vehicles. Gesture recognition therefore has great research value and significance.
Gesture recognition mainly falls into two categories: recognition based on sensing equipment (e.g. a data glove plus a position tracker) and recognition based on vision. Because vision-based gesture recognition lets the operator interact in a more natural way and offers greater flexibility, it has attracted more research and attention. Most current vision-based methods detect and recognize the gesture in an image in two steps: first detect the hand position, then determine the gesture class.
The paper "Real-Time Hand Gesture Recognition Using Finger Segmentation" by Zhi-hua Chen et al. (The Scientific World Journal, 2014(3):267872) proposes a method based on hand detection and shape detection. The method first extracts the hand region with background subtraction and binarizes it, then segments the fingers and the palm, and finally classifies the gesture against the 13 original templates using the number of fingers and their content (content refers to the names of the fingers, e.g. thumb, forefinger, middle finger). However, this method places strict requirements on the image background and can segment the hand position only against a simple background. In addition, the gesture shapes it recognizes are limited, and its poor robustness makes it difficult to generalize.
The paper "A Real-time Hand Gesture Recognition and Human-Computer Interaction System" by Pei Xu (In CVPR, IEEE, 2017) proposes an algorithm based on hand detection and CNN recognition. The method uses elementary image-processing operations such as filtering and morphology to obtain a binary image containing only the hand, then feeds it into the convolutional neural network LeNet for feature extraction and recognition, improving accuracy. However, the method needs to pre-process the image, places high demands on the background color, and performs detection and recognition of the gesture in two steps (first obtaining the position of the gesture, then classifying the current gesture to obtain its state), so the recognition procedure is cumbersome and time-consuming.
The content of the invention
The object of the invention is to address the deficiencies of the prior art by proposing a more accurate and more efficient dynamic gesture recognition method based on computer vision.

The present invention is a dynamic gesture recognition method based on computer vision, characterized by comprising the following steps:
(1) Collect gesture images: divide the collected gesture images into a training set and a test set, annotate the gestures in each manually, and obtain the class and coordinate data of the ground-truth boxes;

(2) Cluster to obtain prior boxes: cluster the manually annotated ground-truth boxes, using the degree of overlap between box areas as the loss metric, to obtain several initial prior boxes;

(3) Build an end-to-end convolutional neural network that can simultaneously predict the position, size and class of the target gesture: with an improved GoogLeNet network as the framework, build the end-to-end convolutional neural network with a loss function that simultaneously constrains target position and class;
(4) Train the end-to-end network:

(4a) read in the gesture images of the training-set samples in batches;

(4b) randomly scale the images with bilinear interpolation, selecting a size that is a multiple of 32, to obtain the scaled versions of the read-in gesture images;

(4c) scale the input images again with bilinear interpolation to a fixed size, obtaining images that can be fed into the convolutional network;

(4d) train the convolutional neural network built in step (3) with the fixed-size images obtained in step (4c), obtaining the weights of the network;

(5) Load the weights: load the weights obtained in step (4d) into the convolutional neural network built in step (3);
(6) Predict the position and class of the gesture: read in the gesture image to be recognized and feed it into the convolutional neural network loaded with the weights, simultaneously obtaining the position coordinates and class information of the gesture target;

(7) Remove redundant prediction boxes: process the obtained position coordinates and class information with non-maximum suppression to obtain the final prediction boxes:

(7a) sort all prediction boxes by score in descending order and select the highest score and its corresponding box;

(7b) traverse the remaining boxes; if the overlap (IOU) with the current highest-scoring box exceeds a certain threshold, delete that box;

(7c) continue selecting the highest-scoring box among the unprocessed boxes and repeat the above procedure, i.e. steps (7a) to (7c), to obtain the retained prediction-box data;

(8) Visualize the prediction result: map the prediction-box data back onto the original image, draw the prediction boxes on it and mark the class labels of the gesture targets;

(9) Record and analyze: record the class and position information of the gesture in real time, analyze the resulting real-time data to interpret the dynamic gesture, and display the interpretation result directly on the screen.
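The non-maximum-suppression procedure of steps (7a)-(7c) can be sketched as follows. This is an illustrative NumPy sketch, not the patent's own code; the 0.5 IOU threshold is an assumed value, since the patent leaves the threshold unspecified:

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the retained prediction boxes."""
    order = np.argsort(scores)[::-1]          # (7a) sort by score, descending
    keep = []
    while order.size > 0:
        best = order[0]                       # current highest-scoring box
        keep.append(int(best))
        rest = order[1:]
        # (7b) IOU of the best box against every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        # delete boxes overlapping the best one, then (7c) repeat on the rest
        order = rest[iou <= iou_thresh]
    return keep
```

Two boxes predicting the same gesture collapse to the higher-scoring one, while a distant box covering a second gesture survives.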
The invention recognizes gestures end to end with a deep convolutional neural network; it can not only recognize dynamic gestures in real time, but also maintain high accuracy against complex backgrounds.

Compared with the prior art, the present invention has the following advantages:

1. The invention recognizes gestures with a convolutional neural network, completing the position detection and classification of the gesture target in one step. The procedure is concise and recognition is fast, overcoming the defect of prior-art two-step processing (first detecting the hand position, then recognizing the gesture), which cannot guarantee real-time performance. At the same time the network extracts gesture-image features well and recognizes gestures at any angle with very high accuracy; it places no requirement on the image background and recognizes gestures accurately even against complex backgrounds, overcoming the prior-art defect of requiring a simple background.

2. When training the convolutional neural network, the invention randomly scales the gesture-image size, so that every few iterations gesture images of a changed size are fed into the network. Every 10 batches the algorithm randomly selects a new image dimension, which lets the network achieve good prediction at different input sizes; the same network can thus detect at different resolutions, and its robustness and generalization are stronger.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;

Fig. 2 shows the natural-scene gesture images used by the invention in the simulation experiment;

Fig. 3 shows the gesture-target recognition results obtained in the simulation experiment;

Fig. 4 shows recognition results of the present invention on a dynamic gesture, where Fig. 4(a) is one frame of the dynamic gesture whose sign-language meaning is "object", and Fig. 4(b) is one frame of the corresponding detection result;

Fig. 5 is a record of the gesture center-point coordinates during dynamic gesture recognition.
Embodiments
The present invention is described in detail below with reference to the accompanying drawings.
Embodiment 1
Gesture, as a natural and intuitive means of communication, has excellent application prospects: controlling smart devices in virtual reality with predefined gestures; translating sign language to solve the communication problems of the deaf and mute; automatic recognition of traffic-police gestures by driverless vehicles; and so on. Current vision-based gesture recognition technology mostly uses the conventional approach of first segmenting the gesture and then classifying it. This approach demands high image quality and has difficulty handling gestures against complex backgrounds, which limits the development of gesture-recognition applications. In view of this situation, the present invention, through research and innovation, proposes a dynamic gesture recognition method based on computer vision, referring to Fig. 1, comprising the following steps:
(1) Collect gesture images: divide the collected gesture images into a training set and a test set; the training set is used to train the convolutional neural network, and the test set is used to calculate the recognition accuracy of the network. Annotate the gesture in each collected image to obtain the size and center coordinates of the rectangle that most tightly encloses the gesture, together with the class of the gesture. This realizes the manual annotation of the gestures and yields the class and coordinate data of the ground-truth boxes.

(2) Cluster to obtain prior boxes: choose the number of cluster centers and cluster the manually annotated ground-truth boxes, using the degree of overlap between box areas as the loss metric, to obtain several initial prior boxes. In this example the number of cluster centers is set to 9; after clustering with the overlap degree as the loss metric, 9 initial prior boxes are obtained. Using these 9 boxes as the initial prediction boxes of the convolutional neural network shortens its convergence time. In general, the number of cluster centers depends on how densely targets appear in the images: the more targets per image, the more cluster centers should be set.
(3) Build an end-to-end convolutional neural network that can simultaneously predict the position, size and class of the target gesture: with an improved GoogLeNet network as the framework, build the end-to-end network with a loss function that simultaneously constrains target position, size and class, so that the network can predict all three at once. The network has a small computational cost, converges easily, and can classify up to 9000 target categories on the ImageNet data set.
(4) Train the end-to-end convolutional neural network: to strengthen the robustness of the network to image size, after reading in gesture images in batches, each image is scaled twice. The first scaling resizes the original input to a random size; the second resizes that random-size image to the specified size. The images scaled to the specified size are then fed into the convolutional neural network for training, yielding the trained weights. The concrete steps are:

(4a) read in the gesture images of the training-set samples in batches;

(4b) randomly scale the read-in gesture images with bilinear interpolation so that the scaled image size is a multiple of 32, obtaining the scaled versions of the read-in images. This diversifies the scale of the data, strengthens the robustness of the network, and thereby improves recognition accuracy;

(4c) scale the input images again with bilinear interpolation to a fixed size, obtaining images that can be fed into the convolutional network; in this example the fixed size is 672*672. The fixed size to which images are scaled is related to the structure of the convolutional neural network;

(4d) train the convolutional neural network built in step (3) with the fixed-size images obtained in step (4c), obtaining the weights of the network.

(5) Load the weights: load the network weights obtained in step (4d) into the convolutional neural network built in step (3); these weights are the network parameters required for prediction.
(6) Predict the position and class of the gesture: read in the gesture image to be recognized; the network first scales the input image to the size specified in (4c), then feeds it through the network loaded with the weights for recognition, simultaneously obtaining the position coordinates, size and class information of the gesture target.

(7) Remove redundant prediction boxes: process the obtained position coordinates and class information of the gesture with non-maximum suppression to obtain the final prediction boxes. Predictions for the same target are likely to yield multiple recognition boxes; non-maximum suppression removes the redundant boxes and retains the data of the box with the highest confidence. The concrete operations are:

(7a) sort all boxes by confidence score in descending order and select the box with the highest confidence;

(7b) traverse the remaining boxes; if the overlap (IOU) with the current highest-confidence box exceeds a certain threshold, delete that box;

(7c) continue selecting the highest-scoring box among the unprocessed boxes and repeat the above procedure, i.e. steps (7a) to (7c), to obtain the retained prediction-box data. The data of a prediction box include its position, size and class.

(8) Visualize the prediction result: the coordinate data and size of a predicted box are relative to the size specified in (4c), i.e. the fixed scaling size; map the prediction-box data from the fixed size back to the original image size (the size of the gesture image to be recognized), draw the prediction boxes on the original image, and mark the class labels of the gesture targets.

(9) Record and analyze: the invention needs only 0.02 s to recognize a single picture, meeting the requirements of real-time gesture recognition. The camera is called through OpenCV and, with the trained convolutional neural network, the class and position information of the gesture are recorded in real time; the resulting real-time data are analyzed to interpret the dynamic gesture, and the interpretation result is displayed directly on the screen.
The present invention builds an end-to-end convolutional neural network with a loss function that simultaneously constrains target position and class, and predicts position, size and class at the same time, thereby simplifying the gesture-recognition procedure and improving recognition speed. In the training stage, gesture images are randomly scaled before being fed into the network for training, which strengthens the robustness of the network and improves recognition accuracy.
Embodiment 2
The dynamic gesture recognition method based on computer vision is as in Embodiment 1. The clustering of the manually annotated ground-truth boxes in step (2) of the present invention specifically includes the following steps:

(2a) read the ground-truth box data manually annotated in the training-set and test-set samples;

(2b) set the number of cluster centers and cluster with the k-means algorithm, using the loss metric d(box, centroid) given by the following formula, to obtain the prior boxes:

d(box, centroid) = 1 - IOU(box, centroid)

where centroid denotes a randomly selected cluster-center box, box denotes one of the other ground-truth boxes, and IOU(box, centroid) measures the similarity between that box and the cluster-center box, i.e. the overlap ratio of the two boxes, computed as their intersection divided by their union.

Through clustering, the invention obtains the several prior boxes that are most representative of the manually annotated ground-truth boxes; the prior boxes serve as the initial boxes of the neural-network prediction. Determining the prior boxes narrows the prediction range of the convolutional neural network and accelerates its convergence.
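Steps (2a)-(2b) can be sketched as below. This is an illustrative NumPy sketch written for this description, not the patent's implementation; box sizes are compared as if sharing a common top-left corner, as is usual for prior-box (anchor) clustering:

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (N, 2) box sizes and (K, 2) centroid sizes, boxes aligned at the origin."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_prior_boxes(boxes, k, iters=100, seed=0):
    """Cluster ground-truth box sizes into k prior boxes using d = 1 - IOU."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        d = 1.0 - iou_wh(boxes, centroids)       # loss metric d(box, centroid)
        assign = d.argmin(axis=1)                # nearest cluster-center box
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break                                # cluster centers have converged
        centroids = new
    return centroids
```

With k set to 9 as in Embodiment 1 (or 5 as in Embodiment 5), the returned centroids are the initial prior boxes of the network.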
Embodiment 3
The dynamic gesture recognition method based on computer vision is as in Embodiments 1-2. Building the convolutional neural network in step (3) of the present invention includes the following steps:

(3a) based on the GoogLeNet convolutional neural network, using only simple 1*1 and 3*3 convolution kernels, build a convolutional neural network comprising G convolutional layers and 5 pooling layers; in this example G is taken as 25.

(3b) train the built convolutional network with the loss function given by the following formula:

$$
\begin{aligned}
loss ={}& \lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_{i}-\hat{x}_{i})^{2}+(y_{i}-\hat{y}_{i})^{2}\right]\\
&+\lambda_{coord}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\right)^{2}+\left(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\right)^{2}\right]\\
&+\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_{i}-\hat{C}_{i}\right)^{2}
+\lambda_{noobj}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_{i}-\hat{C}_{i}\right)^{2}\\
&+\sum_{i=0}^{S^{2}}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_{i}(c)-\hat{p}_{i}(c)\right)^{2}
\end{aligned}
$$

where the first term of the loss function is the center-point coordinate loss of the predicted target box, in which λ_coord is the coordinate-loss coefficient, 1 ≤ λ_coord ≤ 5, taken as 3 in this example to ensure that the predicted gesture position is accurate; S² is the number of grid cells the picture is divided into, and B is the number of boxes predicted per cell; 1^obj_ij indicates whether, when a target is present, the j-th prediction box in the i-th cell is responsible for predicting that target; (x_i, y_i) are the center coordinates of the ground-truth box of the target, and (x̂_i, ŷ_i) the center coordinates of the prediction box. The second term is the width-height loss of the prediction box, where (w_i, h_i) are the width and height of the ground-truth box and (ŵ_i, ĥ_i) those of the prediction box. The third and fourth terms are the losses of the probability that a prediction box contains a target, where λ_noobj is the loss coefficient when no target is contained, 0.1 ≤ λ_noobj ≤ 1, taken as 1 in this example, to ensure that the convolutional neural network can distinguish target from background; 1^noobj_ij indicates whether, when no target is contained, the j-th prediction box in the i-th cell is responsible for the prediction; C_i is the true probability of containing a target and Ĉ_i the predicted probability. The fifth term is the class-probability loss, where 1^obj_i indicates that the i-th cell contains a target center; p_i(c) is the true target class and p̂_i(c) the predicted class; c ranges over the classes.
In the embodiment of the present invention, the position detection and classification of the gesture are completed in one step. A convolutional neural network extracts features from the original gesture image, and the network is trained by reducing the position loss and the classification loss together, so that it recognizes the gesture class at the same time as it detects the gesture position.
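The five-term loss of step (3b) can be rendered numerically as below. This is a hypothetical NumPy sketch for illustration (the patent's network evaluates this inside its training framework); predictions and ground truth are laid out per prediction box as (x, y, w, h, C, class probabilities), with a boolean mask marking the boxes responsible for a target:

```python
import numpy as np

def five_term_loss(pred, truth, obj_mask, lam_coord=5.0, lam_noobj=0.5):
    """pred, truth: (S*S*B, 5 + num_classes) arrays of x, y, w, h, C, class probs.
    obj_mask: (S*S*B,) bool, True where a box is responsible for a target."""
    o, n = obj_mask, ~obj_mask
    # term 1: center-point coordinate loss
    t1 = lam_coord * np.sum((pred[o, 0] - truth[o, 0]) ** 2 +
                            (pred[o, 1] - truth[o, 1]) ** 2)
    # term 2: width/height loss (square roots damp the influence of large boxes)
    t2 = lam_coord * np.sum((np.sqrt(pred[o, 2]) - np.sqrt(truth[o, 2])) ** 2 +
                            (np.sqrt(pred[o, 3]) - np.sqrt(truth[o, 3])) ** 2)
    # terms 3 and 4: target-containing probability loss, with and without a target
    t3 = np.sum((pred[o, 4] - truth[o, 4]) ** 2)
    t4 = lam_noobj * np.sum((pred[n, 4] - truth[n, 4]) ** 2)
    # term 5: class-probability loss over the responsible boxes
    t5 = np.sum((pred[o, 5:] - truth[o, 5:]) ** 2)
    return t1 + t2 + t3 + t4 + t5
```

A perfect prediction gives zero loss; any deviation in position, size, confidence or class increases it.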
Embodiment 4
The dynamic gesture recognition method based on computer vision is as in Embodiments 1-3. The random scaling of the images with bilinear interpolation described in step (4b) of the present invention, with the size chosen as a multiple of 32 to obtain the scaled input images, proceeds as follows:

4b1: read in a gesture image to be recognized.

4b2: randomly scale the image with bilinear interpolation, choosing a size that is a multiple of 32, to obtain the scaled input image.

The pending gesture image input in the embodiment of the present invention is shown in Fig. 2; the pixel range of the image is [600-1000]. After scaling, the image size is chosen among the multiples of 32 {480, 512, ... 832}, at minimum 480*480 and at maximum 832*832, yielding the scaled input image.

The invention randomly scales the gesture-image size when training the convolutional neural network to increase the network's robustness to image size. Every 10 batches the algorithm randomly rescales the gesture images, so that the network achieves good prediction at different input sizes and the same network can detect at different resolutions. The identical network can thus predict on gesture images of different resolutions, with stronger robustness and generalization.
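Steps 4b1-4b2 can be sketched as below. This is an illustrative NumPy sketch written for this description, not the patent's code: a training size is drawn from the multiples of 32 in [480, 832], and the image is resampled with bilinear interpolation:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resample an (H, W, C) image to (out_h, out_w, C) with bilinear interpolation."""
    h, w = img.shape[:2]
    ys = (np.arange(out_h) + 0.5) * h / out_h - 0.5   # sample positions in source rows
    xs = (np.arange(out_w) + 0.5) * w / out_w - 0.5   # sample positions in source cols
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    dy = np.clip(ys - y0, 0.0, 1.0)[:, None, None]    # fractional row offsets
    dx = np.clip(xs - x0, 0.0, 1.0)[None, :, None]    # fractional col offsets
    p00 = img[y0[:, None], x0[None, :]]
    p01 = img[y0[:, None], x0[None, :] + 1]
    p10 = img[y0[:, None] + 1, x0[None, :]]
    p11 = img[y0[:, None] + 1, x0[None, :] + 1]
    return ((1 - dy) * (1 - dx) * p00 + (1 - dy) * dx * p01 +
            dy * (1 - dx) * p10 + dy * dx * p11)

def random_training_size(rng):
    """Pick one of the multiples of 32 between 480 and 832 (a new one every 10 batches)."""
    return int(rng.choice(np.arange(480, 833, 32)))
```

During training, each batch of gesture images would be resized to `random_training_size(rng)` on both sides before the second, fixed-size scaling of step (4c).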
Below, with reference to the accompanying drawings, a more complete example further describes the present invention.

Embodiment 5

The dynamic gesture recognition method based on computer vision is as in Embodiments 1-4. Referring to Fig. 1, the concrete implementation steps include:
Step 1: Collect gesture images. Gesture images are shot with a camera and include "stone", "scissors", "cloth", "stick", "OK", "love", etc., referring to Fig. 2(a)-(f). Fig. 2(a) is the fist gesture from front and back, Fig. 2(b) the "scissors" gesture from front and back, Fig. 2(c) the palm gesture from front and back, Fig. 2(d) the thumbs-up gesture, Fig. 2(e) the "OK" gesture, and Fig. 2(f) the "love" gesture. Each gesture image also contains some complex background, and the same gesture appears at a variety of rotation angles. The collected gesture images are divided into a training set and a test set, and the gestures in the collected images are manually annotated to obtain the class and coordinate data of the ground-truth boxes.

In this example the collected natural-scene gesture image set comprises 2500 images of 6 representative gestures, evenly divided into a training set of 2000 images and a test set of 500 images, referring to Fig. 2. The image set was shot with a 12-megapixel mobile-phone camera, and the shot images were screened and manually annotated.
Step 2: Cluster to obtain the prior boxes.

Read the ground-truth box data of the training-set and test-set samples.

In the present embodiment, the ground-truth boxes of the training-set and test-set samples are the manually annotated target-box coordinates and class information in the images.

Cluster with the k-means algorithm, using the loss metric d(box, centroid) given by the following formula, to obtain the prior boxes:

d(box, centroid) = 1 - IOU(box, centroid)

where centroid denotes a randomly selected cluster-center box, box denotes one of the other ground-truth boxes, and IOU(box, centroid) measures the similarity between that box and the cluster-center box, computed as the intersection of the two divided by their union. The number of cluster-center boxes chosen in this example is 5. IOU(box, centroid) is computed according to the following formula:

IOU(box, centroid) = (centroid ∩ box) / (centroid ∪ box)

where ∩ denotes the area of the intersection region of the two boxes centroid and box, and ∪ the area of their union region.
Step 3: Build the convolutional neural network.

Based on the GoogLeNet convolutional neural network, using only simple 1*1 and 3*3 convolution kernels, build a convolutional neural network comprising G convolutional layers and 5 pooling layers; in this example G is taken as 23.

Train the built convolutional network with the loss function given in step (3b) of Embodiment 3, where the first term of the loss function is the center-point coordinate loss of the predicted target box, with the coordinate-loss coefficient λ_coord taken as 5 in this example, and the third and fourth terms are the losses of the probability that a prediction box contains a target, with the no-target loss coefficient λ_noobj taken as 0.5 in this example.

Even for the same gesture, different shooting angles yield different images. Existing methods find it difficult to recognize the different angles of the same gesture stably, but the convolutional neural network built by the present invention overcomes the difficulty of recognizing the same gesture at multiple rotation angles and gives gesture recognition good stability.
Step 4: Train the network.

Read in the gesture images of the training-set samples in batches. In the present embodiment, the network reads 64 training-set images per batch.

Randomly scale the images with bilinear interpolation, choosing a scaled gesture-image size that is a multiple of 32, to obtain the scaled input images.

The pending gesture images input in the present embodiment are shown in Fig. 2; the pixel range of the gesture images is [500-800]. After scaling, the image size is chosen among the multiples of 32 {480, 512, ... 736}, at minimum 480*480 and at maximum 736*736, yielding the scaled gesture images.

Scale the scaled gesture images again to a fixed size with bilinear interpolation, obtaining images that can be fed into the convolutional network; in this example the fixed size the gesture images are scaled to is 608*608.

Feed the fixed-size gesture images into the built convolutional neural network for training and obtain the network weights; the weights are the parameters of the convolutional neural network and are used at test time. Training the network with the training-set samples for 20,000 iterations yields the weights, and training is complete.
Step 5: Load the network weights (i.e. the parameters) obtained in Step 4 into the convolutional neural network built in Step 3, in preparation for testing.
Step 6: Read in the gesture images to be recognised from the test set and input them into the weight-loaded network for recognition, obtaining the size, position coordinates and class information of the recognised gesture targets; see Fig. 3, where Fig. 3(a)-(f) are the recognition results of the present invention corresponding to Fig. 2(a)-(f).
Step 7: Process the obtained positions and class information with non-maximum suppression to obtain the final predicted boxes.
All predicted boxes are sorted in descending order of confidence score, and the top score and its corresponding box are selected;
the remaining predicted boxes are traversed, and any box whose overlap (IOU) with the current highest-confidence box exceeds a certain threshold is deleted;
the highest-scoring box is then selected from the boxes not yet processed and the above process is repeated, giving the retained predicted-box data.
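The three steps above can be sketched as a plain non-maximum-suppression routine (illustrative only; the box format and the threshold value are assumptions):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)            # current highest-confidence box
        keep.append(best)
        # Delete every remaining box whose IOU with it exceeds the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

The returned indices identify the retained predicted boxes; all duplicates of the same target are suppressed.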
Step 8: Map the predicted-box data back onto the original image to obtain the class and position information of the gestures; draw the predicted boxes on the original image and mark the class labels of the targets. Referring to Fig. 3(a)-3(f), the label in the upper-left corner of each predicted box in every figure is the predicted gesture class.
Step 9: Record the class and position information of the gestures in real time (see Fig. 4), analyse the resulting real-time data, interpret the dynamic gesture, and display the interpretation result directly on screen; see Table 1.
Table 1 Real-time test results of dynamic gesture recognition
Predicted gesture centre-point abscissa | Predicted gesture centre-point ordinate | Gesture class |
---|---|---|
1164 | 371 | Scissors |
318 | 372 | Scissors |
1152 | 373 | Scissors |
364 | 384 | Scissors |
1097 | 380 | Scissors |
388 | 388 | Scissors |
1061 | 381 | Scissors |
1027 | 383 | Scissors |
430 | 409 | Scissors |
452 | 395 | Scissors |
1001 | 380 | Scissors |
465 | 397 | Scissors |
989 | 381 | Scissors |
510 | 395 | Scissors |
960 | 381 | Scissors |
524 | 392 | Scissors |
951 | 384 | Scissors |
557 | 395 | Scissors |
918 | 394 | Scissors |
561 | 396 | Scissors |
The data in Table 1 are part of the records of the dynamic process, shown in Fig. 4, in which the two gestures move horizontally inwards from both sides. Fig. 4(a) is one frame of the dynamic sign-language gesture whose meaning is "object", and Fig. 4(b) is one frame of the detection result for that dynamic gesture process. From the data in Table 1, the gestures keep the "scissors" state unchanged. Visualising the coordinate data of Table 1 as a graph gives Fig. 5, in which the abscissa represents the abscissa of the gesture centre point in the current frame image and the ordinate represents its ordinate. The points in Fig. 5 are the coordinate records of the two "scissors" gestures moving dynamically from the outside inwards. As can be seen from Fig. 5, the centre-point ordinates of the dynamic gestures shown are essentially unchanged while the abscissas change considerably, indicating that the process is the two "scissors" gestures drawing together horizontally, corresponding to the meaning of "object" in sign language; see Fig. 4.
In the embodiment of the present invention, the distribution histogram of the motion trajectory is calculated to judge the motion of the gesture, and this is combined with the change of gesture state during the motion to judge the meaning the gesture expresses over the whole dynamic process, encompassing both static gesture recognition and dynamic gesture interpretation analysis.
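This kind of trajectory analysis can be sketched with simple span statistics on the recorded centre points (hypothetical helper functions; the threshold and the returned strings are assumptions, not taken from the patent):

```python
def motion_descriptor(points, move_thresh=50):
    """Judge the dominant motion of one hand's centre-point track.

    points: list of (x, y) per frame. Returns 'horizontal', 'vertical'
    or 'static' depending on which coordinate span dominates.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_span = max(xs) - min(xs)
    y_span = max(ys) - min(ys)
    if x_span < move_thresh and y_span < move_thresh:
        return "static"
    return "horizontal" if x_span >= y_span else "vertical"

def interpret_two_hands(left, right, move_thresh=50):
    """Two tracks moving horizontally towards each other, as in Fig. 5."""
    if motion_descriptor(left, move_thresh) == "horizontal" \
            and motion_descriptor(right, move_thresh) == "horizontal" \
            and left[-1][0] > left[0][0] and right[-1][0] < right[0][0]:
        return "hands converge horizontally"
    return "unrecognised motion"
```

Applied to the left-hand and right-hand centre-point records of Table 1, both tracks are classified as horizontal and the pair as converging.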
The technical effect of the present invention is explained again below in combination with simulations.
Embodiment 6
The dynamic gesture recognition method based on computer vision is as in Embodiments 1-5.
Simulation conditions:
The hardware platform of the simulation experiments of the present invention is a Dell computer with an Intel(R) Core i5 processor, a main frequency of 3.20 GHz and 64 GB of memory; the simulation software platform is Visual Studio (2015).
Simulation content and analysis of results:
The simulation of the present invention is divided into two experiments.
The position coordinates and class data of the data set are first collected by manual labelling and made into a data set in PASCAL VOC format, of which 80% serves as training-set samples and 20% as test-set samples.
Simulation experiment 1: comparison of the present invention with the prior art. The present invention, the prior-art method based on hand detection and shape detection, and the prior-art method based on hand detection and CNN recognition were each trained with the same training-set samples, and the methods were then evaluated with the same test-set samples. The evaluation results are shown in Table 2, where Alg1 denotes the method of the present invention, Alg2 the method based on hand detection and shape detection, and Alg3 the method based on hand detection and CNN recognition.
Table 2 Test-set accuracy of the three methods in the simulation experiment
Test image | Alg1 | Alg2 | Alg3 |
---|---|---|---|
Accuracy (%) | 98.0 | 31.3 | 78.6 |
Time per image (s) | 0.02 | 0.13 | 0.94 |
As can be seen from Table 2, compared with the method based on hand detection and shape detection and the method based on hand detection and CNN recognition, the present invention has a clear advantage in gesture-recognition accuracy, raising the recognition rate by nearly 67 and 20 percentage points respectively, and its recognition speed is about 6 and 47 times faster than the other two methods respectively. The recognition rate of the present invention is higher than that of the other two algorithms because the present invention maintains a very high recognition rate under complex backgrounds and over multiple gesture angles. Its recognition speed exceeds that of the other two algorithms because the present invention constructs an end-to-end convolutional neural network that predicts the position and class of a gesture simultaneously, rather than in two separate stages. The simulation results show that the present invention achieves a high recognition rate and fast speed in gesture target recognition, performing particularly well under complex background conditions.
Embodiment 7
The dynamic gesture recognition method based on computer vision is as in Embodiments 1-5, and the simulation conditions and content are as in Embodiment 6.
Simulation experiment 2: using the method of the present invention, gesture images scaled to different sizes were used on the test set as the input of the network; the test evaluation results are shown in Table 3.
Table 3 Recognition results for different network input sizes
It can be observed from Table 3 that once the input image is scaled to a certain size, the target-recognition accuracy of the present invention shows no significant change; therefore, considering recognition rate and recognition speed together, a fixed size of 608*608 is selected as the optimal gesture-image input size for the convolutional neural network.
The dynamic gesture recognition method based on computer vision proposed by the present invention achieves better recognition accuracy for gesture targets and can perform gesture recognition in real time.
In summary, the present invention discloses a dynamic gesture recognition method based on computer vision that solves the problem of dynamic recognition of gestures under complex backgrounds. Its steps are: collect a gesture data set and label it manually; cluster the ground-truth boxes of the labelled image set to obtain the prior boxes for training; build an end-to-end convolutional neural network that can simultaneously predict position, size and class; train the network to obtain the weights; load the weights into the network; input gesture images for recognition; process the obtained position coordinates and class information with non-maximum suppression; obtain the final recognition-result images; and record the recognition information in real time to obtain the dynamic gesture interpretation result. The present invention overcomes the defect of the prior art that hand detection and classification are carried out in separate steps, greatly simplifying the gesture-recognition process, improving recognition accuracy and speed, enhancing the robustness of the recognition system, and realising a dynamic gesture interpretation function. The present invention can be applied to fields such as human-computer interaction in virtual reality, sign-language translation, and automatic recognition of the gestures of unmanned traffic police.
Claims (4)
1. A dynamic gesture recognition method based on computer vision, characterised by comprising the following steps:
(1) collecting gesture images: dividing the collected gesture images into a training set and a test set, and manually labelling the gestures therein to obtain the classes and coordinate data of the ground-truth boxes;
(2) clustering to obtain prior boxes: clustering the manually labelled ground-truth boxes, with the degree of area overlap of the boxes as the loss metric, to obtain several initial prior boxes;
(3) building an end-to-end convolutional neural network capable of simultaneously predicting the position, size and class of the target gesture: taking an improved GoogLeNet network as the network framework, and building the end-to-end convolutional neural network with a loss function that simultaneously constrains target position and class;
(4) training the end-to-end convolutional neural network: in order to strengthen the robustness of the convolutional neural network to image size, after reading in gesture images in batches, scaling the read gesture images twice: the first scaling is a random scaling of the originally input gesture images to an arbitrary size, and the second scaling resizes the arbitrarily sized images to a specified size; finally, feeding the gesture images scaled to the specified size into the convolutional neural network for training to obtain the trained weights, specifically comprising the following steps:
(4a) reading in the gesture images of the training-set samples in batches;
(4b) scaling the images at random by bilinear interpolation, the size being chosen as a multiple of 32, to obtain the scaled read-in gesture images;
(4c) resizing the scaled gesture images obtained in step (4b) again by bilinear interpolation to a fixed size, to obtain images that can be fed into the convolutional network;
(4d) training the convolutional neural network built in step (3) with the fixed-size images obtained in step (4c), to obtain the corresponding weights of the convolutional neural network;
(5) loading the weights: loading the network weights obtained in step (4d) into the convolutional neural network built in step (3);
(6) predicting the position and class of the gesture: reading in the gesture images to be recognised and feeding them into the weight-loaded network for recognition, simultaneously obtaining the position coordinates and class information of the recognised gesture targets;
(7) removing redundant predicted boxes: processing the obtained position coordinates and class information with non-maximum suppression to obtain the final predicted boxes:
(7a) sorting all boxes in descending order of score, and selecting the top score and its corresponding box;
(7b) traversing the remaining predicted boxes, and deleting any box whose overlap (IOU) with the current highest-scoring box exceeds a certain threshold;
(7c) continuing to select the highest-scoring box from the boxes not yet processed and repeating the above process, to obtain the retained predicted-box data;
(8) visualising the prediction results: mapping the predicted-box data onto the original image, drawing the predicted boxes on the original image and marking the class labels of the gesture targets;
(9) recording and analysing: recording the class and position information of the gestures in real time, analysing the resulting real-time data, interpreting the dynamic gesture, and displaying the interpretation result directly on screen.
2. The dynamic gesture recognition method based on computer vision according to claim 1, characterised in that the clustering of the manually labelled ground-truth boxes in step (2) specifically comprises the following steps:
(2a) reading the ground-truth box data of the gesture-image training-set and test-set samples;
(2b) clustering with the k-means clustering algorithm according to the loss metric d(box, centroid) below, to obtain the prior boxes:
d(box, centroid) = 1 - IOU(box, centroid)
where centroid denotes the randomly selected cluster-centre box, box denotes the other ground-truth boxes apart from the centre box, and IOU(box, centroid) denotes the degree of similarity between the other boxes and the centre box, calculated as the intersection of the two divided by their union.
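The clustering in (2a)-(2b) can be sketched as k-means over ground-truth (width, height) pairs with the distance d = 1 - IOU, where the IOU is computed with both boxes anchored at a common corner (a minimal sketch; the function names and the anchoring convention are assumptions):

```python
import random

def wh_iou(a, b):
    """IOU of two boxes (w, h) anchored at a common top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_priors(boxes, k, iters=50, seed=0):
    """k-means over ground-truth (w, h) pairs with distance d = 1 - IOU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the centroid with the smallest 1 - IOU.
            j = min(range(k), key=lambda j: 1 - wh_iou(b, centroids[j]))
            clusters[j].append(b)
        new_centroids = []
        for j, c in enumerate(clusters):
            if c:  # the mean width/height of the cluster becomes the new prior
                new_centroids.append((sum(w for w, _ in c) / len(c),
                                      sum(h for _, h in c) / len(c)))
            else:  # keep an empty cluster's previous centroid
                new_centroids.append(centroids[j])
        centroids = new_centroids
    return centroids
```

Using 1 - IOU rather than Euclidean distance keeps large boxes from dominating the clustering, so the resulting priors reflect typical box shapes rather than sizes alone.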
3. The dynamic gesture recognition method based on computer vision according to claim 1, characterised in that the building of the convolutional neural network in step (3) comprises the following steps:
(3a) taking the GoogLeNet convolutional neural network as the basis and, with simple 1*1 and 3*3 convolution kernels, building a convolutional neural network comprising G convolutional layers and 5 pooling layers;
(3b) training the built convolutional network with the loss function below:
$$
\begin{aligned}
loss = {} & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
& + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
  + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
& + \sum_{i=0}^{S^2} I_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$
where the first term of the loss function is the centre-point coordinate loss of the predicted target box, in which λcoord is the coordinate loss coefficient, taken as 5 in this example; S² denotes the number of grid cells the image is divided into, and B denotes the number of boxes predicted per cell; I_ij^obj indicates, when a target is present, whether the j-th predicted box in the i-th cell is responsible for predicting that target; (x_i, y_i) denote the centre-point coordinates of the ground-truth box, and (x̂_i, ŷ_i) the centre-point coordinates of the predicted box. The second term of the function is the width-height loss of the predicted box, where (w_i, h_i) denote the width and height of the ground-truth box and (ŵ_i, ĥ_i) the width and height of the predicted box. The third and fourth terms of the function are the losses on the probability that a predicted box contains a target, where λnoobj denotes the loss coefficient when no target is contained, taken here as 0.5; I_ij^noobj indicates, when no target is contained, whether the j-th predicted box in the i-th cell is responsible for that target; C_i denotes the true probability of containing a target, and Ĉ_i the predicted probability of containing a target. The fifth term of the function is the predicted class-probability loss, where I_i^obj indicates that the i-th cell contains a target centre point; p_i(c) denotes the true target class probability, p̂_i(c) the predicted target class probability, and c indexes the classes.
4. The dynamic gesture recognition method based on computer vision according to claim 1, wherein the random scaling of the images by bilinear interpolation in step (4b), with the gesture-image size chosen as a multiple of 32 to obtain the scaled input images, is carried out as follows:
4b1: reading in a gesture image to be identified;
4b2: scaling the gesture image at random by bilinear interpolation, the size being chosen as a multiple of 32, to obtain the scaled read-in gesture image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711102008.9A CN107808143B (en) | 2017-11-10 | 2017-11-10 | Dynamic gesture recognition method based on computer vision |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711102008.9A CN107808143B (en) | 2017-11-10 | 2017-11-10 | Dynamic gesture recognition method based on computer vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107808143A true CN107808143A (en) | 2018-03-16 |
CN107808143B CN107808143B (en) | 2021-06-01 |
Family
ID=61592035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711102008.9A Active CN107808143B (en) | 2017-11-10 | 2017-11-10 | Dynamic gesture recognition method based on computer vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107808143B (en) |
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117806A (en) * | 2018-08-22 | 2019-01-01 | 歌尔科技有限公司 | A kind of gesture identification method and device |
CN109145756A (en) * | 2018-07-24 | 2019-01-04 | 湖南万为智能机器人技术有限公司 | Object detection method based on machine vision and deep learning |
CN109165555A (en) * | 2018-07-24 | 2019-01-08 | 广东数相智能科技有限公司 | Man-machine finger-guessing game method, apparatus and storage medium based on image recognition |
CN109325454A (en) * | 2018-09-28 | 2019-02-12 | 合肥工业大学 | A kind of static gesture real-time identification method based on YOLOv3 |
CN109583456A (en) * | 2018-11-20 | 2019-04-05 | 西安电子科技大学 | Infrared surface object detection method based on Fusion Features and dense connection |
CN109697407A (en) * | 2018-11-13 | 2019-04-30 | 北京物灵智能科技有限公司 | A kind of image processing method and device |
CN109815876A (en) * | 2019-01-17 | 2019-05-28 | 西安电子科技大学 | Gesture identification method based on address events stream feature |
CN109934184A (en) * | 2019-03-19 | 2019-06-25 | 网易(杭州)网络有限公司 | Gesture identification method and device, storage medium, processor |
CN109948480A (en) * | 2019-03-05 | 2019-06-28 | 中国电子科技集团公司第二十八研究所 | A kind of non-maxima suppression method for arbitrary quadrilateral |
CN109948690A (en) * | 2019-03-14 | 2019-06-28 | 西南交通大学 | A kind of high-speed rail scene perception method based on deep learning and structural information |
CN110135398A (en) * | 2019-05-28 | 2019-08-16 | 厦门瑞为信息技术有限公司 | Both hands off-direction disk detection method based on computer vision |
CN110135408A (en) * | 2019-03-26 | 2019-08-16 | 北京捷通华声科技股份有限公司 | Text image detection method, network and equipment |
CN110135237A (en) * | 2019-03-24 | 2019-08-16 | 北京化工大学 | A kind of gesture identification method |
CN110163048A (en) * | 2018-07-10 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Identification model training method, recognition methods and the equipment of hand key point |
CN110287771A (en) * | 2019-05-10 | 2019-09-27 | 平安科技(深圳)有限公司 | Image palm area extracting method and device |
CN110348323A (en) * | 2019-06-19 | 2019-10-18 | 广东工业大学 | A kind of wearable device gesture identification method based on Neural Network Optimization |
CN110363158A (en) * | 2019-07-17 | 2019-10-22 | 浙江大学 | A kind of millimetre-wave radar neural network based cooperates with object detection and recognition method with vision |
WO2019201029A1 (en) * | 2018-04-19 | 2019-10-24 | 华为技术有限公司 | Candidate box update method and apparatus |
CN110414402A (en) * | 2019-07-22 | 2019-11-05 | 北京达佳互联信息技术有限公司 | A kind of gesture data mask method, device, electronic equipment and storage medium |
CN110458059A (en) * | 2019-07-30 | 2019-11-15 | 北京科技大学 | A kind of gesture identification method based on computer vision and identification device |
CN110796018A (en) * | 2019-09-30 | 2020-02-14 | 武汉科技大学 | Hand motion recognition method based on depth image and color image |
CN110837766A (en) * | 2018-08-17 | 2020-02-25 | 北京市商汤科技开发有限公司 | Gesture recognition method, gesture processing method and device |
CN111050266A (en) * | 2019-12-20 | 2020-04-21 | 朱凤邹 | Method and system for performing function control based on earphone detection action |
CN111061367A (en) * | 2019-12-05 | 2020-04-24 | 神思电子技术股份有限公司 | Method for realizing gesture mouse of self-service equipment |
CN111104820A (en) * | 2018-10-25 | 2020-05-05 | 中车株洲电力机车研究所有限公司 | Gesture recognition method based on deep learning |
CN111127457A (en) * | 2019-12-25 | 2020-05-08 | 上海找钢网信息科技股份有限公司 | Reinforcing steel bar number statistical model training method, statistical method, device and equipment |
CN111275010A (en) * | 2020-02-25 | 2020-06-12 | 福建师范大学 | Pedestrian re-identification method based on computer vision |
CN111310800A (en) * | 2020-01-20 | 2020-06-19 | 世纪龙信息网络有限责任公司 | Image classification model generation method and device, computer equipment and storage medium |
CN111382643A (en) * | 2018-12-30 | 2020-07-07 | 广州市百果园信息技术有限公司 | Gesture detection method, device, equipment and storage medium |
CN111476084A (en) * | 2020-02-25 | 2020-07-31 | 福建师范大学 | Deep learning-based parking lot dynamic parking space condition identification method |
CN111597888A (en) * | 2020-04-09 | 2020-08-28 | 上海容易网电子商务股份有限公司 | Gesture recognition method combining Gaussian mixture model and CNN |
CN111639740A (en) * | 2020-05-09 | 2020-09-08 | 武汉工程大学 | Steel bar counting method based on multi-scale convolution neural network |
CN111653103A (en) * | 2020-05-07 | 2020-09-11 | 浙江大华技术股份有限公司 | Target object identification method and device |
CN111880661A (en) * | 2020-07-31 | 2020-11-03 | Oppo广东移动通信有限公司 | Gesture recognition method and device |
CN112232282A (en) * | 2020-11-04 | 2021-01-15 | 苏州臻迪智能科技有限公司 | Gesture recognition method and device, storage medium and electronic equipment |
CN112416128A (en) * | 2020-11-23 | 2021-02-26 | 森思泰克河北科技有限公司 | Gesture recognition method and terminal equipment |
CN112464860A (en) * | 2020-12-10 | 2021-03-09 | 深圳市优必选科技股份有限公司 | Gesture recognition method and device, computer equipment and storage medium |
CN112487913A (en) * | 2020-11-24 | 2021-03-12 | 北京市地铁运营有限公司运营四分公司 | Labeling method and device based on neural network and electronic equipment |
CN113297956A (en) * | 2021-05-22 | 2021-08-24 | 温州大学 | Gesture recognition method and system based on vision |
CN113627265A (en) * | 2021-07-13 | 2021-11-09 | 深圳市创客火科技有限公司 | Unmanned aerial vehicle control method and device and computer readable storage medium |
CN114035687A (en) * | 2021-11-12 | 2022-02-11 | 郑州大学 | Gesture recognition method and system based on virtual reality |
CN114627561A (en) * | 2022-05-16 | 2022-06-14 | 南昌虚拟现实研究院股份有限公司 | Dynamic gesture recognition method and device, readable storage medium and electronic equipment |
WO2022120669A1 (en) * | 2020-12-10 | 2022-06-16 | 深圳市优必选科技股份有限公司 | Gesture recognition method, computer device and storage medium |
CN113312973B (en) * | 2021-04-25 | 2023-06-02 | 北京信息科技大学 | Gesture recognition key point feature extraction method and system |
CN117079493A (en) * | 2023-08-17 | 2023-11-17 | 深圳市盛世基业物联网有限公司 | Intelligent parking management method and system based on Internet of things |
US12051236B2 (en) | 2018-09-21 | 2024-07-30 | Bigo Technology Pte. Ltd. | Method for recognizing video action, and device and storage medium thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160334975A1 (en) * | 2015-05-12 | 2016-11-17 | Konica Minolta, Inc. | Information processing device, non-transitory computer-readable recording medium storing an information processing program, and information processing method |
US20170046568A1 (en) * | 2012-04-18 | 2017-02-16 | Arb Labs Inc. | Systems and methods of identifying a gesture using gesture data compressed by principal joint variable analysis |
CN106960036A (en) * | 2017-03-09 | 2017-07-18 | 杭州电子科技大学 | A kind of database building method for gesture identification |
CN107168527A (en) * | 2017-04-25 | 2017-09-15 | 华南理工大学 | The first visual angle gesture identification and exchange method based on region convolutional neural networks |
- 2017-11-10 CN CN201711102008.9A patent/CN107808143B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170046568A1 (en) * | 2012-04-18 | 2017-02-16 | Arb Labs Inc. | Systems and methods of identifying a gesture using gesture data compressed by principal joint variable analysis |
US20160334975A1 (en) * | 2015-05-12 | 2016-11-17 | Konica Minolta, Inc. | Information processing device, non-transitory computer-readable recording medium storing an information processing program, and information processing method |
CN106960036A (en) * | 2017-03-09 | 2017-07-18 | 杭州电子科技大学 | A kind of database building method for gesture identification |
CN107168527A (en) * | 2017-04-25 | 2017-09-15 | 华南理工大学 | The first visual angle gesture identification and exchange method based on region convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
XIAO JIANG ET AL.: "A Dynamic Gesture Recognition Method Based on Computer Vision", 《2013 6TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING (CISP)》 * |
关然 等: "基于计算机视觉的手势检测识别技术", 《计算机应用与软件》 * |
Cited By (64)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390344A (en) * | 2018-04-19 | 2019-10-29 | 华为技术有限公司 | Alternative frame update method and device |
WO2019201029A1 (en) * | 2018-04-19 | 2019-10-24 | 华为技术有限公司 | Candidate box update method and apparatus |
CN110163048A (en) * | 2018-07-10 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Identification model training method, recognition methods and the equipment of hand key point |
CN110163048B (en) * | 2018-07-10 | 2023-06-02 | 腾讯科技(深圳)有限公司 | Hand key point recognition model training method, hand key point recognition method and hand key point recognition equipment |
CN109145756A (en) * | 2018-07-24 | 2019-01-04 | 湖南万为智能机器人技术有限公司 | Object detection method based on machine vision and deep learning |
CN109165555A (en) * | 2018-07-24 | 2019-01-08 | 广东数相智能科技有限公司 | Man-machine finger-guessing game method, apparatus and storage medium based on image recognition |
CN110837766B (en) * | 2018-08-17 | 2023-05-05 | 北京市商汤科技开发有限公司 | Gesture recognition method, gesture processing method and device |
CN110837766A (en) * | 2018-08-17 | 2020-02-25 | 北京市商汤科技开发有限公司 | Gesture recognition method, gesture processing method and device |
CN109117806B (en) * | 2018-08-22 | 2020-11-27 | 歌尔科技有限公司 | Gesture recognition method and device |
CN109117806A (en) * | 2018-08-22 | 2019-01-01 | 歌尔科技有限公司 | A kind of gesture identification method and device |
US12051236B2 (en) | 2018-09-21 | 2024-07-30 | Bigo Technology Pte. Ltd. | Method for recognizing video action, and device and storage medium thereof |
CN109325454A (en) * | 2018-09-28 | 2019-02-12 | 合肥工业大学 | A kind of static gesture real-time identification method based on YOLOv3 |
CN109325454B (en) * | 2018-09-28 | 2020-05-22 | 合肥工业大学 | Static gesture real-time recognition method based on YOLOv3 |
CN111104820A (en) * | 2018-10-25 | 2020-05-05 | 中车株洲电力机车研究所有限公司 | Gesture recognition method based on deep learning |
CN109697407A (en) * | 2018-11-13 | 2019-04-30 | 北京物灵智能科技有限公司 | A kind of image processing method and device |
CN109583456B (en) * | 2018-11-20 | 2023-04-28 | 西安电子科技大学 | Infrared surface target detection method based on feature fusion and dense connection |
CN109583456A (en) * | 2018-11-20 | 2019-04-05 | 西安电子科技大学 | Infrared surface object detection method based on Fusion Features and dense connection |
CN111382643B (en) * | 2018-12-30 | 2023-04-14 | 广州市百果园信息技术有限公司 | Gesture detection method, device, equipment and storage medium |
CN111382643A (en) * | 2018-12-30 | 2020-07-07 | 广州市百果园信息技术有限公司 | Gesture detection method, device, equipment and storage medium |
CN109815876A (en) * | 2019-01-17 | 2019-05-28 | 西安电子科技大学 | Gesture recognition method based on address event stream features |
CN109948480A (en) * | 2019-03-05 | 2019-06-28 | 中国电子科技集团公司第二十八研究所 | Non-maximum suppression method for arbitrary quadrilaterals |
CN109948690A (en) * | 2019-03-14 | 2019-06-28 | 西南交通大学 | High-speed rail scene perception method based on deep learning and structural information |
CN109934184A (en) * | 2019-03-19 | 2019-06-25 | 网易(杭州)网络有限公司 | Gesture recognition method and device, storage medium, and processor |
CN110135237B (en) * | 2019-03-24 | 2021-11-26 | 北京化工大学 | Gesture recognition method |
CN110135237A (en) * | 2019-03-24 | 2019-08-16 | 北京化工大学 | Gesture recognition method |
CN110135408B (en) * | 2019-03-26 | 2021-02-19 | 北京捷通华声科技股份有限公司 | Text image detection method, network and equipment |
CN110135408A (en) * | 2019-03-26 | 2019-08-16 | 北京捷通华声科技股份有限公司 | Text image detection method, network and equipment |
CN110287771A (en) * | 2019-05-10 | 2019-09-27 | 平安科技(深圳)有限公司 | Image palm area extracting method and device |
CN110135398A (en) * | 2019-05-28 | 2019-08-16 | 厦门瑞为信息技术有限公司 | Computer-vision-based method for detecting both hands off the steering wheel |
CN110348323B (en) * | 2019-06-19 | 2022-12-16 | 广东工业大学 | Wearable device gesture recognition method based on neural network optimization |
CN110348323A (en) * | 2019-06-19 | 2019-10-18 | 广东工业大学 | Wearable device gesture recognition method based on neural network optimization |
CN110363158B (en) * | 2019-07-17 | 2021-05-25 | 浙江大学 | Millimeter wave radar and visual cooperative target detection and identification method based on neural network |
CN110363158A (en) * | 2019-07-17 | 2019-10-22 | 浙江大学 | Millimeter-wave radar and vision cooperative target detection and recognition method based on neural networks |
CN110414402A (en) * | 2019-07-22 | 2019-11-05 | 北京达佳互联信息技术有限公司 | Gesture data labeling method and device, electronic device, and storage medium |
CN110414402B (en) * | 2019-07-22 | 2022-03-25 | 北京达佳互联信息技术有限公司 | Gesture data labeling method and device, electronic equipment and storage medium |
CN110458059A (en) * | 2019-07-30 | 2019-11-15 | 北京科技大学 | Gesture recognition method and device based on computer vision |
CN110458059B (en) * | 2019-07-30 | 2022-02-08 | 北京科技大学 | Gesture recognition method and device based on computer vision |
CN110796018A (en) * | 2019-09-30 | 2020-02-14 | 武汉科技大学 | Hand motion recognition method based on depth image and color image |
CN111061367B (en) * | 2019-12-05 | 2023-04-07 | 神思电子技术股份有限公司 | Method for realizing gesture mouse of self-service equipment |
CN111061367A (en) * | 2019-12-05 | 2020-04-24 | 神思电子技术股份有限公司 | Method for realizing gesture mouse of self-service equipment |
CN111050266A (en) * | 2019-12-20 | 2020-04-21 | 朱凤邹 | Method and system for performing function control based on earphone detection action |
CN111127457A (en) * | 2019-12-25 | 2020-05-08 | 上海找钢网信息科技股份有限公司 | Reinforcing steel bar number statistical model training method, statistical method, device and equipment |
CN111310800B (en) * | 2020-01-20 | 2023-10-10 | 天翼数字生活科技有限公司 | Image classification model generation method, device, computer equipment and storage medium |
CN111310800A (en) * | 2020-01-20 | 2020-06-19 | 世纪龙信息网络有限责任公司 | Image classification model generation method and device, computer equipment and storage medium |
CN111476084A (en) * | 2020-02-25 | 2020-07-31 | 福建师范大学 | Deep learning-based parking lot dynamic parking space condition identification method |
CN111275010A (en) * | 2020-02-25 | 2020-06-12 | 福建师范大学 | Pedestrian re-identification method based on computer vision |
CN111597888A (en) * | 2020-04-09 | 2020-08-28 | 上海容易网电子商务股份有限公司 | Gesture recognition method combining Gaussian mixture model and CNN |
CN111653103A (en) * | 2020-05-07 | 2020-09-11 | 浙江大华技术股份有限公司 | Target object identification method and device |
CN111639740A (en) * | 2020-05-09 | 2020-09-08 | 武汉工程大学 | Steel bar counting method based on multi-scale convolution neural network |
CN111880661A (en) * | 2020-07-31 | 2020-11-03 | Oppo广东移动通信有限公司 | Gesture recognition method and device |
CN112232282A (en) * | 2020-11-04 | 2021-01-15 | 苏州臻迪智能科技有限公司 | Gesture recognition method and device, storage medium and electronic equipment |
CN112416128A (en) * | 2020-11-23 | 2021-02-26 | 森思泰克河北科技有限公司 | Gesture recognition method and terminal equipment |
CN112487913A (en) * | 2020-11-24 | 2021-03-12 | 北京市地铁运营有限公司运营四分公司 | Labeling method and device based on neural network and electronic equipment |
WO2022120669A1 (en) * | 2020-12-10 | 2022-06-16 | 深圳市优必选科技股份有限公司 | Gesture recognition method, computer device and storage medium |
CN112464860A (en) * | 2020-12-10 | 2021-03-09 | 深圳市优必选科技股份有限公司 | Gesture recognition method and device, computer equipment and storage medium |
CN113312973B (en) * | 2021-04-25 | 2023-06-02 | 北京信息科技大学 | Gesture recognition key point feature extraction method and system |
CN113297956A (en) * | 2021-05-22 | 2021-08-24 | 温州大学 | Gesture recognition method and system based on vision |
CN113297956B (en) * | 2021-05-22 | 2023-12-08 | 温州大学 | Gesture recognition method and system based on vision |
CN113627265A (en) * | 2021-07-13 | 2021-11-09 | 深圳市创客火科技有限公司 | Unmanned aerial vehicle control method and device and computer readable storage medium |
CN114035687A (en) * | 2021-11-12 | 2022-02-11 | 郑州大学 | Gesture recognition method and system based on virtual reality |
CN114035687B (en) * | 2021-11-12 | 2023-07-25 | 郑州大学 | Gesture recognition method and system based on virtual reality |
CN114627561A (en) * | 2022-05-16 | 2022-06-14 | 南昌虚拟现实研究院股份有限公司 | Dynamic gesture recognition method and device, readable storage medium and electronic equipment |
CN117079493A (en) * | 2023-08-17 | 2023-11-17 | 深圳市盛世基业物联网有限公司 | Intelligent parking management method and system based on Internet of things |
CN117079493B (en) * | 2023-08-17 | 2024-03-19 | 深圳市盛世基业物联网有限公司 | Intelligent parking management method and system based on Internet of things |
Also Published As
Publication number | Publication date |
---|---|
CN107808143B (en) | 2021-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107808143A (en) | Dynamic gesture identification method based on computer vision | |
CN107168527B (en) | First-person-view gesture recognition and interaction method based on region convolutional neural networks | |
CN109597485B (en) | Gesture interaction system based on two-finger-region features and working method thereof | |
CN108052946A (en) | Automatic identification method for high-voltage cabinet switches based on convolutional neural networks | |
CN107742107A (en) | Face image classification method, device, and server | |
CN109902798A (en) | Training method and device for deep neural networks | |
CN106650687A (en) | Posture correction method based on depth information and skeleton information | |
CN110059741A (en) | Image-recognizing method based on semantic capsule converged network | |
CN106650630A (en) | Target tracking method and electronic equipment | |
CN107145845A (en) | The pedestrian detection method merged based on deep learning and multi-characteristic points | |
CN105160310A (en) | 3D (three-dimensional) convolutional neural network based human body behavior recognition method | |
CN105536205A (en) | Upper limb training system based on monocular video human body action sensing | |
CN108416266A (en) | Fast video behavior recognition method using optical flow to extract moving targets | |
CN105975934A (en) | Dynamic gesture recognition method and system for augmented-reality-assisted maintenance | |
CN109558902A (en) | Fast target detection method | |
CN109410168A (en) | Modeling method for determining a convolutional neural network model for classifying sub-blocks of an image | |
CN107808376A (en) | Hand-raising detection method based on deep learning | |
CN103186775A (en) | Human body motion recognition method based on mixed descriptor | |
CN109684959A (en) | Video gesture recognition method and device based on skin color detection and deep learning | |
CN106600595A (en) | Automatic human body dimension measurement method based on an artificial intelligence algorithm | |
CN105069745A (en) | Face-changing system and method based on a common image sensor and augmented reality technology | |
CN111178170B (en) | Gesture recognition method and electronic equipment | |
CN109740454A (en) | Human posture recognition method based on YOLO-V3 | |
CN105912126A (en) | Method for adaptively adjusting the interface-mapped gain of gesture movement | |
CN109614990A (en) | Object detection device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||