CN107808143A - Dynamic gesture recognition method based on computer vision - Google Patents

Dynamic gesture recognition method based on computer vision

Info

Publication number
CN107808143A
Authority
CN
China
Prior art keywords
gesture
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711102008.9A
Other languages
Chinese (zh)
Other versions
CN107808143B (en)
Inventor
王爽
焦李成
方帅
王若静
杨孟然
权豆
孙莉
侯彪
马晶晶
刘飞航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201711102008.9A priority Critical patent/CN107808143B/en
Publication of CN107808143A publication Critical patent/CN107808143A/en
Application granted granted Critical
Publication of CN107808143B publication Critical patent/CN107808143B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dynamic gesture recognition method based on computer vision, which solves the problem of dynamically recognizing gestures against complex backgrounds. The implementation steps are: collect a gesture data set and label it manually; cluster the labeled ground-truth boxes of the image set to obtain prior boxes for training; build an end-to-end convolutional neural network that can simultaneously predict position, size and class; train the network to obtain its weights; load the weights into the network, input a gesture image to be recognized, and process the resulting position coordinates and class information with non-maximum suppression to obtain the final recognition result image; record the recognition information in real time to obtain the dynamic gesture interpretation result. The invention overcomes the prior-art defect that hand detection and classification in gesture recognition are carried out in separate steps; it greatly simplifies the gesture recognition process, improves recognition accuracy and speed, enhances the robustness of the recognition system, and realizes the interpretation of dynamic gestures.

Description

Dynamic gesture recognition method based on computer vision
Technical field
The invention belongs to the technical field of image processing and further relates to target recognition in images; specifically, it is a dynamic gesture recognition method based on computer vision. It can be used to detect the position and recognize the state of gestures in images, thereby providing more accurate information for downstream applications of gesture recognition such as sign language translation and game interaction.
Background technology
In recent years, with the development of computer vision, machine learning and related disciplines, human-computer interaction technology has gradually shifted from being computer-centered to being human-centered. Natural user interfaces, which use the human body itself as the communication platform, provide the operator with a more intuitive and comfortable interactive experience; they include face recognition, gesture recognition, body posture recognition, and so on. Gestures, as a natural and intuitive means of communication in daily life, have excellent application prospects: controlling smart devices in virtual reality with predefined gestures; translating sign language to solve the communication problems of deaf-mute people; and automatically recognizing traffic police gestures in driverless vehicles. Gesture recognition therefore has great research value and significance.
Gesture recognition mainly falls into two categories: recognition based on sensing equipment (e.g. a data glove plus a position tracker) and vision-based recognition. Because vision-based gesture recognition lets the operator interact in a more natural way and offers greater flexibility, it has attracted more research and attention. Most current gesture recognition detects the position of the gesture in the image and then identifies it, i.e. a two-step method that first detects the hand position and then determines the gesture class.
The paper "Real-Time Hand Gesture Recognition Using Finger Segmentation" by Zhi-hua Chen et al. (The Scientific World Journal, 2014(3):267872) proposes a method based on hand detection and shape detection. The method first extracts and binarizes the hand region with background subtraction, then segments the fingers and the palm, and finally classifies the gesture target against 13 original templates using the number of fingers and their content (content here means the name of the finger, e.g. thumb, forefinger, middle finger). However, this method places strict requirements on the image background: the hand position can only be segmented against a plain background. In addition, the gesture shapes it recognizes are limited and its robustness is poor, making it difficult to generalize.
The paper "A Real-time Hand Gesture Recognition and Human-Computer Interaction System" by Pei Xu (In CVPR, IEEE, 2017) proposes an algorithm based on hand detection and CNN recognition. The method uses elementary image-processing operations such as filtering and morphology to obtain a binary image containing only the hand, then feeds it into the convolutional neural network LeNet for feature extraction and recognition, which improves accuracy. However, the method requires image preprocessing and places high demands on the background color, and the detection and recognition of the gesture are carried out in two steps, i.e. first obtaining the position of the gesture and then classifying the current gesture to obtain its state, so the recognition procedure is cumbersome and time-consuming.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by proposing a more accurate and more efficient dynamic gesture recognition method based on computer vision.
The present invention is a dynamic gesture recognition method based on computer vision, characterized in that it comprises the following steps:
(1) collect gesture images: divide the collected gesture images into a training set and a test set, manually label the gestures in each, and obtain the class and coordinate data of the ground-truth boxes;
(2) cluster to obtain prior boxes: cluster the manually labeled ground-truth boxes, using the degree of area overlap between boxes as the loss metric, to obtain several initial prior boxes;
(3) build an end-to-end convolutional neural network that can simultaneously predict the position, size and class of the target gesture: with an improved GoogLeNet network as the framework, build an end-to-end convolutional neural network with a loss function that simultaneously constrains target position and class;
(4) train the end-to-end network:
(4a) read in gesture images of the training set samples in batches;
(4b) randomly rescale the images with bilinear interpolation, choosing the size as a multiple of 32, to obtain the rescaled read-in gesture images;
(4c) rescale the input images again with bilinear interpolation to a fixed size, obtaining images that can be fed into the convolutional network;
(4d) train the convolutional neural network built in step (3) with the fixed-size images obtained in step (4c), obtaining the weights of the built convolutional neural network;
(5) load the weights: load the convolutional neural network weights obtained in step (4d) into the convolutional neural network built in step (3);
(6) predict the position and class of the gesture: read in the gesture image to be recognized and feed it into the convolutional neural network with loaded weights for recognition, simultaneously obtaining the position coordinates and class information of the gesture target;
(7) remove redundant prediction boxes: process the obtained position coordinates and class information with non-maximum suppression to obtain the final prediction boxes:
(7a) sort all prediction boxes by descending score and choose the highest score and its corresponding box;
(7b) traverse the remaining boxes and delete any box whose overlap (IOU) with the current highest-scoring box exceeds a certain threshold;
(7c) continue selecting the highest-scoring box among the unprocessed boxes and repeat the above process, i.e. (7a) to (7c), obtaining the surviving prediction box data;
(8) visualize the prediction results: map the prediction box data back onto the original image, draw the prediction boxes on it and mark the class label of each gesture target;
(9) record and analyze: record the class and position information of the gesture in real time, analyze the resulting real-time data to interpret the dynamic gesture, and display the interpretation result directly on the screen.
The invention recognizes gestures end to end with a deep convolutional neural network; it can not only recognize dynamic gestures in real time but also maintain high accuracy against complex backgrounds.
Compared with the prior art, the present invention has the following advantages:
1. The invention recognizes gestures with a convolutional neural network, completing the position detection and recognition of the gesture target in the image in one step. The procedure is concise and recognition is fast, overcoming the prior-art defect that processing in two separate steps (first detecting the hand position, then recognizing the gesture) cannot guarantee real-time performance. At the same time, the network extracts the features of gesture images well and recognizes gestures at any angle with very high accuracy; it places no requirements on the image background and recognizes gestures accurately even against complex backgrounds, overcoming the prior-art requirement for a plain background;
2. When training the convolutional neural network, the invention randomly rescales the gesture image size, changing the size of the gesture images fed into the network every few iterations. Every 10 batches the network randomly selects a new image dimension, so that the network achieves good prediction at different input sizes and the same network can detect at different resolutions. The identical network can thus predict detections at different resolutions, with stronger robustness and generalization.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 shows the natural-scene gesture images used by the invention in the simulation experiments;
Fig. 3 shows the gesture target recognition results obtained in the simulation experiments;
Fig. 4 shows the recognition results of the invention for a dynamic gesture, where Fig. 4(a) is one frame of the dynamic gesture whose sign-language meaning is "object" and Fig. 4(b) is one frame of the detection results of that process;
Fig. 5 is the record of the gesture center-point coordinates during the dynamic gesture recognition process.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings.
Embodiment 1
Gestures, as a natural and intuitive means of communication, have good application prospects: controlling smart devices in virtual reality with predefined gestures; translating sign language to solve the communication problems of deaf-mute people; automatically recognizing traffic police gestures in driverless vehicles; and so on. Current vision-based gesture recognition techniques mostly use the conventional approach of first segmenting the gesture and then classifying it. This approach demands high photographic quality and struggles with gestures against complex backgrounds, which limits the development of gesture recognition applications. In view of this situation, the present invention carries out research and innovation and proposes a dynamic gesture recognition method based on computer vision, referring to Fig. 1, comprising the following steps:
(1) Collect gesture images: divide the collected gesture images into a training set and a test set; the training set is used to train the convolutional neural network and the test set to compute the recognition accuracy of the network. Label the gestures in the collected gesture images to obtain the rectangle that most tightly fits each gesture, with its size and center-point coordinates, together with the class of the corresponding gesture. This realizes the manual labeling of the gestures and yields the class and coordinate data of the ground-truth boxes.
(2) Cluster to obtain prior boxes: choose the number of cluster centers and cluster the manually labeled ground-truth boxes, using the degree of area overlap between boxes as the loss metric, to obtain several initial prior boxes. In this example the number of cluster centers is set to 9; after clustering with overlap degree as the loss metric, 9 initial prior boxes are obtained. Using these 9 initial prior boxes as the initial prediction boxes of the convolutional neural network shortens its convergence time. In general, the number of cluster centers depends on how densely targets appear in the pictures: the more targets per picture, the more cluster centers should be set.
(3) Build an end-to-end convolutional neural network that can simultaneously predict the position, size and class of the target gesture: with an improved GoogLeNet network as the framework, build an end-to-end convolutional neural network with a loss function that simultaneously constrains target position, size and class. The design is an end-to-end convolutional neural network that constrains target position and class at the same time, so the network can predict the position, size and class of the target gesture simultaneously. Because the loss function simultaneously constrains target position, size and class, the network possesses the ability to predict position, size and class at once. The network is computationally light, converges easily, and can classify 9000 target categories on the ImageNet data set.
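For concreteness, the following is a minimal sketch, assuming PyTorch (the patent names no framework), of a backbone in the spirit of step (3): stacked 3*3 and 1*1 convolutions with 5 pooling layers, ending in a single convolutional head that outputs position, size, confidence and class scores per grid cell. The channel widths, layer counts and class/prior counts are illustrative, not the patent's actual architecture.

```python
import torch.nn as nn

def conv_bn(in_ch, out_ch, k):
    # k is 1 or 3; padding preserves the spatial size
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1),
    )

class GestureDetector(nn.Module):
    def __init__(self, num_classes=6, num_priors=5):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 512]
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [conv_bn(c_in, c_out, 3),
                       conv_bn(c_out, c_out // 2, 1),
                       conv_bn(c_out // 2, c_out, 3),
                       nn.MaxPool2d(2)]           # 5 pooling layers in total
        # per prior box: x, y, w, h, confidence, then class probabilities
        head = nn.Conv2d(chans[-1], num_priors * (5 + num_classes), 1)
        self.net = nn.Sequential(*blocks, head)

    def forward(self, x):         # x: (N, 3, H, W) with H, W multiples of 32
        return self.net(x)        # (N, B*(5+C), H/32, W/32) grid of predictions
```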
(4) Train the end-to-end convolutional neural network: to strengthen the robustness of the convolutional neural network to picture size, after reading in a batch of gesture images, each read-in gesture image is rescaled twice. The first rescaling takes the originally input gesture image to a random size; the second takes the randomly sized image to the specified size. The gesture images rescaled to the specified size are then fed into the convolutional neural network for training, yielding the trained weights. The specific steps are as follows:
(4a) read in gesture images of the training set samples in batches;
(4b) randomly rescale the read-in gesture images with bilinear interpolation so that the rescaled image size is a multiple of 32, obtaining the rescaled read-in gesture images. This is done to diversify the scales of the data, strengthening the robustness of the network and in turn improving recognition accuracy.
(4c) rescale the input images again with bilinear interpolation to a fixed size, obtaining images that can be fed into the convolutional network; in this example the fixed size is 672*672. The fixed size to which images are rescaled is tied to the structure of the convolutional neural network.
(4d) train the convolutional neural network built in step (3) with the fixed-size images obtained in step (4c), obtaining the weights of the convolutional neural network.
(5) Load the weights: load the network weights obtained in step (4d) into the convolutional neural network built in step (3); these weights are the network parameters required for prediction.
(6) Predict the position and class of the gesture: read in the gesture image to be recognized; the network first rescales the input gesture image to the size specified in step (4c), then feeds it into the network with loaded weights for recognition, simultaneously obtaining the position coordinates, size and class information of the gesture target.
(7) Remove redundant prediction boxes: process the position coordinates and class information of the gesture in the gesture image with non-maximum suppression to obtain the final prediction boxes. The prediction for one target may yield multiple recognition boxes; the non-maximum suppression algorithm removes the redundant boxes and retains the data of the single box with the highest confidence. The concrete operations are as follows:
(7a) sort all boxes by descending confidence score and choose the box with the highest confidence;
(7b) traverse the remaining boxes and delete any box whose overlap (IOU) with the current highest-confidence box exceeds a certain threshold;
(7c) continue selecting the highest-scoring box among the unprocessed boxes and repeat the above process, i.e. (7a) to (7c), obtaining the surviving prediction box data; the data of a prediction box comprise its position, size and class.
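As a concrete illustration of steps (7a)-(7c), here is a minimal sketch of non-maximum suppression, assuming boxes are given as (x1, y1, x2, y2) corner coordinates with one confidence score each; NumPy and the 0.5 threshold are assumptions, since the patent fixes neither.

```python
import numpy as np

def iou(box, boxes):
    # overlap ratio between one box and an array of boxes
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, thresh=0.5):
    order = np.argsort(scores)[::-1]      # (7a) sort by descending confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        # (7b) delete boxes whose IOU with the current best exceeds the threshold
        order = rest[iou(boxes[best], boxes[rest]) <= thresh]
    return keep                           # (7c) indices of the surviving boxes
```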
(8) Visualize the prediction results: the coordinates and size of a recognized prediction box are relative to the fixed size of step (4c), i.e. the rescaled size. Map the prediction box data from the fixed size back to the original image size, where the original size is the size of the gesture image to be recognized; draw the prediction boxes on the original image and mark the class label of each gesture target.
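A one-function sketch of this mapping, under the assumption that predictions are corner-format boxes in the square fixed-size network frame:

```python
def map_to_original(box, net_size, orig_w, orig_h):
    """Map (x1, y1, x2, y2) from the net_size*net_size frame to the original image."""
    sx, sy = orig_w / net_size, orig_h / net_size
    return (box[0] * sx, box[1] * sy, box[2] * sx, box[3] * sy)
```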
(9) Record and analyze: the invention needs only 0.02 s to recognize a single photo, meeting the requirements of real-time gesture recognition. The camera is accessed through OpenCV, and with the trained convolutional neural network the class and position information of the gesture is recorded in real time; the resulting real-time data are analyzed to interpret the dynamic gesture, and the interpretation result is displayed directly on the screen.
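A hedged sketch of this real-time loop, assuming OpenCV for capture and a hypothetical detect(frame) helper that wraps steps (6)-(8) and returns (box, label) pairs:

```python
import cv2

cap = cv2.VideoCapture(0)                 # open the default camera
track = []                                # per-frame (cx, cy, label) records
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    for (x1, y1, x2, y2), label in detect(frame):   # detect() is assumed
        track.append(((x1 + x2) / 2, (y1 + y2) / 2, label))
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
        cv2.putText(frame, label, (int(x1), int(y1) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("gesture", frame)
    if cv2.waitKey(1) == 27:              # Esc quits
        break
cap.release()
```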
The invention builds an end-to-end convolutional neural network with a loss function that simultaneously constrains target position and class, and predicts the position, size and class of the target at the same time, simplifying the gesture recognition procedure and increasing recognition speed. In the training stage, randomly rescaled gesture images are fed into the convolutional neural network for training, which strengthens the robustness of the network and improves recognition accuracy.
Embodiment 2
The dynamic gesture recognition method based on computer vision is as in embodiment 1. The clustering of the manually labeled ground-truth boxes in step (2) of the invention specifically comprises the following steps:
(2a) read the manually labeled ground-truth box data of the training set and test set samples;
(2b) set the number of cluster centers and cluster with the k-means algorithm using the loss metric d(box, centroid) given by the following formula, obtaining the prior boxes:

d(box, centroid) = 1 - IOU(box, centroid)

where centroid denotes the randomly chosen cluster-center box, box denotes any other ground-truth box, and IOU(box, centroid) measures the similarity between the other box and the center box, i.e. the overlap ratio of the two boxes, computed as the area of their intersection divided by the area of their union.
Through clustering the invention obtains the several most representative prior boxes of the manually collected ground-truth boxes; the prior boxes serve as the initial boxes for the neural network's predictions. Determining the prior boxes narrows the prediction range of the convolutional neural network and accelerates its convergence.
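The following minimal sketch, assuming NumPy, illustrates k-means clustering with the d = 1 - IOU metric of step (2b); as is conventional for prior-box clustering, each ground-truth box is reduced to a (w, h) pair and IOU is computed between boxes anchored at a common corner.

```python
import numpy as np

def wh_iou(wh, centroids):
    # IOU between (w, h) boxes anchored at the same corner: an (N, k) matrix
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    union = wh[:, None].prod(-1) + centroids[None, :].prod(-1) - inter
    return inter / union

def kmeans_priors(wh, k=5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # minimizing d = 1 - IOU is the same as maximizing IOU
        assign = np.argmax(wh_iou(wh, centroids), axis=1)
        centroids = np.stack([wh[assign == i].mean(0) if np.any(assign == i)
                              else centroids[i] for i in range(k)])
    return centroids                      # k prior (w, h) pairs
```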
Embodiment 3
The dynamic gesture recognition method based on computer vision is as in embodiments 1-2. Building the convolutional neural network in step (3) of the invention comprises the following steps:
(3a) with the GoogLeNet convolutional neural network as the basis and using simple 1*1 and 3*3 convolution kernels, build a convolutional neural network comprising G convolutional layers and 5 pooling layers; G is 25 in this example.
(3b) train the built convolutional network with the following loss function:

$$\begin{aligned} loss =\ & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ & +\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ & +\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ & +\sum_{i=0}^{S^2} I_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2 \end{aligned}$$

where the first term of the loss function is the center-point coordinate loss of the predicted target box; $\lambda_{coord}$ is the coordinate loss coefficient with $1 \le \lambda_{coord} \le 5$, taken as 3 in this example to ensure that the predicted gesture position is accurate; $S^2$ is the number of grid cells the picture is divided into and $B$ is the number of boxes predicted per grid cell; $I_{ij}^{obj}$ indicates whether the j-th prediction box in the i-th grid cell is responsible for predicting this target when a target is present; $(x_i, y_i)$ is the center point of the ground-truth box and $(\hat{x}_i, \hat{y}_i)$ that of the prediction box. The second term is the width-height loss of the prediction box, with $(w_i, h_i)$ the width and height of the ground-truth box and $(\hat{w}_i, \hat{h}_i)$ those of the prediction box. The third and fourth terms are the losses of the probability that a prediction box contains a target; $\lambda_{noobj}$ is the loss coefficient when no target is contained, with $0.1 \le \lambda_{noobj} \le 1$, taken as 1 in this example to ensure that the convolutional neural network can distinguish target from background; $I_{ij}^{noobj}$ indicates, when no target is contained, whether the j-th prediction box in the i-th grid cell is responsible for this prediction; $C_i$ is the true probability of containing a target and $\hat{C}_i$ the predicted probability. The fifth term is the class probability loss; $I_i^{obj}$ indicates that the i-th grid cell contains a target center point; $p_i(c)$ is the true target class, $\hat{p}_i(c)$ the predicted target class, and $c$ ranges over the classes.
In this embodiment of the invention, gesture position detection and classification are completed in one step. A convolutional neural network extracts features from the raw gesture image, and the network is trained by reducing the position loss and the classification loss, so that it identifies the gesture type at the same time as it detects the gesture position.
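To make the loss terms concrete, here is a deliberately simplified sketch, assuming PyTorch, of the contribution of one grid cell that does contain a target; the no-target term and the responsibility masks $I_{ij}$ of the full formula are omitted, so this is illustrative rather than the patent's training code.

```python
import torch

def cell_loss(pred, truth, lam_coord=3.0):
    # pred/truth: dicts of tensors with xy (2,), wh (2,), conf scalar, probs (C,)
    xy = lam_coord * ((pred["xy"] - truth["xy"]) ** 2).sum()                 # term 1
    wh = lam_coord * ((pred["wh"].sqrt() - truth["wh"].sqrt()) ** 2).sum()   # term 2
    conf = (pred["conf"] - truth["conf"]) ** 2                               # term 3
    cls = ((pred["probs"] - truth["probs"]) ** 2).sum()                      # term 5
    return xy + wh + conf + cls
```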
Embodiment 4
The dynamic gesture recognition method based on computer vision is as in embodiments 1-3. The random rescaling of the image with bilinear interpolation in step (4b) of the invention, choosing the size as a multiple of 32 to obtain the rescaled input image, is carried out as follows:
4b1: read in a gesture image to be recognized.
4b2: randomly rescale the image with bilinear interpolation, choosing the size as a multiple of 32, to obtain the rescaled input image.
The pending gesture images input in this embodiment of the invention are as shown in Fig. 2; the pixel range of the images is [600, 1000]. After rescaling, the image size is chosen among the multiples of 32 {480, 512, ..., 832}, with minimum 480*480 and maximum 832*832, yielding the rescaled input images.
The invention randomly rescales the gesture image size when training the convolutional neural network in order to increase the network's robustness to image size. Every 10 batches the algorithm randomly rescales the gesture images, so that the network achieves good prediction at different input sizes and the same network can detect at different resolutions. The identical network can thus predict on gesture images of different resolutions, with stronger robustness and generalization.
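A minimal sketch of this random multi-scale step, assuming OpenCV (cv2.INTER_LINEAR is bilinear interpolation); the bounds 480 and 832 follow the example above.

```python
import random
import cv2

def random_rescale(image, lo=480, hi=832):
    """Resize to a random square size that is a multiple of 32, e.g. 480, 512, ..., 832."""
    size = 32 * random.randint(lo // 32, hi // 32)
    return cv2.resize(image, (size, size), interpolation=cv2.INTER_LINEAR)
```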
Below, with reference to the accompanying drawings, a more complete example is given to further describe the present invention.
Embodiment 5
The dynamic gesture recognition method based on computer vision is as in embodiments 1-4. Referring to Fig. 1, the concrete implementation steps include:
Step 1: collect gesture images. Shoot gesture images with a camera, including "stone", "scissors", "cloth", "rod", "OK", "love", etc., referring to Fig. 2(a)-(f). Fig. 2(a) is the fist gesture (front and back), Fig. 2(b) the "scissors" gesture (front and back), Fig. 2(c) the palm gesture (front and back), Fig. 2(d) the thumbs-up gesture, Fig. 2(e) the "OK" gesture, and Fig. 2(f) the "love" gesture. Each gesture image also contains some complex background, and the same gesture appears at a variety of rotation angles. Divide the collected gesture images into a training set and a test set, manually label the gestures in the collected gesture images, and obtain the class and coordinate data of the ground-truth boxes.
The collected natural-scene gesture image set totals 2500 images; in this example 6 representative gestures are chosen, evenly divided into a training set of 2000 images and a test set of 500 images, referring to Fig. 2. The image set was shot with a 12-megapixel mobile phone camera, and the shot images were screened and manually labeled.
Step 2: cluster to obtain the prior boxes.
Read the ground-truth box data of the training set and test set samples.
In this embodiment, the ground-truth boxes of the training set and test set samples are the manually labeled target box coordinates and class information in the images.
Cluster with the k-means algorithm using the loss metric d(box, centroid) given by the following formula, obtaining the prior boxes:

d(box, centroid) = 1 - IOU(box, centroid)

where centroid denotes the randomly chosen cluster-center box, box denotes any other ground-truth box, and IOU(box, centroid) measures the similarity between the other box and the center box, computed as the area of their intersection divided by the area of their union.
The number of cluster-center boxes chosen in this example is 5. IOU(box, centroid) is computed according to the following formula:

$$IOU(box,\,centroid)=\frac{|centroid \cap box|}{|centroid \cup box|}$$

where $\cap$ denotes the intersection area of the two boxes centroid and box, and $\cup$ denotes the area of their union.
Step 3: build the convolutional neural network.
With the GoogLeNet convolutional neural network as the basis and using simple 1*1 and 3*3 convolution kernels, build a convolutional neural network comprising G convolutional layers and 5 pooling layers; G is 23 in this example.
Train the built convolutional network with the loss function given in step (3b) of embodiment 3, where the first term of the loss function is the center-point coordinate loss of the predicted target box, with coordinate loss coefficient $\lambda_{coord}$ taken as 5 in this example, and the third and fourth terms are the losses of the probability that a prediction box contains a target, with no-target loss coefficient $\lambda_{noobj}$ taken as 0.5 in this example.
Even for the same gesture, different shooting angles yield different images. Existing methods can hardly achieve stable recognition of the same gesture at different angles, but the convolutional neural network built by the invention overcomes the problem that the same gesture at multiple rotation angles is difficult to recognize, giving good stability in gesture recognition.
Step 4: train the network.
Read in gesture images of the training set samples in batches; in this embodiment, the network reads in 64 training set images per batch.
Randomly rescale the images with bilinear interpolation, choosing the rescaled gesture image size as a multiple of 32, to obtain the rescaled input images.
The pending gesture images input in this embodiment are as shown in Fig. 2; the pixel range of the gesture images is [500, 800]. After rescaling, the image size is chosen among the multiples of 32 {480, 512, ..., 736}, with minimum 480*480 and maximum 736*736, yielding the rescaled gesture images.
Rescale the rescaled gesture images again with bilinear interpolation to a fixed size, obtaining images that can be fed into the convolutional network; in this example, the fixed size to which the gesture images are rescaled is 608*608.
Feed the fixed-size gesture images into the built convolutional neural network for training to obtain the convolutional neural network weights; the weights are the parameters of the convolutional neural network and are used at test time. The network is trained with the training set samples for 20,000 iterations to obtain the weights, and training is complete.
Step 5: load the network weights (i.e. parameters) obtained in step 4 into the convolutional neural network built in step 3, in preparation for testing.
Step 6: read in the gesture images to be recognized from the test set and feed them into the network with loaded weights for recognition, obtaining the size, position coordinates and class information of the gesture targets; referring to Fig. 3, Fig. 3(a)-(f) are the recognition results of the invention corresponding to Fig. 2(a)-(f).
Step 7: process the obtained position and class information with non-maximum suppression to obtain the final prediction boxes:
sort all prediction boxes by descending confidence score and choose the highest score and its corresponding box;
traverse the remaining prediction boxes and delete any box whose overlap (IOU) with the current highest-confidence box exceeds a certain threshold;
continue selecting the highest-scoring box among the unprocessed boxes and repeat the above process, obtaining the surviving prediction box data.
Step 8: map the prediction box data back onto the original image to obtain the class and position information of the gestures; draw the prediction boxes on the original image and mark the class labels of the targets, referring to Figs. 3(a)-3(f): the upper-left corner of each prediction box carries the predicted gesture class label.
Step 9: record the class and position information of the gesture in real time, referring to Fig. 4; analyze the resulting real-time data, interpret the dynamic gesture, and display the interpretation result directly on the screen, referring to Table 1.
Table 1: real-time detection results of dynamic gesture recognition

Predicted gesture center abscissa    Predicted gesture center ordinate    Gesture class
1164    371    Scissor
318     372    Scissor
1152    373    Scissor
364     384    Scissor
1097    380    Scissor
388     388    Scissor
1061    381    Scissor
1027    383    Scissor
430     409    Scissor
452     395    Scissor
1001    380    Scissor
465     397    Scissor
989     381    Scissor
510     395    Scissor
960     381    Scissor
524     392    Scissor
951     384    Scissor
557     395    Scissor
918     394    Scissor
561     396    Scissor
The data of Table 1 are part of the recorded data for the dynamic process, shown in Fig. 4, in which two gestures move horizontally inward from the two sides. Fig. 4(a) is one frame of the dynamic gesture whose sign-language meaning is "object", and Fig. 4(b) is one frame of the detection results of that process. From the data of Table 1, the gestures keep the "scissors" state unchanged. Visualizing the coordinate data of Table 1 as a graph gives Fig. 5, in which the abscissa is the abscissa of the gesture center point in the current frame and the ordinate is its ordinate in the current frame. The points in Fig. 5 are the coordinates of the gesture center points in the current frames, i.e. the coordinate record of two "scissors" gestures moving dynamically from outside to inside. As Fig. 5 shows, the center-point ordinates of the dynamic gestures are basically unchanged while the abscissas change greatly, indicating that the process is two "scissors" gestures drawing together horizontally, which corresponds to the meaning of "object" in sign language, referring to Fig. 4.
In this embodiment of the invention, the distribution histogram of the movement trajectory is computed to judge the motion of the gesture, and this is combined with the change of gesture state during the motion to judge the meaning expressed by the gesture over the whole dynamic process; this comprises both static gesture recognition and dynamic gesture interpretation analysis.
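A hedged sketch of such an analysis, assuming the per-frame records are (cx, cy, label) tuples as in Table 1; the thresholds and the mapping to the sign "object" are illustrative assumptions, not the patent's rule.

```python
import numpy as np

def interpret(track):
    xs = np.array([p[0] for p in track], dtype=float)
    ys = np.array([p[1] for p in track], dtype=float)
    labels = {p[2] for p in track}
    # the class stays "scissors" while horizontal spread dwarfs vertical spread
    if labels == {"Scissor"} and np.ptp(xs) > 5 * max(np.ptp(ys), 1.0):
        return "object"   # two 'scissors' hands drawing together horizontally
    return "unknown"
```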
The technical effect of the invention is explained further below in combination with simulations.
Embodiment 6
The dynamic gesture recognition method based on computer vision is as in embodiments 1-5.
Simulation experiment conditions:
The hardware platform of the simulation experiments of the invention is a Dell computer with an Intel(R) Core i5 processor, a main frequency of 3.20 GHz and 64 GB of memory; the simulation software platform is Visual Studio (2015).
Simulation experiment content and result analysis:
The simulation of the invention is divided into two experiments.
First, the position coordinates and class data of the collected data set are manually labeled and made into a data set in PASCAL VOC format; 80% of the data set is used as training set samples and 20% as test set samples.
Simulation experiment 1: comparison of the invention with the prior art. The invention, the prior-art method based on hand detection and shape detection, and the prior-art method based on hand detection and CNN recognition are each trained with the same training set samples and then evaluated on the same test set samples. The evaluation results are shown in Table 2, in which Alg1 denotes the method of the invention, Alg2 the method based on hand detection and shape detection, and Alg3 the method based on hand detection and CNN recognition.
Table 2: test set accuracy of the three methods in the simulation experiments

Test image            Alg1    Alg2    Alg3
Accuracy (%)          98.0    31.3    78.6
Time per image (s)    0.02    0.13    0.94
As can be seen from Table 2, compared with the method based on hand detection and shape detection and the method based on hand detection and CNN recognition, the invention has a clear advantage in gesture recognition accuracy, improving the recognition rate by nearly 67% and 20% respectively, and its recognition speed is about 6 times and 47 times faster than the two other methods respectively. The invention's recognition rate is higher than those of the other two algorithms because it maintains a very high recognition rate for complex backgrounds and multiple gesture angles. Its recognition speed is higher because the invention constructs an end-to-end convolutional neural network that predicts the position and class of the gesture simultaneously, without proceeding in two steps. The simulation results show that the invention achieves a high recognition rate and fast speed in gesture target recognition, and performs particularly well under complex background conditions.
Embodiment 7
The dynamic gesture recognition method based on computer vision is as in embodiments 1-5; the simulation conditions and content are as in embodiment 6.
Simulation experiment 2: using the method of the invention, different gesture image rescaling sizes are used on the test set as the input of the network; the test evaluation results are shown in Table 3.
Table 3: recognition results for different network input sizes
As can be observed from Table 3, when the input image is rescaled within a certain range of sizes, the target recognition accuracy of the invention shows no significant change; therefore, weighing recognition rate against recognition speed, the fixed size of 608*608 is selected as the optimal gesture image size for the convolutional neural network.
The dynamic gesture recognition method based on computer vision proposed by the invention achieves better recognition accuracy in gesture target recognition and can perform gesture recognition in real time.
In summary, the invention discloses a dynamic gesture recognition method based on computer vision that solves the problem of dynamically recognizing gestures against complex backgrounds. Its steps are: collect a gesture data set and label it manually; cluster the ground-truth boxes of the labeled image set to obtain the prior boxes for training; build an end-to-end convolutional neural network that can simultaneously predict position, size and class; train the network to obtain the weights; load the weights into the network; input gesture images for recognition; process the obtained position coordinates and class information with non-maximum suppression; obtain the final recognition result images; and record the recognition information in real time to obtain the dynamic gesture interpretation result. The invention overcomes the prior-art defect that hand detection and classification in gesture recognition are carried out in separate steps; it greatly simplifies the gesture recognition process, improves recognition accuracy and speed, enhances the robustness of the recognition system, and realizes the interpretation of dynamic gestures. The invention can be applied to fields such as human-computer interaction in virtual reality, sign language translation, and automatic recognition of traffic police gestures for driverless vehicles.

Claims (4)

1. A dynamic gesture recognition method based on computer vision, characterized in that it comprises the following steps:
(1) collect gesture images: divide the collected gesture images into a training set and a test set, manually label the gestures in each, and obtain the class and coordinate data of the ground-truth boxes;
(2) cluster to obtain prior boxes: cluster the manually labeled ground-truth boxes, using the degree of area overlap between boxes as the loss metric, to obtain several initial prior boxes;
(3) build an end-to-end convolutional neural network that can simultaneously predict the position, size and class of the target gesture: with an improved GoogLeNet network as the framework, build an end-to-end convolutional neural network with a loss function that simultaneously constrains target position and class;
(4) train the end-to-end convolutional neural network: to strengthen the robustness of the convolutional neural network to picture size, after reading in a batch of gesture images, each read-in gesture image is rescaled twice: the first rescaling takes the originally input gesture image to a random size, and the second takes the randomly sized image to the specified size; the gesture images rescaled to the specified size are then fed into the convolutional neural network for training, yielding the trained weights; the specific steps are as follows:
(4a) read in gesture images of the training set samples in batches;
(4b) randomly rescale the images with bilinear interpolation, choosing the size as a multiple of 32, to obtain the rescaled read-in gesture images;
(4c) rescale the rescaled gesture images obtained in step (4b) again with bilinear interpolation to a fixed size, obtaining images that can be fed into the convolutional network;
(4d) train the convolutional neural network built in step (3) with the fixed-size images obtained in step (4c), obtaining the corresponding weights of the convolutional neural network;
(5) load the weights: load the network weights obtained in step (4d) into the convolutional neural network built in step (3);
(6) predict the position and class of the gesture: read in the gesture image to be recognized and feed it into the network with loaded weights for recognition, simultaneously obtaining the position coordinates and class information of the gesture target;
(7) remove redundant prediction boxes: process the obtained position coordinates and class information with non-maximum suppression to obtain the final prediction boxes:
(7a) sort all boxes by descending score and choose the highest score and its corresponding box;
(7b) traverse the remaining boxes and delete any box whose overlap (IOU) with the current highest-scoring box exceeds a certain threshold;
(7c) continue selecting the highest-scoring box among the unprocessed boxes and repeat the above process, i.e. (7a) to (7c), obtaining the surviving prediction box data;
(8) visualize the prediction results: map the prediction box data back onto the original image, draw the prediction boxes on it and mark the class label of each gesture target;
(9) record and analyze: record the class and position information of the gesture in real time, analyze the resulting real-time data to interpret the dynamic gesture, and display the interpretation result directly on the screen.
2. The dynamic gesture recognition method based on computer vision according to claim 1, characterized in that the clustering of the manually labeled ground-truth boxes in step (2) specifically comprises the following steps:
(2a) read the ground-truth box data of the gesture image training set and test set samples;
(2b) cluster with the k-means algorithm using the loss metric d(box, centroid) given by the following formula, obtaining the prior boxes:

d(box, centroid) = 1 - IOU(box, centroid)

where centroid denotes the randomly chosen cluster-center box, box denotes any other ground-truth box, and IOU(box, centroid) measures the similarity between the other box and the center box, computed as the area of their intersection divided by the area of their union.
3. The dynamic gesture recognition method based on computer vision according to claim 1, characterized in that the building of the convolutional neural network in step (3) comprises the following steps:
(3a) with the GoogLeNet convolutional neural network as the basis and using simple 1*1 and 3*3 convolution kernels, build a convolutional neural network comprising G convolutional layers and 5 pooling layers;
(3b) train the built convolutional network with the following loss function:
$$\begin{aligned} loss =\ & \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ & +\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ & +\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ & +\sum_{i=0}^{S^2} I_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2 \end{aligned}$$
where the first term of the loss function is the center-point coordinate loss of the predicted target box, with coordinate loss coefficient $\lambda_{coord}$, taken as 5 in this example; $S^2$ is the number of grid cells the picture is divided into and $B$ the number of boxes predicted per grid cell; $I_{ij}^{obj}$ indicates whether the j-th prediction box in the i-th grid cell is responsible for predicting this target when a target is present; $(x_i, y_i)$ is the center point of the ground-truth box and $(\hat{x}_i, \hat{y}_i)$ that of the prediction box; the second term is the width-height loss of the prediction box, with $(w_i, h_i)$ the width and height of the ground-truth box and $(\hat{w}_i, \hat{h}_i)$ those of the prediction box; the third and fourth terms are the losses of the probability that a prediction box contains a target, with $\lambda_{noobj}$ the loss coefficient when no target is contained, taken as 0.5 here; $I_{ij}^{noobj}$ indicates, when no target is contained, whether the j-th prediction box in the i-th grid cell is responsible for this prediction; $C_i$ is the true probability of containing a target and $\hat{C}_i$ the predicted probability; the fifth term is the class probability loss, where $I_i^{obj}$ indicates that the i-th grid cell contains a target center point, $p_i(c)$ is the true target class, $\hat{p}_i(c)$ the predicted target class, and $c$ ranges over the classes.
4. The dynamic gesture recognition method based on computer vision according to claim 1, wherein the random rescaling of the image with bilinear interpolation in step (4b), choosing the gesture image size as a multiple of 32 to obtain the rescaled input image, is carried out as follows:
4b1: read in a gesture image to be recognized;
4b2: randomly rescale the gesture image with bilinear interpolation, choosing the size as a multiple of 32, to obtain the rescaled read-in gesture image.
CN201711102008.9A 2017-11-10 2017-11-10 Dynamic gesture recognition method based on computer vision Active CN107808143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711102008.9A CN107808143B (en) 2017-11-10 2017-11-10 Dynamic gesture recognition method based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711102008.9A CN107808143B (en) 2017-11-10 2017-11-10 Dynamic gesture recognition method based on computer vision

Publications (2)

Publication Number Publication Date
CN107808143A true CN107808143A (en) 2018-03-16
CN107808143B CN107808143B (en) 2021-06-01

Family

ID=61592035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711102008.9A Active CN107808143B (en) 2017-11-10 2017-11-10 Dynamic gesture recognition method based on computer vision

Country Status (1)

Country Link
CN (1) CN107808143B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170046568A1 (en) * 2012-04-18 2017-02-16 Arb Labs Inc. Systems and methods of identifying a gesture using gesture data compressed by principal joint variable analysis
US20160334975A1 (en) * 2015-05-12 2016-11-17 Konica Minolta, Inc. Information processing device, non-transitory computer-readable recording medium storing an information processing program, and information processing method
CN106960036A * 2017-03-09 2017-07-18 杭州电子科技大学 Database construction method for gesture recognition
CN107168527A * 2017-04-25 2017-09-15 华南理工大学 First-person-view gesture recognition and interaction method based on region-based convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAO JIANG ET AL.: "A Dynamic Gesture Recognition Method Based on Computer Vision", 《2013 6TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING (CISP)》 *
GUAN RAN ET AL.: "Gesture Detection and Recognition Technology Based on Computer Vision", 《计算机应用与软件》 (COMPUTER APPLICATIONS AND SOFTWARE) *

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390344A * 2018-04-19 2019-10-29 华为技术有限公司 Candidate box update method and apparatus
WO2019201029A1 * 2018-04-19 2019-10-24 华为技术有限公司 Candidate box update method and apparatus
CN110163048A * 2018-07-10 2019-08-23 腾讯科技(深圳)有限公司 Hand key point recognition model training method, recognition method and device
CN110163048B (en) * 2018-07-10 2023-06-02 腾讯科技(深圳)有限公司 Hand key point recognition model training method, hand key point recognition method and hand key point recognition equipment
CN109145756A (en) * 2018-07-24 2019-01-04 湖南万为智能机器人技术有限公司 Object detection method based on machine vision and deep learning
CN109165555A (en) * 2018-07-24 2019-01-08 广东数相智能科技有限公司 Man-machine finger-guessing game method, apparatus and storage medium based on image recognition
CN110837766B (en) * 2018-08-17 2023-05-05 北京市商汤科技开发有限公司 Gesture recognition method, gesture processing method and device
CN110837766A (en) * 2018-08-17 2020-02-25 北京市商汤科技开发有限公司 Gesture recognition method, gesture processing method and device
CN109117806B (en) * 2018-08-22 2020-11-27 歌尔科技有限公司 Gesture recognition method and device
CN109117806A * 2018-08-22 2019-01-01 歌尔科技有限公司 Gesture recognition method and device
US12051236B2 (en) 2018-09-21 2024-07-30 Bigo Technology Pte. Ltd. Method for recognizing video action, and device and storage medium thereof
CN109325454A * 2018-09-28 2019-02-12 合肥工业大学 Static gesture real-time recognition method based on YOLOv3
CN109325454B (en) * 2018-09-28 2020-05-22 合肥工业大学 Static gesture real-time recognition method based on YOLOv3
CN111104820A (en) * 2018-10-25 2020-05-05 中车株洲电力机车研究所有限公司 Gesture recognition method based on deep learning
CN109697407A * 2018-11-13 2019-04-30 北京物灵智能科技有限公司 Image processing method and device
CN109583456B (en) * 2018-11-20 2023-04-28 西安电子科技大学 Infrared surface target detection method based on feature fusion and dense connection
CN109583456A * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface target detection method based on feature fusion and dense connection
CN111382643B (en) * 2018-12-30 2023-04-14 广州市百果园信息技术有限公司 Gesture detection method, device, equipment and storage medium
CN111382643A (en) * 2018-12-30 2020-07-07 广州市百果园信息技术有限公司 Gesture detection method, device, equipment and storage medium
CN109815876A * 2019-01-17 2019-05-28 西安电子科技大学 Gesture recognition method based on address-event stream features
CN109948480A * 2019-03-05 2019-06-28 中国电子科技集团公司第二十八研究所 Non-maximum suppression method for arbitrary quadrilaterals
CN109948690A * 2019-03-14 2019-06-28 西南交通大学 High-speed rail scene perception method based on deep learning and structural information
CN109934184A * 2019-03-19 2019-06-25 网易(杭州)网络有限公司 Gesture recognition method and device, storage medium, and processor
CN110135237B * 2019-03-24 2021-11-26 北京化工大学 Gesture recognition method
CN110135237A * 2019-03-24 2019-08-16 北京化工大学 Gesture recognition method
CN110135408B (en) * 2019-03-26 2021-02-19 北京捷通华声科技股份有限公司 Text image detection method, network and equipment
CN110135408A (en) * 2019-03-26 2019-08-16 北京捷通华声科技股份有限公司 Text image detection method, network and equipment
CN110287771A * 2019-05-10 2019-09-27 平安科技(深圳)有限公司 Method and device for extracting palm regions from images
CN110135398A * 2019-05-28 2019-08-16 厦门瑞为信息技术有限公司 Computer-vision-based method for detecting both hands off the steering wheel
CN110348323B * 2019-06-19 2022-12-16 广东工业大学 Wearable device gesture recognition method based on neural network optimization
CN110348323A * 2019-06-19 2019-10-18 广东工业大学 Wearable device gesture recognition method based on neural network optimization
CN110363158B * 2019-07-17 2021-05-25 浙江大学 Millimeter wave radar and visual cooperative target detection and identification method based on neural network
CN110363158A * 2019-07-17 2019-10-22 浙江大学 Millimeter wave radar and visual cooperative target detection and identification method based on neural network
CN110414402A * 2019-07-22 2019-11-05 北京达佳互联信息技术有限公司 Gesture data labeling method and device, electronic equipment and storage medium
CN110414402B * 2019-07-22 2022-03-25 北京达佳互联信息技术有限公司 Gesture data labeling method and device, electronic equipment and storage medium
CN110458059A * 2019-07-30 2019-11-15 北京科技大学 Gesture recognition method and device based on computer vision
CN110458059B (en) * 2019-07-30 2022-02-08 北京科技大学 Gesture recognition method and device based on computer vision
CN110796018A (en) * 2019-09-30 2020-02-14 武汉科技大学 Hand motion recognition method based on depth image and color image
CN111061367B * 2019-12-05 2023-04-07 神思电子技术股份有限公司 Method for implementing a gesture mouse on self-service equipment
CN111061367A * 2019-12-05 2020-04-24 神思电子技术股份有限公司 Method for implementing a gesture mouse on self-service equipment
CN111050266A * 2019-12-20 2020-04-21 朱凤邹 Method and system for function control based on actions detected by an earphone
CN111127457A * 2019-12-25 2020-05-08 上海找钢网信息科技股份有限公司 Rebar counting model training method, counting method, device and equipment
CN111310800B (en) * 2020-01-20 2023-10-10 天翼数字生活科技有限公司 Image classification model generation method, device, computer equipment and storage medium
CN111310800A (en) * 2020-01-20 2020-06-19 世纪龙信息网络有限责任公司 Image classification model generation method and device, computer equipment and storage medium
CN111476084A (en) * 2020-02-25 2020-07-31 福建师范大学 Deep learning-based parking lot dynamic parking space condition identification method
CN111275010A (en) * 2020-02-25 2020-06-12 福建师范大学 Pedestrian re-identification method based on computer vision
CN111597888A (en) * 2020-04-09 2020-08-28 上海容易网电子商务股份有限公司 Gesture recognition method combining Gaussian mixture model and CNN
CN111653103A (en) * 2020-05-07 2020-09-11 浙江大华技术股份有限公司 Target object identification method and device
CN111639740A * 2020-05-09 2020-09-08 武汉工程大学 Steel bar counting method based on multi-scale convolutional neural network
CN111880661A (en) * 2020-07-31 2020-11-03 Oppo广东移动通信有限公司 Gesture recognition method and device
CN112232282A (en) * 2020-11-04 2021-01-15 苏州臻迪智能科技有限公司 Gesture recognition method and device, storage medium and electronic equipment
CN112416128A (en) * 2020-11-23 2021-02-26 森思泰克河北科技有限公司 Gesture recognition method and terminal equipment
CN112487913A * 2020-11-24 2021-03-12 北京市地铁运营有限公司运营四分公司 Labeling method and device based on a neural network, and electronic equipment
WO2022120669A1 (en) * 2020-12-10 2022-06-16 深圳市优必选科技股份有限公司 Gesture recognition method, computer device and storage medium
CN112464860A (en) * 2020-12-10 2021-03-09 深圳市优必选科技股份有限公司 Gesture recognition method and device, computer equipment and storage medium
CN113312973B (en) * 2021-04-25 2023-06-02 北京信息科技大学 Gesture recognition key point feature extraction method and system
CN113297956A (en) * 2021-05-22 2021-08-24 温州大学 Gesture recognition method and system based on vision
CN113297956B (en) * 2021-05-22 2023-12-08 温州大学 Gesture recognition method and system based on vision
CN113627265A (en) * 2021-07-13 2021-11-09 深圳市创客火科技有限公司 Unmanned aerial vehicle control method and device and computer readable storage medium
CN114035687A (en) * 2021-11-12 2022-02-11 郑州大学 Gesture recognition method and system based on virtual reality
CN114035687B (en) * 2021-11-12 2023-07-25 郑州大学 Gesture recognition method and system based on virtual reality
CN114627561A (en) * 2022-05-16 2022-06-14 南昌虚拟现实研究院股份有限公司 Dynamic gesture recognition method and device, readable storage medium and electronic equipment
CN117079493A (en) * 2023-08-17 2023-11-17 深圳市盛世基业物联网有限公司 Intelligent parking management method and system based on Internet of things
CN117079493B (en) * 2023-08-17 2024-03-19 深圳市盛世基业物联网有限公司 Intelligent parking management method and system based on Internet of things

Also Published As

Publication number Publication date
CN107808143B (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN107808143A Dynamic gesture recognition method based on computer vision
CN107168527B First-person-view gesture recognition and interaction method based on region-based convolutional neural networks
CN109597485B (en) Gesture interaction system based on double-fingered-area features and working method thereof
CN108052946A Automatic recognition method for high-voltage cabinet switches based on convolutional neural networks
CN107742107A Facial image classification method, device and server
CN109902798A Deep neural network training method and device
CN106650687A (en) Posture correction method based on depth information and skeleton information
CN110059741A Image recognition method based on semantic capsule fusion network
CN106650630A (en) Target tracking method and electronic equipment
CN107145845A Pedestrian detection method based on fusing deep learning and multiple feature points
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN105536205A (en) Upper limb training system based on monocular video human body action sensing
CN108416266A Fast video behavior recognition method that extracts moving targets using optical flow
CN105975934A Dynamic gesture recognition method and system for augmented-reality-assisted maintenance
CN109558902A Fast target detection method
CN109410168A Modeling method for determining a convolutional neural network model for classifying sub-image blocks in an image
CN107808376A Hand-raising detection method based on deep learning
CN103186775A (en) Human body motion recognition method based on mixed descriptor
CN109684959A Video gesture recognition method and device based on face detection and deep learning
CN106600595A Automatic human body dimension measurement method based on an artificial intelligence algorithm
CN105069745A Face-changing system and method based on a common image sensor and augmented reality technology
CN111178170B (en) Gesture recognition method and electronic equipment
CN109740454A Human body posture recognition method based on YOLO-V3
CN105912126A Method for adaptively adjusting the interface-mapped gain of gesture movement
CN109614990A Object detection device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant