CN108629288A - A kind of gesture identification model training method, gesture identification method and system - Google Patents
- Publication number
- CN108629288A (application CN201810314455.9A)
- Authority
- CN
- China
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a gesture recognition model training method, a gesture recognition method and a gesture recognition system. The training method includes: collecting gesture picture samples in several scenes, randomly cropping the gesture samples to obtain new gesture samples, and using the gesture picture samples together with the new gesture samples as the sample set; building a Light YOLO network and training it on the sample set to obtain a first Light YOLO network; adding a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network, training the second Light YOLO network on the sample set until convergence, and then pruning channels to obtain the gesture recognition model. The invention improves the network's detection performance on low-resolution gestures, so that the gesture recognition method of the invention is accurate and runs in real time. In addition, the system of the invention obtains recognition results directly from the picture and can be optimized end to end.
Description
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a gesture recognition model training method, a gesture recognition method and a gesture recognition system.
Background art
Gestures are one of the most natural forms of body language, and applying them to human-computer interaction makes the interaction more natural; recognizing the human hand is therefore a focus of current human-computer interaction research. Researchers in China and abroad have carried out extensive research on vision-based gesture recognition technology.
A traditional gesture recognition system generally first performs gesture segmentation to obtain the gesture region, then extracts gesture features, and finally classifies the gesture using those features. Traditional methods require hand-designed features, such as color features and HOG features; these features generalize poorly, and different features must be designed for different tasks. Because artificial neural networks are robust to interference, self-organizing, and have strong self-learning and noise-resistance abilities, they are increasingly used for gesture classification. With the development of neural-network-based object detection networks, gesture recognition technology based on object detection networks has begun to develop. However, although neural-network-based gesture recognition is more accurate than traditional machine learning methods, it still suffers from drawbacks such as a large computational load, complex models and poor real-time performance.
It can be seen that existing gesture recognition methods suffer from the technical problem of low accuracy and poor real-time performance in complex scenes.
Summary of the invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a gesture recognition model training method, a gesture recognition method and a gesture recognition system, thereby solving the technical problem that existing gesture recognition methods have low accuracy and poor real-time performance in complex scenes.
To achieve the above object, according to one aspect of the present invention, a gesture recognition model training method is provided, comprising:
(1) collecting gesture picture samples in several scenes, labeling the gesture position and gesture class in each gesture picture sample, then randomly cropping the gesture samples to obtain new gesture samples, and using the gesture picture samples and the new gesture samples as the sample set;
(2) starting from the YOLOv2 object detection network, removing its last max-pooling layer and its sixth group of convolutional layers, halving the number of channels of its 14th, 15th and 17th convolutional layers, and using a convolutional layer with the same number of channels as the 15th convolutional layer to down-sample and encode the feature map output by the 8th convolutional layer, thereby obtaining the Light YOLO network; training the Light YOLO network on the sample set to obtain a first Light YOLO network;
(3) inputting the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; evaluating the importance of the feature maps with a first-order Taylor expansion and selecting the A least important feature maps as the feature maps to be pruned; adding a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network; training the second Light YOLO network on the sample set until it converges; pruning the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network; and training the third Light YOLO network on the sample set to obtain the gesture recognition model.
Further, step (1) includes:
collecting gesture picture samples in several scenes, labeling the gesture position and gesture class in each sample to obtain a gesture database, and dividing the gesture picture samples in the gesture database into a training set and a test set; randomly cropping the gesture picture samples in the training set to obtain new gesture samples, and using the training-set gesture picture samples and the new gesture samples as the sample set.
Further, step (2) includes:
(2-1) first removing the last max-pooling layer and the sixth group of convolutional layers of the YOLOv2 object detection network and halving the number of channels of its 14th, 15th and 17th convolutional layers; then using a convolutional layer with the same number of channels as the 15th convolutional layer to down-sample and encode the feature map output by the 8th convolutional layer, and concatenating the down-sampled feature map with the feature map output by the 15th convolutional layer, thereby obtaining the Light YOLO network;
(2-2) training the YOLOv2 object detection network on the ImageNet database and using the resulting YOLOv2 network parameters as the initial parameters of the Light YOLO network; inputting the sample set into the Light YOLO network and training it with stochastic gradient descent to obtain an initial Light YOLO network;
(2-3) testing the initial Light YOLO network on the test set and taking the candidate box with the highest confidence output by the initial Light YOLO network as the predicted gesture box; a recognition is considered correct if the intersection over union (IoU) between the predicted gesture box and the ground-truth gesture box exceeds 0.6, and wrong otherwise; when the recognition accuracy is greater than or equal to the recognition threshold, the first Light YOLO network and its network parameters are obtained.
Further, step (3) includes:
(3-1) inputting the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; obtaining the activation values of the feature maps through a forward pass of the first Light YOLO network, then obtaining the derivatives of the objective function with respect to the feature maps through back-propagation; multiplying each feature map's activation values by the corresponding derivatives to obtain the Taylor expansion values of all feature maps, and selecting the A feature maps with the smallest Taylor expansion values as the feature maps to be pruned;
(3-2) adding a selective-dropout layer after each convolutional layer of the first Light YOLO network, the selective-dropout layer applying the dropout operation only to the feature maps to be pruned, thereby obtaining the second Light YOLO network;
(3-3) training the second Light YOLO network on the sample set until it converges; pruning, in the converged second Light YOLO network, the convolution kernels of the convolutional layers that generate the feature maps to be pruned, then removing the selective-dropout layers to obtain the third Light YOLO network; and training the pruned third Light YOLO network on the sample set to restore network performance;
(3-4) if the number of pruning rounds is less than B, inputting the sample set into the performance-restored third Light YOLO network obtained in step (3-3) and returning to step (3-1); otherwise, pruning is complete, and the fully pruned Light YOLO network is trained to restore its performance, yielding the gesture recognition model.
According to another aspect of the present invention, a gesture recognition model is provided, the gesture recognition model being trained by the gesture recognition model training method of the present invention.
According to another aspect of the present invention, a gesture recognition method is provided, comprising:
performing gesture recognition on an image to be recognized with the gesture recognition model trained by the gesture recognition model training method of the present invention, to obtain the gesture position and gesture class in the image.
According to another aspect of the present invention, a gesture recognition system is provided, comprising:
a sample collection module, configured to collect gesture picture samples in several scenes, label the gesture position and gesture class in each sample, then randomly crop the gesture samples to obtain new gesture samples, and use the gesture picture samples and the new gesture samples as the sample set;
a network training module, configured to start from the YOLOv2 object detection network, remove its last max-pooling layer and its sixth group of convolutional layers, halve the number of channels of its 14th, 15th and 17th convolutional layers, down-sample and encode the output feature map of the 8th convolutional layer with a convolutional layer having the same number of channels as the 15th convolutional layer, and concatenate the down-sampled feature map with the output feature map of the 15th convolutional layer, thereby obtaining the Light YOLO network, and to train the Light YOLO network on the sample set to obtain a first Light YOLO network;
a network pruning module, configured to input the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; evaluate the importance of the feature maps with a first-order Taylor expansion and select the A least important feature maps as the feature maps to be pruned; add a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network; train the second Light YOLO network on the sample set until it converges; prune the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network; and train the third Light YOLO network on the sample set to obtain the gesture recognition model;
a gesture recognition module, configured to perform gesture recognition on an image to be recognized with the gesture recognition model, to obtain the gesture position and gesture class in the image.
In general, compared with the prior art, the above technical solution conceived by the present invention achieves the following beneficial effects:
(1) When building the Light YOLO network, the present invention removes the last max-pooling layer and the sixth group of convolutional layers to reduce the network stride, so as to increase the gesture semantic information on the top-level feature map, and halves the number of channels of the 14th, 15th and 17th convolutional layers to prevent overfitting. In addition, a top-level feature map with richer semantic information is built by fusing high-level and low-level features. Down-sampling with a convolutional layer retains more spatial information and can encode the features to a specified number of channels. When optimizing the Light YOLO network, so that more feature maps can be pruned in each iteration without hurting network performance, a selective-dropout layer is added after each convolutional layer of Light YOLO during training; this layer applies dropout only to the feature maps to be pruned, which reduces the network's dependence on those feature maps. Through these improvements, the present invention solves the technical problem that existing gesture recognition methods have low accuracy and poor real-time performance in complex scenes, and improves the network's detection performance on low-resolution gestures, so that the gesture recognition method of the present invention is accurate and runs in real time. At the same time, the system of the present invention obtains recognition results directly from the picture and can be optimized end to end.
(2) The gesture recognition model trained by the present invention is pruned from 55 MB to 4 MB, and its forward inference speed increases from 28 FPS to 125 FPS. This shows that the trained model can perform real-time gesture recognition and that, with the network model compressed to 4 MB and a small computational load, it can easily be ported to embedded platforms.
Description of the drawings
Fig. 1 is a flowchart of a gesture recognition model training method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of Light YOLO provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the selective-dropout network pruning algorithm provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely intended to illustrate the present invention, not to limit it. In addition, the technical features involved in the embodiments of the present invention described below can be combined with each other as long as they do not conflict.
As shown in Fig. 1, a gesture recognition model training method comprises:
(1) Gesture picture samples are collected in several scenes, mainly including simple backgrounds, complex backgrounds, skin-colored backgrounds, the hand passing in front of the face, and the presence of other undefined gestures, with the person being captured about 2 to 3 meters from the camera. The gesture position and gesture class in each gesture picture sample are labeled to obtain a gesture database of 5738 samples, and the gesture picture samples in the database are divided 1:1 into a training set and a test set. The gesture picture samples in the training set are randomly cropped to obtain new gesture samples until each class has 350 training samples, and the training-set gesture picture samples and the new gesture samples are used as the sample set.
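The random-crop augmentation can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say how crop windows are chosen, so here each crop is assumed to keep the whole labeled gesture box, and a sample is represented only by a hypothetical `(width, height, box)` tuple instead of real pixels. The helper names `random_crop_keep_box` and `augment_class` are illustrative.

```python
import random
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; hypothetical label format.
Box = Tuple[int, int, int, int]
# (image width, image height, gesture box); pixels are omitted in this sketch.
Sample = Tuple[int, int, Box]


def random_crop_keep_box(img_w: int, img_h: int, box: Box,
                         rng: random.Random) -> Tuple[Box, Box]:
    """Choose a random crop window that still contains the whole gesture box.

    Returns the crop window in image coordinates and the box in crop coordinates.
    """
    x0, y0, x1, y1 = box
    left = rng.randint(0, x0)        # any left edge between image border and box
    top = rng.randint(0, y0)
    right = rng.randint(x1, img_w)   # any right edge between box and image border
    bottom = rng.randint(y1, img_h)
    shifted = (x0 - left, y0 - top, x1 - left, y1 - top)
    return (left, top, right, bottom), shifted


def augment_class(samples: List[Sample], target: int,
                  rng: random.Random) -> List[Sample]:
    """Add random crops of one class's samples until it has `target` samples."""
    if not samples:
        raise ValueError("need at least one sample to crop from")
    out = list(samples)
    i = 0
    while len(out) < target:
        w, h, box = samples[i % len(samples)]
        (cl, ct, cr, cb), new_box = random_crop_keep_box(w, h, box, rng)
        out.append((cr - cl, cb - ct, new_box))
        i += 1
    return out
```

With `target=350` this mirrors the per-class count given in the embodiment; keeping the full box inside the crop avoids producing labels for partially visible gestures, but that is a design choice of this sketch.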
(2) Starting from the YOLOv2 object detection network, its last max-pooling layer and its sixth group of convolutional layers are removed, the number of channels of its 14th, 15th and 17th convolutional layers is halved, and a convolutional layer with the same number of channels as the 15th convolutional layer is used to down-sample and encode the feature map output by the 8th convolutional layer, thereby obtaining the Light YOLO network; the Light YOLO network is trained on the sample set to obtain the first Light YOLO network. Specifically:
(2-1) First, the last max-pooling layer and the sixth group of convolutional layers are removed, and the number of channels of the 14th, 15th and 17th convolutional layers of the YOLOv2 object detection network is halved, so that the 14th and 15th convolutional layers have 512 channels. Then a convolutional layer with a 1 × 1 kernel, a stride of 2 and the same number of channels as the 15th convolutional layer down-samples and encodes the feature map output by the 8th convolutional layer, and the down-sampled feature map is concatenated with the feature map output by the 15th convolutional layer, thereby obtaining the Light YOLO network. As shown in Fig. 2, the Light YOLO network consists of 18 convolutional layers and 4 max-pooling layers, and each of the first 17 convolutional layers is followed by a batch normalization layer and a Leaky ReLU layer. The parameters of each layer are as follows:
Conv1: 3 × 3 kernel, 3 input channels, 32 output channels.
Maxpooling1: 2 × 2 pooling window, stride 2 × 2.
Conv2: 3 × 3 kernel, 32 input channels, 64 output channels.
Maxpooling2: 2 × 2 pooling window, stride 2 × 2.
Conv3: 3 × 3 kernel, 64 input channels, 128 output channels.
Conv4: 1 × 1 kernel, 128 input channels, 64 output channels.
Conv5: 3 × 3 kernel, 64 input channels, 128 output channels.
Maxpooling3: 2 × 2 pooling window, stride 2 × 2.
Conv6: 3 × 3 kernel, 128 input channels, 256 output channels.
Conv7: 1 × 1 kernel, 256 input channels, 128 output channels.
Conv8: 3 × 3 kernel, 128 input channels, 256 output channels.
Maxpooling4: 2 × 2 pooling window, stride 2 × 2.
Conv9: 3 × 3 kernel, 256 input channels, 512 output channels.
Conv10: 1 × 1 kernel, 512 input channels, 256 output channels.
Conv11: 3 × 3 kernel, 256 input channels, 512 output channels.
Conv12: 1 × 1 kernel, 512 input channels, 256 output channels.
Conv13: 3 × 3 kernel, 256 input channels, 512 output channels.
Conv14: 3 × 3 kernel, 512 input channels, 512 output channels.
Conv15: 3 × 3 kernel, 512 input channels, 512 output channels.
Conv16: 1 × 1 kernel, 256 input channels, 512 output channels, stride 2.
Conv17: 3 × 3 kernel, 1024 input channels, 512 output channels.
Conv18: 3 × 3 kernel, 512 input channels, 75 output channels.
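The layer list above can be checked mechanically. The sketch below copies the table into a Python list, counts convolution weights, and traces grid sizes through the main path and the Conv16 down-sampling branch. Assumptions not stated in the text: a 416 × 416 input (the YOLOv2 default), "same" padding for the 3 × 3 convolutions, and no padding for Conv16; biases and batch-normalization parameters are ignored in the count.

```python
from typing import List, Tuple

# (name, kernel size, input channels, output channels, stride), copied from the list above.
LAYERS: List[Tuple[str, int, int, int, int]] = [
    ("Conv1", 3, 3, 32, 1), ("Conv2", 3, 32, 64, 1), ("Conv3", 3, 64, 128, 1),
    ("Conv4", 1, 128, 64, 1), ("Conv5", 3, 64, 128, 1), ("Conv6", 3, 128, 256, 1),
    ("Conv7", 1, 256, 128, 1), ("Conv8", 3, 128, 256, 1), ("Conv9", 3, 256, 512, 1),
    ("Conv10", 1, 512, 256, 1), ("Conv11", 3, 256, 512, 1), ("Conv12", 1, 512, 256, 1),
    ("Conv13", 3, 256, 512, 1), ("Conv14", 3, 512, 512, 1), ("Conv15", 3, 512, 512, 1),
    ("Conv16", 1, 256, 512, 2), ("Conv17", 3, 1024, 512, 1), ("Conv18", 3, 512, 75, 1),
]
# Maxpooling1-4 (2 x 2 window, stride 2) follow these convolutional layers.
POOL_AFTER = {"Conv1", "Conv2", "Conv5", "Conv8"}


def conv_weight_count(layers) -> int:
    """Number of convolution weights (k * k * C_in * C_out), ignoring biases and BN."""
    return sum(k * k * c_in * c_out for _, k, c_in, c_out, _ in layers)


def trace_grid(input_size: int) -> Tuple[int, int, int]:
    """Grid size after Conv15, grid size after Conv16, channels after concatenation."""
    size, conv8_size = input_size, input_size
    for name, *_ in LAYERS[:15]:  # main path Conv1..Conv15
        if name == "Conv8":
            conv8_size = size  # Conv8's output feeds the down-sampling branch
        if name in POOL_AFTER:
            size //= 2
    # Conv16: 1 x 1 kernel, stride 2, no padding, applied to Conv8's output.
    passthrough = (conv8_size - 1) // 2 + 1
    concat_channels = LAYERS[14][3] + LAYERS[15][3]  # Conv15 out + Conv16 out
    assert concat_channels == LAYERS[16][2], "must match Conv17's input channels"
    return size, passthrough, concat_channels


if __name__ == "__main__":
    n = conv_weight_count(LAYERS)
    print(n, n * 4 / 2**20)  # 14512480 weights, about 55.4 MiB as float32
    print(trace_grid(416))
```

Under these assumptions both branches meet on a 26 × 26 grid, the concatenation yields the 1024 channels that Conv17 expects, and the float32 weights alone come to about 55.4 MiB, which is consistent with the 55 MB unpruned model size reported in this document. If YOLOv2's five anchor boxes per cell were kept, Conv18's 75 output channels would correspond to 5 × (4 box values + 1 confidence + 10 classes); the text does not state the anchor or class count, so this is only an inference.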
(2-2) The YOLOv2 object detection network is trained on the ImageNet database, and the resulting YOLOv2 network parameters are used as the initial parameters of the Light YOLO network; the sample set is input into the Light YOLO network, which is trained with stochastic gradient descent to obtain the initial Light YOLO network. The objective function used to train the Light YOLO network consists of three parts: the coordinate error, the confidence error and the classification error. In this objective, $\lambda_{obj}$ and $\lambda_{noobj}$ are the coefficients of the confidence error of target candidate boxes and of non-target candidate boxes, respectively; $x_i, y_i, w_i, h_i, C_i$ are the top-left coordinates, width, height and confidence of candidate box $i$, and $\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{C}_i$ are the top-left coordinates, width, height and confidence of the labeled box; $p_i(c)$ is the predicted probability that candidate box $i$ belongs to class $c$, and $\hat{p}_i(c)$ is the true probability that it belongs to class $c$; $\mathbb{1}_i^{obj}$ indicates that candidate box $i$ contains a target, and $\mathbb{1}_i^{noobj}$ indicates that it does not.
The learning-rate decay rule is: the learning rate is lowered to $10^{-5}$ for the first 100 iterations for warm-up training, then restored to $10^{-4}$, and reduced to $5\times10^{-5}$ and $10^{-5}$ at epoch 20 and epoch 150, respectively.
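The learning-rate schedule above can be written as a small piecewise function. This is a sketch under two assumptions not fixed by the text: "the first 100 times" is read as the first 100 training iterations, and epochs are counted from 0 with each drop taking effect at the start of epoch 20 and epoch 150. The function name and arguments are hypothetical.

```python
def learning_rate(iteration: int, epoch: int, warmup_iters: int = 100) -> float:
    """Piecewise learning rate for training Light YOLO (assumed interpretation).

    iteration: global training iteration, counted from 0.
    epoch: current epoch, counted from 0.
    """
    if iteration < warmup_iters:
        return 1e-5   # warm-up at a reduced learning rate
    if epoch >= 150:
        return 1e-5   # second drop
    if epoch >= 20:
        return 5e-5   # first drop
    return 1e-4       # base learning rate after warm-up
```

In a training loop this would be called once per iteration and the result written into the optimizer's learning rate; checking the warm-up condition first means warm-up always wins, even if an epoch threshold were somehow reached within the first 100 iterations.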
(2-3) The initial Light YOLO network is tested on the test set, and the candidate box with the highest confidence output by the initial Light YOLO network is taken as the predicted gesture box; a recognition is considered correct if the intersection over union (IoU) between the predicted gesture box and the ground-truth gesture box exceeds 0.6, and wrong otherwise. When the recognition accuracy is greater than or equal to the recognition threshold, the first Light YOLO network and its network parameters are obtained.
(3) The sample set is input into the first Light YOLO network, each convolutional layer of which outputs feature maps; the importance of the feature maps is evaluated with a first-order Taylor expansion, and the A least important feature maps are selected as the feature maps to be pruned; a selective-dropout layer is added after each convolutional layer of the first Light YOLO network to obtain the second Light YOLO network; the second Light YOLO network is trained on the sample set until it converges; the convolution kernels corresponding to the feature maps to be pruned are pruned in the converged second Light YOLO network to obtain the third Light YOLO network; and the third Light YOLO network is trained on the sample set to obtain the gesture recognition model. As shown in Fig. 3, this specifically comprises:
(3-1) The sample set is input into the first Light YOLO network, each convolutional layer of which outputs feature maps. The activation values of the feature maps are obtained through a forward pass of the first Light YOLO network, and the derivatives of the objective function with respect to the feature maps are obtained through back-propagation. Multiplying each feature map's activation values by the corresponding derivatives gives the Taylor expansion values of all feature maps, and the A feature maps with the smallest Taylor expansion values are selected as the feature maps to be pruned.
We regard pruning as an optimization problem whose goal is to find the pruned network parameters that minimize the change in the loss function before and after pruning:
$$|\Delta L(h_i)| = |L(D \mid W') - L(D \mid W)|$$
where $D$ is the sample set and $W$ and $W'$ are the parameters of the Light YOLO network before and after pruning, respectively. We consider that the loss function depends equivalently on the convolution kernel parameters and on the feature maps computed from them; for convenience we write $L(D, f_i) = L(D \mid w_i)$. Pruning any feature map $f_i$ then changes the loss function by
$$|\Delta L(f_i)| = |L(D, f_i = 0) - L(D, f_i)|$$
where $L(D, f_i = 0)$ is the loss after feature map $f_i$ is pruned, which can be approximated by the Taylor expansion of $L(D, f_i)$ evaluated at $f_i = 0$. Because higher-order terms would require a large amount of computation, we expand only to first order and neglect the remainder, which finally gives
$$|\Delta L(f_i)| \approx \left| \frac{\partial L(D, f_i)}{\partial f_i} f_i \right|$$
where $\frac{\partial L(D, f_i)}{\partial f_i}$ is the derivative of the objective function with respect to the feature map.
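The first-order Taylor criterion described above can be sketched in plain Python. Each feature map is given as a flattened list of activations together with the matching list of loss derivatives (in a real network both would come from a forward and a backward pass). Averaging the product over all positions before taking the absolute value is an assumption of this sketch, since the derivation treats $f_i$ as a single quantity; the function names are hypothetical. The A lowest-scoring maps are chosen over the whole network, as step (3-1) describes.

```python
from typing import Dict, List, Sequence, Tuple


def taylor_scores(activations: Sequence[Sequence[float]],
                  gradients: Sequence[Sequence[float]]) -> List[float]:
    """First-order Taylor importance |mean(f * dL/df)| for each feature map.

    activations[i] and gradients[i] hold the flattened values of feature map i
    and of the loss derivative with respect to it (same length).
    """
    scores = []
    for act, grad in zip(activations, gradients):
        total = sum(x * dx for x, dx in zip(act, grad))
        scores.append(abs(total / len(act)))
    return scores


def lowest_a(scores_by_layer: Dict[str, List[float]], a: int) -> List[Tuple[str, int]]:
    """Pick the A (layer, channel) pairs with the smallest scores over the whole network."""
    ranked = sorted((s, layer, ch) for layer, scores in scores_by_layer.items()
                    for ch, s in enumerate(scores))
    return [(layer, ch) for _, layer, ch in ranked[:a]]
```

Taking the absolute value after averaging means positive and negative contributions within one feature map can cancel, which matches the $|\partial L / \partial f_i \cdot f_i|$ form; taking it per element instead would be a different, also common, variant.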
(3-2) A selective-dropout layer is added after each convolutional layer of the first Light YOLO network; the selective-dropout layer applies the dropout operation only to the feature maps to be pruned, thereby obtaining the second Light YOLO network. For convolutional layer $l$, let the original number of feature maps be $K$ and the number of effective feature maps after selective-dropout be $C$. To keep the magnitude of the next layer's input unchanged, the feature maps are divided by $C/K$:
$$\tilde{f}_k^{(l)} = \frac{f_k^{(l)}}{C/K}$$
where $f_k^{(l)}$ is the activation value of the $k$-th feature map of the $l$-th convolutional layer.
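A selective-dropout layer can be sketched as follows: only feature maps in the to-be-pruned set may be zeroed, and all outputs are then divided by $C/K$ (that is, multiplied by $K/C$) as described above. The text does not say whether the to-be-pruned maps are always dropped or dropped with some probability, so the drop probability `p` is a hypothetical parameter (with `p = 1.0`, every to-be-pruned map is zeroed). Feature maps are plain lists of floats here rather than tensors.

```python
import random
from typing import List, Sequence, Set


def selective_dropout(feature_maps: Sequence[Sequence[float]], to_prune: Set[int],
                      p: float, rng: random.Random) -> List[List[float]]:
    """Drop (zero) feature maps only among `to_prune`, then rescale by K / C.

    K is the number of feature maps in the layer and C the number left after
    dropping; dividing by C / K keeps the next layer's input on the same scale.
    """
    k = len(feature_maps)
    dropped = {i for i in to_prune if rng.random() < p}  # other maps are never dropped
    c = k - len(dropped)
    scale = k / c if c else 0.0
    return [[0.0] * len(fm) if i in dropped else [x * scale for x in fm]
            for i, fm in enumerate(feature_maps)]
```

Because the remaining maps are scaled up during training, the network learns to compensate for the missing maps, which is what allows the corresponding kernels to be cut in step (3-3) with little loss; the layer would be applied only in training and removed before pruning.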
(3-3) The second Light YOLO network is trained on the sample set 10 times until it converges, with a learning rate of $10^{-5}$. In the converged second Light YOLO network, the convolution kernels of the convolutional layers that generate the feature maps to be pruned are pruned, and the selective-dropout layers are then removed to obtain the third Light YOLO network. The pruned third Light YOLO network is trained on the sample set 10 times to restore network performance, with a learning rate of $10^{-5}$.
(3-4) If the number of pruning rounds is less than 20, the sample set is input into the performance-restored third Light YOLO network obtained in step (3-3), and step (3-1) is executed again; otherwise, pruning is complete, and the fully pruned Light YOLO network is trained 20 times to restore its performance, yielding the gesture recognition model.
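The control flow of steps (3-1) to (3-4) can be sketched as a loop over B pruning rounds. The sketch tracks only the number of channels per layer; `score_fn` (returning Taylor scores per layer) and `train_fn` (training for a given number of passes) are hypothetical hooks standing in for the real network code, and the pass counts 10, 10 and 20 follow the embodiment.

```python
from typing import Callable, Dict, List, Tuple

Channel = Tuple[str, int]  # (layer name, channel index)


def iterative_prune(widths: Dict[str, int],
                    score_fn: Callable[[Dict[str, int]], Dict[str, List[float]]],
                    train_fn: Callable[[str, int, List[Channel]], None],
                    a: int, b: int = 20) -> Dict[str, int]:
    """Run B pruning rounds, removing the A lowest-scoring feature maps per round.

    score_fn: hypothetical hook returning Taylor scores per layer for the current widths.
    train_fn: hypothetical hook (phase, passes, channels_to_prune) that trains the model.
    """
    widths = dict(widths)
    for _ in range(b):
        scores = score_fn(widths)                                   # step (3-1)
        ranked = sorted((s, layer, ch) for layer, ss in scores.items()
                        for ch, s in enumerate(ss))
        victims = [(layer, ch) for _, layer, ch in ranked[:a]]
        train_fn("selective-dropout", 10, victims)                  # steps (3-2), (3-3)
        for layer, _ in victims:                                    # cut the kernels
            widths[layer] -= 1  # a real implementation should keep >= 1 channel per layer
        train_fn("recover", 10, [])                                 # restore performance
    train_fn("final-recover", 20, [])                               # step (3-4)
    return widths
```

In a real implementation, cutting a kernel in one layer also removes the matching input channel of the following layer (and, for Conv15 and Conv16, of Conv17 after concatenation); this sketch leaves that bookkeeping to the hooks.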
The gesture recognition model trained by the present invention is pruned from 55 MB to 4 MB, and its forward inference speed increases from 28 FPS to 125 FPS.
A gesture recognition system comprises:
a sample collection module, configured to collect gesture picture samples in several scenes, label the gesture position and gesture class in each sample, then randomly crop the gesture samples to obtain new gesture samples, and use the gesture picture samples and the new gesture samples as the sample set;
a network training module, configured to start from the YOLOv2 object detection network, remove its last max-pooling layer and its sixth group of convolutional layers, halve the number of channels of its 14th, 15th and 17th convolutional layers, down-sample and encode the output feature map of the 8th convolutional layer with a convolutional layer having the same number of channels as the 15th convolutional layer, and concatenate the down-sampled feature map with the output feature map of the 15th convolutional layer, thereby obtaining the Light YOLO network, and to train the Light YOLO network on the sample set to obtain a first Light YOLO network;
a network pruning module, configured to input the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; evaluate the importance of the feature maps with a first-order Taylor expansion and select the A least important feature maps as the feature maps to be pruned; add a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network; train the second Light YOLO network on the sample set until it converges; prune the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network; and train the third Light YOLO network on the sample set to obtain the gesture recognition model;
Gesture recognition module, configured to perform gesture recognition on an image to be recognized using the gesture recognition model, to obtain the gesture position and gesture class in the image to be recognized.
As will be readily understood by those skilled in the art, the foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (7)
1. A gesture recognition model training method, characterized by comprising:
(1) collecting gesture image samples in several scenes, annotating the gesture positions and gesture classes in the gesture image samples, then performing random cropping on the gesture samples to obtain new gesture samples, and using the gesture image samples and the new gesture samples as the sample set;
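Random cropping only yields usable training samples if the annotations are carried along. A minimal sketch, assuming a single integer-pixel gesture box per image and assuming (the text does not say so) that each crop must keep the whole gesture inside it; `random_crop_with_box` and its `min_scale` of 0.6 are hypothetical.

```python
import random
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in integer pixels


def random_crop_with_box(img_w: int, img_h: int, box: Box, min_scale: float = 0.6,
                         rng: Optional[random.Random] = None) -> Tuple[Box, Box]:
    """Pick a crop window containing the gesture box; return (crop, shifted_box)."""
    rng = random.Random() if rng is None else rng
    x1, y1, x2, y2 = box
    # Crop size: a random fraction of the image, but never smaller than the box.
    crop_w = max(x2 - x1, int(img_w * rng.uniform(min_scale, 1.0)))
    crop_h = max(y2 - y1, int(img_h * rng.uniform(min_scale, 1.0)))
    # Top-left corner: keeps the box inside the crop and the crop inside the image.
    left = rng.randint(max(0, x2 - crop_w), min(x1, img_w - crop_w))
    top = rng.randint(max(0, y2 - crop_h), min(y1, img_h - crop_h))
    crop = (left, top, left + crop_w, top + crop_h)
    shifted = (x1 - left, y1 - top, x2 - left, y2 - top)  # box in crop coordinates
    return crop, shifted
```

The shifted box is the new annotation for the cropped image, so each crop adds a labelled sample with the gesture at a different position and scale.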
(2) taking the YOLOv2 object detection network as a base, removing its last max-pooling layer and its sixth group of convolutional layers, halving the number of channels of the 14th, 15th, and 17th convolutional layers of the YOLOv2 object detection network, and downsample-encoding the feature map output by the 8th convolutional layer with a convolutional layer having the same number of channels as the 15th convolutional layer, thereby obtaining the Light YOLO network; and training the Light YOLO network on the sample set to obtain the first Light YOLO network;
(3) inputting the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; evaluating the importance of the feature maps using a first-order Taylor expansion and selecting the A least important feature maps as the feature maps to be pruned; adding a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain the second Light YOLO network; training the second Light YOLO network on the sample set until it converges; pruning the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain the third Light YOLO network; and training the third Light YOLO network on the sample set, thereby obtaining the gesture recognition model.
2. The gesture recognition model training method according to claim 1, characterized in that step (1) comprises:
collecting gesture image samples in several scenes and annotating the gesture positions and gesture classes in the gesture image samples to obtain a gesture database; dividing the gesture image samples in the gesture database into a training set and a test set; performing random cropping on the gesture image samples in the training set to obtain new gesture samples; and using the training-set gesture image samples and the new gesture samples as the sample set.
3. The gesture recognition model training method according to claim 2, characterized in that step (2) comprises:
(2-1) first removing the last max-pooling layer and the sixth group of convolutional layers of the YOLOv2 object detection network and halving the number of channels of its 14th, 15th, and 17th convolutional layers; then downsample-encoding the feature map output by the 8th convolutional layer with a convolutional layer having the same number of channels as the 15th convolutional layer, and concatenating the downsampled feature map with the feature map output by the 15th convolutional layer, thereby obtaining the Light YOLO network;
(2-2) training the YOLOv2 object detection network on the ImageNet database and using the resulting YOLOv2 network parameters as the initial network parameters of the Light YOLO network; then inputting the sample set into the Light YOLO network and training it with stochastic gradient descent to obtain an initial Light YOLO network;
(2-3) testing the initial Light YOLO network on the test set, taking the candidate box with the maximum confidence output by the initial Light YOLO network as the predicted gesture box; if the intersection over union (IoU) between the predicted gesture box and the ground-truth gesture box exceeds 0.6, the recognition is deemed correct, otherwise it is deemed incorrect; when the recognition accuracy is greater than or equal to the recognition threshold, the first Light YOLO network and its network parameters are obtained.
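The acceptance test in step (2-3) reduces to an IoU check on the single most confident box. A minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format and, following the text literally, not checking whether the predicted gesture class matches. The recognition threshold is not given in the text, so comparing `recognition_accuracy` against it is left to the caller; all function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_correct(candidates: Sequence[Tuple[float, Box]], true_box: Box,
               iou_threshold: float = 0.6) -> bool:
    """Keep only the highest-confidence candidate and compare it with the ground truth."""
    _, predicted = max(candidates, key=lambda c: c[0])
    return iou(predicted, true_box) > iou_threshold


def recognition_accuracy(samples: Sequence[Tuple[Sequence[Tuple[float, Box]], Box]]) -> float:
    """samples: one (candidate boxes, ground-truth box) pair per test image."""
    return sum(is_correct(c, t) for c, t in samples) / len(samples)
```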
4. The gesture recognition model training method according to claim 1 or 2, characterized in that step (3) comprises:
(3-1) inputting the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; obtaining the activation values of the feature maps through a forward pass of the first Light YOLO network, then obtaining the derivative of the objective function with respect to each feature map through backpropagation; multiplying the activation values of each feature map by the corresponding derivatives to obtain the Taylor expansion values of all feature maps; and selecting the A feature maps with the smallest Taylor expansion values as the feature maps to be pruned;
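Step (3-1) is a first-order Taylor criterion for feature-map importance. The sketch below follows the commonly used formulation of Molchanov et al. (absolute value of the spatially averaged product of activation and gradient, averaged over a batch); the absolute value and the averaging are assumptions, since the text only says activations are multiplied by their derivatives, and the per-layer normalisation used in that work is omitted. Producing the activations and gradients (the forward and backward passes) is left to the training framework; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np


def taylor_scores(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Importance of each feature map of one convolutional layer.

    activations, gradients: arrays of shape (N, C, H, W); gradients hold the derivative
    of the objective function with respect to the activations.
    Returns a (C,) array: |spatial mean of a * dL/da|, averaged over the batch.
    """
    per_sample = (activations * gradients).mean(axis=(2, 3))  # (N, C)
    return np.abs(per_sample).mean(axis=0)                     # (C,)


def least_important(scores_by_layer: Dict[str, np.ndarray], a: int) -> List[Tuple[str, int]]:
    """Return (layer, channel) pairs of the A lowest-scoring feature maps network-wide."""
    ranked = sorted((float(s), layer, c)
                    for layer, scores in scores_by_layer.items()
                    for c, s in enumerate(scores))
    return [(layer, c) for _, layer, c in ranked[:a]]
```

A map whose activation-gradient product is near zero contributes little to the first-order change in the objective, which is why it is a candidate for removal.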
(3-2) adding a selective-dropout layer after each convolutional layer of the first Light YOLO network, the selective-dropout layer performing the dropout operation only on the feature maps to be pruned, thereby obtaining the second Light YOLO network;
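A selective-dropout layer can be sketched as ordinary dropout restricted to the channels scheduled for pruning. This NumPy version is a hedged illustration: the dropout rate (0.5 here), dropping whole maps per sample rather than individual activations, and inverted-dropout scaling are all assumptions, since the text only states that dropout is applied to the feature maps to be pruned.

```python
import numpy as np


def selective_dropout(x, target_channels, p=0.5, training=True, rng=None):
    """Dropout applied only to the feature maps scheduled for pruning.

    x: activations of shape (N, C, H, W); target_channels: indices of maps to prune.
    All other channels pass through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0 or len(target_channels) == 0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    idx = np.asarray(target_channels)
    # One keep/drop decision per sample and per whole feature map (an assumption).
    keep = rng.random((x.shape[0], idx.size, 1, 1)) >= p
    out = x.astype(float, copy=True)
    out[:, idx] = out[:, idx] * keep / (1.0 - p)  # inverted-dropout scaling
    return out
```

Randomly silencing only the doomed maps during training pushes the rest of the network to stop relying on them, so removing them in step (3-3) costs less accuracy.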
(3-3) training the second Light YOLO network on the sample set until it converges; pruning the convolution kernels that generate the feature maps to be pruned in the convolutional layers of the converged second Light YOLO network, then removing the selective-dropout layers to obtain the third Light YOLO network; and training the pruned third Light YOLO network on the sample set to restore network performance;
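Pruning in step (3-3) removes the convolution kernels that generate the selected maps. A minimal NumPy sketch, assuming plain weight arrays in (out, in, kH, kW) layout. Besides deleting the kernel and bias that produce each pruned map, which is what the text describes, it also deletes the matching input slice of the next layer's kernels, since that layer would otherwise expect a channel that no longer exists. Batch-normalisation parameters, which YOLOv2 places after its convolutions, would need the same deletion and are not shown; and because the 8th layer's maps also feed the passthrough convolution, every consumer of a pruned map must be trimmed. The function name is hypothetical.

```python
import numpy as np


def prune_feature_maps(w, b, w_next, channels):
    """Remove feature maps from one conv layer and the matching inputs of the next layer.

    w:        (C_out, C_in, k, k) kernels of the layer that produces the maps
    b:        (C_out,) biases of that layer
    w_next:   (C_next, C_out, k2, k2) kernels of the layer that consumes the maps
    channels: index or list of indices of the maps to prune
    """
    w = np.delete(w, channels, axis=0)            # drop the kernels generating the maps
    b = np.delete(b, channels, axis=0)
    w_next = np.delete(w_next, channels, axis=1)  # drop the now-missing input channels
    return w, b, w_next
```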
(3-4) if the number of pruning rounds is less than B, inputting the sample set into the third Light YOLO network with restored performance obtained in step (3-3), then executing step (3-1); otherwise, pruning is complete, and the pruned Light YOLO network is trained to restore performance, yielding the gesture recognition model.
5. A gesture recognition model, characterized in that the gesture recognition model is trained by the gesture recognition model training method according to any one of claims 1-4.
6. A gesture recognition method, characterized by comprising:
performing gesture recognition on an image to be recognized using the gesture recognition model trained by the gesture recognition model training method according to any one of claims 1-4, to obtain the gesture position and gesture class in the image to be recognized.
7. A gesture recognition system, characterized by comprising:
Sample collection module, configured to collect gesture image samples in several scenes, annotate the gesture positions and gesture classes in the gesture image samples, then perform random cropping on the gesture samples to obtain new gesture samples, and use the gesture image samples and the new gesture samples as the sample set;
Network training module, configured to take the YOLOv2 object detection network as a base, remove its last max-pooling layer and its sixth group of convolutional layers, halve the number of channels of the 14th, 15th, and 17th convolutional layers of the YOLOv2 object detection network, downsample-encode the output feature map of the 8th convolutional layer using a convolutional layer with the same number of channels as the 15th convolutional layer, and concatenate the downsampled feature map with the output feature map of the 15th convolutional layer, thereby obtaining the Light YOLO network; and to train the Light YOLO network on the sample set to obtain the first Light YOLO network;
Network pruning module, configured to input the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; evaluate the importance of the feature maps using a first-order Taylor expansion and select the A least important feature maps as the feature maps to be pruned; add a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain the second Light YOLO network; train the second Light YOLO network on the sample set until it converges; prune the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain the third Light YOLO network; and train the third Light YOLO network on the sample set, thereby obtaining the gesture recognition model;
Gesture recognition module, configured to perform gesture recognition on an image to be recognized using the gesture recognition model, to obtain the gesture position and gesture class in the image to be recognized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810314455.9A CN108629288B (en) | 2018-04-09 | 2018-04-09 | Gesture recognition model training method, gesture recognition method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108629288A (en) | 2018-10-09 |
CN108629288B (en) | 2020-05-19 |
Family ID: 63705035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810314455.9A Expired - Fee Related CN108629288B (en) | 2018-04-09 | 2018-04-09 | Gesture recognition model training method, gesture recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108629288B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930514A (en) * | 2012-09-27 | 2013-02-13 | 西安电子科技大学 | Rapid image defogging method based on atmospheric physical scattering model |
US9286524B1 (en) * | 2015-04-15 | 2016-03-15 | Toyota Motor Engineering & Manufacturing North America, Inc. | Multi-task deep convolutional neural networks for efficient and robust traffic lane detection |
CN106355248A (en) * | 2016-08-26 | 2017-01-25 | 深圳先进技术研究院 | Deep convolutional neural network training method and device |
CN106529578A (en) * | 2016-10-20 | 2017-03-22 | 中山大学 | Fine-grained vehicle make and model recognition method and system based on deep learning |
CN106779068A (en) * | 2016-12-05 | 2017-05-31 | 北京深鉴智能科技有限公司 | Method and apparatus for adjusting an artificial neural network |
CN107368885A (en) * | 2017-07-13 | 2017-11-21 | 北京智芯原动科技有限公司 | Network model compression method and device based on multi-granularity pruning |
CN107463965A (en) * | 2017-08-16 | 2017-12-12 | 湖州易有科技有限公司 | Deep-learning-based fabric attribute image collection and recognition method, and recognition system |
CN107590449A (en) * | 2017-08-31 | 2018-01-16 | 电子科技大学 | Gesture detection method based on weighted feature spectrum fusion |
CN107590432A (en) * | 2017-07-27 | 2018-01-16 | 北京联合大学 | Gesture recognition method based on recurrent three-dimensional convolutional neural networks |
CN107688850A (en) * | 2017-08-08 | 2018-02-13 | 北京深鉴科技有限公司 | Deep neural network compression method |
CN107729854A (en) * | 2017-10-25 | 2018-02-23 | 南京阿凡达机器人科技有限公司 | Robot gesture recognition method, system, and robot |
Non-Patent Citations (2)
Title |
---|
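CHEN L C et al.: "Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs", COMPUTER SCIENCE * |
杨红玲 et al.: "Gesture Recognition Based on Convolutional Neural Networks" (基于卷积神经网络的手势识别), Computer Technology and Development (计算机技术与发展) * |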
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447034A (en) * | 2018-11-14 | 2019-03-08 | 北京信息科技大学 | Traffic sign detection method for autonomous driving based on YOLOv3 network |
CN113167495A (en) * | 2018-12-12 | 2021-07-23 | 三菱电机株式会社 | Air conditioner control device and air conditioner control method |
CN109885677A (en) * | 2018-12-26 | 2019-06-14 | 中译语通科技股份有限公司 | Multi-faceted big data acquisition and cleaning system and method |
CN109828578B (en) * | 2019-02-22 | 2020-06-16 | 南京天创电子技术有限公司 | Instrument inspection robot optimal route planning method based on YOLOv3 |
CN109828578A (en) * | 2019-02-22 | 2019-05-31 | 南京天创电子技术有限公司 | Optimal route planning method for instrument inspection robot based on YOLOv3 |
CN110032925A (en) * | 2019-02-22 | 2019-07-19 | 广西师范大学 | Gesture image segmentation and recognition method based on improved capsule network and algorithm |
CN109978069A (en) * | 2019-04-02 | 2019-07-05 | 南京大学 | Method for reducing overfitting of ResNeXt model in image classification |
CN109978069B (en) * | 2019-04-02 | 2020-10-09 | 南京大学 | Method for reducing overfitting phenomenon of ResNeXt model in image classification |
CN110096968A (en) * | 2019-04-10 | 2019-08-06 | 西安电子科技大学 | Ultra-high-speed static gesture recognition method based on deep model optimization |
CN110096968B (en) * | 2019-04-10 | 2023-02-07 | 西安电子科技大学 | Ultra-high-speed static gesture recognition method based on depth model optimization |
CN110033453A (en) * | 2019-04-18 | 2019-07-19 | 国网山西省电力公司电力科学研究院 | Power transmission and transformation line insulator aerial image fault detection method based on improved YOLOv3 |
CN110033453B (en) * | 2019-04-18 | 2023-02-24 | 国网山西省电力公司电力科学研究院 | Power transmission and transformation line insulator aerial image fault detection method based on improved YOLOv3 |
CN110135398A (en) * | 2019-05-28 | 2019-08-16 | 厦门瑞为信息技术有限公司 | Computer-vision-based detection method for both hands off the steering wheel |
CN111046796A (en) * | 2019-12-12 | 2020-04-21 | 哈尔滨拓博科技有限公司 | Low-cost space gesture control method and system based on double-camera depth information |
CN113191243A (en) * | 2021-04-25 | 2021-07-30 | 华中科技大学 | Method for establishing a three-dimensional hand pose estimation model based on camera distance, and application thereof |
Also Published As
Publication number | Publication date |
---|---|
CN108629288B (en) | 2020-05-19 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200519 |