CN108629288A - Gesture recognition model training method, gesture recognition method, and system - Google Patents

Info

Publication number
CN108629288A
Authority
CN
China
Prior art keywords
gesture
light
yolo
networks
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810314455.9A
Other languages
Chinese (zh)
Other versions
CN108629288B (en
Inventor
桑农
倪子涵
陈佳
高常鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201810314455.9A priority Critical patent/CN108629288B/en
Publication of CN108629288A publication Critical patent/CN108629288A/en
Application granted granted Critical
Publication of CN108629288B publication Critical patent/CN108629288B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The invention discloses a gesture recognition model training method, a gesture recognition method, and a system. The training method includes: collecting gesture picture samples in a variety of scenes, randomly cropping the gesture samples to obtain new gesture samples, and using the gesture picture samples and the new gesture samples as a sample set; building a Light YOLO network and training it on the sample set to obtain a first Light YOLO network; adding a selective-dropout layer after every convolutional layer of the first Light YOLO network to obtain a second Light YOLO network, training the second Light YOLO network on the sample set until it converges, and then pruning channels to obtain the gesture recognition model. The invention improves the network's detection performance on low-resolution gestures, so the gesture recognition method is accurate and runs in real time. The system obtains recognition results directly from pictures and can be optimized end to end.

Description

Gesture recognition model training method, gesture recognition method, and system
Technical field
The invention belongs to the technical field of computer vision and, more particularly, relates to a gesture recognition model training method, a gesture recognition method, and a system.
Background
Gestures are among the most natural forms of body language, and applying them to human-computer interaction makes the interaction more natural. Recognition of the human hand is therefore a current research focus in human-computer interaction, and researchers in China and abroad have studied vision-based gesture recognition extensively.
A traditional gesture recognition system generally first segments the gesture to obtain the gesture region, then extracts gesture features, and finally classifies the gesture using those features. Traditional methods require hand-designed features such as color features and HOG features; these features generalize poorly, so different features must be designed for different tasks. Because artificial neural networks offer interference resistance, self-organization, strong self-learning, and noise robustness, they are increasingly used for gesture classification. With the development of neural-network-based object detection networks, gesture recognition based on object detection networks has begun to develop. However, although neural-network-based gesture recognition is more accurate than traditional machine learning methods, it has drawbacks such as heavy computation, complex models, and poor real-time performance.
It can be seen that existing gesture recognition methods suffer from low accuracy and poor real-time performance in complex scenes.
Summary of the invention
In view of the above shortcomings of the prior art and the need to improve on it, the invention provides a gesture recognition model training method, a gesture recognition method, and a system, thereby solving the technical problem that existing gesture recognition methods have low accuracy and poor real-time performance in complex scenes.
To achieve the above object, according to one aspect of the invention, a gesture recognition model training method is provided, including:
(1) collecting gesture picture samples in a variety of scenes, labeling the gesture position and gesture category in each gesture picture sample, then randomly cropping the gesture samples to obtain new gesture samples, and using the gesture picture samples and the new gesture samples as a sample set;
(2) starting from the YOLOv2 object detection network, removing its last max pooling layer and its sixth group of convolutional layers, halving the number of channels of the 14th, 15th, and 17th convolutional layers of the YOLOv2 object detection network, and using a convolutional layer with the same number of channels as the 15th convolutional layer to downsample and encode the feature map output by the 8th convolutional layer, thereby obtaining a Light YOLO network; then training the Light YOLO network on the sample set to obtain a first Light YOLO network;
(3) inputting the sample set into the first Light YOLO network, every convolutional layer of which outputs feature maps; assessing the importance of the feature maps using a first-order Taylor expansion and selecting the A feature maps with the lowest importance as the feature maps to be pruned; adding a selective-dropout layer after every convolutional layer of the first Light YOLO network to obtain a second Light YOLO network; training the second Light YOLO network on the sample set until it converges; pruning the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network; and training the third Light YOLO network on the sample set to obtain the gesture recognition model.
Further, step (1) includes:
collecting gesture picture samples in a variety of scenes and labeling the gesture position and gesture category in each gesture picture sample to obtain a gesture database; dividing the gesture picture samples in the gesture database into a training set and a test set; randomly cropping the gesture picture samples in the training set to obtain new gesture samples; and using the gesture picture samples in the training set together with the new gesture samples as the sample set.
Further, step (2) includes:
(2-1) first removing the last max pooling layer and the sixth group of convolutional layers of the YOLOv2 object detection network and halving the number of channels of its 14th, 15th, and 17th convolutional layers; then using a convolutional layer with the same number of channels as the 15th convolutional layer to downsample and encode the feature map output by the 8th convolutional layer, and concatenating the downsampled feature map with the feature map output by the 15th convolutional layer, thereby obtaining the Light YOLO network;
(2-2) training the YOLOv2 object detection network on the ImageNet database and using the resulting YOLOv2 network parameters as the initial network parameters of the Light YOLO network; inputting the sample set into the Light YOLO network and training it with stochastic gradient descent to obtain an initial Light YOLO network;
(2-3) testing the initial Light YOLO network on the test set, taking the candidate box with the highest confidence output by the initial Light YOLO network as the predicted gesture box; if the intersection over union (IoU) of the predicted gesture box and the true gesture box is greater than 0.6, the recognition is considered correct, and otherwise it is considered wrong; when the recognition accuracy is greater than or equal to a recognition threshold, the first Light YOLO network and its network parameters are obtained.
Further, step (3) includes:
(3-1) inputting the sample set into the first Light YOLO network, every convolutional layer of which outputs feature maps; obtaining the activations of the feature maps through a forward pass of the first Light YOLO network, then obtaining the derivative of the objective function with respect to each feature map through backpropagation; multiplying the activation of each feature map by its corresponding derivative to obtain the Taylor expansion value of every feature map; and selecting the A feature maps with the smallest Taylor expansion values as the feature maps to be pruned;
(3-2) adding a selective-dropout layer after every convolutional layer of the first Light YOLO network, where the selective-dropout layer performs the dropout operation only on the feature maps to be pruned, thereby obtaining the second Light YOLO network;
(3-3) training the second Light YOLO network on the sample set until it converges; in the converged second Light YOLO network, pruning the convolution kernels that produce the feature maps to be pruned, then removing the selective-dropout layers to obtain the third Light YOLO network; and training the pruned third Light YOLO network on the sample set to restore network performance;
(3-4) if the number of pruning rounds is less than B, inputting the sample set into the performance-restored third Light YOLO network obtained in step (3-3) and returning to step (3-1); otherwise, pruning is complete, and the pruned Light YOLO network is trained until its performance is restored, yielding the gesture recognition model.
According to another aspect of the invention, a gesture recognition model is provided, the gesture recognition model being obtained by training with the gesture recognition model training method of the invention.
According to another aspect of the invention, a gesture recognition method is provided, including:
performing gesture recognition on an image to be recognized using a gesture recognition model trained with the gesture recognition model training method of the invention, to obtain the gesture position and gesture category in the image to be recognized.
It is another aspect of this invention to provide that a kind of gesture recognition system is provided, including:
Sample collection module marks the hand in gesture picture sample for acquiring the gesture picture sample under several scenes Gesture position and gesture classification, then carry out random cropping in gesture sample, obtain new gesture sample, by gesture picture sample With new gesture sample as sample set;
Network training module, for being based on YOLOv2 target detection networks, remove its last one maximum pond layer and The port number of 14th, 15,17 layer of convolutional layer in YOLOv2 target detection networks is kept to original one by the 6th group of convolutional layer group Half, the output characteristic pattern of the 8th layer of convolutional layer is subjected to drop using port number convolutional layer identical with the 15th layer of convolutional layer and is adopted Sample encodes, and will be down-sampled after the output characteristic pattern of characteristic pattern and the 15th layer of convolutional layer be attached, thus obtain Light YOLO networks are trained Light YOLO networks using sample set, obtain the first Light YOLO networks;
Network pruning module, for sample set to be inputted the first Light YOLO networks, the first Light YOLO networks Each layer of convolutional layer exports characteristic pattern, carries out importance assessment to characteristic pattern using first order Taylor expansion, selects importance most A low characteristic pattern is used as characteristic pattern to be cut, and one is added behind each layer of convolutional layer of the first Light YOLO networks Selective-dropout layers, the 2nd Light YOLO networks are obtained, are carried out using the 2nd Light YOLO networks of sample set pair Training is to the 2nd Light YOLO network convergences, to the corresponding volume of characteristic pattern to be cut in convergent 2nd Light YOLO networks Product core is cut, and is obtained the 3rd Light YOLO networks, is trained using the 3rd Light YOLO networks of sample set pair, into And obtain gesture identification model;
Gesture recognition module obtains waiting knowing for carrying out gesture identification to image to be identified using gesture identification model Hand gesture location in other image and gesture classification.
In general, compared with the prior art, the above technical solutions conceived by the invention can achieve the following beneficial effects:
(1) When building the Light YOLO network, to increase the semantic information about gestures in the network's top-level feature map, the invention removes the last max pooling layer and the sixth group of convolutional layers to reduce the network stride, and halves the number of channels of the 14th, 15th, and 17th convolutional layers to prevent overfitting. In addition, a top-level feature map with richer semantic information is built by fusing high-level and low-level features. Downsampling with a convolutional layer retains more spatial information and can encode the features into a specified number of channels. When optimizing the Light YOLO network, so that each iteration can prune more feature maps without harming network performance, a selective-dropout layer is added after every convolutional layer of Light YOLO during training; this layer performs dropout only on the feature maps to be pruned, which reduces the network's dependence on them. Together, these improvements address the low accuracy and poor real-time performance of existing gesture recognition methods in complex scenes and improve the network's detection performance on low-resolution gestures, so the gesture recognition method of the invention is accurate and runs in real time. The system also obtains recognition results directly from pictures and can be optimized end to end.
(2) The gesture recognition model trained by the invention is pruned from 55 MB to 4 MB, and its forward inference speed rises from 28 FPS to 125 FPS. This shows that the trained gesture recognition model can perform real-time gesture recognition; and because the network model is compressed to 4 MB with greatly reduced computation, it is easy to port to embedded platforms.
Description of the drawings
Fig. 1 is a flow chart of a gesture recognition model training method provided by an embodiment of the invention;
Fig. 2 is a structure diagram of Light YOLO provided by an embodiment of the invention;
Fig. 3 is a flow chart of the selective-dropout network pruning algorithm provided by an embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.
As shown in Fig. 1, a gesture recognition model training method includes:
(1) Collect gesture picture samples in a variety of scenes, mainly including a simple background, a complex background, a skin-colored background, the hand passing in front of the face, and the presence of other non-predefined gestures; the subjects stand about 2 to 3 meters from the camera. Label the gesture position and gesture category in each gesture picture sample to obtain a gesture database containing 5738 samples, and divide the gesture picture samples in the database into a training set and a test set at a ratio of 1:1. Randomly crop the gesture picture samples in the training set to obtain new gesture samples until every class has 350 training samples, and use the gesture picture samples in the training set together with the new gesture samples as the sample set.
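The augmentation in step (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each sample is described by its image size and one labeled box `(x, y, w, h)` with `(x, y)` the top-left corner, and it assumes a crop must fully contain the labeled gesture, which the text does not state. The function names are hypothetical.

```python
import random

def random_crop_keep_box(img_w, img_h, box, rng):
    """Pick a random crop window that fully contains the labeled box.

    Assumption: crops must keep the whole gesture box; the source only
    says that samples are randomly cropped.
    """
    x, y, w, h = box
    left = rng.randint(0, x)            # anywhere between image edge and box edge
    top = rng.randint(0, y)
    right = rng.randint(x + w, img_w)   # anywhere between box edge and image edge
    bottom = rng.randint(y + h, img_h)
    crop = (left, top, right - left, bottom - top)
    new_box = (x - left, y - top, w, h)  # box in crop coordinates
    return crop, new_box

def augment_class(samples, target_count, rng):
    """Add random crops of existing samples until the class has target_count samples.

    Each sample is a tuple (img_w, img_h, box).
    """
    out = list(samples)
    while len(out) < target_count:
        img_w, img_h, box = rng.choice(samples)
        crop, new_box = random_crop_keep_box(img_w, img_h, box, rng)
        out.append((crop[2], crop[3], new_box))
    return out

# Grow one class to the 350 training samples mentioned in the text.
rng = random.Random(0)
augmented = augment_class([(640, 480, (100, 120, 80, 90))], 350, rng)
```

Seeding the generator makes the augmented set reproducible between runs.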
(2) Starting from the YOLOv2 object detection network, remove its last max pooling layer and its sixth group of convolutional layers, halve the number of channels of the 14th, 15th, and 17th convolutional layers of the YOLOv2 object detection network, and use a convolutional layer with the same number of channels as the 15th convolutional layer to downsample and encode the feature map output by the 8th convolutional layer, thereby obtaining a Light YOLO network; then train the Light YOLO network on the sample set to obtain a first Light YOLO network. Specifically:
(2-1) First remove the last max pooling layer and the sixth group of convolutional layers, and halve the number of channels of the 14th, 15th, and 17th convolutional layers of the YOLOv2 object detection network (the 14th and 15th convolutional layers are reduced to 512 channels). Then use a convolutional layer with a 1 × 1 kernel, a stride of 2, and the same number of channels as the 15th convolutional layer to downsample and encode the feature map output by the 8th convolutional layer, and concatenate the downsampled feature map with the feature map output by the 15th convolutional layer, thereby obtaining the Light YOLO network. As shown in Fig. 2, the Light YOLO network consists of 18 convolutional layers and 4 max pooling layers in total, and each of the first 17 convolutional layers is followed by a batch normalization layer and a Leaky ReLU layer. The parameters of each layer are as follows:
Conv1: kernel size 3 × 3, 3 input channels, 32 output channels.
Maxpooling1: pooling window 2 × 2, stride 2 × 2.
Conv2: kernel size 3 × 3, 32 input channels, 64 output channels.
Maxpooling2: pooling window 2 × 2, stride 2 × 2.
Conv3: kernel size 3 × 3, 64 input channels, 128 output channels.
Conv4: kernel size 1 × 1, 128 input channels, 64 output channels.
Conv5: kernel size 3 × 3, 64 input channels, 128 output channels.
Maxpooling3: pooling window 2 × 2, stride 2 × 2.
Conv6: kernel size 3 × 3, 128 input channels, 256 output channels.
Conv7: kernel size 1 × 1, 256 input channels, 128 output channels.
Conv8: kernel size 3 × 3, 128 input channels, 256 output channels.
Maxpooling4: pooling window 2 × 2, stride 2 × 2.
Conv9: kernel size 3 × 3, 256 input channels, 512 output channels.
Conv10: kernel size 1 × 1, 512 input channels, 256 output channels.
Conv11: kernel size 3 × 3, 256 input channels, 512 output channels.
Conv12: kernel size 1 × 1, 512 input channels, 256 output channels.
Conv13: kernel size 3 × 3, 256 input channels, 512 output channels.
Conv14: kernel size 3 × 3, 512 input channels, 512 output channels.
Conv15: kernel size 3 × 3, 512 input channels, 512 output channels.
Conv16: kernel size 1 × 1, 256 input channels, 512 output channels, stride 2.
Conv17: kernel size 3 × 3, 1024 input channels, 512 output channels.
Conv18: kernel size 3 × 3, 512 input channels, 75 output channels.
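The layer list above can be checked for internal consistency with a short script. This is a sketch only: it encodes the listed kernel sizes and channel counts as plain tuples, treats Conv16 as the side branch fed by Conv8 (as step (2-1) describes), and counts convolution weights without biases or batch normalization parameters. The tuple layout and helper names are assumptions made for illustration.

```python
# (name, kernel, in_channels, out_channels, stride); ("pool",) is a 2 x 2 max pool, stride 2.
LAYERS = [
    ("Conv1", 3, 3, 32, 1), ("pool",),
    ("Conv2", 3, 32, 64, 1), ("pool",),
    ("Conv3", 3, 64, 128, 1), ("Conv4", 1, 128, 64, 1), ("Conv5", 3, 64, 128, 1), ("pool",),
    ("Conv6", 3, 128, 256, 1), ("Conv7", 1, 256, 128, 1), ("Conv8", 3, 128, 256, 1), ("pool",),
    ("Conv9", 3, 256, 512, 1), ("Conv10", 1, 512, 256, 1), ("Conv11", 3, 256, 512, 1),
    ("Conv12", 1, 512, 256, 1), ("Conv13", 3, 256, 512, 1),
    ("Conv14", 3, 512, 512, 1), ("Conv15", 3, 512, 512, 1),
    ("Conv16", 1, 256, 512, 2),   # side branch: downsamples the Conv8 output
    ("Conv17", 3, 1024, 512, 1),  # consumes concat(Conv15, Conv16)
    ("Conv18", 3, 512, 75, 1),
]

CONVS = {layer[0]: layer for layer in LAYERS if layer[0] != "pool"}

def stride_before(name):
    """Cumulative downsampling from pooling layers that precede the named layer."""
    stride = 1
    for layer in LAYERS:
        if layer[0] == name:
            break
        if layer[0] == "pool":
            stride *= 2
    return stride

def conv_weight_count():
    """Total number of convolution weights (kernel * kernel * in * out per layer)."""
    return sum(k * k * cin * cout for _, k, cin, cout, _ in CONVS.values())

# The concatenated input of Conv17 must match Conv15 + Conv16 outputs.
concat_channels = CONVS["Conv15"][3] + CONVS["Conv16"][3]
# Conv16 (stride 2) must bring the Conv8 map to the same resolution as Conv15.
branch_stride = stride_before("Conv8") * CONVS["Conv16"][4]
main_stride = stride_before("Conv15")
weights = conv_weight_count()
```

With 4-byte weights, the count works out to roughly 55 MiB, which is in line with the 55 MB unpruned model size reported in the text.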
(2-2) Train the YOLOv2 object detection network on the ImageNet database and use the resulting YOLOv2 network parameters as the initial network parameters of the Light YOLO network; input the sample set into the Light YOLO network and train it with stochastic gradient descent to obtain an initial Light YOLO network. When the Light YOLO network is trained, the objective function consists of three parts: coordinate error, confidence error, and classification error.
Here, $\lambda_{obj}$ and $\lambda_{noobj}$ are the coefficients of the confidence error for target candidate boxes and non-target candidate boxes, respectively; $x_i$, $y_i$, $w_i$, $h_i$, and $C_i$ are the top-left coordinates, width, height, and confidence of a candidate box; $\hat{x}_i$, $\hat{y}_i$, $\hat{w}_i$, $\hat{h}_i$, and $\hat{C}_i$ are the top-left coordinates, width, height, and confidence of the labeled box; $p_i(c)$ is the probability predicted by the network that candidate box $i$ belongs to class $c$, and $\hat{p}_i(c)$ is the true probability that it belongs to class $c$. $\mathbb{1}_i^{obj}$ indicates that candidate box $i$ contains a target, and $\mathbb{1}_i^{noobj}$ indicates that it does not.
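As a hedged sketch only, a three-part objective consistent with the quantities named above could take the sum-of-squared-errors form below, modeled on the original YOLO loss. The hat notation for labeled-box quantities, the indicator symbols, and the exact shape of each term (for example, the absence of a coordinate weight or of square roots on width and height) are assumptions, not the patent's formula.

```latex
\begin{aligned}
L ={} & \sum_i \mathbb{1}_i^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2
        +(w_i-\hat{w}_i)^2+(h_i-\hat{h}_i)^2\right] \\
      & + \lambda_{obj}\sum_i \mathbb{1}_i^{obj}\,(C_i-\hat{C}_i)^2
        + \lambda_{noobj}\sum_i \mathbb{1}_i^{noobj}\,(C_i-\hat{C}_i)^2 \\
      & + \sum_i \mathbb{1}_i^{obj}\sum_c \left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```

The first line is the coordinate error, the second the confidence error split between target and non-target candidate boxes, and the third the classification error.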
The learning rate schedule is as follows: for the first 100 iterations the learning rate is lowered to $10^{-5}$ for warm-up training and is then restored to $10^{-4}$; it is reduced to $5 \times 10^{-5}$ at epoch 20 and to $10^{-5}$ at epoch 150.
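The schedule can be written as a small lookup function. This is a sketch under one assumption: that the "first 100 times" of warm-up counts training iterations while the later milestones count epochs. The function name is hypothetical.

```python
def learning_rate(iteration, epoch):
    """Piecewise-constant schedule with a short warm-up.

    Assumption: warm-up is counted in iterations, milestones in epochs.
    """
    if iteration < 100:
        return 1e-5   # warm-up
    if epoch >= 150:
        return 1e-5
    if epoch >= 20:
        return 5e-5
    return 1e-4       # base rate after warm-up

rates = [learning_rate(it, ep) for it, ep in [(0, 0), (100, 0), (5000, 20), (99999, 150)]]
```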
(2-3) Test the initial Light YOLO network on the test set, taking the candidate box with the highest confidence output by the initial Light YOLO network as the predicted gesture box. If the intersection over union (IoU) of the predicted gesture box and the true gesture box is greater than 0.6, the recognition is considered correct; otherwise it is considered wrong. When the recognition accuracy is greater than or equal to the recognition threshold, the first Light YOLO network and its network parameters are obtained.
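The test in step (2-3) reduces to an overlap computation between two boxes. The sketch below assumes boxes are given as `(x, y, w, h)` with `(x, y)` the top-left corner and candidates as `(confidence, box)` pairs; those layouts and the function names are illustrative, and any check of the predicted gesture category is left out because the text states only the overlap criterion.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def is_correct(candidates, true_box, threshold=0.6):
    """Take the highest-confidence candidate and compare it with the labeled box."""
    if not candidates:
        return False
    _, best_box = max(candidates, key=lambda c: c[0])
    return iou(best_box, true_box) > threshold

def accuracy(results):
    """Fraction of test images judged correct; results is a list of booleans."""
    return sum(results) / len(results) if results else 0.0
```

For example, `is_correct([(0.9, (0, 0, 10, 10)), (0.5, (50, 50, 10, 10))], (1, 1, 10, 10))` evaluates the first candidate only, since it has the higher confidence.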
(3) Input the sample set into the first Light YOLO network, every convolutional layer of which outputs feature maps; assess the importance of the feature maps using a first-order Taylor expansion and select the A feature maps with the lowest importance as the feature maps to be pruned; add a selective-dropout layer after every convolutional layer of the first Light YOLO network to obtain a second Light YOLO network; train the second Light YOLO network on the sample set until it converges; prune the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network; and train the third Light YOLO network on the sample set to obtain the gesture recognition model. As shown in Fig. 3, this specifically includes:
(3-1) Input the sample set into the first Light YOLO network, every convolutional layer of which outputs feature maps. Obtain the activations of the feature maps through a forward pass of the first Light YOLO network, then obtain the derivative of the objective function with respect to each feature map through backpropagation. Multiply the activation of each feature map by its corresponding derivative to obtain the Taylor expansion value of every feature map, and select the A feature maps with the smallest Taylor expansion values as the feature maps to be pruned.
We treat pruning as an optimization process whose goal is to find the pruned network parameters that minimize the change in the loss function before and after pruning: $|\Delta L(f_i)| = |L(D \mid W') - L(D \mid W)|$, where $D$ is the sample set and $W$ and $W'$ are the parameters of the Light YOLO network before and after pruning. The loss function depends on the parameters of a convolution kernel in the same way as it depends on the feature map computed from those parameters, so for convenience we write $L(D, f_i) = L(D \mid w_i)$. The change in the loss function caused by pruning any feature map $f_i$ can then be expressed as:
$$|\Delta L(f_i)| = |L(D, f_i = 0) - L(D, f_i)|$$
where $L(D, f_i = 0)$ is the value of the loss function after feature map $f_i$ is pruned, which can be approximated by a Taylor expansion of $L(D, f_i)$. Because higher-order terms are expensive to compute, only the first-order Taylor expansion is used and the first-order remainder is ignored, which gives:
$$|\Delta L(f_i)| = \left| \frac{\partial L}{\partial f_i} f_i \right|$$
where $\frac{\partial L}{\partial f_i}$ is the derivative of the objective function with respect to the feature map.
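The ranking in step (3-1) can be sketched in plain Python. The text says to multiply each feature map's activation by the matching derivative; how the products are aggregated over spatial positions and samples is not stated, so averaging them before taking the absolute value is an assumption here, as are the function names and the flattened-list data layout.

```python
def taylor_scores(activations, gradients):
    """First-order Taylor importance for each feature map.

    activations[i] and gradients[i] are flattened lists of equal length
    holding one feature map's values and the loss derivatives at the
    same positions. Assumption: products are averaged per feature map.
    """
    scores = []
    for act, grad in zip(activations, gradients):
        products = [a * g for a, g in zip(act, grad)]
        scores.append(abs(sum(products) / len(products)))
    return scores

def lowest_ranked(scores, a):
    """Indices of the `a` feature maps with the smallest importance."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:a])

acts = [[1.0, 2.0], [0.1, 0.1], [3.0, -3.0]]
grads = [[0.5, 0.5], [0.1, 0.1], [1.0, 1.0]]
scores = taylor_scores(acts, grads)
to_prune = lowest_ranked(scores, 2)
```

In this toy input the third feature map scores zero because its positive and negative contributions cancel, so it is selected for pruning along with the second.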
(3-2) Add a selective-dropout layer after every convolutional layer of the first Light YOLO network; the selective-dropout layer performs the dropout operation only on the feature maps to be pruned, which yields the second Light YOLO network. For convolutional layer $l$, let the original number of feature maps be $K$ and the number of feature maps still active after selective-dropout be $C$. To keep the magnitude of the input to the next layer's neurons unchanged, each feature map is divided by $C/K$:
$$\hat{f}_l^k = \frac{f_l^k}{C/K}$$
where $f_l^k$ is the activation of the $k$-th feature map of convolutional layer $l$.
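A selective-dropout forward pass can be sketched as follows. The drop probability is not given in the text, so `drop_prob` is a hypothetical parameter, and feature maps are represented as flattened lists for simplicity; only the restriction of dropout to the maps marked for pruning and the division by C/K come from the description above.

```python
import random

def selective_dropout(feature_maps, to_prune, drop_prob, rng):
    """Drop only feature maps marked for pruning, then rescale the rest.

    feature_maps: list of K flattened feature maps (lists of floats).
    to_prune: set of indices eligible for dropout.
    drop_prob: hypothetical per-map drop probability (not given in the text).
    """
    k = len(feature_maps)
    dropped = {i for i in to_prune if rng.random() < drop_prob}
    c = k - len(dropped)  # number of maps still active
    if c == 0:
        return [[0.0] * len(fmap) for fmap in feature_maps]
    scale = k / c         # dividing by C/K
    return [
        [0.0] * len(fmap) if i in dropped else [v * scale for v in fmap]
        for i, fmap in enumerate(feature_maps)
    ]

maps = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
always = selective_dropout(maps, {1}, 1.0, random.Random(0))
never = selective_dropout(maps, {1}, 0.0, random.Random(0))
```

Maps outside `to_prune` are never zeroed, so the network keeps training on the features it will retain while learning not to rely on the ones about to be removed.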
(3-3) Train the second Light YOLO network on the sample set for 10 training rounds with a learning rate of $10^{-5}$ so that it converges. In the converged second Light YOLO network, prune the convolution kernels that produce the feature maps to be pruned, then remove the selective-dropout layers to obtain the third Light YOLO network. Train the pruned third Light YOLO network on the sample set for 10 training rounds with a learning rate of $10^{-5}$ to restore network performance.
(3-4) If the number of pruning rounds is less than 20, input the sample set into the performance-restored third Light YOLO network obtained in step (3-3) and return to step (3-1); otherwise, pruning is complete, and the pruned Light YOLO network is trained for 20 training rounds until its performance is restored, yielding the gesture recognition model.
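Steps (3-1) to (3-4) form a loop, which the skeleton below makes explicit. It is a sketch of the control flow only: the four callables are hypothetical placeholders for ranking, selective-dropout training, kernel removal, and recovery training, and the toy usage stands in for a real network with a list of importance scores.

```python
def iterative_prune(model, rounds, select, train_with_dropout, prune, finetune):
    """Repeat select -> selective-dropout training -> prune -> recover."""
    for _ in range(rounds):
        targets = select(model)                      # step (3-1)
        model = train_with_dropout(model, targets)   # steps (3-2) and (3-3)
        model = prune(model, targets)                # step (3-3): remove kernels
        model = finetune(model)                      # step (3-3): restore performance
    return finetune(model)                           # step (3-4): final recovery training

# Toy usage: the "model" is a list of importance scores, and each round
# removes the two lowest-scoring entries.
toy = [0.9, 0.1, 0.5, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 1.0]
result = iterative_prune(
    toy,
    rounds=3,
    select=lambda m: sorted(range(len(m)), key=lambda i: m[i])[:2],
    train_with_dropout=lambda m, t: m,
    prune=lambda m, t: [v for i, v in enumerate(m) if i not in t],
    finetune=lambda m: m,
)
```

Passing the stages in as callables keeps the loop independent of any particular deep learning framework.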
The gesture recognition model trained by the invention is pruned from 55 MB to 4 MB, and its forward inference speed rises from 28 FPS to 125 FPS.
A kind of gesture recognition system, including:
A sample collection module, configured to collect gesture image samples in a variety of scenes, annotate the gesture position and gesture class in each gesture image sample, then randomly crop the gesture samples to obtain new gesture samples, and use the gesture image samples and the new gesture samples together as a sample set;
A network training module, configured to start from the YOLOv2 object detection network, remove its last max-pooling layer and its sixth group of convolutional layers, halve the number of channels of the 14th, 15th and 17th convolutional layers of the YOLOv2 object detection network, down-sample and encode the output feature map of the 8th convolutional layer using a convolutional layer with the same number of channels as the 15th convolutional layer, and concatenate the down-sampled feature map with the output feature map of the 15th convolutional layer, thereby obtaining a Light YOLO network; and to train the Light YOLO network with the sample set to obtain a first Light YOLO network;
A network pruning module, configured to input the sample set into the first Light YOLO network so that each convolutional layer of the first Light YOLO network outputs feature maps, assess the importance of the feature maps using a first-order Taylor expansion, select the A feature maps with the lowest importance as the feature maps to be pruned, add a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network, train the second Light YOLO network with the sample set until it converges, prune the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network, and train the third Light YOLO network with the sample set to obtain the gesture recognition model;
A gesture recognition module, configured to perform gesture recognition on an image to be recognized using the gesture recognition model, obtaining the gesture position and gesture class in the image to be recognized.
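The feature fusion performed by the network training module can be illustrated with some shape bookkeeping. The following is only a minimal sketch: the function names, the 3x3 kernel with stride 2 and padding 1, and the example shapes are all assumptions made for illustration and are not taken from the patent. It shows that the down-sampling convolution must bring the 8th-layer feature map to the spatial size of the 15th-layer feature map, and that joining the two along the channel axis adds their channel counts.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def fuse_shapes(shallow, deep, kernel=3, stride=2, padding=1):
    """Shape bookkeeping for the fusion step.

    shallow, deep: (channels, height, width) of the 8th-layer and 15th-layer
    outputs. The down-sampling convolution has as many output channels as the
    deep map, as the text states; kernel, stride and padding are assumptions.
    """
    _, h, w = shallow
    deep_c, deep_h, deep_w = deep
    down = (
        deep_c,
        conv_output_size(h, kernel, stride, padding),
        conv_output_size(w, kernel, stride, padding),
    )
    if down[1:] != (deep_h, deep_w):
        raise ValueError(f"spatial mismatch: {down[1:]} vs {(deep_h, deep_w)}")
    # Concatenation along the channel axis adds the channel counts.
    return (down[0] + deep_c, deep_h, deep_w)


# Illustrative shapes only, not values taken from the patent.
print(fuse_shapes(shallow=(128, 52, 52), deep=(256, 26, 26)))
```

If the two spatial sizes do not match after down-sampling, the sketch raises an error instead of silently producing a wrong shape.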
Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A gesture recognition model training method, characterized by comprising:
(1) collecting gesture image samples in a variety of scenes, annotating the gesture position and gesture class in each gesture image sample, then randomly cropping the gesture samples to obtain new gesture samples, and using the gesture image samples and the new gesture samples together as a sample set;
(2) starting from the YOLOv2 object detection network, removing its last max-pooling layer and its sixth group of convolutional layers, halving the number of channels of the 14th, 15th and 17th convolutional layers of the YOLOv2 object detection network, and down-sampling and encoding the feature map output by the 8th convolutional layer using a convolutional layer with the same number of channels as the 15th convolutional layer, to obtain a Light YOLO network; and training the Light YOLO network with the sample set to obtain a first Light YOLO network;
(3) inputting the sample set into the first Light YOLO network so that each convolutional layer of the first Light YOLO network outputs feature maps; assessing the importance of the feature maps using a first-order Taylor expansion and selecting the A feature maps with the lowest importance as the feature maps to be pruned; adding a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network; training the second Light YOLO network with the sample set until it converges; pruning the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network; and training the third Light YOLO network with the sample set to obtain the gesture recognition model.
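The selective-dropout layer in step (3) can be sketched in a few lines. This is one possible reading rather than the patented implementation: the function name, the dictionary representation of feature maps, dropping a whole feature map at once, the drop probability of 0.5 and the absence of any rescaling are all assumptions. The only property taken from the text is that dropout is applied to the feature maps selected for pruning and to no others.

```python
import random


def selective_dropout(feature_maps, prune_candidates, drop_prob=0.5,
                      rng=None, training=True):
    """Apply dropout only to the feature maps flagged for pruning.

    feature_maps: dict mapping a feature-map name to a list of activations.
    prune_candidates: names of the feature maps selected for pruning.
    Whole-map dropping and the lack of rescaling are assumptions.
    """
    rng = rng or random.Random()
    out = {}
    for name, values in feature_maps.items():
        drop = training and name in prune_candidates and rng.random() < drop_prob
        # Dropped maps are zeroed; every other map passes through unchanged.
        out[name] = [0.0] * len(values) if drop else list(values)
    return out


maps = {"keep": [1.0, 2.0], "candidate": [3.0, 4.0]}
print(selective_dropout(maps, {"candidate"}, rng=random.Random(0)))
```

Training with the candidates randomly suppressed lets the remaining feature maps adapt before the corresponding convolution kernels are actually removed.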
2. The gesture recognition model training method according to claim 1, characterized in that step (1) comprises:
collecting gesture image samples in a variety of scenes and annotating the gesture position and gesture class in each gesture image sample to obtain a gesture database; dividing the gesture image samples in the gesture database into a training set and a test set; randomly cropping the gesture image samples in the training set to obtain new gesture samples; and using the gesture image samples in the training set and the new gesture samples together as the sample set.
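The data preparation in this step can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the 20% test fraction, the minimum crop scale of 0.6 and the rule that a crop must keep the whole annotated gesture box are hypothetical choices, since the text only says that the database is split and that samples are randomly cropped.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle annotated samples and split them into (training set, test set)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def random_crop(width, height, box, rng, min_scale=0.6):
    """Pick a random crop window that fully contains the gesture box.

    box is (x1, y1, x2, y2) in pixels and must lie inside the image.
    Returns the crop window and the box in crop coordinates. Keeping the
    whole box inside the crop is an assumption, not stated in the source.
    """
    x1, y1, x2, y2 = box
    crop_w = max(x2 - x1, rng.randint(int(width * min_scale), width))
    crop_h = max(y2 - y1, rng.randint(int(height * min_scale), height))
    # The left and top edges are limited so the box stays inside the window.
    left = rng.randint(max(0, x2 - crop_w), min(x1, width - crop_w))
    top = rng.randint(max(0, y2 - crop_h), min(y1, height - crop_h))
    window = (left, top, left + crop_w, top + crop_h)
    new_box = (x1 - left, y1 - top, x2 - left, y2 - top)
    return window, new_box


train, test = split_dataset(range(10))
window, new_box = random_crop(640, 480, (200, 150, 320, 300), random.Random(1))
print(len(train), len(test), window, new_box)
```

Only the training set is augmented here, which matches the claim: the test set keeps the original images so that accuracy is measured on unmodified data.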
3. The gesture recognition model training method according to claim 2, characterized in that step (2) comprises:
(2-1) first removing the last max-pooling layer and the sixth group of convolutional layers of the YOLOv2 object detection network and halving the number of channels of its 14th, 15th and 17th convolutional layers; then down-sampling and encoding the feature map output by the 8th convolutional layer using a convolutional layer with the same number of channels as the 15th convolutional layer, and concatenating the down-sampled feature map with the feature map output by the 15th convolutional layer, thereby obtaining the Light YOLO network;
(2-2) training the YOLOv2 object detection network on the ImageNet database and using the resulting YOLOv2 network parameters as the initial network parameters of the Light YOLO network; inputting the sample set into the Light YOLO network and training the Light YOLO network by stochastic gradient descent to obtain an initial Light YOLO network;
(2-3) testing the initial Light YOLO network on the test set, taking the candidate box with the highest confidence output by the initial Light YOLO network as the predicted gesture box; if the intersection-over-union (IoU) of the predicted gesture box and the ground-truth gesture box is greater than 0.6, the recognition is considered correct, and otherwise it is considered wrong; when the recognition accuracy is greater than or equal to a recognition threshold, the first Light YOLO network and its network parameters are obtained.
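The test in step (2-3) can be sketched as follows. The function names and the (x1, y1, x2, y2) box format are assumptions; the 0.6 threshold and the use of the highest-confidence candidate come from the text. Treating an image with no candidates as an error is also an assumption, and the sketch checks only box overlap because the step mentions no class comparison.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recognition_accuracy(candidates_per_image, true_boxes, iou_threshold=0.6):
    """Fraction of images whose top-confidence candidate matches the truth.

    candidates_per_image: one list per image of (confidence, box) pairs.
    """
    correct = 0
    for candidates, truth in zip(candidates_per_image, true_boxes):
        if not candidates:
            continue  # no detection is counted as an error (assumption)
        _, predicted = max(candidates, key=lambda c: c[0])
        if iou(predicted, truth) > iou_threshold:
            correct += 1
    return correct / len(true_boxes) if true_boxes else 0.0


detections = [
    [(0.9, (0, 0, 10, 10)), (0.5, (50, 50, 60, 60))],
    [(0.8, (0, 0, 10, 10))],
]
print(recognition_accuracy(detections, [(0, 0, 10, 10), (5, 0, 15, 10)]))
```

The returned accuracy is what would be compared against the recognition threshold to decide whether the initial network is accepted as the first Light YOLO network.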
4. The gesture recognition model training method according to claim 1 or 2, characterized in that step (3) comprises:
(3-1) inputting the sample set into the first Light YOLO network so that each convolutional layer of the first Light YOLO network outputs feature maps; obtaining the activation values of the feature maps through a forward pass of the first Light YOLO network, and obtaining the derivatives of the objective function with respect to the feature maps through back-propagation in the first Light YOLO network; multiplying the activation values of each feature map by the corresponding derivatives to obtain the Taylor expansion values of all feature maps; and selecting the A feature maps with the smallest Taylor expansion values as the feature maps to be pruned;
(3-2) adding a selective-dropout layer after each convolutional layer of the first Light YOLO network, the selective-dropout layer performing the dropout operation only on the feature maps to be pruned, thereby obtaining the second Light YOLO network;
(3-3) training the second Light YOLO network with the sample set until it converges; pruning, in the converged second Light YOLO network, the convolution kernels of the convolutional layers that generate the feature maps to be pruned, and then removing the selective-dropout layers to obtain the third Light YOLO network; and training the pruned third Light YOLO network with the sample set to restore network performance;
(3-4) if the number of pruning rounds is less than B, inputting the sample set into the third Light YOLO network obtained in step (3-3) after its performance has been restored, and returning to step (3-1); otherwise, ending the pruning and training the pruned Light YOLO network to restore its performance, obtaining the gesture recognition model.
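Steps (3-1) to (3-4) can be sketched as a ranking function plus a loop. This is a hedged illustration rather than the patented implementation: the function names, the flat-list representation of feature maps, and the `evaluate`, `finetune` and `prune` callbacks are hypothetical. Taking the absolute value of the mean of the activation-derivative products is an assumption borrowed from common first-order Taylor pruning practice; the text itself only says that activation values are multiplied by the corresponding derivatives.

```python
def taylor_scores(activations, gradients):
    """First-order Taylor importance of each feature map.

    activations[name] and gradients[name] are flat lists holding one feature
    map's activation values and the objective's derivatives at the same
    positions. Score = |mean(activation * derivative)| (assumption).
    """
    scores = {}
    for name, acts in activations.items():
        products = [a * g for a, g in zip(acts, gradients[name])]
        scores[name] = abs(sum(products) / len(products))
    return scores


def select_for_pruning(scores, a):
    """Names of the `a` feature maps with the lowest importance."""
    return sorted(scores, key=scores.get)[:a]


def iterative_pruning(model, evaluate, finetune, prune, a, b):
    """Run b pruning rounds; all three callbacks are hypothetical hooks.

    evaluate(model) -> (activations, gradients)
    finetune(model, dropout_on) -> model trained with selective dropout
    prune(model, targets) -> model without the targeted kernels
    """
    for _ in range(b):
        activations, gradients = evaluate(model)                     # step (3-1)
        targets = select_for_pruning(taylor_scores(activations, gradients), a)
        model = finetune(model, dropout_on=targets)                  # (3-2), (3-3)
        model = prune(model, targets)                                # (3-3)
        model = finetune(model, dropout_on=())                       # recovery
    return model


acts = {"c0": [1.0, 2.0], "c1": [0.1, 0.1], "c2": [3.0, -3.0]}
grads = {"c0": [0.5, 0.5], "c1": [0.1, 0.1], "c2": [1.0, 1.0]}
print(select_for_pruning(taylor_scores(acts, grads), 2))
```

Pruning only A feature maps per round and retraining in between keeps each step small, so the network can recover before importance is re-estimated on the pruned model.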
5. A gesture recognition model, characterized in that the gesture recognition model is obtained by training with the gesture recognition model training method according to any one of claims 1-4.
6. A gesture recognition method, characterized by comprising:
performing gesture recognition on an image to be recognized using a gesture recognition model trained by the gesture recognition model training method according to any one of claims 1-4, to obtain the gesture position and gesture class in the image to be recognized.
7. A gesture recognition system, characterized by comprising:
a sample collection module, configured to collect gesture image samples in a variety of scenes, annotate the gesture position and gesture class in each gesture image sample, then randomly crop the gesture samples to obtain new gesture samples, and use the gesture image samples and the new gesture samples together as a sample set;
a network training module, configured to start from the YOLOv2 object detection network, remove its last max-pooling layer and its sixth group of convolutional layers, halve the number of channels of the 14th, 15th and 17th convolutional layers of the YOLOv2 object detection network, down-sample and encode the output feature map of the 8th convolutional layer using a convolutional layer with the same number of channels as the 15th convolutional layer, and concatenate the down-sampled feature map with the output feature map of the 15th convolutional layer, thereby obtaining a Light YOLO network; and to train the Light YOLO network with the sample set to obtain a first Light YOLO network;
a network pruning module, configured to input the sample set into the first Light YOLO network so that each convolutional layer of the first Light YOLO network outputs feature maps, assess the importance of the feature maps using a first-order Taylor expansion, select the A feature maps with the lowest importance as the feature maps to be pruned, add a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network, train the second Light YOLO network with the sample set until it converges, prune the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network, and train the third Light YOLO network with the sample set to obtain the gesture recognition model;
a gesture recognition module, configured to perform gesture recognition on an image to be recognized using the gesture recognition model, obtaining the gesture position and gesture class in the image to be recognized.
CN201810314455.9A 2018-04-09 2018-04-09 Gesture recognition model training method, gesture recognition method and system Expired - Fee Related CN108629288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810314455.9A CN108629288B (en) 2018-04-09 2018-04-09 Gesture recognition model training method, gesture recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810314455.9A CN108629288B (en) 2018-04-09 2018-04-09 Gesture recognition model training method, gesture recognition method and system

Publications (2)

Publication Number Publication Date
CN108629288A true CN108629288A (en) 2018-10-09
CN108629288B CN108629288B (en) 2020-05-19

Family

ID=63705035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810314455.9A Expired - Fee Related CN108629288B (en) 2018-04-09 2018-04-09 Gesture recognition model training method, gesture recognition method and system

Country Status (1)

Country Link
CN (1) CN108629288B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930514A (en) * 2012-09-27 2013-02-13 西安电子科技大学 Rapid image defogging method based on atmospheric physical scattering model
US9286524B1 (en) * 2015-04-15 2016-03-15 Toyota Motor Engineering & Manufacturing North America, Inc. Multi-task deep convolutional neural networks for efficient and robust traffic lane detection
CN106355248A (en) * 2016-08-26 2017-01-25 深圳先进技术研究院 Deep convolution neural network training method and device
CN106529578A (en) * 2016-10-20 2017-03-22 中山大学 Vehicle brand model fine identification method and system based on depth learning
CN106779068A (en) * 2016-12-05 2017-05-31 北京深鉴智能科技有限公司 The method and apparatus for adjusting artificial neural network
CN107368885A (en) * 2017-07-13 2017-11-21 北京智芯原动科技有限公司 Network model compression method and device based on more granularity beta prunings
CN107463965A (en) * 2017-08-16 2017-12-12 湖州易有科技有限公司 Fabric attribute picture collection and recognition methods and identifying system based on deep learning
CN107590449A (en) * 2017-08-31 2018-01-16 电子科技大学 A kind of gesture detecting method based on weighted feature spectrum fusion
CN107590432A (en) * 2017-07-27 2018-01-16 北京联合大学 A kind of gesture identification method based on circulating three-dimensional convolutional neural networks
CN107688850A (en) * 2017-08-08 2018-02-13 北京深鉴科技有限公司 A kind of deep neural network compression method
CN107729854A (en) * 2017-10-25 2018-02-23 南京阿凡达机器人科技有限公司 A kind of gesture identification method of robot, system and robot

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN L C et al.: "Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs", Computer Science *
YANG Hongling et al.: "Gesture Recognition Based on Convolutional Neural Networks", Computer Technology and Development *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447034A (en) * 2018-11-14 2019-03-08 北京信息科技大学 Traffic mark detection method in automatic Pilot based on YOLOv3 network
CN113167495A (en) * 2018-12-12 2021-07-23 三菱电机株式会社 Air conditioner control device and air conditioner control method
CN109885677A (en) * 2018-12-26 2019-06-14 中译语通科技股份有限公司 A kind of multi-faceted big data acquisition clearing system and method
CN109828578B (en) * 2019-02-22 2020-06-16 南京天创电子技术有限公司 Instrument inspection robot optimal route planning method based on YOLOv3
CN109828578A (en) * 2019-02-22 2019-05-31 南京天创电子技术有限公司 A kind of instrument crusing robot optimal route planing method based on YOLOv3
CN110032925A (en) * 2019-02-22 2019-07-19 广西师范大学 A kind of images of gestures segmentation and recognition methods based on improvement capsule network and algorithm
CN109978069A (en) * 2019-04-02 2019-07-05 南京大学 The method for reducing ResNeXt model over-fitting in picture classification
CN109978069B (en) * 2019-04-02 2020-10-09 南京大学 Method for reducing overfitting phenomenon of ResNeXt model in image classification
CN110096968A (en) * 2019-04-10 2019-08-06 西安电子科技大学 A kind of ultrahigh speed static gesture identification method based on depth model optimization
CN110096968B (en) * 2019-04-10 2023-02-07 西安电子科技大学 Ultra-high-speed static gesture recognition method based on depth model optimization
CN110033453A (en) * 2019-04-18 2019-07-19 国网山西省电力公司电力科学研究院 Based on the power transmission and transformation line insulator Aerial Images fault detection method for improving YOLOv3
CN110033453B (en) * 2019-04-18 2023-02-24 国网山西省电力公司电力科学研究院 Power transmission and transformation line insulator aerial image fault detection method based on improved YOLOv3
CN110135398A (en) * 2019-05-28 2019-08-16 厦门瑞为信息技术有限公司 Both hands off-direction disk detection method based on computer vision
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN113191243A (en) * 2021-04-25 2021-07-30 华中科技大学 Human hand three-dimensional attitude estimation model establishment method based on camera distance and application thereof

Also Published As

Publication number Publication date
CN108629288B (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN108629288A (en) A kind of gesture identification model training method, gesture identification method and system
CN110781838B (en) Multi-mode track prediction method for pedestrians in complex scene
CN109145939B (en) Semantic segmentation method for small-target sensitive dual-channel convolutional neural network
CN109902677A (en) A kind of vehicle checking method based on deep learning
CN108549893A (en) A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN110263833A (en) Based on coding-decoding structure image, semantic dividing method
CN107818302A (en) Non-rigid multiple dimensioned object detecting method based on convolutional neural networks
CN110532859A (en) Remote Sensing Target detection method based on depth evolution beta pruning convolution net
CN106127204A (en) A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks
CN110188720A (en) A kind of object detection method and system based on convolutional neural networks
CN108564097A (en) A kind of multiscale target detection method based on depth convolutional neural networks
CN106372597B (en) CNN Vehicle Detection method based on adaptive contextual information
CN107229904A (en) A kind of object detection and recognition method based on deep learning
CN107529650A (en) The structure and closed loop detection method of network model, related device and computer equipment
CN107423398A (en) Exchange method, device, storage medium and computer equipment
CN110472542A (en) A kind of infrared image pedestrian detection method and detection system based on deep learning
CN114842208A (en) Power grid harmful bird species target detection method based on deep learning
CN109145836A (en) Ship target video detection method based on deep learning network and Kalman filtering
CN109948707A (en) Model training method, device, terminal and storage medium
CN114360005B (en) Micro-expression classification method based on AU region and multi-level transducer fusion module
CN110598586A (en) Target detection method and system
CN110210462A (en) A kind of bionical hippocampus cognitive map construction method based on convolutional neural networks
CN110163069A (en) Method for detecting lane lines for assisting driving
Yang et al. Fruit target detection based on BCo-YOLOv5 model
CN109753853A (en) One kind being completed at the same time pedestrian detection and pedestrian knows method for distinguishing again

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200519