CN108629288A

CN108629288A - Gesture recognition model training method, gesture recognition method, and system

Info

Publication number: CN108629288A
Application number: CN201810314455.9A
Authority: CN
Inventors: 桑农; 倪子涵; 陈佳; 高常鑫
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2018-04-09
Filing date: 2018-04-09
Publication date: 2018-10-09
Anticipated expiration: 2038-04-09
Also published as: CN108629288B

Abstract

The invention discloses a gesture recognition model training method, a gesture recognition method, and a system. The training method comprises: collecting gesture picture samples in multiple scenes and randomly cropping them to obtain new gesture samples, the gesture picture samples and the new gesture samples together forming a sample set; building a Light YOLO network and training it with the sample set to obtain a first Light YOLO network; adding a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network, training the second Light YOLO network with the sample set until convergence, and then pruning channels to obtain the gesture recognition model. The invention improves the network's detection performance on low-resolution gestures, so that the gesture recognition method is both accurate and fast enough for real-time use. The system obtains recognition results directly from pictures and can be optimized end to end.

Description

Gesture recognition model training method, gesture recognition method, and system

Technical field

The invention belongs to the technical field of computer vision, and more particularly relates to a gesture recognition model training method, a gesture recognition method, and a system.

Background art

Gestures are among the most natural forms of body language, and applying them to human-computer interaction makes the interaction more natural; recognition of the human hand is therefore a current research focus in that field. Researchers in China and abroad have studied vision-based gesture recognition extensively.

A traditional gesture recognition system generally first segments the gesture to obtain the gesture region, then extracts gesture features, and finally classifies the gesture using those features. Traditional methods require hand-designed features, such as color features and HOG features, which generalize poorly, so different features must be designed for different tasks. Artificial neural networks are robust to interference and noise, self-organizing, and strong at self-learning, and are therefore increasingly used for gesture classification. With the development of neural-network-based object detection networks, gesture recognition based on object detection networks has begun to develop. However, although neural-network-based gesture recognition is more accurate than traditional machine learning methods, it also suffers from heavy computation, complex models, and poor real-time performance.

It can be seen that existing gesture recognition methods suffer from the technical problem of low accuracy and poor real-time performance in complex scenes.

Summary of the invention

In view of the above defects or improvement needs of the prior art, the present invention provides a gesture recognition model training method, a gesture recognition method, and a system, thereby solving the technical problem that existing gesture recognition methods have low accuracy and poor real-time performance in complex scenes.

To achieve the above object, according to one aspect of the present invention, a gesture recognition model training method is provided, comprising:

(1) collecting gesture picture samples in multiple scenes, labeling the gesture position and gesture class in each gesture picture sample, then randomly cropping the gesture samples to obtain new gesture samples, and taking the gesture picture samples and the new gesture samples as a sample set;

(2) starting from the YOLOv2 object detection network, removing its last max pooling layer and its sixth group of convolutional layers, halving the channel counts of convolutional layers 14, 15, and 17 of the YOLOv2 object detection network, and using a convolutional layer with the same channel count as convolutional layer 15 to downsample and encode the feature map output by convolutional layer 8, thereby obtaining a Light YOLO network; training the Light YOLO network with the sample set to obtain a first Light YOLO network;

(3) inputting the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; assessing the importance of the feature maps using a first-order Taylor expansion and selecting the A feature maps of lowest importance as the feature maps to be pruned; adding a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network; training the second Light YOLO network with the sample set until it converges; pruning the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network; and training the third Light YOLO network with the sample set to obtain the gesture recognition model.

Further, step (1) comprises:

collecting gesture picture samples in multiple scenes and labeling the gesture position and gesture class in each gesture picture sample to obtain a gesture database; dividing the gesture picture samples in the gesture database into a training set and a test set; randomly cropping the gesture picture samples in the training set to obtain new gesture samples; and taking the gesture picture samples in the training set and the new gesture samples as the sample set.

Further, step (2) comprises:

(2-1) first removing the last max pooling layer and the sixth group of convolutional layers of the YOLOv2 object detection network and halving the channel counts of its convolutional layers 14, 15, and 17; then using a convolutional layer with the same channel count as convolutional layer 15 to downsample and encode the feature map output by convolutional layer 8, and concatenating the downsampled feature map with the feature map output by convolutional layer 15, thereby obtaining the Light YOLO network;

(2-2) training the YOLOv2 object detection network on the ImageNet database and using the resulting YOLOv2 network parameters as the initial network parameters of the Light YOLO network; inputting the sample set into the Light YOLO network and training it with stochastic gradient descent to obtain an initial Light YOLO network;

(2-3) testing the initial Light YOLO network with the test set, taking the candidate box with the highest confidence output by the initial Light YOLO network as the predicted gesture box; if the overlap ratio between the predicted gesture box and the ground-truth gesture box is greater than 0.6, the recognition is considered correct, and otherwise incorrect; when the recognition accuracy is greater than or equal to a recognition threshold, the first Light YOLO network and its network parameters are obtained.
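The correctness test in step (2-3) can be illustrated with a minimal Python sketch. It assumes that the overlap ratio is the usual intersection over union (IoU) and that boxes are given as (x_min, y_min, x_max, y_max); the function names and the candidate format are hypothetical and not taken from the patent.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes.
    Assumption: the patent's 'overlap ratio' is taken to mean IoU."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_recognized(candidates, true_box, threshold=0.6):
    """candidates: list of (confidence, box) pairs output for one image.
    The box with the highest confidence is the predicted gesture box;
    it counts as correct when its overlap with the true box exceeds 0.6."""
    _, predicted_box = max(candidates, key=lambda item: item[0])
    return iou(predicted_box, true_box) > threshold


def accuracy(results):
    """Fraction of test images judged correct (results: list of booleans)."""
    return sum(results) / len(results)
```

The recognition accuracy over the test set is then compared with the recognition threshold, whose value the text does not state.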

Further, step (3) comprises:

(3-1) inputting the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; obtaining the activations of the feature maps through a forward pass of the first Light YOLO network, then obtaining the derivative of the objective function with respect to each feature map through backpropagation; multiplying the activation of each feature map by its corresponding derivative to obtain the Taylor expansion values of all feature maps, and selecting the A feature maps with the smallest Taylor expansion values as the feature maps to be pruned;

(3-2) adding a selective-dropout layer after each convolutional layer of the first Light YOLO network, the selective-dropout layer performing the dropout operation only on the feature maps to be pruned, thereby obtaining the second Light YOLO network;

(3-3) training the second Light YOLO network with the sample set until it converges, pruning the convolution kernels that generate the feature maps to be pruned in the convolutional layers of the converged second Light YOLO network, then removing the selective-dropout layers to obtain the third Light YOLO network, and training the pruned third Light YOLO network with the sample set to restore network performance;

(3-4) if the number of pruning rounds is less than B, inputting the sample set into the third Light YOLO network whose performance was restored in step (3-3) and then executing step (3-1) again; otherwise, completing the pruning and training the pruned Light YOLO network until its performance is restored, thereby obtaining the gesture recognition model.

According to another aspect of the present invention, a gesture recognition model is provided, the gesture recognition model being trained by the gesture recognition model training method of the present invention.

According to another aspect of the present invention, a gesture recognition method is provided, comprising:

performing gesture recognition on an image to be recognized using the gesture recognition model trained by the gesture recognition model training method of the present invention, to obtain the gesture position and gesture class in the image to be recognized.

According to another aspect of the present invention, a gesture recognition system is provided, comprising:

a sample collection module, configured to collect gesture picture samples in multiple scenes, label the gesture position and gesture class in each gesture picture sample, then randomly crop the gesture samples to obtain new gesture samples, and take the gesture picture samples and the new gesture samples as a sample set;

a network training module, configured to start from the YOLOv2 object detection network, remove its last max pooling layer and its sixth group of convolutional layers, halve the channel counts of convolutional layers 14, 15, and 17 of the YOLOv2 object detection network, use a convolutional layer with the same channel count as convolutional layer 15 to downsample and encode the output feature map of convolutional layer 8, and concatenate the downsampled feature map with the output feature map of convolutional layer 15, thereby obtaining a Light YOLO network, and to train the Light YOLO network with the sample set to obtain a first Light YOLO network;

a network pruning module, configured to input the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps, assess the importance of the feature maps using a first-order Taylor expansion, select the A feature maps of lowest importance as the feature maps to be pruned, add a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network, train the second Light YOLO network with the sample set until it converges, prune the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network, and train the third Light YOLO network with the sample set to obtain the gesture recognition model;

a gesture recognition module, configured to perform gesture recognition on an image to be recognized using the gesture recognition model, to obtain the gesture position and gesture class in the image to be recognized.

In general, compared with the prior art, the above technical solutions conceived by the present invention can achieve the following beneficial effects:

(1) When building the Light YOLO network, in order to increase the semantic information about gestures in the top-level feature map of the network, the invention removes the last max pooling layer and the sixth group of convolutional layers to reduce the network stride, and halves the channel counts of convolutional layers 14, 15, and 17 to prevent overfitting. In addition, a top-level feature map with richer semantic information is built by fusing high-level and low-level features. Downsampling with a convolutional layer retains more spatial information and allows encoding to a specified number of channels. When optimizing the Light YOLO network, in order to prune more feature maps in each iteration without affecting network performance, a selective-dropout layer is added after each convolutional layer of Light YOLO during training; this layer performs the dropout operation only on the feature maps to be pruned, which reduces the network's dependence on them. These improvements solve the technical problem that existing gesture recognition methods have low accuracy and poor real-time performance in complex scenes, and improve the network's detection performance on low-resolution gestures, so that the gesture recognition method of the invention is both accurate and fast enough for real-time use. The system of the invention also obtains recognition results directly from pictures and can be optimized end to end.

(2) The gesture recognition model trained by the invention is pruned from 55 MB to 4 MB, and its forward inference speed increases from 28 FPS to 125 FPS. This shows that the trained gesture recognition model can perform real-time gesture recognition; with the network model compressed to 4 MB and its computation reduced, it is easy to port to embedded platforms.

Description of the drawings

Fig. 1 is a flow chart of a gesture recognition model training method provided by an embodiment of the present invention;

Fig. 2 is a structure diagram of Light YOLO provided by an embodiment of the present invention;

Fig. 3 is a flow chart of the selective-dropout network pruning algorithm provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with each other as long as they do not conflict.

As shown in Fig. 1, a gesture recognition model training method comprises:

(1) Gesture picture samples are collected in multiple scenes, mainly including simple backgrounds, complex backgrounds, skin-colored backgrounds, a hand passing in front of the face, and the presence of other non-predefined gestures; the subjects are about 2 to 3 meters from the camera. The gesture position and gesture class in each gesture picture sample are labeled to obtain a gesture database containing 5738 samples, and the gesture picture samples in the database are divided into a training set and a test set at a ratio of 1:1. The gesture picture samples in the training set are randomly cropped to obtain new gesture samples until each class has 350 training samples, and the gesture picture samples in the training set together with the new gesture samples are taken as the sample set.
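The random-cropping augmentation in step (1) can be sketched in Python as follows. The text does not say how the crop window is chosen, so this sketch assumes that each crop must fully contain the labeled gesture box and cover at least a fraction `min_scale` of the image; the function names, the sample format (width, height, box), and `min_scale` are all hypothetical.

```python
import random


def random_crop_keep_box(img_w, img_h, box, rng, min_scale=0.6):
    """Pick a random crop window that fully contains the gesture box.
    box and the returned crop are (x_min, y_min, x_max, y_max) in pixels.
    Returns (crop, box expressed in crop coordinates)."""
    bx0, by0, bx1, by1 = box
    # The crop must be at least as large as the box (assumption, see lead-in).
    crop_w = rng.uniform(max(min_scale * img_w, bx1 - bx0), img_w)
    crop_h = rng.uniform(max(min_scale * img_h, by1 - by0), img_h)
    # Left/top edge: inside the image and positioned so the box stays inside.
    x0 = rng.uniform(max(0.0, bx1 - crop_w), min(bx0, img_w - crop_w))
    y0 = rng.uniform(max(0.0, by1 - crop_h), min(by0, img_h - crop_h))
    new_box = (bx0 - x0, by0 - y0, bx1 - x0, by1 - y0)
    return (x0, y0, x0 + crop_w, y0 + crop_h), new_box


def augment_class(samples, target, rng):
    """Add cropped copies of original samples until the class has `target`
    samples. Each sample is (width, height, box); pixels are omitted here."""
    augmented = list(samples)
    while len(augmented) < target:
        img_w, img_h, box = rng.choice(samples)
        crop, new_box = random_crop_keep_box(img_w, img_h, box, rng)
        augmented.append((crop[2] - crop[0], crop[3] - crop[1], new_box))
    return augmented
```

With `target=350`, `augment_class` mirrors the stopping rule in the text: cropping continues until each class has 350 training samples.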

(2) Starting from the YOLOv2 object detection network, its last max pooling layer and its sixth group of convolutional layers are removed, the channel counts of convolutional layers 14, 15, and 17 of the YOLOv2 object detection network are halved, and a convolutional layer with the same channel count as convolutional layer 15 is used to downsample and encode the feature map output by convolutional layer 8, giving the Light YOLO network; the Light YOLO network is trained with the sample set to obtain the first Light YOLO network. Specifically:

(2-1) First, the last max pooling layer and the sixth group of convolutional layers are removed, and the channel counts of convolutional layers 14, 15, and 17 of the YOLOv2 object detection network are halved, so that convolutional layers 14 and 15 have 512 channels. Then a convolutional layer with a 1 × 1 kernel, a stride of 2, and the same channel count as convolutional layer 15 is used to downsample and encode the feature map output by convolutional layer 8, and the downsampled feature map is concatenated with the feature map output by convolutional layer 15, giving the Light YOLO network. As shown in Fig. 2, the Light YOLO network consists of 18 convolutional layers and 4 max pooling layers in total, and each of the first 17 convolutional layers is followed by a batch normalization layer and a Leaky ReLU layer. The parameters of each layer are as follows:

Layer         Kernel / window   Stride   Input channels   Output channels
Conv1         3 × 3             -        3                32
Maxpooling1   2 × 2             2 × 2    -                -
Conv2         3 × 3             -        32               64
Maxpooling2   2 × 2             2 × 2    -                -
Conv3         3 × 3             -        64               128
Conv4         1 × 1             -        128              64
Conv5         3 × 3             -        64               128
Maxpooling3   2 × 2             2 × 2    -                -
Conv6         3 × 3             -        128              256
Conv7         1 × 1             -        256              128
Conv8         3 × 3             -        128              256
Maxpooling4   2 × 2             2 × 2    -                -
Conv9         3 × 3             -        256              512
Conv10        1 × 1             -        512              256
Conv11        3 × 3             -        256              512
Conv12        1 × 1             -        512              256
Conv13        3 × 3             -        256              512
Conv14        3 × 3             -        512              512
Conv15        3 × 3             -        512              512
Conv16        1 × 1             2        256              512
Conv17        3 × 3             -        1024             512
Conv18        3 × 3             -        512              75

A dash marks a value that is not specified for that layer.
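The layer parameters above can be cross-checked with a short Python sketch that walks the main path, tracks channel counts and cumulative downsampling, and confirms that the Conv16 branch (taken from the Conv8 output) lines up with the Conv15 output so that their concatenation has the 1024 channels Conv17 expects. The data layout and function name are hypothetical; the sketch assumes that every convolution other than Conv16 has stride 1 and that each pooling layer halves the resolution.

```python
# Main path from the layer list: (name, kernel, in_channels, out_channels);
# the string "pool" stands for a 2 x 2 max pooling layer with stride 2.
MAIN_PATH = [
    ("Conv1", 3, 3, 32), "pool",
    ("Conv2", 3, 32, 64), "pool",
    ("Conv3", 3, 64, 128), ("Conv4", 1, 128, 64), ("Conv5", 3, 64, 128), "pool",
    ("Conv6", 3, 128, 256), ("Conv7", 1, 256, 128), ("Conv8", 3, 128, 256), "pool",
    ("Conv9", 3, 256, 512), ("Conv10", 1, 512, 256), ("Conv11", 3, 256, 512),
    ("Conv12", 1, 512, 256), ("Conv13", 3, 256, 512),
    ("Conv14", 3, 512, 512), ("Conv15", 3, 512, 512),
]


def trace_main_path(path, in_channels=3):
    """Return {layer name: (output channels, cumulative stride)} and raise
    if a layer's declared input channels do not match the previous output."""
    channels, stride, outputs = in_channels, 1, {}
    for item in path:
        if item == "pool":
            stride *= 2
            continue
        name, _kernel, in_ch, out_ch = item
        if in_ch != channels:
            raise ValueError(f"{name}: expected {channels} input channels, got {in_ch}")
        channels = out_ch
        outputs[name] = (channels, stride)
    return outputs


outputs = trace_main_path(MAIN_PATH)
conv8_channels, conv8_stride = outputs["Conv8"]
conv15_channels, conv15_stride = outputs["Conv15"]

# Conv16: 1 x 1 kernel, stride 2, 256 -> 512, applied to the Conv8 output.
conv16_channels, conv16_stride = 512, conv8_stride * 2

# Concatenation is only possible when both branches share one resolution.
branches_align = conv16_stride == conv15_stride
concat_channels = conv15_channels + conv16_channels  # input to Conv17
```

Under these assumptions both branches end at a cumulative stride of 16, and the concatenated feature map has 512 + 512 = 1024 channels, matching the Conv17 input.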

(2-2) The YOLOv2 object detection network is trained on the ImageNet database, and the resulting YOLOv2 network parameters are used as the initial network parameters of the Light YOLO network. The sample set is input into the Light YOLO network, which is trained with stochastic gradient descent to obtain an initial Light YOLO network. The objective function used to train the Light YOLO network consists of three parts: coordinate error, confidence error, and classification error.

In the objective function, λ_obj and λ_noobj are the coefficients of the confidence error for target candidate boxes and non-target candidate boxes, respectively; x_i, y_i, w_i, h_i, and C_i are the top-left coordinates, width, height, and confidence of candidate box i, each compared with the corresponding top-left coordinates, width, height, and confidence of the labeled box; p_i(c) is the probability predicted by the network that candidate box i belongs to class c, compared with the true probability that the candidate box belongs to class c. Two indicator terms mark, respectively, that candidate box i contains a target and that candidate box i does not contain a target.
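The three-part objective described above can be written out as a rough sketch. The form below is only an assumption modeled on the sum-of-squares loss commonly used by YOLO-style detectors: the hatted symbols for the labeled box, the indicator notation, the unweighted coordinate and classification terms, and the use of squared errors throughout are not taken from the patent text.

```latex
\begin{aligned}
L ={} & \sum_{i} \mathbb{1}_{i}^{obj}
        \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2
             + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right] \\
      & + \lambda_{obj} \sum_{i} \mathbb{1}_{i}^{obj} \, (C_i - \hat{C}_i)^2
        + \lambda_{noobj} \sum_{i} \mathbb{1}_{i}^{noobj} \, (C_i - \hat{C}_i)^2 \\
      & + \sum_{i} \mathbb{1}_{i}^{obj} \sum_{c} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

Here the first line is the coordinate error, the second line the confidence error weighted separately for target and non-target candidate boxes, and the third line the classification error.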

The learning rate schedule is as follows: the learning rate is first lowered to 10^-5 for the first 100 iterations as warm-up training, then restored to 10^-4, and reduced to 5 × 10^-5 and 10^-5 at epoch 20 and epoch 150, respectively.
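The schedule can be expressed as a small Python function. This is a sketch under stated assumptions: the warm-up is counted in parameter-update iterations, the later steps are counted in epochs, both counters start at zero, and the function name is hypothetical.

```python
def learning_rate(iteration, epoch):
    """Piecewise-constant learning rate.
    iteration: global count of parameter updates so far (0-based, assumed).
    epoch: index of the current pass over the sample set (0-based, assumed)."""
    if iteration < 100:   # warm-up training
        return 1e-5
    if epoch < 20:        # base learning rate after warm-up
        return 1e-4
    if epoch < 150:       # first reduction at epoch 20
        return 5e-5
    return 1e-5           # second reduction at epoch 150
```

For example, `learning_rate(50, 0)` falls in the warm-up phase, while `learning_rate(5000, 30)` falls in the phase after the first reduction.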

(3) The sample set is input into the first Light YOLO network, each convolutional layer of which outputs feature maps. The importance of the feature maps is assessed using a first-order Taylor expansion, and the A feature maps of lowest importance are selected as the feature maps to be pruned. A selective-dropout layer is added after each convolutional layer of the first Light YOLO network to obtain the second Light YOLO network, which is trained with the sample set until it converges. The convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network are pruned to obtain the third Light YOLO network, which is trained with the sample set to obtain the gesture recognition model. As shown in Fig. 3, this specifically comprises:

The pruning process is treated as an optimization problem whose goal is to find the pruned network parameters that minimize the change in the loss function before and after pruning: |ΔL(f_i)| = |L(D | W') - L(D | W)|, where D is the sample set and W and W' are the parameters of the Light YOLO network before and after pruning, respectively. The parameters of a convolution kernel and the feature map computed from them are considered to affect the loss function equivalently, so for convenience we write L(D, f_i) = L(D | w_i). Pruning any feature map f_i then changes the loss function by:

\[ |\Delta L(f_i)| = |L(D, f_i = 0) - L(D, f_i)| \]

where L(D, f_i = 0) is the value of the loss function after feature map f_i has been pruned, which can be regarded as the Taylor expansion of L(D, f_i) at f_i = 0. The above expression is expanded with the first-order Taylor formula; because higher-order terms would require a large amount of computation, only the first-order expansion is used and the first-order remainder is neglected, which finally gives:

\[ |\Delta L(f_i)| = \left| L(D, f_i) - \frac{\partial L}{\partial f_i} f_i - L(D, f_i) \right| = \left| \frac{\partial L}{\partial f_i} f_i \right| \]

where ∂L/∂f_i is the derivative of the objective function with respect to the feature map.
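The first-order Taylor criterion can be sketched in plain Python. The sketch assumes that each feature map's activations and gradients are available as flat lists for one sample, and that the element-wise products are averaged over the feature map before taking the absolute value; that normalization and both function names are assumptions, since the text only states that the activation is multiplied by the corresponding derivative.

```python
def taylor_scores(activations, gradients):
    """activations[k] and gradients[k] are flat lists holding the values of
    feature map k and the derivative of the objective with respect to them.
    Returns one importance score |dL/df * f| per feature map."""
    scores = []
    for act, grad in zip(activations, gradients):
        total = sum(a * g for a, g in zip(act, grad))
        # Averaging over the map size is an assumed normalization.
        scores.append(abs(total) / len(act))
    return scores


def select_lowest(scores, count):
    """Indices of the `count` feature maps with the smallest scores."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    return sorted(order[:count])
```

The indices returned by `select_lowest(scores, A)` play the role of the feature maps to be pruned; in practice the scores would typically be accumulated over many samples before ranking.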

(3-2) A selective-dropout layer is added after each convolutional layer of the first Light YOLO network; the selective-dropout layer performs the dropout operation only on the feature maps to be pruned, and the second Light YOLO network is thereby obtained. For convolutional layer l, let the original number of feature maps be K and the number of effective feature maps after selective-dropout be C. To keep the magnitude of the input to the next layer's neurons unchanged, the feature maps are divided by C/K, as shown below:

\[ \hat{a}_l^k = \frac{a_l^k}{C / K} \]

where a_l^k is the activation of the k-th feature map of convolutional layer l.
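A minimal Python sketch of the selective-dropout forward pass at training time follows. It assumes that each pruning candidate is dropped independently with probability `drop_prob` (the text gives no drop probability), that dropped maps are zeroed, and that the surviving maps are divided by C/K; the function name and the list-of-lists representation are hypothetical.

```python
import random


def selective_dropout(feature_maps, prune_candidates, drop_prob, rng):
    """feature_maps: list of K feature maps, each a flat list of activations.
    Only maps whose index is in prune_candidates may be dropped; all other
    maps always pass through. Returns (rescaled maps, set of dropped indices)."""
    total = len(feature_maps)                                  # K
    dropped = {k for k in prune_candidates if rng.random() < drop_prob}
    effective = total - len(dropped)                           # C
    if effective == 0:
        return [[0.0] * len(fmap) for fmap in feature_maps], dropped
    scale = total / effective                                  # divide by C/K
    output = []
    for k, fmap in enumerate(feature_maps):
        if k in dropped:
            output.append([0.0] * len(fmap))
        else:
            output.append([value * scale for value in fmap])
    return output, dropped
```

Restricting dropout to the pruning candidates lets the network learn to do without those maps before their kernels are removed, while the K/C rescaling keeps the input magnitude of the next layer roughly constant.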

(3-3) The second Light YOLO network is trained with the sample set for 10 rounds at a learning rate of 10^-5 so that it converges. The convolution kernels that generate the feature maps to be pruned in the convolutional layers of the converged second Light YOLO network are pruned, and the selective-dropout layers are then removed to obtain the third Light YOLO network. The pruned third Light YOLO network is trained with the sample set for 10 rounds at a learning rate of 10^-5 to restore network performance.

(3-4) If the number of pruning rounds is less than 20, the sample set is input into the third Light YOLO network whose performance was restored in step (3-3), and step (3-1) is executed again; otherwise, pruning is complete, and the pruned Light YOLO network is trained for 20 rounds to restore its performance, giving the gesture recognition model.
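The control flow of steps (3-1) to (3-4) can be sketched as a Python loop. The `steps` object and all of its method names are hypothetical placeholders for the real ranking, training, and pruning operations; `maps_per_round` stands for A, whose value the embodiment does not give, and the learning rate of the final 20 rounds is an assumption. `ToySteps` is a stand-in that only records the order of operations on a toy "network" (a dict mapping filter id to importance).

```python
def prune_iteratively(network, steps, num_rounds=20, maps_per_round=2):
    """Repeat rank -> selective-dropout training -> prune -> recovery training
    for num_rounds rounds, then run a final recovery training."""
    for _ in range(num_rounds):
        candidates = steps.rank(network, maps_per_round)             # (3-1)
        network = steps.add_selective_dropout(network, candidates)   # (3-2)
        network = steps.train(network, rounds=10, lr=1e-5)           # (3-3) converge
        network = steps.prune(network, candidates)                   # (3-3) prune
        network = steps.train(network, rounds=10, lr=1e-5)           # (3-3) recover
    # (3-4) final recovery; the learning rate here is an assumption.
    return steps.train(network, rounds=20, lr=1e-5)


class ToySteps:
    """Toy operations: the 'network' is a dict {filter id: importance}."""

    def __init__(self):
        self.log = []

    def rank(self, network, count):
        self.log.append("rank")
        return sorted(network, key=network.get)[:count]

    def add_selective_dropout(self, network, candidates):
        self.log.append("add_selective_dropout")
        return network

    def train(self, network, rounds, lr):
        self.log.append(f"train_{rounds}")
        return network

    def prune(self, network, candidates):
        # Removing the kernels also stands for removing the dropout layers.
        self.log.append("prune")
        return {k: v for k, v in network.items() if k not in candidates}
```

Running `prune_iteratively({i: float(i) for i in range(100)}, ToySteps())` removes the two least important toy filters in each of the 20 rounds.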

The gesture recognition model trained by the present invention is pruned from 55 MB to 4 MB, and its forward inference speed increases from 28 FPS to 125 FPS.
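For a sense of scale, the reported figures can be turned into ratios with a few lines of Python. The variable names are illustrative only, and the per-frame times are simply the reciprocals of the stated frame rates.

```python
size_before_mb, size_after_mb = 55, 4
fps_before, fps_after = 28, 125

compression_ratio = size_before_mb / size_after_mb   # model size reduction factor
speedup = fps_after / fps_before                     # inference speed-up factor
frame_time_before_ms = 1000 / fps_before             # time per frame before pruning
frame_time_after_ms = 1000 / fps_after               # time per frame after pruning

print(f"compression: {compression_ratio:.2f}x, speed-up: {speedup:.2f}x")
print(f"per-frame time: {frame_time_before_ms:.1f} ms -> {frame_time_after_ms:.1f} ms")
```

The model is therefore about 13.75 times smaller and roughly 4.5 times faster, with the per-frame time dropping from about 35.7 ms to 8 ms.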

A gesture recognition system, comprising:

Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A gesture recognition model training method, characterized by comprising:

(1) collecting gesture picture samples in multiple scenes, labeling the gesture position and gesture class in each gesture picture sample, then randomly cropping the gesture samples to obtain new gesture samples, and taking the gesture picture samples and the new gesture samples as a sample set;

(2) starting from the YOLOv2 object detection network, removing its last max pooling layer and its sixth group of convolutional layers, halving the channel counts of convolutional layers 14, 15, and 17 of the YOLOv2 object detection network, and using a convolutional layer with the same channel count as convolutional layer 15 to downsample and encode the feature map output by convolutional layer 8, thereby obtaining a Light YOLO network; training the Light YOLO network with the sample set to obtain a first Light YOLO network;

(3) inputting the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; assessing the importance of the feature maps using a first-order Taylor expansion and selecting the A feature maps of lowest importance as the feature maps to be pruned; adding a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network; training the second Light YOLO network with the sample set until it converges; pruning the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network; and training the third Light YOLO network with the sample set to obtain the gesture recognition model.

2. The gesture recognition model training method according to claim 1, characterized in that step (1) comprises:

collecting gesture picture samples in multiple scenes and labeling the gesture position and gesture class in each gesture picture sample to obtain a gesture database; dividing the gesture picture samples in the gesture database into a training set and a test set; randomly cropping the gesture picture samples in the training set to obtain new gesture samples; and taking the gesture picture samples in the training set and the new gesture samples as the sample set.

3. The gesture recognition model training method according to claim 2, characterized in that step (2) comprises:

(2-1) first removing the last max pooling layer and the sixth group of convolutional layers of the YOLOv2 object detection network and halving the channel counts of its convolutional layers 14, 15, and 17; then using a convolutional layer with the same channel count as convolutional layer 15 to downsample and encode the feature map output by convolutional layer 8, and concatenating the downsampled feature map with the feature map output by convolutional layer 15, thereby obtaining the Light YOLO network;

(2-2) training the YOLOv2 object detection network on the ImageNet database and using the resulting YOLOv2 network parameters as the initial network parameters of the Light YOLO network; inputting the sample set into the Light YOLO network and training it with stochastic gradient descent to obtain an initial Light YOLO network;

(2-3) testing the initial Light YOLO network with the test set, taking the candidate box with the highest confidence output by the initial Light YOLO network as the predicted gesture box; if the overlap ratio between the predicted gesture box and the ground-truth gesture box is greater than 0.6, the recognition is considered correct, and otherwise incorrect; when the recognition accuracy is greater than or equal to a recognition threshold, the first Light YOLO network and its network parameters are obtained.

4. The gesture recognition model training method according to claim 1 or 2, characterized in that step (3) comprises:

(3-1) inputting the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps; obtaining the activations of the feature maps through a forward pass of the first Light YOLO network, then obtaining the derivative of the objective function with respect to each feature map through backpropagation; multiplying the activation of each feature map by its corresponding derivative to obtain the Taylor expansion values of all feature maps, and selecting the A feature maps with the smallest Taylor expansion values as the feature maps to be pruned;

(3-2) adding a selective-dropout layer after each convolutional layer of the first Light YOLO network, the selective-dropout layer performing the dropout operation only on the feature maps to be pruned, thereby obtaining the second Light YOLO network;

(3-3) training the second Light YOLO network with the sample set until it converges, pruning the convolution kernels that generate the feature maps to be pruned in the convolutional layers of the converged second Light YOLO network, then removing the selective-dropout layers to obtain the third Light YOLO network, and training the pruned third Light YOLO network with the sample set to restore network performance;

(3-4) if the number of pruning rounds is less than B, inputting the sample set into the third Light YOLO network whose performance was restored in step (3-3) and then executing step (3-1) again; otherwise, completing the pruning and training the pruned Light YOLO network until its performance is restored, thereby obtaining the gesture recognition model.

5. A gesture recognition model, characterized in that the gesture recognition model is trained by the gesture recognition model training method according to any one of claims 1 to 4.

6. A gesture recognition method, characterized by comprising:

performing gesture recognition on an image to be recognized using a gesture recognition model trained by the gesture recognition model training method according to any one of claims 1 to 4, to obtain the gesture position and gesture class in the image to be recognized.

7. A gesture recognition system, characterized by comprising:

a sample collection module, configured to collect gesture picture samples in multiple scenes, label the gesture position and gesture class in each gesture picture sample, then randomly crop the gesture samples to obtain new gesture samples, and take the gesture picture samples and the new gesture samples as a sample set;

a network training module, configured to start from the YOLOv2 object detection network, remove its last max pooling layer and its sixth group of convolutional layers, halve the channel counts of convolutional layers 14, 15, and 17 of the YOLOv2 object detection network, use a convolutional layer with the same channel count as convolutional layer 15 to downsample and encode the output feature map of convolutional layer 8, and concatenate the downsampled feature map with the output feature map of convolutional layer 15, thereby obtaining a Light YOLO network, and to train the Light YOLO network with the sample set to obtain a first Light YOLO network;

a network pruning module, configured to input the sample set into the first Light YOLO network, each convolutional layer of which outputs feature maps, assess the importance of the feature maps using a first-order Taylor expansion, select the A feature maps of lowest importance as the feature maps to be pruned, add a selective-dropout layer after each convolutional layer of the first Light YOLO network to obtain a second Light YOLO network, train the second Light YOLO network with the sample set until it converges, prune the convolution kernels corresponding to the feature maps to be pruned in the converged second Light YOLO network to obtain a third Light YOLO network, and train the third Light YOLO network with the sample set to obtain the gesture recognition model;

a gesture recognition module, configured to perform gesture recognition on an image to be recognized using the gesture recognition model, to obtain the gesture position and gesture class in the image to be recognized.