CN108614995A - Gesture dataset acquisition method, gesture recognition method and device for YOLO networks - Google Patents


Info

Publication number
CN108614995A
CN108614995A (application CN201810280674.XA)
Authority
CN
China
Prior art keywords
gesture, picture, training, yolo, trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810280674.XA
Other languages
Chinese (zh)
Inventor
陈虎
谷也
盛卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intelligent Robot Research Institute
Original Assignee
Shenzhen Intelligent Robot Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intelligent Robot Research Institute
Priority to CN201810280674.XA
Publication of CN108614995A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/107: Static hand or arm
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The invention discloses a gesture dataset acquisition method, a gesture recognition method, and a device for YOLO networks. The acquisition method comprises: obtaining an image containing a training gesture; saving the image-acquisition-window picture as a training gesture image; filtering the background of the training gesture image using a simple-threshold skin-color segmentation algorithm based on the YCbCr color space to obtain training gesture images with diverse backgrounds; and composing a gesture dataset from the training gesture images with diverse backgrounds. The gesture recognition method trains a YOLO network on the resulting gesture dataset and performs gesture recognition with it. The device comprises a memory and a processor. The invention enables a YOLO network to hold a recognition model for gestures in images or video frames, and the recognition process strongly suppresses complex background noise present in the image, such as faces and skin-colored walls. The invention is applicable to the technical field of image recognition and processing.

Description

Gesture dataset acquisition method, gesture recognition method and device for YOLO networks
Technical field
The present invention relates to the technical field of image recognition and processing, and in particular to a gesture dataset acquisition method, a gesture recognition method, and a device for YOLO networks.
Background technology
Explanation of terms
YOLO (You Only Look Once) is an object detection method based on deep learning that can be used to train a neural network; a trained YOLO neural network can be used to solve the regression problems of target-region prediction and class prediction, and its advantage is that it ensures high detection speed and high accuracy at the same time.
Gesture commands are an efficient mode of human-computer interaction: a computer, mobile phone, or other intelligent terminal captures the user's gesture, analyzes it to obtain the corresponding gesture command, and performs the corresponding operation or feedback according to that command. An intelligent terminal can capture user gestures through buttons, touch screens, or cameras. Buttons and touch screens require the user to touch the machine, which degrades the user experience and severely limits the operability of gestures, so photographing the user's hand with a camera and then recognizing the gesture from the image will be the mainstream approach. The existing technology for recognizing photographed user gestures is mainly hand-gesture segmentation based on skin-color segmentation, which separates the hand from the rest of the image by analyzing the difference between the skin color of the hand and the background colors. However, this method suppresses complex background noise such as faces and skin-colored walls poorly, and in particular, for images centered on real-world scenes, traditional methods such as connected-component denoising cannot solve the noise-suppression problem well; this skin-color-based hand-gesture segmentation method therefore has serious drawbacks.
Summary of the invention
To solve the above technical problems, the first object of the present invention is to provide a gesture dataset acquisition method for YOLO networks, the second is to provide a gesture recognition method based on YOLO, and the third is to provide a gesture recognition device based on YOLO.
The first technical solution adopted by the present invention is:
A gesture dataset acquisition method for YOLO networks, comprising the following steps:
S1a. obtain an image containing a training gesture and make the training gesture fall within a sliding window; the sliding window is a window that can move within the image, and the position and size of the sliding window are adjustable;
S2a. save the image-acquisition-window picture as a training gesture image, and record the corresponding calibration-box position and size and the gesture class label; the image-acquisition-window picture is the picture inside the sliding window, the calibration-box position and size are the position and size of the sliding window when the training gesture image is saved, and the gesture class label is the gesture-type information of the training gesture;
S3a. filter the background of the training gesture image using a simple-threshold skin-color segmentation algorithm based on the YCbCr color space, to obtain training gesture images with diverse backgrounds;
S4a. compose a gesture dataset from the training gesture images with diverse backgrounds and the corresponding calibration-box positions, sizes, and gesture class labels.
Further, step S3a specifically comprises the following steps:
S31a. binarize the training gesture image using a skin-color segmentation model based on the YCbCr color space, so that each pixel of the training gesture image is classified as a skin-color point or a non-skin-color point;
S32a. determine the part composed of all skin-color points as the training gesture region, and the part composed of all non-skin-color points as the background.
Further, step S31a specifically comprises the following steps:
S311a. from the RGB value of each pixel of the training gesture image, compute the Cb and Cr values of each pixel;
S312a. classify each pixel as a skin-color point or a non-skin-color point according to its Cb and Cr values.
Further, step S312a is specifically:
judge whether the Cb and Cr values of each pixel of the training gesture image satisfy the skin-color threshold condition; if so, classify the pixel as a skin-color point; otherwise, classify it as a non-skin-color point.
The second technical solution adopted by the present invention is:
A gesture recognition method based on YOLO, comprising the following steps:
S1b. independently acquire the gesture dataset to be used for training, using the gesture dataset acquisition method of any one of claims 1-4;
S2b. train a YOLO network for gesture recognition on the gesture dataset, to obtain a gesture recognition model;
S3b. capture the image to be recognized with a camera, and filter its background using the simple-threshold skin-color segmentation algorithm based on the YCbCr color space;
S4b. perform real-time gesture class prediction and gesture localization on the image to be recognized using the trained gesture recognition model;
S5b. smooth and correct the gesture recognition result to determine the final gesture recognition result.
Further, the YOLO network is trained with stochastic gradient descent, and the number of iterations of the YOLO network is 40000.
Further, step S5b specifically comprises the following steps:
S51b. acquire, from the stream of images to be recognized, the frame at moment t_i and the frames at the preceding moments t_{i-1}, t_{i-2} and t_{i-3}, where i is the index of the moment;
S52b. input each frame into the trained YOLO network for recognition, to output the pending recognition results for moments t_i, t_{i-1}, t_{i-2} and t_{i-3};
S53b. apply a weighted-sum judgment: compute the weighted sum of the pending recognition results to obtain the weighted-sum result, and judge the gesture recognition result at moment t_i from the weighted-sum result.
Further, the weighted-sum judgment specifically comprises:
denoting the pending recognition results as X(t_i), X(t_{i-1}), X(t_{i-2}) and X(t_{i-3}), where i is the index of the corresponding moment;
computing the averaged result using the formula X̄(t_i) = (1/4)·Σ_{k=0}^{3} X(t_{i-k}), where X is the gesture-type flag, i is the index of the corresponding moment, k is the summation index, and X̄(t_i) is the weighted-sum result;
if X̄(t_i) satisfies the decision threshold, taking the pending recognition result at moment t_i as the required gesture recognition result for moment t_i; otherwise, taking the previously acquired gesture recognition result at moment t_{i-1} as the required gesture recognition result for moment t_i.
The third technical solution adopted by the present invention is:
A gesture recognition device based on YOLO, comprising:
a memory for storing at least one program; and
a processor for loading the at least one program to execute the gesture recognition method based on YOLO of any one of claims 5-8.
The beneficial effects of the invention are: through the training method of the invention, a YOLO network can be trained so that it can recognize gestures in images or video frames. The trained YOLO network strongly suppresses complex background noise present in the image, such as faces and skin-colored walls; it can process the input image or frame to be recognized and output the recognition result quickly and accurately.
Description of the drawings
Fig. 1 is a flowchart of the training method of the present invention;
Fig. 2 is a schematic diagram of an image containing a training gesture and of the sliding window;
Fig. 3 is a training gesture image before skin-color segmentation;
Fig. 4 is a training gesture image after skin-color segmentation.
Detailed description of the embodiments
Embodiment 1
In this embodiment, the gesture dataset acquisition method for YOLO networks, as shown in Fig. 1, comprises the following steps:
S1a. obtain an image containing a training gesture and make the training gesture fall within a sliding window; the sliding window is a window that can move within the image, and the position and size of the sliding window are adjustable;
S2a. save the image-acquisition-window picture as a training gesture image, and record the corresponding calibration-box position and size and the gesture class label; the calibration-box position and size are the position and size of the sliding window when the training gesture image is saved, and the gesture class label is the gesture-type information of the training gesture;
S3a. filter the background of the training gesture image using a simple-threshold skin-color segmentation algorithm based on the YCbCr color space, forming a training gesture image with a black background;
S4a. compose the dataset from the black-background training gesture images and the corresponding calibration-box positions, sizes, and gesture class labels.
In this embodiment, steps S1a-S4a acquire the relevant gesture data, and step S4a composes the dataset.
Fig. 2 shows the image containing a training gesture described in step S1a; it can be a static picture or a frame of a dynamic video. The sliding window is shown as part 1 in Fig. 2; preferably, the sliding window is rectangular. Under manual adjustment or automatic control, the size of the sliding window can be adjusted and its position can change; that is, the sliding window can move within the image. Under automatic control, the size of the sliding window can change randomly or according to a predetermined rule, and the sliding window can move across the image the way a convolution kernel moves in an existing CNN, so that the sliding window can traverse the entire image.
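As an illustration of this automatic traversal, the sketch below enumerates sliding-window positions the way a convolution kernel sweeps an image; the window size and stride are illustrative assumptions, not values fixed by the patent.

```python
# Minimal sketch of the automatic sliding-window traversal described above:
# sweep the window across the image like a convolution kernel so that the
# whole picture is visited. Window size and stride are assumed values.
def window_positions(img_w, img_h, win_w=200, win_h=200, stride=50):
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)   # (x, y, w, h) of one candidate window

positions = list(window_positions(640, 480))   # all window placements for a 640x480 image
```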
In step S2a, the picture contained in the sliding window is saved as the training gesture image, and the corresponding calibration-box position and size and the gesture class label are recorded. This is equivalent to cropping out the part enclosed by the sliding window as the training gesture image. The position and size of the calibration box are the position and size of the sliding window in the image. The gesture class label contains the gesture-type information of the training gesture, i.e., input by the trainer, and indicates the type of gesture shown in the training gesture image, such as "rock", "scissors", and "paper"; the gesture shown in Fig. 2 is the "scissors" gesture.
In step S3a, the background of the training gesture image is filtered: the background can be removed, or it can be replaced with another color.
Steps S1a-S3a are performed multiple times. In each pass, the trainer can switch to a different gesture type, move the sliding window to a different position, or adjust it to a different size; the trainer can also adjust the distance between the hand and the camera, the imaging focal length of the camera, and so on, so that the gesture has different imaging effects in the picture; the trainer can also move the hand so that, when the training gesture image is saved, the gesture made by the trainer is fully or partially contained in the sliding window. Regardless of where in the image the sliding window moves, the best training effect is obtained when the training gesture made by the trainer is fully contained in the sliding window. By combining these different variables, a large number of training gesture region images can be obtained, each with a corresponding calibration-box position, size, and gesture class label.
The multiple training gesture region images obtained by repeatedly performing steps S1a-S3a can further be flipped, cropped, re-calibrated, and otherwise processed; this image processing generates new training gesture region images. Composing the dataset from these new images together with the images obtained from steps S1a-S3a can increase the diversity of the dataset and enhance the training effect.
In step S4a, these training gesture region images and the corresponding calibration-box positions, sizes, and gesture class labels compose the dataset.
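As a sketch of how steps S1a-S4a might be implemented, the snippet below crops the sliding-window region from a source image, records the calibration box and gesture class label, and adds a horizontally flipped variant as one of the augmentations mentioned above. It assumes OpenCV; the file layout, the plain-text label format, and the save_sample helper are illustrative assumptions, not prescribed by the patent.

```python
# Sketch of steps S1a-S4a: crop the sliding-window region, record the
# calibration box and gesture class label, and add a flipped variant to
# diversify the dataset. File layout and label format are assumptions.
import cv2

def save_sample(image, window, label, out_prefix):
    """Crop the sliding window (x, y, w, h) and record the box and class label."""
    x, y, w, h = window
    crop = image[y:y + h, x:x + w]               # S2a: picture inside the sliding window
    cv2.imwrite(out_prefix + ".jpg", crop)       # training gesture image
    with open(out_prefix + ".txt", "w") as f:    # calibration box + gesture class label
        f.write(f"{label} {x} {y} {w} {h}\n")
    flipped = cv2.flip(crop, 1)                  # horizontal flip (augmentation)
    cv2.imwrite(out_prefix + "_flip.jpg", flipped)

frame = cv2.imread("scene.jpg")                  # image containing the training gesture (S1a)
save_sample(frame, window=(120, 80, 200, 200), label=1, out_prefix="data/scissors_0001")
```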
As a further preferred embodiment, step S3a specifically comprises the following steps:
S31a. binarize the training gesture image using a skin-color segmentation model based on the YCbCr color space, so that each pixel of the training gesture image is classified as a skin-color point or a non-skin-color point;
S32a. determine the part composed of all skin-color points as the training gesture region, and the part composed of all non-skin-color points as the background.
Besides the gesture made by the trainer, the training gesture image obtained in step S1a also contains a background, and that background is generally too uniform, giving the images poor diversity; using them for training directly makes overfitting likely. Step S3a can therefore be used to process the training gesture images. In step S3a, a skin-color segmentation model based on the YCbCr color space binarizes the training gesture image, segmenting the skin color at different thresholds, so the image can be divided into a training gesture region and a background. The background can be left unprocessed or filtered to different degrees; in this way, multiple training gesture images with the same training gesture region but different backgrounds can be obtained, increasing the diversity of the images.
As a further preferred embodiment, step S31a specifically comprises the following steps:
S311a. from the RGB value of each pixel of the training gesture image, compute the Cb and Cr values of each pixel;
S312a. classify each pixel as a skin-color point or a non-skin-color point according to its Cb and Cr values.
The RGB value of each pixel of the training gesture image can first be converted to a YUV value by the standard RGB-to-YUV conversion formula, and the YCrCb values are then obtained by converting the YUV values to YCrCb values. The Y value in YCrCb represents luminance, and the training gesture image is affected very little by luminance, so the Y value can be ignored and only the Cr and Cb values considered.
As a further preferred embodiment, step S312a is specifically:
judge whether the Cb and Cr values of each pixel of the training gesture image satisfy the skin-color threshold condition; if so, classify the pixel as a skin-color point; otherwise, classify it as a non-skin-color point.
Fig. 3 shows a training gesture image, and Fig. 4 shows the training gesture region image obtained after the skin-color segmentation of step S3a.
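The sketch below illustrates steps S31a-S32a and the black-background filtering of this embodiment. OpenCV's cvtColor implements the standard conversion Y = 0.299R + 0.587G + 0.114B, Cr = (R − Y)·0.713 + 128, Cb = (B − Y)·0.564 + 128, replacing the RGB-to-YUV-to-YCrCb derivation above; the threshold interval 77 ≤ Cb ≤ 127, 133 ≤ Cr ≤ 173 is a commonly used literature value and an assumption here, since the patent text does not reproduce its exact inequality.

```python
# Sketch of steps S31a-S32a: classify each pixel as skin or non-skin by its
# Cr/Cb values and black out the background. The interval 77 <= Cb <= 127,
# 133 <= Cr <= 173 is a common literature choice, assumed here; the patent
# does not state its exact thresholds.
import cv2

def filter_background(bgr):
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)   # channel order: Y, Cr, Cb
    # Keep pixels whose Cr and Cb fall inside the skin-color bounds (S312a);
    # Y spans its full range, so luminance is ignored as described above.
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    out = bgr.copy()
    out[mask == 0] = 0                               # non-skin points become black background (S32a)
    return out, mask

img = cv2.imread("train_gesture.jpg")
segmented, skin_mask = filter_background(img)
cv2.imwrite("train_gesture_black_bg.jpg", segmented)
```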
Embodiment 2
The gesture recognition method of this embodiment can use the gesture dataset acquired by embodiment 1 to train a gesture recognition model with a YOLO network.
A gesture recognition method based on YOLO, comprising the following steps:
S1b. independently acquire the gesture dataset to be used for training, using steps S1a-S4a of the image acquisition method described in embodiment 1;
S2b. train a YOLO network for gesture recognition on the dataset to obtain a gesture recognition model;
S3b. capture the image to be recognized with a camera, and filter its background using the simple-threshold skin-color segmentation algorithm based on the YCbCr color space;
S4b. perform real-time gesture class prediction and gesture localization on the image to be recognized using the trained gesture recognition model;
S5b. smooth and correct the gesture recognition result to determine the final gesture recognition result.
Steps S1b-S2b are the training method for the YOLO network; steps S3b-S5b perform gesture recognition with the trained YOLO network model.
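As a sketch of how steps S3b-S4b might run once the model is trained, the snippet below loads darknet-format YOLO weights through OpenCV's DNN module and runs one forward pass on a background-filtered frame. The file names, the 416x416 input size, the 0.5 confidence threshold, and the assumption that the model is exported as a darknet .cfg/.weights pair are all illustrative; the patent does not fix a deployment API.

```python
# Sketch of steps S3b-S4b: run a trained YOLO model (darknet format assumed)
# on one background-filtered frame and report gesture class and box. File
# names, input size, and confidence threshold are illustrative assumptions.
import cv2

net = cv2.dnn.readNetFromDarknet("gesture_yolo.cfg", "gesture_yolo.weights")
frame = cv2.imread("frame_filtered.jpg")   # frame after the S3b skin-color background filtering

# YOLO-style darknet models take a square, rescaled blob; 416x416 is typical.
blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 255.0, size=(416, 416), swapRB=True)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Each detection row is [cx, cy, w, h, objectness, class scores...] in relative units.
for out in outputs:
    for det in out:
        scores = det[5:]
        cls = int(scores.argmax())
        if scores[cls] > 0.5:              # assumed confidence threshold
            print("gesture class", cls, "box", det[:4])
```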
As a further preferred embodiment, in step S2b, the model for gesture recognition and localization is trained using the YOLO network.
The YOLO network can be trained using the prior art; preferably, the training parameters are: stochastic gradient descent, a learning rate of 0.1, a weight decay of 0.0005, a momentum of 0.9, and training ends after 40000 iterations.
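To make the stated hyperparameters concrete, the sketch below expresses them as a PyTorch SGD configuration; the patent itself trains a darknet-style YOLO network, and PyTorch plus the placeholder model are used here only for illustration.

```python
# Sketch: the stated training parameters (SGD, learning rate 0.1, weight
# decay 0.0005, momentum 0.9, 40000 iterations) in PyTorch form. The real
# network is a YOLO model; the Conv2d below is only a placeholder.
import torch

model = torch.nn.Conv2d(3, 16, 3)          # placeholder for the YOLO network
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,                                # learning rate 0.1
    momentum=0.9,                          # momentum 0.9
    weight_decay=0.0005,                   # weight decay 0.0005
)
MAX_ITERATIONS = 40000                     # training ends after 40000 iterations
```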
As a further preferred embodiment, step S5b performs gesture recognition on the image to be recognized using the trained YOLO network, and specifically comprises the following steps:
S51b. acquire, from the stream of images to be recognized, the frame at moment t_i and the frames at the preceding moments t_{i-1}, t_{i-2} and t_{i-3}, where i is the index of the moment;
S52b. input each frame into the trained YOLO network for recognition, to output the pending recognition results for moments t_i, t_{i-1}, t_{i-2} and t_{i-3};
S53b. apply the weighted-sum judgment: compute the weighted sum of the pending recognition results to obtain the weighted-sum result, and judge the gesture recognition result at moment t_i from the weighted-sum result.
When gestures in video are recognized, certain frames of the video are actually what is recognized. During video capture, the image is easily blurred by hand motion, unclear imaging, and the like; if a single frame of the video is recognized in isolation, the recognition is prone to error.
To improve the accuracy of gesture recognition for video, the recognition results of multiple consecutive frames can be considered in order to determine the recognition result for a given frame.
Before step S51b is executed, the gesture recognition result for the frame at moment t_{i-1} has already been obtained and determined.
In step S51b, to perform gesture recognition on the frame at moment t_i, the frames corresponding to the preceding moments t_{i-1}, t_{i-2} and t_{i-3} can be acquired consecutively. These 4 frames are then input into the YOLO network for recognition, and 4 pending recognition results are output. With the weighted-sum judgment, weights are assigned to the 4 pending results, and the gesture recognition result at moment t_i is determined from the weighted-sum result.
As a further preferred embodiment, the weighted-sum judgment specifically comprises:
denoting the pending recognition results as X(t_i), X(t_{i-1}), X(t_{i-2}) and X(t_{i-3}), where i is the index of the corresponding moment;
computing the averaged result using the formula X̄(t_i) = (1/4)·Σ_{k=0}^{3} X(t_{i-k}), where X is the gesture-type flag, i is the index of the corresponding moment, k is the summation index, and X̄(t_i) is the weighted-sum result;
if X̄(t_i) satisfies the decision threshold, taking the pending recognition result at moment t_i as the required gesture recognition result for moment t_i; otherwise, taking the previously acquired gesture recognition result at moment t_{i-1} as the required gesture recognition result for moment t_i.
The gesture type X can be "scissors", "rock", "paper", and so on. According to the value of the weighted-sum result X̄(t_i), either the pending recognition result for moment t_i identified in this recognition pass, or the gesture recognition result for moment t_{i-1} identified in the previous pass, is taken as the required gesture recognition result for moment t_i.
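A minimal sketch of this weighted-sum judgment follows. It encodes each pending result X(t_{i-k}) as 1 when that frame's pending result agrees with the candidate gesture at t_i and 0 otherwise, averages the four frames with equal weights, and uses 0.5 as the decision threshold; the 0/1 encoding and the 0.5 value are assumptions, since the patent elides the exact decision inequality.

```python
# Sketch of steps S51b-S53b: average the pending results of the frames at
# t_i .. t_(i-3) and accept the new result only if enough frames agree.
# The 0/1 encoding and the 0.5 threshold are assumed values.
from typing import List, Optional

def smooth_result(pending: List[str], previous: Optional[str]) -> Optional[str]:
    """pending[0] is the result for t_i; pending[1..3] for t_(i-1)..t_(i-3)."""
    candidate = pending[0]                      # pending recognition result at t_i
    agree = [1 if p == candidate else 0 for p in pending]
    x_bar = sum(agree) / 4                      # X-bar(t_i) = (1/4) * sum of X(t_(i-k))
    if x_bar >= 0.5:                            # assumed decision threshold
        return candidate                        # accept the result identified at t_i
    return previous                             # keep the result from t_(i-1)

print(smooth_result(["scissors", "scissors", "rock", "scissors"], previous="rock"))  # scissors
```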
Embodiment 3
A gesture recognition device based on YOLO, comprising:
a memory for storing at least one program; and
a processor for loading the at least one program to execute the gesture recognition method based on YOLO described in embodiment 2.
The above describes the preferred implementation of the present invention, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions are contained within the scope defined by the claims of this application.

Claims (9)

1. A gesture dataset acquisition method for YOLO networks, characterized by comprising the following steps:
S1a. obtaining an image containing a training gesture and making the training gesture fall within a sliding window, the sliding window being a window that can move within the image, the position and size of the sliding window being adjustable;
S2a. saving the image-acquisition-window picture as a training gesture image, and recording the corresponding calibration-box position and size and the gesture class label, wherein the image-acquisition-window picture is the picture inside the sliding window, the calibration-box position and size are the position and size of the sliding window when the training gesture image is saved, and the gesture class label is the gesture-type information of the training gesture;
S3a. filtering the background of the training gesture image using a simple-threshold skin-color segmentation algorithm based on the YCbCr color space, to obtain training gesture images with diverse backgrounds;
S4a. composing a gesture dataset from the training gesture images with diverse backgrounds and the corresponding calibration-box positions, sizes, and gesture class labels.
2. The gesture dataset acquisition method for YOLO networks according to claim 1, characterized in that step S3a specifically comprises the following steps:
S31a. binarizing the training gesture image using a skin-color segmentation model based on the YCbCr color space, so that each pixel of the training gesture image is classified as a skin-color point or a non-skin-color point;
S32a. determining the part composed of all skin-color points as the training gesture region, and the part composed of all non-skin-color points as the background.
3. The gesture dataset acquisition method for YOLO networks according to claim 2, characterized in that step S31a specifically comprises the following steps:
S311a. computing, from the RGB value of each pixel of the training gesture image, the Cb and Cr values of each pixel;
S312a. classifying each pixel as a skin-color point or a non-skin-color point according to its Cb and Cr values.
4. The gesture dataset acquisition method for YOLO networks according to claim 3, characterized in that step S312a is specifically:
judging whether the Cb and Cr values of each pixel of the training gesture image satisfy the skin-color threshold condition; if so, classifying the pixel as a skin-color point; otherwise, classifying it as a non-skin-color point.
5. A gesture recognition method based on YOLO, characterized by comprising the following steps:
S1b. independently acquiring the gesture dataset to be used for training, using the gesture dataset acquisition method of any one of claims 1-4;
S2b. training a YOLO network for gesture recognition on the gesture dataset, to obtain a gesture recognition model;
S3b. capturing the image to be recognized with a camera, and filtering its background using the simple-threshold skin-color segmentation algorithm based on the YCbCr color space;
S4b. performing real-time gesture class prediction and gesture localization on the image to be recognized using the trained gesture recognition model;
S5b. smoothing and correcting the gesture recognition result to determine the final gesture recognition result.
6. The gesture recognition method based on YOLO according to claim 5, characterized in that the YOLO network uses stochastic gradient descent, and the number of iterations of the YOLO network is 40000.
7. The gesture recognition method based on YOLO according to claim 5 or 6, characterized in that step S5b specifically comprises the following steps:
S51b. acquiring, from the stream of images to be recognized, the frame at moment t_i and the frames at the preceding moments t_{i-1}, t_{i-2} and t_{i-3}, where i is the index of the moment;
S52b. inputting each frame into the trained YOLO network for recognition, to output the pending recognition results for moments t_i, t_{i-1}, t_{i-2} and t_{i-3};
S53b. computing the weighted sum of the pending recognition results using a weighted-sum judgment to obtain the weighted-sum result, and judging the gesture recognition result at moment t_i from the weighted-sum result.
8. The gesture recognition method based on YOLO according to claim 7, characterized in that the weighted-sum judgment specifically comprises:
denoting the pending recognition results as X(t_i), X(t_{i-1}), X(t_{i-2}) and X(t_{i-3}), where i is the index of the corresponding moment;
computing the averaged result using the formula X̄(t_i) = (1/4)·Σ_{k=0}^{3} X(t_{i-k}), where X is the gesture-type flag, i is the index of the corresponding moment, k is the summation index, and X̄(t_i) is the weighted-sum result;
if X̄(t_i) satisfies the decision threshold, taking the pending recognition result at moment t_i as the required gesture recognition result for moment t_i; otherwise, taking the previously acquired gesture recognition result at moment t_{i-1} as the required gesture recognition result for moment t_i.
9. A gesture recognition device based on YOLO, characterized by comprising:
a memory for storing at least one program; and
a processor for loading the at least one program to execute the gesture recognition method based on YOLO of any one of claims 5-8.
CN201810280674.XA 2018-03-27 2018-03-27 Gesture dataset acquisition method, gesture recognition method and device for YOLO networks Pending CN108614995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810280674.XA CN108614995A (en) 2018-03-27 2018-03-27 Gesture dataset acquisition method, gesture recognition method and device for YOLO networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810280674.XA CN108614995A (en) 2018-03-27 2018-03-27 Gesture dataset acquisition method, gesture recognition method and device for YOLO networks

Publications (1)

Publication Number Publication Date
CN108614995A true CN108614995A (en) 2018-10-02

Family

ID=63659446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810280674.XA Pending CN108614995A (en) Gesture dataset acquisition method, gesture recognition method and device for YOLO networks

Country Status (1)

Country Link
CN (1) CN108614995A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353935A (en) * 2013-07-19 2013-10-16 电子科技大学 3D dynamic gesture identification method for intelligent home system
CN104346816A (en) * 2014-10-11 2015-02-11 京东方科技集团股份有限公司 Depth determining method and device and electronic equipment
CN104484645A (en) * 2014-11-14 2015-04-01 华中科技大学 Human-computer interaction-oriented '1' gesture-recognition method and system
CN107340852A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Gestural control method, device and terminal device
CN106503651A (en) * 2016-10-21 2017-03-15 上海未来伙伴机器人有限公司 A kind of extracting method of images of gestures and system
CN107016443A (en) * 2017-03-31 2017-08-04 惠州华阳通用电子有限公司 A kind of negative sample acquisition method based on machine vision
CN107786549A (en) * 2017-10-16 2018-03-09 北京旷视科技有限公司 Adding method, device, system and the computer-readable medium of audio file

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740454A (en) * 2018-12-19 2019-05-10 贵州大学 A kind of human body posture recognition methods based on YOLO-V3
CN109683719A (en) * 2019-01-30 2019-04-26 华南理工大学 A kind of visual projection's exchange method based on YOLOv3
CN109683719B (en) * 2019-01-30 2021-10-22 华南理工大学 Visual projection interaction method based on YOLOv3
CN109872160A (en) * 2019-02-01 2019-06-11 广州逗号智能科技有限公司 Electric paying method and device
CN110852190A (en) * 2019-10-23 2020-02-28 华中科技大学 Driving behavior recognition method and system integrating target detection and gesture recognition
CN110852190B (en) * 2019-10-23 2022-05-20 华中科技大学 Driving behavior recognition method and system integrating target detection and gesture recognition
CN112396666A (en) * 2020-12-09 2021-02-23 广西双英集团股份有限公司 Intelligent assembling process control method based on gesture recognition
CN114578973A (en) * 2022-05-06 2022-06-03 武汉泓云智能科技有限公司 Multimedia interaction method and device of digital sand table
CN114578973B (en) * 2022-05-06 2022-10-04 武汉泓云智能科技有限公司 Multimedia interaction method and device of digital sand table

Similar Documents

Publication Publication Date Title
CN108614995A (en) Gesture dataset acquisition method, gesture recognition method and device for YOLO networks
CN106951870B (en) Intelligent detection and early warning method for active visual attention of significant events of surveillance video
Hussain et al. Hand gesture recognition using deep learning
CN109325954B (en) Image segmentation method and device and electronic equipment
CN106204779B (en) Check class attendance method based on plurality of human faces data collection strategy and deep learning
CN106709932A (en) Face position tracking method and device and electronic equipment
CN103824059B (en) Facial expression recognition method based on video image sequence
US20170153711A1 (en) Visual Language for Human Computer Interfaces
CN103345644B (en) The object detection method of on-line training and device
CN109255324A (en) Gesture processing method, interaction control method and equipment
CN104636759B (en) A kind of method and picture filter information recommendation system for obtaining picture and recommending filter information
CN109271886A (en) A kind of the human body behavior analysis method and system of examination of education monitor video
JP7443002B2 (en) Image analysis device, image analysis method, and program
CN112347887A (en) Object detection method, object detection device and electronic equipment
CN110443148A (en) A kind of action identification method, system and storage medium
CN107330360A (en) A kind of pedestrian's clothing colour recognition, pedestrian retrieval method and device
CN108197534A (en) A kind of head part's attitude detecting method, electronic equipment and storage medium
CN113449606B (en) Target object identification method and device, computer equipment and storage medium
CN106960175A (en) The first visual angle dynamic gesture detection method based on depth convolutional neural networks
CN108108669A (en) A kind of facial characteristics analytic method based on notable subregion
CN110033487A (en) Vegetables and fruits collecting method is blocked based on depth association perception algorithm
CN111882555A (en) Net detection method, device, equipment and storage medium based on deep learning
CN108242061A (en) A kind of supermarket shopping car hard recognition method based on Sobel operators
CN107368847A (en) A kind of crop leaf diseases recognition methods and system
CN110472638A (en) A kind of object detection method, device and equipment, storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181002