CN110147764A - Static gesture recognition method based on machine learning - Google Patents

Static gesture recognition method based on machine learning

Info

Publication number
CN110147764A
CN110147764A (application CN201910422669.2A)
Authority
CN
China
Prior art keywords
gesture
image
identification
classification
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910422669.2A
Other languages
Chinese (zh)
Inventor
林丽媛
陈静瑜
刘冠军
程自祥
周卫斌
许国
申川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Science and Technology
Original Assignee
Tianjin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Science and Technology filed Critical Tianjin University of Science and Technology
Priority to CN201910422669.2A
Publication of CN110147764A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The present invention discloses a gesture recognition method based on machine learning. The system processes and classifies acquired gesture images using computer vision, and performs gesture classification and recognition with a support vector machine (SVM). The procedure is as follows: color gesture images are captured with a color camera and divided into a training set and a test set, and digital image processing is applied to reduce the interference of non-gesture pixels with recognition; the Hu moments of the processed gesture images are then combined with HOG features to improve the accuracy and speed of gesture recognition; the processed gesture images are classified and recognized by the SVM, completing fast and accurate gesture recognition. The method is simple, low in cost, and widely applicable; it provides a good basis for accurate gesture control and can be applied to human-computer interaction and to entertainment such as guessing games.

Description

Static gesture recognition method based on machine learning
Technical field
The present invention relates to the field of computer vision, and in particular to a static gesture recognition method based on machine learning.
Background art
Biometric recognition is currently the most popular direction in computer vision: identifying people from video or images, including face recognition, fingerprint recognition, palmprint recognition, vein recognition, action and posture recognition, gesture recognition, and so on. Action and posture recognition in particular draws on computer vision and pattern recognition, and can be applied to medicine, security, education, virtual reality, augmented reality, and many other areas.
Language is how humans communicate; people use it mainly through body behaviour and text, and body behaviour, which includes speech, gestures and facial expressions, is its principal form. With the continuing development of information technology, people have begun to imitate these forms of communication when interacting with computers, and on the foundation of advances in computer vision, human-computer interaction now includes speech recognition, touch recognition, gesture recognition, expression recognition and the conventional mouse and keyboard. As one of the most natural forms of human expression, gestures have a very strong capacity to represent and convey information. For example, a recently released new-energy vehicle replaces touch-screen interaction with gesture recognition, and emerging smart-home products use gestures to control curtains, lights and other household items.
Gestures can generally be divided into static and dynamic gestures. A static gesture is a posture formed by the fingers and palm while the hand is held still; a dynamic gesture is one in which the position and shape of the hand change over time, and it can express the operator's intent more richly. Gesture recognition algorithms fall mainly into three families: dynamic programming, vector quantization, and hidden Markov models (HMM).
Summary of the invention
The present invention proposes recognizing static gestures with machine learning: using computer vision techniques, HOG features for target detection are extracted and combined with Hu moments to describe the gesture, which improves both the accuracy and the speed of gesture recognition.
The HOG feature is an image descriptor used for human target detection; the feature is built by computing and accumulating histograms of gradient orientations over local regions of the image. Its advantage is that it remains largely invariant to geometric and photometric deformations and is strongly robust to changes in the environment. The main idea is that the appearance and shape of a local object in an image can be well described by the density of gradient directions or edge directions (in essence, gradient statistics, and gradients mostly live at edges). In practice, the image is divided into small cells, and a histogram of gradient directions (or edge directions) is computed for each cell. To gain better invariance to illumination and shadow, the histograms are contrast-normalized, which is done by grouping cells into larger blocks and normalizing all the cells within each block. The normalized block descriptors are called HOG descriptors. The HOG descriptors of all the blocks in the detection window are concatenated to form the final feature vector.
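As a concrete illustration of the block-and-cell computation described above, the following is a minimal sketch of HOG extraction using OpenCV's HOGDescriptor in Python; the 64x64 window and the cell, block and bin sizes are illustrative assumptions, since the patent does not specify them.

import cv2

# Sketch of HOG feature extraction; the parameter values below are assumptions,
# not values taken from the patent.
def extract_hog(gray_image):
    win_size = (64, 64)        # detection window
    block_size = (16, 16)      # 2x2 cells per block
    block_stride = (8, 8)      # block overlap
    cell_size = (8, 8)         # one gradient histogram per cell
    nbins = 9                  # orientation bins per histogram
    hog = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, nbins)
    resized = cv2.resize(gray_image, win_size)   # expects an 8-bit grayscale image
    return hog.compute(resized).flatten()        # concatenated block descriptors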
The main technical steps of the present invention are as follows:
1) Fix the acquisition device at a fixed horizontal position so the camera can capture images conveniently; in this work a height of 1-1.5 m was chosen as a suitable height for gesture acquisition. Capture multiple color gesture images and select the well-captured gesture pictures for grayscale processing.
2) The subject performs the different gestures repeatedly at a distance of 0.5-1 m from the camera. The captured gestures comprise a fist and the hand signs for the digits 1, 2, 3, 4, 5, 6, 7, 8 and 9, ten actions in total.
3) Convert the different gesture pictures to grayscale first, then convert the grayscale images to binary images according to the gray-level differences between regions.
4) The edges of the binary images carry speckle noise; Gaussian filtering and morphological processing such as erosion, dilation and closing are applied to reduce the influence of this noise on the subsequent processing.
5) Draw the contour: take the binary image obtained after smoothing as the input image, compute the contour with an OpenCV function, and draw it with another OpenCV function on a black background of the same pixel size.
6) Divide the gesture images into a training set and a test set (a minimal split is sketched in code after this list).
7) Using the gesture images of the ten classes as the training set, combine the Hu moments with the HOG features and classify and recognize the computed feature data with an SVM to obtain the optimal classifier.
8) Apply the classifier to the test-set images to obtain the final classification and recognition results.
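A minimal sketch of the train/test split in steps 6) and 7) follows; the 80/20 ratio is an assumption, as the patent does not state the split proportion.

import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    # samples: list of (image, label) pairs, one label per gesture class (fist, 1-9)
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]   # (training set, test set)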
Compared with the prior art, the present invention has the following advantages:
First, the freedom of movement is high: subjects are only told the general manner of the action, interpret and perform it themselves, and are free in position and angle, so they can move freely within a certain space.
Second, gesture recognition uses the HOG features, which differ from image to image, and finally fuses the Hu moments with the HOG features to train the SVM. Relative to the complexity of the problem, the SVM algorithm needs fewer samples than other algorithms, and the problem it solves is independent of the sample dimensionality. It also minimizes the structural risk. It is nonlinear as well: the SVM copes well even when the sample data are not linearly separable, which it achieves through slack variables (also called penalty variables) and kernel functions; this is the core of the SVM.
Third, the scheme of this invention is simple, responds quickly and is fairly robust; it achieves a high real-time recognition accuracy on the test data and obtains a clear outer contour of the hand.
Brief description of the drawings
To better illustrate the technical process of the invention, the drawings are briefly introduced below.
Fig. 1 is the flow diagram of the machine-learning-based gesture recognition of the present invention;
Fig. 2 is the gesture image obtained after binarization;
Fig. 3 is the gesture contour diagram;
Fig. 4 is the HOG feature map.
Specific embodiment
The present invention is described below with reference to the accompanying drawings.
The flow chart of the machine-learning-based gesture recognition system, shown in Fig. 1, mainly comprises the following steps.
Step 1: Capture color gesture images at a distance of 0.5-1 m from the camera, select clear frontal color photographs of the gesture, and load them into the PC.
Step 2: Convert the color gesture picture to grayscale. There are several main ways to turn a color pixel into a gray pixel:
(1) Floating-point method: Gray = 0.299R + 0.587G + 0.114B
(2) Integer method: Gray = (R*30 + G*59 + B*11) / 100
(3) Shift method: Gray = (R*77 + G*151 + B*28) >> 8
(4) Mean method: Gray = (R + G + B) / 3
(5) Green only: Gray = G
After Gray has been obtained with any one of the five methods above, the R, G and B of the original image are all replaced by Gray, giving the new color (Gray, Gray, Gray); the image after this replacement is the grayscale image.
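The weighted floating-point conversion, method (1) above, can be sketched as follows; note that OpenCV stores color images in BGR channel order, so the channels are split accordingly (cv2.cvtColor with COLOR_BGR2GRAY applies the same weights internally).

import cv2
import numpy as np

def to_gray(bgr_image):
    # Method (1): Gray = 0.299R + 0.587G + 0.114B
    b, g, r = cv2.split(bgr_image.astype(np.float32))
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return gray.astype(np.uint8)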
Step A3: Because the gray values of its regions differ, the grayscale image segments along distinct edges, so binarization is applied; binarization converts the grayscale image into a binary image. Every pixel whose gray value is greater than or equal to the threshold is judged to belong to the object and is represented by the gray value 255; the remaining pixels are excluded from the object region and represented by the gray value 0, i.e. the background or other regions. The whole image then shows an unmistakable black-and-white visual effect. The binarization result is shown in Fig. 2.
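A minimal sketch of this binarization step; the fixed threshold of 127 is an assumption, since the patent only states that pixels at or above the threshold become 255.

import cv2

def binarize(gray_image, thresh=127):
    # Pixels >= thresh become 255 (object), the rest become 0 (background)
    _, binary = cv2.threshold(gray_image, thresh, 255, cv2.THRESH_BINARY)
    return binary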
Step A4: After binarization the edges of the gesture image are found to be unsmooth, with some tiny burrs. Gaussian filtering is used to smooth the edges: the convolution kernel is convolved with each pixel of the input image, and each pixel value of the filtered image is the corresponding convolution sum. Because the pixel values of a real image vary gently in space, neighbouring pixels differ little, whereas the difference between two noisy pixels can change sharply; based on this, Gaussian filtering reduces noise while preserving the signal, although it is ineffective at edges, which it tends to smooth away.
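The Gaussian smoothing of step A4 can be sketched with OpenCV as below; the 5x5 kernel size is an illustrative assumption.

import cv2

def denoise(binary_image):
    # Convolve a Gaussian kernel with the image; sigma is derived from the kernel size
    return cv2.GaussianBlur(binary_image, (5, 5), 0)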
Step A5: Apply morphological processing to the processed gesture image. The morphological processing mainly consists of three operations, erosion, dilation and closing, followed by a subtraction, to reduce the influence of noise on image quality.
The effect of erosion is to remove the boundary points of an object and shrink it: the boundary pixels of an object contain both 0s and 1s, and after erosion the pixels adjacent to value 1 all become 0, so erosion removes meaningless points and holes and pulls the boundary inward. Dilation has the opposite effect: it enlarges the target, fills tiny holes in the image, smooths the object boundary, and expands the boundary outward. Closing is dilation followed by erosion; it fills holes in the image and connects nearby objects across narrow gaps while smoothing the boundary without changing the object's size.
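A sketch of the erosion, dilation and closing of step A5; the 3x3 structuring element and single iteration are assumptions.

import cv2
import numpy as np

def morphological_cleanup(binary_image):
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(binary_image, kernel, iterations=1)       # remove speckles, shrink boundary
    dilated = cv2.dilate(eroded, kernel, iterations=1)           # restore size, fill tiny holes
    closed = cv2.morphologyEx(dilated, cv2.MORPH_CLOSE, kernel)  # dilation then erosion: close gaps
    return closed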
Step A6: Draw the contour: take the binary image obtained after image smoothing as the input image, compute the contour with an OpenCV function, and draw it with another OpenCV function on a black background of the same pixel size. The resulting gesture contour is shown in Fig. 3.
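A sketch of step A6 using OpenCV's findContours and drawContours; keeping only the largest contour is an assumption (the hand is expected to be the dominant object in the frame).

import cv2
import numpy as np

def draw_gesture_contour(binary_image):
    # OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    canvas = np.zeros_like(binary_image)          # black background of the same pixel size
    if contours:
        hand = max(contours, key=cv2.contourArea)
        cv2.drawContours(canvas, [hand], -1, 255, 2)
    return canvas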
Step A7: Compute the Hu moments from the contour image. Printed out in the program, the contour is a sequence of continuous pixel coordinates, and the Hu moments are computed from these data; the Hu moments are seven invariants built from the second- and third-order central moments of the contour. The Hu moments of the different gestures are not identical, which is what distinguishes one gesture from another; they are combined with the HOG features to train the support vector machine (SVM). Fig. 4 shows the HOG feature map.
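A sketch of the Hu-moment computation of step A7 with OpenCV; the log scaling at the end is a common practice to bring the seven values to comparable magnitudes and is an assumption, not something the patent specifies.

import cv2
import numpy as np

def hu_features(contour):
    moments = cv2.moments(contour)                       # spatial, central and normalized moments
    hu = cv2.HuMoments(moments).flatten()                # the seven Hu invariants
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)  # log-scaled copy (assumed preprocessing)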
The principle of the HOG feature is as follows:
1) Grayscale conversion: HOG is a texture feature, so color information does not contribute; the color image is therefore first converted to grayscale.
2) Normalization: to improve the robustness of the detector to disturbances such as illumination, gamma correction is applied to the image so as to normalize the whole image; the aim is to adjust the contrast of the image, reduce the influence of local illumination and shadow, and also suppress noise. (When the exponent r is 1/2, the pixel value range is transformed from 0-255 to 0-15.97.)
3) Compute the image pixel gradients: the horizontal and vertical gradients of each pixel are computed, and the gradient magnitude and direction at each pixel position are obtained. The horizontal and vertical gradients of the image at pixel (x, y) are

G_x(x, y) = H(x+1, y) - H(x-1, y)
G_y(x, y) = H(x, y+1) - H(x, y-1)

where H(x, y) is the pixel value at (x, y), and G_x(x, y) and G_y(x, y) denote the horizontal and vertical gradient values at the current pixel (x, y). The gradient magnitude and gradient direction at (x, y) are then

G(x, y) = sqrt(G_x(x, y)^2 + G_y(x, y)^2)
α(x, y) = arctan(G_y(x, y) / G_x(x, y))

Both the gamma normalization of step 2) and this gradient computation are sketched in code below.
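The following sketch covers steps 2) and 3) of the HOG principle: gamma normalization with r = 1/2 and the per-pixel gradient computation using the centered differences given above.

import numpy as np

def gamma_correct(gray_image, gamma=0.5):
    # Raising each pixel to the power 1/2 maps the range 0-255 to roughly 0-15.97
    return np.power(gray_image.astype(np.float32), gamma)

def pixel_gradients(gray):
    gray = gray.astype(np.float32)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]   # G_x(x, y) = H(x+1, y) - H(x-1, y)
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]   # G_y(x, y) = H(x, y+1) - H(x, y-1)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    orientation = np.arctan2(gy, gx)           # gradient direction in radians
    return magnitude, orientation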
An SVM is an algorithm for training a machine and letting it learn; it can solve both regression and classification problems, and kernel functions can be used to transform the data, after which an optimal boundary or hyperplane is sought among the possible outputs on the basis of the transformed information. Put simply, it performs a rather complex data transformation and then works out how to separate the input data according to the predefined labels or outputs.
Step A8: Finally, the computed feature data are classified and recognized with the SVM. The recognition success rate of the invention is high and the training time is significantly reduced.
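A minimal end-to-end sketch of steps A7 and A8: fuse the Hu moments and HOG features into one vector per image, train an SVM on the training set, then score the test set. hu_features() and extract_hog() are the helper sketches given earlier; the use of scikit-learn, the linear kernel and the value of C are assumptions, since the patent specifies an SVM but neither the library nor the kernel.

import numpy as np
from sklearn.svm import SVC

def build_features(samples):
    # samples: list of (contour, gray_image, label) triples from the preprocessing steps
    vectors = [np.concatenate([hu_features(c), extract_hog(img)]) for c, img, _ in samples]
    labels = [label for _, _, label in samples]
    return np.asarray(vectors, dtype=np.float32), labels

def train_gesture_svm(train_samples):
    features, labels = build_features(train_samples)
    clf = SVC(kernel="linear", C=1.0)     # assumed hyperparameters
    clf.fit(features, labels)
    return clf

def evaluate(clf, test_samples):
    features, labels = build_features(test_samples)
    return clf.score(features, labels)    # classification accuracy on the test set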
The confidence risk is related to two quantities: first, the sample size: obviously, the larger the given sample, the more likely the learning result is to be correct and the smaller the confidence risk; second, the VC dimension of the classification function: obviously, the larger the VC dimension, the poorer the generalization ability and the larger the confidence risk.
The generalization error bound has the form

R(w) ≤ R_emp(w) + φ(n/h)

where R(w) is the true risk, R_emp(w) is the empirical risk, and φ(n/h) is the confidence risk, which grows with the VC dimension h and shrinks as the sample size n grows. The goal of statistical learning thus changes from minimizing the empirical risk alone to minimizing the sum of the empirical risk and the confidence risk, i.e. structural risk minimization. The SVM is exactly such an algorithm, one that strives to minimize the structural risk.
Finally, it is noted that the above steps are only intended to illustrate the technical solution of the present invention and not to limit it. Those skilled in the art should understand that changes may be made to it in form and detail, but such changes must not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solution of the present invention.

Claims (4)

1. A gesture recognition method based on machine learning, characterized in that, based on computer vision and machine learning, color gesture images are acquired with a color camera, the acquired pictures are divided into a training set and a test set, and, after digital image processing, Hu moments and HOG features are fused and the support vector machine (SVM) algorithm is used to classify and recognize the gestures.
2. The method according to claim 1, wherein the gesture action training set is established by storing the human motion gesture data captured by the color camera in the PC in image format, then preprocessing the images and classifying the acquired data, characterized in that:
Step A1: the establishment of the action training set mainly acquires the data and saves the data, completing the construction of the gesture action library;
Step A2: the acquired image data are preprocessed to obtain the gesture contour images;
Step A3: the processed image data are sorted into different folders by action class, inferior images are deleted, the number of images in each class is unified, the pictures are renamed with filenames containing the label value, and an lmdb-format file is generated.
3. The method according to claim 1, wherein the Hu moments are invariant to scale, rotation and translation, are suitable for matching and are fast to recognize; the Hu moments use the second- and third-order central moments to construct seven invariant moments. Combining the Hu moments with the HOG features extracts the image feature information and improves the recognition accuracy of the machine training.
4. The method according to claim 1, wherein the SVM support vector machine is an algorithm for training a machine and letting it learn; it can solve regression and classification problems, kernel functions can be used to transform the data, and an optimal boundary or hyperplane is then found among the possible outputs on the basis of the transformed information; the feature data computed with the SVM are used for classification and recognition, the recognition success rate of the invention is high, and the training time is significantly reduced.
CN201910422669.2A 2019-05-17 2019-05-17 A kind of static gesture identification method based on machine learning Pending CN110147764A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910422669.2A CN110147764A (en) 2019-05-17 2019-05-17 A kind of static gesture identification method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910422669.2A CN110147764A (en) 2019-05-17 2019-05-17 A kind of static gesture identification method based on machine learning

Publications (1)

Publication Number Publication Date
CN110147764A true CN110147764A (en) 2019-08-20

Family

ID=67592470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910422669.2A Pending CN110147764A (en) 2019-05-17 2019-05-17 A kind of static gesture identification method based on machine learning

Country Status (1)

Country Link
CN (1) CN110147764A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110110560A1 (en) * 2009-11-06 2011-05-12 Suranjit Adhikari Real Time Hand Tracking, Pose Classification and Interface Control
CN104134061A (en) * 2014-08-15 2014-11-05 上海理工大学 Number gesture recognition method for support vector machine based on feature fusion
CN104680127A (en) * 2014-12-18 2015-06-03 闻泰通讯股份有限公司 Gesture identification method and gesture identification system
CN108416268A (en) * 2018-02-02 2018-08-17 华侨大学 A kind of action identification method based on dual robot Visual Communication
CN108537154A (en) * 2018-03-28 2018-09-14 天津大学 Transmission line of electricity Bird's Nest recognition methods based on HOG features and machine learning
CN108647654A (en) * 2018-05-15 2018-10-12 合肥岚钊岚传媒有限公司 The gesture video image identification system and method for view-based access control model
CN108921011A (en) * 2018-05-15 2018-11-30 合肥岚钊岚传媒有限公司 A kind of dynamic hand gesture recognition system and method based on hidden Markov model

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204995A (en) * 2021-03-31 2021-08-03 广州朗国电子科技有限公司 Behavior password intelligent door lock identification method, equipment and medium

Similar Documents

Publication Publication Date Title
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
Hasan et al. RETRACTED ARTICLE: Static hand gesture recognition using neural networks
Munib et al. American sign language (ASL) recognition based on Hough transform and neural networks
Kumar et al. Sign language recognition
WO2022110564A1 (en) Smart home multi-modal human-machine natural interaction system and method thereof
CN110688965B (en) IPT simulation training gesture recognition method based on binocular vision
CN111626297A (en) Character writing quality evaluation method and device, electronic equipment and recording medium
Shamrat et al. Bangla numerical sign language recognition using convolutional neural networks
CN110110603A (en) A kind of multi-modal labiomaney method based on facial physiologic information
CN114821764A (en) Gesture image recognition method and system based on KCF tracking detection
Elakkiya et al. Intelligent system for human computer interface using hand gesture recognition
CN110147764A (en) A kind of static gesture identification method based on machine learning
Gaikwad et al. Recognition of American sign language using image processing and machine learning
CN108255298B (en) Infrared gesture recognition method and device in projection interaction system
Zheng et al. Review of lip-reading recognition
Kakkar Facial expression recognition with LDPP & LTP using deep belief network
Itkarkar et al. A study of vision based hand gesture recognition for human machine interaction
Brahmankar et al. Indian sign language recognition using canny edge detection
Yang et al. Street view house number identification based on deep learning
Prasad et al. Fuzzy classifier for continuous sign language recognition from tracking and shape features
Kane et al. Sign Language apprehension using convolution neural networks
Khati et al. Text Generation through Hand Gesture Recognition
Wang et al. Facial feature extraction and image-based face drawing
Gedam et al. Challenges and opportunities in finger spelling recognition in air
Das et al. Extraction of time invariant lips based on morphological operation and corner detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190820