CN109190496A - Monocular static gesture recognition method based on multi-feature fusion - Google Patents

Monocular static gesture recognition method based on multi-feature fusion

Info

Publication number
CN109190496A
CN109190496A (application CN201810900949.5A)
Authority
CN
China
Prior art keywords
gesture
image
hand
value
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810900949.5A
Other languages
Chinese (zh)
Inventor
周智恒
许冰媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201810900949.5A
Publication of CN109190496A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 - Static hand or arm
    • G06V40/113 - Recognition of static hand signs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/136 - Segmentation; Edge detection involving thresholding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/56 - Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a monocular static gesture recognition method based on multi-feature fusion. The steps include: gesture image acquisition, in which an RGB image containing the gesture is captured with a monocular camera; image preprocessing, in which skin color segmentation is performed using human skin color information, the hand is separated from the complex background using morphological processing combined with the geometric features of the hand, and the palm center is located by a distance transform operation to remove any arm region attached to the hand, yielding a binary gesture image; gesture feature extraction, in which the perimeter-to-area ratio, Hu moments and Fourier descriptor features of the gesture are computed to form a gesture feature vector; and gesture recognition, in which the gesture feature vector is fed into a BP neural network for training, realizing static gesture classification. By combining skin color information with the geometric features of the hand, and using morphological processing and the distance transform, the invention achieves accurate gesture segmentation under monocular vision; by combining multiple gesture features and training a BP neural network, a gesture classifier with strong robustness and high accuracy is obtained.

Description

Monocular static gesture recognition method based on multi-feature fusion
Technical field
The present invention relates to the field of image recognition, and in particular to a monocular static gesture recognition method based on multi-feature fusion.
Background technique
As a natural and intuitive mode of interaction, gesture has gradually become a research hotspot in the field of human-computer interaction and is widely used in motion-sensing games, robot control and computing. Compared with gesture recognition technology based on data gloves, vision-based gesture recognition has advantages such as low equipment requirements and natural interaction, and has become the main approach to gesture recognition.
Gesture segmentation is the key link in vision-based gesture recognition: the segmentation result affects feature extraction and, in turn, the gesture classification result. In static gesture recognition methods based on monocular vision, the result of gesture segmentation is often unsatisfactory owing to the influence of complex background environments. With the appearance of the Kinect camera, depth information has been used in research on separating the gesture from a complex background; however, because the Kinect camera is relatively costly, it is not widely deployed, so such gesture recognition methods cannot be popularized. Moreover, the gesture features used by existing monocular static gesture recognition methods are relatively simple, which makes the gesture recognition system less robust and its recognition accuracy low. A monocular static gesture recognition method that can accurately segment and recognize gestures under a complex background is therefore an urgent problem to be solved.
Summary of the invention
The purpose of the present invention is to solve the above drawbacks in the prior art by providing a monocular static gesture recognition method based on multi-feature fusion.
The purpose of the present invention can be achieved by adopting the following technical scheme:
A monocular static gesture recognition method based on multi-feature fusion, the recognition method comprising:
a gesture image acquisition step: capturing an RGB image containing the gesture with a monocular camera;
an image preprocessing step: performing skin color segmentation using human skin color information to extract the skin-color and skin-color-like regions in the image, separating the hand from the complex background using morphological processing combined with the geometric features of the hand, locating the palm center by a distance transform operation and removing any arm region attached to the hand, to obtain a binary gesture image;
a gesture feature extraction step: computing the perimeter-to-area ratio, Hu moments and Fourier descriptor features of the gesture to form a gesture feature vector;
a gesture recognition step: taking the extracted gesture feature vector as the input of a BP neural network and realizing static gesture classification by training the BP neural network.
Further, the image preprocessing step includes:
skin color segmentation: converting the input image to another color space and extracting the skin-color and skin-color-like regions in the image by chroma threshold segmentation, to obtain a binary image;
morphological processing: applying morphological processing to the binary image after skin color segmentation, first an opening operation and then a closing operation, to eliminate isolated noise in the image;
hand geometric-feature segmentation: separating the face and the hand from the complex background, computing the shape complexity C of the two remaining connected regions, and extracting the hand binary image by comparison with a threshold T;
removing the arm from the hand binary image to obtain the binary gesture image.
Further, the process of converting the input image to another color space and extracting the skin-color and skin-color-like regions in the image by chroma threshold segmentation to obtain a binary image is as follows:
color space conversion: the input image is transformed from the RGB color space into the YCr'Cb' color space; the conversion formula for the luminance component is:
y = 0.299 × r + 0.587 × g + 0.114 × b
where r, g, b are the red, green and blue components of the image in the RGB color space, and y, cr', cb' are the luminance, red chroma and blue chroma components of the image in the YCr'Cb' color space;
chroma threshold segmentation: if the two chroma components of a pixel simultaneously fall within the skin-color threshold ranges of the cr' and cb' components, the pixel value is set to 1, otherwise it is set to 0, thereby extracting the skin-color and skin-color-like regions in the image and obtaining a binary image.
Further, the process of hand geometric-feature segmentation (separating the face and the hand from the complex background, computing the shape complexity C of the two remaining connected regions, and extracting the hand binary image by comparison with a threshold T) is as follows:
area screening: the areas of the different connected regions in the image are computed using an eight-connectivity labeling algorithm, and the two connected regions with the largest areas are extracted, thereby separating the face and the hand from the complex background;
shape complexity threshold judgment: the shape complexity C of each of the two remaining connected regions is computed; if the shape complexity C of a connected region is greater than the threshold T, the region is considered a non-hand region and removed, so as to obtain the hand binary image.
Further, the process of removing the arm from the hand binary image to obtain the binary gesture image is as follows:
palm center location: a distance transform operation is used to compute, for each hand pixel, the minimum distance from the hand boundary; the distance value replaces the original pixel value, while all regions outside the hand are set to 0; in the image obtained after the distance transform, the pixel with the maximum value is the palm center, and its value is R0;
palm cutting: the value of every pixel whose distance from the palm center is less than R1 is set to 0, thereby removing the palm region, where R1 = 1.35 × R0;
threshold judgment of arm presence: the pixel P with the maximum value in the resulting image is located, its value being Pvalue; Pvalue/R0 is computed, and if this value is greater than a threshold T1, the region where P lies is an arm region and the next arm removal operation is carried out; otherwise no arm region exists in the hand binary image and the method proceeds to the gesture feature extraction step;
arm removal: the region where P lies is removed using the eight-connectivity labeling algorithm;
XOR operation: the binary gesture image is finally obtained by an XOR operation between images.
Further, the gesture feature extraction step includes:
computing the 7 invariant moments of the gesture to form the Hu moment features;
computing the perimeter and area of the gesture region using the eight-connectivity labeling algorithm, and computing the perimeter-to-area ratio;
computing the Fourier descriptor features of the gesture contour;
combining the Hu moment features, the perimeter-to-area ratio and the Fourier descriptor features to form an 18-dimensional gesture feature vector.
Further, the process of computing the Fourier descriptor features of the gesture contour is as follows:
the coordinates {(xk, yk)} of the gesture contour edge are represented as complex numbers, forming the complex sequence {ck}, where ck is expressed as:
ck = xk + i·yk, k = 0, 1, 2, ..., N-1;
a Fourier transform is applied to the discrete sequence {ck} to obtain the Fourier coefficient sequence {C(u)}, with the formula:
C(u) = Σ ck · e^(-i2πuk/N), the sum taken over k = 0, 1, ..., N-1;
the 10 Fourier coefficients starting from u = 1 are extracted, their moduli are taken and normalized, forming the Fourier descriptor features.
Further, the BP neural network includes an input layer, a hidden layer and an output layer. The input layer has d neurons, determined by the dimension of the gesture feature vector; the output layer has s neurons, determined by the number of gesture classes; the hidden layer has q neurons. The connection weight between the i-th neuron of the input layer and the h-th neuron of the hidden layer is vih; the connection weight between the h-th neuron of the hidden layer and the j-th neuron of the output layer is whj; the threshold of the h-th neuron of the hidden layer is γh; and the threshold of the j-th neuron of the output layer is θj.
Further, before the gesture recognition step, the method further includes:
a BP neural network training step, in which the gesture feature vectors of the training samples are input to train the BP neural network, the process being as follows:
randomly initialize the weights and thresholds, the value range of the initial weights being [-1, 1] and the value range of the initial thresholds being [-0.5, 0.5];
input the gesture feature vector (x1, x2, ..., x18) of a training sample;
compute the output data of each layer, wherein the BP neural network uses the sigmoid function as the activation function of the neurons in each layer:
f(x) = 1 / (1 + e^(-x));
the output value of the h-th neuron of the hidden layer is αh, computed as:
αh = f(Σ vih·xi - γh), the sum taken over the d input neurons;
the output value of the j-th neuron of the output layer is ŷj, computed as:
ŷj = f(Σ whj·αh - θj), the sum taken over the q hidden neurons;
compute the mean square error E:
E = (1/2) Σ (ŷj - yj)², the sum taken over the s output neurons,
where (y1, y2, ..., y8) is the class label of the training sample;
parameter update: when E is greater than the set error, the weights and thresholds of the network are updated by gradient descent to correct the current BP neural network; when E is less than the set error, training stops and the optimal model parameters are obtained.
Further, the shape complexity C is computed as:
C = 4πA / p²
where A is the area of the connected region and p is the perimeter of the connected region.
Compared with the prior art, the present invention has the following advantages and effects:
(1) The present invention extracts the hand region using human skin color information and the geometric features of the hand, and removes any attached arm region using a distance transform operation, achieving precise separation of the gesture from a complex background;
(2) By combining multiple effective gesture features to train a BP neural network, the present invention realizes a gesture recognition system with strong robustness and high accuracy;
(3) Based on a common monocular camera, the static gesture recognition method of the present invention has the advantages of low equipment cost, high recognition accuracy, and ease of popularization.
Brief description of the drawings
Fig. 1 is the flow chart of the monocular static gesture recognition method based on multi-feature fusion disclosed in the present invention;
Fig. 2 is the flow chart of image preprocessing in the monocular static gesture recognition method based on multi-feature fusion disclosed in the present invention;
Fig. 3 is the flow chart of gesture feature extraction in the monocular static gesture recognition method based on multi-feature fusion disclosed in the present invention;
Fig. 4 is the flow chart of gesture recognition in the monocular static gesture recognition method based on multi-feature fusion disclosed in the present invention.
Specific embodiment
To make the objects, technical schemes and advantages of the embodiments of the invention clearer, the technical schemes in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
Embodiment
As shown in Figure 1, the monocular static gesture recognition method based on multi-feature fusion proceeds as follows: a gesture image acquisition step, an image preprocessing step, a gesture feature extraction step and a gesture recognition step.
S1, the gesture image acquisition step:
An RGB image containing the gesture is captured using a monocular camera. The monocular camera should be positioned directly in front of the human body, so that the face and the hand are the two largest of all skin-color and skin-color-like regions in the captured image.
S2, the image preprocessing step:
As shown in Fig. 2, the image preprocessing step proceeds as follows:
S201, skin color segmentation, the detailed process being as follows:
S2011, color space conversion: the input image is transformed from the RGB color space into the YCr'Cb' color space; the conversion formula for the luminance component is:
y = 0.299 × r + 0.587 × g + 0.114 × b
where r, g, b are the red, green and blue components of the image in the RGB color space, and y, cr', cb' are the luminance, red chroma and blue chroma components of the image in the YCr'Cb' color space.
S2012, chroma threshold segmentation: if the two chroma components of a pixel simultaneously fall within the skin-color threshold ranges of the cr' and cb' components, the pixel value is set to 1, otherwise it is set to 0, thereby extracting the skin-color and skin-color-like regions in the image and obtaining a binary image.
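For illustration, a minimal Python/OpenCV sketch of steps S2011 and S2012 follows. The chroma threshold ranges used here (133 to 173 for Cr, 77 to 127 for Cb) are common textbook values, and OpenCV's standard YCrCb conversion stands in for the patent's improved YCr'Cb' space; the patent's exact formulas and thresholds are not reproduced in this text.

```python
import cv2
import numpy as np

def skin_segmentation(bgr_image):
    """Chroma-threshold skin segmentation (sketch of S2011-S2012).

    Note: OpenCV's YCrCb conversion is used in place of the patent's
    improved YCr'Cb' space, and the threshold ranges below are common
    textbook values, not the patent's exact ones.
    """
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    _, cr, cb = cv2.split(ycrcb)
    # A pixel is skin-colored only if BOTH chroma components are in range.
    mask = (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)
    return mask.astype(np.uint8)  # binary image: 1 = skin, 0 = background
```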
S202, morphological processing, the detailed process being as follows:
Morphological processing is applied to the binary image after skin color segmentation, first an opening operation and then a closing operation, which eliminates most of the isolated noise in the image.
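A sketch of S202, assuming a 5 × 5 elliptical structuring element (the patent does not specify the kernel shape or size):

```python
import cv2

def denoise_mask(mask, kernel_size=5):
    """Opening followed by closing (S202). The 5x5 elliptical structuring
    element is an assumed choice, not specified by the patent."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove specks
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return closed
```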
S203, hand geometric-feature segmentation, the detailed process being as follows:
S2031, area screening: the areas of the different connected regions in the image are computed using an eight-connectivity labeling algorithm, and the two connected regions with the largest areas are extracted, thereby separating the face and the hand from the complex background.
S2032, shape complexity threshold judgment: the shape complexity of each of the two remaining connected regions is computed; if the shape complexity C of a connected region is greater than a threshold T, the region is considered a non-hand region and removed, so as to obtain the hand binary image. In the present invention T = 0.3 works best. The shape complexity C is computed as:
C = 4πA / p²
where A is the area of the connected region and p is the perimeter of the connected region.
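A sketch of S2031 and S2032, assuming the compactness measure C = 4πA/p² reconstructed above (the patent's exact complexity formula was given only as an image):

```python
import cv2
import numpy as np

def extract_hand(mask, T=0.3):
    """Area screening + shape complexity filter (S2031-S2032).

    A sketch assuming C = 4*pi*A/p^2, a standard compactness measure
    consistent with the patent's threshold T = 0.3: round regions
    (the face) score high and are discarded, the fingered hand scores low.
    """
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    # Two largest foreground regions (label 0 is the background).
    order = np.argsort(stats[1:, cv2.CC_STAT_AREA])[::-1] + 1
    hand = np.zeros_like(mask)
    for label in order[:2]:
        region = (labels == label).astype(np.uint8)
        contours, _ = cv2.findContours(region, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)
        area = cv2.contourArea(contours[0])
        perimeter = cv2.arcLength(contours[0], closed=True)
        C = 4 * np.pi * area / (perimeter ** 2 + 1e-9)
        if C <= T:  # low compactness -> complex outline -> keep as hand
            hand |= region
    return hand
```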
S204, arm removal, the specific procedure being as follows:
S2041, palm center location: a distance transform operation is used to compute, for each hand pixel, the minimum distance from the hand boundary; the distance value replaces the original pixel value, while all regions outside the hand are set to 0; in the image obtained after the distance transform, the pixel with the maximum value is the palm center, and its value is R0.
S2042, palm cutting: the value of every pixel whose distance from the palm center is less than R1 is set to 0, thereby removing the palm region; in the present invention R1 = 1.35 × R0 works best.
S2043, threshold judgment of arm presence: the pixel P with the maximum value in the resulting image is located, its value being Pvalue; Pvalue/R0 is computed, and if this value is greater than a threshold T1, the region where P lies is an arm region and the arm removal operation is carried out; otherwise no arm region exists in the hand binary image, which can be used directly for gesture feature extraction; in the present invention T1 = 0.35 works best.
S2044, arm removal: the region where P lies is removed using the eight-connectivity labeling algorithm.
S2045, XOR operation: the binary gesture image is finally obtained by an XOR operation between images.
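A sketch of S2041 through S2043; the labeling-based removal of the detected arm region and the final XOR (S2044, S2045) are omitted for brevity:

```python
import cv2
import numpy as np

def locate_palm_and_check_arm(hand_mask, R1_factor=1.35, T1=0.35):
    """Distance-transform palm location and arm test (S2041-S2043).

    `hand_mask` is a uint8 binary image (1 = hand). Returns the palm
    center, the palm radius R0, and whether an arm region is present.
    """
    # S2041: each hand pixel gets its minimum distance to the hand boundary.
    dist = cv2.distanceTransform(hand_mask, cv2.DIST_L2, maskSize=5)
    R0 = dist.max()
    cy, cx = np.unravel_index(np.argmax(dist), dist.shape)  # palm center

    # S2042: zero out everything within R1 = 1.35 * R0 of the palm center.
    yy, xx = np.mgrid[0:dist.shape[0], 0:dist.shape[1]]
    dist[(yy - cy) ** 2 + (xx - cx) ** 2 < (R1_factor * R0) ** 2] = 0

    # S2043: if the remaining maximum is large relative to R0, an arm exists.
    P_value = dist.max()
    arm_present = (P_value / R0) > T1
    return (cx, cy), R0, arm_present
```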
S3, the gesture feature extraction step:
As shown in Fig. 3, gesture features are extracted from the binary gesture image through the following steps:
S301, the 7 invariant moments of the gesture are computed, forming the Hu moment features.
S302, the perimeter and area of the gesture region are computed using the eight-connectivity labeling algorithm, and the perimeter-to-area ratio is computed.
S303, the Fourier descriptor features of the gesture contour are computed, the specific procedure being as follows:
S3031, the coordinates {(xk, yk)} of the gesture contour edge are represented as complex numbers, forming the complex sequence {ck}, where ck is expressed as:
ck = xk + i·yk, k = 0, 1, 2, ..., N-1;
S3032, a Fourier transform is applied to the discrete sequence {ck} to obtain the Fourier coefficient sequence {C(u)}:
C(u) = Σ ck · e^(-i2πuk/N), the sum taken over k = 0, 1, ..., N-1;
S3033, the 10 Fourier coefficients starting from u = 1 are extracted, their moduli are taken and normalized, forming the Fourier descriptor features.
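A sketch of S3031 through S3033 using NumPy's FFT; normalizing the moduli by the first retained modulus is an assumed convention (it yields scale invariance), since the patent only states that the moduli are normalized:

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=10):
    """Fourier descriptor features of a closed contour (S3031-S3033).

    `contour` is an (N, 2) array of (x, y) edge coordinates. Dividing by
    the first retained modulus is an assumed normalization convention."""
    ck = contour[:, 0] + 1j * contour[:, 1]  # S3031: complex sequence
    C = np.fft.fft(ck)                       # S3032: Fourier coefficients
    mags = np.abs(C[1:1 + n_coeffs])         # S3033: 10 moduli from u = 1
    return mags / (mags[0] + 1e-9)           # normalized descriptor
```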
S304, the Hu moment features, the perimeter-to-area ratio and the Fourier descriptor features are combined to form the 18-dimensional gesture feature vector.
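A sketch assembling the 18-dimensional vector of S301 through S304 with OpenCV, reusing the fourier_descriptors sketch above; the log-scaling of the Hu moments is an assumed (and common) stabilization, not specified by the patent:

```python
import cv2
import numpy as np

def gesture_feature_vector(gesture_mask):
    """Assemble the 18-dim feature vector (S301-S304): 7 Hu moments +
    1 perimeter/area ratio + 10 Fourier descriptors."""
    contours, _ = cv2.findContours(gesture_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    cnt = max(contours, key=cv2.contourArea)

    hu = cv2.HuMoments(cv2.moments(gesture_mask, binaryImage=True)).flatten()
    hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)  # assumed log-scaling

    perimeter = cv2.arcLength(cnt, closed=True)
    area = cv2.contourArea(cnt)
    ratio = perimeter / (area + 1e-9)                 # perimeter-to-area ratio

    fd = fourier_descriptors(cnt[:, 0, :])            # 10 Fourier descriptors
    return np.concatenate([hu, [ratio], fd])          # 18 dimensions
```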
S4, the gesture recognition step:
In the present invention, the extracted gesture feature vector is taken as the input of a BP neural network, and static gesture classification is realized by training the BP neural network. The BP neural network includes an input layer, a hidden layer and an output layer. The input layer has d neurons, determined by the dimension of the gesture feature vector; the output layer has s neurons, determined by the number of gesture classes; the hidden layer has q neurons. The connection weight between the i-th neuron of the input layer and the h-th neuron of the hidden layer is vih; the connection weight between the h-th neuron of the hidden layer and the j-th neuron of the output layer is whj; the threshold of the h-th neuron of the hidden layer is γh; and the threshold of the j-th neuron of the output layer is θj. The present invention chooses d = 18, q = 10 and s = 8, realizing the classification of 8 kinds of static gestures.
As shown in Fig. 4, before the gesture recognition step, the method further includes:
a BP neural network training step, in which the gesture feature vectors of the training samples are input to train the BP neural network, the detailed process being as follows:
i. Randomly initialize the weights and thresholds; the value range of the initial weights is [-1, 1] and the value range of the initial thresholds is [-0.5, 0.5].
ii. Input the gesture feature vector (x1, x2, ..., x18) of a training sample.
iii. Compute the output data of each layer:
the BP neural network of the present invention uses the sigmoid function as the activation function of the neurons in each layer:
f(x) = 1 / (1 + e^(-x));
the output value of the h-th neuron of the hidden layer is αh, computed as:
αh = f(Σ vih·xi - γh), the sum taken over the d input neurons;
the output value of the j-th neuron of the output layer is ŷj, computed as:
ŷj = f(Σ whj·αh - θj), the sum taken over the q hidden neurons.
iv. Compute the mean square error E:
E = (1/2) Σ (ŷj - yj)², the sum taken over the s output neurons,
where (y1, y2, ..., y8) is the class label of the training sample.
v. Parameter update: when E is greater than the set error, the weights and thresholds of the network are updated by gradient descent to correct the current BP neural network; when E is less than the set error, training stops and the optimal model parameters are obtained.
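The following is a minimal NumPy sketch of the 18-10-8 network and training update described above. The learning rate, the per-sample update scheme and the one-hot encoding of the 8 class labels are assumptions; the patent specifies only the initialization ranges, the sigmoid activation, the mean square error and gradient descent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BPNetwork:
    """Minimal sketch of the 18-10-8 BP network described above.
    lr and the per-sample (stochastic) update scheme are assumptions."""

    def __init__(self, d=18, q=10, s=8, lr=0.1, rng=None):
        rng = rng or np.random.default_rng(0)
        self.v = rng.uniform(-1, 1, (d, q))     # input -> hidden weights
        self.w = rng.uniform(-1, 1, (q, s))     # hidden -> output weights
        self.gamma = rng.uniform(-0.5, 0.5, q)  # hidden thresholds
        self.theta = rng.uniform(-0.5, 0.5, s)  # output thresholds
        self.lr = lr

    def forward(self, x):
        alpha = sigmoid(x @ self.v - self.gamma)      # hidden output
        y_hat = sigmoid(alpha @ self.w - self.theta)  # network output
        return alpha, y_hat

    def train_sample(self, x, y):
        """One gradient-descent update on a single (x, one-hot y) pair."""
        alpha, y_hat = self.forward(x)
        E = 0.5 * np.sum((y_hat - y) ** 2)       # mean square error
        g = (y_hat - y) * y_hat * (1 - y_hat)    # output-layer gradient
        e = alpha * (1 - alpha) * (self.w @ g)   # hidden-layer gradient
        self.w -= self.lr * np.outer(alpha, g)
        self.theta += self.lr * g                # thresholds are subtracted
        self.v -= self.lr * np.outer(x, e)       # in forward(), so they
        self.gamma += self.lr * e                # move in the + direction
        return E

    def predict(self, x):
        return np.argmax(self.forward(x)[1])
```

A classification pass (the classification stage described next) is then simply net.predict(feature_vector) on the 18-dimensional gesture feature vector.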
Classification stage: gesture classification is carried out with the trained BP neural network model, as follows: the gesture feature vector of a sample to be classified is input, the output data of each layer is computed, and the output value of the network is obtained, which is the classification result of the gesture.
In conclusion present embodiment discloses a kind of monocular static gesture identification method based on multi-feature fusion, the party Method utilizes the Extraction of Geometrical Features hand region of human body complexion information and hand, removes existing arm using range conversion operation Region, realization gesture and complex background are precisely separating.This method is by combining a variety of effective gesture feature training BP nerves Network realizes the high gesture recognition system of strong robustness, an accuracy rate.In addition, this method is real using common monocular cam Existing static gesture identification, has many advantages, such as that equipment cost is low, recognition accuracy is high, easy to promote and utilize.
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited by the above embodiment. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent substitute and shall be included within the protection scope of the present invention.

Claims (10)

1. A monocular static gesture recognition method based on multi-feature fusion, characterized in that the recognition method comprises:
a gesture image acquisition step: capturing an RGB image containing the gesture with a monocular camera;
an image preprocessing step: performing skin color segmentation using human skin color information to extract the skin-color and skin-color-like regions in the image, separating the hand from the complex background using morphological processing combined with the geometric features of the hand, locating the palm center by a distance transform operation and removing any arm region attached to the hand, to obtain a binary gesture image;
a gesture feature extraction step: computing the perimeter-to-area ratio, Hu moments and Fourier descriptor features of the gesture to form a gesture feature vector;
a gesture recognition step: taking the extracted gesture feature vector as the input of a BP neural network and realizing static gesture classification by training the BP neural network.
2. The monocular static gesture recognition method based on multi-feature fusion according to claim 1, characterized in that the image preprocessing step comprises:
skin color segmentation: converting the input image to another color space and extracting the skin-color and skin-color-like regions in the image by chroma threshold segmentation, to obtain a binary image;
morphological processing: applying morphological processing to the binary image after skin color segmentation, first an opening operation and then a closing operation, to eliminate isolated noise in the image;
hand geometric-feature segmentation: separating the face and the hand from the complex background, computing the shape complexity C of the two remaining connected regions, and extracting the hand binary image by comparison with a threshold T;
removing the arm from the hand binary image to obtain the binary gesture image.
3. The monocular static gesture recognition method based on multi-feature fusion according to claim 2, characterized in that the process of converting the input image to another color space and extracting the skin-color and skin-color-like regions in the image by chroma threshold segmentation to obtain a binary image is as follows:
color space conversion: the input image is transformed from the RGB color space into the YCr'Cb' color space; the conversion formula for the luminance component is:
y = 0.299 × r + 0.587 × g + 0.114 × b
where r, g, b are the red, green and blue components of the image in the RGB color space, and y, cr', cb' are the luminance, red chroma and blue chroma components of the image in the YCr'Cb' color space;
chroma threshold segmentation: if the two chroma components of a pixel simultaneously fall within the skin-color threshold ranges of the cr' and cb' components, the pixel value is set to 1, otherwise it is set to 0, thereby extracting the skin-color and skin-color-like regions in the image and obtaining a binary image.
4. The monocular static gesture recognition method based on multi-feature fusion according to claim 2, characterized in that the process of hand geometric-feature segmentation (separating the face and the hand from the complex background, computing the shape complexity C of the two remaining connected regions, and extracting the hand binary image by comparison with a threshold T) is as follows:
area screening: the areas of the different connected regions in the image are computed using an eight-connectivity labeling algorithm, and the two connected regions with the largest areas are extracted, thereby separating the face and the hand from the complex background;
shape complexity threshold judgment: the shape complexity C of each of the two remaining connected regions is computed; if the shape complexity C of a connected region is greater than the threshold T, the region is considered a non-hand region and removed, so as to obtain the hand binary image.
5. The monocular static gesture recognition method based on multi-feature fusion according to claim 2, characterized in that the process of removing the arm from the hand binary image to obtain the binary gesture image is as follows:
palm center location: a distance transform operation is used to compute, for each hand pixel, the minimum distance from the hand boundary; the distance value replaces the original pixel value, while all regions outside the hand are set to 0; in the image obtained after the distance transform, the pixel with the maximum value is the palm center, and its value is R0;
palm cutting: the value of every pixel whose distance from the palm center is less than R1 is set to 0, thereby removing the palm region, where R1 = 1.35 × R0;
threshold judgment of arm presence: the pixel P with the maximum value in the resulting image is located, its value being Pvalue; Pvalue/R0 is computed, and if this value is greater than a threshold T1, the region where P lies is an arm region and the next arm removal operation is carried out; otherwise no arm region exists in the hand binary image and the method proceeds to the gesture feature extraction step;
arm removal: the region where P lies is removed using the eight-connectivity labeling algorithm;
XOR operation: the binary gesture image is finally obtained by an XOR operation between images.
6. The monocular static gesture recognition method based on multi-feature fusion according to claim 1, characterized in that the gesture feature extraction step comprises:
computing the 7 invariant moments of the gesture to form the Hu moment features;
computing the perimeter and area of the gesture region using the eight-connectivity labeling algorithm, and computing the perimeter-to-area ratio;
computing the Fourier descriptor features of the gesture contour;
combining the Hu moment features, the perimeter-to-area ratio and the Fourier descriptor features to form an 18-dimensional gesture feature vector.
7. The monocular static gesture recognition method based on multi-feature fusion according to claim 6, characterized in that the process of computing the Fourier descriptor features of the gesture contour is as follows:
the coordinates {(xk, yk)} of the gesture contour edge are represented as complex numbers, forming the complex sequence {ck}, where ck is expressed as:
ck = xk + i·yk, k = 0, 1, 2, ..., N-1;
a Fourier transform is applied to the discrete sequence {ck} to obtain the Fourier coefficient sequence {C(u)}:
C(u) = Σ ck · e^(-i2πuk/N), the sum taken over k = 0, 1, ..., N-1;
the 10 Fourier coefficients starting from u = 1 are extracted, their moduli are taken and normalized, forming the Fourier descriptor features.
8. The monocular static gesture recognition method based on multi-feature fusion according to claim 1, characterized in that the BP neural network includes an input layer, a hidden layer and an output layer; the input layer has d neurons, determined by the dimension of the gesture feature vector; the output layer has s neurons, determined by the number of gesture classes; the hidden layer has q neurons; the connection weight between the i-th neuron of the input layer and the h-th neuron of the hidden layer is vih; the connection weight between the h-th neuron of the hidden layer and the j-th neuron of the output layer is whj; the threshold of the h-th neuron of the hidden layer is γh; and the threshold of the j-th neuron of the output layer is θj.
9. The monocular static gesture recognition method based on multi-feature fusion according to claim 8, characterized in that before the gesture recognition step the method further comprises:
a BP neural network training step, in which the gesture feature vectors of the training samples are input to train the BP neural network, the process being as follows:
randomly initialize the weights and thresholds, the value range of the initial weights being [-1, 1] and the value range of the initial thresholds being [-0.5, 0.5];
input the gesture feature vector (x1, x2, ..., x18) of a training sample;
compute the output data of each layer, wherein the BP neural network uses the sigmoid function as the activation function of the neurons in each layer:
f(x) = 1 / (1 + e^(-x));
the output value of the h-th neuron of the hidden layer is αh, computed as:
αh = f(Σ vih·xi - γh), the sum taken over the d input neurons;
the output value of the j-th neuron of the output layer is ŷj, computed as:
ŷj = f(Σ whj·αh - θj), the sum taken over the q hidden neurons;
compute the mean square error E:
E = (1/2) Σ (ŷj - yj)², the sum taken over the s output neurons,
where (y1, y2, ..., y8) is the class label of the training sample;
parameter update: when E is greater than the set error, the weights and thresholds of the network are updated by gradient descent to correct the current BP neural network; when E is less than the set error, training stops and the optimal model parameters are obtained.
10. The monocular static gesture recognition method based on multi-feature fusion according to claim 4, characterized in that the shape complexity C is computed as:
C = 4πA / p²
where A is the area of the connected region and p is the perimeter of the connected region.
CN201810900949.5A 2018-08-09 2018-08-09 Monocular static gesture recognition method based on multi-feature fusion Pending CN109190496A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810900949.5A CN109190496A (en) 2018-08-09 2018-08-09 Monocular static gesture recognition method based on multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810900949.5A CN109190496A (en) 2018-08-09 2018-08-09 Monocular static gesture recognition method based on multi-feature fusion

Publications (1)

Publication Number Publication Date
CN109190496A true CN109190496A (en) 2019-01-11

Family

ID=64921162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810900949.5A Pending CN109190496A (en) 2018-08-09 2018-08-09 Monocular static gesture recognition method based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN109190496A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293057A (en) * 2016-07-20 2017-01-04 西安中科比奇创新科技有限责任公司 Gesture identification method based on BP neutral net
CN106503619A (en) * 2016-09-23 2017-03-15 南京理工大学 Gesture identification method based on BP neural network
CN106503651A (en) * 2016-10-21 2017-03-15 上海未来伙伴机器人有限公司 A kind of extracting method of images of gestures and system
CN108108648A (en) * 2016-11-24 2018-06-01 广州映博智能科技有限公司 A kind of new gesture recognition system device and method
CN107133562A (en) * 2017-03-17 2017-09-05 华南理工大学 A kind of gesture identification method based on extreme learning machine

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
周龙: "Research on gesture behavior recognition technology based on neural network methods", China Masters' Theses Full-text Database, Information Science and Technology *
徐常青 et al.: "Mathematical Experiments and Software Computation", 31 January 2014 *
曹建秋 et al.: "Skin color segmentation based on an improved YCrCb color space", Journal of Chongqing Jiaotong University (Natural Science) *
杨帆 et al.: "Mastering Classic Image Processing Algorithms (MATLAB Edition)", 30 April 2014 *
赵小川 et al.: "MATLAB Digital Image Processing: From Simulation to Automatic C/C++ Code Generation", 30 September 2015 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021517281A (en) * 2019-02-26 2021-07-15 南京郵電大学Nanjing University Of Posts And Telecommunications Multi-gesture fine division method for smart home scenes
WO2020173024A1 (en) * 2019-02-26 2020-09-03 南京邮电大学 Multi-gesture precise segmentation method for smart home scenario
CN109934159A * 2019-03-11 2019-06-25 西安邮电大学 Multi-feature fusion gesture recognition method
WO2020237519A1 (en) * 2019-05-29 2020-12-03 深圳大学 Identification method, apparatus and device, and storage medium
CN110796033A (en) * 2019-10-12 2020-02-14 江苏科技大学 Static gesture recognition method based on bounding box model
CN110796033B (en) * 2019-10-12 2023-07-28 江苏科技大学 Static gesture recognition method based on bounding box model
CN111160194A (en) * 2019-12-23 2020-05-15 浙江理工大学 Static gesture image recognition method based on multi-feature fusion
CN111258430A (en) * 2020-01-21 2020-06-09 哈尔滨拓博科技有限公司 Desktop interaction system based on monocular gesture control
CN111339970A (en) * 2020-03-02 2020-06-26 上海化学工业区公共管廊有限公司 Smoking behavior detection method suitable for public environment
CN111339970B (en) * 2020-03-02 2023-04-07 上海化学工业区公共管廊有限公司 Smoking behavior detection method suitable for public environment
CN111901681B (en) * 2020-05-04 2022-09-30 东南大学 Intelligent television control device and method based on face recognition and gesture recognition
CN111901681A (en) * 2020-05-04 2020-11-06 东南大学 Intelligent television control device and method based on face recognition and gesture recognition
CN112034981A (en) * 2020-08-20 2020-12-04 深圳创维-Rgb电子有限公司 Display terminal control method, display terminal, and computer-readable storage medium
CN112068705A (en) * 2020-09-15 2020-12-11 山东建筑大学 Bionic robot fish interaction control method and system based on gesture recognition
CN112232217A (en) * 2020-10-16 2021-01-15 怀化新大地电脑有限公司 Gesture recognition system
CN112232217B (en) * 2020-10-16 2022-08-02 怀化新大地电脑有限公司 Gesture recognition system
CN112906550A (en) * 2021-02-09 2021-06-04 哈尔滨理工大学 Static gesture recognition method based on watershed transformation
CN112906550B (en) * 2021-02-09 2022-07-19 哈尔滨理工大学 Static gesture recognition method based on watershed transformation

Similar Documents

Publication Publication Date Title
CN109190496A (en) Monocular static gesture recognition method based on multi-feature fusion
Nikam et al. Sign language recognition using image based hand gesture recognition techniques
CN104268583B (en) Pedestrian re-recognition method and system based on color area features
CN108875787A (en) A kind of image-recognizing method and device, computer equipment and storage medium
CN108256421A (en) A kind of dynamic gesture sequence real-time identification method, system and device
CN106778785B (en) Construct the method for image Feature Selection Model and the method, apparatus of image recognition
CN109558832A (en) A kind of human body attitude detection method, device, equipment and storage medium
Konwar et al. An American sign language detection system using HSV color model and edge detection
CN111274921B (en) Method for recognizing human body behaviors by using gesture mask
CN109214297A (en) A kind of static gesture identification method of combination depth information and Skin Color Information
Bilal et al. A hybrid method using haar-like and skin-color algorithm for hand posture detection, recognition and tracking
CN106446862A (en) Face detection method and system
CN111158491A (en) Gesture recognition man-machine interaction method applied to vehicle-mounted HUD
CN112906550B (en) Static gesture recognition method based on watershed transformation
Vishwakarma et al. Simple and intelligent system to recognize the expression of speech-disabled person
CN106097354A (en) A kind of combining adaptive Gauss Face Detection and the hand images dividing method of region growing
Barkoky et al. Static hand gesture recognition of Persian sign numbers using thinning method
CN111080670A (en) Image extraction method, device, equipment and storage medium
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
CN110956099A (en) Dynamic gesture instruction identification method
CN111160194B (en) Static gesture image recognition method based on multi-feature fusion
CN103778430B (en) Rapid face detection method based on combination between skin color segmentation and AdaBoost
CN106909884A (en) A kind of hand region detection method and device based on hierarchy and deformable part sub-model
Lei et al. A novel side face contour extraction algorithm for driving fatigue statue recognition
Sokhib et al. A combined method of skin-and depth-based hand gesture recognition.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190111)