CN106909884B

CN106909884B - Hand region detection method and device based on layered structure and deformable part model

Info

Publication number: CN106909884B
Application number: CN201710035087.XA
Authority: CN
Inventors: 丁希仑; 齐静; 徐坤; 杨帆; 郑羿; 陈佳伟
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2017-01-17
Filing date: 2017-01-17
Publication date: 2020-08-04
Anticipated expiration: 2037-01-17
Also published as: CN106909884A

Abstract

The invention discloses a hand region detection method and a device based on a layered structure and a deformable part model, wherein the method comprises the following steps: step 1: establishing a gesture picture library; step 2: detecting and segmenting an upper body region of a human body from an input gesture image; and step 3: establishing a skin color model, and extracting a skin color similar region in the upper half body image of the human body; and 4, step 4: segmenting a skin color similar region from the original RGB image by deleting morphological operations such as small-area region and cavity filling in the binary image and aspect ratio and area threshold of the minimum connected region; and 5: the method comprises the steps of segmenting an original image into skin color similar areas, carrying out size normalization, extracting directional gradient histogram features, using a support vector machine to establish a hand detection model, and then using a trained hand model to detect a hand area.

Description

Hand region detection method and device based on layered structure and deformable part model

Technical Field

The invention relates to the field of specific object detection in computer vision, in particular to a hand region detection method and device based on a layered structure and a deformable part model.

Background

The gesture is one of natural interaction channels, and has important research value and wide application prospect. The first step of gesture recognition (static gesture and dynamic gesture) and hand tracking is also an important step, namely, the hand region is segmented from the image. Hand region segmentation is the basis for gesture (static gesture and dynamic gesture) recognition and hand tracking, and the effect of segmentation directly affects the effect of gesture recognition or gesture tracking, and therefore, it is necessary to study this.

With the wide application of robots, human-computer interaction technology is increasingly receiving attention from people. The gesture recognition based on vision has the advantages of natural interaction and wide application prospect, and is man-machine interactionImportant constituent parts of each other^[2]. Hand detection and segmentation are the basis for gesture recognition and gesture tracking, and the effect of segmentation directly influences the effect of gesture recognition or gesture tracking^[3]。

In the interaction process of the robot and the human body, when the video acquisition equipment arranged on the robot has a certain distance with the human body, the acquired photos comprise the whole human body. Because a large number of backgrounds exist in the pictures, the hand area is only a small part of the pictures, and how to detect the hand from the large number of background areas and divide the hand, a foundation is laid for gesture recognition, and the method is a problem worthy of research.

Disclosure of Invention

The present invention is directed to solve the above problems, and an object of the present invention is to provide a hand region detection method based on a layered structure and a deformable portion model, which is applicable to a gesture image including a whole body of a human body under a complex background, and can quickly and accurately detect a hand region in the gesture image.

A hand region detection method based on a layered structure and a deformable part model, comprising the steps of:

step 1: establishing a gesture picture library;

the gesture picture library includes: predefining x types of gestures by y testers, wherein two hands of each gesture are made in turn, the gesture of each hand is shot at three distances respectively, and the three distances are d from the camera to the shot person₁m、d₂m and d₃m, shooting each person for three times at the same distance, wherein the persons are respectively shot in the center, the left side and the right side of the image in n environments;

step 2: detecting and segmenting an upper body region of a human body from an input gesture image;

firstly, carrying out bilateral filtering on an extracted image, then extracting Haar wavelet characteristics of a face, a neck, shoulders, elbows and hands, using a Viola-Jones detector and a cascade classifier to train to obtain an upper body model of a human body, and then using the trained upper body model of the human body to detect an upper body region of the human body;

and step 3: establishing a skin color model, and extracting a skin color similar region in the upper half body image of the human body;

and 4, step 4: segmenting a skin color similar region from the original RGB image by deleting morphological operations such as small-area region and cavity filling in the binary image and aspect ratio and area threshold of the minimum connected region;

and 5: carrying out size normalization on skin-color-like areas segmented from an original image, taking the skin-color-like areas as the input of a deformable part model, training a hand model by using the deformable part model, and detecting the hand areas by using the trained hand model; wherein the deformable part model employs HOG features.

A hand region detection device based on a layered structure and a deformable part model comprises a gesture library module, a human body detection module, a skin color detection module, a region adjustment module and a hand detection module;

the gesture library module is used for providing gesture pictures;

the human body detection module carries out bilateral filtering on the input human body whole body gesture image under the complex background, then Haar wavelet characteristics of the face, the neck, the shoulders, the elbows and the hands are extracted, a Viola-Jones detector is used, a cascade classifier is used for training to obtain a human body upper body model, and then the trained human body upper body model is used for detecting a human body upper body region;

the skin color detection module extracts a skin color similar region meeting the skin color model after detecting the human body upper body region by the human body detection module;

the area adjusting module is used for adjusting the result of skin color detection and segmenting the skin color-like area from the original RGB image by deleting the minimum area in the binary image, filling the cavity morphological operation, and the length-width ratio and the area threshold of the minimum connected area;

the hand detection module performs size normalization on the skin-color-like region segmented by the region adjustment module, the skin-color-like region is used as the input of a deformable part model, the deformable part model is used for training a hand model, and the trained hand model is used for detecting a hand region; wherein the deformable part model employs HOG features.

The invention has the advantages that:

(1) the hand region detection method based on the layered structure and the deformable part model is based on ROS (Robot Operating system), has good portability and can be used for various Robot systems;

(2) the hand region detection method based on the layered structure and the deformable part model adopts a screening strategy and the layered structure to reduce the regions layer by layer, and the calculated amount is less. Firstly, detecting an upper half body area of a human body, then filtering out a skin color similar area by using skin color conditions, carrying out area adjustment on the skin color similar area, then training a deformable part model of the hand, and detecting the hand area by using the trained deformable part model of the hand;

(3) according to the hand region detection method based on the layered structure and the deformable part model, the layered structure is adopted, the upper half body region of a human body is detected, and the influence of a complex environment on the hand region detection is reduced to a certain extent; the second layer detects the skin color in the upper half of the human body and removes non-skin color areas; the third layer adjusts the detected skin color area, and the fourth layer detects the hand.

(4) The invention provides a hand detection method based on a skin color model, which comprises the steps of firstly carrying out rough detection on skin colors, removing non-skin color areas and reducing calculated amount, then extracting direction gradient histogram features from the detected skin color areas, and establishing a hand detection model by using a support vector machine.

(5) The invention provides a skin color model based on the combination of an RGB-YCbCr color space explicit threshold and a Gaussian model, which comprehensively uses the RGB-YCbCr explicit threshold and a statistical model and can better detect a skin color area. The RGB-YCbCr color space uses a method of combining a threshold value comprehensive consideration experience threshold value with a sample calculation threshold value.

(6) The modular hand region detection device based on the ROS adopts a customized design and a modular structure, functions of all modules are relatively independent, and a user can add, delete or replace some modules according to needs, so that the universality of the device is improved.

Drawings

FIG. 1 is a flow chart of a hand region detection method based on a layered structure and a deformable part model according to the present invention;

FIG. 2 is a hand region detection apparatus based on a layered structure and a deformable portion model according to the present invention;

FIG. 3 is a diagram of 12 gesture images in the gesture library provided by the present invention;

FIG. 4 is a photograph of a gesture library used in embodiments of the present invention;

FIG. 5 is a diagram illustrating a result of detecting an upper body region of a human body according to an embodiment of the present invention;

FIG. 6 shows the result of step 1 in the embodiment of the present invention; wherein (a) is an object in environment 1, (b) is an object in environment 2, and (c) is a human upper body region;

FIG. 7 shows the result of filling holes in an embodiment of the present invention;

FIG. 8 shows the result of marking the minimum connected region in the embodiment of the present invention;

FIG. 9 shows the result of skin-like color region detection in an embodiment of the present invention;

fig. 10 shows the segmentation result of the skin-like color region in the embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples.

The invention provides a hand region detection method based on a layered structure and a deformable part model, which specifically comprises the following steps, as shown in fig. 1:

step 1: establishing a gesture picture library;

because the invention is based on the robot background, the gesture detection lays a solid foundation for the operator to control the robot to move by using the gesture. Therefore, the gesture picture library of the present invention comprises: y testers predefining x gesture types, wherein each gesture is respectively at three distances (the camera is respectively at a distance d from the shot person)₁m,d₂m and d₃m) taking pictures, each person takes pictures three times with the same distance in each hand, and the people respectively take pictures in the center, the left side and the right side of the image and in two environments of a laboratory and a corridor, as shown in the figure3, the number of the collected photos is 2 × 3 × 3 × 2xy 36 xy.. fig. 3 is a part of gesture images provided in the gesture picture library of the present invention, fig. 4 is a photo of two hands of the same gesture provided in the gesture picture library of the present invention at different distances and different positions, and fig. 5 is a photo in the gesture library used in the embodiment of the present invention;

firstly, bilateral filtering is carried out on the extracted image, then Haar wavelet characteristics of the face, the neck, the shoulders, the elbows and the hands are extracted, a Viola-Jones detector is used, a cascade classifier is used for training to obtain a human upper body model, and then the trained human upper body model is used for detecting the human upper body region. The advantage of this model is that a priori knowledge of human morphology is used: the robustness of detecting the upper half of the human body is strong by local context information.

And step 3: and establishing a skin color model, and extracting a skin color similar region in the upper half body image of the human body.

The invention adopts the skin color model, and aims to take skin color detection as the pretreatment of hand detection, thereby reducing the calculation cost in the hand detection process.

The skin color model used in the invention comprehensively adopts a method of combining an RGB-YCbCr color space explicit threshold value with a Gaussian model. The explicit threshold adopts an explicit threshold of an RGB-YCbCr color space, and the operation speed is high. The Gaussian model is a statistical-based model, and the combination of the Gaussian model and the statistical-based model can better detect skin color.

The method specifically comprises the following steps:

and establishing a skin color model by comprehensively using the RGB-YCbCr explicit threshold and a Gaussian model.

Although the RGB color space is commonly used, the RGB color space is greatly influenced by the light intensity, the skin color is better clustered in the YCbCr color space, and the cross area of the skin color and the non-skin color distribution range is smaller. Therefore, the invention comprehensively uses the explicit threshold value to detect the skin color in the RGB and YCbCr color spaces, and simultaneously combines the Gaussian skin color model to establish the skin color model of the invention in order to better detect the skin color.

(1) Skin tone regions are detected using an RGB color space explicit threshold.

Extracting a hand skin color area of the established gesture library, reading the skin color pixel values, and analyzing the following relationship among RGB three channels of the pixel points:

wherein p is_R,p_G,p_BThe pixel values of three channels of R, G and B of a certain pixel point are respectively;

extracting minimum values of three channels R, G and B of hand skin color samples respectively;

the maximum values of three channels R, G and B of the hand skin color samples are extracted respectively.

respectively taking the minimum values of the R-G, R-B and G-B channel difference values of the hand skin color samples;

the maximum values of the R-G, R-B and G-B channel difference values of the hand skin color samples are respectively.

Since in the literature, explicit skin tone thresholds are used in the RGB color space as follows:

documents [4,5] use the threshold value shown in formula (5),

document [6] uses the threshold shown in equation (6),

therefore, the skin color model of the invention adopts the threshold value shown in the formula (7) in the RGB color space,

skin color pixel point S in RGB color space_rgb(r, g, b) satisfies the following equation (8):

where r (r, g, b) is the value of a pixel in the RGB color space.

(2) Skin tones were detected using YCbCr explicit thresholds.

For YCbCr, where Y is the luminance component; cb refers to the blue component; cr refers to the red chrominance component. According to document [7], the YCbCr color space can be obtained by matrix transformation of the RGB color space by equation (9),

according to document [8], the threshold that a skin color region satisfies in the YCbCr color space is:

(80≤Cb≤120)∩(133≤Cr≤173)(10)

by

Can be obtained by the formula (9)

Wherein the content of the first and second substances,

extracting minimum values of Y, Cb and Cr components of hand skin color samples respectively;

the maximum values of the components of the hand skin color samples Y, Cb and Cr are respectively extracted.

In YCbCr color space, region S satisfying skin color condition_ycbcr(cb, cr) satisfies the condition of the formula (12),

wherein the content of the first and second substances,

the color space of the skin color pixel in YCbCr color space, the lowest threshold value and the highest threshold value of a Cb channel, the lowest threshold value and the highest threshold value of a Cr channel are respectively. c (Cb, Cr) is the value of the Cb, Cr component of a pixel in the YCbCr color space.

(3) Establishing Gaussian skin color model for skin color detection

Due to the statistical distribution of skin tone pixels in the YCbCr color space, it can be approximately seen as a gaussian distribution. Therefore, a gaussian skin color model is established in the YCbCr color space, and skin color detection is performed using the established gaussian skin color model.

The method specifically comprises the following two steps:

step a: firstly, manually cutting out hand skin color samples in a gesture picture library constructed by the invention, converting skin color pixels from RGB to YCbCr color space by using a formula (9), carrying out statistical analysis on the skin color pixels, and obtaining a mean value mu of a Gaussian model through an elliptic Gaussian joint probability density function (pdf) and Bayesian maximum likelihood estimation(μ_cbμ_cr) Sum covariance ∑ (∑)_cb∑_cr)。

The elliptical gaussian joint probability density function is as follows:

where c is the skin color vector, μ is the mean vector, and ∑ is the covariance matrix.

Wherein the content of the first and second substances,

after hand skin color samples in the gesture picture library constructed by the invention are manually cut out and converted from RGB to YCbCr color space (c)_j) The parameter mu (mu) can be obtained by Bayesian maximum likelihood estimation_cbμ_cr) And ∑ (∑)_cb∑_cr)。

Step b: and calculating the similarity degree of the given pixel point and the skin color, namely the skin color likelihood, further obtaining the maximum skin color likelihood of the image to be detected, wherein the ratio of the two is the skin color probability value, thus obtaining a skin color likelihood map, and then thresholding the skin color likelihood map so as to segment a skin color area.

The gray distribution of skin color can be approximately fitted by a one-dimensional Gaussian curve, the peak area of the CbCr space is a skin color part, and other areas with less distribution are non-skin color parts. And establishing a skin color model by using the corresponding relation and using the skin color similarity of the pixel points as the weight of the gray value of the corresponding pixel points.

According to the established Gaussian skin color model, calculating the likelihood degree D (Cb, Cr) of the vector c (Cb, Cr) to be measured and the skin color, wherein the calculation formula is as follows:

D(Cb,Cr)＝exp[-0.5(c-μ)^T∑^-1(c-μ)](16)

and after the similarity D is calculated, the maximum skin color likelihood of the image to be detected can be obtained, the ratio of the maximum skin color likelihood and the maximum skin color likelihood is the skin color probability value, so that a skin color likelihood image can be obtained, and a binary image of the skin color image can be obtained after thresholding the skin color likelihood image.

Skin color region S in skin color model of the invention_skinTo satisfy the skin color Gaussian model S_g(Cb, Cr), YCbCr explicit threshold S_ycbcr(cb, cr) and RGB explicit threshold S_rgbThe intersection of (r, g, b), i.e.

S_skin＝S_g(Cb,Cr)∩S_ycbcr(cb,cr)∩S_rgb(r,g,b) (17)

A region is considered a skin tone region if it satisfies both the RGB display threshold, the YCbCr explicit threshold, and the skin tone gaussian model.

And 4, entering the area meeting the skin color model into step 4, and carrying out morphological processing on the area. Images that do not meet the requirements are not processed.

And 4, step 4: and segmenting the skin-like color region from the original RGB image by deleting morphological operations such as small-area regions, hole filling and the like in the binary image and aspect ratio and area threshold of the minimum connected region.

Performing morphological processing and region adjustment on regions meeting skin color conditions, wherein the morphological processing and region adjustment comprise the following steps: deleting a small area region in the binary image, filling a hole, marking a minimum connected region, and segmenting a skin-color-like region on the original image according to the aspect ratio and the area threshold of the minimum connected region.

And 5: and (4) carrying out size normalization on the skin-like color area (namely the result obtained in the step (4)) segmented from the original image, taking the skin-like color area as the input of a deformable part model, training a hand model by using the deformable part model, and further detecting the hand area by using the trained hand model. The deformable part model adopts HOG characteristics used by Pedro F. Felzenszwalb and the like [9], and a multi-scale deformable template trained by a potential support vector machine classifier is used for hand detection.

Step 5 comprises two parts of model training and hand detection, which are specifically as follows:

(1) model training

And 4, normalizing the skin color area obtained by segmentation, and taking the skin color area as a training sample. The training samples include positive samples and negative samples, specifically, the positive samples are photographs including hand regions, and the negative samples are photographs not including hand regions. And (4) taking the marked positive and negative sample information as the input of a deformable part model, and training a hand detection model by using the deformable part model [9] to generate the hand detection model required by the user.

(2) Hand region detection using deformable part model

And 3, after the skin color area is divided out and normalized, detecting the hand area by using the deformable part model trained in the step 1.

A hand region detection device based on a layered structure and a deformable part model is shown in figure 2 and comprises a gesture library module, a human body detection module, a skin color detection module, a region adjustment module and a hand detection module;

the gesture library module is used for providing gesture pictures;

The invention aims at the whole body gesture image of the human body, adopts a layered structure to detect the hand, removes the complex background, and detects the upper half body area of the human body containing the human face on the first layer. Through the detection of the upper half body area of the human body in the first layer, the influence of a complex background is removed, and the picture detection area faced by the hand detection in the second layer is obviously reduced.

Detecting, at the second layer, a skin color-like region by a skin color model based on the detected image of the upper half of the human body;

in the third layer, the detected skin-color-like region is adjusted through morphological operation, aspect ratio limitation and area threshold of the minimum connected region;

in the fourth layer, the detected skin-like color region is normalized, and then the hand region is detected using the deformable portion model.

Example (b):

the invention provides a hand region detection method based on a layered structure and a deformable part model, as shown in figure 1, the hand region detection method based on the layered structure and the deformable part model is based on a hydro version of a Robot Operating System (ROS) installed on L inux Ubuntu 12.04.

The embodiment of the invention takes the figure 5 as an input image and detects the hand area. FIG. 5 is a photograph of a gesture library. As can be seen from fig. 5, the image is a full-body photograph of a human body with a complex background (and a background similar to skin color behind the human body).

The invention provides a modular hand region detection method based on ROS (reactive oxygen species), which specifically comprises the following steps as shown in figure 1:

step 1: detecting and segmenting an upper body region of a human body from an input gesture image;

firstly, bilateral filtering is carried out on an extracted image, then Haar wavelet characteristics of the face, the neck, the shoulders, the arms and the hands are extracted, a cascade classifier waterfall type algorithm is used for training to obtain a human upper body model based on a Viola-Jones detector framework, and then the trained human upper body model is used for detecting the human upper body region. The model has the advantages that the robustness for detecting the upper half of the human body is strong by using the local context information of the connection of the face, the neck, the shoulders, the arms and the hands.

The method of step 1 is adopted to take the photos in the self-created gesture photo library of the invention as an implementation example, the implementation example uses the photos as shown in fig. 4, but certain false detection exists at the same time, namely, in addition to the detection of the upper half body area of the human body as shown in fig. 6(c), some objects in the environment as shown in (a) and (b) in fig. 6 are detected. However, the hand candidate area is greatly reduced compared to the original image (i.e., fig. 5), and the amount of calculation is reduced.

Step 2: extracting a skin color similar region in the upper half body image of the human body by using a skin color model combining an RGB-YCbCr color space explicit threshold and a Gaussian model;

and step 3: and segmenting the skin-like color region from the original RGB image by deleting morphological operations such as small-area regions, hole filling and the like in the binary image and aspect ratio and area threshold of the minimum connected region.

Performing morphological processing and region adjustment on regions meeting skin color conditions, wherein the morphological processing and region adjustment comprise the following steps: deleting a small-area region in the binary image, filling a hole, marking a minimum connected region, and segmenting a skin-color-like region on the original image according to the length-width ratio and the area threshold of the minimum connected region. The result after filling the hole, marking the minimum connected region, detecting a skin-like color region according to the area and the aspect ratio of the minimum connected region, and segmenting the skin-like color region from the original picture, as shown in fig. 7, 8, 9 and 10 respectively.

And 4, step 4: the skin-like color region (i.e. the result obtained in step 3) segmented from the original image is normalized in size, for example, 200 × 200, the deformable part model of the hand is trained by the method described in 4.1, and then the hand is detected by using the trained deformable part model of the hand, that is, the hand region is detected by using the trained hand model after the size normalization of the result obtained in step 3.

Reference documents:

1.Hasan,H.and S.Abdul-Kareem,Human–computer interaction using vision-based hand gesture recognition systems:a survey.Neural Computing andApplications,2014.25(2):p.251-261.

2.Pisharady,P.K.and M.Saerbeck,Recent methods and databases invision-based hand gesture recognition:A review.Computer Vision and ImageUnderstanding,2015.141:p.152-165.

3.Rautaray,S.S.and A.Agrawal,Vision based hand gesture recognitionfor human computer interaction:a survey.Artificial Intelligence Review,2015.43(1):p.1-54.

4.Peer,P.and F.Solina,An automatic human face detection method.1999.

5.Solina,F.,et al.15seconds of fame-an interactive,computer-visionbased art installation.in Control,Automation,Robotics and Vision,2002.ICARCV2002.7th International Conference on.2002.IEEE.

6.Chen,W.-C.and M.-S.Wang,Region-based and content adaptive skindetection in color images.Internationaljournal of pattern recognition andartificial intelligence,2007.21(05):p.831-853.

7.Ban,Y.,et al.,Face detection based on skin color likelihood.PatternRecognition,2014.47(4):p.1573-1585.

8.Basilio,J.A.M.,et al.,Explicit image detection using YCbCr spacecolor model as skin detection.Applications of Mathematics and ComputerEngineering,2011:p.123-128.

9.Felzenszwalb,P.F.,et al.,Object detection with discriminativelytrained part-based models.IEEEtransactions on pattern analysis and machineintelligence,2010.32(9):p.1627-1645.

Claims

1. a hand region detection method based on a layered structure and a deformable part model, comprising the steps of:

step 1: establishing a gesture picture library;

the gesture picture library includes: y testers predefining x gesture types, wherein each gesture is shot at three distances respectively, and the three distances are d from the camera to the shot person₁、d₂And d₃Shooting each person for three times at the same distance, wherein the persons are respectively shot in the center, the left side and the right side of the image in n environments;

the method specifically comprises the following steps:

the method specifically comprises the following steps of establishing a skin color model by adopting an RGB-YCbCr explicit threshold value and a Gaussian model:

(1) detecting a skin color region using an RGB color space explicit threshold;

extracting a hand skin color area of the established gesture library, reading a skin color pixel value, wherein the relationship among RGB three channels of the pixel point is as follows:

extracting maximum values of three channels R, G and B of hand skin color samples respectively;

respectively taking the maximum values of the R-G, R-B and G-B channel difference values of the hand skin color samples;

the threshold of the skin color model in the RGB color space is:

skin color pixel point S in RGB color space_rgb(r, g, b) satisfies the following requirements:

wherein r (r, g, b) is the value of a certain pixel in the RGB color space;

(2) detecting skin tones using YCbCr explicit thresholds;

for YCbCr, where Y is the luminance component, Cb is the blue component, Cr is the red chrominance component, the YCbCr color space is obtained from the RGB color space by matrix transformation as follows:

the threshold value satisfied by the skin color area in the YCbCr color space is as follows:

(80≤Cb≤120)∩(133≤Cr≤173) (10)

by

Obtained by the formula (9)

Wherein the content of the first and second substances,

respectively extracting maximum values of components Y, Cb and Cr of hand skin color samples;

in YCbCr color space, region S satisfying skin color condition_ycbcr(cb, cr) satisfies the following condition,

wherein the content of the first and second substances,

respectively representing the lowest threshold, the highest threshold, the lowest threshold and the highest threshold of a Cr channel of a skin color pixel in a YCbCr color space; c (Cb, Cr) is the value of the Cb, Cr component of a pixel in the YCbCr color space;

(3) establishing Gaussian skin color model for skin color detection

The method specifically comprises the following two steps:

step a: obtaining hand skin color samples in a constructed gesture picture library, converting skin color pixels from RGB to YCbCr color space by using a formula (9), performing statistical analysis on the skin color pixels, and obtaining a mean value mu (mu) of a Gaussian model through an elliptic Gaussian joint probability density function and Bayesian maximum likelihood estimation_cbμ_cr) Sum covariance ∑ (∑)_cb∑_cr)；

The elliptical gaussian joint probability density function is as follows:

wherein c is a skin color vector, mu is a mean vector, and ∑ is a covariance matrix;

wherein the content of the first and second substances,

step b: calculating the similarity degree of the given pixel point and the skin color, namely skin color likelihood, to obtain the maximum skin color likelihood of the image to be detected, wherein the ratio of the two is skin color probability value, so as to obtain a skin color likelihood map, and then thresholding the skin color likelihood map to segment a skin color region;

D(Cb,Cr)＝exp[-0.5(c-μ)^T∑^-1(c-μ)](16)

calculating the skin color likelihood D (Cb, Cr), obtaining the maximum skin color likelihood of the image to be detected, wherein the ratio of the maximum skin color likelihood and the maximum skin color likelihood is the skin color probability value, thus obtaining a skin color likelihood image, and thresholding the skin color likelihood image to obtain a binary image of the skin color image;

skin color region S in skin color model_skinTo satisfy high skin colorModel S_g(Cb, Cr), YCbCr explicit threshold S_ycbcr(cb, cr) and RGB explicit threshold S_rgbThe intersection of (r, g, b), i.e.

S_skin＝S_g(Cb,Cr)∩S_ycbcr(cb,cr)∩S_rgb(r,g,b) (17)

If a certain region simultaneously meets an RGB display threshold, a YCbCr display threshold and a skin color Gaussian model, the region is considered to be a skin color region;

entering the area meeting the skin color model into the step 4, and carrying out morphological processing on the area; images which do not meet the requirements are not processed;

and 5: carrying out size normalization on skin-color-like areas segmented from an original image, taking the skin-color-like areas as the input of a deformable part model, training a hand model by using the deformable part model, and detecting the hand areas by using the trained hand model; wherein the deformable part model adopts HOG characteristics;

the method comprises two parts of model training and hand detection, and specifically comprises the following steps:

(1) model training

Step 4, normalizing the skin color area obtained by segmentation to be used as a training sample; the training sample comprises a positive sample and a negative sample, wherein the positive sample is a photo containing a hand area, and the negative sample is a photo not containing the hand area; using the marked positive and negative sample information as the input of a deformable part model, training a hand detection model by using the deformable part model, and generating a hand detection model required by a user;

(2) hand region detection using deformable part model

And 4, after the skin color area is divided out and normalized, detecting the hand area by using the trained deformable part model.

2. A hand region detection device based on a layered structure and a deformable part model comprises a gesture library module, a human body detection module, a skin color detection module, a region adjustment module and a hand detection module;

the gesture library module is used for providing gesture pictures;

the skin color detection module extracts a skin color similar region meeting the skin color model after detecting the human body upper body region by the human body detection module; the method specifically comprises the following steps:

(1) detecting a skin color region using an RGB color space explicit threshold;

the threshold of the skin color model in the RGB color space is:

wherein r (r, g, b) is the value of a certain pixel in the RGB color space;

(2) detecting skin tones using YCbCr explicit thresholds;

(80≤Cb≤120)∩(133≤Cr≤173) (10)

by

Obtained by the formula (9)

Wherein the content of the first and second substances,

wherein the content of the first and second substances,

(3) establishing Gaussian skin color model for skin color detection

The method specifically comprises the following two steps:

step a: obtaining hand skin color samples in a gesture picture library, converting skin color pixels from RGB to YCbCr color space by using a formula (9), and performing statistical analysis on the skin color pixelsObtaining the mean value mu (mu) of the Gaussian model by an over-ellipse Gaussian combined probability density function and Bayesian maximum likelihood estimation_cbμ_cr) Sum covariance ∑ (∑)_cb∑_cr)；

The elliptical gaussian joint probability density function is as follows:

wherein the content of the first and second substances,

calculating a vector c (Cb) to be measured according to the established Gaussian skin color model_,Cr) and the likelihood of skin color D (Cb, Cr), the calculation formula is as follows:

D(Cb,Cr)＝exp[-0.5(c-μ)^T∑^-1(c-μ)](16)

skin color region S in skin color model_skinTo satisfy the skin color Gaussian model S_g(Cb, Cr), YCbCr explicit threshold S_ycbcr(cb, cr) and RGB explicit threshold S_rgbThe intersection of (r, g, b), i.e.

S_skin＝S_g(Cb,Cr)∩S_ycbcr(cb,cr)∩S_rgb(r,g,b) (18)

entering the region meeting the skin color model into a region adjusting module, and carrying out morphological processing on the region; images which do not meet the requirements are not processed;

the hand detection module performs size normalization on the skin-color-like region segmented by the region adjustment module, the skin-color-like region is used as the input of a deformable part model, the deformable part model is used for training a hand model, and the trained hand model is used for detecting a hand region; wherein the deformable part model adopts HOG characteristics;

(1) model training

The area adjustment module divides skin color areas to be normalized and then the skin color areas are used as training samples; the training sample comprises a positive sample and a negative sample, wherein the positive sample is a photo containing a hand area, and the negative sample is a photo not containing the hand area; using the marked positive and negative sample information as the input of a deformable part model, training a hand detection model by using the deformable part model, and generating a hand detection model required by a user;

(2) hand region detection using deformable part model

After the skin color area is divided by the area adjusting module and normalized, the hand area is detected by using the trained deformable part model.