CN106909884B - Hand region detection method and device based on layered structure and deformable part model - Google Patents


Info

Publication number
CN106909884B
CN106909884B (application CN201710035087.XA; application publication CN106909884A)
Authority
CN
China
Prior art keywords
skin color
model
hand
region
skin
Prior art date
Legal status
Active
Application number
CN201710035087.XA
Other languages
Chinese (zh)
Other versions
CN106909884A (en)
Inventor
丁希仑
齐静
徐坤
杨帆
郑羿
陈佳伟
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201710035087.XA priority Critical patent/CN106909884B/en
Publication of CN106909884A publication Critical patent/CN106909884A/en
Application granted granted Critical
Publication of CN106909884B publication Critical patent/CN106909884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/107: Static hand or arm
    • G06V40/113: Recognition of static hand signs

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a hand region detection method and device based on a layered structure and a deformable part model. The method comprises the following steps. Step 1: establish a gesture picture library. Step 2: detect and segment the upper body region of the human body from an input gesture image. Step 3: establish a skin color model and extract the skin-color-like regions in the upper body image. Step 4: segment the skin-color-like regions from the original RGB image by morphological operations such as deleting small-area regions in the binary image and filling holes, together with aspect-ratio and area thresholds on the minimum connected regions. Step 5: normalize the size of the skin-color-like regions segmented from the original image, extract histogram-of-oriented-gradients features, establish a hand detection model using a support vector machine, and then detect the hand region with the trained hand model.

Description

Hand region detection method and device based on layered structure and deformable part model
Technical Field
The invention relates to the field of specific object detection in computer vision, in particular to a hand region detection method and device based on a layered structure and a deformable part model.
Background
Gestures are one of the natural interaction channels and have important research value and broad application prospects. The first, and an important, step of gesture recognition (static and dynamic) and of hand tracking is segmenting the hand region from the image. Hand region segmentation is the basis of gesture recognition and hand tracking, and the quality of the segmentation directly affects the recognition or tracking result; it is therefore worth studying.
With the wide application of robots, human-computer interaction technology is receiving increasing attention. Vision-based gesture recognition has the advantages of natural interaction and broad application prospects, and is an important component of human-computer interaction [2]. Hand detection and segmentation are the basis of gesture recognition and gesture tracking, and the segmentation quality directly influences the recognition or tracking result [3].
In the interaction between a robot and a human, when the video acquisition equipment mounted on the robot is at a certain distance from the human body, the acquired photos contain the whole human body. Because a large amount of background exists in these pictures and the hand occupies only a small part of them, how to detect and segment the hand from the large background area, laying a foundation for gesture recognition, is a problem worth researching.
Disclosure of Invention
The present invention is directed to solving the above problems. Its object is to provide a hand region detection method based on a layered structure and a deformable part model which is applicable to gesture images containing the whole human body under complex backgrounds, and which can quickly and accurately detect the hand region in such gesture images.
A hand region detection method based on a layered structure and a deformable part model, comprising the steps of:
step 1: establishing a gesture picture library;
the gesture picture library includes: predefining x types of gestures by y testers, wherein two hands of each gesture are made in turn, the gesture of each hand is shot at three distances respectively, and the three distances are d from the camera to the shot person1m、d2m and d3m, shooting each person for three times at the same distance, wherein the persons are respectively shot in the center, the left side and the right side of the image in n environments;
step 2: detecting and segmenting an upper body region of a human body from an input gesture image;
firstly, bilateral filtering is performed on the extracted image; then Haar wavelet features of the face, neck, shoulders, elbows and hands are extracted, and a cascade classifier is trained with a Viola-Jones detector to obtain a human upper body model; the trained model is then used to detect the upper body region;
step 3: establishing a skin color model and extracting the skin-color-like regions in the upper body image of the human body;
step 4: segmenting the skin-color-like regions from the original RGB image by morphological operations such as deleting small-area regions in the binary image and filling holes, together with the aspect ratio and area threshold of the minimum connected region;
step 5: normalizing the size of the skin-color-like regions segmented from the original image, using them as the input of a deformable part model, training a hand model with the deformable part model, and detecting the hand region with the trained hand model; wherein the deformable part model employs HOG features.
A hand region detection device based on a layered structure and a deformable part model comprises a gesture library module, a human body detection module, a skin color detection module, a region adjustment module and a hand detection module;
the gesture library module is used for providing gesture pictures;
the human body detection module carries out bilateral filtering on the input human body whole body gesture image under the complex background, then Haar wavelet characteristics of the face, the neck, the shoulders, the elbows and the hands are extracted, a Viola-Jones detector is used, a cascade classifier is used for training to obtain a human body upper body model, and then the trained human body upper body model is used for detecting a human body upper body region;
the skin color detection module extracts a skin color similar region meeting the skin color model after detecting the human body upper body region by the human body detection module;
the region adjustment module adjusts the result of skin color detection and segments the skin-color-like regions from the original RGB image by deleting small-area regions in the binary image, filling holes, and applying the aspect ratio and area threshold of the minimum connected region;
the hand detection module performs size normalization on the skin-color-like region segmented by the region adjustment module, the skin-color-like region is used as the input of a deformable part model, the deformable part model is used for training a hand model, and the trained hand model is used for detecting a hand region; wherein the deformable part model employs HOG features.
The invention has the advantages that:
(1) the hand region detection method based on the layered structure and the deformable part model is based on ROS (Robot Operating System), has good portability, and can be used on various robot systems;
(2) the hand region detection method based on the layered structure and the deformable part model adopts a screening strategy and a layered structure to reduce the candidate regions layer by layer, requiring little computation. Firstly the upper body region of the human body is detected; then skin-color-like regions are extracted using the skin color conditions and region-adjusted; finally a deformable part model of the hand is trained, and the hand region is detected with the trained model;
(3) the hand region detection method based on the layered structure and the deformable part model adopts a layered structure: the first layer detects the upper body region of the human body, reducing to a certain extent the influence of a complex environment on hand region detection; the second layer detects skin color within the upper body and removes non-skin-color regions; the third layer adjusts the detected skin color regions; and the fourth layer detects the hand.
(4) The invention provides a hand detection method based on a skin color model: rough skin color detection is performed first to remove non-skin-color regions and reduce computation; histogram-of-oriented-gradients features are then extracted from the detected skin color regions, and a hand detection model is established using a support vector machine.
(5) The invention provides a skin color model combining RGB-YCbCr color space explicit thresholds with a Gaussian model; it comprehensively uses the RGB-YCbCr explicit thresholds and a statistical model and can better detect skin color regions. The RGB-YCbCr thresholds comprehensively consider empirical thresholds and thresholds computed from samples.
(6) The modular hand region detection device based on ROS adopts a customized design and a modular structure; the functions of the modules are relatively independent, and a user can add, delete or replace modules as needed, which improves the universality of the device.
Drawings
FIG. 1 is a flow chart of a hand region detection method based on a layered structure and a deformable part model according to the present invention;
FIG. 2 is a hand region detection apparatus based on a layered structure and a deformable part model according to the present invention;
FIG. 3 is a diagram of 12 gesture images in the gesture library provided by the present invention;
FIG. 4 is a photograph of a gesture library used in embodiments of the present invention;
FIG. 5 is a diagram illustrating a result of detecting an upper body region of a human body according to an embodiment of the present invention;
FIG. 6 shows the result of step 1 in the embodiment of the present invention; wherein (a) is an object in environment 1, (b) is an object in environment 2, and (c) is a human upper body region;
FIG. 7 shows the result of filling holes in an embodiment of the present invention;
FIG. 8 shows the result of marking the minimum connected region in the embodiment of the present invention;
FIG. 9 shows the result of skin-like color region detection in an embodiment of the present invention;
fig. 10 shows the segmentation result of the skin-like color region in the embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The invention provides a hand region detection method based on a layered structure and a deformable part model, which specifically comprises the following steps, as shown in fig. 1:
step 1: establishing a gesture picture library;
Because the invention is set against a robot background, gesture detection lays a solid foundation for an operator to control the robot's motion with gestures. The gesture picture library of the present invention therefore comprises x gesture types predefined by y testers. Each gesture is photographed at three distances (the camera at d1 m, d2 m and d3 m from the subject); each person takes three photos per hand at the same distance, standing respectively at the center, the left side and the right side of the image, in two environments (a laboratory and a corridor), as shown in fig. 3. The number of collected photos is 2 × 3 × 3 × 2 × xy = 36xy. Fig. 3 shows part of the gesture images provided in the gesture picture library of the invention; fig. 4 shows photos of the two hands of the same gesture at different distances and positions; and fig. 5 is a photo from the gesture library used in the embodiment of the invention;
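As a quick check, the library size implied by the design above can be computed directly. A minimal sketch (the 12-gesture, 10-tester values in the usage line are illustrative, not taken from the patent):

```python
def library_size(x_gestures: int, y_testers: int) -> int:
    """Photos in the gesture library: 2 hands x 3 distances x 3 positions
    (center/left/right) x 2 environments (laboratory, corridor) x gestures x testers."""
    hands, distances, positions, environments = 2, 3, 3, 2
    return hands * distances * positions * environments * x_gestures * y_testers

print(library_size(12, 10))  # 36 * 12 * 10 -> 4320
```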
step 2: detecting and segmenting an upper body region of a human body from an input gesture image;
Firstly, bilateral filtering is performed on the extracted image; then Haar wavelet features of the face, neck, shoulders, elbows and hands are extracted, and a cascade classifier is trained with a Viola-Jones detector to obtain a human upper body model; the trained model is then used to detect the upper body region. The advantage of this model is that it uses prior knowledge of human morphology: the local context information makes upper body detection robust.
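The Haar wavelet features used by a Viola-Jones detector are differences of rectangle sums, evaluated in constant time from an integral image. A minimal numpy sketch of this building block (an illustration only, not the patent's trained cascade):

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii: np.ndarray, y: int, x: int, h: int, w: int) -> int:
    """Sum of the h x w rectangle with top-left corner (y, x), in O(1)."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def haar_two_rect_horizontal(ii: np.ndarray, y: int, x: int, h: int, w: int) -> int:
    """Two-rectangle Haar feature: left half minus right half (w must be even)."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

A full detector would slide many such features over the image at multiple scales and feed them to the cascade of boosted classifiers.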
Step 3: establish a skin color model and extract the skin-color-like regions in the upper body image of the human body.
The invention adopts the skin color model so that skin color detection serves as preprocessing for hand detection, reducing the computational cost of the hand detection process.
The skin color model used in the invention combines RGB-YCbCr color space explicit thresholds with a Gaussian model. The explicit thresholds operate in the RGB and YCbCr color spaces and are fast to evaluate; the Gaussian model is a statistics-based model, and combining the two detects skin color better.
The method specifically comprises the following steps:
and establishing a skin color model by comprehensively using the RGB-YCbCr explicit threshold and a Gaussian model.
Although the RGB color space is commonly used, it is greatly influenced by light intensity; skin color clusters better in the YCbCr color space, where the overlap between the skin and non-skin distributions is smaller. Therefore, the invention uses explicit thresholds in both the RGB and YCbCr color spaces and combines them with a Gaussian skin color model to establish its skin color model.
(1) Skin tone regions are detected using an RGB color space explicit threshold.
Extract the hand skin color regions of the established gesture library, read the skin color pixel values, and analyze the following relationships among the R, G and B channels of the pixels:

$p_R^{\min} \le p_R \le p_R^{\max}$, $p_G^{\min} \le p_G \le p_G^{\max}$, $p_B^{\min} \le p_B \le p_B^{\max}$ (1)(2)

where $p_R, p_G, p_B$ are the pixel values of the R, G and B channels of a pixel; $p_R^{\min}, p_G^{\min}, p_B^{\min}$ are the minimum values of the R, G and B channels extracted from the hand skin color samples; and $p_R^{\max}, p_G^{\max}, p_B^{\max}$ are the corresponding maximum values.

$d_{RG}^{\min} \le p_R - p_G \le d_{RG}^{\max}$, $d_{RB}^{\min} \le p_R - p_B \le d_{RB}^{\max}$, $d_{GB}^{\min} \le p_G - p_B \le d_{GB}^{\max}$ (3)(4)

where $d_{RG}^{\min}, d_{RB}^{\min}, d_{GB}^{\min}$ are the minimum values of the R-G, R-B and G-B channel differences of the hand skin color samples, and $d_{RG}^{\max}, d_{RB}^{\max}, d_{GB}^{\max}$ the corresponding maximum values.
Explicit skin color thresholds have been used in the RGB color space in the literature:

Documents [4,5] use the threshold shown in formula (5):

$(R > 95) \cap (G > 40) \cap (B > 20) \cap (\max\{R,G,B\} - \min\{R,G,B\} > 15) \cap (|R - G| > 15) \cap (R > G) \cap (R > B)$ (5)

Document [6] uses the threshold shown in formula (6), another explicit RGB rule.

Therefore, the skin color model of the invention adopts the threshold shown in formula (7) in the RGB color space, combining the literature thresholds (5) and (6) with the sample bounds of formulas (1)-(4).

A skin color pixel $S_{rgb}(r, g, b)$ in the RGB color space is a pixel whose value $r(r, g, b)$ satisfies threshold (7), as expressed by formula (8).
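As an illustration, an explicit RGB skin mask can be sketched in numpy. The sketch assumes the widely cited uniform-daylight rule of documents [4,5]; the patent's own thresholds (formulas (7)-(8)) are derived from its gesture library samples and may differ:

```python
import numpy as np

def rgb_skin_mask(img: np.ndarray) -> np.ndarray:
    """Boolean skin mask for an H x W x 3 uint8 RGB image, using the
    explicit rule of documents [4,5] (not the patent's fitted thresholds)."""
    img = img.astype(np.int32)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (
        (r > 95) & (g > 40) & (b > 20)
        & (img.max(axis=-1) - img.min(axis=-1) > 15)  # enough spread between channels
        & (np.abs(r - g) > 15) & (r > g) & (r > b)    # red-dominant pixels
    )
```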
(2) Skin tones were detected using YCbCr explicit thresholds.
For YCbCr, Y is the luminance component, Cb the blue chrominance component, and Cr the red chrominance component. According to document [7], the YCbCr color space is obtained from the RGB color space by the matrix transformation of equation (9):

$\begin{bmatrix} Y \\ Cb \\ Cr \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.1687 & -0.3313 & 0.5 \\ 0.5 & -0.4187 & -0.0813 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}$ (9)

According to document [8], the threshold that a skin color region satisfies in the YCbCr color space is:

$(80 \le Cb \le 120) \cap (133 \le Cr \le 173)$ (10)

Converting the hand skin color samples of the gesture library by equation (9) gives the sample bounds of equation (11):

$Cb^{\min} \le Cb \le Cb^{\max}$, $Cr^{\min} \le Cr \le Cr^{\max}$ (11)

where $Cb^{\min}$ and $Cr^{\min}$ are the minimum values of the Cb and Cr components extracted from the hand skin color samples, and $Cb^{\max}$ and $Cr^{\max}$ the corresponding maximum values (the Y component bounds are extracted likewise).

In the YCbCr color space, a region $S_{ycbcr}(Cb, Cr)$ satisfying the skin color condition satisfies equation (12):

$(Cb_{low} \le Cb \le Cb_{high}) \cap (Cr_{low} \le Cr \le Cr_{high})$ (12)

where $Cb_{low}, Cb_{high}, Cr_{low}, Cr_{high}$ are the lowest and highest thresholds of the Cb and Cr channels respectively, and $c(Cb, Cr)$ is the value of the Cb and Cr components of a pixel in the YCbCr color space.
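The conversion of equation (9) and the explicit YCbCr rule of equation (10) can be sketched as follows. The sketch assumes the full-range JPEG conversion matrix; the patent further tightens the literature Cb/Cr bounds of document [8] with its own sample statistics:

```python
import numpy as np

def rgb_to_ycbcr(img: np.ndarray) -> np.ndarray:
    """Full-range RGB -> YCbCr conversion (equation (9)) for an H x W x 3 image."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  =  0.299  * r + 0.587  * g + 0.114  * b
    cb = -0.1687 * r - 0.3313 * g + 0.5    * b + 128.0
    cr =  0.5    * r - 0.4187 * g - 0.0813 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_skin_mask(img, cb_lo=80, cb_hi=120, cr_lo=133, cr_hi=173):
    """Explicit YCbCr rule of equation (10); default bounds are those of [8]."""
    ycc = rgb_to_ycbcr(img)
    cb, cr = ycc[..., 1], ycc[..., 2]
    return (cb >= cb_lo) & (cb <= cb_hi) & (cr >= cr_lo) & (cr <= cr_hi)
```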
(3) Establishing Gaussian skin color model for skin color detection
The statistical distribution of skin color pixels in the YCbCr color space can be approximated by a Gaussian distribution. Therefore, a Gaussian skin color model is established in the YCbCr color space, and skin color detection is performed using this model.
The method specifically comprises the following two steps:
Step a: first manually crop hand skin color samples from the gesture picture library constructed by the invention, convert the skin color pixels from RGB to the YCbCr color space using equation (9), perform statistical analysis on them, and obtain the mean $\mu = (\mu_{Cb}, \mu_{Cr})^T$ and covariance $\Sigma$ of the Gaussian model through the elliptical Gaussian joint probability density function (pdf) and Bayesian maximum likelihood estimation.

The elliptical Gaussian joint probability density function is:

$g(c) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(c - \mu)^T \Sigma^{-1} (c - \mu)\right]$ (13)

where c is the skin color vector, $\mu$ is the mean vector, and $\Sigma$ is the covariance matrix. The maximum likelihood estimates over the N sample vectors $c_j$ are:

$\mu = \frac{1}{N}\sum_{j=1}^{N} c_j$ (14)

$\Sigma = \frac{1}{N}\sum_{j=1}^{N} (c_j - \mu)(c_j - \mu)^T$ (15)

where the $c_j$ are the hand skin color samples manually cropped from the gesture picture library and converted from RGB to the YCbCr color space.
Step b: calculate the similarity between a given pixel and skin color, namely the skin color likelihood; then obtain the maximum skin color likelihood of the image to be detected. The ratio of the two is the skin color probability value, which yields a skin color likelihood map; the likelihood map is then thresholded to segment the skin color regions.
The gray distribution of skin color can be approximately fitted by a one-dimensional Gaussian curve: the peak area of the CbCr space is the skin color part, and the sparsely populated areas are non-skin parts. Using this correspondence, the skin color model takes the skin color similarity of a pixel as the weight of that pixel's gray value.
According to the established Gaussian skin color model, the likelihood D(Cb, Cr) between the vector to be measured c(Cb, Cr) and skin color is calculated as:

$D(Cb, Cr) = \exp\left[-0.5\, (c - \mu)^T \Sigma^{-1} (c - \mu)\right]$ (16)
After the likelihood D is calculated, the maximum skin color likelihood of the image to be detected can be obtained; the ratio of the two is the skin color probability value, which yields a skin color likelihood map. Thresholding the likelihood map gives a binary image of the skin color regions.
The skin color region $S_{skin}$ in the skin color model of the invention is the intersection of the regions satisfying the Gaussian skin color model $S_g(Cb, Cr)$, the YCbCr explicit threshold $S_{ycbcr}(Cb, Cr)$ and the RGB explicit threshold $S_{rgb}(r, g, b)$, i.e.
$S_{skin} = S_g(Cb, Cr) \cap S_{ycbcr}(Cb, Cr) \cap S_{rgb}(r, g, b)$ (17)
A region is considered a skin color region only if it simultaneously satisfies the RGB explicit threshold, the YCbCr explicit threshold and the Gaussian skin color model.
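Equations (16)-(17) can be sketched as a per-pixel Gaussian likelihood plus an intersection of the three masks. The likelihood threshold of 0.5 below is illustrative, not the patent's:

```python
import numpy as np

def skin_likelihood(cbcr: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Equation (16): D = exp(-0.5 (c - mu)^T sigma^-1 (c - mu)) per CbCr vector."""
    diff = cbcr - mu
    inv = np.linalg.inv(sigma)
    mahal = np.einsum('...i,ij,...j->...', diff, inv, diff)  # squared Mahalanobis distance
    return np.exp(-0.5 * mahal)

def combined_skin_mask(rgb_mask, ycbcr_mask, likelihood, thresh=0.5):
    """Equation (17): intersection of the three skin color conditions."""
    return rgb_mask & ycbcr_mask & (likelihood >= thresh)
```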
Regions that satisfy the skin color model proceed to step 4 for morphological processing; images that do not meet the requirement are not processed.
Step 4: segment the skin-color-like regions from the original RGB image by morphological operations such as deleting small-area regions in the binary image and filling holes, together with the aspect ratio and area threshold of the minimum connected region.
Morphological processing and region adjustment are performed on the regions satisfying the skin color condition: delete small-area regions in the binary image, fill holes, mark the minimum connected regions, and segment the skin-color-like regions on the original image according to the aspect ratio and area threshold of the minimum connected regions.
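A minimal sketch of this region adjustment: connected-component labeling with area and aspect-ratio filtering. It assumes 4-connectivity, omits hole filling for brevity, and the threshold values are illustrative rather than the patent's:

```python
import numpy as np
from collections import deque

def filter_regions(mask: np.ndarray, min_area: int = 50, max_aspect: float = 3.0) -> np.ndarray:
    """Keep connected regions of a binary mask whose pixel area and
    bounding-box aspect ratio pass the thresholds."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # BFS flood fill collects one 4-connected region.
            q, pix = deque([(sy, sx)]), []
            seen[sy, sx] = True
            while q:
                y, x = q.popleft()
                pix.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        q.append((ny, nx))
            ys = [p[0] for p in pix]
            xs = [p[1] for p in pix]
            bh, bw = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
            aspect = max(bh, bw) / min(bh, bw)
            if len(pix) >= min_area and aspect <= max_aspect:
                for y, x in pix:
                    out[y, x] = True
    return out
```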
Step 5: normalize the size of the skin-color-like regions segmented from the original image (i.e. the result of step 4), use them as the input of a deformable part model, train a hand model with the deformable part model, and then detect the hand region with the trained hand model. The deformable part model adopts the HOG features used by Felzenszwalb et al. [9]; a multi-scale deformable template trained by a latent support vector machine classifier is used for hand detection.
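A minimal sketch of the HOG building block underlying the features of [9]: one unsigned, magnitude-weighted orientation histogram per cell. The full pipeline of [9] adds block normalization, analytic dimensionality reduction and the multi-scale deformable template, none of which is reproduced here:

```python
import numpy as np

def hog_cell_histogram(patch: np.ndarray, n_bins: int = 9) -> np.ndarray:
    """Gradient-orientation histogram of one cell: unsigned orientations
    (0-180 degrees) binned into n_bins, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # unsigned orientation
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())            # magnitude-weighted vote
    return hist
```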
Step 5 comprises two parts of model training and hand detection, which are specifically as follows:
(1) model training
The skin color regions segmented in step 4 are size-normalized and used as training samples. The training samples include positive samples (photos containing hand regions) and negative samples (photos not containing hand regions). The labeled positive and negative sample information is used as the input of the deformable part model, and a hand detection model is trained with the deformable part model [9] to generate the required hand detection model.
(2) Hand region detection using deformable part model
After the skin color regions are segmented and normalized, the hand region is detected using the deformable part model trained in (1).
A hand region detection device based on a layered structure and a deformable part model is shown in figure 2 and comprises a gesture library module, a human body detection module, a skin color detection module, a region adjustment module and a hand detection module;
the gesture library module is used for providing gesture pictures;
the human body detection module carries out bilateral filtering on the input human body whole body gesture image under the complex background, then Haar wavelet characteristics of the face, the neck, the shoulders, the elbows and the hands are extracted, a Viola-Jones detector is used, a cascade classifier is used for training to obtain a human body upper body model, and then the trained human body upper body model is used for detecting a human body upper body region;
the skin color detection module extracts a skin color similar region meeting the skin color model after detecting the human body upper body region by the human body detection module;
the region adjustment module adjusts the result of skin color detection and segments the skin-color-like regions from the original RGB image by deleting small-area regions in the binary image, filling holes, and applying the aspect ratio and area threshold of the minimum connected region;
the hand detection module performs size normalization on the skin-color-like region segmented by the region adjustment module, the skin-color-like region is used as the input of a deformable part model, the deformable part model is used for training a hand model, and the trained hand model is used for detecting a hand region; wherein the deformable part model employs HOG features.
The invention addresses whole-body gesture images of the human body and adopts a layered structure for hand detection, removing the complex background. At the first layer, the upper body region of the human body, containing the face, is detected; this removes the influence of the complex background and markedly reduces the picture area that the second-layer hand detection must process.
At the second layer, skin-color-like regions are detected with the skin color model on the detected upper body image.
At the third layer, the detected skin-color-like regions are adjusted through morphological operations, the aspect ratio limit and the area threshold of the minimum connected region.
At the fourth layer, the detected skin-color-like regions are size-normalized, and the hand region is detected using the deformable part model.
Example:
the invention provides a hand region detection method based on a layered structure and a deformable part model, as shown in figure 1, the hand region detection method based on the layered structure and the deformable part model is based on a hydro version of a Robot Operating System (ROS) installed on L inux Ubuntu 12.04.
The embodiment of the invention takes fig. 5 as the input image and detects the hand region. Fig. 5 is a photo from the gesture library: a full-body photo of a person against a complex background (including a skin-color-like background behind the person).
The invention provides a modular hand region detection method based on ROS (Robot Operating System), which specifically comprises the following steps, as shown in fig. 1:
step 1: detecting and segmenting an upper body region of a human body from an input gesture image;
Firstly, bilateral filtering is performed on the extracted image; then Haar wavelet features of the face, neck, shoulders, arms and hands are extracted, and a cascade classifier based on the Viola-Jones detector framework is trained to obtain a human upper body model; the trained model is then used to detect the upper body region. The advantage of this model is that the local context information linking the face, neck, shoulders, arms and hands makes upper body detection robust.
Applying the method of step 1 to photos from the self-built gesture photo library (this example uses the photo shown in fig. 4), the upper body region is detected. Some false detections also occur: in addition to the upper body region shown in fig. 6(c), some objects in the environment are detected, as shown in fig. 6(a) and (b). However, the hand candidate area is greatly reduced compared with the original image (fig. 5), which reduces the amount of calculation.
Step 2: extracting a skin color similar region in the upper half body image of the human body by using a skin color model combining an RGB-YCbCr color space explicit threshold and a Gaussian model;
and step 3: and segmenting the skin-like color region from the original RGB image by deleting morphological operations such as small-area regions, hole filling and the like in the binary image and aspect ratio and area threshold of the minimum connected region.
Performing morphological processing and region adjustment on regions meeting skin color conditions, wherein the morphological processing and region adjustment comprise the following steps: deleting a small-area region in the binary image, filling a hole, marking a minimum connected region, and segmenting a skin-color-like region on the original image according to the length-width ratio and the area threshold of the minimum connected region. The result after filling the hole, marking the minimum connected region, detecting a skin-like color region according to the area and the aspect ratio of the minimum connected region, and segmenting the skin-like color region from the original picture, as shown in fig. 7, 8, 9 and 10 respectively.
Step 4: normalize the size of the skin-color-like regions segmented from the original image (i.e. the result of step 3), for example to 200 × 200; train the deformable part model of the hand by the method described in 4.1; then detect the hand with the trained deformable part model. That is, after size normalization of the result of step 3, the hand region is detected with the trained hand model.

Claims (2)

1. A hand region detection method based on a layered structure and a deformable part model, comprising the steps of:
step 1: establishing a gesture picture library;
the gesture picture library includes: y testers predefining x gesture types, wherein each gesture is shot at three distances respectively, and the three distances are d from the camera to the shot person1、d2And d3Shooting each person for three times at the same distance, wherein the persons are respectively shot in the center, the left side and the right side of the image in n environments;
step 2: detecting and segmenting an upper body region of a human body from an input gesture image;
firstly, bilateral filtering is carried out on the extracted image; then Haar wavelet features of the face, neck, shoulders, elbows and hands are extracted; a Viola-Jones detector with a cascade classifier is trained to obtain an upper-body model of the human body, and the trained upper-body model is then used to detect the upper-body region of the human body;
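The Viola-Jones framework evaluates Haar-like features in constant time via the integral image. As an illustrative sketch of that building block only (independent of the trained upper-body cascade):

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[r, c] = sum of img[:r+1, :c+1]."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the integral image in O(1).

    Uses the four-corner identity that makes Haar feature evaluation
    fast in the Viola-Jones detector.
    """
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```

A Haar feature is then the difference of two or more such rectangle sums, and the cascade chains many weak classifiers built on them.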
and step 3: establishing a skin color model, and extracting a skin color similar region in the upper half body image of the human body;
specifically, the skin color model is established by adopting RGB-YCbCr explicit thresholds and a Gaussian model, as follows:
(1) detecting a skin color region using an RGB color space explicit threshold;
extracting the hand skin color areas of the established gesture library and reading the skin color pixel values, the relationships among the three RGB channels of the pixel points are as follows:

p_R^min = min(p_R), p_G^min = min(p_G), p_B^min = min(p_B)

p_R^max = max(p_R), p_G^max = max(p_G), p_B^max = max(p_B)

wherein p_R, p_G, p_B are respectively the pixel values of the R, G and B channels of a certain pixel point; p_R^min, p_G^min, p_B^min are respectively the minimum values of the R, G and B channels of the extracted hand skin color samples; p_R^max, p_G^max, p_B^max are respectively the maximum values of the R, G and B channels of the extracted hand skin color samples;

Δ_RG^min = min(p_R - p_G), Δ_RB^min = min(p_R - p_B), Δ_GB^min = min(p_G - p_B)

Δ_RG^max = max(p_R - p_G), Δ_RB^max = max(p_R - p_B), Δ_GB^max = max(p_G - p_B)

wherein Δ_RG^min, Δ_RB^min, Δ_GB^min are respectively the minimum values of the R-G, R-B and G-B channel differences of the hand skin color samples, and Δ_RG^max, Δ_RB^max, Δ_GB^max are respectively the maximum values of the R-G, R-B and G-B channel differences of the hand skin color samples;

the threshold of the skin color model in the RGB color space is:

T_rgb: (p_R^min ≤ r ≤ p_R^max) ∩ (p_G^min ≤ g ≤ p_G^max) ∩ (p_B^min ≤ b ≤ p_B^max) ∩ (Δ_RG^min ≤ r-g ≤ Δ_RG^max) ∩ (Δ_RB^min ≤ r-b ≤ Δ_RB^max) ∩ (Δ_GB^min ≤ g-b ≤ Δ_GB^max)

a skin color pixel point S_rgb(r, g, b) in the RGB color space satisfies:

S_rgb(r, g, b) = { r(r, g, b) | r(r, g, b) satisfies T_rgb }

wherein r(r, g, b) is the value of a certain pixel in the RGB color space;
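The RGB explicit-threshold test can be sketched as follows. The channel and channel-difference bounds would be learned from the hand skin color samples of the gesture library, so the numeric bounds below are illustrative placeholders, not the patent's values.

```python
import numpy as np

# Placeholder bounds standing in for values learned from skin samples:
# per-channel (min, max) and channel-difference (min, max) limits.
CH_BOUNDS = {"r": (95, 255), "g": (40, 230), "b": (20, 220)}
DIFF_BOUNDS = {"rg": (10, 120), "rb": (15, 160), "gb": (-20, 80)}

def rgb_skin_mask(img):
    """Boolean mask of pixels passing every RGB threshold (img: HxWx3 uint8)."""
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    mask = np.ones(img.shape[:2], dtype=bool)
    # Per-channel bounds.
    for ch, (lo, hi) in zip((r, g, b),
                            (CH_BOUNDS["r"], CH_BOUNDS["g"], CH_BOUNDS["b"])):
        mask &= (ch >= lo) & (ch <= hi)
    # Channel-difference bounds.
    for diff, (lo, hi) in zip((r - g, r - b, g - b),
                              (DIFF_BOUNDS["rg"], DIFF_BOUNDS["rb"],
                               DIFF_BOUNDS["gb"])):
        mask &= (diff >= lo) & (diff <= hi)
    return mask
```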
(2) detecting skin tones using YCbCr explicit thresholds;
for YCbCr, Y is the luminance component, Cb is the blue chrominance component, and Cr is the red chrominance component; the YCbCr color space is obtained from the RGB color space by the following matrix transformation:

Y = 16 + 0.257R + 0.504G + 0.098B
Cb = 128 - 0.148R - 0.291G + 0.439B
Cr = 128 + 0.439R - 0.368G - 0.071B (9)

the threshold satisfied by the skin color area in the YCbCr color space is:

(80≤Cb≤120)∩(133≤Cr≤173) (10)

the hand skin color samples are converted by formula (9), and the minimum and maximum values of their Y, Cb and Cr components are obtained:

Y^min = min(Y), Cb^min = min(Cb), Cr^min = min(Cr)

Y^max = max(Y), Cb^max = max(Cb), Cr^max = max(Cr)

wherein Y^min, Cb^min, Cr^min are respectively the minimum values of the Y, Cb and Cr components of the hand skin color samples, and Y^max, Cb^max, Cr^max are respectively the maximum values of the Y, Cb and Cr components of the hand skin color samples;

in the YCbCr color space, a region S_ycbcr(cb, cr) satisfying the skin color condition satisfies:

S_ycbcr(cb, cr) = { c(cb, cr) | (Cb^min ≤ cb ≤ Cb^max) ∩ (Cr^min ≤ cr ≤ Cr^max) }

wherein Cb^min, Cb^max, Cr^min, Cr^max respectively represent the lowest threshold and the highest threshold of the Cb channel and the lowest threshold and the highest threshold of the Cr channel of skin color pixels in the YCbCr color space; c(cb, cr) is the value of the Cb and Cr components of a certain pixel in the YCbCr color space;
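The YCbCr stage can be sketched with the fixed bounds of formula (10); the BT.601 conversion coefficients used here are the standard ones and are assumed to match the patent's transformation matrix.

```python
import numpy as np

def rgb_to_ycbcr(img):
    """BT.601 studio-swing RGB -> YCbCr (img: HxWx3 uint8 -> float array)."""
    r = img[..., 0].astype(float)
    g = img[..., 1].astype(float)
    b = img[..., 2].astype(float)
    y  = 16.0  + 0.257 * r + 0.504 * g + 0.098 * b
    cb = 128.0 - 0.148 * r - 0.291 * g + 0.439 * b
    cr = 128.0 + 0.439 * r - 0.368 * g - 0.071 * b
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_skin_mask(img):
    """Apply the explicit threshold (80 <= Cb <= 120) and (133 <= Cr <= 173)."""
    ycc = rgb_to_ycbcr(img)
    cb, cr = ycc[..., 1], ycc[..., 2]
    return (cb >= 80) & (cb <= 120) & (cr >= 133) & (cr <= 173)
```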
(3) establishing Gaussian skin color model for skin color detection
The method specifically comprises the following two steps:
step a: obtaining hand skin color samples from the constructed gesture picture library, converting the skin color pixels from RGB to the YCbCr color space using formula (9), performing statistical analysis on the skin color pixels, and obtaining the mean μ = (μ_Cb, μ_Cr) and covariance Σ of the Gaussian model through the elliptical Gaussian joint probability density function and Bayesian maximum likelihood estimation;

the elliptical Gaussian joint probability density function is as follows:

P(c) = (1 / (2π|Σ|^(1/2))) exp[-0.5 (c - μ)^T Σ^(-1) (c - μ)]

wherein c is a skin color vector, μ is the mean vector, and Σ is the covariance matrix, estimated from the samples as:

μ = E(c)

Σ = E[(c - μ)(c - μ)^T]
step b: calculating the similarity between a given pixel point and skin color, namely the skin color likelihood; obtaining the maximum skin color likelihood of the image to be detected, the ratio of the two being the skin color probability value, thereby obtaining a skin color likelihood map; the skin color likelihood map is then thresholded to segment the skin color region;

according to the established Gaussian skin color model, the likelihood D(Cb, Cr) between the vector c(Cb, Cr) to be measured and skin color is calculated as follows:

D(Cb,Cr) = exp[-0.5 (c - μ)^T Σ^(-1) (c - μ)] (16)

after the skin color likelihood D(Cb, Cr) is calculated, the maximum skin color likelihood of the image to be detected is obtained; the ratio of D(Cb, Cr) to this maximum is the skin color probability value, yielding a skin color likelihood map, which is thresholded to obtain a binary image of the skin color region;
the skin color region S_skin in the skin color model is the intersection of the regions satisfying the Gaussian skin color model S_g(Cb, Cr), the YCbCr explicit threshold S_ycbcr(cb, cr) and the RGB explicit threshold S_rgb(r, g, b), i.e.

S_skin = S_g(Cb,Cr)∩S_ycbcr(cb,cr)∩S_rgb(r,g,b) (17)

if a certain region simultaneously satisfies the RGB explicit threshold, the YCbCr explicit threshold and the Gaussian skin color model, the region is considered a skin color region;
regions meeting the skin color model enter step 4 for morphological processing; images which do not meet the requirements are not processed;
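Step b and the final intersection can be sketched as follows. The mean and covariance here are illustrative placeholders standing in for the values fitted from the gesture library's skin samples, and the 0.5 probability threshold is likewise an assumption.

```python
import numpy as np

# Placeholder Gaussian fitted on (Cb, Cr) skin samples (illustrative).
MU = np.array([105.0, 150.0])
SIGMA = np.array([[60.0, 10.0], [10.0, 40.0]])

def skin_likelihood(cb, cr, mu=MU, sigma=SIGMA):
    """D(Cb,Cr) = exp(-0.5 (c-mu)^T Sigma^{-1} (c-mu)), per pixel."""
    inv = np.linalg.inv(sigma)
    d_cb = cb - mu[0]
    d_cr = cr - mu[1]
    # Quadratic form expanded elementwise over the image.
    q = (inv[0, 0] * d_cb ** 2 + 2 * inv[0, 1] * d_cb * d_cr
         + inv[1, 1] * d_cr ** 2)
    return np.exp(-0.5 * q)

def combine_masks(rgb_mask, ycbcr_mask, cb, cr, thresh=0.5):
    """S_skin: intersection of the RGB, YCbCr and Gaussian decisions.

    The likelihood map is divided by its image-wide maximum to get the
    probability map before thresholding, mirroring step b.
    """
    like = skin_likelihood(cb, cr)
    prob = like / like.max()
    return rgb_mask & ycbcr_mask & (prob >= thresh)
```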
step 4: segmenting the skin-color-like regions from the original RGB image through morphological operations on the binary image (deleting small-area regions, filling holes) and thresholds on the aspect ratio and area of the minimum connected regions;

step 5: carrying out size normalization on the skin-color-like regions segmented from the original image, taking them as the input of a deformable part model, training a hand model with the deformable part model, and detecting the hand region with the trained hand model; wherein the deformable part model adopts HOG features;
the method comprises two parts of model training and hand detection, and specifically comprises the following steps:
(1) model training
the skin color regions segmented in step 4 are normalized and used as training samples; the training samples comprise positive samples and negative samples, wherein a positive sample is a photo containing a hand region and a negative sample is a photo containing no hand region; the marked positive and negative sample information is used as the input of the deformable part model, and the hand detection model required by the user is trained and generated with the deformable part model;
(2) hand region detection using deformable part model
And 4, after the skin color area is divided out and normalized, detecting the hand area by using the trained deformable part model.
2. A hand region detection device based on a layered structure and a deformable part model comprises a gesture library module, a human body detection module, a skin color detection module, a region adjustment module and a hand detection module;
the gesture library module is used for providing gesture pictures;
the human body detection module carries out bilateral filtering on the input whole-body gesture image under a complex background; then Haar wavelet features of the face, neck, shoulders, elbows and hands are extracted; a Viola-Jones detector with a cascade classifier is trained to obtain an upper-body model of the human body, and the trained upper-body model is then used to detect the upper-body region of the human body;
the skin color detection module extracts the skin-color-like regions satisfying the skin color model after the human body detection module detects the upper-body region of the human body; specifically, the skin color model is established by adopting RGB-YCbCr explicit thresholds and a Gaussian model, as follows:
(1) detecting a skin color region using an RGB color space explicit threshold;
extracting the hand skin color areas of the established gesture library and reading the skin color pixel values, the relationships among the three RGB channels of the pixel points are as follows:

p_R^min = min(p_R), p_G^min = min(p_G), p_B^min = min(p_B)

p_R^max = max(p_R), p_G^max = max(p_G), p_B^max = max(p_B)

wherein p_R, p_G, p_B are respectively the pixel values of the R, G and B channels of a certain pixel point; p_R^min, p_G^min, p_B^min are respectively the minimum values of the R, G and B channels of the extracted hand skin color samples; p_R^max, p_G^max, p_B^max are respectively the maximum values of the R, G and B channels of the extracted hand skin color samples;

Δ_RG^min = min(p_R - p_G), Δ_RB^min = min(p_R - p_B), Δ_GB^min = min(p_G - p_B)

Δ_RG^max = max(p_R - p_G), Δ_RB^max = max(p_R - p_B), Δ_GB^max = max(p_G - p_B)

wherein Δ_RG^min, Δ_RB^min, Δ_GB^min are respectively the minimum values of the R-G, R-B and G-B channel differences of the hand skin color samples, and Δ_RG^max, Δ_RB^max, Δ_GB^max are respectively the maximum values of the R-G, R-B and G-B channel differences of the hand skin color samples;

the threshold of the skin color model in the RGB color space is:

T_rgb: (p_R^min ≤ r ≤ p_R^max) ∩ (p_G^min ≤ g ≤ p_G^max) ∩ (p_B^min ≤ b ≤ p_B^max) ∩ (Δ_RG^min ≤ r-g ≤ Δ_RG^max) ∩ (Δ_RB^min ≤ r-b ≤ Δ_RB^max) ∩ (Δ_GB^min ≤ g-b ≤ Δ_GB^max)

a skin color pixel point S_rgb(r, g, b) in the RGB color space satisfies:

S_rgb(r, g, b) = { r(r, g, b) | r(r, g, b) satisfies T_rgb }

wherein r(r, g, b) is the value of a certain pixel in the RGB color space;
(2) detecting skin tones using YCbCr explicit thresholds;
for YCbCr, Y is the luminance component, Cb is the blue chrominance component, and Cr is the red chrominance component; the YCbCr color space is obtained from the RGB color space by the following matrix transformation:

Y = 16 + 0.257R + 0.504G + 0.098B
Cb = 128 - 0.148R - 0.291G + 0.439B
Cr = 128 + 0.439R - 0.368G - 0.071B (9)

the threshold satisfied by the skin color area in the YCbCr color space is:

(80≤Cb≤120)∩(133≤Cr≤173) (10)

the hand skin color samples are converted by formula (9), and the minimum and maximum values of their Y, Cb and Cr components are obtained:

Y^min = min(Y), Cb^min = min(Cb), Cr^min = min(Cr)

Y^max = max(Y), Cb^max = max(Cb), Cr^max = max(Cr)

wherein Y^min, Cb^min, Cr^min are respectively the minimum values of the Y, Cb and Cr components of the hand skin color samples, and Y^max, Cb^max, Cr^max are respectively the maximum values of the Y, Cb and Cr components of the hand skin color samples;

in the YCbCr color space, a region S_ycbcr(cb, cr) satisfying the skin color condition satisfies:

S_ycbcr(cb, cr) = { c(cb, cr) | (Cb^min ≤ cb ≤ Cb^max) ∩ (Cr^min ≤ cr ≤ Cr^max) }

wherein Cb^min, Cb^max, Cr^min, Cr^max respectively represent the lowest threshold and the highest threshold of the Cb channel and the lowest threshold and the highest threshold of the Cr channel of skin color pixels in the YCbCr color space; c(cb, cr) is the value of the Cb and Cr components of a certain pixel in the YCbCr color space;
(3) establishing Gaussian skin color model for skin color detection
The method specifically comprises the following two steps:
step a: obtaining hand skin color samples from the gesture picture library, converting the skin color pixels from RGB to the YCbCr color space using formula (9), performing statistical analysis on the skin color pixels, and obtaining the mean μ = (μ_Cb, μ_Cr) and covariance Σ of the Gaussian model through the elliptical Gaussian joint probability density function and Bayesian maximum likelihood estimation;

the elliptical Gaussian joint probability density function is as follows:

P(c) = (1 / (2π|Σ|^(1/2))) exp[-0.5 (c - μ)^T Σ^(-1) (c - μ)]

wherein c is a skin color vector, μ is the mean vector, and Σ is the covariance matrix, estimated from the samples as:

μ = E(c)

Σ = E[(c - μ)(c - μ)^T]
step b: calculating the similarity between a given pixel point and skin color, namely the skin color likelihood; obtaining the maximum skin color likelihood of the image to be detected, the ratio of the two being the skin color probability value, thereby obtaining a skin color likelihood map; the skin color likelihood map is then thresholded to segment the skin color region;

according to the established Gaussian skin color model, the likelihood D(Cb, Cr) between the vector c(Cb, Cr) to be measured and skin color is calculated as follows:

D(Cb,Cr) = exp[-0.5 (c - μ)^T Σ^(-1) (c - μ)] (16)

after the skin color likelihood D(Cb, Cr) is calculated, the maximum skin color likelihood of the image to be detected is obtained; the ratio of D(Cb, Cr) to this maximum is the skin color probability value, yielding a skin color likelihood map, which is thresholded to obtain a binary image of the skin color region;
the skin color region S_skin in the skin color model is the intersection of the regions satisfying the Gaussian skin color model S_g(Cb, Cr), the YCbCr explicit threshold S_ycbcr(cb, cr) and the RGB explicit threshold S_rgb(r, g, b), i.e.

S_skin = S_g(Cb,Cr)∩S_ycbcr(cb,cr)∩S_rgb(r,g,b) (18)

if a certain region simultaneously satisfies the RGB explicit threshold, the YCbCr explicit threshold and the Gaussian skin color model, the region is considered a skin color region;
regions meeting the skin color model enter the region adjustment module for morphological processing; images which do not meet the requirements are not processed;
the region adjustment module adjusts the result of skin color detection, segmenting the skin-color-like regions from the original RGB image through morphological operations on the binary image (deleting small-area regions, filling holes) and thresholds on the aspect ratio and area of the minimum connected regions;
the hand detection module performs size normalization on the skin-color-like region segmented by the region adjustment module, the skin-color-like region is used as the input of a deformable part model, the deformable part model is used for training a hand model, and the trained hand model is used for detecting a hand region; wherein the deformable part model adopts HOG characteristics;
the method comprises two parts of model training and hand detection, and specifically comprises the following steps:
(1) model training
the skin color regions segmented by the region adjustment module are normalized and used as training samples; the training samples comprise positive samples and negative samples, wherein a positive sample is a photo containing a hand region and a negative sample is a photo containing no hand region; the marked positive and negative sample information is used as the input of the deformable part model, and the hand detection model required by the user is trained and generated with the deformable part model;
(2) hand region detection using deformable part model
After the skin color area is divided by the area adjusting module and normalized, the hand area is detected by using the trained deformable part model.
CN201710035087.XA 2017-01-17 2017-01-17 Hand region detection method and device based on layered structure and deformable part model Active CN106909884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710035087.XA CN106909884B (en) 2017-01-17 2017-01-17 Hand region detection method and device based on layered structure and deformable part model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710035087.XA CN106909884B (en) 2017-01-17 2017-01-17 Hand region detection method and device based on layered structure and deformable part model

Publications (2)

Publication Number Publication Date
CN106909884A CN106909884A (en) 2017-06-30
CN106909884B true CN106909884B (en) 2020-08-04

Family

ID=59207252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710035087.XA Active CN106909884B (en) 2017-01-17 2017-01-17 Hand region detection method and device based on layered structure and deformable part model

Country Status (1)

Country Link
CN (1) CN106909884B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107846555A (en) * 2017-11-06 2018-03-27 深圳慧源创新科技有限公司 Automatic shooting method, device, user terminal and computer-readable storage medium based on gesture identification
CN110070478B (en) * 2018-08-24 2020-12-04 北京微播视界科技有限公司 Deformation image generation method and device
CN110827308A (en) * 2019-11-05 2020-02-21 中国医学科学院肿瘤医院 Image processing method, image processing apparatus, electronic device, and storage medium
CN111639641B (en) * 2020-04-30 2022-05-03 中国海洋大学 Method and device for acquiring clothing region not worn on human body
CN111679737B (en) * 2020-05-27 2022-06-21 维沃移动通信有限公司 Hand segmentation method and electronic device
US11488376B2 (en) * 2021-02-15 2022-11-01 Sony Group Corporation Human skin detection based on human-body prior

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719015A (en) * 2009-11-03 2010-06-02 上海大学 Method for positioning finger tips of directed gestures
CN102142084A (en) * 2011-05-06 2011-08-03 北京网尚数字电影院线有限公司 Method for gesture recognition
CN103295021A (en) * 2012-02-24 2013-09-11 北京明日时尚信息技术有限公司 Method and system for detecting and recognizing feature of vehicle in static image

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012139241A1 (en) * 2011-04-11 2012-10-18 Intel Corporation Hand gesture recognition system
US9524028B2 (en) * 2013-03-08 2016-12-20 Fastvdo Llc Visual language for human computer interfaces
CN104680127A (en) * 2014-12-18 2015-06-03 闻泰通讯股份有限公司 Gesture identification method and gesture identification system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719015A (en) * 2009-11-03 2010-06-02 上海大学 Method for positioning finger tips of directed gestures
CN102142084A (en) * 2011-05-06 2011-08-03 北京网尚数字电影院线有限公司 Method for gesture recognition
CN103295021A (en) * 2012-02-24 2013-09-11 北京明日时尚信息技术有限公司 Method and system for detecting and recognizing feature of vehicle in static image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deformable Part Model Based Hand Detection against Complex Backgrounds; Chunyu Zou; 《IGTA 2016: Advances in Image and Graphics Technologies》; 20160809; pp. 149-159 *
Hierarchical static gesture recognition combining finger detection and HOG features; Liu Shuping et al.; 《Journal of Image and Graphics》; 20150630; Vol. 20, No. 6, pp. 0781-0788 *

Also Published As

Publication number Publication date
CN106909884A (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN106909884B (en) Hand region detection method and device based on layered structure and deformable part model
CN109344701B (en) Kinect-based dynamic gesture recognition method
CN104268583B (en) Pedestrian re-recognition method and system based on color area features
Christa et al. CNN-based mask detection system using openCV and MobileNetV2
Huang et al. Regions of interest extraction from color image based on visual saliency
CN110956099B (en) Dynamic gesture instruction identification method
Rahim et al. Hand gesture recognition based on optimal segmentation in human-computer interaction
CN106909883A (en) A kind of modularization hand region detection method and device based on ROS
Moallem et al. Fuzzy inference system optimized by genetic algorithm for robust face and pose detection
Mahmood et al. A Comparative study of a new hand recognition model based on line of features and other techniques
Kheirkhah et al. A hybrid face detection approach in color images with complex background
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
Vishwakarma et al. Simple and intelligent system to recognize the expression of speech-disabled person
Tarvekar Hand gesture recognition system for touch-less car interface using multiclass support vector machine
CN108345835B (en) Target identification method based on compound eye imitation perception
Fernando et al. Low cost approach for real time sign language recognition
Hiremath et al. Detection of multiple faces in an image using skin color information and lines-of-separability face model
Youlian et al. Face detection method using template feature and skin color feature in rgb color space
Yusuf et al. Human face detection using skin color segmentation and watershed algorithm
Singh et al. Template matching for detection & recognition of frontal view of human face through Matlab
Sharif et al. Real time face detection
Işikdoğan et al. Automatic recognition of Turkish fingerspelling
Meshram et al. Convolution Neural Network based Hand Gesture Recognition System
Singh et al. Indian sign language recognition using color space model and thresholding
Patravali et al. Skin segmentation using YCBCR and RGB color models

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant