WO2017092431A1

WO2017092431A1 - Human hand detection method and device based on skin colour

Info

Publication number: WO2017092431A1
Application number: PCT/CN2016/096982
Authority: WO
Inventors: 李艳杰
Original assignee: 乐视控股（北京）有限公司; 乐视致新电子科技（天津）有限公司
Priority date: 2015-12-01
Filing date: 2016-08-26
Publication date: 2017-06-08
Also published as: CN105893925A

Abstract

Provided is a skin-colour-based human hand detection method. The method comprises: converting an acquired image to be detected from the RGB colour space into the HSV colour space to obtain an HSV image, and converting the image to be detected from the RGB colour space into the r-g colour space to obtain an r-g image; converting the HSV image into a first binary image, and converting the r-g image into a second binary image; performing a bitwise AND operation on the first binary image and the second binary image to obtain a comprehensive binary image; filtering the comprehensive binary image to obtain an optimized binary image; analysing the optimized binary image to find the largest connected region, and taking that region as the skin region; and using a pre-trained K-nearest-neighbour classifier to determine whether the largest connected region has the shape of a hand, thereby recognizing the human hand. The method detects quickly and effectively avoids false detection of human hands in gesture recognition.

Description

Skin color based hand detection method and device

The present application claims priority to Chinese Patent Application No. 201510870145.1, filed on Dec. 1, 2015, the entire disclosure of which is hereby incorporated by reference.

Technical field

The present application relates to the field of computer vision, and in particular, to a human hand detection method and apparatus based on skin color.

Background art

Gesture recognition is receiving increasing attention in machine vision systems that involve people. In a gesture-based human-computer interaction system, for example, the position of the hand in the image must be acquired first. This requires segmenting the hand from the image, and the most common segmentation method at present is based on skin color detection.

Depending on whether the imaging process is taken into account, skin color detection methods fall into two basic types: statistics-based methods and physics-based methods. Statistics-based methods detect skin color using a statistical skin color model and mainly comprise two steps: color space transformation and skin color modeling. Physics-based methods introduce the interaction between light and skin into skin color detection, and detect skin color by studying the skin reflection model and its spectral characteristics.

However, existing statistics-based skin color detection methods recognize the human hand shape with low efficiency and a high false detection rate, and are highly susceptible to illumination, which limits the accuracy of gesture recognition.

Therefore, a fast and high-quality human hand detection method needs to be proposed.

Summary of the invention

The embodiments of the present invention provide a skin-color-based human hand detection method and device, to overcome the defects of prior-art statistics-based skin color detection and human hand recognition methods, namely low efficiency, a high false detection rate, and high susceptibility to illumination. Human hand recognition based on skin color detection is thereby made efficient and accurate, which further improves the accuracy of gesture recognition.

The embodiment of the present application provides a human hand detection method based on skin color, including:

Converting the acquired image to be detected from the RGB color space to the HSV color space to obtain an HSV image, and converting the image to be detected from the RGB color space to the r-g color space to obtain an r-g image;

Traversing each pixel in the HSV image and converting the HSV image into a first binary image according to a pre-established HSV histogram model, and traversing each pixel in the r-g image and converting the r-g image into a second binary image according to a pre-established mixed Gaussian model;

Performing a bitwise AND operation on the first binary image and the second binary image to obtain a comprehensive binary image;

Filtering the comprehensive binary image to obtain an optimized binary image;

Analyzing a maximum connected area in the optimized binary image, and using the largest connected area as a skin area;

The pre-trained K-nearest neighbor classifier is used to determine whether the largest connected area is a hand shape, thereby realizing human hand recognition.

The embodiment of the present application provides a human hand detecting device based on skin color, including:

An image conversion module, configured to convert the acquired image to be detected from an RGB color space to an HSV color space to acquire an HSV image, and convert the image to be detected from an RGB color space to an r-g color space to obtain an r-g image;

a binary image obtaining module, configured to traverse each pixel in the HSV image and convert the HSV image into a first binary image according to a pre-established HSV histogram model, and to traverse each pixel in the r-g image and convert the r-g image into a second binary image according to a pre-established mixed Gaussian model;

a bitwise operation module, configured to perform a bitwise AND operation on the first binary image and the second binary image to obtain a comprehensive binary image;

a filtering module, configured to filter the comprehensive binary image to obtain an optimized binary image;

a connected area judging module, configured to analyze a largest connected area in the optimized binary image, and use the largest connected area as a skin area;

The human hand identification module is configured to determine whether the maximum connected area is a hand shape using a pre-trained K-nearest neighbor classifier, thereby realizing human hand recognition.

An embodiment of the present application provides an electronic device that implements the skin-color-based human hand detection method according to any of the foregoing embodiments.

An embodiment of the present application provides a non-transitory computer-readable storage medium storing computer instructions which, when executed, implement some or all of the steps of each implementation of the skin-color-based human hand detection method provided by the embodiments of the present application.

An embodiment of the present application provides an electronic device, including: one or more processors; and a memory; wherein the memory stores instructions executable by the one or more processors, the instructions being configured to perform the skin-color-based human hand detection method according to any of the above embodiments of the present application.

An embodiment of the present application provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the skin-color-based human hand detection method according to any of the above embodiments of the present application.

The skin-color-based human hand detection method and device provided by the embodiments of the present application detect the skin region with high accuracy by combining the HSV histogram, the mixed Gaussian model, filtering-based denoising and connected-region extraction, and at the same time extract the human hand quickly and accurately by means of the K-nearest neighbor classifier.

Description of the drawings

In order to illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a technical flowchart of Embodiment 1 of the present application;

FIG. 2 is a technical flowchart of Embodiment 2 of the present application;

FIG. 3 is a technical flowchart of Embodiment 3 of the present application;

FIG. 4 is a schematic structural diagram of a device according to Embodiment 4 of the present application;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.

It should be noted that the embodiments of the present application are not independent of one another; several embodiments may be combined with or supplement each other.

Embodiment 1

FIG. 1 is a technical flowchart of Embodiment 1 of the present application. Referring to FIG. 1, the skin-color-based human hand detection method according to this embodiment includes the following steps:

Step 110: Convert the acquired image to be detected from the RGB color space to the HSV color space to obtain an HSV image, and convert the image to be detected from the RGB color space to the r-g color space to obtain an r-g image;

To make the description clearer, step 110 is divided below into two sub-steps, step 111 and step 112. It should be noted that step 111 and step 112 may be performed in either order; the order in which they are described below does not constitute a limitation.

Step 111: Convert the acquired image to be detected from the RGB color space to the HSV color space to obtain an HSV image.

In the RGB color space, colors are obtained by varying the three color channels red (R), green (G) and blue (B) and superimposing them on one another; R, G and B are the color values of the red, green and blue channels. This standard covers almost all colors that human vision can perceive.

The HSV (Hue, Saturation, Value) color space is based on the intuitive characteristics of color: H, S and V represent hue, saturation and brightness, respectively. Converting the image to be detected from the RGB color space to the HSV color space overcomes, to some extent, the influence of illumination changes on skin color detection.

In the HSV color space, the hue H carries the color information, that is, the position of the color in the spectrum. H is measured as an angle from 0° to 360°, counted counterclockwise from red: red is 0°, green is 120° and blue is 240°. Their complementary colors are cyan at 180°, magenta at 300° and yellow at 60°, respectively. The saturation S is the ratio of the purity of the selected color to the maximum purity of that color; it ranges from 0.0 to 1.0, a larger value means a more saturated color, and S=0 leaves only gray scale. The brightness V is usually measured as a percentage, from 0% (black) to 100% (white). The RGB and CMY color models are both hardware-oriented, whereas the HSV color model is user-oriented. The three-dimensional representation of the HSV model is derived from the RGB cube: looking along the cube's diagonal from the white vertex toward the black vertex, the cube appears as a hexagon. The boundary of the hexagon represents hue, the horizontal axis represents purity, and brightness is measured along the vertical axis.

In the embodiment of the present application, the image to be detected is converted from the RGB color space to the HSV color space by using the following formula:

V=max(R, G, B)

Here R, G and B are the red, green and blue values of the pixel; max() denotes the maximum operation and min() the minimum operation, so V is the largest of R, G and B; H, S and V are the color values of the pixel after conversion.
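By way of illustration only, the conversion can be sketched in C as follows. The formulas for S and H are not reproduced in the text above, so the sketch assumes the conventional RGB-to-HSV conversion with R, G and B scaled to [0, 1]. The type Hsv, the helper functions and the choice of H = 0 for gray pixels are illustrative assumptions, not part of the application.

```c
/* Hue in degrees [0, 360), saturation and value in [0, 1]. */
typedef struct {
    double h;
    double s;
    double v;
} Hsv;

static double max3(double a, double b, double c)
{
    double m = a > b ? a : b;
    return m > c ? m : c;
}

static double min3(double a, double b, double c)
{
    double m = a < b ? a : b;
    return m < c ? m : c;
}

/* Conventional RGB -> HSV conversion (assumed form); inputs in [0, 1]. */
Hsv rgb_to_hsv(double r, double g, double b)
{
    Hsv out;
    double v = max3(r, g, b);            /* V = max(R, G, B) */
    double delta = v - min3(r, g, b);

    out.v = v;
    out.s = (v > 0.0) ? delta / v : 0.0;

    if (delta == 0.0)
        out.h = 0.0;                     /* gray: hue undefined, 0 by convention */
    else if (v == r)
        out.h = 60.0 * (g - b) / delta;
    else if (v == g)
        out.h = 120.0 + 60.0 * (b - r) / delta;
    else
        out.h = 240.0 + 60.0 * (r - g) / delta;

    if (out.h < 0.0)
        out.h += 360.0;                  /* wrap negative angles */
    return out;
}
```

With this sketch, pure red, green and blue map to hues of 0°, 120° and 240°, which agrees with the hue angles given in the description.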

Step 112: Convert the image to be detected from an RGB color space to an r-g color space to obtain an r-g image.

In the embodiment of the present application, the RGB image is converted from the RGB color space to the r-g color space by using the following formula:

$$ r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B}, \qquad b = 1 - g - r $$

Here R, G and B are the red, green and blue values of the pixel; r, g and b are the color values of the pixel after conversion.

In the RGB color space referred to here, a wide variety of colors is obtained by varying the three color channels red (R), green (G) and blue (B) and superimposing them on one another. Each channel usually has 256 brightness levels, numbered 0, 1, 2, ..., 255. An RGB color value specifies the relative brightness of the three primary colors and thereby produces a specific color for display; in other words, any color can be recorded and expressed as a set of RGB values. For example, a pixel with the RGB value (149, 123, 98) has a color that is the superposition of red, green and blue at those brightnesses.

In the embodiment of the present application, the RGB value of each pixel in the picture can be read directly using OpenCV, for example with code such as the following:

#include <opencv2/core/core_c.h>  /* legacy OpenCV C API */

/* Read the pixel at row j, column i of ImageIn. */
CvScalar p = cvGet2D(ImageIn, j, i);
double a = p.val[0];  /* blue */
double b = p.val[1];  /* green */
double c = p.val[2];  /* red */

Here i and j are the horizontal and vertical coordinates of the pixel in the image, and channels 0, 1 and 2 hold the brightness values of blue, green and red, respectively.

Converting the pixel values from RGB space to r-g space overcomes, to some extent, the influence of illumination changes on skin color detection. In the embodiment of the present application, the conversion from RGB to r-g is in effect a normalization of the RGB colors. When light or shadow changes the R, G and B values of a pixel, the numerator and the denominator of the normalization formula change together, so the normalized value varies only slightly. The transformation thus removes the illumination information from the image and reduces the effect of lighting.

For example, suppose that before normalization pixel A has the value RGB (30, 60, 90) at time T1, and that at time T2 a change in illumination alters the values of all three color channels so that pixel A becomes RGB (60, 120, 180).

After conversion to r-g space by the normalization formula, the value of pixel A at time T1 is (r, g, b) = (1/6, 1/3, 1/2), and its value at time T2 is likewise (1/6, 1/3, 1/2). The normalized values at T1 and T2 are therefore identical.
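The normalization and the example above can be sketched in C as follows. This is an illustration under stated assumptions only: the type NormRgb and the function normalize_rgb are hypothetical names, and mapping an all-zero (black) pixel to (0, 0, 0) is an assumption made to avoid division by zero.

```c
/* Normalized chromaticity coordinates of a pixel. */
typedef struct {
    double r;
    double g;
    double b;
} NormRgb;

/* r = R/(R+G+B), g = G/(R+G+B), b = 1 - g - r.
   Assumption: a black pixel (R+G+B == 0) maps to (0, 0, 0). */
NormRgb normalize_rgb(double R, double G, double B)
{
    NormRgb out = {0.0, 0.0, 0.0};
    double sum = R + G + B;

    if (sum > 0.0) {
        out.r = R / sum;
        out.g = G / sum;
        out.b = 1.0 - out.g - out.r;
    }
    return out;
}
```

Calling normalize_rgb(30, 60, 90) and normalize_rgb(60, 120, 180) gives the same r and g, because doubling every channel scales the numerator and the denominator by the same factor.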

Step 120: Traverse and read each pixel in the HSV image and convert the HSV image into a first binary image according to a pre-established HSV histogram model, and traverse and read each pixel in the r-g image and convert the r-g image into a second binary image according to a pre-established mixed Gaussian model;

For clarity, step 120 is split below into five sub-steps, step 121 to step 125. In actual implementation there is no fixed order between step 121 and steps 122 to 125, and the embodiment of the present application imposes no limitation in this respect.

Step 121: Read the HSV value of the pixel, calculate the matching probability of that HSV value against the HSV histogram model of skin pixels and against the HSV histogram model of non-skin pixels, and determine from the matching probabilities whether the pixel belongs to a skin area;

If the pixel belongs to the skin region, the pixel is assigned the value x; if it does not, the pixel is assigned the value y. The first binary image is thereby obtained. Typically x is 255 and y is 0.

The pre-trained HSV histogram model stores the histogram distributions of the HSV values of skin pixels and of non-skin pixels. In the embodiment of the present application these distributions serve as the reference for deciding whether a new pixel is a skin pixel: the HSV value of each pixel in the image to be detected is read, its matching probability against the skin histogram model and against the non-skin histogram model is calculated, and the pixel is assigned to the skin area or not according to those matching probabilities.

In this embodiment, by converting the RGB image into the HSV color space, when the skin color detection is performed, the detection result has a certain stability to the change of the illumination.
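A minimal sketch of the histogram-based binarization of step 121 is given below. The decision rule is not spelled out in the description, so the sketch assumes that a pixel is treated as skin when its matching probability under the skin histogram exceeds that under the non-skin histogram. It also assumes that both histograms are normalized and that each pixel has already been mapped to a histogram bin index; the function names are hypothetical.

```c
/* Returns 255 (skin) or 0 (non-skin) for one histogram bin.
   Assumed rule: skin wins only if its probability is strictly larger. */
unsigned char classify_by_histogram(const double *skin_hist,
                                    const double *nonskin_hist,
                                    int bin)
{
    return skin_hist[bin] > nonskin_hist[bin] ? 255 : 0;
}

/* Builds the first binary image from the per-pixel bin indices. */
void hsv_to_binary(const int *bins, int n_pixels,
                   const double *skin_hist, const double *nonskin_hist,
                   unsigned char *mask)
{
    for (int i = 0; i < n_pixels; ++i)
        mask[i] = classify_by_histogram(skin_hist, nonskin_hist, bins[i]);
}
```

Other rules, such as thresholding the ratio of the two probabilities, would fit the description equally well.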

Step 122: Calculate a first probability density of the pixel under the skin mixed Gaussian model and a second probability density of the pixel under the non-skin mixed Gaussian model;

The mixed Gaussian model (GMM, Gaussian mixture model), also known as MOG (mixture of Gaussians), is an extension of the single Gaussian model that uses K Gaussian components (typically 3 to 10) to characterize each pixel in the image.

The formula for the single Gaussian model is as follows:

$$ N(x; a, S) = \frac{1}{(2\pi)^{d/2}\,|S|^{1/2}} \exp\!\left(-\frac{1}{2}\,(x-a)^{T} S^{-1} (x-a)\right) $$

Here x is a point in d-dimensional Euclidean space, a is the mean vector of the single Gaussian model, S is the covariance matrix of the single Gaussian model, |S| is its determinant, ()^T denotes matrix transposition, and ()^-1 denotes matrix inversion.

The mixed Gaussian model is the weighted sum of m single Gaussian models, and is expressed by the following formula:

$$ p(x; a_k, S_k, \pi_k) = \sum_{k=1}^{m} \pi_k \, p_k(x) $$

Here π_k is the weight of the kth Gaussian model, m is the preset number of Gaussian models, and p_k(x) is the kth single Gaussian model, which is expressed as follows:

$$ p_k(x) = \frac{1}{(2\pi)^{d/2}\,|S_k|^{1/2}} \exp\!\left(-\frac{1}{2}\,(x-a_k)^{T} S_k^{-1} (x-a_k)\right) $$

As above, x is a point in d-dimensional Euclidean space, m is the preset number of Gaussian models, p_k(x) is the probability density of the kth Gaussian model, a_k is the mean vector of the kth Gaussian model, S_k is the covariance matrix of the kth Gaussian model, and π_k is the weight of the kth Gaussian model;

It should be noted that the values of p(x; a_k, S_k, π_k) and p_k(x) are the probability densities of x under the corresponding models.

In the embodiment of the present application, a mixed Gaussian model is established for skin pixels and for non-skin pixels respectively. The two models have the same formula and differ only in their parameters, that is, in the mean vectors a_k and the covariance matrices S_k.

For each pixel in the image to be detected, the embodiment of the present application calculates its first probability density under the skin-mixed Gaussian model, and calculates its second probability density under the non-skin mixed Gaussian model until all pixel points are traversed.

In the embodiment of the present application, the pixels may be traversed row by row and column by column; alternatively, a pixel may be selected at random and tested for membership of the skin region and, if it is a skin pixel, the pixels within a neighborhood of a certain size around it are traversed first. The present application imposes no limitation in this respect.

When the mean vectors of the skin mixed Gaussian model are a_k1, its covariance matrices are S_k1, and the weights of its single Gaussian models are π_k1, the first probability density is

$$ p_{\text{skin}} = p(x; a_{k1}, S_{k1}, \pi_{k1}) $$

When the mean vectors of the non-skin mixed Gaussian model are a_k2, its covariance matrices are S_k2, and the weights of its single Gaussian models are π_k2, the second probability density is

$$ p_{\text{non-skin}} = p(x; a_{k2}, S_{k2}, \pi_{k2}) $$

Step 123: Calculate, from the first probability density and the second probability density of the pixel, the posterior probability that the pixel belongs to the skin area;

In the embodiment of the present application, the calculation formula of the posterior probability is as follows:

$$ P = \frac{p_{\text{skin}}}{p_{\text{skin}} + p_{\text{non-skin}}} $$

Here P is the value of the posterior probability, p_skin is the first probability density, and p_non-skin is the second probability density.

Step 124: When the posterior probability is greater than a preset posterior probability threshold, assign the pixel to the skin region;

Preferably, the embodiment of the present application sets the posterior probability threshold to 0.5; that is, when the posterior probability exceeds 0.5, the corresponding pixel is determined to belong to the skin region. The threshold of 0.5 is an empirical value: a large number of experiments show that the posterior probability of a skin pixel exceeds 0.5. The threshold may of course be adjusted dynamically for different picture samples, and the present application imposes no limitation in this respect.
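Steps 123 and 124 can be illustrated by the following C sketch. It assumes that the posterior probability takes the equal-prior form P = p_skin / (p_skin + p_non-skin), and that a pixel with zero density under both models is treated as non-skin. The function names are hypothetical, and the two densities are taken as already computed inputs.

```c
/* Posterior probability that a pixel is skin, assuming equal priors.
   Assumption: returns 0 when both densities are zero. */
double skin_posterior(double p_skin, double p_nonskin)
{
    double total = p_skin + p_nonskin;
    return total > 0.0 ? p_skin / total : 0.0;
}

/* Returns 255 if the posterior exceeds the threshold, otherwise 0. */
unsigned char classify_by_posterior(double p_skin, double p_nonskin,
                                    double threshold)
{
    return skin_posterior(p_skin, p_nonskin) > threshold ? 255 : 0;
}
```

Under this assumed form, a threshold of 0.5 is equivalent to requiring that the skin density be larger than the non-skin density.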

Step 125: If the pixel belongs to the skin region, the pixel is assigned the value x; if it does not, the pixel is assigned the value y. The second binary image is thereby obtained.

In step 120 of the embodiment of the present application, (x, y) = (255, 0); that is, skin pixels are assigned 255 and non-skin pixels are assigned 0. The first binary image, under the HSV histogram model, and the second binary image, under the mixed Gaussian model, are thus obtained respectively.

Step 130: performing a bitwise AND operation on the first binary image and the second binary image to obtain a comprehensive binary image.

Specifically, a bitwise AND yields 1 in a given bit position only if both operands have 1 in that position; otherwise it yields 0. In the embodiment of the present application, if both the HSV histogram model and the mixed Gaussian model classify a pixel as belonging to the skin area, the result of the bitwise AND marks the pixel as a skin pixel; if either model classifies the pixel as non-skin, the result marks it as a non-skin pixel.

Combining the two detection results with the bitwise AND yields a more accurate detection result and reduces the probability of false detection.
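As a minimal sketch, the combination of the two binary images can be written in C as follows, assuming both images are stored as row-major 8-bit arrays of equal size with the values 255 and 0. The function name and_masks is hypothetical.

```c
#include <stddef.h>

/* out[i] = a[i] AND b[i]: 255 only where both masks are 255. */
void and_masks(const unsigned char *a, const unsigned char *b,
               unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (unsigned char)(a[i] & b[i]);
}
```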

Step 140: Filter the comprehensive binary image to obtain an optimized binary image.

In the embodiment of the present application, the comprehensive binary image is denoised by median filtering, which removes scattered isolated pixels from the binary image and thereby makes the subsequent search for connected regions more efficient.

Median filtering is a mature algorithm for removing image noise. Its basic principle is that the value of a pixel in the target image depends on the pixel at the same position in the original image and on the pixels in its neighborhood. For example, taking a pixel of the original image together with its neighbors gives 9 pixels (a 3×3 window); the 9 values are sorted and the middle value is used as the value of the corresponding pixel in the target image.
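The 3×3 median filter described above can be sketched in C as follows. The image is assumed to be a row-major 8-bit array; copying the border pixels unchanged is an assumption of this sketch, since the description does not say how borders are handled. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Median of nine values, found by insertion sort on a local copy. */
static unsigned char median9(const unsigned char *window)
{
    unsigned char v[9];
    memcpy(v, window, 9);
    for (int i = 1; i < 9; ++i) {
        unsigned char key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) {
            v[j + 1] = v[j];
            --j;
        }
        v[j + 1] = key;
    }
    return v[4];
}

/* 3x3 median filter; border pixels are copied unchanged (assumption). */
void median_filter_3x3(const unsigned char *src, unsigned char *dst,
                       int width, int height)
{
    memcpy(dst, src, (size_t)width * (size_t)height);
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            unsigned char window[9];
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    window[n++] = src[(y + dy) * width + (x + dx)];
            dst[y * width + x] = median9(window);
        }
    }
}
```

On a binary image, an isolated skin pixel surrounded by background is removed, and an isolated hole inside a skin region is filled.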

Step 150: Analyze a largest connected area in the optimized binary image, and use the largest connected area as a skin area.

A connected region (connected component) generally refers to an image region (region, blob) composed of adjacent foreground pixels that have the same pixel value. Connected component analysis means finding and labeling each connected region in an image. The input to connected component analysis is usually a binary image.

It follows from this definition that a connected region consists of adjacent pixels with the same pixel value. These two conditions can therefore be used to find the connected regions in an image, and each connected region found is given a unique label to distinguish it from the others.

Common algorithms for connected region analysis are the two-pass method and the seed-filling method.

The two-pass method, as its name suggests, finds and labels all connected regions in an image by scanning the image twice. In the first pass, each foreground pixel position is given a label; during this pass the pixels of a single connected region may receive several different labels, so labels that belong to the same region but have different values must be merged, which is done by recording the equivalence relationships between them. In the second pass, pixels whose labels are recorded as equivalent are assigned to the same connected region and given the same label (usually the smallest label in the equivalence set).

The seed-filling method comes from computer graphics, where it is often used to fill shapes. The main idea is to select a foreground pixel as a seed and then merge the foreground pixels adjacent to the seed into the same pixel set according to the two basic conditions of a connected region (same pixel value, adjacent position). The resulting pixel set is a connected region.

Pixel adjacency in a connected region is mainly defined by the 4-neighborhood or the 8-neighborhood. The embodiment of the present application uses the 4-neighborhood when analyzing the largest connected region in the optimized binary image.
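A possible seed-filling implementation of step 150 with the 4-neighborhood is sketched below. It is an illustration under stated assumptions: the image is a row-major 8-bit array in which any non-zero value is foreground, an explicit stack replaces recursion, and the function name largest_region and its return convention are hypothetical.

```c
#include <stdlib.h>

/* Finds the largest 4-connected foreground region of mask.
   Writes 255 into out for pixels of that region and 0 elsewhere.
   Returns the region's pixel count, or -1 if allocation fails. */
int largest_region(const unsigned char *mask, int width, int height,
                   unsigned char *out)
{
    static const int dx[4] = {1, -1, 0, 0};
    static const int dy[4] = {0, 0, 1, -1};
    int n = width * height;
    int best_label = 0, best_size = 0, next_label = 0;
    int *labels, *stack;

    if (n <= 0)
        return 0;
    labels = calloc((size_t)n, sizeof *labels);   /* 0 = unlabeled */
    stack = malloc((size_t)n * sizeof *stack);
    if (!labels || !stack) {
        free(labels);
        free(stack);
        return -1;
    }

    for (int start = 0; start < n; ++start) {
        int size = 0, top = 0;
        if (mask[start] == 0 || labels[start] != 0)
            continue;
        labels[start] = ++next_label;             /* new seed */
        stack[top++] = start;
        while (top > 0) {
            int p = stack[--top];
            int x = p % width, y = p / width;
            ++size;
            for (int k = 0; k < 4; ++k) {
                int nx = x + dx[k], ny = y + dy[k];
                int q;
                if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                    continue;
                q = ny * width + nx;
                if (mask[q] != 0 && labels[q] == 0) {
                    labels[q] = next_label;       /* label when pushed */
                    stack[top++] = q;
                }
            }
        }
        if (size > best_size) {
            best_size = size;
            best_label = next_label;
        }
    }

    for (int i = 0; i < n; ++i)
        out[i] = (best_label != 0 && labels[i] == best_label) ? 255 : 0;
    free(labels);
    free(stack);
    return best_size;
}
```

Because each pixel is labeled at the moment it is pushed, it enters the stack at most once, so a stack of width × height entries is sufficient.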

Step 160: Determine whether the largest connected area is a hand shape by using a pre-trained K-nearest neighbor classifier, thereby implementing recognition of a gesture.

The K-nearest neighbor classifier is a mature classifier. Its principle is that if, among the K data items closest to a given data item, the majority belong to the i-th class, then that data item is assigned to the i-th class. Each data item is generally a vector that represents the features of its class.
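The majority-vote principle can be sketched in C for a two-class case (hand, non-hand) as follows. The sketch assumes four-dimensional feature vectors, squared Euclidean distance, at most 16 neighbors, and that a tied vote counts as non-hand; none of these choices is specified in the description, and all names are hypothetical.

```c
#define NUM_FEATURES 4
#define MAX_K 16

typedef struct {
    double features[NUM_FEATURES];
    int is_hand;                      /* 1 = hand, 0 = non-hand */
} Sample;

static double squared_distance(const double *a, const double *b)
{
    double sum = 0.0;
    for (int i = 0; i < NUM_FEATURES; ++i) {
        double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

/* Returns 1 if most of the k nearest training samples are hands. */
int knn_is_hand(const Sample *train, int n, const double *query, int k)
{
    double nearest_dist[MAX_K];
    int nearest_label[MAX_K];
    int found = 0, votes = 0;

    if (k > MAX_K) k = MAX_K;
    if (k > n) k = n;
    if (k <= 0) return 0;

    for (int i = 0; i < n; ++i) {
        double d = squared_distance(train[i].features, query);
        int pos;
        if (found < k)
            pos = found++;            /* list not full: append */
        else if (d < nearest_dist[k - 1])
            pos = k - 1;              /* replace current farthest */
        else
            continue;
        /* Shift larger distances right to keep the list sorted. */
        while (pos > 0 && nearest_dist[pos - 1] > d) {
            nearest_dist[pos] = nearest_dist[pos - 1];
            nearest_label[pos] = nearest_label[pos - 1];
            --pos;
        }
        nearest_dist[pos] = d;
        nearest_label[pos] = train[i].is_hand;
    }

    for (int i = 0; i < found; ++i)
        votes += nearest_label[i];
    return 2 * votes > found;
}
```

In practice the features would probably need to be scaled to comparable ranges before distances are computed, since an area in pixels and a mean probability differ by orders of magnitude.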

The key to pre-training the K-nearest neighbor classifier is to extract the features of the sample pictures and classify the sample pictures into different classes based on these characteristics. The embodiment of the present application selects the following four features:

Feature 1: Ratio of the square of the perimeter of the connected area to the area;

Feature 2: area of the connected area;

Feature 3: the mean probability, obtained from the GMM (mixed Gaussian model), that the pixels of the connected area belong to the skin region;

Feature 4: the mean probability, obtained from the HSV histogram model, that the pixels of the connected area belong to the skin region;

Feature 3 and feature 4 are calculated by calling the pre-trained HSV histogram model and mixed Gaussian model (GMM) of the embodiment of the present application, and are not described again here.
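Features 1 and 2 can be computed directly from the binary image of the largest connected area, as in the following C sketch. The description does not define how the perimeter is measured; the sketch assumes it is the number of pixel edges that separate a foreground pixel from the background or from the image border. The type ShapeFeatures and the function names are hypothetical.

```c
typedef struct {
    int area;             /* feature 2: number of foreground pixels */
    int perimeter;        /* exposed pixel edges (assumed definition) */
    double compactness;   /* feature 1: perimeter squared over area */
} ShapeFeatures;

/* Pixels outside the image count as background. */
static int is_foreground(const unsigned char *mask, int width, int height,
                         int x, int y)
{
    if (x < 0 || y < 0 || x >= width || y >= height)
        return 0;
    return mask[y * width + x] != 0;
}

ShapeFeatures shape_features(const unsigned char *mask, int width, int height)
{
    ShapeFeatures f = {0, 0, 0.0};

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (!is_foreground(mask, width, height, x, y))
                continue;
            ++f.area;
            /* Count each of the four sides that faces background. */
            f.perimeter += !is_foreground(mask, width, height, x - 1, y);
            f.perimeter += !is_foreground(mask, width, height, x + 1, y);
            f.perimeter += !is_foreground(mask, width, height, x, y - 1);
            f.perimeter += !is_foreground(mask, width, height, x, y + 1);
        }
    }
    if (f.area > 0)
        f.compactness = (double)f.perimeter * f.perimeter / f.area;
    return f;
}
```

Under this definition a solid square of any size has a ratio of 16, while an elongated or spread-out shape such as an open hand gives a larger value.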

In the embodiment of the present application, the K-nearest neighbor classifier is pre-trained on a certain number of hand-shaped and non-hand-shaped image samples: features 1 to 4 of the largest connected area of each sample are calculated, which yields feature samples for hand regions and for non-hand regions. For a connected area to be detected, features 1 to 4 are extracted in the same way, and whether the connected area contains a human hand region is determined from the statistics of the samples.

In a specific implementation, the similarity between features 1 to 4 of the connected area and features 1 to 4 stored in the K-nearest neighbor classifier may be determined one by one, with a reasonable threshold set for the similarity. When the similarity is greater than the threshold, the connected area to be detected is determined to contain a human hand region.

In this embodiment, the skin pixels in the image to be detected are identified using the HSV histogram model and the GMM; the results of the two detection models are then combined and filtered to obtain the optimized binary image corresponding to the image to be detected; finally, analysis of the largest connected area and classification by the K-nearest neighbor classifier recognize the hand shape accurately. This effectively addresses the slow detection and false detection of hand shapes in the prior art, and thereby indirectly improves the efficiency of gesture recognition in human-computer interaction.

Embodiment 2

FIG. 2 is a technical flowchart of Embodiment 2 of the present application. With reference to FIG. 2, in the skin-color-based human hand detection method, the HSV histogram model is trained mainly by the following steps:

Step 210: Perform marking of the skin region and the non-skin region on the sample image to obtain a skin pixel sample and a non-skin pixel sample;

The marking of the sample can be done manually to ensure a high degree of accuracy of the sample.

Step 220: Convert the skin pixel sample and the non-skin pixel sample from an RGB color space to an HSV color space to obtain a skin HSV pixel sample and a non-skin HSV pixel sample;

The specific implementation formula and the technical effect of the conversion from the RGB color space to the HSV color space are shown in step 110 of the first embodiment, and are not described herein again.

Step 230: Collect statistics on the HSV values of the skin HSV pixel samples, and establish an HSV histogram model of skin pixels according to the distribution of those HSV values;

In this step, the frequency distributions of the H value (hue), S value (saturation) and V value (brightness) of the pixels of the skin sample are computed, thereby establishing the HSV histogram model of skin pixels; the same operation is performed on the pixels of the non-skin sample.

It should be noted that a core feature of the present application is that the gray levels of the HSV histogram model are compressed according to a preset ratio in order to obtain better histogram statistics.

The H, S and V channels each have 256 gray levels. If all gray levels are kept, the histogram has a length of 2^24, approximately 16.7 million, and good statistics cannot be obtained when the number of samples is not large enough. The embodiment of the present application therefore compresses the length of the histogram, with a compression ratio that can be chosen from experience. In this embodiment the channels are compressed in the ratio 4:2:1: the H channel to 64 gray levels, the S channel to 32 gray levels and the V channel to 16 gray levels, so that the compressed histogram has a length of 64 × 32 × 16 = 2^15 = 32768. The three channels are given different numbers of gray levels because they are affected by light intensity to different degrees: the H (hue) channel is not affected by illumination changes, the V channel varies in proportion to the light intensity, and the S channel lies somewhere in between.

By compressing the gray level of the histogram, high-accuracy skin color detection can be performed even in the case of a small number of samples.
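The compression described above can be sketched in a few lines. This is a minimal illustration only, assuming 8-bit RGB input, Python's standard `colorsys` conversion (which returns H, S and V in the range 0 to 1), and the 64/32/16 bin counts given in the text; the function names `hsv_bin_index` and `build_histogram` are hypothetical and do not come from the application.

```python
import colorsys

# Bin counts per channel, following the 4:2:1 ratio given in the text.
H_BINS, S_BINS, V_BINS = 64, 32, 16

def hsv_bin_index(r, g, b):
    """Map an 8-bit RGB pixel to a flat index in the compressed HSV histogram."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Each channel lies in [0, 1]; clamp so that a value of 1.0 falls in the last bin.
    hi = min(int(h * H_BINS), H_BINS - 1)
    si = min(int(s * S_BINS), S_BINS - 1)
    vi = min(int(v * V_BINS), V_BINS - 1)
    return (hi * S_BINS + si) * V_BINS + vi

def build_histogram(pixels):
    """Return a normalized histogram (list of frequencies) for a list of RGB pixels."""
    counts = [0] * (H_BINS * S_BINS * V_BINS)
    for r, g, b in pixels:
        counts[hsv_bin_index(r, g, b)] += 1
    total = float(len(pixels)) or 1.0  # avoid division by zero for an empty sample
    return [c / total for c in counts]
```

Running `build_histogram` once on the skin pixel sample and once on the non-skin pixel sample would give the two histogram models of steps 230 and 240.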

Step 240: Collect statistics on the HSV values of the non-skin HSV pixel sample, and establish an HSV histogram model of the non-skin pixels according to the distribution of those HSV values.

The execution process and technical effect of establishing the HSV histogram model for the non-skin pixel samples are the same as in step 230 above and are not described again here. It should be noted that step 230 and step 240 need not be performed in any particular order, and the embodiment of the present application places no limitation on this.

In this embodiment, HSV histogram models of skin pixels and non-skin pixels are established by training on skin samples and non-skin samples and compressing the gray levels of the HSV histogram, so that the false detection rate of skin pixels is reduced even if the number of training samples is small.

Embodiment 3

FIG. 3 is a technical flowchart of Embodiment 3 of the present application. In conjunction with FIG. 2, in the human hand detection method based on skin color, establishing the mixed Gaussian model (GMM) mainly includes the following steps:

Step 310: Mark a skin pixel area and a non-skin pixel area of the RGB sample picture to obtain a skin pixel sample and a non-skin pixel sample.

In the embodiment of the present application, the RGB sample picture is first marked, which may be done manually, to distinguish the skin areas from the non-skin areas in the picture; that is, the skin pixel sample and the non-skin pixel sample are obtained. Pre-classifying the samples helps the subsequent EM algorithm calculate the parameters of the mixed Gaussian model more efficiently and brings the parameters closer to the actual model.

Step 320: Convert the skin pixel sample and the non-skin pixel sample from an RGB color space to an r-g color space;

The conversion method in this step is the same as that described in the first embodiment, and the following formulas are adopted:

$$r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B}, \qquad b = 1 - g - r$$
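As a small illustration of this conversion, the sketch below assumes the usual normalized-RGB definition r = R/(R+G+B), g = G/(R+G+B), with b = 1 - g - r implied and therefore not stored. The function name `rgb_to_rg` and the handling of pure black pixels are assumptions made for the example.

```python
def rgb_to_rg(R, G, B):
    """Convert an RGB pixel to normalized (r, g) chromaticity coordinates."""
    total = R + G + B
    if total == 0:
        # Black pixel: chromaticity is undefined. Returning equal shares
        # is an assumption of this sketch, not something the text specifies.
        return (1.0 / 3.0, 1.0 / 3.0)
    # b = 1 - g - r carries no extra information, so only (r, g) is returned.
    return (R / total, G / total)
```

For example, `rgb_to_rg(100, 50, 50)` gives `(0.5, 0.25)`, and scaling the pixel to `(200, 100, 100)` gives the same pair, which is why the r-g space is less sensitive to brightness than RGB.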

Step 330: Using an expectation maximization algorithm, calculate the parameters of the skin pixel mixed Gaussian model and the non-skin pixel mixed Gaussian model from the color-space-converted skin pixel sample and non-skin pixel sample, respectively. The parameters include a_k, S_k and π_k.

The mixed Gaussian model is a superposition of multiple single Gaussian models, each with its own weight; that is, the data described by the mixed Gaussian model is generated from several single Gaussian models. The number K of single Gaussian models needs to be set in advance, and π_k is the weight of the k-th single Gaussian model.

In statistical calculations, the Expectation Maximization (EM) algorithm is an algorithm for finding the maximum likelihood estimate or maximum a posteriori estimate of the parameters of a probabilistic model that depends on unobservable hidden variables. When some data is missing or unobservable, the EM algorithm provides an efficient iterative procedure for calculating the maximum likelihood estimate of the parameters. Each iteration consists of two steps, the Expectation step and the Maximization step, hence the name EM algorithm. The EM algorithm is a very mature algorithm and its derivation is complicated, so it is not described in detail in the embodiment of the present application.
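To make the two steps concrete, the following sketch fits a Gaussian mixture by EM. It is deliberately reduced to one-dimensional data to stay short, whereas the models in this embodiment are fitted to two-dimensional r-g samples. The initial means, the unit initial variances, the fixed iteration count and the variance floor are all assumptions of the sketch.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, iterations=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given initial means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    n = len(data)
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances
```

The two-dimensional case follows the same pattern, with mean vectors in place of scalar means and covariance matrices in place of variances.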

Step 340: Establish a mixed Gaussian model according to the mixed Gaussian model formula.

According to the labeled skin pixel samples, combined with the EM algorithm, the mean vectors a_{k1}, the covariance matrices S_{k1} and the weights π_{k1} of the single Gaussian models in the skin mixed Gaussian model can be calculated and substituted into the mixed Gaussian model formula to obtain the skin mixed Gaussian model:

$$p_{skin}(x) = \sum_{k=1}^{K} \pi_{k1}\, N(x \mid a_{k1}, S_{k1})$$

According to the labeled non-skin pixel samples, combined with the EM algorithm, the mean vectors a_{k2}, the covariance matrices S_{k2} and the weights π_{k2} of the single Gaussian models in the non-skin mixed Gaussian model can be calculated, and the non-skin mixed Gaussian model is obtained as:

$$p_{non\text{-}skin}(x) = \sum_{k=1}^{K} \pi_{k2}\, N(x \mid a_{k2}, S_{k2})$$

When a new picture to be detected is read, each pixel of the picture is read after the color space conversion and substituted into the two models, and p_skin and p_non-skin of the pixel are calculated respectively.
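Evaluating a pixel under one of the two mixture models amounts to a weighted sum of two-dimensional Gaussian densities. The sketch below assumes each model is given as a list of (weight, mean vector, covariance matrix) triples, corresponding to π_k, a_k and S_k. The function names and the data layout are hypothetical.

```python
import math

def gaussian_2d(x, mean, cov):
    """Density of a 2-D Gaussian; cov is ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form d^T S^{-1} d using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def gmm_density(x, components):
    """Mixture density; components is a list of (weight, mean, covariance)."""
    return sum(w * gaussian_2d(x, a, S) for w, a, S in components)
```

Calling `gmm_density` with the (r, g) value of a pixel and the skin components yields p_skin; calling it with the non-skin components yields p_non-skin.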

In this embodiment, the marked skin areas and non-skin areas of a small number of sample pictures are used with the EM algorithm to establish mixed Gaussian models of skin pixels and non-skin pixels. Compared with prior-art skin color detection based on histograms, a large number of training samples is not needed, which saves resources and improves the efficiency of skin color detection.

It should be noted that, in the embodiment of the present application, the HSV histogram model and the mixed Gaussian model may be established in either order, and the image to be detected may likewise be matched against the two models in either order. The arrangement of the embodiments merely illustrates how each of the two models is established and does not limit the order in which they are established or used.

Finally, those skilled in the art can understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing related hardware. The program can be stored in a non-transitory computer readable storage medium and, when executed, may include the flows of the method embodiments described above. The non-transitory computer readable storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Embodiment 4

FIG. 4 is a technical flowchart of Embodiment 4 of the present application. Referring to FIG. 4, a human hand detection device based on skin color mainly includes the following modules: an image conversion module 410, a binary image acquisition module 420, a bitwise operation module 430, a filtering module 440, a connected area determining module 450, and a model training module 460.

The image conversion module 410 is configured to convert the acquired image to be detected from an RGB color space to an HSV color space to acquire an HSV image, and convert the image to be detected from an RGB color space to an r-g color space to obtain an r-g image;

The binary map obtaining module 420 is configured to traverse each pixel in the HSV image and call the HSV histogram model pre-established by the model training module 460 to convert the HSV image into the first binary image, and to traverse each pixel in the r-g image and call the mixed Gaussian model pre-established by the model training module 460 to convert the r-g image into a second binary image;

The bitwise operation module 430 is configured to perform a bitwise AND operation on the first binary image and the second binary image to obtain a comprehensive binary image;

The filtering module 440 is configured to filter the integrated binary image to obtain an optimized binary image;

The connected area determining module 450 is configured to analyze a largest connected area in the optimized binary image, and use the largest connected area as a skin area.
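The work of the bitwise operation module and the connected area determining module can be sketched as follows. The example assumes binary images stored as lists of rows containing 0 or 1 and uses 4-connectivity; both are assumptions, as are the function names. The filtering step is left out of the sketch.

```python
from collections import deque

def bitwise_and(img_a, img_b):
    """Pixel-wise AND of two binary images given as lists of rows of 0/1."""
    return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]

def largest_connected_region(img):
    """Return the set of (row, col) pixels in the largest 4-connected region of 1s."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r0 in range(rows):
        for c0 in range(cols):
            if img[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            region, queue = set(), deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.add((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and img[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(region) > len(best):
                best = region
    return best
```

A pixel survives the AND only if both the histogram model and the mixed Gaussian model marked it as skin, so the combined image contains fewer false positives than either binary image alone.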

Specifically, the model training module 460 is configured to:

Marking the skin area and the non-skin area of the sample image to obtain a skin pixel sample and a non-skin pixel sample;

Invoking the image conversion module 410 to convert the skin pixel sample and the non-skin pixel sample from an RGB color space to an HSV color space to obtain a skin HSV pixel sample and a non-skin HSV pixel sample;

Calculating an HSV value of the skin HSV pixel sample, and establishing an HSV histogram model of the skin pixel according to a distribution of HSV values of the skin HSV pixel sample;

Counting an HSV value of the non-skin HSV pixel sample, and establishing an HSV histogram model of the non-skin pixel according to a distribution of HSV values of the non-skin HSV pixel sample;

Specifically, the model training module 460 is further configured to:

Calling the image conversion module 410 to convert the skin pixel sample and the non-skin pixel sample from the RGB color space to the r-g color space to obtain an r-g skin pixel sample and an r-g non-skin pixel sample;

Calculating parameters of the skin pixel mixed Gaussian model and the non-skin pixel mixed Gaussian model according to the r-g skin pixel sample and the r-g non-skin pixel sample, respectively, using an expectation maximization algorithm, to establish the skin pixel mixed Gaussian model and the non-skin pixel mixed Gaussian model, wherein the parameters include a mean vector, a covariance matrix, and a weight of each Gaussian model in the mixed Gaussian model.

Specifically, the binary map obtaining module 420 is further configured to:

Reading an HSV value of the pixel, calculating matching probability values of the HSV value with the HSV histogram model of the skin pixels and with the HSV histogram model of the non-skin pixels, respectively, and determining according to the matching probability values whether the pixel belongs to a skin area;

If the pixel belongs to the skin region, the pixel is assigned x, and if the pixel does not belong to the skin region, the pixel is assigned y, thereby obtaining the first binary image;
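One plausible way to turn the two histogram models into a per-pixel decision is to compare the frequencies stored in the matching bin. The comparison rule, the `ratio` parameter and the values 255 and 0 used for x and y are assumptions of this sketch; the text only states that the decision is made from the matching values.

```python
def classify_by_histogram(bin_index, skin_hist, non_skin_hist, ratio=1.0, x=255, y=0):
    """Return x if the bin is more typical of skin than of non-skin, else y."""
    p_skin = skin_hist[bin_index]
    p_non = non_skin_hist[bin_index]
    return x if p_skin > ratio * p_non else y

def first_binary_image(bin_indices, skin_hist, non_skin_hist):
    """bin_indices: rows of precomputed histogram bin indices, one per pixel."""
    return [[classify_by_histogram(i, skin_hist, non_skin_hist) for i in row]
            for row in bin_indices]
```

Raising `ratio` above 1 makes the rule stricter, trading missed skin pixels for fewer false detections.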

The binary map obtaining module 420 is further configured to:

Calculating a first probability density of the pixel point under a skin-mixed Gaussian model and a second probability density of the pixel point under a non-skin mixed Gaussian model;

Calculating a posterior probability that the pixel belongs to a skin region according to the first probability density of the pixel point and the second probability density;

When the posterior probability is determined to be greater than a preset posterior probability threshold, the pixel point is attributed to the skin region;

If the pixel belongs to the skin region, the pixel is assigned x, and if the pixel does not belong to the skin region, the pixel is assigned y, thereby obtaining the first binary image and the second binary image.
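The posterior test can be illustrated with Bayes' rule applied to the two probability densities. The prior probability of skin, the threshold of 0.5 and the values 255 and 0 for x and y are assumptions made for the example; the text only requires a preset posterior probability threshold.

```python
def skin_posterior(p_skin, p_non_skin, prior_skin=0.5):
    """P(skin | pixel) by Bayes' rule from the two mixture densities."""
    num = p_skin * prior_skin
    den = num + p_non_skin * (1.0 - prior_skin)
    # If both densities are zero the pixel matches neither model; treat it as non-skin.
    return num / den if den > 0.0 else 0.0

def classify_pixel(p_skin, p_non_skin, threshold=0.5, x=255, y=0):
    """Assign x when the posterior exceeds the threshold, otherwise y."""
    return x if skin_posterior(p_skin, p_non_skin) > threshold else y
```

With equal priors, a pixel whose skin density is three times its non-skin density has a posterior of 0.75 and is therefore marked as skin at a threshold of 0.5.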

Specifically, the connected area determining module 450 is further configured to:

The pre-trained K-nearest neighbor classifier is used to determine whether the largest connected area is a hand shape, thereby realizing recognition of a gesture.
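A K-nearest neighbor decision of this kind can be sketched as below. The passage does not say which shape features are extracted from the largest connected area, so the feature vectors here are purely hypothetical placeholders, as are the labels `"hand"` and `"other"` and the choice of Euclidean distance.

```python
import math
from collections import Counter

def knn_predict(sample, training_set, k=3):
    """training_set: list of (feature_vector, label); returns the majority label."""
    # Sort the training examples by Euclidean distance to the query sample.
    by_distance = sorted(training_set, key=lambda item: math.dist(sample, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

In use, the training set would hold feature vectors computed from labeled hand and non-hand regions, and `sample` would be the same features computed from the largest connected area of the image being tested.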

The implementation process and technical effects of the embodiment corresponding to FIG. 4 are the same as those of the embodiment corresponding to FIG. 1, FIG. 2, and FIG. 3, and details are not described herein again.

In another embodiment of the present application, there is also provided an electronic device comprising the skin color based human hand detecting device according to any of the preceding embodiments.

In another embodiment of the present application, a non-transitory computer readable storage medium is also provided. The non-transitory computer readable storage medium stores computer executable instructions that can perform the skin color based human hand detection method in any of the above method embodiments.

FIG. 5 is a schematic diagram of a hardware structure of an electronic device for performing a skin color based human hand detection method according to an embodiment of the present application. As shown in FIG. 5, the device includes:

One or more processors 510 and a memory 520; one processor 510 is taken as an example in FIG. 5.

The apparatus for performing the skin color based human hand detection method may further include: an input device 530 and an output device 540.

The processor 510, the memory 520, the input device 530, and the output device 540 may be connected by a bus or other means; a bus connection is taken as an example in FIG. 5.

The memory 520, as a non-transitory computer readable storage medium, can be used to store non-volatile software programs, non-volatile computer executable programs, and modules, such as the program instructions/modules corresponding to the skin color based human hand detection method in the embodiment of the present application (for example, the image conversion module 410, binary image acquisition module 420, bitwise operation module 430, filtering module 440, connected area determining module 450, and model training module 460 shown in FIG. 4). The processor 510 executes the various functional applications and data processing of the electronic device by running the non-volatile software programs, instructions, and modules stored in the memory 520, that is, it implements the skin color based human hand detection method of the above method embodiments.

The memory 520 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application required for at least one function, and the storage data area may store data created according to use of the skin color based human hand detection device, and the like. Further, the memory 520 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other nonvolatile solid state storage device. In some embodiments, the memory 520 can optionally include memory remotely located relative to the processor 510, which can be connected to the skin color based human hand detection device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 530 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the skin tone based hand detection device. The output device 540 can include a display device such as a display screen.

The one or more modules are stored in the memory 520, and when executed by the one or more processors 510, perform a skin tone based human hand detection method in any of the above method embodiments.

The above products can perform the methods provided by the embodiments of the present application, and have the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.

The electronic device of the embodiment of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: These devices are characterized by mobile communication functions and are mainly aimed at providing voice and data communication. Such terminals include: smart phones (such as iPhone), multimedia phones, functional phones, and low-end phones.

(2) Ultra-mobile personal computer equipment: This type of equipment belongs to the category of personal computers, has computing and processing functions, and generally has mobile Internet access. Such terminals include: PDAs, MIDs, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: These devices can display and play multimedia content. Such devices include: audio, video players (such as iPod), handheld game consoles, e-books, and smart toys and portable car navigation devices.

(4) Server: A device that provides computing services. A server consists of a processor, a hard disk, a memory, a system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements in terms of processing power, stability, reliability, security, scalability, and manageability.

(5) Other electronic devices with data interaction functions.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

Through the description of the above embodiments, those skilled in the art can clearly understand that the various embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware. Based on such understanding, the essence of the above technical solutions may be embodied in the form of a software product, which may be stored in a computer readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform the methods described in the various embodiments or in portions of the embodiments.

Finally, it should be noted that the above embodiments are only used to explain the technical solutions of the present application and are not intended to limit them. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or replace some of the technical features with equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A human hand detection method based on skin color, characterized in that it is applied to an electronic device, comprising the following steps:

Converting the acquired image to be detected from the RGB color space to the HSV color space to obtain an HSV image, and converting the image to be detected from the RGB color space to the r-g color space to obtain an r-g image;

Traversing each pixel in the HSV image and converting the HSV image into a first binary image according to a pre-established HSV histogram model, and traversing each pixel in the r-g image and converting the r-g image into a second binary image according to a pre-established mixed Gaussian model;

Performing a bitwise AND operation on the first binary image and the second binary image to obtain a comprehensive binary image;

Filtering the integrated binary image to obtain an optimized binary image;

Analyzing a maximum connected area in the optimized binary image, and using the largest connected area as a skin area;

The pre-trained K-nearest neighbor classifier is used to determine whether the largest connected area is a hand shape, thereby realizing human hand recognition.
The method according to claim 1, wherein converting the HSV image into a first binary image according to a pre-established HSV histogram model further comprises:

Marking the skin area and the non-skin area of the sample image to obtain a skin pixel sample and a non-skin pixel sample;

Converting the skin pixel sample and the non-skin pixel sample from an RGB color space to an HSV color space to obtain a skin HSV pixel sample and a non-skin HSV pixel sample;

Calculating an HSV value of the skin HSV pixel sample, and establishing an HSV histogram model of the skin pixel according to a distribution of HSV values of the skin HSV pixel sample;

The HSV values of the non-skin HSV pixel samples are counted, and an HSV histogram model of non-skin pixels is established based on the distribution of HSV values of the non-skin HSV pixel samples.
The method according to claim 1 or 2, wherein converting the HSV image into a first binary image according to a pre-established HSV histogram model further comprises:

Reading an HSV value of the pixel, calculating matching probability values of the HSV value with the HSV histogram model of the skin pixels and with the HSV histogram model of the non-skin pixels, respectively, and determining according to the matching probability values whether the pixel belongs to a skin area;

If the pixel belongs to the skin region, the pixel is assigned with x, and if the pixel does not belong to the skin region, the pixel is assigned with y, thereby obtaining the first binary image.
The method according to claim 1, wherein converting the r-g image into a second binary image according to a pre-established mixed Gaussian model further comprises:

Marking the skin pixel area and the non-skin pixel area of the RGB sample picture to obtain a skin pixel sample and a non-skin pixel sample;

Converting the skin pixel sample and the non-skin pixel sample from an RGB color space to an r-g color space to obtain an r-g skin pixel sample and an r-g non-skin pixel sample;

Calculating parameters of the skin pixel mixed Gaussian model and the non-skin pixel mixed Gaussian model according to the r-g skin pixel sample and the r-g non-skin pixel sample, respectively, using an expectation maximization algorithm, to establish the skin pixel mixed Gaussian model and the non-skin pixel mixed Gaussian model, wherein the parameters include a mean vector, a covariance matrix, and a weight of each Gaussian model in the mixed Gaussian model.
The method according to claim 1 or 4, wherein converting the r-g image into a second binary image according to a pre-established mixed Gaussian model further comprises:

Calculating a first probability density of the pixel point under a skin-mixed Gaussian model and a second probability density of the pixel point under a non-skin mixed Gaussian model;

Calculating a posterior probability that the pixel belongs to a skin region according to the first probability density of the pixel point and the second probability density;

When the posterior probability is determined to be greater than a preset posterior probability threshold, the pixel point is attributed to the skin region;

If the pixel belongs to the skin region, the pixel is assigned x, and if the pixel does not belong to the skin region, the pixel is assigned y, thereby obtaining the first binary image and the second binary image.
A human hand detecting device based on skin color, comprising the following modules:

An image conversion module, configured to convert the acquired image to be detected from an RGB color space to an HSV color space to acquire an HSV image, and convert the image to be detected from an RGB color space to an r-g color space to obtain an r-g image;

a binary map obtaining module, configured to traverse each pixel in the HSV image and convert the HSV image into a first binary image according to a pre-established HSV histogram model, and to traverse each pixel in the r-g image and convert the r-g image into a second binary image according to a pre-established mixed Gaussian model;

a bitwise operation module, configured to perform a bitwise AND operation on the first binary image and the second binary image to obtain a comprehensive binary image;

a filtering module, configured to filter the integrated binary image to obtain an optimized binary image;

a connected area judging module, configured to analyze a largest connected area in the optimized binary image, and use the largest connected area as a skin area;

The human hand identification module is configured to determine whether the maximum connected area is a hand shape using a pre-trained K-nearest neighbor classifier, thereby realizing human hand recognition.
The apparatus of claim 6 wherein said apparatus further comprises a model training module, said model training module for:

Marking the skin area and the non-skin area of the sample image to obtain a skin pixel sample and a non-skin pixel sample;

Converting the skin pixel sample and the non-skin pixel sample from an RGB color space to an HSV color space to obtain a skin HSV pixel sample and a non-skin HSV pixel sample;

Calculating an HSV value of the skin HSV pixel sample, and establishing an HSV histogram model of the skin pixel according to a distribution of HSV values of the skin HSV pixel sample;

The HSV values of the non-skin HSV pixel samples are counted, and an HSV histogram model of non-skin pixels is established based on the distribution of HSV values of the non-skin HSV pixel samples.
The device according to claim 6, wherein the device further comprises a model training module, wherein the model training module is further configured to:

Converting the skin pixel sample and the non-skin pixel sample from an RGB color space to an r-g color space to obtain an r-g skin pixel sample and an r-g non-skin pixel sample;

Calculating parameters of the skin pixel mixed Gaussian model and the non-skin pixel mixed Gaussian model according to the r-g skin pixel sample and the r-g non-skin pixel sample, respectively, using an expectation maximization algorithm, to establish the skin pixel mixed Gaussian model and the non-skin pixel mixed Gaussian model, wherein the parameters include a mean vector, a covariance matrix, and a weight of each Gaussian model in the mixed Gaussian model.
The apparatus according to claim 6 or 7, wherein the binary map obtaining module is further configured to:

Reading an HSV value of the pixel, calculating matching probability values of the HSV value with the HSV histogram model of the skin pixels and with the HSV histogram model of the non-skin pixels, respectively, and determining according to the matching probability values whether the pixel belongs to a skin area;

If the pixel belongs to the skin region, the pixel is assigned with x, and if the pixel does not belong to the skin region, the pixel is assigned with y, thereby obtaining the first binary image.
The device according to claim 6 or 8, wherein the binary map obtaining module is further configured to:

Calculating a first probability density of the pixel point under a skin-mixed Gaussian model and a second probability density of the pixel point under a non-skin mixed Gaussian model;

Calculating a posterior probability that the pixel belongs to a skin region according to the first probability density of the pixel point and the second probability density;

When the posterior probability is determined to be greater than a preset posterior probability threshold, the pixel point is attributed to the skin region;

If the pixel belongs to the skin region, the pixel is assigned x, and if the pixel does not belong to the skin region, the pixel is assigned y, thereby obtaining the first binary image and the second binary image.
A non-transitory computer readable storage medium, wherein the non-transitory computer readable storage medium stores computer instructions for causing the computer to perform the method of any of claims 1-5.
An electronic device, comprising:

One or more processors; and,

a memory communicatively coupled to the one or more processors; wherein

The memory stores instructions executable by the one or more processors, the instructions being executed by the one or more processors to enable the one or more processors to:

Converting the acquired image to be detected from the RGB color space to the HSV color space to obtain an HSV image, and converting the image to be detected from the RGB color space to the r-g color space to obtain an r-g image;

Traversing each pixel in the HSV image and converting the HSV image into a first binary image according to a pre-established HSV histogram model, and traversing each pixel in the r-g image and converting the r-g image into a second binary image according to a pre-established mixed Gaussian model;

Performing a bitwise AND operation on the first binary image and the second binary image to obtain a comprehensive binary image;

Filtering the integrated binary image to obtain an optimized binary image;

Analyzing a maximum connected area in the optimized binary image, and using the largest connected area as a skin area;

The pre-trained K-nearest neighbor classifier is used to determine whether the largest connected area is a hand shape, thereby realizing human hand recognition.
A computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to execute the method of any of claims 1-5.