WO2020173024A1 - Multi-gesture precise segmentation method for smart home scenario - Google Patents
- Publication number: WO2020173024A1
- Application: PCT/CN2019/092970
- Authority: WIPO (PCT)
- Prior art keywords: gesture, image, smart home, area, segmentation method
Classifications
- G06F3/017 — Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06V10/50 — Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; projection analysis
- G06V10/56 — Extraction of image or video features relating to colour
- G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Disclosed in the present invention is a multi-gesture precise segmentation method for a smart home scenario, comprising the following steps: S1: pre-processing a gesture image Image0 to obtain an image Image1; S2: performing skin tone segmentation on the pre-processed image Image1 to obtain a processed image Image4; S3: constructing an image minimum bounding rectangle (MBR) in the image Image4; S4: excluding the non-gesture areas in the image Image4 by means of a non-gesture area exclusion criterion to acquire a gesture image Image5; and S5: processing the image Image5 by means of an arm redundancy removal algorithm based on hand shape features to implement removal of arm redundancy. The present invention can locally and smartly segment gestures, and the entire process is quick and accurate, significantly increasing the comfort of use of gesture-based human-machine interaction systems.
Description
The present invention relates to an intelligent recognition method, and in particular to a multi-gesture precise segmentation method for smart home scenarios, belonging to the field of smart home technology.

Gesture segmentation is the technique of separating gesture information from a complex image background. The quality of gesture segmentation (accuracy, completeness, redundancy) has a significant impact on the recognition and detection accuracy of gesture-based human-computer interaction systems.

Real-time gesture segmentation in home scenarios is more challenging: user gestures are not only complex and variable, but also easily affected by factors such as background, lighting, and camera angle. No adaptive gesture segmentation algorithm yet exists in the field of computer vision. Current representative gesture segmentation methods mainly rely on external devices or require special treatment of the user's hands; because they restrict the user's range of movement, require dedicated supporting hardware, and are expensive, these techniques are difficult to deploy at scale in practice.

Correspondingly, smart home devices with gesture segmentation capability remain rare on the market today. Most gesture segmentation products stop at segmenting skin regions and cannot segment gestures completely and accurately, so the segmentation results are unsatisfactory. Moreover, most of these devices depend on cloud servers and thus on network connectivity, and cannot work offline.

In summary, how to propose, on the basis of the existing technology, a new multi-gesture precise segmentation method for smart home scenarios, so as to enable the large-scale application of gesture segmentation on smart home devices, has become an urgent problem for practitioners in the field.
Summary of the Invention

In view of the above defects in the prior art, the object of the present invention is to provide a multi-gesture precise segmentation method for smart home scenarios, comprising the following steps:

S1: preprocess the gesture image Image0 to obtain the image Image1;

S2: perform skin color segmentation on the preprocessed image Image1 to obtain the processed image Image4;

S3: construct the minimum bounding rectangle (MBR) of the image in Image4;

S4: exclude the non-gesture areas in Image4 by means of the non-gesture area exclusion criteria to obtain the gesture image Image5;

S5: process Image5 with the arm redundancy removal algorithm based on hand shape features to complete the removal of the arm redundancy.
Preferably, the preprocessing in S1 at least includes gesture image denoising, gesture image binarization, and morphological processing.

Preferably, S2 specifically includes the following steps:

S21: convert Image1 from the RGB color space to the YCbCr color space to obtain the image Image2, then compare each pixel with a threshold using the global fixed-threshold binarization method to obtain the binarized image Image3;

S22: use morphological dilation and erosion operations to eliminate the holes and gaps in the binarized image Image3, and apply a median filter to the binarized image to obtain the image Image4.

Preferably, S3 specifically includes the following step: store the contour information of the binarized gesture image obtained in S2 in the list contours, and obtain from the coordinate information the four vertex coordinates of the bounding rectangle, namely top_left, top_right, bottom_left, and bottom_right.

Preferably, the non-gesture area exclusion criteria in S4 specifically include:

1) if the area of the bounding rectangle is less than 2500 (the captured image size being 640*480), the region is regarded as a non-gesture area;

2) if the ratio of the length to the width of the bounding rectangle is greater than 5, the region is regarded as a non-gesture area;

3) if the ratio of the number of points with pixel value 255 inside the rectangle to the rectangle area is greater than 0.8 or less than 0.4, the region is regarded as a non-gesture area.

Preferably, the arm redundancy removal algorithm based on hand shape features in S5 specifically includes: computing the hand width distribution histogram and the gradient distribution histogram of the image Image6, where the maximum width in the gesture width distribution histogram and its corresponding coordinate mark the thumb carpometacarpal joint, and the coordinate of the wrist dividing line is determined by searching the values of the gradient distribution histogram after the thumb carpometacarpal joint point.

Preferably, in step S5 the coordinate of the wrist dividing line is determined by searching the values of the gradient distribution histogram after the thumb carpometacarpal joint point, with the criterion: the gradient at the current point is 0, and the gradient at the next point is greater than or equal to 0.
Compared with the prior art, the advantages of the present invention are mainly reflected in the following aspects:

The multi-gesture precise segmentation method for smart home scenarios proposed by the present invention can segment gestures intelligently on the local device, overcoming the over-reliance on the network of the prior art, so that devices applying this method can still work normally without a network connection.

The present invention completes skin color segmentation by converting the gesture image from the RGB color space to the YCbCr color space and then binarizing it with a global fixed threshold. Next, non-gesture areas are excluded, the MBR and MABR of the gesture contour are constructed, and the gesture image is rotated so that the hand width can be counted; a width distribution histogram and a width-based gradient distribution histogram are built to determine the wrist dividing line. Finally, the arm redundancy is removed and a complete gesture image is obtained. The invention can quickly and accurately segment gestures in home environment images, significantly improving the comfort of use of gesture-based human-computer interaction systems and increasing user satisfaction.

In addition, the present invention provides a reference for other related problems in the same field; it can be extended on this basis and applied to other technical solutions concerning gesture segmentation, and thus has very broad application prospects.

The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings, so as to make the technical solution of the present invention easier to understand and grasp.

FIG. 1 is a schematic flowchart of the steps of performing skin color segmentation on a gesture image according to the present invention;

FIG. 2 is a schematic flowchart of the steps of removing arm redundancy from a gesture image according to the present invention;

FIG. 3 is a schematic flowchart of the overall steps of the multi-gesture precise segmentation method for smart home scenarios according to the present invention.
The present invention discloses a multi-gesture precise segmentation method for smart home scenarios. The method is based on a skin color segmentation algorithm in the YCbCr color space, non-gesture area exclusion criteria, and an arm redundancy removal algorithm based on hand shape features. The method of the present invention includes the following steps:

S1: preprocess the gesture image Image0 to obtain the image Image1;

S2: perform skin color segmentation on the preprocessed image Image1 to obtain the processed image Image4;

S3: construct the minimum bounding rectangle (MBR) of the image in Image4;

S4: exclude the non-gesture areas in Image4 by means of the non-gesture area exclusion criteria to obtain the gesture image Image5;

S5: process Image5 with the arm redundancy removal algorithm based on hand shape features to complete the removal of the arm redundancy.

As can be seen from the above steps, the method of the present invention mainly comprises two major aspects: skin color segmentation and arm redundancy removal.

The method of the present invention is described in detail below with reference to the accompanying drawings. FIG. 1 shows a method of performing skin color segmentation on a gesture image according to an embodiment of the present invention, which mainly comprises the following steps:

S1: preprocess the gesture image Image0 to obtain the image Image1.

Since noise is inevitably introduced when the gesture image is captured, and this noise severely affects the segmentation and recognition of the gesture image, preprocessing the image before gesture segmentation is particularly important. The preprocessing at least includes gesture image denoising, gesture image binarization, and morphological processing.
Gesture image denoising mainly uses a Gaussian filter. This is a linear filter whose window weights follow a Gaussian distribution, decreasing as the distance from the template center increases. Its two-dimensional Gaussian function is:

h(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

where h(x, y) denotes the value at coordinate (x, y) in the Gaussian filter, and σ denotes the standard deviation.
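As an illustrative sketch (not part of the original disclosure), the two-dimensional Gaussian function above can be evaluated over a filter window and normalized with NumPy:

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Build a normalized 2D Gaussian filter window of odd side length `size`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # h(x, y) = exp(-(x^2 + y^2) / (2*sigma^2)) / (2*pi*sigma^2)
    h = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return h / h.sum()  # normalize so the weights sum to 1

kernel = gaussian_kernel(5, 1.0)
```

Normalizing the sampled weights keeps the filtered image's overall brightness unchanged, which is the usual convention when the continuous Gaussian is discretized onto a finite window.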
S21: convert Image1 from the RGB color space to the YCbCr color space to obtain the image Image2, then compare each pixel with a threshold using the global fixed-threshold binarization method to obtain the binarized image Image3.

The YCbCr color space is commonly used in video and digital images. It contains three components: Y (luma) represents the brightness of the image, with values from 0 to 255; the Cb component represents the difference between the blue component and the luma value, with values from 0 to 255; the Cr component represents the difference between the red component and the luma value, with values from 0 to 255. The Cb and Cr components are mutually independent and can be effectively separated from the Y component.
The conversion formula from the RGB color space to the YCbCr color space is:

Y  = 0.299·R + 0.587·G + 0.114·B
Cb = −0.168736·R − 0.331264·G + 0.5·B + 128
Cr = 0.5·R − 0.418688·G − 0.081312·B + 128

In matrix form:

[Y ]   [ 0.299      0.587      0.114   ] [R]   [  0 ]
[Cb] = [−0.168736  −0.331264   0.5     ] [G] + [128]
[Cr]   [ 0.5       −0.418688  −0.081312] [B]   [128]

Comparing each pixel with the threshold works as follows: the Y, Cb, Cr values of human skin fall approximately in [0:256, 130:174, 77:128]. If the YCbCr value of a pixel falls in this range, the pixel value is set to 255; otherwise it is set to 0. This yields the binarized image Image3.
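A minimal NumPy sketch of this thresholding step is given below. The conversion coefficients are the standard full-range BT.601 ones (the patent text does not reproduce its matrix), and the skin range is taken verbatim from the passage above; both are illustrative assumptions rather than the original implementation:

```python
import numpy as np

def skin_mask(rgb: np.ndarray) -> np.ndarray:
    """Binarize an H*W*3 RGB image: 255 where YCbCr falls in the skin range, else 0."""
    r, g, b = [rgb[..., i].astype(np.float64) for i in range(3)]
    # Standard full-range BT.601 RGB -> YCbCr conversion (assumed form)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Skin range [Y, Cb, Cr] ~ [0:256, 130:174, 77:128] as stated in the text
    in_range = ((y >= 0) & (y < 256)
                & (cb >= 130) & (cb < 174)
                & (cr >= 77) & (cr < 128))
    return np.where(in_range, 255, 0).astype(np.uint8)
```

A global fixed threshold like this is cheap enough for frame-rate processing on a local device, which matches the offline-operation goal stated earlier.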
S22: use morphological dilation and erosion operations to eliminate the holes and gaps in the binarized image Image3, and apply a median filter to the binarized image to obtain the image Image4.

After the gesture image is binarized, the resulting image contains gaps and defects. The role of morphological processing is to remove isolated small points and burrs, fill small holes, bridge small gaps, and so on. There are four main morphological operations:

1. Dilation. Dilation merges the background points touching an object into that object. The result enlarges the target object; its purpose is to fill the holes and gaps present in the target region.

2. Erosion. Erosion removes all boundary points of an object. The result shrinks the target object; its purpose is to eliminate small, meaningless isolated points present in the target region.

3. Opening. Opening first applies erosion to the binary image and then dilation. Its purpose is to eliminate isolated small points, burrs, and other meaningless points (erosion) and then fill holes and gaps (dilation).

4. Closing. Closing first applies dilation to the binary image and then erosion. Its purpose is to fill the holes and gaps present in the target region (dilation) and then eliminate isolated small points, burrs, and other meaningless points (erosion).
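The four operations above can be sketched in a few lines of NumPy with a 3×3 square structuring element (a self-contained illustration; the patent does not specify its structuring element or implementation):

```python
import numpy as np

def dilate(img: np.ndarray) -> np.ndarray:
    """Binary dilation (0/1 image): a pixel becomes 1 if any 3x3 neighbor is 1."""
    h, w = img.shape
    p = np.pad(img, 1)
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def erode(img: np.ndarray) -> np.ndarray:
    """Binary erosion: a pixel stays 1 only if all 3x3 neighbors are 1."""
    h, w = img.shape
    p = np.pad(img, 1, constant_values=1)  # pad with foreground so borders are not eroded by padding
    out = np.ones_like(img)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def opening(img: np.ndarray) -> np.ndarray:
    """Erosion then dilation: removes isolated points and burrs."""
    return dilate(erode(img))

def closing(img: np.ndarray) -> np.ndarray:
    """Dilation then erosion: fills small holes and gaps."""
    return erode(dilate(img))
```

For example, closing fills a one-pixel hole inside a solid blob, while opening deletes a lone foreground pixel, which is exactly the hole-filling and point-removal behavior described above.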
The median filter is a non-linear filter: it collects and sorts the pixels around the current point and takes the median as the current pixel value, thereby eliminating isolated noise points. Here it is mainly used to smooth the burrs on the edges of the binarized gesture image, so that the edges become smooth and the search for the wrist dividing line is less affected.
FIG. 2 shows a method of removing arm redundancy from a gesture image according to an embodiment of the present invention, which mainly comprises the following steps:

S3: construct the minimum-area bounding rectangle (MABR) of the gesture image.

First, construct the minimum bounding rectangle (MBR) of the gesture image in Image4 and record its vertex coordinates.

The MABR of the image is then constructed on the basis of the MBR. Given the contour, the convex hull of the gesture contour can be computed with the Graham scan; the figure is rotated about the center of its MBR at equal intervals of β over a 90-degree range, and the MBR area at each rotation angle is recorded. The MBR corresponding to the smallest recorded area is the desired MABR.
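The β-step rotation scan described above can be sketched as follows. Rotating the contour points and taking the axis-aligned bounding box is equivalent to rotating the image and taking its MBR; the function name and the point-based formulation are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def mabr_angle(points: np.ndarray, beta_deg: float = 1.0) -> tuple[float, float]:
    """Scan rotations in [0, 90) at step beta_deg over an N*2 array of contour
    points and return (best_angle_deg, min_bbox_area)."""
    center = points.mean(axis=0)
    best_angle, best_area = 0.0, np.inf
    for angle in np.arange(0.0, 90.0, beta_deg):
        t = np.deg2rad(angle)
        rot = np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])
        q = (points - center) @ rot.T          # rotate points about the center
        w, h = q.max(axis=0) - q.min(axis=0)   # axis-aligned bounding box
        if w * h < best_area:                  # keep the smallest-area rotation
            best_angle, best_area = angle, w * h
    return best_angle, best_area
```

Scanning the (much smaller) convex-hull point set instead of the full contour, as the Graham-scan remark suggests, gives the same answer at lower cost, since the bounding box of a point set is determined by its convex hull.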
S4: exclude the non-gesture areas in Image4 by means of the non-gesture area exclusion criteria to obtain the gesture image Image5.

The non-gesture area exclusion criteria specifically include:

1) if the area of the bounding rectangle is less than 2500 (the captured image size being 640*480), the region is regarded as a non-gesture area;

2) if the ratio of the length to the width of the bounding rectangle is greater than 5, the region is regarded as a non-gesture area;

3) if the ratio of the number of points with pixel value 255 inside the rectangle to the rectangle area is greater than 0.8 or less than 0.4, the region is regarded as a non-gesture area.
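The three exclusion rules translate directly into a small predicate (an illustrative sketch; the function name and argument layout are assumptions):

```python
def is_non_gesture(w: int, h: int, white_pixels: int) -> bool:
    """Apply the three non-gesture exclusion rules to a candidate bounding rectangle.

    w, h: side lengths of the bounding rectangle (captured image is 640*480);
    white_pixels: count of points with pixel value 255 inside the rectangle.
    """
    area = w * h
    if area < 2500:                # rule 1: region too small to be a hand
        return True
    if max(w, h) / min(w, h) > 5:  # rule 2: region too elongated
        return True
    fill = white_pixels / area     # rule 3: implausible skin-pixel fill ratio
    return fill > 0.8 or fill < 0.4
```

The fill-ratio bounds encode the observation that a hand silhouette neither fills its bounding rectangle almost completely (a solid blob) nor sparsely (scattered noise).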
The binarized gesture image is then rotated counterclockwise: the rotation angle corresponding to the MABR was obtained in the above step, and rotating the gesture image counterclockwise by this angle makes the gesture direction vertical.

S5: process Image5 with the arm redundancy removal algorithm based on hand shape features to complete the removal of the arm redundancy.
The arm redundancy removal algorithm based on hand shape features in S5 specifically includes: computing the hand width distribution histogram and the gradient distribution histogram of the image Image6, where the maximum width in the gesture width distribution histogram and its corresponding coordinate mark the thumb carpometacarpal joint, and the coordinate of the wrist dividing line is determined by searching the values of the gradient distribution histogram after the thumb carpometacarpal joint point.

The width histogram is computed as follows:
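The original width-histogram listing is not reproduced in this text; a minimal reconstruction consistent with the gradient code that follows — assuming the width at each row of the upright binary image is the count of 255-valued pixels — would be:

```python
def width_histogram(binary_rows):
    """Per-row hand width: number of 255-valued pixels in each row of the
    upright binarized gesture image (rows ordered fingertip to wrist)."""
    return [sum(1 for v in row if v == 255) for row in binary_rows]
```

The resulting list `width` is exactly the input the gradient computation below differentiates row by row.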
The gradient histogram is computed as follows:
gradient = [0]
for index in range(1, len(width)):
    gradient.append(width[index] - width[index - 1])
Next, the wrist dividing line is determined. Since the maximum width in the gesture width distribution histogram and its corresponding coordinate mark the thumb carpometacarpal joint, the coordinate of the wrist dividing line can be determined by searching the values of the gradient distribution histogram after the thumb carpometacarpal joint point. The criterion is: the gradient at the current point is 0, and the gradient at the next point is greater than or equal to 0.

Finally, the arm redundancy removal is completed. Using the coordinate information of the wrist dividing line obtained in the above step, the pixel values below the wrist dividing line are set to 0, so that only the upper gesture image is retained and the arm portion is removed.
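The wrist search and the final zeroing step can be sketched as follows (function names and the row-list representation are illustrative assumptions; the gradient is built exactly as in the listing above):

```python
def wrist_row(width, start):
    """Find the wrist dividing line: scan the gradient histogram after the
    row `start` of maximum width (the thumb carpometacarpal joint) for the
    first row whose gradient is 0 while the next gradient is >= 0."""
    gradient = [0] + [width[i] - width[i - 1] for i in range(1, len(width))]
    for i in range(start + 1, len(gradient) - 1):
        if gradient[i] == 0 and gradient[i + 1] >= 0:
            return i
    return len(width) - 1  # fallback: no split found, keep the whole image

def remove_arm(binary_rows, split):
    """Set every row at or below the wrist dividing line to 0,
    keeping only the upper (hand) part of the gesture image."""
    return [row if i < split else [0] * len(row) for i, row in enumerate(binary_rows)]
```

The zero-gradient condition corresponds to the roughly constant width of the wrist: after the hand narrows past the palm, the first flat-or-widening point in the profile marks where the forearm begins.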
A specific embodiment of the multi-gesture precise segmentation method for smart home scenarios is presented below with reference to FIG. 3. This embodiment mainly includes the following steps:

Step S301: image acquisition.

Home images are acquired mainly through a 2D camera.

Step S302: preprocess the acquired image.

The image is filtered, morphologically processed, binarized, and so on.

Step S303: perform skin color segmentation on the image.

The image is binarized using the global fixed-threshold method in the YCbCr color space, and the contour information of each region is obtained by the eight-neighborhood method.

Step S304: filter out the non-gesture areas.

Non-gesture area filtering is performed on the gesture image segmented in step S303: first the MBR of the gesture image is constructed, regions that do not meet the criteria are filtered out, and those that do proceed to gesture segmentation.

Step S305: perform gesture segmentation on the image.

The MABR is constructed on the basis of the gesture image's MBR, and the deflection angle of the gesture image is obtained. By analyzing the hand width distribution histogram and the hand gradient distribution histogram, the wrist dividing line of the gesture is obtained, and the arm area is filtered out.

Step S306: obtain the complete gesture image.

After gesture segmentation, zero or more gestures are produced. All gestures in the image can be extracted for subsequent use, mainly in gesture-based human-computer interaction systems, allowing people to control home devices through gestures.
The multi-gesture precise segmentation method for smart home scenarios proposed by the present invention can segment gestures intelligently on the local device, overcoming the over-reliance on the network of the prior art, so that devices applying this method can still work normally without a network connection.

The present invention completes skin color segmentation by converting the gesture image from the RGB color space to the YCbCr color space and then binarizing it with a global fixed threshold. Next, non-gesture areas are excluded, the MBR and MABR of the gesture contour are constructed, and the gesture image is rotated so that the hand width can be counted; a width distribution histogram and a width-based gradient distribution histogram are built to determine the wrist dividing line. Finally, the arm redundancy is removed and a complete gesture image is obtained. The invention can quickly and accurately segment gestures in home environment images, significantly improving the comfort of use of gesture-based human-computer interaction systems and increasing user satisfaction.

In addition, the present invention provides a reference for other related problems in the same field; it can be extended on this basis and applied to other technical solutions concerning gesture segmentation, and thus has very broad application prospects.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit and essential characteristics. Therefore, the embodiments should in all respects be regarded as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and scope of equivalents of the claims are therefore intended to be embraced by the present invention. No reference sign in the claims shall be construed as limiting the claim concerned.

In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted merely for clarity; those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may also be combined appropriately to form other embodiments understandable to those skilled in the art.
Claims (7)
- 一种面向智能家居场景的多手势精准分割方法,其特征在于,包括如下步骤:A multi-gesture accurate segmentation method for smart home scenes is characterized in that it includes the following steps:S1、对手势图像Image0进行预处理,得到图像Image1;S1. Preprocess the gesture image Image0 to obtain the image Image1;S2、对预处理后的图像Image1进行肤色分割,得到经过处理后的图像Image4;S2. Perform skin color segmentation on the preprocessed image Image1 to obtain a processed image Image4;S3、在图像Image4中构建图像的最小绑定矩形MBR;S3. Construct the smallest bound rectangle MBR of the image in the image Image4;S4、通过非手势区域排除准则对图像Image4中的非手势区域进行排除,获取手势图像Image5;S4. Exclude the non-gesture area in the image Image4 according to the non-gesture area exclusion criterion to obtain the gesture image Image5;S5、通过基于手部形状特征的手臂冗余去除算法对图像Image5进行处理,完成对手臂冗余的去除。S5. The image Image5 is processed by the arm redundancy removal algorithm based on the hand shape feature to complete the removal of the arm redundancy.
- 根据权利要求1所述的面向智能家居场景的多手势精准分割方法,其特征在于,S1中所述预处理至少包括:手势图像去噪、手势图像二值化及形态学处理。The multi-gesture precise segmentation method for smart home scenes according to claim 1, wherein the preprocessing in S1 at least includes: gesture image denoising, gesture image binarization, and morphological processing.
- 根据权利要求1所述的面向智能家居场景的多手势精准分割方法,其特征在于,S2具体包括如下步骤:The multi-gesture precise segmentation method for smart home scenes according to claim 1, wherein S2 specifically includes the following steps:S21、将Image1图像从RGB颜色空间转换到YCbCr颜色空间,得到图像Image2,再通过全局固定阈值二值化法对每个像素与阈值进行比较,得到二值化图像Image3;S21. Convert the Image1 image from the RGB color space to the YCbCr color space to obtain the image Image2, and then compare each pixel with the threshold through the global fixed threshold binarization method to obtain the binarized image Image3;S22、使用形态学中的膨胀腐蚀运算对二值化图像Image3中的孔洞与缝隙进行消除,并使用中值滤波器处理二值化图像,得到图像Image4。S22. Use the dilation and corrosion operation in the morphology to eliminate the holes and gaps in the binarized image Image3, and use the median filter to process the binarized image to obtain the image Image4.
- The multi-gesture precise segmentation method for smart home scenarios according to claim 1, characterized in that S3 specifically comprises the following steps: storing the contour information of the binarized gesture image obtained in S2 in the list contours, and obtaining from the coordinate information the four vertex coordinates of the bounding rectangle, namely top_left, top_right, bottom_left, and bottom_right.
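A minimal sketch of S3, assuming each entry of `contours` is a list of (x, y) points (the shape OpenCV's `findContours` would yield); the axis-aligned bounding rectangle is simply taken from the min/max coordinates:

```python
def bounding_rect_vertices(contour):
    """Return the four vertices of the axis-aligned bounding rectangle."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    top_left     = (x_min, y_min)
    top_right    = (x_max, y_min)
    bottom_left  = (x_min, y_max)
    bottom_right = (x_max, y_max)
    return top_left, top_right, bottom_left, bottom_right

# Hypothetical contour of one connected region:
contours = [[(2, 3), (10, 3), (10, 8), (2, 8), (5, 6)]]
tl, tr, bl, br = bounding_rect_vertices(contours[0])
# -> (2, 3), (10, 3), (2, 8), (10, 8)
```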
- The multi-gesture precise segmentation method for smart home scenarios according to claim 1, characterized in that the non-gesture area exclusion criteria in S4 specifically comprise: 1) when the area of the bounding rectangle is less than 2500, the region is regarded as a non-gesture area, the captured image size being 640*480; 2) when the ratio of the length to the width of the bounding rectangle is greater than 5, the region is regarded as a non-gesture area; 3) when the ratio of the number of points with pixel value 255 inside the rectangle to the area of the rectangle is greater than 0.8 or less than 0.4, the region is regarded as a non-gesture area.
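The three exclusion rules can be sketched directly; this is an illustrative reading in which the length/width ratio is taken as long side over short side (the claim does not fix an orientation), with the thresholds 2500, 5, 0.8, and 0.4 taken from the claim:

```python
def is_non_gesture(width, height, white_pixels):
    """Apply the three exclusion rules of S4 to one bounding rectangle."""
    area = width * height
    if area < 2500:                        # rule 1: region too small
        return True
    long_side, short_side = max(width, height), min(width, height)
    if long_side / short_side > 5:         # rule 2: region too elongated
        return True
    fill = white_pixels / area             # rule 3: implausible fill ratio
    if fill > 0.8 or fill < 0.4:
        return True
    return False

is_non_gesture(100, 80, 4800)   # plausible hand region -> False
is_non_gesture(40, 40, 1000)    # area 1600 < 2500      -> True
is_non_gesture(300, 50, 9000)   # ratio 6 > 5           -> True
```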
- The multi-gesture precise segmentation method for smart home scenarios according to claim 1, characterized in that the arm redundancy removal algorithm based on hand shape features in S5 specifically comprises: computing, for image Image6, the hand width distribution histogram and the gradient distribution histogram, wherein the maximum width in the gesture width distribution histogram and its corresponding coordinate locate the thumb carpometacarpal joint, and the coordinate of the wrist segmentation line is determined by searching the values in the gradient distribution histogram after the thumb carpometacarpal joint point.
- The multi-gesture precise segmentation method for smart home scenarios according to claim 6, characterized in that in step S5 the coordinate of the wrist segmentation line is determined by searching the values in the gradient distribution histogram after the thumb carpometacarpal joint point, the determination criterion being: the gradient of the current point is 0, and the gradient of the next point is greater than or equal to 0.
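The wrist-line search of claims 6 and 7 can be sketched on a synthetic width histogram. The orientation and exact histogram construction are assumptions here: rows run from fingertips toward the forearm, the per-row hand width rises to its maximum at the thumb carpometacarpal (CMC) joint, and the gradient histogram is the row-to-row width difference.

```python
def wrist_split_row(widths):
    """Locate the wrist segmentation line from a hand-width histogram.

    Criterion from claim 7: after the thumb CMC joint (the widest row),
    take the first row whose gradient is 0 and whose next gradient is >= 0.
    """
    gradient = [widths[i + 1] - widths[i] for i in range(len(widths) - 1)]
    cmc_row = widths.index(max(widths))    # widest row: thumb CMC joint
    for i in range(cmc_row, len(gradient) - 1):
        if gradient[i] == 0 and gradient[i + 1] >= 0:
            return i
    return None                            # no split line found

# fingers widen -> palm peak (CMC) -> narrows -> constant-width forearm
widths = [4, 6, 9, 12, 11, 9, 7, 7, 7, 7]
row = wrist_split_row(widths)              # -> 6, first constant-width row
```

Everything below `row` would then be cropped away as arm redundancy, leaving only the hand region.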
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020515730A JP6932402B2 (en) | 2019-02-26 | 2019-06-26 | Multi-gesture fine division method for smart home scenes |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910140430.6A CN109961016B (en) | 2019-02-26 | 2019-02-26 | Multi-gesture accurate segmentation method for smart home scene |
CN201910140430.6 | 2019-02-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020173024A1 true WO2020173024A1 (en) | 2020-09-03 |
Family
ID=67023818
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/092970 WO2020173024A1 (en) | 2019-02-26 | 2019-06-26 | Multi-gesture precise segmentation method for smart home scenario |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP6932402B2 (en) |
CN (1) | CN109961016B (en) |
WO (1) | WO2020173024A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112070859B (en) * | 2020-09-16 | 2021-05-04 | 山东晨熙智能科技有限公司 | Photo image automatic filling method and system for photo book |
CN112949542A (en) * | 2021-03-17 | 2021-06-11 | 哈尔滨理工大学 | Wrist division line determining method based on convex hull detection |
CN113204991B (en) | 2021-03-25 | 2022-07-15 | 南京邮电大学 | Rapid face detection method based on multilayer preprocessing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110282140A1 (en) * | 2010-05-14 | 2011-11-17 | Intuitive Surgical Operations, Inc. | Method and system of hand segmentation and overlay using depth data |
CN103426000A (en) * | 2013-08-28 | 2013-12-04 | 天津大学 | Method for detecting static gesture fingertip |
CN106557173A (en) * | 2016-11-29 | 2017-04-05 | 重庆重智机器人研究院有限公司 | Dynamic gesture identification method and device |
CN109190496A (en) * | 2018-08-09 | 2019-01-11 | 华南理工大学 | A kind of monocular static gesture identification method based on multi-feature fusion |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11167455A (en) * | 1997-12-05 | 1999-06-22 | Fujitsu Ltd | Hand form recognition device and monochromatic object form recognition device |
JP4332649B2 (en) * | 1999-06-08 | 2009-09-16 | 独立行政法人情報通信研究機構 | Hand shape and posture recognition device, hand shape and posture recognition method, and recording medium storing a program for executing the method |
CN106325485B (en) * | 2015-06-30 | 2019-09-10 | 芋头科技(杭州)有限公司 | A kind of gestures detection recognition methods and system |
CN108345867A (en) * | 2018-03-09 | 2018-07-31 | 南京邮电大学 | Gesture identification method towards Intelligent household scene |
CN109214297A (en) * | 2018-08-09 | 2019-01-15 | 华南理工大学 | A kind of static gesture identification method of combination depth information and Skin Color Information |
2019
- 2019-02-26 CN CN201910140430.6A patent/CN109961016B/en active Active
- 2019-06-26 JP JP2020515730A patent/JP6932402B2/en active Active
- 2019-06-26 WO PCT/CN2019/092970 patent/WO2020173024A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
CAO, XINYAN : "Monocular Vision Gesture Segmentation Based on Skin Color and Motion Detection", JOURNAL OF HUNAN UNIVERSITY (NATURAL SCIENCE), vol. 38, no. 01, 31 January 2011 (2011-01-31), pages 1 - 83, XP009522952 * |
GONG, TAOBO: "Computer Vision-Based Static Hand Gesture Recognition System", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE, CHINA MASTER’S THESES FULL-TEXT DATABASE, 15 September 2008 (2008-09-15), pages 1 - 56, XP009522953, ISSN: 1674-0246 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112613355A (en) * | 2020-12-07 | 2021-04-06 | 北京理工大学 | Gesture segmentation method based on island searching algorithm |
CN112613355B (en) * | 2020-12-07 | 2022-07-26 | 北京理工大学 | Gesture segmentation method based on island searching algorithm |
Also Published As
Publication number | Publication date |
---|---|
JP6932402B2 (en) | 2021-09-08 |
CN109961016A (en) | 2019-07-02 |
JP2021517281A (en) | 2021-07-15 |
CN109961016B (en) | 2022-10-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020173024A1 (en) | Multi-gesture precise segmentation method for smart home scenario | |
EP1969559B1 (en) | Contour finding in segmentation of video sequences | |
EP1969560B1 (en) | Edge-controlled morphological closing in segmentation of video sequences | |
JP5538909B2 (en) | Detection apparatus and method | |
EP1969562B1 (en) | Edge-guided morphological closing in segmentation of video sequences | |
EP1969561A1 (en) | Segmentation of video sequences | |
EP2863362B1 (en) | Method and apparatus for scene segmentation from focal stack images | |
US20150319373A1 (en) | Method and device to compose an image by eliminating one or more moving objects | |
JPH0877334A (en) | Automatic feature point extracting method for face image | |
US10885321B2 (en) | Hand detection method and system, image detection method and system, hand segmentation method, storage medium, and device | |
CN105894464A (en) | Median filtering image processing method and apparatus | |
JP2018045693A (en) | Method and system for removing background of video | |
JP2010057105A (en) | Three-dimensional object tracking method and system | |
CN111539980A (en) | Multi-target tracking method based on visible light | |
Ansari et al. | An approach for human machine interaction using dynamic hand gesture recognition | |
CN109934152B (en) | Improved small-bent-arm image segmentation method for sign language image | |
Avinash et al. | Color hand gesture segmentation for images with complex background | |
CN108717699B (en) | Ultrasonic image segmentation method based on continuous minimum segmentation | |
CN108491820B (en) | Method, device and equipment for identifying limb representation information in image and storage medium | |
Huang et al. | M2-Net: multi-stages specular highlight detection and removal in multi-scenes | |
CN111831123B (en) | Gesture interaction method and system suitable for desktop mixed reality environment | |
Chuang et al. | Moving object segmentation and tracking using active contour and color classification models | |
CN114283157A (en) | Ellipse fitting-based ellipse object segmentation method | |
CN111932470A (en) | Image restoration method, device, equipment and medium based on visual selection fusion | |
Malavika et al. | Moving object detection and velocity estimation using MATLAB |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2020515730 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19805113 Country of ref document: EP Kind code of ref document: A1 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19805113 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |