CN110610505A - Image segmentation method fusing depth and color information - Google Patents
- Publication number
- CN110610505A (application CN201910909933.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- color
- pixel
- depth
- edge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20024—Filtering details
- G06T2207/20032—Median filtering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention provides an image segmentation method fusing depth and color information, which mainly comprises the following steps: RGB-D data preprocessing, in which median filtering guided by image edge information enhances the quality of the depth image; superpixel segmentation of the RGB-D image, in which color and depth information are fused and the image is over-segmented; and superpixel merging, in which similar superpixels are merged by a spectral clustering method based on graph theory, converting the clustering into a graph-partitioning problem and completing the image segmentation. The invention converts the depth map into a three-dimensional point cloud according to the imaging geometry, thereby integrating depth and color information to segment the image and improving the quality and precision of image segmentation.
Description
Technical Field
The invention relates to the field of image processing and computer vision, in particular to an image segmentation method fusing depth and color information.
Background
Image segmentation is a popular topic in computer vision and plays an important role in applications such as object recognition, target positioning and tracking, image retrieval, three-dimensional reconstruction, and robot navigation and positioning. Traditional RGB image segmentation methods divide an image into non-overlapping connected regions using low-level features such as color space, texture, and color distribution histograms, so that pixels within a region are highly similar while different regions differ markedly. These methods struggle when adjacent objects in the image are similar in color, or when the contrast of the edge features is low.
In recent years, with the rapid development of sensor technology, a large number of consumer-grade depth acquisition devices, such as Microsoft's Kinect and Intel's RealSense series, have come to market and are widely used. These sensors usually also carry a color camera and can synchronously acquire registered depth and color images, which provides more information and new possibilities for image segmentation beyond the traditional two-dimensional color space.
Disclosure of Invention
The invention aims to provide an image segmentation method that acquires RGB-D data with a depth sensor and fuses depth and color information, mainly applied to object recognition in indoor scenes.
Because registered depth and color images are acquired synchronously, both the two-dimensional color information and the three-dimensional spatial information of a scene are available; objects that are difficult to distinguish in the two-dimensional color space can therefore be separated by their positions in three-dimensional space, allowing targets to be identified against the background. Based on this principle, the invention provides an image segmentation method fusing depth and color information, which mainly comprises the following steps:
step 1, RGB-D data preprocessing: carrying out median filtering by taking the image edge information as a guide to enhance the quality of the depth image;
step 2, RGB-D image superpixel segmentation: fusing color and depth information, and performing over-segmentation on the image;
step 3, superpixel merging: similar superpixels are merged by a spectral clustering method based on graph theory, converting the clustering into a graph-partitioning problem and completing the image segmentation.
Further, the method of step 1 comprises:
respectively carrying out edge extraction on the color image and the depth image to generate respective edge images;
fusing the color edge image and the depth edge image: on the depth edge image, the gradient direction of each pixel is calculated; if the corresponding color image edge has a similar direction, the color image edge is used, otherwise the depth image edge is used, thereby obtaining the fused edge image information;
on the depth image, performing median filtering on each valid pixel with a 3 × 3 template;
color space transformation: converting the color image acquired by the camera from the RGB color space to the CIELab color space.
Further, the step 2 comprises the following steps:
2.1 three-dimensional point cloud reconstruction based on depth information: converting pixel coordinates (x, y) in the depth map into three-dimensional space coordinates (X, Y, Z) based on the depth information in the depth map and the intrinsic parameters of the camera;
2.2 superpixel pre-segmentation and initialization of cluster centers: assuming the image has N pixels and is divided into K superpixel blocks, each superpixel block contains N/K pixels and the spacing between superpixel blocks is S = √(N/K); an initial cluster center is set at the center of each superpixel block;
2.3 region clustering labeling: within the 2S × 2S neighborhood of each superpixel cluster center, calculating the distance between each valid pixel and the cluster center in the 8-dimensional feature space [L, a, b, x, y, X, Y, Z]:
D = D_Lab + α·D_xy + β·D_XYZ
where subscript c denotes the cluster center of each superpixel and subscript i denotes each valid pixel within the search range; D_Lab, D_xy and D_XYZ respectively denote the Euclidean distances between the valid pixel i and the cluster center c on the color features Lab, the image two-dimensional coordinates xy, and the depth-based three-dimensional space coordinates XYZ; α and β are weights that balance the importance of each term, set according to the specific scene data; D is the final feature distance;
assigning each pixel to the superpixel whose cluster center has the minimum feature distance D, i.e., labeling it as the same class as that cluster center;
2.4 updating the cluster centers: after all valid pixels are labeled, updating the cluster centers according to the pixel classification result, i.e., collecting the pixels contained in each superpixel block and computing the mean of their features to obtain the new cluster center;
2.5 iterative clustering: repeating steps 2.3 and 2.4 until the clustering is stable, i.e., the cluster centers and pixel labels no longer change, thus completing the superpixel segmentation.
Further, the step 3 comprises the following steps:
3.1 similarity matrix construction: taking each superpixel as a node of a graph and the similarity between superpixels as the weight of an edge, an undirected graph over all superpixels is constructed. Assuming the number of superpixels is K, the similarity matrix W ∈ R^(K×K) of the graph is a symmetric matrix whose element w_pq measures the similarity between superpixel p and superpixel q (p = 1, …, K; q = 1, …, K) by color similarity, texture similarity and three-dimensional spatial connectivity:
w_pq = D_color(p, q) + D_texture(p, q) + γ·D_space(p, q)
where D_color(p, q) denotes the color similarity, D_texture(p, q) the texture similarity, and D_space(p, q) the three-dimensional spatial connectivity; γ is a weight factor that adjusts the weight of the spatial connectivity according to the quality of the depth data, the three-dimensional space being the one reconstructed from the depth information;
3.2 performing spectral clustering on the superpixels according to the constructed similarity matrix and merging superpixels with high similarity to complete the image segmentation.
Further, the median filtering method is as follows:
let D(x, y) be the depth value of the current valid pixel and N_D(x, y) its 8-neighborhood; then:

D(x, y) = median(N_D(x, y))
if no edge pixel exists in the 8-neighborhood of the current pixel, the above formula is applied; if an edge pixel exists, the median filtering uses only the neighborhood pixels on the same side of the edge as the current pixel.
Further, the specific method for converting the single-channel information of the depth map into three-dimensional space coordinates is

X = (x − c_x)·d / f_x, Y = (y − c_y)·d / f_y, Z = d

where (x, y) are the image pixel coordinates of a point in the depth image, d is the depth value of that point, f_x and f_y are the focal lengths of the depth camera in the x and y directions, (c_x, c_y) are the principal point coordinates (f_x, f_y, c_x, c_y are given by the camera manufacturer), and (X, Y, Z) are the three-dimensional space coordinates of the point.
Further, the color similarity calculation method comprises the following steps:
on the RGB color space, for superpixel p the normalized color histograms H_p,R, H_p,G, H_p,B are computed on the three RGB channels, and likewise for superpixel q; the Bhattacharyya coefficient between the corresponding channel histograms of superpixels p and q is computed as

Bh_R(p, q) = Σ_i √(H_p,R(i) · H_q,R(i))

(and analogously Bh_G and Bh_B), where i indexes the bins of normalized pixel values;
then the sum of the Bhattacharyya coefficients between the three channel histograms of RGB is the color similarity:
D_color(p, q) = Bh_R(p, q) + Bh_G(p, q) + Bh_B(p, q).
Further, the texture similarity calculation method comprises the following steps:
in the Lab color space, the L-channel values are used to describe each superpixel by a local binary pattern (LBP) histogram, and the Bhattacharyya coefficient is used to measure the texture similarity D_texture(p, q) between superpixels p and q.
Further, the method for calculating the connectivity of the three-dimensional space comprises the following steps:
a Gaussian kernel function is adopted, where σ is a scale factor set according to the actual scene and XYZ_p, XYZ_q are the three-dimensional space coordinates of the superpixel cluster centers; the three-dimensional spatial connectivity between superpixels p and q is then

D_space(p, q) = exp(−‖XYZ_p − XYZ_q‖² / (2σ²))
According to the invention, the depth map is converted into a three-dimensional point cloud according to the imaging geometry, so that depth and color information are integrated and the image is superpixel-segmented and clustered, completing the image segmentation. The method makes full use of the spatial position information contained in the depth image registered with the color image, so that objects that are difficult to distinguish in the two-dimensional color space are separated by their positions in three-dimensional space and targets are identified against the background, improving the quality and precision of image segmentation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram of a method for image segmentation fusing depth and color information according to an exemplary embodiment;
fig. 2 is an example of a segmentation effect of an image segmentation method applying fused depth and color information according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to various exemplary embodiments of the invention; the detailed description should not be construed as limiting the invention, but as a more detailed description of certain aspects, features and embodiments of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The specification and examples are exemplary only.
A flow chart of an exemplary embodiment of the present invention is given in fig. 1. As shown in the figure, the specific steps of this embodiment include:
Step 1: RGB-D data preprocessing. Due to the influence of working distance, occlusion, noise and the like, the depth image needs to be preprocessed: median filtering guided by image edge information is applied to enhance the quality of the depth image. The steps are as follows:
1.1, carrying out edge extraction on the color image and the depth image to generate respective edge images. Preferably, the present embodiment employs a Canny edge detection operator.
1.2 In the depth edge image, the 8-neighborhood of each edge pixel is examined; if it contains no invalid pixel, the pixel is a real edge pixel. The color edge image and the depth edge image are then fused: on the depth edge image, the gradient direction of each pixel is calculated; if the corresponding color image edge has a similar direction, the color image edge is used, otherwise the depth image edge is used. This yields the fused edge image information.
1.3 On the depth image, median filtering is performed on each valid pixel. Preferably, a 3 × 3 template is used: let D(x, y) be the depth value of the current valid pixel and N_D(x, y) its 8-neighborhood; then:

D(x, y) = median(N_D(x, y))
If no edge pixel exists in the 8-neighborhood of the current pixel, the above formula is applied; if an edge pixel exists, the median filtering uses only the neighborhood pixels on the same side of the edge as the current pixel.
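For concreteness, a minimal NumPy sketch of this edge-guided median filter follows. It assumes the fused edge map of step 1.2 is available as a boolean mask (e.g., from cv2.Canny on both images), treats zero depth values as invalid, and approximates the "same side of the edge" rule by simply excluding edge pixels from the window, which is a simplification rather than the exact rule of the embodiment.

```python
import numpy as np

def edge_guided_median(depth, edge_mask):
    """Median-filter each valid depth pixel with a 3x3 window, guided by a
    fused edge mask (a sketch of step 1.3; excluding edge pixels from the
    window approximates the 'same side of the edge' rule)."""
    h, w = depth.shape
    out = depth.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if depth[y, x] == 0:                    # invalid pixel: leave untouched
                continue
            win_d = depth[y - 1:y + 2, x - 1:x + 2]
            win_e = edge_mask[y - 1:y + 2, x - 1:x + 2]
            if win_e.any():                         # edge in the 8-neighborhood:
                vals = win_d[(win_d > 0) & ~win_e]  # keep only non-edge, valid neighbors
            else:
                vals = win_d[win_d > 0]             # plain median over valid neighbors
            if vals.size:
                out[y, x] = np.median(vals)
    return out
```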
1.4 Color space transformation: the color image acquired by the camera is converted from the RGB color space to the CIELab color space, which has a wider gamut and compensates for the perceptual non-uniformity of the RGB color model.
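In OpenCV this conversion is a single call; a minimal sketch, assuming the camera frame arrives in OpenCV's default BGR channel order (the file name is illustrative):

```python
import cv2

bgr = cv2.imread("color.png")                # registered color frame, BGR order
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)   # 8-bit CIELab: L scaled to 0-255, a/b offset by 128
```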
Step 2: superpixel segmentation of the RGB-D image. The complexity of the scene makes it difficult to segment the image directly and accurately, so the whole image is first over-segmented into a large number of small blocks, with adjacent similar pixels grouped into the same block, called a superpixel. Color and depth information are fused to over-segment the image, specifically as follows:
2.1 Three-dimensional point cloud reconstruction. The depth map contains three-dimensional spatial information, but its x and y coordinates are pixel coordinates: used directly, the depth map reflects the position of an object along only one spatial direction, and the metric information of the other two dimensions is lost. The single-channel information of the depth map must therefore be converted into three-dimensional space coordinates.
Let (X, Y, Z) be the three-dimensional space coordinates of a point, (x, y) the corresponding image pixel coordinates in the depth image, and d the corresponding depth value. f_x and f_y are the focal lengths of the depth camera in the x and y directions, and (c_x, c_y) are the principal point coordinates; f_x, f_y, c_x, c_y are given by the camera manufacturer and are commonly referred to as the intrinsic parameters. According to the imaging geometry, the three-dimensional space coordinates are calculated from the depth value as

X = (x − c_x)·d / f_x, Y = (y − c_y)·d / f_y, Z = d.
At this point, each valid pixel of the RGB-D image can be described by the color features Lab, the image two-dimensional coordinates xy, and the three-dimensional space coordinates XYZ, which together measure color similarity and continuity in both two-dimensional and three-dimensional space.
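A vectorized sketch of this back-projection, assuming the depth map stores raw values in millimetres (the 0.001 scale, typical for Kinect-class sensors, and the function name are assumptions, not values from the patent):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, scale=0.001):
    """Back-project a depth map into per-pixel 3-D coordinates (X, Y, Z)
    using the pinhole model X=(x-cx)*d/fx, Y=(y-cy)*d/fy, Z=d."""
    h, w = depth.shape
    x, y = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinate grids
    Z = depth.astype(np.float64) * scale             # raw units -> metres
    X = (x - cx) * Z / fx
    Y = (y - cy) * Z / fy
    return np.dstack([X, Y, Z])                      # (h, w, 3); Z == 0 marks invalid pixels
```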
2.2 Initializing the cluster centers. Assuming the image has N pixels and is divided into K superpixel blocks, each superpixel block contains N/K pixels and the spacing between superpixel blocks is S = √(N/K). The initial cluster centers are set at the centers of the superpixel blocks.
2.3 Region clustering labels. Within the 2S × 2S neighborhood of each superpixel cluster center, the distance between each valid pixel and the cluster center is calculated in the 8-dimensional feature space [L, a, b, x, y, X, Y, Z]:
D = D_Lab + α·D_xy + β·D_XYZ
where subscript c denotes the cluster center of each superpixel and subscript i denotes each valid pixel within the search range; D_Lab, D_xy and D_XYZ respectively denote the Euclidean distances between the valid pixel i and the cluster center on the color features Lab, the image two-dimensional coordinates xy, and the three-dimensional space coordinates XYZ; α and β are weights that balance the importance of each term, set according to the specific scene data. D is the final feature distance. Each pixel is assigned to the superpixel whose cluster center has the smallest feature distance, i.e., labeled as the same class as that cluster center.
2.4 Updating the cluster centers. After all valid pixels are labeled, the cluster centers are updated according to the pixel classification result: the pixels contained in each superpixel block are collected and the mean of their features is computed, giving the new cluster center.
2.5 Iterative clustering: steps 2.3 and 2.4 are repeated until the clustering is stable, i.e., the cluster centers and pixel labels no longer change. In general, 5-6 iterations yield a stable clustering, completing the superpixel segmentation.
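The clustering loop of steps 2.2-2.5 can be sketched as a local k-means over the 8-dimensional features. The sketch below assumes `feats` stacks [L, a, b, x, y, X, Y, Z] along the last axis (e.g., the Lab image, the pixel grids, and the point cloud from the earlier sketch) and uses squared distances instead of the patent's Euclidean distances for simplicity:

```python
import numpy as np

def rgbd_superpixels(feats, valid, K, alpha=1.0, beta=1.0, iters=6):
    """Local k-means over (h, w, 8) features [L, a, b, x, y, X, Y, Z].
    `valid` marks pixels with usable depth; alpha/beta weight the xy and
    XYZ distance terms (a sketch of steps 2.2-2.5)."""
    h, w, _ = feats.shape
    S = int(np.sqrt(h * w / K))                              # superpixel grid spacing
    centers = np.array([feats[y, x]
                        for y in range(S // 2, h, S)
                        for x in range(S // 2, w, S)], dtype=np.float64)
    wts = np.concatenate([np.ones(3), alpha * np.ones(2), beta * np.ones(3)])
    labels = np.full((h, w), -1, dtype=int)
    for _ in range(iters):
        dist = np.full((h, w), np.inf)
        for k, c in enumerate(centers):
            cy, cx = int(round(c[4])), int(round(c[3]))      # centre position (x = feat 3, y = feat 4)
            y0, y1 = max(cy - S, 0), min(cy + S + 1, h)      # 2S x 2S search window
            x0, x1 = max(cx - S, 0), min(cx + S + 1, w)
            d = (wts * (feats[y0:y1, x0:x1] - c) ** 2).sum(axis=-1)
            win = dist[y0:y1, x0:x1]
            upd = (d < win) & valid[y0:y1, x0:x1]
            win[upd] = d[upd]
            labels[y0:y1, x0:x1][upd] = k
        for k in range(len(centers)):                        # update centres to cluster means
            members = labels == k
            if members.any():
                centers[k] = feats[members].mean(axis=0)
    return labels
```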
Step 3: superpixel merging. Similar superpixels are merged by a spectral clustering method based on graph theory, converting the clustering into a graph-partitioning problem and completing the image segmentation, as follows:
3.1 Similarity matrix construction. Each superpixel is taken as a node of a graph and the similarity between superpixels as the weight of an edge, constructing an undirected graph over all superpixels. Assuming the number of superpixels is K, the similarity matrix W ∈ R^(K×K) of the graph is a symmetric matrix whose element w_pq is the similarity between superpixel p and superpixel q, measured by color similarity, texture similarity and three-dimensional spatial connectivity:
w_pq = D_color(p, q) + D_texture(p, q) + γ·D_space(p, q)
where γ is a weight factor that adjusts the weight of the three-dimensional spatial connectivity according to the quality of the depth data.
Color similarity: on the RGB color space, for superpixel p the normalized color histograms H_p,R, H_p,G, H_p,B are computed on the three RGB channels, and likewise for superpixel q. The Bhattacharyya coefficient between the corresponding channel histograms of superpixels p and q is

Bh_R(p, q) = Σ_i √(H_p,R(i) · H_q,R(i))

(and analogously Bh_G and Bh_B), where i indexes the bins of normalized pixel values. The sum of the Bhattacharyya coefficients over the three RGB channel histograms is then the color similarity:
D_color(p, q) = Bh_R(p, q) + Bh_G(p, q) + Bh_B(p, q)
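A sketch of this color term; the per-superpixel pixel lists and the 16-bin quantization are assumptions (the patent does not fix a bin count):

```python
import numpy as np

def color_similarity(rgb_p, rgb_q, bins=16):
    """D_color(p, q): sum of the Bhattacharyya coefficients between the
    normalized per-channel RGB histograms of two superpixels; rgb_p and
    rgb_q are (n, 3) arrays of the member pixels' RGB values."""
    d = 0.0
    for ch in range(3):
        hp, _ = np.histogram(rgb_p[:, ch], bins=bins, range=(0, 256))
        hq, _ = np.histogram(rgb_q[:, ch], bins=bins, range=(0, 256))
        hp = hp / hp.sum()                  # normalize to unit mass
        hq = hq / hq.sum()
        d += np.sqrt(hp * hq).sum()         # Bhattacharyya coefficient, in [0, 1]
    return d                                # in [0, 3]; larger means more similar
```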
Texture similarity: in the Lab color space, the L-channel values are used to describe each superpixel by a local binary pattern (LBP) histogram; analogously to the color similarity, the Bhattacharyya coefficient is used to measure the texture similarity D_texture(p, q) between superpixels.
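A sketch of the texture term using scikit-image's LBP implementation; the neighbor count P, radius R, and the "uniform" variant are assumed choices, since the patent only names LBP:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def texture_similarity(L_channel, mask_p, mask_q, P=8, R=1.0):
    """D_texture(p, q): Bhattacharyya coefficient between the LBP
    histograms of two superpixels, computed on the L channel; mask_p and
    mask_q are boolean masks selecting each superpixel's pixels."""
    lbp = local_binary_pattern(L_channel, P, R, method="uniform")
    n_bins = P + 2                                  # uniform LBP yields P + 2 codes
    hp, _ = np.histogram(lbp[mask_p], bins=n_bins, range=(0, n_bins))
    hq, _ = np.histogram(lbp[mask_q], bins=n_bins, range=(0, n_bins))
    hp, hq = hp / hp.sum(), hq / hq.sum()
    return np.sqrt(hp * hq).sum()
```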
Three-dimensional spatial connectivity: a Gaussian kernel function is adopted, where σ is a scale factor set according to the actual scene and XYZ_p, XYZ_q are the three-dimensional space coordinates of the superpixel cluster centers:

D_space(p, q) = exp(−‖XYZ_p − XYZ_q‖² / (2σ²))
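As a one-function sketch (sigma = 0.1 m is an illustrative, scene-dependent value):

```python
import numpy as np

def space_connectivity(center_p, center_q, sigma=0.1):
    """D_space(p, q): Gaussian kernel on the 3-D distance between the
    superpixel cluster centres (XYZ coordinates, sigma in metres)."""
    return np.exp(-np.sum((center_p - center_q) ** 2) / (2.0 * sigma ** 2))
```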
3.2 Spectral clustering is performed on the superpixels according to the constructed similarity matrix, and superpixels with high similarity are merged to complete the image segmentation.
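The merging step can be sketched with scikit-learn's spectral clustering on a precomputed affinity matrix, with W assembled element-wise from the three terms above; the number of merged regions n_regions is a scene-dependent assumption, as the patent does not specify how the cluster count is chosen:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def merge_superpixels(W, superpixel_labels, n_regions):
    """Spectral clustering of step 3.2 on the (K, K) similarity matrix W;
    relabels the per-pixel superpixel map (values 0..K-1, every pixel
    assumed labeled) with the index of its merged region."""
    groups = SpectralClustering(n_clusters=n_regions,
                                affinity="precomputed",
                                random_state=0).fit_predict(W)
    return groups[superpixel_labels]        # map each pixel's superpixel id to a region id
```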
Thus, according to the method, the depth map is converted into a three-dimensional point cloud according to the imaging geometry, depth and color information are integrated, and the image is superpixel-segmented and clustered, completing the image segmentation. The method makes full use of the spatial position information contained in the depth image registered with the color image, so that objects that are difficult to distinguish in the two-dimensional color space are separated by their positions in three-dimensional space and targets are identified against the background, improving the quality and precision of image segmentation.
Application effect example:
As shown in fig. 2, the object and background colors in the scene are similar (fig. 2(a)) and the contrast of the edge features is low, so segmentation with a traditional RGB method is difficult; with the method of the present invention, color and depth information are used jointly for segmentation (fig. 2(b)), and the result is shown in fig. 2(c), where each part is clearly segmented.
The foregoing is merely an illustrative embodiment of the present invention, and any equivalent changes and modifications made by those skilled in the art without departing from the spirit and principle of the present invention should fall within the protection scope of the present invention.
Claims (9)
1. An image segmentation method fusing depth and color information, mainly comprising the following steps:
step 1, RGB-D data preprocessing: carrying out median filtering by taking the image edge information as a guide to enhance the quality of the depth image;
step 2, RGB-D image superpixel segmentation: fusing color and depth information, and performing over-segmentation on the image;
step 3, superpixel merging: similar superpixels are merged by a spectral clustering method based on graph theory, converting the clustering into a graph-partitioning problem and completing the image segmentation.
2. The image segmentation method according to claim 1, wherein the method of step 1 comprises:
respectively carrying out edge extraction on the color image and the depth image to generate respective edge images;
fusing the color edge image and the depth edge image: on the depth edge image, the gradient direction of each pixel is calculated; if the corresponding color image edge has a similar direction, the color image edge is used, otherwise the depth image edge is used, thereby obtaining the fused edge image information;
on the depth image, performing median filtering on each valid pixel with a 3 × 3 template;
color space transformation: converting the color image acquired by the camera from the RGB color space to the CIELab color space.
3. The image segmentation method according to claim 1, wherein the step 2 comprises the steps of:
2.1 three-dimensional point cloud reconstruction based on depth information: converting pixel coordinates (x, y) in the depth map into three-dimensional space coordinates (X, Y, Z) based on the depth information in the depth map and the intrinsic parameters of the camera;
2.2 superpixel pre-segmentation and initialization of cluster centers: assuming the image has N pixels and is divided into K superpixel blocks, each superpixel block contains N/K pixels and the spacing between superpixel blocks is S = √(N/K); an initial cluster center is set at the center of each superpixel block;
2.3 region clustering labeling: within the 2S × 2S neighborhood of each superpixel cluster center, calculating the distance between each valid pixel and the cluster center in the 8-dimensional feature space [L, a, b, x, y, X, Y, Z]:
D = D_Lab + α·D_xy + β·D_XYZ
where subscript c denotes the cluster center of each superpixel and subscript i denotes each valid pixel within the search range; D_Lab, D_xy and D_XYZ respectively denote the Euclidean distances between the valid pixel i and the cluster center c on the color features Lab, the image two-dimensional coordinates xy, and the depth-based three-dimensional space coordinates XYZ; α and β are weights that balance the importance of each term, set according to the specific scene data; D is the final feature distance;
assigning each pixel to the superpixel whose cluster center has the minimum feature distance D, i.e., labeling it as the same class as that cluster center;
2.4 updating the cluster centers: after all valid pixels are labeled, updating the cluster centers according to the pixel classification result, i.e., collecting the pixels contained in each superpixel block and computing the mean of their features to obtain the new cluster center;
2.5 iterative clustering: repeating steps 2.3 and 2.4 until the clustering is stable, i.e., the cluster centers and pixel labels no longer change, thus completing the superpixel segmentation.
4. The image segmentation method according to claim 3, wherein the step 3 comprises the steps of:
3.1 similarity matrix construction: taking each superpixel as a node of a graph and the similarity between superpixels as the weight of an edge, an undirected graph over all superpixels is constructed. Assuming the number of superpixels is K, the similarity matrix W ∈ R^(K×K) of the graph is a symmetric matrix whose element w_pq is the similarity between superpixel p and superpixel q (p = 1, …, K; q = 1, …, K), measured by color similarity, texture similarity and three-dimensional spatial connectivity:
w_pq = D_color(p, q) + D_texture(p, q) + γ·D_space(p, q)
where D_color(p, q) denotes the color similarity, D_texture(p, q) the texture similarity, and D_space(p, q) the three-dimensional spatial connectivity; γ is a weight factor that adjusts the weight of the spatial connectivity according to the quality of the depth data, the three-dimensional space being the one reconstructed from the depth information;
3.2 performing spectral clustering on the superpixels according to the constructed similarity matrix and merging superpixels with high similarity to complete the image segmentation.
5. The image segmentation method according to claim 2, wherein the median filtering method is:
let D(x, y) be the depth value of the current valid pixel and N_D(x, y) its 8-neighborhood; then:

D(x, y) = median(N_D(x, y))
if no edge pixel exists in the 8-neighborhood of the current pixel, the above formula is applied; if an edge pixel exists, the median filtering uses only the neighborhood pixels on the same side of the edge as the current pixel.
6. The image segmentation method according to claim 3, wherein the specific method for converting the single-channel information of the depth map into three-dimensional space coordinates is

X = (x − c_x)·d / f_x, Y = (y − c_y)·d / f_y, Z = d

where (x, y) are the image pixel coordinates of a point in the depth image, d is the depth value of that point, f_x and f_y are the focal lengths of the depth camera in the x and y directions, (c_x, c_y) are the principal point coordinates (f_x, f_y, c_x, c_y are given by the camera manufacturer), and (X, Y, Z) are the three-dimensional space coordinates of the point.
7. The image segmentation method according to claim 4, wherein the color similarity is calculated by:
on the RGB color space, for superpixel p the normalized color histograms H_p,R, H_p,G, H_p,B are computed on the three RGB channels, and likewise for superpixel q; the Bhattacharyya coefficient between the corresponding channel histograms of superpixels p and q is computed as

Bh_R(p, q) = Σ_i √(H_p,R(i) · H_q,R(i))

(and analogously Bh_G and Bh_B), where i indexes the bins of normalized pixel values;
then the sum of the Bhattacharyya coefficients between the three channel histograms of RGB is the color similarity:
D_color(p, q) = Bh_R(p, q) + Bh_G(p, q) + Bh_B(p, q).
8. The image segmentation method according to claim 4, wherein the texture similarity is calculated by:
in the Lab color space, the L-channel values are used to describe each superpixel by a local binary pattern (LBP) histogram, and the Bhattacharyya coefficient is used to measure the texture similarity D_texture(p, q) between superpixels p and q.
9. The image segmentation method according to claim 4, wherein the three-dimensional space connectivity is calculated by:
a Gaussian kernel function is adopted, where σ is a scale factor set according to the actual scene and XYZ_p, XYZ_q are the three-dimensional space coordinates of the superpixel cluster centers; the three-dimensional spatial connectivity between superpixels p and q is then

D_space(p, q) = exp(−‖XYZ_p − XYZ_q‖² / (2σ²)).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910909933.5A CN110610505A (en) | 2019-09-25 | 2019-09-25 | Image segmentation method fusing depth and color information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110610505A true CN110610505A (en) | 2019-12-24 |
Family
ID=68893433
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910909933.5A Pending CN110610505A (en) | 2019-09-25 | 2019-09-25 | Image segmentation method fusing depth and color information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110610505A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103455984A (en) * | 2013-09-02 | 2013-12-18 | 清华大学深圳研究生院 | Method and device for acquiring Kinect depth image |
CN104574375A (en) * | 2014-12-23 | 2015-04-29 | 浙江大学 | Image significance detection method combining color and depth information |
CN105469369A (en) * | 2015-11-27 | 2016-04-06 | 中国科学院自动化研究所 | Digital image filtering method and system based on segmented image |
CN106997591A (en) * | 2017-03-21 | 2017-08-01 | 南京理工大学 | A kind of super voxel dividing method of RGB D image mutative scales |
CN108154104A (en) * | 2017-12-21 | 2018-06-12 | 北京工业大学 | A kind of estimation method of human posture based on depth image super-pixel union feature |
Non-Patent Citations (3)
Title |
---|
Yang Long et al.: "Infrared image detail enhancement algorithm based on superpixel segmentation", Infrared *
Tu Shuqin et al.: "A survey of RGB-D image classification methods", Laser & Optoelectronics Progress *
Zhao Xuan et al.: "Stepwise superpixel aggregation and multimodal fusion object detection in RGB-D images", Journal of Image and Graphics *
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111190981A (en) * | 2019-12-25 | 2020-05-22 | 中国科学院上海微系统与信息技术研究所 | Method and device for constructing three-dimensional semantic map, electronic equipment and storage medium |
CN111190981B (en) * | 2019-12-25 | 2020-11-06 | 中国科学院上海微系统与信息技术研究所 | Method and device for constructing three-dimensional semantic map, electronic equipment and storage medium |
CN111079713A (en) * | 2019-12-31 | 2020-04-28 | 帷幄匠心科技(杭州)有限公司 | Method for extracting pedestrian color features and terminal equipment |
CN113542142A (en) * | 2020-04-14 | 2021-10-22 | 中国移动通信集团浙江有限公司 | Portrait anti-counterfeiting detection method and device and computing equipment |
CN113542142B (en) * | 2020-04-14 | 2024-03-22 | 中国移动通信集团浙江有限公司 | Portrait anti-fake detection method and device and computing equipment |
WO2021228194A1 (en) * | 2020-05-15 | 2021-11-18 | 上海非夕机器人科技有限公司 | Cable detection method, robot and storage device |
CN111709483A (en) * | 2020-06-18 | 2020-09-25 | 山东财经大学 | Multi-feature-based super-pixel clustering method and equipment |
CN112183378A (en) * | 2020-09-29 | 2021-01-05 | 北京深睿博联科技有限责任公司 | Road slope estimation method and device based on color and depth image |
CN112364730A (en) * | 2020-10-29 | 2021-02-12 | 济南大学 | Hyperspectral ground object automatic classification method and system based on sparse subspace clustering |
CN112364730B (en) * | 2020-10-29 | 2023-01-17 | 济南大学 | Hyperspectral ground object automatic classification method and system based on sparse subspace clustering |
CN112880563A (en) * | 2021-01-22 | 2021-06-01 | 北京航空航天大学 | Single-dimensional pixel combination mode equivalent narrow-area-array camera spatial position measuring method |
CN112880563B (en) * | 2021-01-22 | 2021-12-28 | 北京航空航天大学 | Single-dimensional pixel combination mode equivalent narrow-area-array camera spatial position measuring method |
CN112785608A (en) * | 2021-02-09 | 2021-05-11 | 哈尔滨理工大学 | Medical image segmentation method for improving SNIC (single noise integrated circuit) based on adaptive parameters |
CN112991238A (en) * | 2021-02-22 | 2021-06-18 | 上海市第四人民医院 | Texture and color mixing type food image segmentation method, system, medium and terminal |
CN112991238B (en) * | 2021-02-22 | 2023-08-22 | 上海市第四人民医院 | Food image segmentation method, system and medium based on texture and color mixing |
CN113199479A (en) * | 2021-05-11 | 2021-08-03 | 梅卡曼德(北京)机器人科技有限公司 | Trajectory generation method and apparatus, electronic device, storage medium, and 3D camera |
CN113689549A (en) * | 2021-08-03 | 2021-11-23 | 长沙宏达威爱信息科技有限公司 | Modeling method and digital design system |
CN113689549B (en) * | 2021-08-03 | 2024-04-09 | 长沙宏达威爱信息科技有限公司 | Modeling method and digital design system |
CN116778095A (en) * | 2023-08-22 | 2023-09-19 | 苏州海赛人工智能有限公司 | Three-dimensional reconstruction method based on artificial intelligence |
CN116778095B (en) * | 2023-08-22 | 2023-10-27 | 苏州海赛人工智能有限公司 | Three-dimensional reconstruction method based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110610505A (en) | Image segmentation method fusing depth and color information | |
Hughes et al. | A deep learning framework for matching of SAR and optical imagery | |
US11727661B2 (en) | Method and system for determining at least one property related to at least part of a real environment | |
CN109872397B (en) | Three-dimensional reconstruction method of airplane parts based on multi-view stereo vision | |
CN109655019B (en) | Cargo volume measurement method based on deep learning and three-dimensional reconstruction | |
CN109146948B (en) | Crop growth phenotype parameter quantification and yield correlation analysis method based on vision | |
Park et al. | Color image segmentation based on 3-D clustering: morphological approach | |
CN108052942B (en) | Visual image recognition method for aircraft flight attitude | |
CN109584281B (en) | Overlapping particle layering counting method based on color image and depth image | |
CN107369158B (en) | Indoor scene layout estimation and target area extraction method based on RGB-D image | |
Urban et al. | Finding a good feature detector-descriptor combination for the 2D keypoint-based registration of TLS point clouds | |
CN111046843A (en) | Monocular distance measurement method under intelligent driving environment | |
CN110751097A (en) | Semi-supervised three-dimensional point cloud gesture key point detection method | |
CN116452852A (en) | Automatic generation method of high-precision vector map | |
CN106709432B (en) | Human head detection counting method based on binocular stereo vision | |
Schulz et al. | Object-class segmentation using deep convolutional neural networks | |
Omidalizarandi et al. | Segmentation and classification of point clouds from dense aerial image matching | |
Zhang et al. | An enhanced multi-view vertical line locus matching algorithm of object space ground primitives based on positioning consistency for aerial and space images | |
Kim et al. | Multi-view object extraction with fractional boundaries | |
Novacheva | Building roof reconstruction from LiDAR data and aerial images through plane extraction and colour edge detection | |
CN115880371A (en) | Method for positioning center of reflective target under infrared visual angle | |
Deng et al. | Texture edge-guided depth recovery for structured light-based depth sensor | |
CN111160300B (en) | Deep learning hyperspectral image saliency detection algorithm combined with global prior | |
Wang et al. | Rgb-guided depth map recovery by two-stage coarse-to-fine dense crf models | |
Wang et al. | Fast and accurate satellite multi-view stereo using edge-aware interpolation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191224 |