CN107369158A - Indoor scene layout estimation and target region extraction method based on RGB-D images - Google Patents

Indoor scene layout estimation and target region extraction method based on RGB-D images

Info

Publication number
CN107369158A
Authority
CN
China
Prior art keywords
plane
mrow
segmentation
rgb
target area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710442910.9A
Other languages
Chinese (zh)
Other versions
CN107369158B (en)
Inventor
吴晓秋
霍智勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201710442910.9A
Publication of CN107369158A
Application granted
Publication of CN107369158B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details

Abstract

The present invention discloses an indoor scene layout estimation and target region extraction method based on RGB-D images, comprising: scene layout estimation; over-segmentation of the preprocessed depth map and RGB image using a graph-based segmentation algorithm and a constrained parametric min-cut algorithm, yielding region sets of different sizes; hierarchical grouping of the over-segmentation, in which regions are merged under four different similarity measures to complete the hierarchical grouping and obtain target regions at all scales; and target bounding-box matching. The present invention achieves efficient, high-recall target region extraction for indoor scenes.

Description

Indoor scene layout estimation and target region extraction method based on RGB-D images
Technical field
The invention belongs to the technical field of artificial intelligence computing, and in particular relates to an indoor scene layout estimation and target region extraction method based on RGB-D images, applied to indoor service robotics.
Background art
Indoor scene parsing is an active research topic among scholars worldwide. It has important applications in semantic localization and map generation for indoor robots, and is also of great significance for solving several high-level computer vision problems. Target segmentation and extraction algorithms aim to produce high-quality target localization and instance segmentation results, and constitute one of the key steps of scene parsing. The result of target extraction is usually a set of candidate object regions or object bounding boxes. After years of development, target extraction algorithms currently fall into two classes: the first class is based on the idea of sliding-window detection, and the second class is based on segmentation, covering image over-segmentation and segment-merging strategies. A classic algorithm of the first class is the DPM (Deformable Parts Model) target detection algorithm, which uses HOG features and an SVM classifier to achieve strong robustness and better handling of target deformation; however, algorithms of this kind are computationally expensive and cannot exploit more complex feature representations.
A classic algorithm of the second class is GBS (graph-based segmentation). It is simple to implement and fast, and can find visually consistent regions, but it easily causes over-segmentation. There is also the constrained parametric min-cut target segmentation algorithm, which segments well but yields only foreground regions. In recent years, with the spread of depth sensors, many RGB-D image datasets containing depth images have appeared, and researchers have begun to use them to improve performance by adding geometric features or depth information. These methods, however, are generally supervised algorithms: they require a pre-trained skeleton model and have high computational complexity, and although some of them improve extraction accuracy, they cover few target categories, achieve relatively low recall, and tend to miss objects lying on planar regions. There are also some unsupervised RGB-D target extraction and segmentation algorithms; they are fast, but sensitive to illumination changes and noise, and thus not very robust.
Although target extraction algorithms keep advancing, the limitations of RGB image features such as texture, color, and brightness mean that, in complex indoor scenes, the following problems remain: 1) occlusion, which causes some large targets to be missed; 2) objects on planar regions and small objects are easily overlooked, lowering the recall; 3) the computational complexity is high and pre-training is required, which is unsuitable for real system applications; 4) robustness to uncertain factors in the image is poor.
Summary of the invention
To overcome the above deficiencies of the prior art, to balance recall and speed, and to solve the problems of layout estimation and target extraction in complex indoor scenes, the present invention proposes an indoor scene layout estimation and target region extraction method based on RGB-D images, achieving efficient, high-recall target region extraction for indoor scenes.
The indoor scene layout estimation and target region extraction method based on RGB-D images comprises the following steps.
Step 1, scene layout estimation: convert the depth map into a dense 3D point cloud, perform plane segmentation by computing the three-dimensional Euclidean distances between cloud points so as to divide the scene into planar regions and non-planar regions, and classify the resulting planes into boundary planes and non-boundary planes;
Step 2, image over-segmentation: apply a graph-based segmentation algorithm and a constrained parametric min-cut algorithm to the preprocessed depth map and RGB image to obtain region sets of different sizes;
Step 3, hierarchical grouping of the over-segmentation: merge regions under four different similarity measures, namely color, texture, size, and fill, to complete the hierarchical grouping and obtain target regions at all scales;
Step 4, target bounding-box matching: for the targets of planar and non-planar regions, distinguish the four cases of boundary plane, non-boundary plane, planar region, and non-planar region, and apply a different strategy in each case to fit the minimum rectangular bounding box enclosing the target, obtaining the target region bounding boxes.
The present invention performs plane segmentation and classification on the input 3D point cloud, exploiting the geometric continuity of the point cloud to reduce the influence of occlusion on layout estimation and improving the quality of scene layout estimation. It applies a graph-based segmentation algorithm and a constrained parametric min-cut algorithm to the preprocessed depth map and RGB image, combining depth information with RGB information to improve the segmentation result. It merges regions under four different similarity measures, obtaining target regions at all scales while taking a variety of image conditions into account, which increases the robustness of the algorithm. It applies different bounding-box matching strategies to targets in planar and non-planar regions, which both preserves the objects lying on planes and mitigates the over-segmentation of large objects caused by occlusion. It eliminates redundant bounding boxes by their overlap rate, keeping the optimal target region bounding boxes and effectively raising the bounding-box recall while generating few candidate boxes. The whole process needs no pre-training, has low computational complexity, is easy to implement, and computes fast.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the indoor scene layout estimation and target region extraction method based on RGB-D images;
Fig. 2 shows a depth map and the corresponding 3D point cloud in the embodiment of Fig. 1;
Fig. 3 shows the plane segmentation and classification results for different scenes;
Fig. 4 is the homomorphic filtering flowchart in the embodiment of Fig. 1;
Fig. 5 shows the bounding boxes extracted for one scene in the embodiment of Fig. 1.
Detailed description
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the invention and do not limit it.
Fig. 1 is the overall flowchart of the indoor scene layout estimation and target region extraction method based on RGB-D images proposed by an embodiment of the present invention. The steps of the embodiment are as follows.
Step (1), scene layout estimation: first convert the depth map into a dense 3D point cloud, as shown in Fig. 2; then perform plane segmentation by computing the three-dimensional Euclidean distances between cloud points, dividing the scene into planar regions and non-planar regions; finally classify the resulting planes into boundary planes and non-boundary planes.
Step (1.1), plane segmentation: sample the depth map uniformly to obtain triples of points, and fit one candidate plane to each triple using the RANSAC algorithm. Then search 3D space for the inliers of each plane; each candidate inlier is represented by a pixel of the depth map and its corresponding valid 3D point, and a point is defined as an inlier of a plane when its three-dimensional Euclidean distance to the plane is less than the inlier distance tolerance Dtol, computed as in formula (1). Finally, remove small planes with few inliers, and merge planes that are spatially close or nearly coplanar.
Dtol = (m/(f·b))·Z² (1)
where f is the focal length, b is the baseline length of the sensor, m is a linear normalization parameter, and Z is the depth value.
Step (1.2), plane classification: for each dominant plane region obtained, assume the plane normal points toward the observer and compute the ratio of the number of cloud points on the far side of the plane to the total number of cloud points in the scene; planes below a certain threshold are classified as boundary planes, and planes above it as non-boundary planes. Ideally the threshold is 0; to account for noise it is set to 0.01. The final plane classification and segmentation results are shown in Fig. 3.
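As an illustration of steps (1.1) and (1.2), the following Python sketch applies the inlier tolerance of formula (1) and the boundary-plane test. The plane representation (unit normal n, offset d), camera-frame coordinates with depth along z, and the sign convention for "behind the plane" are assumptions of this sketch, not prescribed by the patent.

import numpy as np

def inlier_tolerance(Z, f, b, m):
    # Formula (1): Dtol = (m / (f*b)) * Z**2, so the tolerance grows with depth.
    return (m / (f * b)) * Z ** 2

def plane_inliers(points, n, d, f, b, m):
    # points: (N, 3) array of cloud points; the plane is n.x + d = 0 with unit n.
    dist = np.abs(points @ n + d)                  # point-to-plane distance
    tol = inlier_tolerance(points[:, 2], f, b, m)  # per-point tolerance from depth Z
    return dist < tol                              # boolean inlier mask

def is_boundary_plane(points, n, d, thresh=0.01):
    # Step (1.2): with the normal facing the observer, count cloud points on
    # the far side of the plane; a near-zero ratio marks a boundary plane.
    behind = np.count_nonzero(points @ n + d < 0)
    return behind / len(points) < thresh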
Step (2), image over-segmentation: apply a graph-based segmentation algorithm and a constrained parametric min-cut algorithm to the preprocessed depth map and RGB image, obtaining region sets of different sizes R = {r1, ..., rn}.
Combining RGB and depth information, the pixel-level images from the different signal channels are first over-segmented to different degrees to obtain region-level images; the regions are then grouped hierarchically, bottom-up, according to region features until the whole image becomes one region, yielding group-level images that contain target regions of all sizes.
Step (2.1), segmentation based on the RGI color space: convert the three-channel RGB image into normalized RG channels plus a brightness channel I, i.e., the RGI color space, then apply the graph-based segmentation method to the RGI image at three different degrees of over-segmentation.
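A minimal Python sketch of step (2.1), using Felzenszwalb's graph-based segmentation from scikit-image as a stand-in for the graph-based method; the three scale values are illustrative assumptions, and the input is assumed to be a float RGB image with values in [0, 1].

import numpy as np
from skimage.segmentation import felzenszwalb

def rgb_to_rgi(rgb):
    # Normalized RG channels plus a brightness channel I.
    s = rgb.sum(axis=2, keepdims=True) + 1e-8      # guard against black pixels
    r = rgb[..., 0:1] / s                          # normalized red
    g = rgb[..., 1:2] / s                          # normalized green
    i = s / 3.0                                    # brightness channel
    return np.concatenate([r, g, i], axis=2)

def oversegment_rgi(rgb, scales=(50, 150, 400)):
    # Three different degrees of over-segmentation of the RGI image.
    rgi = rgb_to_rgi(rgb.astype(np.float64))
    return [felzenszwalb(rgi, scale=k, sigma=0.8, min_size=30) for k in scales]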
Step (2.2), segmentation based on the homomorphically filtered gray-scale image: first apply homomorphic filtering to the RGB image, following the flow shown in Fig. 4, then apply the graph-based segmentation method to the output gray-scale image at three different degrees of over-segmentation.
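A sketch of the homomorphic filtering step, assuming the standard flow (log, FFT, high-emphasis filtering, inverse FFT, exp) with a Gaussian high-emphasis transfer function; the parameters gamma_l, gamma_h, c, and d0 are illustrative choices, not taken from the patent.

import numpy as np

def homomorphic_filter(gray, gamma_l=0.5, gamma_h=1.5, c=1.0, d0=30.0):
    # gray: 2D non-negative array; pipeline: log -> FFT -> filter -> IFFT -> exp.
    log_img = np.log1p(gray.astype(np.float64))
    F = np.fft.fftshift(np.fft.fft2(log_img))
    rows, cols = gray.shape
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    D2 = u[:, None] ** 2 + v[None, :] ** 2         # squared distance from center
    H = (gamma_h - gamma_l) * (1.0 - np.exp(-c * D2 / d0 ** 2)) + gamma_l
    out = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    return np.expm1(out)                           # undo the log transform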
Step (2.3), segmentation based on the hole-filled depth map: fill the holes of the depth map using a colorization method based on global optimization, then apply the graph-based segmentation method to the filled depth map at three different degrees of over-segmentation.
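The patent fills depth holes with a global-optimization colorization method; as a simple stand-in for illustration only, the sketch below fills holes with OpenCV inpainting. The zero-as-hole convention and the 8-bit rescaling are assumptions of this sketch.

import cv2
import numpy as np

def fill_depth_holes(depth):
    # depth: float array, 0 where the sensor returned no measurement.
    mask = (depth == 0).astype(np.uint8)                    # holes to fill
    peak = depth.max() if depth.max() > 0 else 1.0
    d8 = np.clip(depth / peak * 255.0, 0, 255).astype(np.uint8)
    filled = cv2.inpaint(d8, mask, 5, cv2.INPAINT_NS)       # Navier-Stokes inpainting
    return filled.astype(np.float64) / 255.0 * peak         # back to original scale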
Step (2.4), segmentation based on the mixed RGB-D channels: use the constrained parametric min-cut foreground segmentation method, combining RGB image information and depth information, to over-segment the mixed RGB-D channel image. The method is based on the energy function shown in formula (2):
Eλ(X) = Σμ∈ν Cλ(xμ) + Σ(μ,v)∈ε Vμv(xμ, xv) (2)
where μ, v ∈ N and λ ∈ R; ν is the set of all pixels and ε is the set of edges between neighboring pixels. Cλ is the cost function, which assigns a cost to giving each pixel a label. The binary potential function Vμv acts here as a penalty: a penalty value is incurred when similar neighboring pixels are assigned different labels.
Step (2.4.1), compute the cost function Cλ as in formula (3), where Vf denotes the foreground seeds, Vb the background seeds, and λ is an offset; the function f in formula (3) is defined by formula (4):
f(xμ) = ln pf(xμ) − ln pb(xμ) (4)
where pf denotes the probability distribution that pixel μ belongs to the foreground region; with depth information added, pf is defined as in formula (5). In formula (5), D is the depth map and I the RGB image; j indexes the representative pixels of the seed regions, which are selected as region centers by the K-means algorithm (k = 5); α and γ are scale factors.
Step (2.4.2), compute the penalty Vμv: the similarity g(μ, v) of two neighboring pixels is computed from their gPb values as in formulas (6) and (7) (a consistent form is sketched after formula (8) below), where σ² is an edge-sharpening parameter that controls the smoothness of the binary term Vμv. gPb is computed on both the RGB image and the depth map, and the two results are linearly combined into the final gPb value of each pixel:
gPb = α·gPbr + (1 − α)·gPbd (8)
where gPbr denotes the gPb value of a pixel in the RGB image, gPbd its gPb value in the depth map, and α is set to 0.3.
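Formulas (6) and (7) likewise appear only as images in the source. Forms consistent with the symbols defined above, following the standard constrained parametric min-cut penalty, are

Vμv(xμ, xv) = 0 if xμ = xv, and g(μ, v) otherwise (6)
g(μ, v) = exp(−max(gPb(μ), gPb(v))/σ²) (7)

so that cutting across a strong gPb edge is cheap, while separating similar neighboring pixels inside a smooth area incurs a large penalty.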
Step (3), hierarchical grouping of the over-segmentation: merge regions under four different similarity measures to complete the hierarchical grouping and obtain target regions at all scales.
Step (3.1): first compute the similarities s(ri, rj) of all pairs of adjacent regions and add them to the similarity set S; then find the two regions ri and rj with the largest similarity in S, merge them into a region rt, and add it to the region set R;
Step (3.2): remove from the similarity set the similarities between ri or rj and their neighboring regions, i.e. S = S \ s(ri, r*) and S = S \ s(r*, rj); compute the similarities between the new region rt and its neighboring regions and add the results to S;
Step (3.3): repeat steps (3.1) to (3.2) until the whole image becomes one large region, completing the hierarchical grouping of regions and obtaining target regions at all scales.
Step (3.1.1), compute the color similarity: obtain an L1-normalized 25-bin color histogram for each of the three RGB channels, so that each region is described by a 75-dimensional vector; the color similarity between regions is then computed from these vectors as in formula (9).
Step (3.1.2), compute the texture similarity: compute Gaussian derivatives with variance 1 in 8 different directions for each color channel, with a 10-bin histogram for each direction of each channel, so that each region is described by a 240-dimensional vector; the texture similarity between regions is then computed as in formula (10).
Step (3.1.3), compute the size similarity: the size similarity between regions is computed as in formula (11) from the proportion of the image that each region occupies, where im in the formula denotes the whole image.
Step (3.1.4), compute the fill similarity: compute the minimum rectangular bounding box Bij enclosing the two merged regions and the difference between its size and the sizes of the two regions; the fill similarity between regions is then computed as in formula (12) from the proportion of the image that this difference occupies.
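The images for formulas (9) to (12) are not reproduced in the source. Forms consistent with the descriptions above, following the standard selective-search similarity measures, are

sc(ri, rj) = Σk min(cik, cjk), k = 1, ..., 75 (9)
st(ri, rj) = Σk min(tik, tjk), k = 1, ..., 240 (10)
ss(ri, rj) = 1 − (size(ri) + size(rj))/size(im) (11)
sf(ri, rj) = 1 − (size(Bij) − size(ri) − size(rj))/size(im) (12)

where size(·) counts pixels and im denotes the whole image.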
Step (3.1.5), combine the four similarities: the final similarity s(ri, rj) is the linear combination of the four similarities above, as shown in formula (13):
s(ri, rj) = a1·sc(ri, rj) + a2·st(ri, rj) + a3·ss(ri, rj) + a4·sf(ri, rj) (13)
where ai ∈ {0, 1} indicates whether the corresponding similarity is used.
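A minimal Python sketch of the grouping loop of steps (3.1) to (3.3). Regions are modeled as frozensets of pixel indices; similarity(a, b) stands for the combined s(ri, rj) of formula (13) and adjacent(a, b) for the neighborhood test, both assumed to be supplied by the caller.

def hierarchical_grouping(regions, similarity, adjacent):
    # Step (3.1): similarities of all pairs of adjacent regions.
    S = {}
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            if adjacent(a, b):
                S[(a, b)] = similarity(a, b)
    hierarchy = list(regions)          # collects regions at all scales
    active = set(regions)
    while S:
        a, b = max(S, key=S.get)       # most similar adjacent pair
        rt = a | b                     # merge into a new region rt
        active -= {a, b}
        # Step (3.2): drop every similarity involving a or b ...
        S = {p: s for p, s in S.items() if a not in p and b not in p}
        # ... and add the similarities between rt and its neighbors.
        for r in active:
            if adjacent(r, rt):
                S[(r, rt)] = similarity(r, rt)
        active.add(rt)
        hierarchy.append(rt)
    # Step (3.3): the loop ends when the whole image is one region.
    return hierarchy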
Step (4), target bounding-box matching: for the targets of planar and non-planar regions, four cases are distinguished and a different strategy is applied in each to fit the minimum rectangular bounding box enclosing the target, yielding the target region bounding boxes.
Step (4.1): for the planar regions, all boundary-plane regions are used directly. For each non-boundary plane, find its boundary points and compute the minimum Euclidean distance to the other non-boundary planes; stitch together the planes whose distance is below a certain threshold, and use the stitched non-boundary plane regions. For targets lying on planar regions, keep only the target regions produced from the RGB image.
Step (4.2): for the non-planar regions, all target regions are used directly, except those whose overlap area with the non-planar region is too small.
Step (4.3), match the target bounding boxes: convert all retained target regions into masks and fit to each mask the minimum rectangular bounding box enclosing it, obtaining the bounding-box set B of formula (14):
B=BBP+BMPR+BNPR+BPR (14)
where BBP denotes the boxes of boundary-plane regions, BMPR those of stitched plane regions, BNPR those of targets in non-planar regions, and BPR those of targets on planar regions.
Then remove the tiny bounding boxes in set B, sort the remaining boxes by size, and iteratively compute the overlap rate between boxes from top to bottom; the overlap rate Q(bi, bj) is computed as in formula (15). Duplicate boxes whose overlap rate exceeds a certain threshold are filtered out, yielding the final set of optimal target region bounding boxes; the result is shown in Fig. 5.
where bi, bj ∈ B and a(bi) denotes the area of bounding box bi.
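A Python sketch of the redundancy elimination in step (4.3). Formula (15) appears only as an image in the source; intersection-over-union is assumed here for the overlap rate Q(bi, bj), which is one common choice consistent with the area notation a(bi), and the size and overlap thresholds are illustrative.

def box_area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def overlap_rate(bi, bj):
    # Assumed form of Q(bi, bj): intersection area over union area.
    ix1, iy1 = max(bi[0], bj[0]), max(bi[1], bj[1])
    ix2, iy2 = min(bi[2], bj[2]), min(bi[3], bj[3])
    inter = box_area((ix1, iy1, ix2, iy2))
    union = box_area(bi) + box_area(bj) - inter
    return inter / union if union > 0 else 0.0

def filter_boxes(boxes, min_area=400, q_thresh=0.9):
    boxes = [b for b in boxes if box_area(b) >= min_area]  # remove tiny boxes
    boxes.sort(key=box_area, reverse=True)                 # sort by size
    kept = []
    for b in boxes:                                        # top-down pass
        if all(overlap_rate(b, k) <= q_thresh for k in kept):
            kept.append(b)
    return kept                                            # final bounding boxes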
The technical means disclosed by the solution of the present invention are not limited to those disclosed in the above embodiment, but also include technical solutions formed by any combination of the above technical features.

Claims (5)

1. An indoor scene layout estimation and target region extraction method based on RGB-D images, characterized by comprising the following steps:
Step 1, scene layout estimation: convert the depth map into a dense 3D point cloud, perform plane segmentation by computing the three-dimensional Euclidean distances between cloud points so as to divide the scene into planar regions and non-planar regions, and classify the resulting planes into boundary planes and non-boundary planes;
Step 2, image over-segmentation: apply a graph-based segmentation algorithm and a constrained parametric min-cut algorithm to the preprocessed depth map and RGB image to obtain region sets of different sizes;
Step 3, hierarchical grouping of the over-segmentation: merge regions under four different similarity measures, namely color, texture, size, and fill, to complete the hierarchical grouping and obtain target regions at all scales;
Step 4, target bounding-box matching: for the targets of planar and non-planar regions, distinguish the four cases of boundary plane, non-boundary plane, planar region, and non-planar region, and apply a different strategy in each case to fit the minimum rectangular bounding box enclosing the target, obtaining the target region bounding boxes.
2. The indoor scene layout estimation and target region extraction method based on RGB-D images according to claim 1, characterized in that the detailed process of step 1 is:
Step 1.1, plane segmentation: sample the depth map uniformly to obtain triples of points, and fit one candidate plane to each triple using the RANSAC algorithm; then search the 3D point cloud space for the inliers of each plane, a point being defined as an inlier of a plane when its three-dimensional Euclidean distance to the plane is less than the inlier distance tolerance Dtol, computed as in formula (1); finally remove small planes with few inliers, and merge planes that are spatially close or nearly coplanar;
Dtol = (m/(f·b))·Z² (1)
where f is the focal length, b is the baseline length of the sensor, m is a linear normalization parameter, and Z is the depth value;
Step 1.2, plane classification: for each dominant plane region obtained, assume the plane normal points toward the observer and compute the ratio of the number of cloud points on the far side of the plane to the total number of cloud points in the scene; planes below the threshold are classified as boundary planes, and planes above the threshold as non-boundary planes.
3. The indoor scene layout estimation and target region extraction method based on RGB-D images according to claim 1, characterized in that the detailed process of step 2 is:
Step 2.1, segmentation based on the RGI color space: convert the three-channel RGB image into normalized RG channels plus a brightness channel I, i.e., the RGI color space, then over-segment the RGI image using the graph-based segmentation method;
Step 2.2, segmentation based on the homomorphically filtered gray-scale image: apply homomorphic filtering to the RGB image, then over-segment the output gray-scale image using the graph-based segmentation method;
Step 2.3, segmentation based on the hole-filled depth map: fill the holes of the depth map using a colorization method based on global optimization, then over-segment the filled depth map using the graph-based segmentation method;
Step 2.4, segmentation based on the mixed RGB-D channels: use the constrained parametric min-cut foreground segmentation method, combining RGB image information and depth information, to over-segment the mixed RGB-D channel image, based on the energy function of formula (2):
Eλ(X) = Σμ∈ν Cλ(xμ) + Σ(μ,v)∈ε Vμv(xμ, xv) (2)
where μ, v ∈ N and λ ∈ R; ν is the set of all pixels and ε is the set of edges between neighboring pixels; Cλ is the cost function, which assigns a cost to giving each pixel a label; the binary potential function Vμv acts as a penalty, incurred when similar neighboring pixels are assigned different labels.
4. The indoor scene layout estimation and target region extraction method based on RGB-D images according to claim 1, characterized in that the detailed process of step 3 is:
Step 3.1, compute the similarity set of all pairs of adjacent regions, find the two regions ri and rj with the largest similarity, merge them into a new region rt, and add it to the region set;
Step 3.2, remove from the similarity set the similarities between ri or rj and their neighboring regions, compute the similarities between the new region rt and its neighboring regions, and add them to the similarity set;
Step 3.3, repeat steps 3.1 to 3.2 until the whole image becomes one large region, completing the hierarchical grouping of the image and obtaining target regions at all scales.
5. The indoor scene layout estimation and target region extraction method based on RGB-D images according to claim 1, characterized in that the detailed process of step 4 is:
Step 4.1, for the planar regions, use all boundary-plane regions directly; for each non-boundary plane, find its boundary points and compute the minimum Euclidean distance to the other non-boundary planes, stitch together the planes whose distance is below the threshold, and use the stitched non-boundary plane regions; for targets lying on planar regions, keep only the target regions produced from the RGB image;
Step 4.2, for the non-planar regions, use all target regions directly, except those whose overlap area with the non-planar region is too small;
Step 4.3, match the target bounding boxes: convert all retained target regions into masks, fit to each mask the minimum rectangular bounding box enclosing it, then remove the tiny bounding boxes, sort the boxes by size, iteratively compute the overlap rate between boxes from top to bottom, and filter out the duplicate boxes whose overlap rate exceeds a certain threshold, obtaining the final target region bounding boxes.
CN201710442910.9A 2017-06-13 2017-06-13 Indoor scene layout estimation and target area extraction method based on RGB-D image Active CN107369158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710442910.9A CN107369158B (en) 2017-06-13 2017-06-13 Indoor scene layout estimation and target area extraction method based on RGB-D image


Publications (2)

Publication Number Publication Date
CN107369158A true CN107369158A (en) 2017-11-21
CN107369158B CN107369158B (en) 2020-11-13

Family

ID=60306413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710442910.9A Active CN107369158B (en) 2017-06-13 2017-06-13 Indoor scene layout estimation and target area extraction method based on RGB-D image

Country Status (1)

Country Link
CN (1) CN107369158B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090297049A1 (en) * 2005-07-07 2009-12-03 Rafael Advanced Defense Systems Ltd. Detection of partially occluded targets in ladar images
CN102436654A (en) * 2011-09-02 2012-05-02 清华大学 Adaptive segmentation method of building point cloud
CN104809187A (en) * 2015-04-20 2015-07-29 南京邮电大学 Indoor scene semantic annotation method based on RGB-D data
CN105488809A (en) * 2016-01-14 2016-04-13 电子科技大学 Indoor scene meaning segmentation method based on RGBD descriptor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SAURABH GUPTA et al.: "Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images", Computer Vision Foundation *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108742159A (en) * 2018-04-08 2018-11-06 浙江安精智能科技有限公司 Intelligent control device of water dispenser based on RGB-D cameras and its control method
CN109636814A (en) * 2018-12-18 2019-04-16 联想(北京)有限公司 A kind of image processing method and electronic equipment
CN111815696B (en) * 2019-04-11 2023-08-22 曜科智能科技(上海)有限公司 Depth map optimization method, device, equipment and medium based on semantic instance segmentation
CN111815696A (en) * 2019-04-11 2020-10-23 曜科智能科技(上海)有限公司 Depth map optimization method, device, equipment and medium based on semantic instance segmentation
CN110263692A (en) * 2019-06-13 2019-09-20 北京数智源科技有限公司 Container switch gate state identification method under large scene
WO2020258793A1 (en) * 2019-06-26 2020-12-30 北京市商汤科技开发有限公司 Target detection and training of target detection network
TWI762860B (en) * 2019-06-26 2022-05-01 大陸商北京市商湯科技開發有限公司 Method, device, and apparatus for target detection and training target detection network, storage medium
CN111611919A (en) * 2020-05-20 2020-09-01 西安交通大学苏州研究院 Road scene layout analysis method based on structured learning
CN113256776A (en) * 2021-06-21 2021-08-13 炫我信息技术(北京)有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN115272529B (en) * 2022-09-28 2022-12-27 中国海洋大学 Layout-first multi-scale decoupling ocean remote sensing image coloring method and system
CN115272529A (en) * 2022-09-28 2022-11-01 中国海洋大学 Layout-first multi-scale decoupling ocean remote sensing image coloring method and system
CN116740809A (en) * 2023-06-05 2023-09-12 嘉兴米兰映像家具有限公司 Intelligent sofa control method based on user gesture
CN116740809B (en) * 2023-06-05 2024-03-29 嘉兴米兰映像家具有限公司 Intelligent sofa control method based on user gesture

Also Published As

Publication number Publication date
CN107369158B (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN107369158A (en) The estimation of indoor scene layout and target area extracting method based on RGB D images
CN108665481B (en) Self-adaptive anti-blocking infrared target tracking method based on multi-layer depth feature fusion
CN105389584B (en) Streetscape semanteme marking method based on convolutional neural networks with semantic transfer conjunctive model
CN103810503A (en) Depth study based method for detecting salient regions in natural image
CN106909902B (en) Remote sensing target detection method based on improved hierarchical significant model
Alidoost et al. A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image
CN103337072B (en) A kind of room objects analytic method based on texture and geometric attribute conjunctive model
CN105528575B (en) Sky detection method based on Context Reasoning
CN105261017A (en) Method for extracting regions of interest of pedestrian by using image segmentation method on the basis of road restriction
CN104517095B (en) A kind of number of people dividing method based on depth image
CN107564022A (en) Saliency detection method based on Bayesian Fusion
Asokan et al. Machine learning based image processing techniques for satellite image analysis-a survey
Gupta et al. Automatic trimap generation for image matting
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
Wang et al. An overview of 3d object detection
CN104766065A (en) Robustness prospect detection method based on multi-view learning
CN109086777A (en) A kind of notable figure fining method based on global pixel characteristic
Li et al. Unsupervised road extraction via a Gaussian mixture model with object-based features
CN111401380A (en) RGB-D image semantic segmentation method based on depth feature enhancement and edge optimization
Li et al. Transmission line detection in aerial images: An instance segmentation approach based on multitask neural networks
Bao et al. Unpaved road detection based on spatial fuzzy clustering algorithm
CN113592893B (en) Image foreground segmentation method for determining combination of main body and accurate edge
CN105354547A (en) Pedestrian detection method in combination of texture and color features
Wang et al. A region-line primitive association framework for object-based remote sensing image analysis
Ren et al. Research on infrared small target segmentation algorithm based on improved mask R-CNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 201, building 2, phase II, No.1 Kechuang Road, Yaohua street, Qixia District, Nanjing City, Jiangsu Province, 210003

Applicant after: Nanjing University of Posts and Telecommunications

Address before: 210003 Gulou District, Jiangsu, Nanjing new model road, No. 66

Applicant before: Nanjing University of Posts and Telecommunications

GR01 Patent grant
GR01 Patent grant