CN109829449B - RGB-D indoor scene labeling method based on super-pixel space-time context - Google Patents


Publication number: CN109829449B
Application number: CN201910174110.2A
Authority: CN (China)
Original language: Chinese (zh)
Other versions: CN109829449A (application publication)
Inventors: 王立春, 王梦涵, 王少帆, 孔德慧
Assignee (original and current): Beijing University of Technology
Legal status: Active (granted)
Events: application filed by Beijing University of Technology; publication of CN109829449A; application granted; publication of CN109829449B


Abstract

The invention discloses an RGB-D indoor scene labeling method based on super-pixel space-time context. In the field of computer vision, the process of subdividing a digital image into multiple image sub-regions is called superpixel segmentation. A superpixel is usually a small region made up of spatially adjacent pixels with similar characteristics such as color, brightness and texture; such regions retain locally useful information and generally do not destroy the boundary information of objects in the image. In the method, the semantic annotation of the superpixels determined by the 0.08 segmentation threshold is the optimization target, and the superpixels determined by the 0.06 segmentation threshold serve as spatial context for optimizing the semantic annotation result. Each superpixel corresponding to a leaf node or an intermediate node is semantically classified, giving the semantic labeling probability of every superpixel in the segmentation maps at thresholds 0.06 and 0.08. The method is clearly superior to conventional indoor scene labeling methods.

Description

RGB-D indoor scene labeling method based on super-pixel space-time context
Technical Field
The invention relates to RGB-D indoor scene image annotation, and belongs to the field of computer vision and pattern recognition.
Background
Semantic annotation of indoor scene images is a challenging task in current vision-based scene understanding, with the basic goal of densely providing a predefined semantic class label for each pixel in a given indoor scene image (or frame in a captured indoor scene video).
Indoor scenes involve a large number of semantic categories, mutual occlusion between scene objects, weakly discriminative low-level visual features, and uneven illumination, which makes indoor scene image annotation very difficult. With the popularity of depth sensors, RGB-D data containing color, texture and depth can now be acquired easily and reliably. RGB-D indoor scene labeling methods generally fall into two categories: methods based on defined (hand-crafted) features and methods based on learned features. The invention provides an RGB-D indoor scene labeling method based on super-pixel space-time context, which belongs to the category of methods based on defined features.
The main RGB-D indoor scene labeling methods based on defined features are analyzed below. As pioneers in using depth information for indoor scene semantic annotation, Silberman et al. extract SIFT feature descriptors from the color image (RGB), the depth image (Depth) and the rotated RGB image, and classify these descriptors with a feed-forward neural network to obtain an image semantic annotation result, which is then further optimized with a simple CRF (conditional random field probability graph model). Ren et al. perform superpixel segmentation on the image with the gPb/UCM algorithm and organize the superpixel sets into a hierarchical tree structure according to segmentation thresholds. Feature descriptions of Patches (image blocks) are densely computed on the RGB-D image, and feature descriptions of superpixel regions are computed from the Patch features. For semantic classification, the superpixel features are used as the input of an SVM, which gives the classification result of each superpixel. New superpixel class features are then constructed from the label vectors produced by the SVM classifier, and an MRF (Markov random field) model built with these new features further optimizes the recognition result.
In semantic recognition it is generally agreed that using more context information yields more accurate recognition results. Pixel-level spatial context is usually modeled by an MRF or CRF built on the adjacency relations between pixels, constraining adjacent pixels to take consistent semantic labels. Superpixel-level spatial context is used either by concatenating the features of superpixels related by inclusion as the classification feature, or through a CRF model incorporating superpixel information, in which the estimated probability of a pixel serves as the unary energy, the feature difference of a pixel pair as the pairwise energy, and the superpixel information as the higher-order energy; the optimal labels are determined by solving the defined energy function.
Regarding the use of temporal context, Kundu observed that the pixel information of adjacent frames of a video sequence of the same scene overlaps, and therefore proposed a new dense CRF model.
Object of the Invention
The invention aims to make full use of temporal and spatial context: in the annotation process, the temporal context of superpixels is computed from consecutive frame images, and together with the spatial context provided by hierarchical superpixel segmentation it is used to complete the indoor scene annotation task.
To achieve this purpose, the technical scheme adopted by the invention is an RGB-D indoor scene labeling method based on super-pixel space-time context: the input is the image to be labeled Fr_tar and its temporally adjacent frames Fr_tar-1 and Fr_tar+1, and the output is the pixel-level labeling of Fr_tar.
Based on an optical flow algorithm, for each superpixel of the image to be annotated Fr_tar, the corresponding superpixels in the temporally adjacent frames Fr_tar-1 and Fr_tar+1 are computed; these corresponding superpixels form its temporal context. The image is segmented into superpixels with the gPb/UCM algorithm and the segmentation results are organized into a segmentation tree according to thresholds; the child nodes of a superpixel of Fr_tar in the segmentation tree form its spatial context.
A temporal-context-based feature representation is constructed for each superpixel of Fr_tar, and a gradient boosted decision tree (GBDT) classifies the superpixels using these temporal-context features; using the superpixel spatial context, the semantic classification results of each superpixel and of its spatial context are weighted and combined to obtain the semantic annotation of the superpixels of Fr_tar.
S1 super pixel
In the field of computer vision, the process of subdividing a digital image into multiple image sub-regions is called superpixel segmentation. A superpixel is usually a small region made up of spatially adjacent pixels with similar characteristics such as color, brightness and texture; such regions retain locally useful information and generally do not destroy the boundary information of objects in the image.
S1.1 superpixel segmentation of images
For superpixel segmentation the gPb/UCM algorithm is used: it computes, from local and global features of the image, the probability P_b that a pixel belongs to a boundary. The gPb/UCM algorithm is applied to the color image and to the depth image separately, yielding the probability P_b^rgb that a pixel belongs to a boundary computed from the color image and the probability P_b^depth computed from the depth image; these are combined into P_b according to formula (1). Applying different probability thresholds tr to the resulting probability values P_b produces a multi-level segmentation result.
The probability thresholds tr set in the method are 0.06 and 0.08; pixels whose probability values are smaller than the set threshold are connected into regions according to the eight-connectivity principle, and each such region is a superpixel.
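For illustration, the following minimal sketch (an assumption, not the patented implementation; the boundary-probability map pb would come from an external gPb/UCM implementation) thresholds a boundary map and groups non-boundary pixels into 8-connected regions:

import numpy as np
from scipy import ndimage

def superpixels_from_boundary_map(pb: np.ndarray, tr: float) -> np.ndarray:
    """Label 8-connected regions of pixels whose boundary probability is below tr.

    pb : HxW array of boundary probabilities in [0, 1] (e.g. produced by gPb/UCM).
    Returns an HxW integer label map; each label is one superpixel.
    """
    non_boundary = pb < tr
    eight_conn = np.ones((3, 3), dtype=bool)          # 8-connectivity structure
    labels, num = ndimage.label(non_boundary, structure=eight_conn)
    return labels

# Two segmentation levels as in the method (thresholds 0.06 and 0.08).
pb = np.random.rand(120, 160)                          # placeholder boundary map
seg_006 = superpixels_from_boundary_map(pb, 0.06)
seg_008 = superpixels_from_boundary_map(pb, 0.08)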
S1.2 Patch features
A Patch is defined as an m × m grid that slides from the upper-left corner of the color image and of the depth image rightwards and downwards in steps of n pixels, finally forming a dense grid over both images. In the method, the Patch size is set to 16 × 16 in the experiments and the sliding step is n = 2; for an image of size N × M, the number of Patches finally obtained is ((N − 16)/2 + 1) × ((M − 16)/2 + 1).
Four types of features are calculated for each Patch: depth gradient features, color gradient features, color features and texture features.
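A small sketch (assumed illustration) of the dense patch grid enumeration with a 16 × 16 window and stride 2, which is all that the later superpixel pooling needs:

import numpy as np

def dense_patch_centers(height: int, width: int, patch: int = 16, stride: int = 2):
    """Return the (row, col) centers of all patch x patch windows placed with the given stride."""
    ys = np.arange(0, height - patch + 1, stride)
    xs = np.arange(0, width - patch + 1, stride)
    # len == ((height - 16) // 2 + 1) * ((width - 16) // 2 + 1), matching the count above
    return [(y + patch // 2, x + patch // 2) for y in ys for x in xs]

print(len(dense_patch_centers(480, 640)))  # 233 * 313 = 72929 patches for a 480x640 image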
S1.2.1 depth gradient feature
A Patch in the depth image is denoted Z_d. For each Z_d a depth gradient feature F_g_d is computed, whose t-th component is defined by formula (2):

    F_g_d^t(Z_d) = Σ_{z∈Z_d} m̃_d(z) · α_t^{g_d} · ( [k_g_d(θ̃_d(z), b_1), ..., k_g_d(θ̃_d(z), b_{d_g})] ⊗ [k_s(z, p_1), ..., k_s(z, p_{d_s})] )        (2)

In formula (2), z ∈ Z_d denotes the relative two-dimensional coordinate position of pixel z within the depth Patch; θ̃_d(z) and m̃_d(z) denote the depth gradient direction and gradient magnitude of pixel z; {b_i} and {p_j} are the depth gradient basis vectors and the position basis vectors, both sets being predefined values; d_g and d_s denote the numbers of depth gradient basis vectors and position basis vectors; α_t^{g_d} is the mapping coefficient of the t-th principal component obtained by kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product; k_g_d and k_s are the depth gradient Gaussian kernel and the position Gaussian kernel, with their corresponding Gaussian kernel parameters. Finally, the depth gradient feature is transformed with the EMK (Efficient Match Kernel) algorithm; the transformed feature vector is still denoted F_g_d.
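A compact sketch of the kernel-descriptor computation implied by formula (2). The basis vectors, kernel widths and KPCA projection matrix are placeholders (in the method they are predefined or learned offline), so this only illustrates the structure: per-pixel Gaussian kernel responses to the gradient-orientation and position bases, their Kronecker product, magnitude weighting, and projection.

import numpy as np

def gaussian_kernel(x, basis, gamma):
    """k(x, b_i) = exp(-gamma * ||x - b_i||^2) for every basis vector b_i."""
    d = x[None, :] - basis                      # (n_basis, dim)
    return np.exp(-gamma * np.sum(d * d, axis=1))

def patch_gradient_descriptor(grad_mag, grad_dir, grad_basis, pos_basis, alpha,
                              gamma_g=5.0, gamma_s=3.0):
    """Depth-gradient kernel descriptor of one 16x16 patch (structure of formula (2)).

    grad_mag, grad_dir : (16, 16) gradient magnitude / orientation of the depth patch
    grad_basis         : (d_g, 2) orientation basis vectors (unit vectors, assumed predefined)
    pos_basis          : (d_s, 2) position basis vectors (normalized coords, assumed predefined)
    alpha              : (T, d_g * d_s) KPCA projection coefficients (assumed given)
    """
    feat = np.zeros(grad_basis.shape[0] * pos_basis.shape[0])
    for y in range(grad_mag.shape[0]):
        for x in range(grad_mag.shape[1]):
            ori = np.array([np.cos(grad_dir[y, x]), np.sin(grad_dir[y, x])])
            pos = np.array([y / grad_mag.shape[0], x / grad_mag.shape[1]])
            k_g = gaussian_kernel(ori, grad_basis, gamma_g)   # (d_g,) orientation kernel responses
            k_s = gaussian_kernel(pos, pos_basis, gamma_s)    # (d_s,) position kernel responses
            feat += grad_mag[y, x] * np.kron(k_g, k_s)        # Kronecker product, magnitude-weighted
    return alpha @ feat                                       # t-th component = alpha_t . feat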
S1.2.2 color gradient feature
A Patch in the color image is denoted Z_c. For each Z_c a color gradient feature F_g_c is computed, whose t-th component is defined by formula (3):

    F_g_c^t(Z_c) = Σ_{z∈Z_c} m̃_c(z) · α_t^{g_c} · ( [k_g_c(θ̃_c(z), b_1), ..., k_g_c(θ̃_c(z), b_{c_g})] ⊗ [k_s(z, p_1), ..., k_s(z, p_{c_s})] )        (3)

In formula (3), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z within the color-image Patch; θ̃_c(z) and m̃_c(z) denote the gradient direction and gradient magnitude of pixel z; {b_i} and {p_j} are the color gradient basis vectors and the position basis vectors, both sets being predefined values; c_g and c_s denote the numbers of color gradient basis vectors and position basis vectors; α_t^{g_c} is the mapping coefficient of the t-th principal component obtained by kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product; k_g_c and k_s are the color gradient Gaussian kernel and the position Gaussian kernel, with their corresponding Gaussian kernel parameters. Finally, the color gradient feature is transformed with the EMK algorithm; the transformed feature vector is still denoted F_g_c.
S1.2.3 color characteristics
A Patch in the color image is denoted Z_c. For each Z_c a color feature F_col is computed, whose t-th component is defined by formula (4):

    F_col^t(Z_c) = Σ_{z∈Z_c} α_t^{col} · ( [k_col(r(z), b_1), ..., k_col(r(z), b_{c_c})] ⊗ [k_s(z, p_1), ..., k_s(z, p_{c_s})] )        (4)

In formula (4), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z within the color-image Patch; r(z) is a three-dimensional vector, the RGB value of pixel z; {b_i} and {p_j} are the color basis vectors and the position basis vectors, both sets being predefined values; c_c and c_s denote the numbers of color basis vectors and position basis vectors; α_t^{col} is the mapping coefficient of the t-th principal component obtained by kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product; k_col and k_s are the color Gaussian kernel and the position Gaussian kernel, with their corresponding Gaussian kernel parameters. Finally, the color feature is transformed with the EMK algorithm; the transformed feature vector is still denoted F_col.
S1.2.4 Texture feature (Texture)
First, the RGB scene image is converted to a gray-scale image; a Patch in the gray-scale image is denoted Z_g. For each Z_g a texture feature F_tex is computed, whose t-th component is defined by formula (5):

    F_tex^t(Z_g) = Σ_{z∈Z_g} s(z) · α_t^{tex} · ( [k_lbp(lbp(z), b_1), ..., k_lbp(lbp(z), b_{g_b})] ⊗ [k_s(z, p_1), ..., k_s(z, p_{g_s})] )        (5)

In formula (5), z ∈ Z_g denotes the relative two-dimensional coordinate position of pixel z within the Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centred on pixel z; lbp(z) is the local binary pattern (LBP) feature of pixel z; {b_i} and {p_j} are the local binary pattern basis vectors and the position basis vectors, both sets being predefined values; g_b and g_s denote the numbers of local binary pattern basis vectors and position basis vectors; α_t^{tex} is the mapping coefficient of the t-th principal component obtained by kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product; k_lbp and k_s are the local binary pattern Gaussian kernel and the position Gaussian kernel, with their corresponding Gaussian kernel parameters. Finally, the texture feature is transformed with the EMK algorithm; the transformed feature vector is still denoted F_tex.
S1.3 superpixel features
The superpixel feature F_seg is defined as in formula (6):

    F_seg = [ F_g_d(seg), F_g_c(seg), F_col(seg), F_tex(seg), G_seg ]        (6)

F_g_d(seg), F_g_c(seg), F_col(seg) and F_tex(seg) denote the superpixel depth gradient, color gradient, color and texture features, defined as in formula (7):

    F_x(seg) = (1/n) Σ_{p=1}^{n} F_x(p),   x ∈ { g_d, g_c, col, tex }        (7)

In formula (7), F_g_d(p), F_g_c(p), F_col(p), F_tex(p) denote the features of the p-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg.
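A sketch of the pooling in formula (7), assuming each patch feature has already been computed and attached to its center pixel; patch features whose centers fall inside a superpixel are simply averaged.

import numpy as np

def pool_patch_features(labels, centers, patch_feats):
    """Average patch features per superpixel (structure of formula (7)).

    labels      : HxW superpixel label map
    centers     : list of (row, col) patch centers
    patch_feats : (num_patches, D) array of patch features (e.g. F_g_d, F_col, ...)
    Returns a dict  superpixel label -> mean feature vector.
    """
    sums, counts = {}, {}
    for (r, c), f in zip(centers, patch_feats):
        seg = int(labels[r, c])
        sums[seg] = sums.get(seg, 0.0) + f
        counts[seg] = counts.get(seg, 0) + 1
    return {seg: sums[seg] / counts[seg] for seg in sums}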
The superpixel geometry feature G_seg is defined as in formula (8):

    G_seg = [ A_seg, P_seg, R_seg, H_x, H_y, H_xy, D_mean, D_sqmean, D_var, D_miss, N_seg ]        (8)

The components of formula (8) are defined as follows.

Superpixel area: A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg. The superpixel perimeter P_seg, defined as in formula (9), is the number of pixels in the boundary set B_seg of the superpixel seg, i.e. the pixels of seg that either lie on the image border (the RGB scene image having horizontal and vertical resolutions M and N) or have among their four neighbors N_4(s) a pixel belonging to a different superpixel seg′:

    P_seg = |B_seg|,   B_seg = { s ∈ seg : s lies on the image border, or ∃ s′ ∈ N_4(s) with s′ ∈ seg′, seg′ ≠ seg }        (9)

The area-to-perimeter ratio R_seg of the superpixel is defined as in formula (10):

    R_seg = A_seg / P_seg        (10)

H_x, H_y and H_xy are second-order Hu moments (orders 2+0 = 2, 0+2 = 2 and 1+1 = 2) computed from the x coordinate s_x of pixel s, from its y coordinate s_y, and from the product of the x and y coordinates, respectively, as defined in formulas (11), (12) and (13). Formula (14) defines the quantities used in their computation: the mean of the x coordinates, the mean of the y coordinates, and the corresponding squared terms of the pixels contained in the superpixel, where Width and Height denote the width and height of the image, i.e. the computation is based on pixel coordinate values normalized by Width and Height.

D_mean, D_sqmean and D_var denote, respectively, the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined as in formula (15):

    D_mean = (1/A_seg) Σ_{s∈seg} s_d,   D_sqmean = (1/A_seg) Σ_{s∈seg} s_d^2,   D_var = D_sqmean − D_mean^2        (15)

D_miss is the proportion of pixels within the superpixel whose depth information is missing, defined as in formula (16):

    D_miss = Num({ s ∈ seg : depth of s is missing }) / A_seg        (16)

N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where the principal normal vector of the point cloud corresponding to the superpixel is estimated by principal component analysis (PCA).
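The geometric part of the superpixel feature can be sketched as below. This is an illustrative subset under assumptions (missing depth encoded as 0, coordinate normalization and Hu moments omitted, and the PCA normal returned as a direction rather than the modulus used in the method):

import numpy as np

def geometry_features(labels, depth, seg_id):
    mask = labels == seg_id
    area = int(mask.sum())

    # Boundary pixels: inside the superpixel but with a 4-neighbor outside it (or on the image border).
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())

    d = depth[mask]
    valid = d > 0                                    # assume depth value 0 encodes missing depth
    d_mean = float(d[valid].mean()) if valid.any() else 0.0
    d_var = float(d[valid].var()) if valid.any() else 0.0
    d_miss = float((~valid).sum()) / area

    # Principal normal of the superpixel point cloud via PCA on (x, y, depth).
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys, depth[ys, xs]], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    normal = vt[-1]                                   # direction of least variance

    return [area, perimeter, area / max(perimeter, 1), d_mean, d_var, d_miss] + normal.tolist()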
S2 superpixel context
The method constructs a temporal context based on the temporal ordering of the RGB-D image sequence and a spatial context based on the tree structure of the superpixel segmentation.
S2.1 superpixel temporal context
S2.1.1 interframe optical flow calculation
In the method, the optical flow obtained by calculating from a target frame to a reference frame is defined as a forward optical flow, and the optical flow obtained by calculating from the reference frame to the target frame is defined as a backward optical flow.
(1) Initial optical flow estimation
The SimpleFlow method is adopted for the initial inter-frame optical flow estimation. For two frames Fr_tar and Fr_tar+1, (x, y) denotes a pixel of Fr_tar and (u(x, y), v(x, y)) denotes the optical flow vector at (x, y). Taking Fr_tar as the target frame and Fr_tar+1 as the reference frame, the forward optical flow from Fr_tar to Fr_tar+1 is the set of optical flow vectors of all pixels of Fr_tar, i.e. { (u(x, y), v(x, y)) | (x, y) ∈ Fr_tar }. In the following, u(x, y) and v(x, y) are abbreviated to u and v; according to the flow, the pixel of Fr_tar+1 corresponding to pixel (x, y) of Fr_tar is (x + u, y + v).

First the forward optical flow from Fr_tar to Fr_tar+1 is computed. For a pixel (x_0, y_0) of frame Fr_tar, a window W_1 of size a × a centred on it is taken; in this process a = 10. An arbitrary point (p, q) of W_1 corresponds, under the flow (u, v), to the pixel (p + u, q + v) of frame Fr_tar+1. The energy term e is computed for all points of the window W_1 as in formula (17):

    e(p, q, u, v) = || Int_tar(p, q) − Int_tar+1(p + u, q + v) ||^2        (17)

where (p, q) ∈ W_1, Int_tar(p, q) denotes the color information of pixel (p, q) in Fr_tar and Int_tar+1(p + u, q + v) denotes the color information of pixel (p + u, q + v) in Fr_tar+1; computing e for each point of the window in turn yields a vector e of a^2 components.

Then, based on a locally smooth likelihood model, the optical flow vector is optimized by combining the color feature and the local distance feature, as in formula (18):

    E(x_0, y_0, u, v) = Σ_{(p,q)∈W_1} w_d(p, q) · w_c(p, q) · e(p, q, u, v)        (18)

In formula (18), E(x_0, y_0, u, v) is the local-region energy, representing the energy of the forward optical flow vector (u, v) at pixel (x_0, y_0) of frame Fr_tar; it is the weighted accumulation of the energy terms e of all pixels inside the window W_1 centred on (x_0, y_0). The parameter O, set to 20 in the method, bounds the range over which the optical flow vector (u, v) varies. The distance weight w_d and the color weight w_c are determined by the distance difference and the color difference between pixel (x_0, y_0) and the corresponding point (x_0 + u, y_0 + v) given by the flow (u, v), with color parameter σ_c = 0.08 (empirical value) and distance parameter σ_d = 5.5 (empirical value). The (u, v) minimizing the energy E is the optical flow vector estimate of pixel (x_0, y_0); computing the optical flow vectors of all pixels of frame Fr_tar yields the forward optical flow from Fr_tar to Fr_tar+1.
Likewise, the backward optical flow from frame Fr_tar+1 to frame Fr_tar is computed with the same procedure.
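A sketch of the local-window energy search of formulas (17)–(18) for one pixel, under assumptions (Gaussian forms for the weights, a small window radius and search range chosen here for speed; the patent uses a = 10 and O = 20):

import numpy as np

def flow_at_pixel(img_tar, img_ref, x0, y0, radius=5, search=4, sigma_c=0.08, sigma_d=5.5):
    """Estimate the forward flow (u, v) at (x0, y0) by minimizing the weighted window energy.

    img_tar, img_ref : (H, W, 3) float arrays (color values in [0, 1]).
    """
    H, W, _ = img_tar.shape
    best, best_uv = np.inf, (0, 0)
    ys = np.arange(max(0, y0 - radius), min(H, y0 + radius + 1))
    xs = np.arange(max(0, x0 - radius), min(W, x0 + radius + 1))
    for u in range(-search, search + 1):
        for v in range(-search, search + 1):
            E = 0.0
            for q in ys:
                for p in xs:
                    qq, pp = q + v, p + u
                    if not (0 <= qq < H and 0 <= pp < W):
                        continue  # ignore window points whose match falls outside the reference frame
                    e = np.sum((img_tar[q, p] - img_ref[qq, pp]) ** 2)            # formula (17)
                    w_d = np.exp(-((p - x0) ** 2 + (q - y0) ** 2) / sigma_d)      # distance weight (assumed form)
                    w_c = np.exp(-np.sum((img_tar[q, p] - img_tar[y0, x0]) ** 2) / sigma_c)  # color weight (assumed form)
                    E += w_d * w_c * e                                            # formula (18)
            if E < best:
                best, best_uv = E, (u, v)
    return best_uv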
(2) Occlusion point detection
Denote the forward optical flow from frame Fr_tar to frame Fr_tar+1 by { (u_f(x), v_f(y)) | (x, y) ∈ Fr_tar } and the backward optical flow from frame Fr_tar+1 to frame Fr_tar by { (u_b(x′), v_b(y′)) | (x′, y′) ∈ Fr_tar+1 }. For pixel (x, y), the value || (u_f(x), v_f(y)) − (−u_b(x + u_f(x)), −v_b(y + v_f(y))) || is computed; if this value is not 0, pixel (x, y) is regarded as an occlusion point.
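The occlusion test can be written compactly, assuming dense forward and backward flow fields are available as arrays (a small tolerance is used here instead of the strict "not 0" test, which is an assumption for robustness):

import numpy as np

def occlusion_mask(flow_fwd, flow_bwd, eps=0.5):
    """Mark pixels whose forward flow is not cancelled by the backward flow at the target location.

    flow_fwd, flow_bwd : (H, W, 2) arrays holding (u, v) per pixel.
    """
    H, W, _ = flow_fwd.shape
    occluded = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            u, v = flow_fwd[y, x]
            xt, yt = int(round(x + u)), int(round(y + v))
            if not (0 <= xt < W and 0 <= yt < H):
                occluded[y, x] = True
                continue
            ub, vb = flow_bwd[yt, xt]
            occluded[y, x] = np.hypot(u + ub, v + vb) > eps
    return occluded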
(3) Re-estimation of the optical flow of occlusion points
For a pixel (x_0, y_0) marked as an occlusion point, the optical flow energy is re-estimated using formula (19) and denoted E_b(x_0, y_0, u, v). In formula (19), ē(x_0, y_0) denotes the mean of the energy terms e of pixel (x_0, y_0) of frame Fr_tar over the different candidate optical flow estimates, e_min(x_0, y_0) denotes the minimum of these energy terms, and the weight w_r(x_0, y_0) is the difference between the mean energy term e and the minimum energy term e. For a pixel (x_0, y_0) marked as occluded, the (u, v) minimizing E_b is taken as its optical flow vector.
The final optical flow vector of a pixel marked as an occlusion point is the re-estimated vector obtained in this step (3).
S2.1.2 superpixel temporal context and its feature representation
Using the superpixel segmentation method of S1.1, the Fr_tar frame image, the Fr_tar-1 frame image and the Fr_tar+1 frame image are each segmented into superpixels and their superpixel segmentation maps are computed.
(1) Superpixel temporal context
First, from the forward optical flow from Fr_tar to Fr_tar+1, the mean of the forward optical flow { (u_f(x), v_f(y)) | (x, y) ∈ Seg_tar } of all pixels { (x, y) | (x, y) ∈ Seg_tar } contained in the superpixel Seg_tar of frame Fr_tar is computed, as in formula (20):

    (ū_f(Seg_tar), v̄_f(Seg_tar)) = (1 / Num(Seg_tar)) Σ_{(x,y)∈Seg_tar} (u_f(x), v_f(y))        (20)

In formula (20), Num(Seg_tar) denotes the number of pixels contained in the superpixel Seg_tar. Shifting the pixels of Seg_tar into Fr_tar+1 by this mean forward flow gives the region Seg′_tar = { (x′, y′) | x′ = x + ū_f(Seg_tar), y′ = y + v̄_f(Seg_tar), (x, y) ∈ Seg_tar, (x′, y′) ∈ Fr_tar+1 }, called the corresponding region of superpixel Seg_tar in Fr_tar+1. The intersection-over-union (IOU) between Seg′_tar and the i-th superpixel Seg^i_tar+1 of frame Fr_tar+1 is computed as in formula (21):

    IOU(Seg′_tar, Seg^i_tar+1) = Num(Seg′_tar ∩ Seg^i_tar+1) / Num(Seg′_tar ∪ Seg^i_tar+1)        (21)

In formula (21), Num(·) denotes the number of pixels contained in a region. If IOU(Seg′_tar, Seg^i_tar+1) > τ, the corresponding region Seg″_tar of the superpixel Seg^i_tar+1 in frame Fr_tar is computed from the backward optical flow from Fr_tar+1 to Fr_tar, and the IOU between the region Seg″_tar and the superpixel Seg_tar, IOU(Seg″_tar, Seg_tar), is computed with formula (21). If IOU(Seg″_tar, Seg_tar) > τ, then Seg^i_tar+1 is called a corresponding superpixel of Seg_tar in Fr_tar+1; the number of corresponding superpixels of Seg_tar in Fr_tar+1 may be 0, 1 or more. In the method, the intersection-over-union decision threshold τ is set to 0.3. In the same way, the corresponding superpixels of Seg_tar in the Fr_tar-1 frame are found; the number of corresponding superpixels of Seg_tar in Fr_tar-1 is likewise 0, 1 or more.

The temporal context of the superpixel Seg_tar is written Segs_tar = { Segs_tar-1, Seg_tar, Segs_tar+1 }, where Segs_tar-1 and Segs_tar+1 are the sets of corresponding superpixels of the Fr_tar-frame superpixel Seg_tar in the Fr_tar-1 frame and the Fr_tar+1 frame, respectively.
(2) Superpixel temporal context semantic feature representation
The semantic feature of the superpixel temporal context Segs_tar is F_Segs_tar, defined as in formula (22):

    F_Segs_tar = [ F̄_Segs_tar-1, F_Seg_tar, F̄_Segs_tar+1 ]        (22)

F_Seg_tar is the feature of the superpixel Seg_tar in frame Fr_tar; F̄_Segs_tar-1 is the mean of the features of all corresponding superpixels Segs_tar-1 in the Fr_tar-1 frame; F̄_Segs_tar+1 is the mean of the features of all corresponding superpixels Segs_tar+1 in the Fr_tar+1 frame. The feature of each superpixel is computed according to the method of section S1.3.
When the number of corresponding superpixels of the Fr_tar-frame superpixel Seg_tar in the Fr_tar+1 frame or the Fr_tar-1 frame is 0, its own feature F_Seg_tar is used in place of F̄_Segs_tar+1 or F̄_Segs_tar-1.
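The temporal-context feature of formula (22) is then just a concatenation, with the superpixel's own feature substituted when it has no correspondence in a neighboring frame; a minimal sketch:

import numpy as np

def temporal_context_feature(f_tar, feats_prev, feats_next):
    """Concatenate [mean(prev correspondences), own feature, mean(next correspondences)] (formula (22)).

    f_tar      : feature vector of the superpixel in Fr_tar
    feats_prev : list of feature vectors of its corresponding superpixels in Fr_tar-1 (may be empty)
    feats_next : list of feature vectors of its corresponding superpixels in Fr_tar+1 (may be empty)
    """
    f_prev = np.mean(feats_prev, axis=0) if len(feats_prev) else f_tar
    f_next = np.mean(feats_next, axis=0) if len(feats_next) else f_tar
    return np.concatenate([f_prev, f_tar, f_next])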
S2.2 superpixel spatial context
The image is segmented into superpixels with the method of S1.1. When the threshold of the superpixel hierarchical segmentation tree is set to 1, the highest-level superpixel segmentation map is obtained, i.e. the root node of the hierarchical segmentation tree, which represents the whole image as one superpixel. Setting the threshold to 0.06 gives a lower-level superpixel segmentation result. When the threshold is 0.08 the boundary decision criterion is raised, so pixels whose original boundary probability value lies in [0.06, 0.08] are judged as non-boundary points, whereas at threshold 0.06 these points are judged as boundary points; a higher-level superpixel therefore contains lower-level superpixels. In the method, the spatial context of a parent-node superpixel is defined as its child-node superpixels in the hierarchical segmentation tree.
S3 semantic Classification
S3.1 temporal context based superpixel semantic classification
The method takes the temporal-context features of the superpixels as input, performs semantic classification of the superpixels with a GBDT (gradient boosted decision tree), and outputs the predicted label of each superpixel.
In the GBDT training process, MR training rounds are set, mr ∈ {1, 2, 3, ..., MR}; in round mr a regression tree, i.e. a weak classifier, is trained for each class, so with L classes L regression trees are trained per round, l ∈ {1, 2, 3, ..., L}. In total L × MR weak classifiers are obtained. Each classifier in each round is trained in the same way.
(1) GBDT multi-classifier training
The training set Fea_tr contains NSeg_tr samples: Fea_tr = { (Fea_i, lab_i) | i = 1, ..., NSeg_tr }, where the training sample Fea_i is the temporal-context feature of the i-th superpixel and lab_i ∈ {1, 2, 3, ..., L} is its true label.

First, round 0 is initialized: the prediction function value h_{l,0}(x) of the class-l classifier is set to 0. The true label lab_i is converted into an L-dimensional label vector lab_i = [lab_i[1], ..., lab_i[L]], lab_i[k] ∈ {0, 1}: if the true label of the i-th training sample is l, then the l-th component lab_i[l] = 1 and the other components are 0. The probability that the i-th sample belongs to class l is computed as in formula (24):

    prob_{l,mr}(Fea_i) = exp(h_{l,mr}(Fea_i)) / Σ_{j=1}^{L} exp(h_{j,mr}(Fea_i))        (24)

I(lab_i = l) is an indicator function whose value is 1 when the label of sample i is l and 0 otherwise.

The prediction of the class-l classifier of round mr−1 for the i-th sample is written h_{l,(mr−1)}(Fea_i); the classification error of the class-l classifier of round mr−1 on the i-th sample is err_{l,mr−1}(Fea_i), defined as in formula (23):

    err_{l,mr−1}(Fea_i) = I(lab_i = l) − prob_{l,mr−1}(Fea_i)        (23)

This yields the classification error set of round mr−1, { err_{l,mr−1}(Fea_i) | i = 1, ..., NSeg_tr }.
When the class-l classifier of round mr is constructed, the training sample data set Fea_tr is traversed: for each feature dimension par of each sample, the value of the par-th feature dimension of the i-th sample is taken as the classification reference value and all samples of the data set Fea_tr are split by it; samples whose feature value is larger than the reference value belong to the set Region_1, the others to the set Region_2. After all samples have been split, the error of the regression tree is computed according to formula (25):

    Err(par, i) = Σ_{m=1,2} Σ_{Fea_j ∈ Region_m} ( err_{l,mr−1}(Fea_j) − ēr_m )^2,   ēr_m = (1 / NRegion_m) Σ_{Fea_j ∈ Region_m} err_{l,mr−1}(Fea_j)        (25)

where NRegion_m denotes the total number of samples falling into Region_m. The feature value minimizing the regression-tree error is finally selected as the new split value of the tree. The regression tree is repeatedly constructed in this way until the set height of the tree is reached; the height of the regression tree is set to 5 in the method. The regression trees of the other classes in the current round are constructed in the same way.
The number of leaf nodes of the class-l regression tree of round mr is written Reg_{mr,l}; each leaf node is a subset of the training sample set, and the intersection of any two leaf nodes is empty. For the class-l regression tree constructed in round mr, the gain value γ_{mr,l,reg} of each leaf node reg is computed from the classification errors of the samples falling into that leaf, as in formula (26). The prediction value h_{l,mr}(Fea_i) of the class-l regression tree of round mr for the i-th sample is then computed with formula (27):

    h_{l,mr}(Fea_i) = h_{l,mr−1}(Fea_i) + Σ_{reg=1}^{Reg_{mr,l}} γ_{mr,l,reg} · I(Fea_i ∈ reg)        (27)

where reg ∈ {1, 2, ..., Reg_{mr,l}}.

Training continues until round MR is finished. The prediction value h_{l,MR}(Fea_i) of the class-l regression tree of round MR for the i-th sample is expressed as in formula (28):

    h_{l,MR}(Fea_i) = h_{l,MR−1}(Fea_i) + Σ_{reg=1}^{Reg_{MR,l}} γ_{MR,l,reg} · I(Fea_i ∈ reg)        (28)

where reg ∈ {1, 2, ..., Reg_{MR,l}}. Expanding h_{l,MR−1}(Fea_i) in formula (28) with the analogous expression for the class-l regression tree of round MR−1 gives the prediction for the i-th sample as formula (29); continuing the expansion down through the regression trees of rounds MR−1 to 0 gives formula (30):

    h_{l,MR}(Fea_i) = h_{l,0}(Fea_i) + Σ_{mr=1}^{MR} Σ_{reg=1}^{Reg_{mr,l}} γ_{mr,l,reg} · I(Fea_i ∈ reg) = Σ_{mr=1}^{MR} Σ_{reg=1}^{Reg_{mr,l}} γ_{mr,l,reg} · I(Fea_i ∈ reg)        (30)
(2) GBDT prediction
The temporal-context feature Fea_Seg of the superpixel Seg is computed; the prediction values h_{l,MR}(Fea_Seg) of the superpixel Seg for the different classes are calculated with formula (30), and then the probability values prob_{l,MR}(Fea_Seg) of the superpixel Seg belonging to the different classes are calculated with formula (24). The class l with the highest probability value is the predicted class of the superpixel Seg.
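The patent trains its own softmax-style GBDT (L regression trees per round, tree height 5, MR rounds). A minimal stand-in using scikit-learn's GradientBoostingClassifier, an assumption rather than the patent's exact procedure, shows how the temporal-context features would be classified in practice:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Fea_tr: (N, D) temporal-context superpixel features; lab_tr: labels in {1, ..., L} (random placeholders here).
Fea_tr = np.random.rand(200, 32)
lab_tr = np.random.randint(1, 5, size=200)

gbdt = GradientBoostingClassifier(n_estimators=100, max_depth=5, learning_rate=0.1)
gbdt.fit(Fea_tr, lab_tr)

Fea_seg = np.random.rand(10, 32)                 # temporal-context features of test superpixels
prob = gbdt.predict_proba(Fea_seg)               # per-class probabilities (analogue of formula (24))
pred = gbdt.predict(Fea_seg)                     # class with the highest probability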
S3.2 optimizing semantic classification based on spatial context
When the method is used for carrying out superpixel segmentation on an image, two boundary judgment thresholds of 0.06 and 0.08 are set, so that a hierarchical segmentation tree with the height of 2 is obtained.
In the method, the semantic annotation of the superpixel determined by the 0.08 threshold is taken as an optimization target, and the superpixel determined by the 0.06 segmentation threshold is taken as a spatial context and is used for optimizing a semantic annotation result.
Firstly, according to the method of S3.1, semantic classification is carried out on each block of superpixels corresponding to the leaf nodes and the intermediate nodes, the semantic labeling probability of each superpixel in the superpixel segmentation graph under the threshold values of 0.06 and 0.08 is obtained, and the final semantic label of the superpixel block is calculated through the formula (31).
    l = argmax_l [ w_target · prob_target(l) + w_aux · (1/Naux) Σ_{a=1}^{Naux} prob_a(l) ]        (31)

In formula (31), l denotes the final semantic label of the superpixel block, i.e. the class with the maximum value computed by formula (31); prob_a(l) denotes the probability that the a-th superpixel in the set of 0.06-threshold superpixels contained in the 0.08-threshold superpixel has semantic label l; prob_target(l) is the probability that the 0.08-threshold superpixel has semantic label l; Naux denotes the number of 0.06-threshold superpixels contained in the 0.08-threshold superpixel; w_aux is the confidence of the 0.06-threshold superpixel semantic annotation, set to 0.4 in the method; and w_target is the confidence of the 0.08-threshold superpixel semantic annotation, set to 0.6 in the method.
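A sketch of the spatial-context fusion of formula (31): for each 0.08-threshold superpixel, its own class probabilities (weight 0.6) are combined with the average probabilities of the 0.06-threshold superpixels it contains (weight 0.4), and the arg-max gives the final label.

import numpy as np

def fuse_with_spatial_context(p_target, p_children, w_target=0.6, w_aux=0.4):
    """Final label of a 0.08-threshold superpixel from its own and its children's class probabilities.

    p_target   : (L,) class probabilities of the 0.08-threshold superpixel
    p_children : (Naux, L) class probabilities of the 0.06-threshold superpixels it contains
    """
    p_children = np.asarray(p_children)
    fused = w_target * np.asarray(p_target) + w_aux * p_children.mean(axis=0)
    return int(np.argmax(fused)) + 1          # classes are indexed from 1 in the patent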
Drawings
FIG. 1 is a flow chart of an RGBD indoor scene recognition method based on space-time context.
FIG. 2 is a diagram of a superpixel partition hierarchical tree.
FIG. 3 is a schematic diagram of spatial context based optimization.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
As shown in FIGS. 1-3, the RGB-D indoor scene labeling method based on super-pixel space-time context takes as input the image to be labeled Fr_tar and its temporally adjacent frames Fr_tar-1 and Fr_tar+1, and outputs the pixel-level labeling of Fr_tar.
Based on an optical flow algorithm, for each superpixel of Fr_tar the corresponding superpixels in the temporally adjacent frames Fr_tar-1 and Fr_tar+1 are computed; these corresponding superpixels form its temporal context. The image is segmented into superpixels with the gPb/UCM algorithm and the segmentation results are organized into a segmentation tree according to the thresholds; the child nodes of a superpixel of Fr_tar in the segmentation tree form its spatial context.
A temporal-context-based feature representation is constructed for each superpixel of Fr_tar, and a gradient boosted decision tree (GBDT) classifies the superpixels using these temporal-context features; using the superpixel spatial context, the semantic classification results of each superpixel and of its spatial context are weighted and combined to obtain the semantic annotation of the superpixels of Fr_tar. The detailed steps are as described in sections S1-S3 above.
S1 super pixel
In the field of computer vision, the process of subdividing a digital image into a plurality of image sub-regions is known as superpixel segmentation. Superpixels are usually small regions composed of a series of pixel points with adjacent positions and similar characteristics such as color, brightness, texture and the like, and the small regions retain local effective information and generally do not destroy boundary information of objects in an image.
S1.1 superpixel segmentation of images
Super-pixel segmentation uses gPb/UCM algorithm to calculate probability value of pixel belonging to boundary through local and global features of image
Figure BDA0001988977880000141
The gPb/UCM algorithm is applied to the color image and the depth image respectively, and the calculation is carried out according to the formula (1)
Figure BDA0001988977880000142
In the formula (1), the reaction mixture is,
Figure BDA0001988977880000143
is a probability value calculated based on the color image that a pixel belongs to the boundary,
Figure BDA0001988977880000144
is a probability value of a pixel belonging to a boundary calculated based on the depth image.
Figure BDA0001988977880000145
Probability value obtained according to formula (1)
Figure BDA0001988977880000146
And setting different probability threshold values tr to obtain a multi-level segmentation result.
The probability threshold tr set in the method is 0.06 and 0.08, and the pixels with the probability values smaller than the set threshold are connected into a region according to the eight-connection principle, wherein each region is a super pixel.
S1.2Patch feature
Patch is defined as an m × m-sized grid, and slides from the upper left corner of the color image and the depth image downward to the right in steps of n pixels, eventually forming a dense grid on the color image and the depth image. In the method, the size of the Patch is set to be 16 multiplied by 16 in an experiment, the value of the sliding step length N when the Patch is selected to be 2, an image with the size of N multiplied by M is taken as an example, and the number of the Patch finally obtained is
Figure BDA0001988977880000147
Four types of features are calculated for each Patch: depth gradient features, color features, texture features.
S1.2.1 depth gradient feature
Patch in depth image is noted as ZdFor each ZdComputing depth gradient feature Fg_dWherein the value of the t-th component is defined by equation (2):
Figure BDA0001988977880000148
in the formula (2), Z ∈ ZdRepresents the relative two-dimensional coordinate position of pixel z in depth Patch;
Figure BDA0001988977880000151
and
Figure BDA0001988977880000152
respectively representing the depth gradient direction and the gradient magnitude of the pixel z;
Figure BDA0001988977880000153
and
Figure BDA0001988977880000154
the depth gradient base vectors and the position base vectors are respectively, and the two groups of base vectors are predefined values; dgAnd dsRespectively representing the number of depth gradient base vectors and the number of position base vectors;
Figure BDA0001988977880000155
is that
Figure BDA0001988977880000156
Applying mapping coefficient of t-th principal component obtained by Kernel Principal Component Analysis (KPCA),
Figure BDA0001988977880000157
representing the kronecker product.
Figure BDA0001988977880000158
And
Figure BDA0001988977880000159
respectively a depth gradient gaussian kernel function and a position gaussian kernel function,
Figure BDA00019889778800001510
and
Figure BDA00019889778800001511
are parameters corresponding to a gaussian kernel function. Finally, the depth gradient feature is transformed by using an EMK (efficient Match Kernel) algorithm, and the transformed feature vector is still marked as Fg_d
S1.2.2 color gradient feature
Patch in color image is noted as ZcFor each ZcCalculating color gradient feature Fg_cWherein the value of the t-th component is defined by equation (3):
Figure BDA00019889778800001512
in the formula (3), Z ∈ ZcRepresents the relative two-dimensional coordinate position of a pixel z in the color image Patch;
Figure BDA00019889778800001513
and
Figure BDA00019889778800001514
respectively representing the gradient direction and the gradient magnitude of the pixel z;
Figure BDA00019889778800001515
and
Figure BDA00019889778800001516
color gradient base vectors and position base vectors are respectively, and the two groups of base vectors are predefined values; c. CgAnd csRespectively representing the number of color gradient base vectors and the number of position base vectors;
Figure BDA00019889778800001517
is that
Figure BDA00019889778800001518
Applying mapping coefficient of t-th principal component obtained by Kernel Principal Component Analysis (KPCA),
Figure BDA00019889778800001519
representing the kronecker product.
Figure BDA00019889778800001520
And
Figure BDA00019889778800001521
respectively a color gradient gaussian kernel function and a position gaussian kernel function,
Figure BDA00019889778800001522
and
Figure BDA00019889778800001523
are parameters corresponding to a gaussian kernel function. Finally, the color gradient features are transformed by using an EMK algorithm, and the transformed feature vector is still marked as Fg_c
S1.2.3 color characteristics
Patch in color image is noted as ZcFor each ZcCalculating color characteristics FcolWherein the value of the t-th component is defined by equation (4):
Figure BDA0001988977880000161
in the formula (4), Z ∈ ZcRepresents the relative two-dimensional coordinate position of pixel z in the color image Patch; r (z) is a three-dimensional vector, which is the RGB value of pixel z;
Figure BDA0001988977880000162
and
Figure BDA0001988977880000163
color basis vectors and position basis vectors are respectively adopted, and the two groups of basis vectors are predefined values; c. CcAnd csRespectively representing the number of the color basis vectors and the number of the position basis vectors;
Figure BDA0001988977880000164
is that
Figure BDA0001988977880000165
Applying mapping coefficient of t-th principal component obtained by Kernel Principal Component Analysis (KPCA),
Figure BDA0001988977880000166
representing the kronecker product.
Figure BDA0001988977880000167
And
Figure BDA0001988977880000168
respectively a color gaussian kernel function and a position gaussian kernel function,
Figure BDA0001988977880000169
and
Figure BDA00019889778800001610
are parameters corresponding to a gaussian kernel function. Finally, the color features are transformed by using an EMK algorithm, and the transformed feature vector is still marked as Fcol
S1.2.4 Texture feature (Texture)
Firstly, an RGB scene image is converted into a gray scale image, and Patch in the gray scale image is recorded as ZgFor each ZgCalculating texture feature FtexWherein the value of the t-th component is defined by equation (5):
Figure BDA00019889778800001611
in the formula (5), Z ∈ ZgRepresents the relative two-dimensional coordinate position of pixel z in the color image Patch; s (z) represents the standard deviation of the pixel gray values in a 3 × 3 region centered on pixel z; LBP (z) is the Local Binary Pattern feature (LBP) of pixel z;
Figure BDA00019889778800001612
and
Figure BDA00019889778800001613
respectively are a local binary pattern base vector and a position base vector, and the two groups of base vectors are predefined values; gbAnd gsRespectively representing the number of the base vectors of the local binary pattern and the number of the position base vectors;
Figure BDA00019889778800001614
is that
Figure BDA00019889778800001615
Applying mapping coefficient of t-th principal component obtained by Kernel Principal Component Analysis (KPCA),
Figure BDA00019889778800001616
representing the kronecker product.
Figure BDA00019889778800001617
And
Figure BDA00019889778800001618
respectively a local binary pattern gaussian kernel function and a position gaussian kernel function,
Figure BDA00019889778800001619
and
Figure BDA00019889778800001620
are parameters corresponding to a gaussian kernel function. Finally, the texture features are transformed by using an EMK algorithm, and the transformed feature vector is still marked as Ftex
S1.3 superpixel features
Super pixel feature FsegIs defined as formula (6):
Figure BDA00019889778800001621
Figure BDA0001988977880000171
respectively representing a super-pixel depth gradient feature, a color feature and a texture feature, and defined as formula (7):
Figure BDA0001988977880000172
in the formula (7), Fg_d(p),Fg_c(p),Fcol(p),Ftex(p) represents the feature of Patch whose p-th center position falls within the super pixel seg, and n representsThe number of Patch whose core position falls within the super pixel seg.
Superpixel geometry
Figure BDA0001988977880000173
Is defined by the formula (8):
Figure BDA0001988977880000174
the components in equation (8) are defined as follows:
super pixel area Aseg=∑s∈seg1, s are pixels within the super-pixel seg; super pixel perimeter PsegAs defined in formula (9):
Figure BDA0001988977880000175
in formula (9), M, N represents the horizontal and vertical resolutions of the RGB scene image, respectively; seg, seg represent different superpixels; n is a radical of4(s) is a set of four-neighbor domains of pixel s; b issegIs the set of boundary pixels of the super-pixel seg.
Area to perimeter ratio R of super pixelsegAs defined in formula (10):
Figure BDA0001988977880000176
Figure BDA0001988977880000177
is based on the x-coordinate s of the pixel sxY coordinate syAnd a second-order (2+0 ═ 2 or 0+2 ═ 2) Hu moment calculated by multiplying the x coordinate by the y coordinate, respectively, as defined in equations (11), (12) and (13)
Figure BDA0001988977880000178
Figure BDA0001988977880000179
Figure BDA00019889778800001710
In formula (14)
Figure BDA00019889778800001711
Respectively representing the mean value of x coordinates, the mean value of y coordinates, the square of the mean value of x coordinates and the square of the mean value of y coordinates of the pixels contained in the super pixels, and defining the following formula (14):
Figure BDA0001988977880000181
width and Height respectively represent the Width and Height of the image, i.e.
Figure BDA0001988977880000182
The calculation is based on the normalized pixel coordinate values.
Figure BDA0001988977880000183
DvarRespectively representing the depth values s of the pixels s within the superpixel segdAverage value of (1), depth value sdMean of squares, variance of depth values, defined as (15):
Figure BDA0001988977880000184
Dmissthe proportion of pixels in a super-pixel that lose depth information is defined as (16):
Figure BDA0001988977880000185
Nsegis the principal normal vector modulo length of the point cloud corresponding to the superpixel to which the principal normal of the point cloud correspondsThe vectors are estimated by Principal Component Analysis (PCA).
S2 superpixel context
The method respectively constructs a time context and a space context based on an RGB-D image sequence time sequence relation and a tree structure of super-pixel segmentation.
S2.1 superpixel temporal context
S2.1.1 interframe optical flow calculation
In the method, the optical flow obtained by calculating from a target frame to a reference frame is defined as a forward optical flow, and the optical flow obtained by calculating from the reference frame to the target frame is defined as a backward optical flow.
(2) Initial optical flow estimation
The SimpleFlow method is adopted for the interframe initial optical flow estimation. For two frame images FrtarAnd Frtar+1(x, y) represents FrtarThe middle pixel point, (u (x, y), v (x, y)) represents the optical flow vector at (x, y). Defining an image FrtarAs target frame, image Frtar+1Is a reference frame, then image FrtarTo the image Frtar+1The forward optical flow of (A) is FrtarThe set of optical flow vectors of all the pixel points in (i.e., { (u (x, y), v (x, y)) | (x, y) ∈ Fr)tar}. In the following process, u (x, y) and v (x, y) are abbreviated as u and v, respectively, and FrtarMiddle pixel (x, y) is calculated from the optical flow at Frtar+1The corresponding pixel point in (x + u, y + v).
First, the image Fr is calculatedtarTo the image Frtar+1Forward optical flow of (f), for FrtarFrame pixel (x)0,y0) Taking a window of a size a x a centered on it
Figure BDA0001988977880000191
In this process, where a is 10, W1At an arbitrary point (p, q) in Frtar+1The corresponding pixel points in the frame are (p + u, q + v), and the window W is aligned1Calculating the energy term e at all points in the equation, as in (17)
e(p,q,u,v)=||Inttar(p,q)-Inttar+1(p+u,q+v)||2 (17)
Wherein (p, q) ∈ W1,Inttar(p, q) represents FrtarColor information of the middle pixel (p, q), Inttar+1(p + u, q + v) represents Frtar+1The color information of the pixel points of the middle pixel point (p + u, q + v) is calculated for each pair of points in the window in sequence to obtain a2Vector e of dimensions.
Then, based on the local smooth likelihood model, the optical flow vector is optimized by combining the color feature and the local distance feature as shown in formula (18):
Figure BDA0001988977880000192
Figure BDA0001988977880000193
Figure BDA0001988977880000194
Figure BDA0001988977880000195
e (x) in the formula (18)0,y0U, v) is the local region energy, representing the image FrtarPixel point in frame (x)0,y0) The energy of the forward optical flow vector (u, v) is FrtarIn the frame (x)0,y0) Window W as center1Weighted accumulation of energy items e of all internal pixel points;
Figure BDA0001988977880000196
in the method, O is set to be 20, and the change range of the optical flow vector (u, v) is represented; distance weight WdAnd a color weight wcBy pixel point (x)0,y0) Corresponding point (x) calculated from the optical flow (u, v)0+u,y0+ v) distance differences and color differences,setting a color parameter σc0.08 (empirical value), distance parameter σd5.5 (empirical value). The (u, v) that minimizes the E energy is the pixel (x)0,y0) For Fr, the optical flow vector estimation result oftarCalculating optical flow vectors of all pixel points on the frame image to obtain an image FrtarTo the image Frtar+1Forward optical flow of (2).
Also, Fr is calculated according to the method described abovetar+1Frame to FrtarThe backward optical flow of the frame.
(2) Occlusion point detection
Recording image FrtarFrame to image Frtar+1The frame forward optical flow is { (u)f(x),vf(y))|(x,y)∈Frtar}, and an image Frtar+1Frame to image FrtarThe inverse optical flow of (a) results in { (u)b(x′),vb(y′))|(x′,y′)∈Frtar+1}. Calculating | l (u) for pixel (x, y)f(x),vf(v))-(-ub(x+uf(x)),-vb(y+vf(y))) | |, if the value is not 0, the pixel point (x, y) is considered as a shielding point.
(3) Reestimation of occlusion point light flow
For pixels marked as occlusion points (x)0,y0) The optical flow energy is re-estimated using equation (19), denoted as Eb(x0,y0,u,v):
Figure BDA0001988977880000201
Figure BDA0001988977880000202
Figure BDA0001988977880000203
In the formula (19), the compound represented by the formula (I),
Figure BDA0001988977880000204
denotes FrtarFrame pixel (x)0,y0) The average value of energy items e corresponding to different optical flow estimated values;
Figure BDA0001988977880000205
denotes FrtarFrame pixel (x)0,y0) The minimum value of the corresponding energy term e is measured by the different optical flow estimation values; w is ar(x0,y0) The difference between the energy term e mean value and the minimum energy term e value is used for marking the pixel point (x) marked as shielding0,y0) Let EbMinimum (u, v) even pixel (x)0,y0) The optical flow vector of (a).
And (4) adopting the optical flow vector re-estimated in the step (3) for the final optical flow vector of the pixel marked as the occlusion point.
S2.1.2 superpixel temporal context and its feature representation
Method for calculating super pixel segmentation map by using S1.1 to FrtarFrame image Frar-1Frame image and Frtar+1The frame image is subjected to superpixel segmentation.
(1) Superpixel temporal context
First according to FrtarTo Frtar+1Forward optical flow calculation FrtarFrame superpixel SegtarAll contained pixel points { (x, y) | (x, y) ∈ SegtarForward optical flow of { (u)f(x),vf(y))|(x,y)∈SegtarMean value of }
Figure BDA0001988977880000206
As shown in equation (20):
Figure BDA0001988977880000207
in formula (20), Num (Seg)tar) Representing a super-pixel SegtarCalculating the number of contained pixel points, and calculating the superpixel Seg according to the forward optical flow mean valuetarContaining pixel points in Frtar+1To obtain the corresponding pixel ofTo region Segtar={(x′,y′)|x′=x+uf(x),y′=y+uf(y),(x,y)∈Segtar,(x′,y′)∈Frtar+1Is called super pixel SegtarIn Frtar+1The corresponding area of (a). Calculating Seg'tarAnd Frtar+1Ith super pixel in frame
Figure BDA0001988977880000211
The cross-over ratio IOU is as shown in equation (21):
Figure BDA0001988977880000212
in the formula (21), Num (·) indicates that the region includes the number of pixels. If it is
Figure BDA0001988977880000213
τ is according to Frtar+1To FrtarInverse optical flow computing superpixels
Figure BDA0001988977880000214
In FrtarCorresponding region Seg 'of frame'tarThe region Seg ″, is calculated according to equation (21)tarAnd super pixel SegtarCrow to IOU (Seg'tar,Segtar). If IOU (Segtar,Segtar) τ is then
Figure BDA0001988977880000215
Called super-pixel SegtarIn Frtar+1Corresponding super pixel (super pixel Seg)tarIn Frtar+1May be 0, 1 or more). In the present method, the intersection ratio determination threshold τ is set to 0.3 (empirical value). In the same way, find the super pixel SegtarIn Frtar-1Corresponding superpixel (superpixel Seg) of frametarIn Frtar-1May be 0, 1 or more).
Super pixel SegtarTime context memory of
Figure BDA0001988977880000216
Wherein
Figure BDA0001988977880000217
And
Figure BDA0001988977880000218
are each FrtarFrame superpixel SegtarIn Frtar-1Frame and Frtar+1A corresponding set of superpixels for the frame.
(2) Superpixel temporal context semantic feature representation
Superpixel temporal context SegstarIs characterized by a semantic feature of
Figure BDA0001988977880000219
As shown in formula (22):
Figure BDA00019889778800002110
Figure BDA00019889778800002111
is FrtarSuper pixel Seg in frametarIs characterized in that it is a mixture of two or more of the above-mentioned components,
Figure BDA00019889778800002112
is Frtar-1All corresponding superpixels in a frame
Figure BDA00019889778800002113
The mean value of the features is determined by the average,
Figure BDA00019889778800002114
is Frtar+1All corresponding superpixels in a frame
Figure BDA00019889778800002115
The mean of the features, the features of each superpixel, is calculated according to the method of section 1.3 of equation.
FrtarSuperpixel Seg in frametarIn Frtar+1Frame or Frtar-1Using its own characteristics when the number of corresponding superpixels of a frame is 0
Figure BDA00019889778800002116
Substitution
Figure BDA00019889778800002117
Or
Figure BDA00019889778800002118
S2.2 superpixel spatial context
The image is super-pixel segmented by the method of section S1.1, and fig. 2 shows a super-pixel hierarchical segmentation tree obtained according to a plurality of boundary judgment thresholds. When the threshold value of the super pixel hierarchical segmentation tree is set to be 1, a super pixel segmentation graph of the highest level, namely a root node of the hierarchical segmentation tree, can be obtained, and the node represents the whole image as a super pixel; setting the threshold value to be 0.06 to obtain a lower-level super pixel segmentation result; when the threshold is 0.08, the boundary determination criterion ratio is increased, so that pixel points with original boundary probability values of [0.06,0.08] are determined as non-boundary points, and the points are determined as boundary points when the threshold is 0.06. It can be seen that the super-pixels of the high level include the super-pixels of the low level. In the method, a spatial context of a parent node superpixel is defined as a child node superpixel in a hierarchical partition tree.
S3 semantic Classification
S3.1 temporal context based superpixel semantic classification
The method takes the temporal context features of the superpixels as input, performs superpixel semantic classification with a GBDT (gradient boosting decision tree), and outputs the predicted label of each superpixel.
In the GBDT training process, the training is set to run MR rounds, mr ∈ {1, 2, 3, ..., MR}. In round mr a regression tree (weak classifier) is trained for each class, i.e. L regression trees are trained when there are L classes, l ∈ {1, 2, 3, ..., L}. Finally L × MR weak classifiers are obtained. The training method is the same for each classifier in each round.
(1) GBDT multi-classifier training
The training set Fea_tr contains NSeg_tr samples: Fea_tr = {(Fea_i, lab_i) | i = 1, ..., NSeg_tr}, where the training sample Fea_i is the temporal context feature of the i-th superpixel and its true label is lab_i, lab_i ∈ {1, 2, 3, ..., L}.
First, round-0 initialization is performed: the prediction function value h_{l,0}(x) of the class-l classifier is set to 0. The true label lab_i is converted into an L-dimensional label vector lab_i = [lab_i[1], ..., lab_i[L]], lab_i[k] ∈ {0, 1}; if the true label of the i-th training sample is l, the l-th component lab_i[l] of the label vector is 1 and the other components are 0. The probability that the i-th sample belongs to class l is computed as

prob_{l,mr}(Fea_i) = exp(h_{l,mr}(Fea_i)) / Σ_{k=1..L} exp(h_{k,mr}(Fea_i))   (24)

I(lab_i = l) is an indicator function whose value is 1 when the label of sample i is l and 0 otherwise.
The prediction result of the class-l classifier of round mr-1 for the i-th sample is recorded as h_{l,(mr-1)}(Fea_i), and the classification error err_{l,mr-1}(Fea_i) of the i-th sample under the round-(mr-1) class-l classifier is defined as in formula (23):

err_{l,mr-1}(Fea_i) = I(lab_i = l) − prob_{l,mr-1}(Fea_i)   (23)

This yields the classification error set of round mr-1, {err_{l,mr-1}(Fea_i) | i = 1, ..., NSeg_tr}.
When the class-l classifier of round mr is constructed, the training sample data set Fea_tr is traversed: for each feature dimension par of each sample, the value of the par-th dimension of the i-th sample is taken as the classification reference value, and all samples in the data set Fea_tr are split; samples whose par-th feature value is larger than the reference value fall into the set {Region_1}, and the others fall into the set {Region_2}. After all samples are split, the regression tree error is computed according to formula (25):

err_tree = Σ_{m=1,2} Σ_{Fea_i ∈ Region_m} (err_{l,mr-1}(Fea_i) − ē_{Region_m})²   (25)

where ē_{Region_m} = (1/NRegion_m) Σ_{Fea_i ∈ Region_m} err_{l,mr-1}(Fea_i) and NRegion_m denotes the total number of samples falling into Region_m. The feature value that minimizes the regression tree error is finally selected as the new split value of the tree. The above process is repeated to grow the regression tree until the set height of the tree is reached; in this method the height of the regression tree is set to 5. The regression trees of the other classes in the current round are constructed in the same way.
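The exhaustive split search described above (try every feature dimension and every feature value, keep the split minimizing the error of formula (25)) can be sketched as follows. This is an illustrative single-split step under the stated assumptions, not the patented implementation.

import numpy as np

def best_split(features, residuals):
    """Choose the (dimension, value) split minimizing the regression-tree
    error of equation (25): the sum, over the two regions, of squared
    deviations of the residuals from the region mean."""
    best = (None, None, np.inf)
    n, d = features.shape
    for par in range(d):                              # every feature dimension
        for ref in np.unique(features[:, par]):       # every candidate value
            in1 = features[:, par] > ref              # Region1
            in2 = ~in1                                # Region2
            if not in1.any() or not in2.any():
                continue
            err = 0.0
            for region in (in1, in2):
                r = residuals[region]
                err += np.sum((r - r.mean()) ** 2)    # equation (25)
            if err < best[2]:
                best = (par, ref, err)
    return best   # (dimension, reference value, error)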
The number of leaf nodes of the class-l regression tree of round mr is recorded as Reg_mr,l; each leaf node is a subset of the training sample set, and the intersection of any two leaf nodes is the empty set. The gain value γ_{l,mr,reg} of each leaf node of the class-l regression tree constructed in round mr is computed as shown in formula (26):

γ_{l,mr,reg} = ((L−1)/L) · Σ_{Fea_i ∈ R_{l,mr,reg}} err_{l,mr-1}(Fea_i) / Σ_{Fea_i ∈ R_{l,mr,reg}} |err_{l,mr-1}(Fea_i)|·(1 − |err_{l,mr-1}(Fea_i)|)   (26)

where R_{l,mr,reg} denotes the sample set of the reg-th leaf node. The predicted value h_{l,mr}(Fea_i) of the class-l regression tree of round mr for the i-th sample is computed with formula (27):

h_{l,mr}(Fea_i) = h_{l,mr-1}(Fea_i) + Σ_{reg=1..Reg_mr,l} γ_{l,mr,reg}·I(Fea_i ∈ R_{l,mr,reg})   (27)

where reg ∈ {1, 2, ..., Reg_mr,l}.
The above procedure is iterated until the MR training rounds are finished. The predicted value h_{l,MR}(Fea_i) of the class-l regression tree of round MR for the i-th sample is expressed as (28):

h_{l,MR}(Fea_i) = h_{l,MR-1}(Fea_i) + Σ_{reg=1..Reg_MR,l} γ_{l,MR,reg}·I(Fea_i ∈ R_{l,MR,reg})   (28)

where reg ∈ {1, 2, ..., Reg_MR,l}.
Expanding h_{l,MR-1}(Fea_i) in formula (28) with the class-l regression tree of round MR-1 expresses the prediction of the i-th sample in terms of round MR-2, giving formula (29):

h_{l,MR}(Fea_i) = h_{l,MR-2}(Fea_i) + Σ_{reg=1..Reg_MR-1,l} γ_{l,MR-1,reg}·I(Fea_i ∈ R_{l,MR-1,reg}) + Σ_{reg=1..Reg_MR,l} γ_{l,MR,reg}·I(Fea_i ∈ R_{l,MR,reg})   (29)

By analogy, expanding the prediction of the i-th sample through the class-l regression trees of rounds MR-1 down to round 0 (with h_{l,0} = 0) gives formula (30):

h_{l,MR}(Fea_i) = Σ_{mr=1..MR} Σ_{reg=1..Reg_mr,l} γ_{l,mr,reg}·I(Fea_i ∈ R_{l,mr,reg})   (30)
(2) GBDT prediction
The temporal context feature Fea_Seg of a superpixel Seg is computed; the predicted values h_{l,MR}(Fea_Seg) of the superpixel Seg for the different classes are computed with formula (30), and then the probability values prob_{l,MR}(Fea_Seg) of the superpixel Seg belonging to the different classes are computed with formula (24). The class l with the highest probability value is the predicted class of the superpixel Seg.
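A minimal Python sketch of this prediction step is given below. It assumes each trained regression tree is available as a callable returning its leaf gain for a sample, which is a representational assumption, not part of the patent.

import numpy as np

def gbdt_predict(fea, trees):
    """trees[mr][l] is assumed to return the leaf gain of the class-l
    regression tree of round mr for feature fea; equation (30) accumulates
    the gains, and equation (24) turns the scores into class probabilities
    via a softmax."""
    L = len(trees[0])
    h = np.zeros(L)
    for round_trees in trees:                 # rounds 1..MR
        for l in range(L):
            h[l] += round_trees[l](fea)       # add the leaf gain gamma
    prob = np.exp(h - h.max())
    prob /= prob.sum()                        # equation (24)
    return int(np.argmax(prob)), prob         # predicted class, probabilities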
S3.2 optimizing semantic classification based on spatial context
When the method is used for carrying out superpixel segmentation on an image, two boundary judgment thresholds of 0.06 and 0.08 are set, so that a hierarchical segmentation tree with the height of 2 is obtained, as shown in fig. 3.
In the method, the semantic annotation of the superpixel determined by the 0.08 threshold is taken as an optimization target, and the superpixel determined by the 0.06 segmentation threshold is taken as a spatial context and is used for optimizing a semantic annotation result.
Firstly, according to the method of S3.1, each superpixel corresponding to the leaf nodes and the intermediate nodes in fig. 3 is semantically classified, giving the semantic labeling probability of every superpixel in the superpixel segmentation maps under the thresholds 0.06 and 0.08; the final semantic label of the superpixel block is then computed with formula (31).
l* = argmax_{l ∈ {1,...,L}} ( w_target·prob^tar_l + w_aux·(1/Naux)·Σ_{a=1..Naux} prob^{aux_a}_l )   (31)

where l* denotes the final semantic label of the superpixel block, i.e. the class with the maximum value computed by formula (31); prob^{aux_a}_l denotes the probability that the a-th superpixel of the 0.06-threshold superpixel set contained in the 0.08-threshold superpixel has semantic label l; prob^tar_l is the probability that the 0.08-threshold superpixel has semantic label l. Naux denotes the number of 0.06-threshold superpixels contained in the 0.08-threshold superpixel; w_aux is the confidence of the 0.06-threshold superpixel semantic labeling, set to 0.4 in this method; w_target is the confidence of the 0.08-threshold superpixel semantic labeling, set to 0.6 in this method.
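The weighted combination of formula (31) reduces to a few lines; the sketch below is illustrative, with the probability vectors assumed to come from the S3.1 classifier.

import numpy as np

def fuse_labels(prob_target, probs_aux, w_target=0.6, w_aux=0.4):
    """Equation (31): fuse the class probabilities of a 0.08-threshold target
    superpixel with the averaged probabilities of the 0.06-threshold child
    superpixels it contains, and return the label with the largest score."""
    prob_aux = np.mean(probs_aux, axis=0)     # average over the Naux children
    score = w_target * np.asarray(prob_target) + w_aux * prob_aux
    return int(np.argmax(score)), score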
Table 1 compares the class-average accuracy of this method with that of other RGB-D indoor scene labeling methods based on defined features, in the 13-class semantic labeling experiment on the NYU V2 dataset (the table itself is reproduced as an image in the original publication).
[1] C. Couprie, C. Farabet, L. Najman and Y. LeCun. Indoor semantic segmentation using depth information. In ICLR, 2013.
[2] A. Hermans, G. Floros, and B. Leibe. Dense 3D semantic mapping of indoor scenes from RGB-D images. In ICRA, 2014.
[3] A. Wang, J. Lu, J. Cai, G. Wang, and T.-J. Cham. Unsupervised joint feature learning and encoding for RGB-D scene labeling. IEEE Transactions on Image Processing (TIP), 2015.
[4] J. Wang, Z. Wang, D. Tao, S. See and G. Wang. Learning common and specific features for RGB-D semantic segmentation with deconvolutional networks. In ECCV, 2016.

Claims (2)

1. An RGB-D indoor scene labeling method based on superpixel spatio-temporal context, characterized in that: the input is the image Fr_tar to be annotated and its temporally adjacent preceding and following frames Fr_tar-1 and Fr_tar+1, and the output is the pixel-level labeling of Fr_tar;
for each superpixel of the image Fr_tar to be annotated, its corresponding superpixels in the temporally adjacent frames Fr_tar-1 and Fr_tar+1, i.e. its temporal context, are computed based on an optical flow algorithm; the images are superpixel-segmented with the gPb/UCM algorithm, and the segmentation results are organized into a segmentation tree according to thresholds; the child nodes of each superpixel of Fr_tar in the segmentation tree are its spatial context;
a temporal-context-based feature representation is constructed for each superpixel of Fr_tar, and the superpixels are classified with a gradient boosting tree based on the temporal context features; the semantic annotation of the superpixels in Fr_tar is obtained by weighted combination of the superpixel's classification result with the semantic classification results of its spatial context;
s1 super pixel
In the field of computer vision, the process of subdividing a digital image into a plurality of image sub-regions is called superpixel segmentation; the super-pixel is a region formed by a series of pixel points with adjacent positions and similar color, brightness and texture characteristics, the region retains local effective information and cannot damage the boundary information of an object in an image;
s1.1 superpixel segmentation of images
Superpixel segmentation uses the gPb/UCM algorithm, which computes for every pixel a probability value Pb(x, y) of belonging to a boundary from local and global image features; the gPb/UCM algorithm is applied to the color image and to the depth image respectively, and the combined boundary probability is computed according to formula (1), in which Pb_rgb(x, y) is the probability that a pixel belongs to a boundary computed from the color image and Pb_d(x, y) is the probability that a pixel belongs to a boundary computed from the depth image;
according to the probability value Pb(x, y) obtained from formula (1), different probability thresholds tr are set to obtain a multi-level segmentation result;
the set different probability threshold values tr are respectively 0.06 and 0.08, and the pixels with the probability values smaller than the set probability threshold values are connected into a region according to an eight-connection principle, wherein each region is a super pixel;
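A minimal sketch of this thresholding step, using 8-connected component labeling from SciPy, is shown below; how the boundary pixels themselves are assigned is an assumption of the sketch, not specified here.

import numpy as np
from scipy import ndimage

def superpixels_from_boundary(pb, tr):
    """Threshold the gPb/UCM boundary probability map pb at tr and group the
    non-boundary pixels into 8-connected regions; every region is one
    superpixel. Boundary pixels (pb >= tr) keep label 0 in this sketch."""
    non_boundary = pb < tr
    labels, num = ndimage.label(non_boundary, structure=np.ones((3, 3), int))
    return labels, num

# e.g. labels_006, _ = superpixels_from_boundary(pb, 0.06)
#      labels_008, _ = superpixels_from_boundary(pb, 0.08)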
s1.2Patch feature
A Patch is defined as a grid of size h × h that slides from the upper-left corner of the color image and the depth image to the right and downward with a step length of hs pixels, finally forming a dense grid over the color image and the depth image; the Patch size is 16 × 16, the sliding step hs is 2, the image size is N × M, and the final number of Patches is (⌊(N − 16)/2⌋ + 1) × (⌊(M − 16)/2⌋ + 1);
Four types of features are calculated for each Patch: depth gradient features, color gradient features, color features and texture features;
s1.3 superpixel features
The superpixel feature F_seg is defined as formula (6):
F_seg = [F_g_d^seg, F_g_c^seg, F_col^seg, F_tex^seg, F_geo^seg]   (6)
F_g_d^seg, F_g_c^seg, F_col^seg and F_tex^seg respectively denote the superpixel depth gradient feature, color gradient feature, color feature and texture feature, defined as formula (7):
F_•^seg = (1/n) Σ_{q1=1..n} F_•(q1),  • ∈ {g_d, g_c, col, tex}   (7)
In formula (7), F_g_d(q1), F_g_c(q1), F_col(q1), F_tex(q1) denote the features of the q1-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg;
The superpixel geometric feature F_geo^seg is defined by formula (8) as the vector of the following components:
F_geo^seg = [A_seg, P_seg, R_seg, H_xx, H_yy, H_xy, D_mean, D_sq, D_var, D_miss, N_seg]   (8)
the components in equation (8) are defined as follows:
superpixel area A_seg = Σ_{s∈seg} 1, where s are the pixels within the superpixel seg; the superpixel perimeter P_seg is obtained from the boundary pixel set B_seg, defined as formula (9):
P_seg = Σ_{s∈B_seg} 1,  B_seg = {s ∈ seg | ∃ s′ ∈ N_4(s): s′ ∈ seg′, seg′ ≠ seg} ∪ {s ∈ seg | s_x ∈ {1, M} or s_y ∈ {1, N}}   (9)
in formula (9), M, N represents the horizontal and vertical resolutions of the RGB scene image, respectively; seg, seg' represent different superpixels; n is a radical of4(s) is a set of four-neighbor domains of pixel s; b issegIs the set of boundary pixels of the super-pixel seg;
the area-to-perimeter ratio R_seg of the superpixel is defined in formula (10):
R_seg = A_seg / P_seg   (10)
H_xx, H_yy and H_xy are the second-order Hu moments computed from the x coordinate s_x of pixel s, the y coordinate s_y, and the product of the x and y coordinates, defined as formulas (11), (12) and (13):
H_xx = (1/A_seg) Σ_{s∈seg} s_x² − (s̄_x)²   (11)
H_yy = (1/A_seg) Σ_{s∈seg} s_y² − (s̄_y)²   (12)
H_xy = (1/A_seg) Σ_{s∈seg} s_x·s_y − s̄_x·s̄_y   (13)
In the above, s̄_x, s̄_y, (s̄_x)² and (s̄_y)² respectively denote the mean of the x coordinates, the mean of the y coordinates, the square of the mean of the x coordinates and the square of the mean of the y coordinates of the pixels contained in the superpixel, defined as formula (14):
s̄_x = (1/A_seg) Σ_{s∈seg} s_x,  s̄_y = (1/A_seg) Σ_{s∈seg} s_y   (14)
Width and Height respectively denote the image width and height; the coordinates are first normalized as s_x ← s_x / Width, s_y ← s_y / Height, and the calculation is performed on the normalized pixel coordinate values;
D_mean, D_sq and D_var respectively denote the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined as (15):
D_mean = (1/A_seg) Σ_{s∈seg} s_d,  D_sq = (1/A_seg) Σ_{s∈seg} s_d²,  D_var = D_sq − (D_mean)²   (15)
D_miss is the proportion of pixels within the superpixel that lack depth information, defined as (16):
D_miss = Num({s ∈ seg | s_d is missing}) / A_seg   (16)
Nsegis the principal normal vector mode length of the point cloud corresponding to the superpixel, wherein the principal normal vector of the point cloud corresponding to the superpixel is estimated by Principal Component Analysis (PCA);
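A short sketch of the PCA estimate of the principal normal is given below for illustration; the back-projection of the superpixel's pixels to 3-D points and the function name are assumptions of the sketch.

import numpy as np

def principal_normal(points):
    """Estimate the principal normal of a superpixel's 3-D point cloud by PCA:
    the eigenvector of the covariance matrix associated with the smallest
    eigenvalue. points is an (n, 3) array of back-projected pixels."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / max(len(points) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    return eigvecs[:, 0]                      # normal direction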
s2 superpixel context
Respectively constructing a time context and a space context based on the RGB-D image sequence time sequence relation and a tree structure of super-pixel segmentation;
s2.1 superpixel temporal context
S2.1.1 interframe optical flow calculation
Defining the optical flow obtained by calculating from the target frame to the reference frame as a forward optical flow, and defining the optical flow obtained by calculating from the reference frame to the target frame as a reverse optical flow;
(1) initial optical flow estimation
The inter-frame initial optical flow estimation adopts the SimpleFlow method; for two frame images Fr_tar and Fr_tar+1, (x, y) denotes a pixel point in Fr_tar and (u(x, y), v(x, y)) denotes the optical flow vector at (x, y); defining the image Fr_tar as the target frame and the image Fr_tar+1 as the reference frame, the forward optical flow from image Fr_tar to image Fr_tar+1 is the set of optical flow vectors of all pixel points in Fr_tar, i.e. {(u(x, y), v(x, y)) | (x, y) ∈ Fr_tar}; when u(x, y) and v(x, y) are abbreviated as u and v, the pixel in Fr_tar+1 corresponding to the pixel (x, y) of Fr_tar computed from the optical flow is (x + u, y + v);
first, the forward optical flow from image Fr_tar to image Fr_tar+1 is computed; for a pixel (x0, y0) of Fr_tar, a window W1 of size b × b centered on it is taken, where b = 10; for an arbitrary point (p, q) in W1, the corresponding pixel point in Fr_tar+1 is (p + u, q + v), and the energy term e is computed at all points in the window W1 as in (17):
e(p, q, u, v) = ||Int_tar(p, q) − Int_tar+1(p + u, q + v)||²   (17)
where (p, q) ∈ W1, Int_tar(p, q) denotes the color information of pixel (p, q) in Fr_tar, and Int_tar+1(p + u, q + v) denotes the color information of pixel (p + u, q + v) in Fr_tar+1; computing e for each pair of points in the window in turn yields a b²-dimensional vector e;
then, based on a locally smooth likelihood model, the optical flow vector is optimized by combining the color feature and the local distance feature, as shown in formula (18):
E(x0, y0, u, v) = Σ_{(p,q)∈W1} w_d·w_c·e(p, q, u, v)   (18)
in formula (18), E(x0, y0, u, v) is the local-region energy, representing the energy of the forward optical flow vector (u, v) at pixel (x0, y0) of image Fr_tar; it is the weighted accumulation of the energy terms e of all pixels within the window W1 centered on (x0, y0) in Fr_tar; the components of the optical flow vector (u, v) vary within the range determined by O = 20; the distance weight w_d and the color weight w_c are determined by the distance difference and the color difference between the pixel (x0, y0) and the corresponding point (x0 + u, y0 + v) computed from the optical flow (u, v), with the color parameter set to σ_c = 0.08 and the distance parameter to σ_d = 5.5; the (u, v) minimizing the energy E is taken as the optical flow vector estimate of pixel (x0, y0), and computing the optical flow vectors of all pixels of image Fr_tar yields the forward optical flow from image Fr_tar to image Fr_tar+1;
likewise, Fr is calculatedtar+1To FrtarThe backward light flow of (2);
(2) occlusion point detection
Denote the forward optical flow from image Fr_tar to image Fr_tar+1 as {(u_f(x), v_f(y)) | (x, y) ∈ Fr_tar} and the backward optical flow from image Fr_tar+1 to image Fr_tar as {(u_b(x′), v_b(y′)) | (x′, y′) ∈ Fr_tar+1}; for a pixel (x, y), compute ||(u_f(x), v_f(y)) − (−u_b(x + u_f(x)), −v_b(y + v_f(y)))||; if this value is not 0, the pixel (x, y) is regarded as an occlusion point;
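The forward-backward consistency test can be sketched as follows; the tolerance eps is an assumption of the sketch (the claim compares against exactly 0).

import numpy as np

def occlusion_mask(flow_fwd, flow_bwd, eps=0.5):
    """Warp the backward flow to the positions reached by the forward flow and
    compare: for visible pixels the two flows should cancel, so a large
    mismatch marks the pixel as an occlusion point."""
    H, W = flow_fwd.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    x2 = np.clip(np.round(xs + flow_fwd[..., 0]).astype(int), 0, W - 1)
    y2 = np.clip(np.round(ys + flow_fwd[..., 1]).astype(int), 0, H - 1)
    diff = flow_fwd + flow_bwd[y2, x2]        # should cancel for visible pixels
    return np.linalg.norm(diff, axis=2) > eps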
(3) reestimation of occlusion point light flow
For pixels marked as occlusion points (x)0,y0) The optical flow energy is re-estimated using equation (19), denoted as Eb(x0,y0,u,v):
Figure FDA0003169217590000051
Figure FDA0003169217590000052
Figure FDA0003169217590000053
In the formula (19), the compound represented by the formula (I),
Figure FDA0003169217590000054
denotes FrtarPixel point (x)0,y0) The average value of energy items e corresponding to different optical flow estimated values;
Figure FDA0003169217590000055
denotes FrtarPixel point (x)0,y0) The minimum value of the corresponding energy term e is measured by the different optical flow estimation values; w is ar(x0,y0) The difference between the energy term e mean value and the minimum energy term e value is used for marking the pixel point (x) marked as shielding0,y0) Let EbThe smallest (u, v) is the pixel (x)0,y0) An optical flow vector of (d);
adopting the optical flow vector re-estimated in the step (3) for the final optical flow vector of the pixel marked as the occlusion point;
s2.1.2 superpixel temporal context and its feature representation
Image Fr by using the method of super-pixel segmentation map calculated by S1.1tarImage Frtar-1And an image Frtar+1Performing super-pixel segmentation;
(1) superpixel temporal context
First, according to the forward optical flow from Fr_tar to Fr_tar+1, the mean (ū_f, v̄_f) of the forward optical flow {(u_f(x), v_f(y)) | (x, y) ∈ Seg_tar} of all pixels {(x, y) | (x, y) ∈ Seg_tar} contained in the superpixel Seg_tar of Fr_tar is computed, as shown in equation (20):
(ū_f, v̄_f) = (1/Num(Seg_tar)) Σ_{(x,y)∈Seg_tar} (u_f(x), v_f(y))   (20)
in formula (20), Num(Seg_tar) denotes the number of pixels contained in the superpixel Seg_tar; mapping the pixels contained in the superpixel Seg_tar to their corresponding pixels in Fr_tar+1 according to the forward optical flow mean value gives the region Seg′_tar = {(x′, y′) | x′ = x + u_f(x), y′ = y + v_f(y), (x, y) ∈ Seg_tar, (x′, y′) ∈ Fr_tar+1}, called the corresponding region of the superpixel Seg_tar in Fr_tar+1; the intersection-over-union IOU between Seg′_tar and the i-th superpixel Seg^i_tar+1 of Fr_tar+1 is computed as shown in equation (21):
IOU(Seg′_tar, Seg^i_tar+1) = Num(Seg′_tar ∩ Seg^i_tar+1) / Num(Seg′_tar ∪ Seg^i_tar+1)   (21)
in formula (21), Num(·) denotes the number of pixels contained in a region; if IOU(Seg′_tar, Seg^i_tar+1) ≥ τ, the corresponding region Seg″_tar of the superpixel Seg^i_tar+1 in frame Fr_tar is computed according to the backward optical flow from Fr_tar+1 to Fr_tar, and the intersection-over-union IOU(Seg″_tar, Seg_tar) between the region Seg″_tar and the superpixel Seg_tar is computed according to equation (21); if IOU(Seg″_tar, Seg_tar) ≥ τ, then Seg^i_tar+1 is called a corresponding superpixel of the superpixel Seg_tar in Fr_tar+1, and the number of corresponding superpixels of the superpixel Seg_tar in Fr_tar+1 is 0, 1 or more; the intersection-over-union decision threshold τ is set to 0.3; the corresponding superpixels of the superpixel Seg_tar in Fr_tar-1 are found in the same way, their number likewise being 0, 1 or more;
the temporal context of the superpixel Seg_tar is recorded as Segs_tar = {Seg_tar, S_tar-1, S_tar+1}, wherein S_tar-1 and S_tar+1 are respectively the sets of corresponding superpixels of the Fr_tar-frame superpixel Seg_tar in Fr_tar-1 and Fr_tar+1;
(2) superpixel temporal context semantic feature representation
the semantic feature Fea_Segs_tar of the superpixel temporal context Segs_tar is shown in formula (22):
Fea_Segs_tar = [F_Seg_tar, F̄_tar-1, F̄_tar+1]   (22)
F_Seg_tar is the feature of the superpixel Seg_tar in Fr_tar, F̄_tar-1 is the mean of the features of all corresponding superpixels in Fr_tar-1, and F̄_tar+1 is the mean of the features of all corresponding superpixels in Fr_tar+1; the feature of each superpixel is computed according to the method of S1.3;
when the number of corresponding superpixels of the superpixel Seg_tar of Fr_tar in Fr_tar+1 or Fr_tar-1 is 0, its own feature F_Seg_tar is used in place of F̄_tar+1 or F̄_tar-1;
S2.2 superpixel spatial context
The image is superpixel-segmented with the method of S1.1; when the threshold of the superpixel hierarchical segmentation tree is set to 1, the superpixel segmentation map of the highest level is obtained, i.e. the root node of the hierarchical segmentation tree, which represents the whole image as one superpixel; setting the threshold to 0.06 gives a lower-level superpixel segmentation result; when the threshold is 0.08, the boundary decision criterion is raised, so that pixels whose boundary probability values lie in [0.06, 0.08) are judged as non-boundary points, whereas they are judged as boundary points when the threshold is 0.06; each high-level superpixel therefore contains one or more low-level superpixels; in the hierarchical segmentation tree, the spatial context of a parent-node superpixel is defined as its child-node superpixels;
s3 semantic Classification
S3.1 temporal context based superpixel semantic classification
Taking the temporal context characteristics of the superpixels as input, performing superpixel semantic classification by using GBDT, and outputting a prediction label of the superpixels;
in the GBDT training process, the training is set to run MR rounds, mr ∈ {1, 2, 3, ..., MR}; in round mr a regression tree, i.e. a weak classifier, is trained for each class, so that L regression trees are trained when there are L classes, l ∈ {1, 2, 3, ..., L}; finally L × MR weak classifiers are obtained; the training method for each classifier is the same in each round;
(1) GBDT multi-classifier training
the training set Fea_tr contains NSeg_tr samples: Fea_tr = {(Fea_i, lab_i) | i = 1, ..., NSeg_tr}, wherein the training sample Fea_i is the temporal context feature of the i-th superpixel and its true label is lab_i, lab_i ∈ {1, 2, 3, ..., L};
first, round-0 initialization is performed: the prediction function value h_{l,0}(x) of the class-l classifier is set to 0; the true label lab_i is converted into an L-dimensional label vector lab_i = [lab_i[1], ..., lab_i[L]], lab_i[k] ∈ {0, 1}; if the true label of the i-th training sample is l, the l-th component lab_i[l] of the label vector is 1 and the other components are 0; the probability that the i-th sample belongs to class l is computed as
prob_{l,mr}(Fea_i) = exp(h_{l,mr}(Fea_i)) / Σ_{k=1..L} exp(h_{k,mr}(Fea_i))   (24)
I(lab_i = l) is an indicator function whose value is 1 when the label of sample i is l and 0 otherwise;
the prediction result of the class-l classifier of round mr-1 for the i-th sample is recorded as h_{l,(mr-1)}(Fea_i), and the classification error err_{l,mr-1}(Fea_i) of the i-th sample under the round-(mr-1) class-l classifier is defined as in formula (23):
err_{l,mr-1}(Fea_i) = I(lab_i = l) − prob_{l,mr-1}(Fea_i)   (23)
this yields the classification error set of round mr-1, {err_{l,mr-1}(Fea_i) | i = 1, ..., NSeg_tr};
when the class-l classifier of round mr is constructed, the training set Fea_tr is traversed: for each feature dimension par of each sample, the value of the par-th dimension of the i-th sample is taken as the classification reference value and all samples in the training set Fea_tr are split; samples whose par-th feature value is larger than the reference value fall into the set {Region_1}, and the others fall into the set {Region_2}; after all samples are split, the regression tree error is computed according to formula (25):
err_tree = Σ_{m=1,2} Σ_{Fea_i ∈ Region_m} (err_{l,mr-1}(Fea_i) − ē_{Region_m})²   (25)
wherein ē_{Region_m} = (1/NRegion_m) Σ_{Fea_i ∈ Region_m} err_{l,mr-1}(Fea_i), m = 1, 2, and NRegion_m denotes the total number of samples falling into Region_m; the feature value that minimizes the regression tree error is finally selected as the new split value of the tree; the regression tree is grown by repeating the above process until the set height of the tree is reached, the height of the regression tree being 5; the regression trees of the other classes in the current round are constructed in the same way;
the number of leaf nodes of the class-l regression tree of round mr is recorded as Reg_mr,l; each leaf node is a subset of the training sample set, and the intersection of any two leaf nodes is the empty set; the gain value γ_{l,mr,reg} of each leaf node of the class-l regression tree constructed in round mr is computed as shown in formula (26):
γ_{l,mr,reg} = ((L−1)/L) · Σ_{Fea_i ∈ R_{l,mr,reg}} err_{l,mr-1}(Fea_i) / Σ_{Fea_i ∈ R_{l,mr,reg}} |err_{l,mr-1}(Fea_i)|·(1 − |err_{l,mr-1}(Fea_i)|)   (26)
wherein R_{l,mr,reg} denotes the sample set of the reg-th leaf node; the predicted value h_{l,mr}(Fea_i) of the class-l regression tree of round mr for the i-th sample is computed with formula (27):
h_{l,mr}(Fea_i) = h_{l,mr-1}(Fea_i) + Σ_{reg=1..Reg_mr,l} γ_{l,mr,reg}·I(Fea_i ∈ R_{l,mr,reg})   (27)
wherein reg ∈ {1, 2, ..., Reg_mr,l};
the above procedure is iterated until the MR training rounds are finished; the predicted value h_{l,MR}(Fea_i) of the class-l regression tree of round MR for the i-th sample is expressed as (28):
h_{l,MR}(Fea_i) = h_{l,MR-1}(Fea_i) + Σ_{reg=1..Reg_MR,l} γ_{l,MR,reg}·I(Fea_i ∈ R_{l,MR,reg})   (28)
wherein reg ∈ {1, 2, ..., Reg_MR,l};
expanding h_{l,MR-1}(Fea_i) in formula (28) with the class-l regression tree of round MR-1 expresses the prediction of the i-th sample in terms of round MR-2, giving formula (29):
h_{l,MR}(Fea_i) = h_{l,MR-2}(Fea_i) + Σ_{reg=1..Reg_MR-1,l} γ_{l,MR-1,reg}·I(Fea_i ∈ R_{l,MR-1,reg}) + Σ_{reg=1..Reg_MR,l} γ_{l,MR,reg}·I(Fea_i ∈ R_{l,MR,reg})   (29)
by analogy, expanding the prediction of the i-th sample through the class-l regression trees of rounds MR-1 down to round 0, with h_{l,0} = 0, gives formula (30):
h_{l,MR}(Fea_i) = Σ_{mr=1..MR} Σ_{reg=1..Reg_mr,l} γ_{l,mr,reg}·I(Fea_i ∈ R_{l,mr,reg})   (30)
(2) GBDT prediction
the temporal context feature Fea_Seg of a superpixel Seg is computed; the predicted values h_{l,MR}(Fea_Seg) of the superpixel Seg for the different classes are computed with formula (30), and then the probability values prob_{l,MR}(Fea_Seg) of the superpixel Seg belonging to the different classes are computed with formula (24); the class l with the highest probability value is the predicted class of the superpixel Seg;
s3.2 optimizing semantic classification based on spatial context
When the image is subjected to superpixel segmentation, two boundary judgment thresholds of 0.06 and 0.08 are set, so that a hierarchical segmentation tree with the height of 2 is obtained;
the semantic annotation of the superpixel determined by the threshold of 0.08 is taken as an optimization target, and the superpixel determined by the threshold of 0.06 segmentation is taken as a spatial context and is used for optimizing a semantic annotation result;
firstly, according to the method of S3.1, each superpixel corresponding to the leaf nodes and the intermediate nodes is semantically classified, giving the semantic labeling probability of every superpixel in the superpixel segmentation maps under the thresholds 0.06 and 0.08, and the final semantic label of the superpixel block is computed with formula (31);
l* = argmax_{l ∈ {1,...,L}} ( w_target·prob^tar_l + w_aux·(1/Naux)·Σ_{a=1..Naux} prob^{aux_a}_l )   (31)
wherein l* denotes the final semantic label of the superpixel block, i.e. the class with the maximum value computed by formula (31); prob^{aux_a}_l denotes the probability that the a-th superpixel of the 0.06-threshold superpixel set contained in the 0.08-threshold superpixel has semantic label l; prob^tar_l is the probability that the 0.08-threshold superpixel has semantic label l; Naux denotes the number of 0.06-threshold superpixels contained in the 0.08-threshold superpixel; w_aux is the confidence of the 0.06-threshold superpixel semantic labeling, taken as 0.4; w_target is the confidence of the 0.08-threshold superpixel semantic labeling, taken as 0.6.
2. The RGB-D indoor scene labeling method based on superpixel spatiotemporal context as claimed in claim 1, characterized in that: the implementation of the S1.2Patch feature is as follows,
s1.2.1 depth gradient feature
The Patch in the depth image is recorded as Z_d; for each Z_d the depth gradient feature F_g_d is computed, where the value of its t-th component is defined by formula (2):
F_g_d^t(Z_d) = Σ_{z∈Z_d} m̃_d(z) · α_g_d^t [k_g_d(θ̃_d(z)) ⊗ k_s(z)]   (2)
In formula (2), z ∈ Z_d denotes the relative two-dimensional coordinate position of pixel z in the depth Patch; θ̃_d(z) and m̃_d(z) respectively denote the depth gradient direction and the gradient magnitude of pixel z; {g_1, ..., g_d_g} and {s_1, ..., s_d_s} are respectively the depth gradient basis vectors and the position basis vectors, both sets being predefined values; d_g and d_s respectively denote the number of depth gradient basis vectors and the number of position basis vectors; α_g_d^t is the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product of the kernel vectors k_g_d(θ̃_d(z)) = [k_g_d(θ̃_d(z), g_1), ..., k_g_d(θ̃_d(z), g_d_g)] and k_s(z) = [k_s(z, s_1), ..., k_s(z, s_d_s)]; k_g_d(·,·) and k_s(·,·) are respectively the depth gradient Gaussian kernel function and the position Gaussian kernel function, with γ_g_d and γ_s the parameters of the corresponding Gaussian kernel functions; finally, the depth gradient feature is transformed with the EMK algorithm, and the transformed feature vector is still recorded as F_g_d;
S1.2.2 color gradient feature
The Patch in the color image is recorded as Z_c; for each Z_c the color gradient feature F_g_c is computed, where the value of its t-th component is defined by formula (3):
F_g_c^t(Z_c) = Σ_{z∈Z_c} m̃_c(z) · α_g_c^t [k_g_c(θ̃_c(z)) ⊗ k_s(z)]   (3)
In formula (3), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color image Patch; θ̃_c(z) and m̃_c(z) respectively denote the gradient direction and the gradient magnitude of pixel z; {g_1, ..., g_c_g} and {s_1, ..., s_c_s} are respectively the color gradient basis vectors and the position basis vectors, both sets being predefined values; c_g and c_s respectively denote the number of color gradient basis vectors and the number of position basis vectors; α_g_c^t is the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; k_g_c(·,·) and k_s(·,·) are respectively the color gradient Gaussian kernel function and the position Gaussian kernel function, with γ_g_c and γ_s the parameters of the corresponding Gaussian kernel functions; finally, the color gradient feature is transformed with the EMK algorithm, and the transformed feature vector is still recorded as F_g_c;
S1.2.3 color characteristics
The Patch in the color image is recorded as Z_c; for each Z_c the color feature F_col is computed, where the value of its t-th component is defined by formula (4):
F_col^t(Z_c) = Σ_{z∈Z_c} α_col^t [k_col(r(z)) ⊗ k_s(z)]   (4)
In formula (4), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color image Patch; r(z) is a three-dimensional vector, the RGB value of pixel z; {r_1, ..., r_c_c} and {s_1, ..., s_c_s} are respectively the color basis vectors and the position basis vectors, both sets being predefined values; c_c and c_s respectively denote the number of color basis vectors and the number of position basis vectors; α_col^t is the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; k_col(·,·) and k_s(·,·) are respectively the color Gaussian kernel function and the position Gaussian kernel function, with γ_col and γ_s the parameters of the corresponding Gaussian kernel functions; finally, the color feature is transformed with the EMK algorithm, and the transformed feature vector is still recorded as F_col;
S1.2.4 textural features
First, the RGB scene image is converted into a gray-scale image, and the Patch in the gray-scale image is recorded as Z_g; for each Z_g the texture feature F_tex is computed, where the value of its t-th component is defined by formula (5):
F_tex^t(Z_g) = Σ_{z∈Z_g} s(z) · α_tex^t [k_b(lbp(z)) ⊗ k_s(z)]   (5)
In formula (5), z ∈ Z_g denotes the relative two-dimensional coordinate position of pixel z in the gray-scale Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; lbp(z) is the local binary pattern feature of pixel z; {b_1, ..., b_g_b} and {s_1, ..., s_g_s} are respectively the local binary pattern basis vectors and the position basis vectors, both sets being predefined values; g_b and g_s respectively denote the number of local binary pattern basis vectors and the number of position basis vectors; α_tex^t is the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; k_b(·,·) and k_s(·,·) are respectively the local binary pattern Gaussian kernel function and the position Gaussian kernel function, with γ_b and γ_s the parameters of the corresponding Gaussian kernel functions; finally, the texture feature is transformed with the EMK algorithm, and the transformed feature vector is still recorded as F_tex.
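As an illustration of the per-pixel quantities entering formula (5), the following sketch computes lbp(z) and s(z) for one Patch with scikit-image and SciPy; the EMK/KPCA aggregation itself is omitted, and the LBP parameter choices are assumptions of the sketch, not the patent's values.

import numpy as np
from scipy import ndimage
from skimage.feature import local_binary_pattern

def texture_patch_inputs(gray, y, x, h=16):
    """For one h x h Patch of the gray-scale image: the local binary pattern
    lbp(z) of every pixel and the standard deviation s(z) of the gray values
    in each pixel's 3x3 neighbourhood."""
    patch = gray[y:y + h, x:x + h].astype(float)
    lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
    mean = ndimage.uniform_filter(patch, size=3)
    sq_mean = ndimage.uniform_filter(patch ** 2, size=3)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))   # s(z)
    return lbp, std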
CN201910174110.2A 2019-03-08 2019-03-08 RGB-D indoor scene labeling method based on super-pixel space-time context Active CN109829449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910174110.2A CN109829449B (en) 2019-03-08 2019-03-08 RGB-D indoor scene labeling method based on super-pixel space-time context

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910174110.2A CN109829449B (en) 2019-03-08 2019-03-08 RGB-D indoor scene labeling method based on super-pixel space-time context

Publications (2)

Publication Number Publication Date
CN109829449A CN109829449A (en) 2019-05-31
CN109829449B true CN109829449B (en) 2021-09-14

Family

ID=66865700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910174110.2A Active CN109829449B (en) 2019-03-08 2019-03-08 RGB-D indoor scene labeling method based on super-pixel space-time context

Country Status (1)

Country Link
CN (1) CN109829449B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428504B (en) * 2019-07-12 2023-06-27 北京旷视科技有限公司 Text image synthesis method, apparatus, computer device and storage medium
CN110517270B (en) * 2019-07-16 2022-04-12 北京工业大学 Indoor scene semantic segmentation method based on super-pixel depth network
CN110599517A (en) * 2019-08-30 2019-12-20 广东工业大学 Target feature description method based on local feature and global HSV feature combination
CN110751153B (en) * 2019-09-19 2023-08-01 北京工业大学 Semantic annotation method for indoor scene RGB-D image
CN111104984B (en) * 2019-12-23 2023-07-25 东软集团股份有限公司 Method, device and equipment for classifying CT (computed tomography) images
CN111292341B (en) * 2020-02-03 2023-01-03 北京海天瑞声科技股份有限公司 Image annotation method, image annotation device and computer storage medium
CN111611919B (en) * 2020-05-20 2022-08-16 西安交通大学苏州研究院 Road scene layout analysis method based on structured learning
CN113034378B (en) * 2020-12-30 2022-12-27 香港理工大学深圳研究院 Method for distinguishing electric automobile from fuel automobile
CN113570530B (en) * 2021-06-10 2024-04-16 北京旷视科技有限公司 Image fusion method, device, computer readable storage medium and electronic equipment
CN115118948B (en) * 2022-06-20 2024-04-05 北京华录新媒信息技术有限公司 Repairing method and device for irregular shielding in panoramic video
CN115952312B (en) * 2022-12-02 2024-07-19 北京工业大学 Automatic labeling and sorting method for image labels

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644429B (en) * 2017-09-30 2020-05-19 华中科技大学 Video segmentation method based on strong target constraint video saliency

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809187A (en) * 2015-04-20 2015-07-29 南京邮电大学 Indoor scene semantic annotation method based on RGB-D data
CN107292253A (en) * 2017-06-09 2017-10-24 西安交通大学 A kind of visible detection method in road driving region
CN107944428A (en) * 2017-12-15 2018-04-20 北京工业大学 A kind of indoor scene semanteme marking method based on super-pixel collection
CN109389605A (en) * 2018-09-30 2019-02-26 宁波工程学院 Dividing method is cooperateed with based on prospect background estimation and the associated image of stepped zone

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GREEDY FUNCTION APPROXIMATION: A GRADIENT BOOSTING MACHINE; Jerome H. Friedman; The Annals of Statistics; 2001-12-31; Vol. 29, No. 5; pp. 1189-1232 *
STD2P: RGBD Semantic Segmentation using Spatio-Temporal Data-Driven Pooling; Yang He et al; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-12-31; pp. 7158-7167 *
Supervoxel-based segmentation of 3D imagery with optical flow integration for spatiotemporal processing; Xiaohui Huang et al; IPSJ Transactions on Computer Vision and Applications; 2018-06-19; pp. 1-16 *
Unsupervised video segmentation algorithm fusing spatio-temporal multi-feature representations; Li Xuejun et al; Journal of Computer Applications; 2017-11-10; Vol. 31, No. 11; pp. 3134-3138, 3151 *

Also Published As

Publication number Publication date
CN109829449A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
CN109829449B (en) RGB-D indoor scene labeling method based on super-pixel space-time context
CN104182772B (en) A kind of gesture identification method based on deep learning
Cao et al. Exploiting depth from single monocular images for object detection and semantic segmentation
Zhang et al. Long-range terrain perception using convolutional neural networks
CN107273905B (en) Target active contour tracking method combined with motion information
CN109859238B (en) Online multi-target tracking method based on multi-feature optimal association
CN110096961B (en) Indoor scene semantic annotation method at super-pixel level
CN105740915B (en) A kind of collaboration dividing method merging perception information
CN113592894B (en) Image segmentation method based on boundary box and co-occurrence feature prediction
CN107977660A (en) Region of interest area detecting method based on background priori and foreground node
CN106157330B (en) Visual tracking method based on target joint appearance model
CN107194929B (en) Method for tracking region of interest of lung CT image
CN108038515A (en) Unsupervised multi-target detection tracking and its storage device and camera device
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
Basavaiah et al. Robust Feature Extraction and Classification Based Automated Human Action Recognition System for Multiple Datasets.
Lin et al. An interactive approach to pose-assisted and appearance-based segmentation of humans
Schulz et al. Object-class segmentation using deep convolutional neural networks
CN108765384B (en) Significance detection method for joint manifold sequencing and improved convex hull
CN106296740B (en) A kind of target fine definition tracking based on low-rank sparse expression
Liu et al. [Retracted] Mean Shift Fusion Color Histogram Algorithm for Nonrigid Complex Target Tracking in Sports Video
Hassan et al. Salient object detection based on CNN fusion of two types of saliency models
Dadgostar et al. Gesture-based human–machine interfaces: a novel approach for robust hand and face tracking
CN109389127A (en) Structuring multiple view Hessian regularization sparse features selection method
Nourmohammadi-Khiarak et al. Object detection utilizing modified auto encoder and convolutional neural networks
Maia et al. Visual object tracking by an evolutionary self-organizing neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant