CN109829449B - RGB-D indoor scene labeling method based on super-pixel space-time context - Google Patents
- Publication number: CN109829449B (application CN201910174110.2A)
- Authority
- CN
- China
- Legal status: Active
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses an RGB-D indoor scene labeling method based on superpixel spatio-temporal context. In computer vision, the process of subdividing a digital image into multiple image sub-regions is called superpixel segmentation. A superpixel is usually a small region composed of a series of adjacent pixels with similar color, brightness, and texture; such regions retain locally useful information and generally do not destroy the boundary information of objects in the image. In this method, the semantic labels of the superpixels produced at the 0.08 boundary threshold are the optimization target, while the superpixels produced at the 0.06 segmentation threshold serve as spatial context for refining the semantic labeling result. Each superpixel corresponding to a leaf node or an intermediate node is semantically classified, yielding the semantic labeling probability of every superpixel in the segmentation maps at thresholds 0.06 and 0.08. The method clearly outperforms conventional indoor scene labeling methods.
Description
Technical Field
The invention relates to RGB-D indoor scene image annotation, and belongs to the field of computer vision and pattern recognition.
Background
Semantic annotation of indoor scene images is a challenging task in current vision-based scene understanding, with the basic goal of densely providing a predefined semantic class label for each pixel in a given indoor scene image (or frame in a captured indoor scene video).
Indoor scenes present many difficulties for image labeling: a large number of semantic categories, mutual occlusion of scene objects, weakly discriminative low-level visual features, and uneven illumination. With the popularity of depth sensors, RGB-D data including color, texture, and depth can now be obtained easily and reliably. RGB-D indoor scene labeling methods generally fall into two types: first, RGB-D indoor scene labeling based on hand-crafted (defined) features; second, RGB-D indoor scene labeling based on learned features. The invention provides an RGB-D indoor scene labeling method based on superpixel spatio-temporal context, belonging to the hand-crafted-feature type.
A brief analysis of the primary RGB-D indoor scene labeling methods based on hand-crafted features follows. As pioneering work on using depth information for indoor scene semantic annotation, Silberman et al. extract SIFT feature descriptors from the color image (RGB), the depth image (Depth), and a rotated version of the RGB image, and semantically classify the descriptors with a feed-forward neural network to obtain an image semantic labeling result, which is then further refined with simple CRFs (conditional random field probabilistic graphical models). Ren et al. perform superpixel segmentation of an image with the gPb/UCM algorithm and combine the superpixel sets into a hierarchical tree structure based on segmentation thresholds. Feature descriptors of Patches (image blocks) are densely computed on the RGB-D image, and feature descriptors of superpixel regions are computed from the Patch features. For semantic classification, the superpixel features are fed into an SVM, which gives the classification result of each superpixel. New superpixel class features are then constructed from the label vectors produced by the SVM classifier, and an MRFs (Markov random field) model built on the new features further refines the recognition result.
In semantic recognition, there is a consensus that using more context information usually makes recognition results more accurate. Pixel-level spatial context generally builds an MRF or CRF model on the adjacency relations between pixels, constraining the semantic labels of adjacent pixels to be consistent. Superpixel-level spatial context either concatenates the features of superpixels in an inclusion relation as the classification feature, or builds a CRF model that incorporates superpixel information. In such a CRF model, the estimated class probability of a pixel serves as the unary energy, the feature difference of a pixel pair as the binary energy, and the superpixel information as higher-order energy; the optimal labels are determined by minimizing the defined energy function.
Regarding the use of temporal context, Kundu observes that the pixel information of adjacent frames of a video sequence of the same scene overlaps, and accordingly proposes a new dense CRF model.
Object of the Invention
The invention aims to fully utilize time and space context, calculate superpixel time context by utilizing continuous frame images in the annotation process, and jointly complete an indoor scene annotation task by utilizing the space context provided by hierarchical superpixel segmentation.
To achieve this purpose, the technical scheme adopted by the invention is an RGB-D indoor scene labeling method based on superpixel spatio-temporal context: the input is the image Fr_tar to be labeled and its temporally adjacent frames Fr_{tar-1} and Fr_{tar+1}; the output is a pixel-level labeling of Fr_tar.
Based on an optical flow algorithm, compute for each superpixel of the image Fr_tar to be labeled its corresponding superpixels in the temporally adjacent frames Fr_{tar-1} and Fr_{tar+1}; these corresponding superpixels are its temporal context. The image is superpixel-segmented with the gPb/UCM algorithm, and the segmentation results are organized into a segmentation tree by threshold; the spatial context of an Fr_tar superpixel is its child nodes in the segmentation tree.
Construct the temporal-context-based feature representation of each superpixel of Fr_tar, and classify the superpixels with a gradient boosted decision tree (GBDT) using the temporal-context-based features; then, using the superpixel spatial context, weight and combine the semantic classification results of each superpixel and its spatial context to obtain the semantic labeling of the superpixels of Fr_tar.
S1 super pixel
In the field of computer vision, the process of subdividing a digital image into a plurality of image sub-regions is known as superpixel segmentation. Superpixels are usually small regions composed of a series of pixel points with adjacent positions and similar characteristics such as color, brightness, texture and the like, and the small regions retain local effective information and generally do not destroy boundary information of objects in an image.
S1.1 superpixel segmentation of images
Superpixel segmentation uses the gPb/UCM algorithm, which computes the probability that a pixel belongs to a boundary from local and global image features. The gPb/UCM algorithm is applied to the color image and to the depth image separately, and the results are combined according to equation (1). In equation (1), P_rgb is the probability, computed from the color image, that a pixel belongs to a boundary, and P_d is the probability, computed from the depth image, that a pixel belongs to a boundary.
Setting different probability thresholds tr on the probability values obtained from equation (1) yields a multi-level segmentation result.
The probability thresholds tr set in this method are 0.06 and 0.08. Pixels whose probability values are smaller than the set threshold are connected into regions under the eight-connectivity principle; each region is a superpixel.
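The thresholding-plus-eight-connectivity step above can be sketched as follows; this is a minimal illustration (function name and flood-fill implementation are assumptions, not the patent's code), labeling each 8-connected region of sub-threshold pixels as one superpixel.

```python
import numpy as np

def superpixels_from_boundary(prob, tr):
    """Label 8-connected regions of pixels whose boundary probability < tr.

    `prob` is a boundary-probability map (as produced by gPb/UCM); pixels at
    or above the threshold are treated as boundary pixels and keep label 0.
    """
    h, w = prob.shape
    labels = np.zeros((h, w), dtype=int)
    interior = prob < tr
    cur = 0
    for i in range(h):
        for j in range(w):
            if interior[i, j] and labels[i, j] == 0:
                cur += 1                       # start a new superpixel
                stack = [(i, j)]
                labels[i, j] = cur
                while stack:
                    y, x = stack.pop()
                    for dy in (-1, 0, 1):      # eight-connectivity
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and interior[ny, nx]
                                    and labels[ny, nx] == 0):
                                labels[ny, nx] = cur
                                stack.append((ny, nx))
    return labels
```

A lower threshold marks more pixels as boundaries, so the 0.06 map is at least as finely divided as the 0.08 map, which is what makes the hierarchical segmentation tree of S2.2 possible.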
S1.2 Patch features
A Patch is defined as an m × m grid that slides rightward and downward from the upper-left corner of the color image and the depth image in steps of n pixels, finally forming a dense grid over both images. In this method the Patch size is set to 16 × 16 in the experiments and the sliding step n to 2; for an image of size N × M, the number of patches finally obtained is (⌊(N − 16)/2⌋ + 1) × (⌊(M − 16)/2⌋ + 1). Four types of features are computed for each Patch: depth gradient, color gradient, color, and texture features.
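The patch count above follows the usual sliding-window formula; a small sketch (the formula per axis, floor((dim − patch)/step) + 1, is our reading of the garbled original):

```python
def patch_grid_count(height, width, patch=16, step=2):
    """Number of patches in a dense sliding grid: the window starts at the
    top-left corner and advances by `step` until it no longer fits."""
    ny = (height - patch) // step + 1   # rows of patch positions
    nx = (width - patch) // step + 1    # columns of patch positions
    return ny * nx
```

For a standard 480 × 640 RGB-D frame this gives 233 × 313 patch positions.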
S1.2.1 depth gradient feature
The Patch in the depth image is denoted Z_d. For each Z_d, compute the depth gradient feature F_g_d, whose t-th component is defined by equation (2):
In equation (2), z ∈ Z_d denotes the relative two-dimensional coordinate of pixel z within the depth Patch; the depth-gradient orientation and gradient magnitude of pixel z appear as weights; the depth-gradient basis vectors and position basis vectors are predefined values; d_g and d_s denote respectively the number of depth-gradient basis vectors and of position basis vectors; the mapping coefficient of the t-th principal component is obtained by kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product. The depth-gradient Gaussian kernel function and the position Gaussian kernel function each carry a corresponding kernel parameter. Finally, the depth gradient feature is transformed with the EMK (Efficient Match Kernel) algorithm; the transformed feature vector is still denoted F_g_d.
S1.2.2 color gradient feature
The Patch in the color image is denoted Z_c. For each Z_c, compute the color gradient feature F_g_c, whose t-th component is defined by equation (3):
In equation (3), z ∈ Z_c denotes the relative two-dimensional coordinate of pixel z within the color-image Patch; the gradient orientation and gradient magnitude of pixel z appear as weights; the color-gradient basis vectors and position basis vectors are predefined values; c_g and c_s denote respectively the number of color-gradient basis vectors and of position basis vectors; the mapping coefficient of the t-th principal component is obtained by kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product. The color-gradient Gaussian kernel function and the position Gaussian kernel function each carry a corresponding kernel parameter. Finally, the color gradient feature is transformed with the EMK algorithm; the transformed feature vector is still denoted F_g_c.
S1.2.3 color characteristics
The Patch in the color image is denoted Z_c. For each Z_c, compute the color feature F_col, whose t-th component is defined by equation (4):
In equation (4), z ∈ Z_c denotes the relative two-dimensional coordinate of pixel z within the color-image Patch; r(z) is a three-dimensional vector, the RGB value of pixel z; the color basis vectors and position basis vectors are predefined values; c_c and c_s denote respectively the number of color basis vectors and of position basis vectors; the mapping coefficient of the t-th principal component is obtained by kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product. The color Gaussian kernel function and the position Gaussian kernel function each carry a corresponding kernel parameter. Finally, the color feature is transformed with the EMK algorithm; the transformed feature vector is still denoted F_col.
S1.2.4 Texture feature (Texture)
First, the RGB scene image is converted to a gray-scale image; the Patch in the gray-scale image is denoted Z_g. For each Z_g, compute the texture feature F_tex, whose t-th component is defined by equation (5):
In equation (5), z ∈ Z_g denotes the relative two-dimensional coordinate of pixel z within the gray-scale Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; LBP(z) is the local binary pattern (LBP) feature of pixel z; the local-binary-pattern basis vectors and position basis vectors are predefined values; g_b and g_s denote respectively the number of local-binary-pattern basis vectors and of position basis vectors; the mapping coefficient of the t-th principal component is obtained by kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product. The local-binary-pattern Gaussian kernel function and the position Gaussian kernel function each carry a corresponding kernel parameter. Finally, the texture feature is transformed with the EMK algorithm; the transformed feature vector is still denoted F_tex.
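Equations (2) through (5) share one pattern: a per-pixel attribute is compared against attribute basis vectors and the pixel position against position basis vectors through Gaussian kernels, the two kernel vectors are combined with a Kronecker product, and a KPCA coefficient vector projects the result. The following toy sketch shows that shared pattern only; every name, signature, and parameter here is an assumption for illustration, not the patent's implementation.

```python
import numpy as np

def gaussian_kernel_vec(x, basis, gamma):
    """k(x, b_i) = exp(-gamma * ||x - b_i||^2) for each basis vector b_i."""
    d = basis - x
    return np.exp(-gamma * np.sum(d * d, axis=1))

def kernel_descriptor_component(points, attrs, weights,
                                attr_basis, pos_basis,
                                alpha_t, g_attr=5.0, g_pos=3.0):
    """t-th feature component: sum over pixels of
    weight * alpha_t^T (k_attr ⊗ k_pos)."""
    acc = 0.0
    for z, a, m in zip(points, attrs, weights):
        ka = gaussian_kernel_vec(a, attr_basis, g_attr)   # attribute kernel
        kp = gaussian_kernel_vec(z, pos_basis, g_pos)     # position kernel
        acc += m * float(alpha_t @ np.kron(ka, kp))       # KPCA projection
    return acc
```

For the depth gradient feature the attribute would be the gradient orientation and the weight the gradient magnitude; for the color feature the attribute is the RGB value with unit weight, and so on.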
S1.3 superpixel features
The superpixel feature F_seg is defined by equation (6):
The components of equation (6) respectively denote the superpixel depth-gradient, color-gradient, color, and texture features, each defined by equation (7):
In equation (7), F_g_d(p), F_g_c(p), F_col(p), F_tex(p) denote the features of the p-th Patch whose center position falls within the superpixel seg, and n denotes the number of patches whose center positions fall within the superpixel seg.
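Equation (7) averages, per feature type, the features of the patches whose centers land inside the superpixel. A minimal sketch for one feature type (names are illustrative):

```python
import numpy as np

def superpixel_feature(patch_feats, patch_centers, seg_labels, seg_id):
    """Mean of the feature vectors of patches whose center pixel falls
    inside superpixel `seg_id` of the label map `seg_labels`."""
    sel = [f for f, (y, x) in zip(patch_feats, patch_centers)
           if seg_labels[y, x] == seg_id]
    return np.mean(sel, axis=0)
```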
The components of equation (8) are defined as follows:
The superpixel area A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg; the superpixel perimeter P_seg is defined by equation (9):
In equation (9), M and N denote respectively the horizontal and vertical resolutions of the RGB scene image; seg and seg′ denote different superpixels; N_4(s) is the four-neighborhood set of pixel s; B_seg is the set of boundary pixels of the superpixel seg.
The area-to-perimeter ratio R_seg of a superpixel is defined by equation (10):
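The area and perimeter definitions above can be sketched as follows. This is an illustration under one assumption: a boundary pixel is a superpixel pixel with a 4-neighbor outside the superpixel, with the image border counted as outside (the patent's equation (9) is stated in terms of neighboring superpixels).

```python
import numpy as np

def area_perimeter(labels, seg_id):
    """Area = pixel count of the superpixel; perimeter = count of its
    pixels having at least one 4-neighbor outside the superpixel."""
    h, w = labels.shape
    mask = labels == seg_id
    area = int(mask.sum())
    per = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny, nx]:
                    per += 1          # pixel touches the outside: boundary
                    break
    return area, per
```

The ratio R_seg = area / perimeter then distinguishes compact blobs (large ratio) from thin, elongated regions (small ratio).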
The second-order Hu moments (orders 2+0 = 2, 0+2 = 2, and 1+1 = 2) are computed from the x-coordinates s_x of the pixels s, from their y-coordinates s_y, and from the product of the x- and y-coordinates, respectively, as defined by equations (11), (12), and (13).
The quantities in equation (14) denote respectively the mean of the x-coordinates, the mean of the y-coordinates, the square of the mean of the x-coordinates, and the square of the mean of the y-coordinates of the pixels contained in the superpixel:
Width and Height denote respectively the width and height of the image; the computation is based on pixel coordinate values normalized by Width and Height.
The depth statistics denote respectively the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance D_var of the depth values, defined by equation (15):
D_miss is the proportion of pixels within the superpixel whose depth information is missing, defined by equation (16):
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where the principal normal vector is estimated by principal component analysis (PCA).
S2 superpixel context
The method respectively constructs a time context and a space context based on an RGB-D image sequence time sequence relation and a tree structure of super-pixel segmentation.
S2.1 superpixel temporal context
S2.1.1 interframe optical flow calculation
In the method, the optical flow obtained by calculating from a target frame to a reference frame is defined as a forward optical flow, and the optical flow obtained by calculating from the reference frame to the target frame is defined as a backward optical flow.
(1) Initial optical flow estimation
The SimpleFlow method is adopted for the inter-frame initial optical flow estimation. For two frames Fr_tar and Fr_{tar+1}, (x, y) denotes a pixel of Fr_tar, and (u(x, y), v(x, y)) denotes the optical-flow vector at (x, y). Defining the image Fr_tar as the target frame and the image Fr_{tar+1} as the reference frame, the forward optical flow from Fr_tar to Fr_{tar+1} is the set of optical-flow vectors of all pixels of Fr_tar, i.e. {(u(x, y), v(x, y)) | (x, y) ∈ Fr_tar}. In the following, u(x, y) and v(x, y) are abbreviated u and v respectively; according to the optical flow, pixel (x, y) of Fr_tar corresponds to pixel (x + u, y + v) of Fr_{tar+1}.
First, the forward optical flow from image Fr_tar to image Fr_{tar+1} is computed. For a pixel (x0, y0) of frame Fr_tar, take a window W1 of size a × a centered on it, with a = 10 in this method. An arbitrary point (p, q) of W1 corresponds to the pixel (p + u, q + v) in frame Fr_{tar+1}; the energy term e is computed at all points of the window W1 as in equation (17):
e(p, q, u, v) = ||Int_tar(p, q) − Int_{tar+1}(p + u, q + v)||²  (17)
where (p, q) ∈ W1; Int_tar(p, q) denotes the color information of pixel (p, q) in Fr_tar, and Int_{tar+1}(p + u, q + v) denotes the color information of pixel (p + u, q + v) in Fr_{tar+1}. Computing e in turn for every point of the window yields a vector e of dimension a².
Then, based on a local smoothness likelihood model, the optical-flow vector is optimized by combining color features and local distance features, as shown in equation (18):
In equation (18), E(x0, y0, u, v) is the local-region energy, representing the energy of the forward optical-flow vector (u, v) at pixel (x0, y0) of frame Fr_tar: the weighted accumulation of the energy terms e of all pixels within the window W1 centered at (x0, y0). The flow components range over u, v ∈ [−O, O], with O set to 20 in this method. The distance weight w_d and the color weight w_c are determined by the distance difference and the color difference between pixel (x0, y0) and its corresponding point (x0 + u, y0 + v) computed from the optical flow (u, v); the color parameter is set to σ_c = 0.08 (empirical value) and the distance parameter to σ_d = 5.5 (empirical value). The (u, v) minimizing the energy E is the optical-flow estimate for pixel (x0, y0); computing the optical-flow vectors of all pixels of the frame image Fr_tar yields the forward optical flow from image Fr_tar to image Fr_{tar+1}.
Likewise, the backward optical flow from frame Fr_{tar+1} to frame Fr_tar is computed.
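The window-energy minimization of equations (17) and (18) can be sketched in simplified form. This illustration uses uniform weights in place of SimpleFlow's w_d and w_c, grayscale frames, and an arbitrary out-of-frame penalty; `flow_at` and its default parameters are assumptions for illustration only.

```python
import numpy as np

def flow_at(img0, img1, x0, y0, a=5, O=2):
    """Brute-force local-energy flow estimate at (x0, y0): for each candidate
    (u, v), accumulate squared intensity differences over an a x a window
    centered at (x0, y0), and keep the argmin."""
    h, w = img0.shape
    r = a // 2
    best_E, best_uv = None, (0, 0)
    for u in range(-O, O + 1):
        for v in range(-O, O + 1):
            E = 0.0
            for q in range(y0 - r, y0 + r + 1):       # window rows (y)
                for p in range(x0 - r, x0 + r + 1):   # window cols (x)
                    if not (0 <= q < h and 0 <= p < w):
                        continue
                    qq, pp = q + v, p + u             # flow-mapped point
                    if 0 <= qq < h and 0 <= pp < w:
                        d = float(img0[q, p]) - float(img1[qq, pp])
                        E += d * d
                    else:
                        E += 1.0                      # penalty: left the frame
            if best_E is None or E < best_E:
                best_E, best_uv = E, (u, v)
    return best_uv
```

The real method weights each window term by the color and distance Gaussians before summing; the argmin structure is the same.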
(2) Occlusion point detection
Denote the forward optical flow from frame Fr_tar to frame Fr_{tar+1} by {(u_f(x), v_f(y)) | (x, y) ∈ Fr_tar}, and the backward optical flow from frame Fr_{tar+1} to frame Fr_tar by {(u_b(x′), v_b(y′)) | (x′, y′) ∈ Fr_{tar+1}}. For pixel (x, y), compute ||(u_f(x), v_f(y)) − (−u_b(x + u_f(x)), −v_b(y + v_f(y)))||; if the value is not 0, pixel (x, y) is considered an occlusion point.
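The forward-backward consistency check above can be sketched as follows; the tolerance `eps` is an assumption (the text compares against exactly 0, which is brittle for real, noisy flow fields), and rounding to integer pixel positions is likewise an illustration choice.

```python
import numpy as np

def occlusion_mask(flow_f, flow_b, eps=0.5):
    """Pixel (y, x) is flagged occluded when its forward flow and the
    backward flow at the flow-mapped point do not cancel.
    flow_f, flow_b: (H, W, 2) arrays holding (u, v) per pixel."""
    h, w, _ = flow_f.shape
    occ = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            u, v = flow_f[y, x]
            x2 = int(round(x + u)); y2 = int(round(y + v))
            if not (0 <= x2 < w and 0 <= y2 < h):
                occ[y, x] = True          # mapped outside the frame
                continue
            ub, vb = flow_b[y2, x2]
            occ[y, x] = np.hypot(u + ub, v + vb) > eps
    return occ
```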
(3) Reestimation of occlusion point light flow
For a pixel (x0, y0) marked as an occlusion point, the optical-flow energy is re-estimated using equation (19), denoted E_b(x0, y0, u, v):
In equation (19), the first term denotes the mean of the energy terms e at pixel (x0, y0) of frame Fr_tar over the different optical-flow estimates; the second denotes the minimum of the energy terms e at pixel (x0, y0) over the different optical-flow estimates; w_r(x0, y0) is the difference between the mean and the minimum of the energy terms e. For a pixel (x0, y0) marked as occluded, the (u, v) minimizing E_b is taken as the optical-flow vector of pixel (x0, y0).
The final optical-flow vector of a pixel marked as an occlusion point is the vector re-estimated in step (3).
S2.1.2 superpixel temporal context and its feature representation
The superpixel segmentation method of S1.1 is used to segment the Fr_tar, Fr_{tar-1}, and Fr_{tar+1} frame images into superpixels.
(1) Superpixel temporal context
First, from the forward optical flow from Fr_tar to Fr_{tar+1}, compute the mean (ū_f, v̄_f) of the forward optical flows {(u_f(x), v_f(y)) | (x, y) ∈ Seg_tar} of all pixels {(x, y) | (x, y) ∈ Seg_tar} contained in the Fr_tar superpixel Seg_tar, as shown in equation (20):
In equation (20), Num(Seg_tar) denotes the number of pixels contained in the superpixel Seg_tar. Shifting the pixels of Seg_tar by the mean forward flow gives the region Seg′_tar = {(x′, y′) | x′ = x + ū_f, y′ = y + v̄_f, (x, y) ∈ Seg_tar, (x′, y′) ∈ Fr_{tar+1}}, called the corresponding region of the superpixel Seg_tar in Fr_{tar+1}. Compute the intersection-over-union (IOU) of Seg′_tar and the i-th superpixel Seg^i_{tar+1} of frame Fr_{tar+1}, as shown in equation (21):
In equation (21), Num(·) denotes the number of pixels a region contains. If IOU(Seg′_tar, Seg^i_{tar+1}) > τ, then, from the backward optical flow from Fr_{tar+1} to Fr_tar, compute the corresponding region Seg″_tar of the superpixel Seg^i_{tar+1} in frame Fr_tar, and evaluate IOU(Seg″_tar, Seg_tar) with equation (21). If IOU(Seg″_tar, Seg_tar) > τ as well, Seg^i_{tar+1} is called a corresponding superpixel of Seg_tar in Fr_{tar+1}; the number of corresponding superpixels of Seg_tar in Fr_{tar+1} may be 0, 1, or more. The intersection-over-union decision threshold τ is set to 0.3 in this method. In the same way, the corresponding superpixels of Seg_tar in frame Fr_{tar-1} are found; their number is likewise 0, 1, or more.
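The forward half of the correspondence test above can be sketched as follows; this illustration applies only the forward IOU check (the method additionally requires the backward check to pass), and all names are assumptions.

```python
import numpy as np

def corresponding_superpixels(seg_tar, labels_next, flow_mean, tau=0.3):
    """Shift the superpixel's pixels by the mean flow, then keep the
    next-frame superpixels whose IOU with the shifted region exceeds tau.
    seg_tar: set of (y, x) pixels; labels_next: next-frame label map."""
    h, w = labels_next.shape
    u, v = flow_mean
    shifted = {(int(round(y + v)), int(round(x + u))) for y, x in seg_tar}
    shifted = {(y, x) for y, x in shifted if 0 <= y < h and 0 <= x < w}
    out = []
    for sid in np.unique(labels_next):
        region = {(y, x) for y, x in zip(*np.nonzero(labels_next == sid))}
        iou = len(shifted & region) / len(shifted | region)
        if iou > tau:
            out.append(int(sid))
    return out
```

The symmetric backward check filters out spurious matches where a large next-frame superpixel happens to swallow the small shifted region.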
The temporal context of the superpixel Seg_tar is denoted Segs_tar = {Segs_{tar-1}, Seg_tar, Segs_{tar+1}}, where Segs_{tar-1} and Segs_{tar+1} are respectively the sets of corresponding superpixels of the Fr_tar superpixel Seg_tar in frames Fr_{tar-1} and Fr_{tar+1}.
(2) Superpixel temporal context semantic feature representation
The semantic feature of the superpixel temporal context Segs_tar is shown in equation (22):
It combines F_Seg_tar, the feature of the superpixel Seg_tar in frame Fr_tar; the mean feature of all corresponding superpixels of Seg_tar in frame Fr_{tar-1}; and the mean feature of all corresponding superpixels of Seg_tar in frame Fr_{tar+1}. The feature of each superpixel is computed as in section S1.3.
When the number of corresponding superpixels of the Fr_tar superpixel Seg_tar in frame Fr_{tar+1} or Fr_{tar-1} is 0, its own feature F_Seg_tar is substituted for the missing mean.
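The temporal-context feature construction above can be sketched as follows; the concatenation order [previous, own, next] is an assumption, and the fallback rule is the substitution the text describes.

```python
import numpy as np

def temporal_context_feature(f_tar, feats_prev, feats_next):
    """Concatenate [mean(prev-frame features), own feature, mean(next-frame
    features)]; fall back to the superpixel's own feature when a
    corresponding-superpixel set is empty."""
    prev = np.mean(feats_prev, axis=0) if len(feats_prev) else f_tar
    nxt = np.mean(feats_next, axis=0) if len(feats_next) else f_tar
    return np.concatenate([prev, f_tar, nxt])
```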
S2.2 superpixel spatial context
The image is superpixel-segmented with the method of S1.1. Setting the threshold of the superpixel hierarchical segmentation tree to 1 yields the highest-level superpixel segmentation map, i.e. the root node of the hierarchical segmentation tree, which represents the whole image as one superpixel. Setting the threshold to 0.06 yields a lower-level superpixel segmentation result. At threshold 0.08 the boundary decision criterion is stricter, so pixels whose original boundary probability values lie in [0.06, 0.08] are judged non-boundary points, while at threshold 0.06 those points are judged boundary points. A higher-level superpixel therefore contains lower-level superpixels. In this method, the spatial context of a parent-node superpixel is defined as its child-node superpixels in the hierarchical segmentation tree.
S3 semantic Classification
S3.1 temporal context based superpixel semantic classification
The method takes the temporal-context features of the superpixels as input, semantically classifies the superpixels with a GBDT (gradient boosted decision tree), and outputs the predicted label of each superpixel.
In the GBDT training process, MR rounds of training are performed, mr ∈ {1, 2, 3, …, MR}. Round mr trains one regression tree, i.e. one weak classifier, for each class; with L classes, L regression trees are trained per round, l ∈ {1, 2, 3, …, L}. Finally, L × MR weak classifiers are obtained. The training method is the same for each classifier in each round.
(1) GBDT multi-classifier training
The training set Fea_tr comprises NSeg_tr samples:
where the training sample Fea_i is the temporal-context feature of the i-th superpixel, whose true label is lab_i, lab_i ∈ {1, 2, 3, …, L}.
First, round 0 is initialized: the prediction function value h_{l,0}(x) of the class-l classifier is set to 0. The true label lab_i is converted to an L-dimensional label vector with components lab_i[k] ∈ {0, 1}: if the true label of the i-th training sample is l, the l-th component lab_i[l] is 1 and the other components are 0. The probability that the i-th sample belongs to class l is then computed by equation (24), where I(lab_i = l) is the indicator function, equal to 1 when the label of sample i is l and 0 otherwise.
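Equation (24) is not reproduced in the text; the standard multiclass-GBDT choice, which we assume here, maps the per-class prediction values to probabilities with a softmax:

```python
import numpy as np

def class_probs(h):
    """prob_l = exp(h_l) / sum_k exp(h_k), computed stably by shifting
    by the maximum (assumed softmax form of equation (24))."""
    e = np.exp(h - np.max(h))
    return e / e.sum()
```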
The prediction of the class-l classifier of round mr − 1 on the i-th sample is denoted h_{l,(mr−1)}(Fea_i); the classification error of that classifier on the i-th sample is defined by equation (23):
When constructing the class-l classifier of round mr, traverse the training sample set Fea_tr: for each feature dimension of each sample, take the par-th feature value of the i-th sample as the candidate split value, and partition all samples of Fea_tr by it; samples whose feature value exceeds the split value fall into the set {Region_1}, the rest into the set {Region_2}. After all samples are partitioned, compute the regression-tree error according to equation (25),
where N_Region_m denotes the total number of samples falling into Region_m. The feature value minimizing the regression-tree error is finally selected as the new split value of the tree. The regression tree is grown by repeatedly splitting in this way until the set tree height is reached; the regression-tree height is set to 5 in this method. The regression trees of the other classes in the current round are constructed the same way.
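The exhaustive split search described above can be sketched as follows. The split score here, summed squared deviation from each region's mean, is the usual regression-tree criterion and stands in for the patent's equation (25), which is not reproduced in the text; all names are illustrative.

```python
import numpy as np

def best_split(features, residuals):
    """Try every observed value of every feature dimension as a split;
    return (dimension, value, error) minimizing the two-region squared
    deviation from the region means."""
    n, d = features.shape
    best = (None, None, np.inf)           # (dimension, value, error)
    for par in range(d):
        for val in features[:, par]:
            hi = residuals[features[:, par] > val]    # {Region_1}
            lo = residuals[features[:, par] <= val]   # {Region_2}
            err = sum(((r - r.mean()) ** 2).sum() for r in (hi, lo) if len(r))
            if err < best[2]:
                best = (par, val, err)
    return best
```

Recursing on each region with this search, until depth 5 as the text specifies, yields one round's regression tree for one class.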
The number of leaf nodes of the class-l regression tree of round mr is denoted Reg_{mr,l}. Each leaf node is a subset of the training sample set, and the intersection of any two leaf nodes is empty. The gain value of each leaf node of the constructed class-l regression tree of round mr is computed as shown in equation (26):
The prediction h_{l,mr}(Fea_i) of the class-l regression tree of round mr on the i-th sample is computed by equation (27),
where reg ∈ {1, 2, …, Reg_{mr,l}}.
Training continues until round MR is completed. The prediction h_{l,MR}(Fea_i) of the class-l regression tree of round MR on the i-th sample is expressed as in equation (28),
where reg ∈ {1, 2, …, Reg_{MR,l}}.
Substituting equation (28) into the class-l regression tree of round MR − 1 gives the accumulated prediction for the i-th sample, yielding equation (29):
By analogy, substituting the accumulated prediction for the i-th sample back through the class-l regression trees from round MR − 1 down to round 0 yields equation (30).
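The back-substitution of equations (28) through (30) amounts to summing, per class, the leaf gains of all rounds' trees. A minimal sketch of that accumulation (representation of the trees as per-round predict functions is an illustration choice):

```python
def gbdt_predict(x, trees):
    """Accumulated per-class prediction: h_l(x) = sum over rounds of the
    leaf gain returned by that round's class-l tree.
    trees[l] is the list of predict functions of the class-l trees."""
    return [sum(tree(x) for tree in per_class) for per_class in trees]
```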
(2) GBDT prediction
Compute the temporal-context feature Fea_Seg of the superpixel Seg; compute with equation (30) the prediction values h_{l,MR}(Fea_Seg) of Seg belonging to the different classes, then compute with equation (24) the probability values prob_{l,MR}(Fea_Seg) of Seg belonging to the different classes. The class l with the highest probability value is the predicted class of the superpixel Seg.
S3.2 optimizing semantic classification based on spatial context
When this method performs superpixel segmentation of an image, two boundary decision thresholds, 0.06 and 0.08, are set, yielding a hierarchical segmentation tree of height 2.
In the method, the semantic annotation of the superpixel determined by the 0.08 threshold is taken as an optimization target, and the superpixel determined by the 0.06 segmentation threshold is taken as a spatial context and is used for optimizing a semantic annotation result.
Firstly, according to the method of S3.1, semantic classification is carried out on each block of superpixels corresponding to the leaf nodes and the intermediate nodes, the semantic labeling probability of each superpixel in the superpixel segmentation graph under the threshold values of 0.06 and 0.08 is obtained, and the final semantic label of the superpixel block is calculated through the formula (31).
In equation (31), l, the class with the maximum probability value computed by equation (31), is the final semantic label of the superpixel block; the first probability term denotes the probability of semantic label l for the a-th superpixel in the set of threshold-0.06 superpixels contained in the threshold-0.08 superpixel; the second is the probability of semantic label l for the threshold-0.08 superpixel itself. N_aux denotes the number of threshold-0.06 superpixels contained in the threshold-0.08 superpixel; w_aux, the confidence of the threshold-0.06 superpixel semantic labeling, is set to 0.4 in this method; w_target, the confidence of the threshold-0.08 superpixel semantic labeling, is set to 0.6 in this method.
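The weighted label fusion of equation (31) can be sketched as follows. This assumes the child (threshold-0.06) probabilities enter as a simple mean; the patent may weight children differently (e.g. by area), so treat the averaging and all names as illustrative.

```python
import numpy as np

def combine_labels(p_target, p_aux, w_target=0.6, w_aux=0.4):
    """score_l = w_target * p_target[l] + w_aux * mean_a p_aux[a][l];
    return the argmax class index.
    p_target: class probabilities of the threshold-0.08 superpixel;
    p_aux: per-child class probabilities of its threshold-0.06 children."""
    p_aux = np.asarray(p_aux, float)
    score = (w_target * np.asarray(p_target, float)
             + w_aux * p_aux.mean(axis=0))
    return int(np.argmax(score))
```

With w_target = 0.6 and w_aux = 0.4, the parent superpixel's own prediction dominates unless its children agree strongly on a different class.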
Drawings
FIG. 1 is a flow chart of an RGBD indoor scene recognition method based on space-time context.
FIG. 2 is a diagram of a superpixel partition hierarchical tree.
FIG. 3 is a schematic diagram of spatial context based optimization.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
As shown in FIGS. 1-3, an RGB-D indoor scene labeling method based on superpixel spatio-temporal context inputs an image Fr_tar to be labeled together with its temporally adjacent frames Fr_tar-1 and Fr_tar+1, and outputs a pixel-level labeling of Fr_tar.
Based on an optical flow algorithm, for each superpixel of the image Fr_tar to be labeled, the corresponding superpixels in the temporally adjacent frames Fr_tar-1 and Fr_tar+1 are computed; these corresponding superpixels form its temporal context. The image is superpixel-segmented using the gPb/UCM algorithm and the segmentation results are organized into a segmentation tree according to thresholds; the child nodes of each superpixel of Fr_tar in the segmentation tree form its spatial context.
A temporal-context-based feature representation is constructed for each superpixel of Fr_tar, and a Gradient Boosted Decision Tree (GBDT) classifies each superpixel from its temporal-context features; using the superpixel spatial context, the semantic classification results of each superpixel and of its spatial context are weighted and combined to obtain the semantic labeling of the superpixels in Fr_tar.
S1 super pixel
In the field of computer vision, the process of subdividing a digital image into a plurality of image sub-regions is known as superpixel segmentation. Superpixels are usually small regions composed of a series of pixel points with adjacent positions and similar characteristics such as color, brightness, texture and the like, and the small regions retain local effective information and generally do not destroy boundary information of objects in an image.
S1.1 superpixel segmentation of images
Superpixel segmentation uses the gPb/UCM algorithm to compute, from the local and global features of the image, the probability value that each pixel belongs to a boundary. The gPb/UCM algorithm is applied to the color image and to the depth image respectively, and the combined boundary probability is computed according to equation (1). In equation (1), the first term is the boundary probability of a pixel computed from the color image, and the second term is the boundary probability of a pixel computed from the depth image.
According to the probability values obtained by equation (1), different probability thresholds tr are set to obtain a multi-level segmentation result.
The probability thresholds tr set in this method are 0.06 and 0.08; pixels whose probability values are smaller than the set threshold are connected into regions according to the eight-connectivity principle, and each such region is a superpixel.
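The eight-connectivity grouping step can be sketched as follows; the `boundary_prob` input and the flood-fill labeling are illustrative only (a minimal Python sketch, not the patent's implementation):

```python
from collections import deque

def label_superpixels(boundary_prob, tr):
    """8-connected components of pixels whose boundary probability < tr.

    boundary_prob: 2-D list of floats (gPb/UCM output); tr: threshold, e.g. 0.06.
    Returns a label map of the same shape; boundary pixels keep label -1.
    """
    h, w = len(boundary_prob), len(boundary_prob[0])
    labels = [[-1] * w for _ in range(h)]
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    cur = 0
    for y in range(h):
        for x in range(w):
            if boundary_prob[y][x] < tr and labels[y][x] == -1:
                # flood-fill one 8-connected sub-threshold region -> one superpixel
                q = deque([(y, x)])
                labels[y][x] = cur
                while q:
                    cy, cx = q.popleft()
                    for dy, dx in nbrs:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and boundary_prob[ny][nx] < tr
                                and labels[ny][nx] == -1):
                            labels[ny][nx] = cur
                            q.append((ny, nx))
                cur += 1
    return labels
```

Running the same routine with tr = 0.06 and tr = 0.08 yields the two levels of the segmentation tree used later in S2.2.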
S1.2 Patch feature
A Patch is defined as an m × m grid that slides rightward and downward from the upper-left corner of the color image and the depth image in steps of n pixels, eventually forming a dense grid over both images. In this method the Patch size is set to 16 × 16 and the sliding step n to 2; the number of Patches finally obtained for an image of size N × M follows accordingly. Four types of features are computed for each Patch: depth gradient features, color gradient features, color features, and texture features.
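Under the stated settings (16 × 16 Patch, step 2), the dense grid can be enumerated as below; since the exact Patch-count formula is not reproduced in the text, the function simply lists every top-left corner that fits in the image (an assumed, minimal reading):

```python
def patch_grid(width, height, patch=16, step=2):
    """Top-left corners of the dense Patch grid described above.

    Slides a patch x patch window over a width x height image in steps of
    `step` pixels, keeping only fully contained windows.
    """
    xs = range(0, width - patch + 1, step)
    ys = range(0, height - patch + 1, step)
    return [(x, y) for y in ys for x in xs]
```

The same corner list is reused for the color, depth and grayscale images, so all four Patch feature types are computed at identical locations.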
S1.2.1 depth gradient feature
Patch in depth image is noted as ZdFor each ZdComputing depth gradient feature Fg_dWherein the value of the t-th component is defined by equation (2):
In equation (2), z ∈ Z_d denotes the relative two-dimensional coordinate position of pixel z in the depth Patch, and the equation further involves: the depth gradient direction and gradient magnitude of pixel z; the depth gradient basis vectors and the position basis vectors, two groups of predefined values; d_g and d_s, the numbers of depth gradient basis vectors and of position basis vectors respectively; the mapping coefficients of the t-th principal component, obtained by applying Kernel Principal Component Analysis (KPCA); the Kronecker product; and a depth gradient Gaussian kernel function and a position Gaussian kernel function with their corresponding kernel parameters. Finally, the depth gradient feature is transformed with the EMK (Efficient Match Kernel) algorithm, and the transformed feature vector is still denoted F_g_d.
S1.2.2 color gradient feature
Patch in color image is noted as ZcFor each ZcCalculating color gradient feature Fg_cWherein the value of the t-th component is defined by equation (3):
In equation (3), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color Patch, and the equation further involves: the gradient direction and gradient magnitude of pixel z; the color gradient basis vectors and the position basis vectors, two groups of predefined values; c_g and c_s, the numbers of color gradient basis vectors and of position basis vectors respectively; the mapping coefficients of the t-th principal component, obtained by applying Kernel Principal Component Analysis (KPCA); the Kronecker product; and a color gradient Gaussian kernel function and a position Gaussian kernel function with their corresponding kernel parameters. Finally, the color gradient feature is transformed with the EMK algorithm, and the transformed feature vector is still denoted F_g_c.
S1.2.3 color characteristics
Patch in color image is noted as ZcFor each ZcCalculating color characteristics FcolWherein the value of the t-th component is defined by equation (4):
In equation (4), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color Patch; r(z) is a three-dimensional vector, the RGB value of pixel z. The equation further involves: the color basis vectors and the position basis vectors, two groups of predefined values; c_c and c_s, the numbers of color basis vectors and of position basis vectors respectively; the mapping coefficients of the t-th principal component, obtained by applying Kernel Principal Component Analysis (KPCA); the Kronecker product; and a color Gaussian kernel function and a position Gaussian kernel function with their corresponding kernel parameters. Finally, the color feature is transformed with the EMK algorithm, and the transformed feature vector is still denoted F_col.
S1.2.4 Texture feature (Texture)
Firstly, an RGB scene image is converted into a gray scale image, and Patch in the gray scale image is recorded as ZgFor each ZgCalculating texture feature FtexWherein the value of the t-th component is defined by equation (5):
In equation (5), z ∈ Z_g denotes the relative two-dimensional coordinate position of pixel z in the grayscale Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; LBP(z) is the Local Binary Pattern (LBP) feature of pixel z. The equation further involves: the local binary pattern basis vectors and the position basis vectors, two groups of predefined values; g_b and g_s, the numbers of local binary pattern basis vectors and of position basis vectors respectively; the mapping coefficients of the t-th principal component, obtained by applying Kernel Principal Component Analysis (KPCA); the Kronecker product; and a local binary pattern Gaussian kernel function and a position Gaussian kernel function with their corresponding kernel parameters. Finally, the texture feature is transformed with the EMK algorithm, and the transformed feature vector is still denoted F_tex.
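The per-pixel quantities LBP(z) and s(z) used in equation (5) can be sketched as follows; the clockwise bit order and the >= comparison in the LBP code are conventional choices, not taken from the text:

```python
from math import sqrt

def lbp8(img, y, x):
    """8-neighbour Local Binary Pattern code of pixel (y, x).

    img: 2-D list of gray values; a neighbour >= the centre contributes a 1 bit,
    scanned clockwise starting from the top-left neighbour.
    """
    c = img[y][x]
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(nbrs):
        if img[y + dy][x + dx] >= c:
            code |= 1 << bit
    return code

def local_std3(img, y, x):
    """Standard deviation s(z) of gray values in the 3x3 window centred on (y, x)."""
    vals = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    mean = sum(vals) / 9.0
    return sqrt(sum((v - mean) ** 2 for v in vals) / 9.0)
```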
S1.3 superpixel features
The superpixel feature F_seg is defined as equation (6):
The components denote the superpixel depth gradient feature, color gradient feature, color feature and texture feature respectively, defined as equation (7):
In equation (7), F_g_d(p), F_g_c(p), F_col(p), F_tex(p) denote the features of the p-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg.
the components in equation (8) are defined as follows:
The superpixel area A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg; the superpixel perimeter P_seg is defined as equation (9):
In equation (9), M and N denote the horizontal and vertical resolutions of the RGB scene image respectively; seg and seg′ denote different superpixels; N_4(s) is the set of four-neighbors of pixel s; B_seg is the set of boundary pixels of the superpixel seg.
The area-to-perimeter ratio R_seg of the superpixel is defined as equation (10):
The second-order (2+0 = 2 or 0+2 = 2) Hu moments are computed from the x-coordinate s_x of pixel s, the y-coordinate s_y, and the product of the x- and y-coordinates respectively, as defined in equations (11), (12) and (13):
The means used above denote the mean of the x-coordinates, the mean of the y-coordinates, the square of the mean of the x-coordinates and the square of the mean of the y-coordinates of the pixels contained in the superpixel, and are defined as equation (14):
Width and Height denote the width and height of the image respectively, i.e. the calculation is based on the normalized pixel coordinate values.
D_var involves the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined as equation (15):
D_miss, the proportion of pixels in the superpixel that lack depth information, is defined as equation (16):
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel; the principal normal vector of the point cloud is estimated by Principal Component Analysis (PCA).
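The area and perimeter terms above can be sketched as follows; treating a member pixel as a boundary pixel when one of its 4-neighbors is missing or differently labeled is an assumed reading of equation (9):

```python
def area_perimeter(labels, seg):
    """Area and perimeter of superpixel `seg` in a label map.

    Area counts the member pixels (the sum defining A_seg); a member pixel
    belongs to the boundary set B_seg when a 4-neighbour lies outside the
    image or carries a different label, and the perimeter is |B_seg|.
    """
    h, w = len(labels), len(labels[0])
    area = perim = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != seg:
                continue
            area += 1
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or labels[ny][nx] != seg:
                    perim += 1   # pixel (y, x) is in B_seg
                    break
    return area, perim
```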
S2 superpixel context
The method respectively constructs a time context and a space context based on an RGB-D image sequence time sequence relation and a tree structure of super-pixel segmentation.
S2.1 superpixel temporal context
S2.1.1 interframe optical flow calculation
In the method, the optical flow obtained by calculating from a target frame to a reference frame is defined as a forward optical flow, and the optical flow obtained by calculating from the reference frame to the target frame is defined as a backward optical flow.
(1) Initial optical flow estimation
The inter-frame initial optical flow estimation adopts the SimpleFlow method. For two frames Fr_tar and Fr_tar+1, (x, y) denotes a pixel of Fr_tar and (u(x, y), v(x, y)) denotes the optical flow vector at (x, y). Defining image Fr_tar as the target frame and image Fr_tar+1 as the reference frame, the forward optical flow from Fr_tar to Fr_tar+1 is the set of optical flow vectors of all pixels of Fr_tar, i.e. {(u(x, y), v(x, y)) | (x, y) ∈ Fr_tar}. In the following, u(x, y) and v(x, y) are abbreviated as u and v respectively, and the pixel of Fr_tar+1 corresponding to pixel (x, y) of Fr_tar under the optical flow is (x + u, y + v).
First, the forward optical flow from image Fr_tar to image Fr_tar+1 is computed. For a pixel (x_0, y_0) of Fr_tar, a window W_1 of size a × a centered on it is taken; here a = 10. An arbitrary point (p, q) in W_1 has the corresponding pixel (p + u, q + v) in Fr_tar+1, and the energy term e is computed at all points of the window W_1, as in equation (17):
e(p, q, u, v) = ||Int_tar(p, q) − Int_tar+1(p + u, q + v)||² (17)
where (p, q) ∈ W_1, Int_tar(p, q) denotes the color information of pixel (p, q) in Fr_tar, and Int_tar+1(p + u, q + v) denotes the color information of pixel (p + u, q + v) in Fr_tar+1; computing e for each pair of points in the window in turn yields an a²-dimensional vector e.
Then, based on the local smooth likelihood model, the optical flow vector is optimized by combining the color feature and the local distance feature as shown in formula (18):
In equation (18), E(x_0, y_0, u, v) is the local region energy, representing the energy of the forward optical flow vector (u, v) at pixel (x_0, y_0) of image Fr_tar; it is the weighted accumulation of the energy terms e of all pixels inside the window W_1 centered on (x_0, y_0). The variation range of the optical flow vector (u, v) is determined by O, set to 20 in this method. The distance weight w_d and the color weight w_c are determined by the distance difference and the color difference between pixel (x_0, y_0) and its corresponding point (x_0 + u, y_0 + v) computed from the optical flow (u, v); the color parameter σ_c is set to 0.08 (empirical value) and the distance parameter σ_d to 5.5 (empirical value). The (u, v) minimizing the energy E is the optical flow vector estimate of pixel (x_0, y_0); computing the optical flow vectors of all pixels of the frame image Fr_tar yields the forward optical flow from image Fr_tar to image Fr_tar+1.
Likewise, the backward optical flow from frame Fr_tar+1 to frame Fr_tar is computed according to the method described above.
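Equations (17)-(18) describe a brute-force local energy minimization; a minimal sketch follows, with the window half-size and search range shrunk from the stated a = 10 and O = 20 for brevity, and Gaussian forms assumed for the distance and color weights:

```python
from math import exp

def estimate_flow(tar, ref, x0, y0, half=5, O=20, sigma_c=0.08, sigma_d=5.5):
    """Brute-force SimpleFlow-style estimate of the flow at (x0, y0).

    tar, ref: 2-D gray images (row-major lists). For every candidate (u, v)
    in [-O, O]^2 the energy term e of eq. (17) is evaluated over the window
    W_1 around (x0, y0) and combined with Gaussian distance/color weights in
    the spirit of eq. (18); the Gaussian forms are an assumption.
    Returns the (u, v) minimising the local energy E.
    """
    h, w = len(tar), len(tar[0])
    best_E, best_uv = None, (0, 0)
    for u in range(-O, O + 1):
        for v in range(-O, O + 1):
            E = 0.0
            for q in range(max(0, y0 - half), min(h, y0 + half + 1)):
                for p in range(max(0, x0 - half), min(w, x0 + half + 1)):
                    if not (0 <= q + v < h and 0 <= p + u < w):
                        E = float("inf")   # candidate leaves the image
                        break
                    e = (tar[q][p] - ref[q + v][p + u]) ** 2          # eq. (17)
                    wd = exp(-((p - x0) ** 2 + (q - y0) ** 2) / (2 * sigma_d ** 2))
                    wc = exp(-((tar[q][p] - tar[y0][x0]) ** 2) / (2 * sigma_c ** 2))
                    E += wd * wc * e                                   # eq. (18)
                if E == float("inf"):
                    break
            if best_E is None or E < best_E:
                best_E, best_uv = E, (u, v)
    return best_uv
```

Running this at every pixel of the target frame yields the dense forward flow; swapping the roles of the two frames yields the backward flow.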
(2) Occlusion point detection
Denote the forward optical flow from frame Fr_tar to frame Fr_tar+1 as {(u_f(x), v_f(y)) | (x, y) ∈ Fr_tar} and the backward optical flow from frame Fr_tar+1 to frame Fr_tar as {(u_b(x′), v_b(y′)) | (x′, y′) ∈ Fr_tar+1}. For pixel (x, y), compute ||(u_f(x), v_f(y)) − (−u_b(x + u_f(x)), −v_b(y + v_f(y)))||; if this value is not 0, pixel (x, y) is considered an occlusion point.
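The forward-backward consistency test above can be sketched as follows; the tolerance `eps` is an added assumption (the text tests for exactly 0):

```python
def is_occluded(x, y, fwd, bwd, eps=0.5):
    """Forward-backward consistency check for occlusion detection.

    fwd maps a pixel (x, y) of Fr_tar to its forward flow (uf, vf); bwd maps
    a pixel of Fr_tar+1 to its backward flow (ub, vb). A pixel whose forward
    flow does not cancel the backward flow at its landing point is marked
    occluded; `eps` relaxes the exact-zero test to tolerate sub-pixel noise.
    """
    uf, vf = fwd[(x, y)]
    ub, vb = bwd[(x + uf, y + vf)]
    # ||(uf, vf) - (-ub, -vb)||
    diff = ((uf + ub) ** 2 + (vf + vb) ** 2) ** 0.5
    return diff > eps
```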
(3) Reestimation of occlusion point light flow
For a pixel (x_0, y_0) marked as an occlusion point, the optical flow energy is re-estimated using equation (19), denoted E_b(x_0, y_0, u, v):
In equation (19), the first term denotes the mean of the energy terms e of pixel (x_0, y_0) of frame Fr_tar over the different optical flow estimates, and the second denotes the minimum of the energy terms e of pixel (x_0, y_0) over the different optical flow estimates; w_r(x_0, y_0), the difference between the mean and the minimum of the energy term e, weights the pixels marked as occluded. The (u, v) minimizing E_b is taken as the optical flow vector of pixel (x_0, y_0).
The final optical flow vector of a pixel marked as an occlusion point adopts the optical flow vector re-estimated in step (3).
S2.1.2 superpixel temporal context and its feature representation
Using the superpixel segmentation method of S1.1, the Fr_tar, Fr_tar-1 and Fr_tar+1 frame images are superpixel-segmented to obtain their superpixel segmentation maps.
(1) Superpixel temporal context
First, according to the forward optical flow from Fr_tar to Fr_tar+1, the mean of the forward optical flows {(u_f(x), v_f(y)) | (x, y) ∈ Seg_tar} of all pixels {(x, y) | (x, y) ∈ Seg_tar} contained in the Fr_tar-frame superpixel Seg_tar is computed as shown in equation (20):
In equation (20), Num(Seg_tar) denotes the number of pixels contained in the superpixel Seg_tar. According to the forward optical flow mean, the pixels contained in superpixel Seg_tar are mapped to their corresponding pixels in Fr_tar+1, yielding the region Seg′_tar = {(x′, y′) | x′ = x + u_f(x), y′ = y + v_f(y), (x, y) ∈ Seg_tar, (x′, y′) ∈ Fr_tar+1}, called the corresponding region of superpixel Seg_tar in Fr_tar+1. The intersection-over-union IOU of Seg′_tar and the i-th superpixel Seg^i_tar+1 of frame Fr_tar+1 is computed as shown in equation (21):
In equation (21), Num(·) denotes the number of pixels a region contains. If IOU(Seg′_tar, Seg^i_tar+1) > τ, the corresponding region Seg″_tar of superpixel Seg^i_tar+1 in frame Fr_tar is computed according to the backward optical flow from Fr_tar+1 to Fr_tar, and the intersection-over-union IOU(Seg″_tar, Seg_tar) of region Seg″_tar and superpixel Seg_tar is computed according to equation (21). If IOU(Seg″_tar, Seg_tar) > τ, then Seg^i_tar+1 is called a corresponding superpixel of Seg_tar in Fr_tar+1 (the number of corresponding superpixels of Seg_tar in Fr_tar+1 may be 0, 1 or more). In this method, the intersection-over-union decision threshold τ is set to 0.3 (empirical value). In the same way, the corresponding superpixels of superpixel Seg_tar in frame Fr_tar-1 are found (their number may likewise be 0, 1 or more).
The temporal context of superpixel Seg_tar is denoted Segs_tar = {Segs_tar-1, Seg_tar, Segs_tar+1}, where Segs_tar-1 and Segs_tar+1 are the sets of corresponding superpixels of the Fr_tar-frame superpixel Seg_tar in frames Fr_tar-1 and Fr_tar+1 respectively.
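The forward half of the IOU correspondence test (equations (20)-(21)) can be sketched as follows; the symmetric backward check described in the text is omitted for brevity:

```python
def corresponding_superpixels(seg_tar, mean_flow, segs_next, tau=0.3):
    """Forward half of the bidirectional-IOU correspondence test.

    seg_tar: set of (x, y) pixels of the Fr_tar superpixel; mean_flow: its
    mean forward flow (eq. (20)), here as integer offsets; segs_next: dict
    mapping superpixel id -> pixel set in Fr_tar+1. Returns the ids whose
    IOU with the shifted region Seg'_tar exceeds tau (eq. (21)).
    """
    du, dv = mean_flow
    shifted = {(x + du, y + dv) for (x, y) in seg_tar}   # region Seg'_tar
    out = []
    for sid, pix in segs_next.items():
        inter = len(shifted & pix)
        union = len(shifted | pix)
        if union and inter / union > tau:
            out.append(sid)
    return out
```

As the text notes, the result may contain 0, 1 or several candidate superpixels; each candidate must additionally pass the backward IOU test before it joins the temporal context.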
(2) Superpixel temporal context semantic feature representation
The semantic feature of the superpixel temporal context Segs_tar is defined as shown in equation (22):
In equation (22), the first term is the feature of the superpixel Seg_tar in frame Fr_tar; the second is the mean of the features of all corresponding superpixels in frame Fr_tar-1; the third is the mean of the features of all corresponding superpixels in frame Fr_tar+1. The feature of each superpixel is computed according to the method of section S1.3.
When the number of corresponding superpixels of the Fr_tar-frame superpixel Seg_tar in frame Fr_tar+1 or Fr_tar-1 is 0, its own feature is used in place of the corresponding mean feature.
S2.2 superpixel spatial context
The image is superpixel-segmented by the method of section S1.1, and fig. 2 shows the superpixel hierarchical segmentation tree obtained with several boundary decision thresholds. When the threshold of the superpixel hierarchical segmentation tree is set to 1, the highest-level superpixel segmentation map is obtained, i.e. the root node of the hierarchical segmentation tree, which represents the whole image as one superpixel; setting the threshold to 0.06 gives a lower-level superpixel segmentation result; when the threshold is 0.08, the boundary decision criterion is raised, so that pixels with boundary probability values in [0.06, 0.08] are judged as non-boundary points, whereas these points are judged as boundary points under the 0.06 threshold. Thus the superpixels of a higher level contain the superpixels of the lower level. In this method, the spatial context of a parent-node superpixel is defined as its child-node superpixels in the hierarchical segmentation tree.
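The parent-child containment that defines the spatial context can be sketched as follows, assuming two label maps produced by the 0.06 and 0.08 thresholds:

```python
def spatial_context(labels_fine, labels_coarse):
    """Parent -> children map of a two-level segmentation tree.

    labels_fine / labels_coarse: label maps from the 0.06 and 0.08 thresholds.
    Because every coarse (0.08) superpixel is a union of fine (0.06) ones,
    each coarse label collects the set of fine labels it contains - the
    spatial context used later in S3.2.
    """
    ctx = {}
    for row_f, row_c in zip(labels_fine, labels_coarse):
        for f, c in zip(row_f, row_c):
            ctx.setdefault(c, set()).add(f)
    return ctx
```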
S3 semantic Classification
S3.1 temporal context based superpixel semantic classification
This step inputs the temporal context features of the superpixels, semantically classifies the superpixels with a GBDT (Gradient Boosted Decision Tree), and outputs the predicted labels of the superpixels.
In the GBDT training process, MR training rounds are set, mr ∈ {1, 2, 3, ..., MR}; round mr trains one regression tree (weak classifier) per class, i.e. with L classes, L regression trees are trained per round, l ∈ {1, 2, 3, ..., L}. Finally, L × MR weak classifiers are obtained. The training method is the same for each classifier in each round.
(1) GBDT multi-classifier training
The training set Fea_tr comprises N_Segtr samples:
where the training sample Fea_i is the temporal context feature of the i-th superpixel, whose true label is lab_i, lab_i ∈ {1, 2, 3, ..., L}.
First, round-0 initialization is performed: the prediction function value h_{l,0}(x) of the class-l classifier is set to 0. The true label lab_i is converted to an L-dimensional label vector with components lab_i[k] ∈ {0, 1}: if the true label of the i-th training sample is l, the l-th component lab_i[l] of the label vector is 1 and the other components are 0. The probability that the i-th sample belongs to class l is then computed; I(lab_i = l) is an indicator function whose value is 1 when the label of sample i is l and 0 otherwise.
The prediction result of the round-(mr−1) class-l classifier for the i-th sample is denoted h_{l,(mr−1)}(Fea_i); the classification error of the round-(mr−1) class-l classifier for the i-th sample is defined as equation (23):
When constructing the class-l regression tree of round mr, the training data set Fea_tr is traversed: for each feature dimension of each sample, the value of the par-th feature dimension of the i-th sample is taken as a classification reference value and all samples in the data set Fea_tr are classified; samples whose feature value is larger than the reference value belong to the set {Region_1}, the others to the set {Region_2}. After all samples are classified, the regression tree error is computed according to equation (25):
where N_Region_m denotes the total number of samples falling into Region_m. The feature value minimizing the regression tree error is finally selected as the new split value of the tree. The above process is repeated to grow the regression tree until the set tree height is reached; in this method the regression tree height is set to 5. The regression trees of the other classes in the current round are constructed in the same way.
The number of leaf nodes of the round-mr class-l regression tree is denoted Reg_{mr,l}; each leaf node is a subset of the training sample set, and the intersection of any two leaf nodes is empty. The gain value of each leaf node of the constructed round-mr class-l regression tree is computed as shown in equation (26):
The predicted value h_{l,mr}(Fea_i) of the round-mr class-l regression tree for the i-th sample is computed using equation (27):
where Reg ∈ {1, 2, ..., Reg_{mr,l}}.
The above flow is computed until the MR training rounds are finished. The predicted value h_{l,MR}(Fea_i) of the round-MR class-l regression tree for the i-th sample is expressed as equation (28):
where Reg ∈ {1, 2, ..., Reg_{MR,l}}.
Substituting equation (28) into the class-l regression tree of round MR−2 gives the accumulated prediction result for sample i, equation (29):
By analogy, substituting the accumulated prediction result for the i-th sample into the class-l regression trees from round MR−1 down to round 0 yields equation (30).
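The training loop of S3.1 can be sketched with depth-1 stumps standing in for the height-5 regression trees; the softmax link between scores and probabilities is an assumption, since equation (24) is not reproduced in the text:

```python
from math import exp

def train_gbdt(X, labels, L, MR=20, lr=0.5):
    """One-vs-all gradient-boosting sketch with depth-1 stumps.

    X: list of feature vectors; labels: class ids in {0..L-1}. Each round
    fits, per class, a stump to the residual lab_i[l] - prob_l(i) (the error
    of eq. (23)); leaf means play the role of the leaf gains of eq. (26).
    Class probabilities are a softmax of the scores (assumed form of (24)).
    Returns a predict(x) function giving the argmax class.
    """
    n, d = len(X), len(X[0])
    h = [[0.0] * L for _ in range(n)]          # h_{l,0} = 0 for all samples
    stumps = [[] for _ in range(L)]
    for _ in range(MR):
        probs = []
        for i in range(n):                     # softmax of current scores
            m = max(h[i]); ex = [exp(v - m) for v in h[i]]; s = sum(ex)
            probs.append([v / s for v in ex])
        for l in range(L):
            res = [(1.0 if labels[i] == l else 0.0) - probs[i][l] for i in range(n)]
            best = None
            for par in range(d):               # scan every feature dimension
                for i in range(n):             # every sample value as threshold
                    thr = X[i][par]
                    r1 = [res[j] for j in range(n) if X[j][par] > thr]
                    r2 = [res[j] for j in range(n) if X[j][par] <= thr]
                    g1 = sum(r1) / len(r1) if r1 else 0.0
                    g2 = sum(r2) / len(r2) if r2 else 0.0
                    sse = (sum((r - g1) ** 2 for r in r1)
                           + sum((r - g2) ** 2 for r in r2))
                    if best is None or sse < best[0]:
                        best = (sse, par, thr, g1, g2)
            _, par, thr, g1, g2 = best
            stumps[l].append((par, thr, lr * g1, lr * g2))
            for i in range(n):                 # accumulate scores, eq. (27)/(28)
                h[i][l] += lr * g1 if X[i][par] > thr else lr * g2

    def predict(x):
        scores = []
        for l in range(L):
            s = 0.0
            for par, thr, g1, g2 in stumps[l]:
                s += g1 if x[par] > thr else g2
            scores.append(s)
        return max(range(L), key=lambda l: scores[l])
    return predict
```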
(2) GBDT prediction
Calculating the temporal context feature Fea_Seg of the superpixel Seg, the predicted values h_{l,MR}(Fea_Seg) of the superpixel Seg belonging to the different classes are calculated using equation (30); then the probability values prob_{l,MR}(Fea_Seg) of the superpixel Seg belonging to the different classes are calculated by equation (24). The class l with the highest probability value is the predicted class of the superpixel Seg.
S3.2 optimizing semantic classification based on spatial context
When this method performs superpixel segmentation on an image, two boundary decision thresholds, 0.06 and 0.08, are set, yielding a hierarchical segmentation tree of height 2, as shown in fig. 3.
In this method, the semantic labels of the superpixels determined by the 0.08 threshold are taken as the optimization target, and the superpixels determined by the 0.06 segmentation threshold are taken as the spatial context used to optimize the semantic labeling result.
First, according to the method of S3.1, each superpixel corresponding to the leaf nodes and intermediate nodes in fig. 3 is semantically classified to obtain the semantic labeling probabilities of every superpixel in the segmentation maps under the 0.06 and 0.08 thresholds, and the final semantic label of each superpixel block is calculated by equation (31).
In equation (31), l denotes the final semantic label of the superpixel block, i.e. the class with the maximum probability value computed by equation (31); one term denotes the probability that the a-th superpixel in the set of 0.06-threshold superpixels contained in the 0.08-threshold superpixel carries semantic label l, and another the probability that the 0.08-threshold superpixel carries semantic label l. Naux denotes the number of 0.06-threshold superpixels contained in the 0.08-threshold superpixel; w_aux is the confidence of the 0.06-threshold superpixel semantic labels, set to 0.4 in this method; w_target is the confidence of the 0.08-threshold superpixel semantic labels, set to 0.6 in this method.
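Equation (31) itself is not reproduced in the text; the weighted combination below (target probability weighted by w_target = 0.6, the mean of the Naux auxiliary probabilities by w_aux = 0.4) is an assumed reading of it:

```python
def fuse_label(prob_target, probs_aux, w_target=0.6, w_aux=0.4):
    """Spatial-context label fusion in the spirit of eq. (31).

    prob_target: per-class probabilities of the 0.08-threshold superpixel;
    probs_aux: per-class probability lists of the Naux contained
    0.06-threshold superpixels. Returns the fused argmax class id.
    """
    L = len(prob_target)
    naux = len(probs_aux)
    fused = []
    for l in range(L):
        # average the auxiliary (0.06-threshold) probabilities over Naux
        aux = sum(p[l] for p in probs_aux) / naux if naux else 0.0
        fused.append(w_target * prob_target[l] + w_aux * aux)
    return max(range(L), key=lambda l: fused[l])
```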
Table 1 compares, on 13-class semantic experiments on the NYUv2 dataset, the class-average accuracy of this method with that of other RGB-D indoor scene labeling methods based on hand-crafted features.
TABLE 1
[1] C. Couprie, C. Farabet, L. Najman and Y. LeCun. Indoor semantic segmentation using depth information. In ICLR, 2013.
[2] A. Hermans, G. Floros, and B. Leibe. Dense 3D semantic mapping of indoor scenes from RGB-D images. In ICRA, 2014.
[3] A. Wang, J. Lu, J. Cai, G. Wang, and T.-J. Cham. Unsupervised joint feature learning and encoding for RGB-D scene labeling. IEEE Transactions on Image Processing (TIP), 2015.
[4] J. Wang, Z. Wang, D. Tao, S. See and G. Wang. Learning common and specific features for RGB-D semantic segmentation with deconvolutional networks. In ECCV, 2016.
Claims (2)
1. An RGB-D indoor scene labeling method based on superpixel spatio-temporal context, characterized by: inputting an image Fr_tar to be labeled and its temporally preceding and following adjacent frames Fr_tar-1 and Fr_tar+1, and outputting a pixel-level labeling of Fr_tar;
computing, based on an optical flow algorithm, for each superpixel of the image Fr_tar to be labeled, the corresponding superpixels in the temporally adjacent frames Fr_tar-1 and Fr_tar+1, i.e. its temporal context; superpixel-segmenting the image using the gPb/UCM algorithm and organizing the segmentation results into a segmentation tree according to thresholds, the child nodes of each superpixel of Fr_tar in the segmentation tree being its spatial context;
constructing a temporal-context-based feature representation of each superpixel of Fr_tar and classifying the superpixels from their temporal-context features with a gradient boosted tree; weighting and combining, by means of the superpixel spatial context, the semantic classification results of each superpixel and of its spatial context to obtain the semantic labeling of the superpixels in Fr_tar;
s1 super pixel
In the field of computer vision, the process of subdividing a digital image into a plurality of image sub-regions is called superpixel segmentation; a superpixel is a region formed by a series of pixels with adjacent positions and similar color, brightness and texture characteristics; such a region retains local effective information and does not destroy the boundary information of objects in the image;
s1.1 superpixel segmentation of images
superpixel segmentation uses the gPb/UCM algorithm to compute, from the local and global features of the image, the probability value that each pixel belongs to a boundary; the gPb/UCM algorithm is applied to the color image and to the depth image respectively, and the combined boundary probability is computed according to equation (1); in equation (1), the first term is the boundary probability of a pixel computed from the color image, and the second term is the boundary probability of a pixel computed from the depth image;
according to the probability values obtained by equation (1), different probability thresholds tr are set to obtain a multi-level segmentation result;
the different probability thresholds tr are set to 0.06 and 0.08 respectively; pixels whose probability values are smaller than the set probability threshold are connected into regions according to the eight-connectivity principle, each such region being a superpixel;
s1.2 Patch feature
a Patch is defined as an h × h grid that slides rightward and downward from the upper-left corner of the color image and the depth image in steps of hs pixels, eventually forming a dense grid over both images; the Patch size is 16 × 16 and the sliding step hs is 2; for an image of size N × M, the number of Patches finally obtained follows accordingly; four types of features are computed for each Patch: depth gradient features, color gradient features, color features, and texture features;
s1.3 superpixel features
the superpixel feature F_seg is defined as equation (6):
the components denote the superpixel depth gradient feature, color gradient feature, color feature and texture feature respectively, defined as equation (7):
in equation (7), F_g_d(q1), F_g_c(q1), F_col(q1), F_tex(q1) denote the features of the q1-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg;
the components in equation (8) are defined as follows:
the superpixel area A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg; the superpixel perimeter P_seg is obtained from B_seg and defined as equation (9):
in equation (9), M and N denote the horizontal and vertical resolutions of the RGB scene image respectively; seg and seg′ denote different superpixels; N_4(s) is the set of four-neighbors of pixel s; B_seg is the set of boundary pixels of the superpixel seg;
the area-to-perimeter ratio R_seg of the superpixel is defined as equation (10):
the second-order Hu moments are computed from the x-coordinate s_x of pixel s, the y-coordinate s_y, and the product of the x- and y-coordinates respectively, as defined in equations (11), (12) and (13):
the means used above denote the mean of the x-coordinates, the mean of the y-coordinates, the square of the mean of the x-coordinates and the square of the mean of the y-coordinates of the pixels contained in the superpixel, and are defined as equation (14):
Width and Height denote the image width and height respectively, i.e. the calculation is performed based on the normalized pixel coordinate values;
D_var involves the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined as equation (15):
D_miss, the proportion of pixels in the superpixel that lack depth information, is defined as equation (16):
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where the principal normal vector of the point cloud corresponding to the superpixel is estimated by Principal Component Analysis (PCA);
s2 superpixel context
Respectively constructing a time context and a space context based on the RGB-D image sequence time sequence relation and a tree structure of super-pixel segmentation;
s2.1 superpixel temporal context
S2.1.1 interframe optical flow calculation
Defining the optical flow obtained by calculating from the target frame to the reference frame as a forward optical flow, and defining the optical flow obtained by calculating from the reference frame to the target frame as a reverse optical flow;
(1) initial optical flow estimation
the inter-frame initial optical flow estimation adopts the SimpleFlow method; for two frames Fr_tar and Fr_tar+1, (x, y) denotes a pixel of Fr_tar and (u(x, y), v(x, y)) denotes the optical flow vector at (x, y); defining image Fr_tar as the target frame and image Fr_tar+1 as the reference frame, the forward optical flow from Fr_tar to Fr_tar+1 is the set of optical flow vectors of all pixels of Fr_tar, i.e. {(u(x, y), v(x, y)) | (x, y) ∈ Fr_tar}; abbreviating u(x, y) and v(x, y) as u and v respectively, the pixel of Fr_tar+1 corresponding to pixel (x, y) of Fr_tar under the optical flow is (x + u, y + v);
first, the forward optical flow from image Fr_tar to image Fr_tar+1 is computed; for a pixel (x_0, y_0) of Fr_tar, a window W_1 of size b × b centered on it is taken, where b = 10; an arbitrary point (p, q) in W_1 has the corresponding pixel (p + u, q + v) in Fr_tar+1, and the energy term e is computed at all points of the window W_1, as in equation (17):
e(p,q,u,v)=||Inttar(p,q)-Inttar+1(p+u,q+v)||2 (17)
Wherein (p, q) ∈ W1,Inttar(p, q) represents FrtarMiddle pixel point (p)Q) pixel point color information, Inttar+1(p + u, q + v) represents Frtar+1The color information of the pixel points of the middle pixel point (p + u, q + v) is calculated for each pair of points in the window in sequence to obtain b2A vector e of dimensions;
Then, the optical flow vector is optimized with a local smooth likelihood model that combines the color feature and the local distance feature, as shown in formula (18):
In formula (18), E(x_0, y_0, u, v) is the local region energy, representing the energy of the forward optical flow vector (u, v) at pixel (x_0, y_0) of the image Fr_tar; it is the weighted accumulation of the energy terms e of all pixels in the window W_1 centered at (x_0, y_0). O = 20 denotes the range of variation of the optical flow vector (u, v). The distance weight w_d and the color weight w_c are determined by the distance difference and the color difference between the pixel (x_0, y_0) and the corresponding point (x_0 + u, y_0 + v) calculated from the optical flow (u, v); the color parameter is set to σ_c = 0.08 and the distance parameter to σ_d = 5.5. The (u, v) that minimizes the energy E is taken as the estimate of the optical flow vector of pixel (x_0, y_0); calculating the optical flow vectors of all pixels of the image Fr_tar yields the forward optical flow from the image Fr_tar to the image Fr_tar+1;
Likewise, the backward optical flow from Fr_tar+1 to Fr_tar is calculated;
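A minimal sketch of the local-window energy search of formulas (17)-(18), for a single pixel on grayscale images. The bilateral weights here follow the standard SimpleFlow form (spatial distance to the window center, color difference from the center), since the patent's exact weight expressions are not reproduced; σ_c and σ_d take the patent's values.

```python
import numpy as np

def local_flow(tar, ref, x0, y0, b=3, O=2, sigma_c=0.08, sigma_d=5.5):
    """Exhaustively test displacements (u, v) in [-O, O]^2 and keep the
    one minimising the weighted sum over a b x b window of the colour
    energies e of formula (17).  Grayscale floats for brevity."""
    r = b // 2
    best_e, best_uv = np.inf, (0, 0)
    for u in range(-O, O + 1):
        for v in range(-O, O + 1):
            E = 0.0
            for p in range(x0 - r, x0 + r + 1):
                for q in range(y0 - r, y0 + r + 1):
                    e = (tar[p, q] - ref[p + u, q + v]) ** 2          # (17)
                    wd = np.exp(-((p - x0) ** 2 + (q - y0) ** 2)
                                / (2 * sigma_d ** 2))                 # distance weight
                    wc = np.exp(-(tar[x0, y0] - tar[p, q]) ** 2
                                / (2 * sigma_c ** 2))                 # colour weight
                    E += wd * wc * e                                  # weighted sum (18)
            if E < best_e:
                best_e, best_uv = E, (u, v)
    return best_uv
```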
(2) occlusion point detection
Denote the forward optical flow from the image Fr_tar to the image Fr_tar+1 as {(u_f(x), v_f(y)) | (x, y) ∈ Fr_tar}, and the backward optical flow from the image Fr_tar+1 to the image Fr_tar as {(u_b(x′), v_b(y′)) | (x′, y′) ∈ Fr_tar+1}. For pixel (x, y), calculate ||(u_f(x), v_f(y)) − (−u_b(x + u_f(x)), −v_b(y + v_f(y)))||; if this value is not 0, the pixel (x, y) is considered an occlusion point;
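The forward-backward consistency test above can be sketched as follows; `fwd` and `bwd` are dense flow fields of shape (H, W, 2), and rounding the warped coordinate to the nearest pixel is an implementation assumption:

```python
import numpy as np

def is_occluded(fwd, bwd, x, y, eps=0.0):
    """Follow the forward flow to the reference frame, read the backward
    flow at the corresponding pixel, and flag (x, y) as occluded when
    the two vectors do not cancel each other out."""
    u_f, v_f = fwd[x, y]
    x2, y2 = int(round(x + u_f)), int(round(y + v_f))   # corresponding pixel
    u_b, v_b = bwd[x2, y2]
    diff = np.hypot(u_f + u_b, v_f + v_b)   # ||(uf,vf) - (-ub,-vb)||
    return diff > eps
```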
(3) reestimation of occlusion point light flow
For a pixel (x_0, y_0) marked as an occlusion point, the optical flow energy is re-estimated using equation (19), denoted E_b(x_0, y_0, u, v):
In the formula (19), the compound represented by the formula (I),denotes FrtarPixel point (x)0,y0) The average value of energy items e corresponding to different optical flow estimated values;denotes FrtarPixel point (x)0,y0) The minimum value of the corresponding energy term e is measured by the different optical flow estimation values; w is ar(x0,y0) The difference between the energy term e mean value and the minimum energy term e value is used for marking the pixel point (x) marked as shielding0,y0) Let EbThe smallest (u, v) is the pixel (x)0,y0) An optical flow vector of (d);
For pixels marked as occlusion points, the optical flow vector re-estimated in step (3) is adopted as the final optical flow vector;
S2.1.2 superpixel temporal context and its feature representation
Superpixel segmentation is performed on the images Fr_tar, Fr_tar-1 and Fr_tar+1 using the superpixel segmentation method of S1.1;
(1) superpixel temporal context
First, according to the forward optical flow from Fr_tar to Fr_tar+1, calculate the mean of the forward optical flows {(u_f(x), v_f(y)) | (x, y) ∈ Seg_tar} of all pixels {(x, y) | (x, y) ∈ Seg_tar} contained in the superpixel Seg_tar of Fr_tar, as shown in equation (20):
In formula (20), Num(Seg_tar) denotes the number of pixels contained in the superpixel Seg_tar. According to the forward optical flow mean, the pixels of superpixel Seg_tar are mapped to their corresponding pixels in Fr_tar+1, yielding the region Seg′_tar = {(x′, y′) | x′ = x + u_f(x), y′ = y + v_f(y), (x, y) ∈ Seg_tar, (x′, y′) ∈ Fr_tar+1}, called the corresponding region of superpixel Seg_tar in Fr_tar+1. The intersection-over-union IOU between Seg′_tar and the ith superpixel of Fr_tar+1 is calculated as in equation (21):
In formula (21), Num(·) denotes the number of pixels a region contains. If the IOU is greater than or equal to τ, then, according to the backward optical flow from Fr_tar+1 to Fr_tar, the corresponding region Seg″_tar of that superpixel of Fr_tar+1 in Fr_tar is computed, and the intersection-over-union IOU(Seg″_tar, Seg_tar) between the region Seg″_tar and the superpixel Seg_tar is calculated according to equation (21). If IOU(Seg″_tar, Seg_tar) is greater than or equal to τ, that superpixel of Fr_tar+1 is called a corresponding superpixel of Seg_tar in Fr_tar+1; the number of corresponding superpixels of Seg_tar in Fr_tar+1 may be 0, 1 or more. The intersection-over-union decision threshold is set to τ = 0.3. The corresponding superpixels of Seg_tar in Fr_tar-1 are found in the same way, and their number may likewise be 0, 1 or more;
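The two-way IoU matching can be sketched as below; regions are pixel sets, and `warp_back` is a hypothetical callable standing in for warping a next-frame region into the target frame via the backward optical flow:

```python
def iou(region_a, region_b):
    """Intersection-over-union of two pixel sets (formula (21))."""
    a, b = set(region_a), set(region_b)
    return len(a & b) / len(a | b)

def corresponding_superpixels(seg_warp, next_superpixels, seg, warp_back, tau=0.3):
    """Two-way test of S2.1.2(1): a next-frame superpixel corresponds to
    `seg` when the forward-warped region `seg_warp` overlaps it with
    IoU >= tau AND its backward-warped region overlaps `seg` with
    IoU >= tau.  Zero, one, or several matches may be returned."""
    matches = []
    for sp in next_superpixels:
        if iou(seg_warp, sp) >= tau and iou(warp_back(sp), seg) >= tau:
            matches.append(sp)
    return matches
```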
The temporal context of superpixel Seg_tar is denoted Segs_tar, consisting of Seg_tar itself together with the sets of superpixels in Fr_tar-1 and Fr_tar+1 that correspond to the superpixel Seg_tar of frame Fr_tar;
(2) superpixel temporal context semantic feature representation
The semantic feature of the superpixel temporal context Segs_tar is defined as shown in formula (22):
where the first term is the feature of the superpixel Seg_tar of Fr_tar, the second is the mean of the features of all corresponding superpixels in Fr_tar-1, and the third is the mean of the features of all corresponding superpixels in Fr_tar+1; the feature of each superpixel is computed according to the method of S1.3;
When the number of corresponding superpixels of the superpixel Seg_tar of Fr_tar in Fr_tar+1 or Fr_tar-1 is 0, the superpixel's own feature is used in place of the missing mean feature;
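A sketch of the temporal context feature of formula (22), under the assumption that the three parts are concatenated (the printed form of (22) is not reproduced in the text); the empty-set fallback follows S2.1.2(2):

```python
import numpy as np

def temporal_context_feature(fea_tar, prev_feas, next_feas):
    """Concatenate the superpixel's own feature with the mean feature of
    its corresponding superpixels in the previous and next frames; an
    empty correspondence set falls back to the superpixel's own feature."""
    prev = np.mean(prev_feas, axis=0) if len(prev_feas) else fea_tar
    nxt = np.mean(next_feas, axis=0) if len(next_feas) else fea_tar
    return np.concatenate([prev, fea_tar, nxt])
```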
S2.2 superpixel spatial context
Superpixel segmentation is performed on the image using the method of S1.1. When the threshold of the superpixel hierarchical segmentation tree is set to 1, the highest-level superpixel segmentation map is obtained, i.e. the root node of the hierarchical segmentation tree, which represents the whole image as one superpixel. Setting the threshold to 0.06 yields a lower-level superpixel segmentation result. When the threshold is 0.08, the boundary decision criterion is raised, so that pixels whose boundary probability value lies in [0.06, 0.08] are judged as non-boundary points, whereas these points are judged as boundary points when the threshold is 0.06. A high-level superpixel therefore contains low-level superpixels. The spatial context of a child-node superpixel is defined as the superpixel of its parent node in the hierarchical partition tree;
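The parent lookup that defines the spatial context in the 2-level tree can be sketched as follows; representing each segmentation as a per-pixel label map is an assumed implementation detail:

```python
def spatial_context(fine_labels, coarse_labels):
    """Parent lookup in the 2-level hierarchical partition tree: each
    fine superpixel (threshold 0.06) lies inside exactly one coarse
    superpixel (threshold 0.08), which is its spatial context.
    Labels are given per pixel as {(x, y): superpixel id}."""
    parent_of = {}
    for pix, child in fine_labels.items():
        parent_of.setdefault(child, coarse_labels[pix])
    return parent_of
```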
S3 semantic classification
S3.1 temporal context based superpixel semantic classification
The temporal context features of the superpixels are taken as input, superpixel semantic classification is performed using GBDT, and the predicted labels of the superpixels are output;
In the GBDT training process, MR training rounds are set, mr ∈ {1, 2, 3, ..., MR}. In each round, one regression tree, i.e. one weak classifier, is trained for each class: with L classes, L regression trees are trained per round, l ∈ {1, 2, 3, ..., L}. Finally, L × MR weak classifiers are obtained. The training method of each classifier is the same in every round;
(1) GBDT multi-classifier training
The training set Fea_tr comprises NSeg_tr samples:
where the training sample Fea_i is the temporal context feature of the ith superpixel, whose true label is lab_i, lab_i ∈ {1, 2, 3, ..., L};
First, the round-0 initialization is performed: the prediction function value h_l,0(x) of the class-l classifier is set to 0. The true label lab_i is converted to an L-dimensional label vector with components lab_i[k] ∈ {0, 1}: if the true label of the ith training sample is l, the lth component lab_i[l] = 1 and the other components are 0. The probability that the ith sample belongs to class l is then calculated; I(lab_i = l) is an indicator function whose value is 1 when the label of sample i is l and 0 otherwise;
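The initialization step can be sketched as below; the softmax link is a standard choice for the per-class probability of formula (24), whose printed form is not reproduced in the text:

```python
import numpy as np

def class_probability(h):
    """Softmax over the L prediction function values h_l(x), a common
    multiclass gradient-boosting link standing in for formula (24)."""
    e = np.exp(h - np.max(h))       # shift for numerical stability
    return e / e.sum()

def label_vector(lab, L):
    """One-hot L-dimensional label vector lab_i of S3.1(1);
    classes are numbered 1..L."""
    vec = np.zeros(L)
    vec[lab - 1] = 1
    return vec
```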
The prediction result of the class-l classifier of round mr−1 on the ith sample is denoted h_l,(mr-1)(Fea_i); the classification error of the class-l classifier of round mr−1 on the ith sample is defined in formula (23):
When constructing the class-l classifier of round mr, each feature dimension of each sample in the training set Fea_tr is traversed. Taking the value of the par-th feature dimension of the ith sample as the classification reference value, all samples in the training set Fea_tr are classified: samples whose feature value is greater than the reference value belong to the set {Region_1}, the others to the set {Region_2}. After all samples are classified, the error of the regression tree is calculated according to formula (25):
where m = 1, 2, and N_Region_m denotes the total number of samples falling into Region_m. Finally, the feature value that minimizes the regression tree error is selected as the new split value of the tree. Regression trees are built in this way repeatedly until the set tree height is reached; the regression tree height is 5. The regression trees of the other classes in the current round are constructed by the same method;
The number of leaf nodes of the class-l regression tree in round mr is denoted Reg_mr,l. Each leaf node is a subset of the training sample set, and the intersection of any two leaf nodes is the empty set. The gain value of each leaf node of the constructed class-l regression tree of round mr is calculated as shown in formula (26):
The predicted value h_l,mr(Fea_i) of the class-l regression tree of round mr for the ith sample is calculated using formula (27):
where reg ∈ {1, 2, ..., Reg_mr,l};
This is repeated until the MR training rounds are finished. The predicted value h_l,MR(Fea_i) of the class-l regression tree of round MR for the ith sample is expressed as (28):
where reg ∈ {1, 2, ..., Reg_MR,l};
Substituting formula (28) into the class-l regression tree of round MR−2 gives the prediction result for the ith sample, yielding formula (29):
By analogy, substituting the prediction result of the ith sample into the class-l regression trees from round MR−1 down to round 0 yields formula (30):
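The training and prediction loop of S3.1 can be sketched with a toy multiclass GBDT; depth-1 stumps stand in for the patent's height-5 regression trees, and region means approximate the leaf gains of formulas (26)-(27):

```python
import numpy as np

def softmax(h):
    e = np.exp(h - h.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def best_stump(X, r):
    """One-split regression tree: pick the (feature, threshold) whose two
    regions' means minimise the squared error to the residual r (in the
    spirit of formula (25)); return (par, thr, gamma_1, gamma_2)."""
    best = (np.inf, 0, 0.0, 0.0, 0.0)
    for par in range(X.shape[1]):                 # traverse feature dims
        for thr in np.unique(X[:, par]):          # candidate split values
            m = X[:, par] > thr
            if m.all() or not m.any():
                continue
            g1, g2 = r[m].mean(), r[~m].mean()
            err = ((r[m] - g1) ** 2).sum() + ((r[~m] - g2) ** 2).sum()
            if err < best[0]:
                best = (err, par, thr, g1, g2)
    return best[1:]

def fit_gbdt(X, y, L, MR=30, lr=0.5):
    """MR rounds, one stump per class per round, each fitted to the
    classification error lab_i[l] - prob_l (formula (23))."""
    Y = np.eye(L)[y]                      # one-hot label vectors
    h = np.zeros((len(X), L))             # h_{l,0} = 0
    model = []
    for _ in range(MR):
        resid = Y - softmax(h)            # per-class pseudo-residuals
        layer = []
        for l in range(L):
            par, thr, g1, g2 = best_stump(X, resid[:, l])
            h[:, l] += lr * np.where(X[:, par] > thr, g1, g2)
            layer.append((par, thr, g1, g2))
        model.append(layer)
    return model

def predict(model, X, L, lr=0.5):
    """Accumulate all L x MR stumps (in the spirit of formula (30)) and
    take the class with the highest softmax probability (formula (24))."""
    h = np.zeros((len(X), L))
    for layer in model:
        for l, (par, thr, g1, g2) in enumerate(layer):
            h[:, l] += lr * np.where(X[:, par] > thr, g1, g2)
    return softmax(h).argmax(axis=1)
```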
(2) GBDT prediction
The temporal context feature Fea_Seg of the superpixel Seg is computed; the predicted values h_l,MR(Fea_Seg) of the superpixel Seg belonging to the different classes are calculated using formula (30), and then the probability values prob_l,MR(Fea_Seg) of Seg belonging to the different classes are calculated by formula (24). The class l with the highest probability value is the predicted class of the superpixel Seg;
S3.2 optimizing semantic classification based on spatial context
When the image is segmented into superpixels, two boundary decision thresholds, 0.06 and 0.08, are set, yielding a hierarchical segmentation tree of height 2;
The semantic labels of the superpixels determined by the threshold 0.08 are taken as the optimization target, and the superpixels determined by the threshold-0.06 segmentation are taken as the spatial context, used to optimize the semantic labeling result;
First, according to the method of S3.1, each superpixel corresponding to the leaf nodes and intermediate nodes is semantically classified, obtaining the semantic labeling probabilities of every superpixel in the superpixel segmentation maps under the thresholds 0.06 and 0.08; the final semantic label of each superpixel block is then calculated through formula (31);
where l* denotes the final semantic label of the superpixel block, i.e. the class with the maximum probability value calculated by equation (31); the first probability term in (31) represents the probability that the a-th superpixel in the threshold-0.06 superpixel set contained in the threshold-0.08 superpixel has semantic label l, and the second the probability that the threshold-0.08 superpixel has semantic label l; Naux represents the number of threshold-0.06 superpixels contained in the threshold-0.08 superpixel; w_aux, the confidence of the threshold-0.06 superpixel semantic labels, is taken as 0.4; w_target, the confidence of the threshold-0.08 superpixel semantic label, is 0.6.
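The fusion of formula (31) can be sketched as below; averaging the contained superpixels' probability vectors is an assumption about its exact printed form, while the weights take the patent's values:

```python
import numpy as np

def fuse_labels(prob_target, prob_aux, w_target=0.6, w_aux=0.4):
    """Fuse the class probabilities of a threshold-0.08 superpixel
    (prob_target, length L) with those of the Naux threshold-0.06
    superpixels it contains (prob_aux, Naux x L); the final label is
    the class maximising the weighted score.  Classes numbered 1..L."""
    prob_aux = np.asarray(prob_aux)
    score = w_target * np.asarray(prob_target) + w_aux * prob_aux.mean(axis=0)
    return int(score.argmax()) + 1
```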
2. The RGB-D indoor scene labeling method based on superpixel spatiotemporal context as claimed in claim 1, characterized in that the S1.2 Patch features are implemented as follows:
S1.2.1 depth gradient feature
A Patch in the depth image is denoted Z_d. For each Z_d, the depth gradient feature F_g_d is computed, where the value of the tth component is defined by equation (2):
In formula (2), z ∈ Z_d represents the relative two-dimensional coordinate position of pixel z in the depth Patch; the first pair of terms respectively represent the depth gradient direction and the gradient magnitude of pixel z; the depth gradient basis vectors and the position basis vectors are two groups of predefined values; d_g and d_s respectively represent the number of depth gradient basis vectors and the number of position basis vectors; the mapping coefficient of the tth principal component is obtained by applying kernel principal component analysis (KPCA); ⊗ represents the Kronecker product; the depth gradient Gaussian kernel function and the position Gaussian kernel function each have their corresponding parameters. Finally, the depth gradient feature is transformed by the EMK algorithm, and the transformed feature vector is still denoted F_g_d;
S1.2.2 color gradient feature
A Patch in the color image is denoted Z_c. For each Z_c, the color gradient feature F_g_c is computed, where the value of the tth component is defined by equation (3):
In formula (3), z ∈ Z_c represents the relative two-dimensional coordinate position of pixel z in the color image Patch; the first pair of terms respectively represent the gradient direction and the gradient magnitude of pixel z; the color gradient basis vectors and the position basis vectors are two groups of predefined values; c_g and c_s respectively represent the number of color gradient basis vectors and the number of position basis vectors; the mapping coefficient of the tth principal component is obtained by applying kernel principal component analysis (KPCA); ⊗ represents the Kronecker product; the color gradient Gaussian kernel function and the position Gaussian kernel function each have their corresponding parameters. Finally, the color gradient feature is transformed by the EMK algorithm, and the transformed feature vector is still denoted F_g_c;
S1.2.3 color feature
A Patch in the color image is denoted Z_c. For each Z_c, the color feature F_col is computed, where the value of the tth component is defined by equation (4):
In formula (4), z ∈ Z_c represents the relative two-dimensional coordinate position of pixel z in the color image Patch; r(z) is a three-dimensional vector, the RGB value of pixel z; the color basis vectors and the position basis vectors are two groups of predefined values; c_c and c_s respectively represent the number of color basis vectors and the number of position basis vectors; the mapping coefficient of the tth principal component is obtained by applying kernel principal component analysis (KPCA); ⊗ represents the Kronecker product; the color Gaussian kernel function and the position Gaussian kernel function each have their corresponding parameters. Finally, the color feature is transformed by the EMK algorithm, and the transformed feature vector is still denoted F_col;
S1.2.4 texture feature
First, the RGB scene image is converted into a grayscale image; a Patch in the grayscale image is denoted Z_g. For each Z_g, the texture feature F_tex is computed, where the value of the tth component is defined by equation (5):
In formula (5), z ∈ Z_g represents the relative two-dimensional coordinate position of pixel z in the grayscale Patch; s(z) represents the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; lbp(z) is the local binary pattern feature of pixel z; the local binary pattern basis vectors and the position basis vectors are two groups of predefined values; g_b and g_s respectively represent the number of local binary pattern basis vectors and the number of position basis vectors; the mapping coefficient of the tth principal component is obtained by applying kernel principal component analysis (KPCA); ⊗ represents the Kronecker product; the local binary pattern Gaussian kernel function and the position Gaussian kernel function each have their corresponding parameters. Finally, the texture feature is transformed by the EMK algorithm, and the transformed feature vector is still denoted F_tex.
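The two per-pixel quantities of formula (5), s(z) and lbp(z), can be sketched as follows (the 8-neighbour bit ordering of lbp is an implementation assumption):

```python
import numpy as np

def lbp(img, x, y):
    """8-neighbour local binary pattern of pixel (x, y): each neighbour
    contributes one bit, set when its intensity is >= the centre's."""
    c = img[x, y]
    nbrs = [img[x - 1, y - 1], img[x - 1, y], img[x - 1, y + 1],
            img[x, y + 1], img[x + 1, y + 1], img[x + 1, y],
            img[x + 1, y - 1], img[x, y - 1]]
    return sum((1 << i) for i, n in enumerate(nbrs) if n >= c)

def local_std(img, x, y):
    """s(z): standard deviation of the grey values in the 3 x 3
    neighbourhood centred on pixel (x, y)."""
    return float(np.std(img[x - 1:x + 2, y - 1:y + 2]))
```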
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910174110.2A CN109829449B (en) | 2019-03-08 | 2019-03-08 | RGB-D indoor scene labeling method based on super-pixel space-time context |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109829449A CN109829449A (en) | 2019-05-31 |
CN109829449B true CN109829449B (en) | 2021-09-14 |
Family
ID=66865700
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428504B (en) * | 2019-07-12 | 2023-06-27 | 北京旷视科技有限公司 | Text image synthesis method, apparatus, computer device and storage medium |
CN110517270B (en) * | 2019-07-16 | 2022-04-12 | 北京工业大学 | Indoor scene semantic segmentation method based on super-pixel depth network |
CN110599517A (en) * | 2019-08-30 | 2019-12-20 | 广东工业大学 | Target feature description method based on local feature and global HSV feature combination |
CN110751153B (en) * | 2019-09-19 | 2023-08-01 | 北京工业大学 | Semantic annotation method for indoor scene RGB-D image |
CN111104984B (en) * | 2019-12-23 | 2023-07-25 | 东软集团股份有限公司 | Method, device and equipment for classifying CT (computed tomography) images |
CN111292341B (en) * | 2020-02-03 | 2023-01-03 | 北京海天瑞声科技股份有限公司 | Image annotation method, image annotation device and computer storage medium |
CN111611919B (en) * | 2020-05-20 | 2022-08-16 | 西安交通大学苏州研究院 | Road scene layout analysis method based on structured learning |
CN113034378B (en) * | 2020-12-30 | 2022-12-27 | 香港理工大学深圳研究院 | Method for distinguishing electric automobile from fuel automobile |
CN113570530B (en) * | 2021-06-10 | 2024-04-16 | 北京旷视科技有限公司 | Image fusion method, device, computer readable storage medium and electronic equipment |
CN115118948B (en) * | 2022-06-20 | 2024-04-05 | 北京华录新媒信息技术有限公司 | Repairing method and device for irregular shielding in panoramic video |
CN115952312B (en) * | 2022-12-02 | 2024-07-19 | 北京工业大学 | Automatic labeling and sorting method for image labels |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809187A (en) * | 2015-04-20 | 2015-07-29 | 南京邮电大学 | Indoor scene semantic annotation method based on RGB-D data |
CN107292253A (en) * | 2017-06-09 | 2017-10-24 | 西安交通大学 | A kind of visible detection method in road driving region |
CN107944428A (en) * | 2017-12-15 | 2018-04-20 | 北京工业大学 | A kind of indoor scene semanteme marking method based on super-pixel collection |
CN109389605A (en) * | 2018-09-30 | 2019-02-26 | 宁波工程学院 | Dividing method is cooperateed with based on prospect background estimation and the associated image of stepped zone |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107644429B (en) * | 2017-09-30 | 2020-05-19 | 华中科技大学 | Video segmentation method based on strong target constraint video saliency |
Non-Patent Citations (4)
Title |
---|
Jerome H. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine", The Annals of Statistics, vol. 29, no. 5, pp. 1189-1232, 2001 *
Yang He et al., "STD2P: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven Pooling", 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 7158-7167, 2017 *
Xiaohui Huang et al., "Supervoxel-based segmentation of 3D imagery with optical flow integration for spatiotemporal processing", IPSJ Transactions on Computer Vision and Applications, pp. 1-16, 2018 *
Li Xuejun et al., "Unsupervised video segmentation algorithm fusing spatio-temporal multi-feature representation", Journal of Computer Applications, vol. 31, no. 11, pp. 3134-3138, 3151, 2017 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||