CN110096961A - Superpixel-level indoor scene semantic annotation method - Google Patents

Superpixel-level indoor scene semantic annotation method Download PDF

Info

Publication number
CN110096961A
CN110096961A CN201910269599.1A
Authority
CN
China
Prior art keywords
pixel
super
superpixel
seg
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910269599.1A
Other languages
Chinese (zh)
Other versions
CN110096961B (en)
Inventor
王立春
陆建霖
王少帆
孔德慧
李敬华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910269599.1A priority Critical patent/CN110096961B/en
Publication of CN110096961A publication Critical patent/CN110096961A/en
Application granted granted Critical
Publication of CN110096961B publication Critical patent/CN110096961B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36Indoor scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a superpixel-level indoor scene semantic annotation method, which avoids the huge computation cost of applying deep networks to pixel-level indoor scene annotation and enables a deep network to accept a set of superpixels as input. The superpixel-level indoor scene semantic annotation method comprises the following steps: (1) perform superpixel segmentation on the indoor scene color image using the simple linear iterative clustering (SLIC) segmentation algorithm; (2) combine the superpixels obtained in step (1) with the indoor scene depth image to extract superpixel kernel descriptor features (primary features); (3) construct the neighborhood of each superpixel; (4) construct the superpixel deep network SuperPixelNet and learn multi-modal superpixel features; for each superpixel to be annotated, combine its multi-modal features with those of its neighborhood superpixels to produce the superpixel-level semantic annotation of the indoor scene RGB-D image.

Description

Indoor scene semantic annotation method at super-pixel level
Technical Field
The invention relates to the technical field of multimedia technology and computer graphics, in particular to a super-pixel level indoor scene semantic annotation method.
Background
Indoor scene semantic annotation, a necessary task in computer vision research, has long been a research hotspot and a difficulty in the field of image processing. Compared with outdoor scenes, indoor scenes have the following characteristics: 1. the variety of objects is complex; 2. occlusion between objects is more severe; 3. scenes differ greatly from one another; 4. illumination is non-uniform; 5. strongly discriminative features are lacking. Therefore, indoor scenes are more difficult to label than outdoor scenes. Semantic annotation of indoor scenes is the core of indoor scene understanding and has wide applications in fields such as service and fire fighting, for example mobile positioning and environment interaction of robots, and event detection in security monitoring.
Scene semantic labeling (or scene semantic segmentation) labels each pixel in an image with the object class to which it belongs. It is a challenging task because it combines the traditional problems of detection, segmentation, and multi-label recognition in a single process. An RGB-D image consists of a color image and a depth image acquired synchronously, containing the color and depth information of the scene. A depth image, or range image, is a special image in which each pixel records the depth of the corresponding point in the actual scene. Compared with an RGB image, the depth image is less affected by illumination, shadows and the like and better conveys the original appearance of a scene, so it is widely used for indoor scenes. Studies by Silberman and Fergus show that, for semantic segmentation of indoor scenes, experimental results using RGB-D data are significantly better than results using RGB data alone.
In the research on semantic annotation of indoor scene RGB(-D) images, scene semantic annotation methods can be roughly divided into two categories according to their input: one uses pixels as the basic annotation unit (pixel-level annotation), and the other uses superpixels as the basic annotation unit (superpixel-level annotation).
Since the advent of AlexNet in 2012, deep networks have achieved great success and wide application in image classification and image detection, and have proven to be a powerful tool for extracting image features. Deep networks are also widely used for semantic segmentation; deep-network-based indoor scene semantic annotation is pixel-level annotation, and related methods fall roughly into two types. The first type is based on the fully convolutional network (FCN). Jonathan Long et al. proposed the FCN for image semantic segmentation in 2015, enabling end-to-end training for image semantic segmentation. However, the FCN is poor at describing object boundaries and shape structure. To learn more contextual information, Liang-Chieh Chen et al. used conditional random fields (CRF) to integrate global context and object structure information into the FCN.
The second type is based on encoder-decoder network architectures. SegNet and DeconvNet are typical encoder-decoder structures. The decoder is the key network structure of SegNet: its decoders proceed layer by layer, and each decoder has a corresponding encoder. DeconvNet treats semantic segmentation as an instance-wise segmentation problem: it performs pixel-wise semantic segmentation on every object proposal in the picture and finally integrates the segmentation results of all object proposals into the semantic segmentation result of the whole picture. DeconvNet has a large number of network parameters, which makes training very difficult and the time overhead of the testing phase very large, and object proposal extraction and network semantic segmentation are two separate steps.
A superpixel-level indoor scene semantic annotation method first segments the indoor scene image into superpixels according to pixel similarity, then extracts superpixel features, and finally classifies the superpixels. Liefeng Bo and Xiaofeng Ren proposed four types of feature representations for indoor scene recognition in 2011, namely a size kernel descriptor (extracting the physical size of an object), a shape kernel descriptor (extracting its three-dimensional shape), a gradient kernel descriptor (extracting its depth information) and a local binary kernel descriptor (extracting its local texture); these are superior to traditional 3D features (such as Spin Image) and greatly improve the accuracy of object recognition in RGB-D indoor scenes. Xiaofeng Ren et al. used depth kernel descriptors to describe superpixel features in 2012 and modeled the context between superpixels over a segmentation tree with a Markov random field, raising the indoor scene semantic annotation accuracy on the NYU v1 dataset from 56.6% to 76.1%.
The drawback of pixel-level indoor scene semantic annotation methods is the high computation cost caused by the large number of pixels. Dividing the image into superpixels reduces the computation cost, but the positional relationship among superpixels is no longer regular: the superpixels obtained from one image form an unordered set, whereas most deep networks require input data in a regular matrix form, hence the contradiction.
Disclosure of Invention
In order to overcome the defects of the prior art, the technical problem to be solved by the invention is to provide a superpixel-level indoor scene semantic annotation method, which can avoid the problem of huge calculation cost when a deep network is applied to pixel-level indoor scene annotation, and can enable the deep network to accept a superpixel set as input.
The technical scheme of the invention is as follows: the method for labeling the indoor scene semantics at the superpixel level comprises the following steps:
(1) performing super-pixel segmentation on the color image of the indoor scene by using a simple linear iterative clustering segmentation algorithm;
(2) extracting superpixel kernel descriptor features (primary features) by combining the superpixels obtained in the step (1) with the indoor scene depth images;
(3) constructing a neighborhood of the superpixel;
(4) constructing the superpixel deep network SuperPixelNet and learning multi-modal superpixel features; for each superpixel to be annotated, combining its multi-modal features with those of its neighborhood superpixels to produce the superpixel-level semantic annotation of the indoor scene RGB-D image.
The invention constructs a deep network structure, SuperPixelNet, for processing unordered superpixel sets. The network takes an unordered superpixel and its neighborhood superpixels as input and produces superpixel-level semantic annotation of the indoor scene RGB-D image, so it avoids the huge computation cost of applying a deep network to pixel-level indoor scene annotation and enables the deep network to accept a superpixel set as input.
Drawings
FIG. 1 is a flow chart of a method for semantic labeling of indoor scenes at a superpixel level according to the present invention.
FIG. 2 is a SuperPixelNet superpixel depth network structure of the superpixel-level indoor scene semantic annotation method according to the present invention.
Detailed Description
The invention provides a superpixel deep network for superpixel-level semantic annotation of RGB-D indoor scenes. First, the SLIC algorithm is used to perform superpixel segmentation of the RGB-D indoor scene image. The neighborhood superpixels of each superpixel are then found according to the rule described below, and the superpixel to be annotated is called the core superpixel. The kernel descriptor features and geometric features (primary features) of the core superpixel and its neighborhood superpixels are used as the input of the superpixel deep network to learn their multi-modal fusion features; the neighborhood context feature of the core superpixel is learned from the multi-modal fusion features of the core superpixel and its neighborhood superpixels and concatenated with the multi-modal fusion feature of the core superpixel as the feature representation for superpixel classification, thereby realizing superpixel-level semantic annotation of indoor scene RGB-D images. As shown in fig. 1, the superpixel-level indoor scene semantic annotation method includes the following steps:
(1) performing super-pixel segmentation on the color image of the indoor scene by using a simple linear iterative clustering segmentation algorithm;
(2) extracting superpixel kernel descriptor features (primary features) by combining the superpixels obtained in the step (1) with the indoor scene depth images;
(3) constructing a neighborhood of the superpixel;
(4) constructing the superpixel deep network SuperPixelNet and learning multi-modal superpixel features; for each superpixel to be annotated, combining its multi-modal features with those of its neighborhood superpixels to produce the superpixel-level semantic annotation of the indoor scene RGB-D image.
The invention constructs a deep network structure, SuperPixelNet, for processing unordered superpixel sets. The network takes an unordered superpixel and its neighborhood superpixels as input and produces superpixel-level semantic annotation of the indoor scene RGB-D image, so it avoids the huge computation cost of applying a deep network to pixel-level indoor scene annotation and enables the deep network to accept a superpixel set as input.
The invention is tested on the NYU v1 RGB-D dataset, which contains 2284 frames with 13 categories in total. The dataset is partitioned into two disjoint subsets for training and testing: the training set contains 1370 frames and the test set contains 914 frames.
The method provided by the invention comprises the following specific steps:
Simple Linear Iterative Clustering (SLIC) extends the K-Means clustering algorithm and is a simple and efficient method for constructing superpixels. Preferably, step (1) comprises the following substeps:
(1.1) converting the image from an RGB color space to an LAB color space;
(1.2) firstly, determining a parameter K, namely the number of the super pixels obtained by segmentation;
(1.3) compute the step length S = sqrt(N/K), where N is the number of pixels in the image; using S as the step length, uniformly initialize K cluster centers c_j, 1 ≤ j ≤ K, and set each cluster-center label L(c_j) = j;
(1.4) for each cluster center c_j and every pixel q ∈ Nb_3(c_j) = {(x_q, y_q) | x_j − 2 ≤ x_q ≤ x_j + 2, y_j − 2 ≤ y_q ≤ y_j + 2} in its 3 × 3 neighborhood, compute the LAB color gradient CD(q);
if a pixel q in the neighborhood attains the minimum color gradient value and CD(q) ≤ CD(c_j), move the cluster center to it: x_j = x_q, y_j = y_q;
(1.5) for every pixel i in the image other than the cluster centers, with coordinates (x_i, y_i), set the label L(i) = −1 and the distance d(i) = ∞;
(1.6) for each cluster center c_j and every pixel i ∈ Nb_2S(c_j) = {(x_i, y_i) | x_j − 2S − 1 ≤ x_i ≤ x_j + 2S + 1, y_j − 2S − 1 ≤ y_i ≤ y_j + 2S + 1} in its 2S × 2S neighborhood, compute its distance to c_j as D(i, c_j) = sqrt(d_c² + (d_s/S)²·m²), where the color distance d_c = sqrt((l_i − l_j)² + (a_i − a_j)² + (b_i − b_j)²) and the spatial distance d_s = sqrt((x_i − x_j)² + (y_i − y_j)²); here (x_i, y_i) and (l_i, a_i, b_i) are the coordinates of pixel i and its color value in LAB color space, and (x_j, y_j) and (l_j, a_j, b_j) are those of the cluster center c_j. The variable m balances the influence of the color distance and the spatial distance on pixel similarity: the larger m is, the greater the influence of the spatial distance and the more compact the superpixels; the smaller m is, the greater the influence of the color distance and the better the superpixels adhere to image edges;
(1.7) if D(i, c_j) < d(i), set L(i) = L(c_j) = j and d(i) = D(i, c_j);
(1.8) repeat steps (1.6)–(1.7) until all cluster centers c_j have been traversed;
(1.9) all pixels with label value j constitute the j-th superpixel SP_j = {(x_i, y_i) | L(i) = j}, 1 ≤ j ≤ K; compute the centroid c'_j(x'_j, y'_j) of superpixel SP_j, where x'_j and y'_j are the means of the x and y coordinates of the pixels in SP_j, and take c'_j as the new cluster center of SP_j; the color value (l'_j, a'_j, b'_j) of the new cluster center c'_j in LAB color space is the mean color of the pixels in the superpixel;
(1.10) accumulate the Euclidean distances between all new cluster centers and the old cluster centers: E = Σ_{j=1..K} ‖c'_j − c_j‖;
(1.11) if E is greater than a given threshold, repeat steps (1.6)–(1.10); otherwise the algorithm terminates and yields K superpixels (an illustrative code sketch of this segmentation step is given below).
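As an illustration only (not part of the patented method), step (1) can be approximated with the off-the-shelf SLIC implementation in scikit-image; the function and argument names below (slic, n_segments, compactness, convert2lab) are the library's, K and m play the roles of the parameters described above, and the wrapper name segment_superpixels is ours.

# Illustration only: approximate step (1) with scikit-image's SLIC implementation.
import numpy as np
from skimage import io
from skimage.segmentation import slic

def segment_superpixels(rgb_path, K=1000, m=10.0):
    """Return an H x W label map with one integer label per superpixel."""
    image = io.imread(rgb_path)              # H x W x 3, RGB
    labels = slic(image,
                  n_segments=K,              # desired number of superpixels
                  compactness=m,             # balances color vs. spatial distance
                  convert2lab=True,          # work in LAB color space, as in (1.1)
                  start_label=0)
    return labels

# Example: labels = segment_superpixels("scene.png", K=1000, m=10.0)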
The step (2) comprises the following sub-steps:
(2.1) Patch feature calculation:
A Patch is defined as a 16 × 16 grid (the grid size can be adapted to the actual data) that slides to the right and downward from the upper-left corner of the color image (RGB) and the depth image (Depth) with a step length of n pixels (n = 2 in the experiments of the invention), finally forming a dense grid over both images; for an image of size N × M, the number of Patches obtained is (⌊(N − 16)/n⌋ + 1) × (⌊(M − 16)/n⌋ + 1) (see the sketch after step (2.2)). Four types of features are computed for each Patch: depth gradient features, color gradient features, color features and texture features;
(2.2) obtaining superpixel features:
The superpixel feature F_seg is given by formula (5) and consists of the superpixel depth gradient feature, the superpixel color gradient feature, the superpixel color feature and the superpixel texture feature, each obtained by formula (6) as the average of the corresponding Patch features,
where F_g_d(i), F_g_c(i), F_col(i), F_tex(i) denote the features of the i-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg.
The superpixel geometric features are defined by formula (7),
where the superpixel area A_seg = Σ_{s∈seg} 1, s being the pixels within the superpixel seg, and the superpixel perimeter P_seg is defined by formula (8),
in which M and N denote the horizontal and vertical resolution of the RGB scene image, respectively; seg and seg' denote different superpixels; N_4(s) is the 4-neighborhood of pixel s; and B_seg is the set of boundary pixels of the superpixel seg.
The area-to-perimeter ratio of the superpixel is R_seg = A_seg / P_seg, formula (9).
Second-order Hu moments (orders 2+0 = 2, 0+2 = 2, and the cross term) are computed from the x coordinate s_x of pixel s, the y coordinate s_y, and the product of the x and y coordinates, and are defined by formulas (10), (11) and (12),
where the quantities in formula (13) are the mean of the x coordinates, the mean of the y coordinates, the mean of the squared x coordinates and the mean of the squared y coordinates of the pixels contained in the superpixel, defined by formula (13);
Width and Height denote the image width and height, respectively, and the computation is based on pixel coordinate values normalized by them.
D_var and the related depth statistics are, respectively, the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined by formula (14).
D_miss, the proportion of pixels in the superpixel with missing depth information, is defined by formula (15).
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where this principal normal vector is estimated by principal component analysis (PCA). (A sketch of the Patch-grid enumeration and the per-superpixel feature aggregation is given below.)
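The following sketch (an illustration under our own naming, not the patent's code) enumerates the dense 16 × 16 Patch grid of step (2.1) and averages per-Patch features over each superpixel in the spirit of formula (6); the random placeholder features stand in for the kernel descriptor features described below.

# Illustration only: Patch-grid enumeration and per-superpixel averaging of Patch features.
import numpy as np

def patch_centers(height, width, patch=16, stride=2):
    """Centers (y, x) of all 16x16 patches fully inside a height x width image."""
    ys = np.arange(0, height - patch + 1, stride) + patch // 2
    xs = np.arange(0, width - patch + 1, stride) + patch // 2
    return [(y, x) for y in ys for x in xs]

def superpixel_features(labels, centers, patch_feats):
    """Mean of the features of the patches whose centers fall inside each superpixel."""
    n_sp = labels.max() + 1
    dim = patch_feats.shape[1]
    sums = np.zeros((n_sp, dim))
    counts = np.zeros(n_sp)
    for (y, x), f in zip(centers, patch_feats):
        k = labels[y, x]                 # superpixel containing this patch center
        sums[k] += f
        counts[k] += 1
    counts[counts == 0] = 1              # avoid division by zero for tiny superpixels
    return sums / counts[:, None]

# Toy usage with random 200-d patch features on a 480x640 label map:
labels = np.random.randint(0, 1000, size=(480, 640))
centers = patch_centers(480, 640)
patch_feats = np.random.randn(len(centers), 200)
F_seg = superpixel_features(labels, centers, patch_feats)   # shape (1000, 200)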
The depth gradient feature of step (2.1) is computed as follows:
a Patch in the depth image is denoted Z_d; for each Z_d the depth gradient feature F_g_d is computed, the value of its t-th component being defined by formula (1),
where z ∈ Z_d denotes the relative two-dimensional coordinate position of pixel z in the depth Patch; the formula involves the depth gradient direction and gradient magnitude of pixel z; the depth gradient basis vectors and the position basis vectors, two sets of predefined values whose numbers are d_g and d_s respectively; the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA); the Kronecker product; and a depth gradient Gaussian kernel function and a position Gaussian kernel function together with their corresponding parameters. The EMK (efficient match kernel) algorithm is then used to transform the depth gradient feature, and the transformed feature vector is still denoted F_g_d.
The color gradient feature of step (2.1) is computed as follows:
a Patch in the color image is denoted Z_c; for each Z_c the color gradient feature F_g_c is computed, the value of its t-th component being defined by formula (2),
where z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color-image Patch; the formula involves the gradient direction and gradient magnitude of pixel z; the color gradient basis vectors and the position basis vectors, two sets of predefined values whose numbers are c_g and c_s respectively; the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA); the Kronecker product; and a color gradient Gaussian kernel function and a position Gaussian kernel function together with their corresponding parameters. The EMK (efficient match kernel) algorithm is then used to transform the color gradient feature, and the transformed feature vector is still denoted F_g_c.
The color feature of step (2.1) is computed as follows:
a Patch in the color image is denoted Z_c; for each Z_c the color feature F_col is computed, the value of its t-th component being defined by formula (3),
where z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color-image Patch; r(z) is a three-dimensional vector, the RGB value of pixel z; the formula further involves the color basis vectors and the position basis vectors, two sets of predefined values whose numbers are c_c and c_s respectively; the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA); the Kronecker product; and a color Gaussian kernel function and a position Gaussian kernel function together with their corresponding parameters. The EMK (efficient match kernel) algorithm is then used to transform the color feature, and the transformed feature vector is still denoted F_col.
The texture feature of step (2.1) is computed as follows:
the RGB scene image is first converted into a gray-scale image, and a Patch in the gray-scale image is denoted Z_g; for each Z_g the texture feature F_tex is computed, the value of its t-th component being defined by formula (4),
where z ∈ Z_g denotes the relative two-dimensional coordinate position of pixel z in the Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; LBP(z) is the local binary pattern (LBP) feature of pixel z; the formula further involves the local binary pattern basis vectors and the position basis vectors, two sets of predefined values whose numbers are g_b and g_s respectively; the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA); the Kronecker product; and a local binary pattern Gaussian kernel function and a position Gaussian kernel function together with their corresponding parameters. The EMK (efficient match kernel) algorithm is then used to transform the texture feature, and the transformed feature vector is still denoted F_tex. (An illustrative sketch of the per-pixel gradient and LBP attributes is given below.)
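For illustration, the sketch below computes only the raw per-pixel attributes that the gradient and texture kernel descriptors above are built from (gradient magnitude and orientation, and an 8-neighbour LBP code); the Gaussian match kernels, KPCA projection and EMK transform of formulas (1)–(4) are not reproduced, and all helper names are ours.

# Illustration only: per-pixel gradient and LBP attributes of a single Patch.
import numpy as np

def gradient_attributes(patch):
    """Gradient magnitude and orientation of a 2-D patch (e.g. a 16x16 depth or gray patch)."""
    gy, gx = np.gradient(patch.astype(np.float64))
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    orientation = np.arctan2(gy, gx)          # radians in (-pi, pi]
    return magnitude, orientation

def lbp_attributes(gray_patch):
    """8-neighbour local binary pattern code for each interior pixel of a gray patch."""
    p = gray_patch.astype(np.float64)
    center = p[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
        code |= ((neighbour >= center).astype(np.uint8) << bit)
    return code

# Toy usage on a random 16x16 patch:
patch = np.random.rand(16, 16)
mag, ori = gradient_attributes(patch)
lbp = lbp_attributes((patch * 255).astype(np.uint8))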
In step (3), the superpixel set obtained from SLIC segmentation of the indoor scene image is denoted Im, where seg_k denotes the k-th superpixel, regarded as the set of pixels it contains. For any superpixel seg_t ∈ Im, if seg_t shares a common boundary with seg_k, then seg_t is called an adjacent superpixel of seg_k. All adjacent superpixels of seg_k, together with the adjacent superpixels of those adjacent superpixels, constitute the neighborhood superpixel set NS(seg_k) of superpixel seg_k.
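A minimal sketch of the neighborhood construction of step (3), assuming a SLIC label map as input: superpixels that share a boundary (adjacent pixel pairs with different labels) are adjacent, and the neighborhood set NS(seg_k) is taken as the adjacent superpixels of seg_k together with their adjacent superpixels. Function names are ours.

# Illustration only: build NS(seg_k) from a superpixel label map.
import numpy as np

def adjacency(labels):
    """Map each superpixel id to the set of superpixel ids it shares a boundary with."""
    adj = {k: set() for k in np.unique(labels)}
    # horizontally and vertically adjacent pixel pairs with different labels
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        diff = a != b
        for u, v in zip(a[diff].ravel(), b[diff].ravel()):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def neighborhood(adj, k):
    """Adjacent superpixels of k plus the adjacent superpixels of those superpixels."""
    first = set(adj[k])
    second = set()
    for t in first:
        second |= adj[t]
    return (first | second) - {k}

# Toy usage:
labels = np.random.randint(0, 50, size=(60, 80))
adj = adjacency(labels)
print(sorted(neighborhood(adj, 0))[:10])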
As shown in fig. 2, in step (4) the input of the superpixel deep network SuperPixelNet is each superpixel seg_k obtained by segmenting the indoor scene image together with its neighborhood superpixels NS(seg_k), and the output is the score of superpixel seg_k for each semantic category; the network comprises three subnetworks: a multi-modal fusion learning subnetwork, a superpixel neighborhood information fusion subnetwork, and a superpixel classification subnetwork.
In step (4):
the multi-modal converged learning subnetwork comprises 7 branches Bi{ i ═ 1, … …, 7}, each characterized by a superpixel depth gradientSuperpixel color gradient featureSuper pixel color featureSuperpixel textureAnd three classes of superpixel geometryIs input; each branch input is a superpixel segkFeature representation of N superpixels in total with its N-1 neighborhood superpixelsAre all in 200-dimensional state,is in a 3-dimensional mode and has the characteristics of high sensitivity,is in the range of 7-dimension,is 5-dimensional; the first four network branches BiThe structures of { i ═ 1, … … and 4} are the same, and are all one-layer convolution (conv-64), the convolution kernel size is 1 × 1, the convolution step size is 1, and the output channel size is 64 dimensions; the last three network legs BiThe structures of { i ═ 5, 6 and 7} are the same, and are all one-layer convolution (conv-32), the convolution kernel size is 1 × 1, the convolution step size is 1, and the output channel size is 32 dimensions; then connecting the outputs of the three branches, and performing characteristic fusion by a convolution layer (conv-64) with the convolution kernel size of 1 multiplied by 3, the convolution step length of 1 and the output channel size of 64 dimensions; finally, the output of the front four branches is connected with the characteristics obtained by fusing the rear three branches, and the characteristics are fused through a convolutional layer (conv-1024) with a convolutional kernel of 1 multiplied by 5, a convolutional step of 1 and an output channel of 1024 dimensions, so that the multi-mode fusion characteristics of the superpixel are obtained;
the multi-modal fusion features of the N superpixels are used as input, enter a superpixel neighborhood information fusion sub-network, and are subjected to a layer of average pooling operation to obtain fusion features of the N superpixels; averaging the output of the pooling operation, passing through two layersOutputting full connection layers (FC-256 and FC-128) with the dimensionalities of 256 and 128 respectively to obtain final neighborhood characteristics; associating neighborhood features with superpixels segkThe 1024-dimensional multi-modal fusion features are connected, so that the super-pixel features with neighborhood information are obtained;
the super-pixel classification sub-network consists of three convolution layers, the sizes of the convolution kernels are all 1 multiplied by 1, the convolution step sizes are all 1, the output dimensions are respectively 512, 256 and 13(conv-512, conv-256, con-13), a dropout layer is arranged between conv-256 and conv-13, and the dropout probability is 0.5. Finally outputting the super pixel segkScores of the categories to which they belong; the invention adopts NYU V1 data sets collected and sorted by Silberman, Fergus and the like to carry out experiments, wherein the data sets have 13 semantic categories (Bel, Blind, Bookshelf, barrel, inspecting, Floor, Picture, Sofa, Table, TV, Wall, Window, Background) and 7 scenes in total; the data set comprises 2284 frames of color images (RGB) and 2284 frames of Depth images (Depth), wherein the color images correspond to the Depth images one by one, and the resolution of each image is 480 multiplied by 640; according to the traditional division method, 60% of the data set is selected for training and 40% of the data set is selected for testing; based on an NYU V1 data set, a comparison experiment between the method provided by the invention and the method provided by 5 people, such as Silberman, Ren, Salman H.Khan, Anran, Heng and the like, is carried out, and the experimental result is shown in table 1 (class average accuracy rate), so that the method provided by the invention can be seen to obtain an excellent labeling effect in indoor scene semantic labeling; in the invention, the value of N is 10, the network hyper-parameter batch size is set to be 16, the learning rate is set to be 5e-6, the initialization of all parameters in the network uses an Xavier initialization method, the rest convolutional layers and the full-link layer use Relu as an activation function except the last layer which does not use an activation function, the full-link layers FC-256 and FC-128 use a parameter of 0.01 as an L2 regularization parameter, and batch normalization is added to all the convolutional layers.
Table 1 shows the comparison of the present invention with other methods on the NYU v1 data set, from which it can be seen that the present invention is greatly superior to other methods.
TABLE 1
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (10)

1. A super-pixel level indoor scene semantic annotation method is characterized in that: the method comprises the following steps:
(1) performing super-pixel segmentation on the color image of the indoor scene by using a simple linear iterative clustering segmentation algorithm;
(2) extracting superpixel kernel descriptor features by combining the superpixels obtained in the step (1) with the indoor scene depth image;
(3) constructing a neighborhood of the superpixel;
(4) constructing the superpixel deep network SuperPixelNet and learning multi-modal superpixel features; for each superpixel to be annotated, combining its multi-modal features with those of its neighborhood superpixels to produce the superpixel-level semantic annotation of the indoor scene RGB-D image.
2. The method for semantic labeling of indoor scenes at the superpixel level according to claim 1, characterized in that: the step (1) comprises the following sub-steps:
(1.1) converting the image from an RGB color space to an LAB color space;
(1.2) firstly, determining a parameter K, namely the number of the super pixels obtained by segmentation;
(1.3) compute the step length S = sqrt(N/K), where N is the number of pixels in the image; using S as the step length, uniformly initialize K cluster centers c_j, 1 ≤ j ≤ K, and set each cluster-center label L(c_j) = j;
(1.4) for each cluster center c_j and every pixel q ∈ Nb_3(c_j) = {(x_q, y_q) | x_j − 2 ≤ x_q ≤ x_j + 2, y_j − 2 ≤ y_q ≤ y_j + 2} in its 3 × 3 neighborhood, compute the LAB color gradient CD(q); if a pixel q in the neighborhood attains the minimum color gradient value and CD(q) ≤ CD(c_j), set x_j = x_q, y_j = y_q;
(1.5) for every pixel i in the image other than the cluster centers, with coordinates (x_i, y_i), set the label L(i) = −1 and the distance d(i) = ∞;
(1.6) for each cluster center c_j and every pixel i ∈ Nb_2S(c_j) = {(x_i, y_i) | x_j − 2S − 1 ≤ x_i ≤ x_j + 2S + 1, y_j − 2S − 1 ≤ y_i ≤ y_j + 2S + 1} in its 2S × 2S neighborhood, compute its distance D(i, c_j) to c_j from the color distance and the spatial distance, where (x_i, y_i) and (l_i, a_i, b_i) are the coordinates of pixel i and its color value in LAB color space, and (x_j, y_j) and (l_j, a_j, b_j) are those of the cluster center c_j; the variable m balances the influence of the color distance and the spatial distance on pixel similarity: the larger m is, the greater the influence of the spatial distance and the more compact the superpixels; the smaller m is, the greater the influence of the color distance and the better the superpixels adhere to the image edges;
(1.7) if D(i, c_j) < d(i), set L(i) = L(c_j) = j and d(i) = D(i, c_j);
(1.8) repeat steps (1.6)–(1.7) until all cluster centers c_j have been traversed;
(1.9) all pixels with label value j constitute the j-th superpixel SP_j = {(x_i, y_i) | L(i) = j}, 1 ≤ j ≤ K; compute the centroid c'_j(x'_j, y'_j) of superpixel SP_j, where x'_j and y'_j are the means of the x and y coordinates of the pixels in SP_j, and take c'_j as the new cluster center of SP_j; the color value (l'_j, a'_j, b'_j) of the new cluster center c'_j in LAB color space is the mean color of the pixels in the superpixel;
(1.10) accumulate the Euclidean distances E between all new cluster centers and the old cluster centers;
(1.11) if E is greater than a given threshold, repeating the above steps (1.6) - (1.10); otherwise, the algorithm is finished to obtain K superpixels.
3. The method for semantic labeling of indoor scenes at the superpixel level according to claim 2, characterized in that: the step (2) comprises the following sub-steps:
(2.1) Patch feature calculation:
A Patch is defined as a 16 × 16 grid that slides to the right and downward from the upper-left corner of the color image (RGB) and the depth image (Depth) with a step length of n pixels, finally forming a dense grid over the color image and the depth image; four types of features are computed for each Patch: depth gradient features, color gradient features, color features and texture features;
(2.2) obtaining superpixel features:
The superpixel feature F_seg is given by formula (5) and consists of the superpixel depth gradient feature, the superpixel color gradient feature, the superpixel color feature and the superpixel texture feature, each obtained by formula (6) as the average of the corresponding Patch features,
where F_g_d(i), F_g_c(i), F_col(i), F_tex(i) denote the features of the i-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center positions fall within the superpixel seg.
The superpixel geometric features are defined by formula (7),
where the superpixel area A_seg = Σ_{s∈seg} 1, s being the pixels within the superpixel seg, and the superpixel perimeter P_seg is defined by formula (8),
in which M and N denote the horizontal and vertical resolution of the RGB scene image, respectively; seg and seg' denote different superpixels; N_4(s) is the 4-neighborhood of pixel s; and B_seg is the set of boundary pixels of the superpixel seg;
the area-to-perimeter ratio of the superpixel is R_seg = A_seg / P_seg, formula (9);
the second-order Hu moments computed from the x coordinate s_x of pixel s, the y coordinate s_y, and the product of the x and y coordinates are defined by formulas (10), (11) and (12),
where the quantities in formula (13) are the mean of the x coordinates, the mean of the y coordinates, the mean of the squared x coordinates and the mean of the squared y coordinates of the pixels contained in the superpixel, defined by formula (13);
Width and Height denote the image width and height, respectively, and the computation is based on pixel coordinate values normalized by them;
D_var and the related depth statistics are, respectively, the mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values, and the variance of the depth values, defined by formula (14);
D_miss, the proportion of pixels in the superpixel with missing depth information, is defined by formula (15);
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where this principal normal vector is estimated by principal component analysis (PCA).
4. The method of claim 3, wherein the indoor scene semantic labeling at a superpixel level comprises:
the depth gradient of step (2.1) is characterized by:
a Patch in the depth image is denoted Z_d; for each Z_d the depth gradient feature F_g_d is computed, the value of its t-th component being defined by formula (1),
where z ∈ Z_d denotes the relative two-dimensional coordinate position of pixel z in the depth Patch; the formula involves the depth gradient direction and gradient magnitude of pixel z; the depth gradient basis vectors and the position basis vectors, two sets of predefined values whose numbers are d_g and d_s respectively; the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA); the Kronecker product; and a depth gradient Gaussian kernel function and a position Gaussian kernel function together with their corresponding parameters; the EMK algorithm is used to transform the depth gradient feature, and the transformed feature vector is still denoted F_g_d.
5. The method of claim 4, wherein the indoor scene semantic labeling at a superpixel level comprises: the color gradient characteristic of the step (2.1) is as follows:
a Patch in the color image is denoted Z_c; for each Z_c the color gradient feature F_g_c is computed, the value of its t-th component being defined by formula (2),
where z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color-image Patch; the formula involves the gradient direction and gradient magnitude of pixel z; the color gradient basis vectors and the position basis vectors, two sets of predefined values whose numbers are c_g and c_s respectively; the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA); the Kronecker product; and a color gradient Gaussian kernel function and a position Gaussian kernel function together with their corresponding parameters; the EMK algorithm is used to transform the color gradient feature, and the transformed feature vector is still denoted F_g_c.
6. The method of claim 5, wherein the indoor scene semantic labeling at a superpixel level comprises: the color characteristics of the step (2.1) are as follows:
a Patch in the color image is denoted Z_c; for each Z_c the color feature F_col is computed, the value of its t-th component being defined by formula (3),
where z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color-image Patch; r(z) is a three-dimensional vector, the RGB value of pixel z; the formula further involves the color basis vectors and the position basis vectors, two sets of predefined values whose numbers are c_c and c_s respectively; the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA); the Kronecker product; and a color Gaussian kernel function and a position Gaussian kernel function together with their corresponding parameters; the EMK algorithm is used to transform the color feature, and the transformed feature vector is still denoted F_col.
7. The method of claim 6, wherein the indoor scene semantic labeling at a superpixel level comprises: the texture characteristics of the step (2.1) are as follows:
the RGB scene image is first converted into a gray-scale image, and a Patch in the gray-scale image is denoted Z_g; for each Z_g the texture feature F_tex is computed, the value of its t-th component being defined by formula (4),
where z ∈ Z_g denotes the relative two-dimensional coordinate position of pixel z in the Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; LBP(z) is the local binary pattern (LBP) feature of pixel z; the formula further involves the local binary pattern basis vectors and the position basis vectors, two sets of predefined values whose numbers are g_b and g_s respectively; the mapping coefficient of the t-th principal component obtained by applying kernel principal component analysis (KPCA); the Kronecker product; and a local binary pattern Gaussian kernel function and a position Gaussian kernel function together with their corresponding parameters; the EMK algorithm is used to transform the texture feature, and the transformed feature vector is still denoted F_tex.
8. The method of claim 7, wherein the indoor scene semantic labeling at a superpixel level comprises: in step (3), the superpixel set obtained from SLIC segmentation of the indoor scene image is denoted Im, where seg_k denotes the k-th superpixel, regarded as the set of pixels it contains; for any superpixel seg_t ∈ Im, if seg_t shares a common boundary with seg_k, then seg_t is called an adjacent superpixel of seg_k; all adjacent superpixels of seg_k, together with the adjacent superpixels of those adjacent superpixels, constitute the neighborhood superpixel set NS(seg_k) of superpixel seg_k.
9. The method of claim 8, wherein: in step (4), the input of the superpixel deep network SuperPixelNet is each superpixel seg_k obtained by segmenting the indoor scene image together with its neighborhood superpixels NS(seg_k), and the output is the score of superpixel seg_k for each semantic category, the scores being the basis for determining the final semantic label of the superpixel; the network comprises three subnetworks: a multi-modal fusion learning subnetwork, a superpixel neighborhood information fusion subnetwork, and a superpixel classification subnetwork.
10. The method of claim 9, wherein: in step (4),
the multi-modal fusion learning subnetwork comprises 7 branches B_i, i = 1, ..., 7, whose inputs are, respectively, the superpixel depth gradient feature, the superpixel color gradient feature, the superpixel color feature, the superpixel texture feature, and the three classes of superpixel geometric features; each branch takes as input the feature representation of N superpixels in total, namely the superpixel seg_k and its N−1 neighborhood superpixels; the four kernel descriptor features are each 200-dimensional, and the three classes of geometric features are 3-, 7- and 5-dimensional, respectively; the first four network branches B_i, i = 1, ..., 4, have the same structure: one convolution layer conv-64 with kernel size 1 × 1, stride 1 and a 64-dimensional output channel; the last three network branches B_i, i = 5, 6, 7, also share one structure: one convolution layer conv-32 with kernel size 1 × 1, stride 1 and a 32-dimensional output channel; the outputs of these three branches are then concatenated and fused by a convolution layer conv-64 with kernel size 1 × 3, stride 1 and a 64-dimensional output channel; finally, the outputs of the first four branches are concatenated with the feature obtained by fusing the last three branches and fused by a convolution layer conv-1024 with kernel size 1 × 5, stride 1 and a 1024-dimensional output channel, yielding the multi-modal fusion feature of the superpixel;
the multi-modal fusion features of the N superpixels are fed as input into the superpixel neighborhood information fusion subnetwork, where one average pooling layer aggregates the fusion features of the N superpixels; the pooled output is passed through two fully connected layers FC-256 and FC-128 with output dimensions 256 and 128, respectively, to obtain the final neighborhood feature; the neighborhood feature is concatenated with the 1024-dimensional multi-modal fusion feature of superpixel seg_k, yielding the superpixel feature with neighborhood information;
the superpixel classification subnetwork consists of three convolution layers, all with kernel size 1 × 1 and stride 1, whose output dimensions are 512, 256 and 13, respectively; a dropout layer with dropout probability 0.5 is placed between conv-256 and conv-13, and the network finally outputs the scores of superpixel seg_k for each semantic category; N is set to 10, the batch size is 16, the learning rate is 5e-6, all network parameters are initialized with the Xavier initialization method, all convolution layers and fully connected layers use ReLU as the activation function except the last layer, which uses none, the fully connected layers FC-256 and FC-128 use 0.01 as the L2 regularization parameter, and batch normalization is applied to all convolution layers.
CN201910269599.1A 2019-04-04 2019-04-04 Indoor scene semantic annotation method at super-pixel level Active CN110096961B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910269599.1A CN110096961B (en) 2019-04-04 2019-04-04 Indoor scene semantic annotation method at super-pixel level

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910269599.1A CN110096961B (en) 2019-04-04 2019-04-04 Indoor scene semantic annotation method at super-pixel level

Publications (2)

Publication Number Publication Date
CN110096961A true CN110096961A (en) 2019-08-06
CN110096961B CN110096961B (en) 2021-03-02

Family

ID=67444356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910269599.1A Active CN110096961B (en) 2019-04-04 2019-04-04 Indoor scene semantic annotation method at super-pixel level

Country Status (1)

Country Link
CN (1) CN110096961B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634142A (en) * 2019-08-20 2019-12-31 长安大学 Complex vehicle road image boundary optimization method
CN112036466A (en) * 2020-08-26 2020-12-04 长安大学 Mixed terrain classification method
CN112241965A (en) * 2020-09-23 2021-01-19 天津大学 Method for generating superpixels and segmenting images based on deep learning
CN112669355A (en) * 2021-01-05 2021-04-16 北京信息科技大学 Method and system for splicing and fusing focusing stack data based on RGB-D super-pixel segmentation
CN114239756A (en) * 2022-02-25 2022-03-25 科大天工智能装备技术(天津)有限公司 Insect pest detection method and system
CN115273645A (en) * 2022-08-09 2022-11-01 南京大学 Map making method for automatically clustering indoor surface elements
CN117137374A (en) * 2023-10-27 2023-12-01 张家港极客嘉智能科技研发有限公司 Sweeping robot recharging method based on computer vision

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130156314A1 (en) * 2011-12-20 2013-06-20 Canon Kabushiki Kaisha Geodesic superpixel segmentation
CN103903257A (en) * 2014-02-27 2014-07-02 西安电子科技大学 Image segmentation method based on geometric block spacing symbiotic characteristics and semantic information
CN104809187A (en) * 2015-04-20 2015-07-29 南京邮电大学 Indoor scene semantic annotation method based on RGB-D data
EP2980754A1 (en) * 2014-07-28 2016-02-03 Thomson Licensing Method and apparatus for generating temporally consistent superpixels
WO2016016033A1 (en) * 2014-07-31 2016-02-04 Thomson Licensing Method and apparatus for interactive video segmentation
CN105513070A (en) * 2015-12-07 2016-04-20 天津大学 RGB-D salient object detection method based on foreground and background optimization
CN106022353A (en) * 2016-05-05 2016-10-12 浙江大学 Image semantic annotation method based on super pixel segmentation
CN107256399A (en) * 2017-06-14 2017-10-17 大连海事大学 A kind of SAR image coastline Detection Method algorithms based on Gamma distribution super-pixel algorithms and based on super-pixel TMF
CN107274419A (en) * 2017-07-10 2017-10-20 北京工业大学 A kind of deep learning conspicuousness detection method based on global priori and local context
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN107944428A (en) * 2017-12-15 2018-04-20 北京工业大学 A kind of indoor scene semanteme marking method based on super-pixel collection
CN109345549A (en) * 2018-10-26 2019-02-15 南京览众智能科技有限公司 A kind of natural scene image dividing method based on adaptive compound neighbour's figure
CN109345536A (en) * 2018-08-16 2019-02-15 广州视源电子科技股份有限公司 Image super-pixel segmentation method and device

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130156314A1 (en) * 2011-12-20 2013-06-20 Canon Kabushiki Kaisha Geodesic superpixel segmentation
CN103903257A (en) * 2014-02-27 2014-07-02 西安电子科技大学 Image segmentation method based on geometric block spacing symbiotic characteristics and semantic information
EP2980754A1 (en) * 2014-07-28 2016-02-03 Thomson Licensing Method and apparatus for generating temporally consistent superpixels
WO2016016033A1 (en) * 2014-07-31 2016-02-04 Thomson Licensing Method and apparatus for interactive video segmentation
CN104809187A (en) * 2015-04-20 2015-07-29 南京邮电大学 Indoor scene semantic annotation method based on RGB-D data
CN105513070A (en) * 2015-12-07 2016-04-20 天津大学 RGB-D salient object detection method based on foreground and background optimization
CN106022353A (en) * 2016-05-05 2016-10-12 浙江大学 Image semantic annotation method based on super pixel segmentation
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN107256399A (en) * 2017-06-14 2017-10-17 大连海事大学 A kind of SAR image coastline Detection Method algorithms based on Gamma distribution super-pixel algorithms and based on super-pixel TMF
CN107274419A (en) * 2017-07-10 2017-10-20 北京工业大学 A kind of deep learning conspicuousness detection method based on global priori and local context
CN107944428A (en) * 2017-12-15 2018-04-20 北京工业大学 A kind of indoor scene semanteme marking method based on super-pixel collection
CN109345536A (en) * 2018-08-16 2019-02-15 广州视源电子科技股份有限公司 Image super-pixel segmentation method and device
CN109345549A (en) * 2018-10-26 2019-02-15 南京览众智能科技有限公司 A kind of natural scene image dividing method based on adaptive compound neighbour's figure

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CHARLES R. QI et al.: "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
RADHAKRISHNA ACHANTA et al.: "SLIC Superpixels Compared to State-of-the-Art Superpixel Methods", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
XIAOFENG REN et al.: "RGB-(D) Scene Labeling: Features and Algorithms", 《CVPR12》 *
冯希龙: "Indoor scene semantic segmentation method based on RGB-D images", 《China Master's Theses Full-text Database, Information Science and Technology》 *
熊汉江 et al.: "Semantic segmentation of indoor 3D point cloud models based on 2D-3D semantic transfer", 《Geomatics and Information Science of Wuhan University》 *
费婷婷: "Non-parametric RGB-D scene understanding", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634142A (en) * 2019-08-20 2019-12-31 长安大学 Complex vehicle road image boundary optimization method
CN110634142B (en) * 2019-08-20 2024-02-02 长安大学 Complex vehicle road image boundary optimization method
CN112036466A (en) * 2020-08-26 2020-12-04 长安大学 Mixed terrain classification method
CN112241965A (en) * 2020-09-23 2021-01-19 天津大学 Method for generating superpixels and segmenting images based on deep learning
CN112669355A (en) * 2021-01-05 2021-04-16 北京信息科技大学 Method and system for splicing and fusing focusing stack data based on RGB-D super-pixel segmentation
CN112669355B (en) * 2021-01-05 2023-07-25 北京信息科技大学 Method and system for splicing and fusing focusing stack data based on RGB-D super pixel segmentation
CN114239756A (en) * 2022-02-25 2022-03-25 科大天工智能装备技术(天津)有限公司 Insect pest detection method and system
CN114239756B (en) * 2022-02-25 2022-05-17 科大天工智能装备技术(天津)有限公司 Insect pest detection method and system
CN115273645A (en) * 2022-08-09 2022-11-01 南京大学 Map making method for automatically clustering indoor surface elements
CN115273645B (en) * 2022-08-09 2024-04-09 南京大学 Map making method for automatically clustering indoor surface elements
CN117137374A (en) * 2023-10-27 2023-12-01 张家港极客嘉智能科技研发有限公司 Sweeping robot recharging method based on computer vision
CN117137374B (en) * 2023-10-27 2024-01-26 张家港极客嘉智能科技研发有限公司 Sweeping robot recharging method based on computer vision

Also Published As

Publication number Publication date
CN110096961B (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN110096961B (en) Indoor scene semantic annotation method at super-pixel level
Lei et al. A universal framework for salient object detection
Kohli et al. Simultaneous segmentation and pose estimation of humans using dynamic graph cuts
CN109829449B (en) RGB-D indoor scene labeling method based on super-pixel space-time context
Wang et al. Joint learning of visual attributes, object classes and visual saliency
CN107944428B (en) Indoor scene semantic annotation method based on super-pixel set
Malik et al. The three R’s of computer vision: Recognition, reconstruction and reorganization
Xie et al. Object detection and tracking under occlusion for object-level RGB-D video segmentation
Wang et al. Object instance detection with pruned Alexnet and extended training data
Nedović et al. Stages as models of scene geometry
CN106022351B (en) It is a kind of based on non-negative dictionary to the robust multi-angle of view clustering method of study
CN105160312A (en) Recommendation method for star face make up based on facial similarity match
CN103729885A (en) Hand-drawn scene three-dimensional modeling method combining multi-perspective projection with three-dimensional registration
Couprie et al. Convolutional nets and watershed cuts for real-time semantic labeling of rgbd videos
CN110517270B (en) Indoor scene semantic segmentation method based on super-pixel depth network
CN105740915A (en) Cooperation segmentation method fusing perception information
CN108090485A (en) Display foreground extraction method based on various visual angles fusion
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
Zhang et al. Deep salient object detection by integrating multi-level cues
CN109191485B (en) Multi-video target collaborative segmentation method based on multilayer hypergraph model
Jin et al. The Open Brands Dataset: Unified brand detection and recognition at scale
Cai et al. Rgb-d scene classification via multi-modal feature learning
Zhang et al. Planeseg: Building a plug-in for boosting planar region segmentation
Couprie et al. Toward real-time indoor semantic segmentation using depth information
Li et al. Spatiotemporal road scene reconstruction using superpixel-based Markov random field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant