CN110096961B - Indoor scene semantic annotation method at super-pixel level - Google Patents

Indoor scene semantic annotation method at super-pixel level

Info

Publication number
CN110096961B
CN110096961B (application CN201910269599.1A)
Authority
CN
China
Prior art keywords
pixel
super
superpixel
seg
color
Prior art date
Legal status
Active
Application number
CN201910269599.1A
Other languages
Chinese (zh)
Other versions
CN110096961A (en)
Inventor
王立春
陆建霖
王少帆
孔德慧
李敬华
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910269599.1A priority Critical patent/CN110096961B/en
Publication of CN110096961A publication Critical patent/CN110096961A/en
Application granted granted Critical
Publication of CN110096961B publication Critical patent/CN110096961B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/232: Non-hierarchical techniques
    • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/35: Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V 20/36: Indoor scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a superpixel-level indoor scene semantic annotation method that avoids the huge computational cost of applying a deep network to pixel-level indoor scene annotation and enables the deep network to accept a superpixel set as input. The method comprises the following steps: (1) perform superpixel segmentation on the color image of the indoor scene with the simple linear iterative clustering (SLIC) algorithm; (2) extract superpixel kernel descriptor features (primary features) by combining the superpixels obtained in step (1) with the indoor scene depth image; (3) construct the neighborhood of each superpixel; (4) construct the superpixel deep network SuperPixelNet and learn superpixel multi-modal features, then combine the multi-modal features of the superpixel to be annotated and its neighborhood superpixels to output a superpixel-level semantic annotation of the indoor scene RGB-D image.

Description

Indoor scene semantic annotation method at super-pixel level
Technical Field
The invention relates to the technical field of multimedia technology and computer graphics, in particular to a super-pixel level indoor scene semantic annotation method.
Background
Indoor scene semantic annotation, as a fundamental task in computer vision research, has long been a research hotspot and a difficulty in the field of image processing. Compared with outdoor scenes, indoor scenes have the following characteristics: 1. object categories are numerous and complicated; 2. occlusion between objects is more severe; 3. scenes differ greatly from one another; 4. illumination is non-uniform; 5. strongly discriminative features are lacking. Indoor scenes are therefore more difficult to annotate than outdoor scenes. Semantic annotation of indoor scenes is the core of indoor scene understanding and has wide applications in fields such as service and fire fighting, for example mobile robot positioning and environment interaction, and event detection in security monitoring.
Scene semantic annotation (scene labeling), also known as scene semantic segmentation, labels each pixel in an image with the object class to which it belongs. It is a challenging task because it combines the traditional problems of detection, segmentation and multi-label recognition in a single process. An RGB-D image consists of a color image and a depth image acquired synchronously, and therefore contains both color and depth information. A depth image (also called a range image) is a special image in which each pixel stores the depth of the corresponding point in the actual scene. Compared with an RGB image, a depth image is less affected by illumination and shadow and better reflects the original appearance of a scene, so it is widely used for indoor scenes. Studies by Silberman and Fergus show that, for semantic segmentation of indoor scenes, experimental results using RGB-D data are significantly better than those using RGB alone.
In research on semantic annotation of indoor scene RGB(-D) images, scene semantic annotation methods can be roughly divided into two categories according to their input: one uses pixels as the basic annotation unit (pixel-level annotation), and the other uses superpixels as the basic annotation unit (superpixel-level annotation).
Since the advent of AlexNet in 2012, deep networks have achieved great success and wide application in image classification and image detection, and have proven to be a powerful tool for extracting image features. Deep networks are also widely used in semantic segmentation. Deep-network-based indoor scene semantic annotation is pixel-level annotation, and related methods can be roughly divided into two types. The first type is based on the fully convolutional network (FCN). Jonathan Long et al. proposed the FCN for image semantic segmentation in 2015, enabling end-to-end training of image semantic segmentation. However, the FCN is poor at describing object boundaries and shape structure. To learn more contextual information, Liang-Chieh Chen et al. used conditional random fields (CRF) to integrate global context and object structure information into the FCN.
The second type is based on encoder-decoder network architectures. SegNet and DeconvNet are typical encoder-decoder structures. The decoder is the key structure of SegNet: the decoders are stacked layer by layer, and each decoder layer has a corresponding encoder layer. DeconvNet treats semantic segmentation as an instance-wise segmentation problem: it performs pixel-wise semantic segmentation on each object proposal in the image and finally integrates the results of all proposals into the semantic segmentation of the whole image. DeconvNet has a large number of parameters, which makes training very difficult and the testing stage very time-consuming, and proposal extraction and network-based semantic segmentation are two separate steps.
A superpixel-level indoor scene semantic annotation method first segments the indoor scene image into superpixels according to pixel similarity, then extracts superpixel features, and finally classifies the superpixels. In 2011, Liefeng Bo and Xiaofeng Ren proposed four types of feature representations for indoor scene recognition: a size kernel descriptor (extracting the physical size of an object), a shape kernel descriptor (extracting the three-dimensional shape of an object), a gradient kernel descriptor (extracting depth information of an object) and a local binary kernel descriptor (extracting local texture information of an object). These outperform traditional 3D features (such as Spin Images) and greatly improve the accuracy of object recognition in RGB-D indoor scenes. In 2012, Xiaofeng Ren et al. used depth kernel descriptors to describe superpixel features and modeled the context between superpixels with a Markov random field built on a segmentation tree, raising the indoor scene semantic annotation accuracy on the NYU v1 dataset from 56.6% to 76.1%.
The drawback of pixel-level indoor scene semantic annotation is its high computational cost, caused by the large number of pixels. Dividing the image into superpixels reduces the computational cost, but the positional relationship among the superpixels is no longer regular: the superpixels obtained by segmenting an image form an unordered set, whereas most deep networks require input data in a regular matrix form, so a contradiction exists.
Disclosure of Invention
In order to overcome the defects of the prior art, the technical problem to be solved by the invention is to provide a superpixel-level indoor scene semantic annotation method, which can avoid the problem of huge calculation cost when a deep network is applied to pixel-level indoor scene annotation, and can enable the deep network to accept a superpixel set as input.
The technical scheme of the invention is as follows: the method for labeling the indoor scene semantics at the superpixel level comprises the following steps:
(1) performing super-pixel segmentation on the color image of the indoor scene by using a simple linear iterative clustering segmentation algorithm;
(2) extracting superpixel kernel descriptor features (primary features) by combining the superpixels obtained in the step (1) with the indoor scene depth images;
(3) constructing a neighborhood of the superpixel;
(4) constructing a superpixel deep network SuperPixelNet and learning superpixel multi-modal features; combining the multi-modal features of the superpixel to be annotated and its neighborhood superpixels, the method outputs a superpixel-level semantic annotation of the RGB-D image of the indoor scene.
The invention constructs a deep network structure, SuperPixelNet, for processing unordered superpixel sets. The network takes an unordered superpixel and its neighborhood superpixels as input and outputs a superpixel-level semantic annotation of the RGB-D image of the indoor scene, so the huge computational cost of applying a deep network to pixel-level indoor scene annotation is avoided and the deep network can accept a superpixel set as input.
Drawings
FIG. 1 is a flow chart of a method for semantic labeling of indoor scenes at a superpixel level according to the present invention.
FIG. 2 shows the structure of the SuperPixelNet superpixel deep network used by the superpixel-level indoor scene semantic annotation method according to the present invention.
Detailed Description
The invention provides a superpixel deep network for superpixel-level semantic annotation of RGB-D indoor scenes. First, the SLIC algorithm is used to segment the RGB-D indoor scene image into superpixels. The neighborhood superpixels of each superpixel are then found according to a fixed rule, and the superpixel to be annotated is called the core superpixel. The kernel descriptor features and geometric features (primary features) of the core superpixel and its neighborhood superpixels are fed to the superpixel deep network to learn their multi-modal fusion features; the neighborhood context feature of the core superpixel is learned from the multi-modal fusion features of the core superpixel and its neighborhood superpixels and is concatenated with the multi-modal fusion feature of the core superpixel as the feature representation for superpixel classification, thereby realizing superpixel-level semantic annotation of indoor scene RGB-D images. As shown in fig. 1, the superpixel-level indoor scene semantic annotation method includes the following steps:
(1) performing super-pixel segmentation on the color image of the indoor scene by using a simple linear iterative clustering segmentation algorithm;
(2) extracting superpixel kernel descriptor features (primary features) by combining the superpixels obtained in the step (1) with the indoor scene depth images;
(3) constructing a neighborhood of the superpixel;
(4) constructing a superpixel deep network SuperPixelNet and learning superpixel multi-modal features; combining the multi-modal features of the superpixel to be annotated and its neighborhood superpixels, the method outputs a superpixel-level semantic annotation of the RGB-D image of the indoor scene.
The invention constructs a deep network structure, SuperPixelNet, for processing unordered superpixel sets. The network takes an unordered superpixel and its neighborhood superpixels as input and outputs a superpixel-level semantic annotation of the RGB-D image of the indoor scene, so the huge computational cost of applying a deep network to pixel-level indoor scene annotation is avoided and the deep network can accept a superpixel set as input.
The invention is tested on the NYU v1 RGB-D dataset, which contains 2284 scenes and 13 categories in total. The dataset is partitioned into two disjoint subsets for training and testing: the training set contains 1370 scenes and the test set contains 914 scenes.
The method provided by the invention comprises the following specific steps:
Simple Linear Iterative Clustering (SLIC) extends the K-Means clustering algorithm and is a simple and efficient method for constructing superpixels. Preferably, the step (1) comprises the following substeps:
(1.1) converting the image from an RGB color space to an LAB color space;
(1.2) firstly, determining a parameter K, namely the number of the super pixels obtained by segmentation;
(1.3) compute the grid step S = sqrt(N / K), where N is the number of pixels contained in the image; using S as the step length, uniformly initialize K cluster centers c_j, 1 ≤ j ≤ K, and set the cluster center labels L(c_j) = j;
(1.4) for each cluster center c_j and every pixel q ∈ Nb_3(c_j) = {(x_q, y_q) | x_j - 2 ≤ x_q ≤ x_j + 2, y_j - 2 ≤ y_q ≤ y_j + 2} in its 3 × 3 neighborhood, compute the LAB color gradient CD(q); if a pixel c_k in the neighborhood has the minimum color gradient value, i.e. CD(c_k) ≤ CD(q) for all q, move the cluster center to that pixel: x_j = x_q, y_j = y_q;
(1.5) for every pixel i in the image other than the cluster centers, with coordinates (x_i, y_i), set the label L(i) = -1 and the distance d(i) = ∞;
(1.6) for each cluster center c_j and every pixel i ∈ Nb_2S(c_j) = {(x_i, y_i) | x_j - 2S - 1 ≤ x_i ≤ x_j + 2S + 1, y_j - 2S - 1 ≤ y_i ≤ y_j + 2S + 1} in its 2S × 2S neighborhood, compute the distance between i and c_j, D(i, c_j) = sqrt(d_c^2 + (m / S)^2 · d_s^2), where d_c = sqrt((l_i - l_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2) is the color distance and d_s = sqrt((x_i - x_j)^2 + (y_i - y_j)^2) is the spatial distance; (x_i, y_i) and (l_i, a_i, b_i) are the coordinates of pixel i and its color value in the LAB color space, and (x_j, y_j) and (l_j, a_j, b_j) are those of the cluster center c_j; the variable m balances the influence of the color distance and the spatial distance on pixel similarity: the larger m is, the greater the influence of the spatial distance and the more compact the superpixels; the smaller m is, the greater the influence of the color distance and the better the superpixels adhere to image edges;
(1.7) if D(i, c_j) < d(i), set L(i) = L(c_j) = j and d(i) = D(i, c_j);
(1.8) repeat steps (1.6)-(1.7) until all cluster centers c_j have been traversed;
(1.9) all pixels with label value j form the j-th superpixel SP_j, SP_j = {(x_i, y_i) | L(i) = j, 1 ≤ j ≤ K}; compute the center of gravity c'_j(x'_j, y'_j) of superpixel SP_j, where x'_j and y'_j are the means of the x and y coordinates of the pixels in SP_j, and define c'_j as the new cluster center of SP_j; the color value (l'_j, a'_j, b'_j) of the new cluster center c'_j in the LAB color space is the mean color of the pixels in the superpixel;
(1.10) accumulate the Euclidean distances between all new cluster centers and the corresponding old cluster centers, E = Σ_{j=1}^{K} ||c'_j - c_j||;
(1.11) if E is greater than a given threshold, repeat steps (1.6)-(1.10); otherwise the algorithm terminates and K superpixels are obtained.
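For illustration, the following is a minimal NumPy sketch of steps (1.1)-(1.11). It omits the gradient-based seed adjustment of step (1.4), uses the standard SLIC color/spatial distance, and is not the patented implementation; the function and parameter names are chosen for this sketch only.

```python
import numpy as np
from skimage import color  # only used for the RGB -> LAB conversion

def slic_superpixels(rgb, K=1000, m=10.0, tol=1.0, max_iter=10):
    """Minimal SLIC sketch following steps (1.1)-(1.11); illustrative only."""
    lab = color.rgb2lab(rgb)                               # (1.1) RGB -> LAB
    H, W = lab.shape[:2]
    S = int(np.sqrt(H * W / K))                            # (1.3) grid step S = sqrt(N / K)
    ys, xs = np.meshgrid(np.arange(S // 2, H, S), np.arange(S // 2, W, S), indexing="ij")
    centers = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)   # (y, x) seeds
    c_lab = lab[centers[:, 0].astype(int), centers[:, 1].astype(int)]
    labels = -np.ones((H, W), dtype=int)                   # (1.5) L(i) = -1
    dist = np.full((H, W), np.inf)                         # (1.5) d(i) = infinity
    for _ in range(max_iter):
        for j, (cy, cx) in enumerate(centers):             # (1.6)-(1.8) assignment step
            y0, y1 = int(max(cy - S, 0)), int(min(cy + S + 1, H))
            x0, x1 = int(max(cx - S, 0)), int(min(cx + S + 1, W))
            win = lab[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            dc2 = ((win - c_lab[j]) ** 2).sum(-1)          # squared LAB color distance
            ds2 = (yy - cy) ** 2 + (xx - cx) ** 2          # squared spatial distance
            D = np.sqrt(dc2 + (m / S) ** 2 * ds2)          # standard SLIC distance
            closer = D < dist[y0:y1, x0:x1]                # (1.7) keep the nearest center
            dist[y0:y1, x0:x1][closer] = D[closer]
            labels[y0:y1, x0:x1][closer] = j
        E = 0.0                                            # (1.9)-(1.10) update step
        for j in range(len(centers)):
            py, px = np.nonzero(labels == j)
            if len(py) == 0:
                continue
            new_c = np.array([py.mean(), px.mean()])       # center of gravity of SP_j
            E += np.linalg.norm(new_c - centers[j])        # accumulated center displacement
            centers[j] = new_c
            c_lab[j] = lab[py, px].mean(axis=0)            # mean LAB color of SP_j
        if E <= tol:                                       # (1.11) convergence test
            break
    return labels                                          # label map: pixel -> superpixel id
```

In practice an off-the-shelf implementation such as skimage.segmentation.slic provides equivalent superpixel segmentation.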
The step (2) comprises the following sub-steps:
(2.1) Patch feature calculation:
Patch is defined as a 16 × 16 grid (the grid size can be modified according to the actual data). Using n pixels as the step length (n = 2 in the experiments of the invention), the grid slides to the right and downward from the upper-left corner of the color image RGB and of the depth image Depth, finally forming a dense grid on each image. For an image of size N × M, the number of Patches obtained is ((N - 16)/n + 1) × ((M - 16)/n + 1).
Four types of features are calculated for each Patch: depth gradient features, color gradient features, color features and texture features;
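Before moving to the superpixel-level aggregation, here is a small sketch of the dense Patch grid construction described above; the 16 × 16 Patch size and step n = 2 are the stated defaults, and the function name is an invention of this sketch.

```python
import numpy as np

def patch_centers(height, width, patch=16, step=2):
    """Centers of the dense Patch grid slid over an image (sketch of step (2.1));
    their count equals ((height - patch) // step + 1) * ((width - patch) // step + 1)."""
    ys = np.arange(0, height - patch + 1, step) + patch // 2
    xs = np.arange(0, width - patch + 1, step) + patch // 2
    return [(int(y), int(x)) for y in ys for x in xs]
```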
(2.2) obtaining superpixel features:
The superpixel feature F_seg is given by formula (5) and is composed of the superpixel-level kernel descriptor features F_seg^{g_d}, F_seg^{g_c}, F_seg^{col} and F_seg^{tex}, which denote the superpixel depth gradient feature, the superpixel color gradient feature, the superpixel color feature and the superpixel texture feature respectively. Each of them is obtained by formula (6) as the mean of the corresponding Patch features whose centers fall within the superpixel, e.g. F_seg^{g_d} = (1/n) Σ_{i=1}^{n} F_g_d(i), where F_g_d(i), F_g_c(i), F_col(i), F_tex(i) denote the features of the i-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center position falls within the superpixel seg.
The superpixel geometric features are defined by formula (7) and are composed of the quantities introduced below.
The superpixel area is A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg. The superpixel perimeter P_seg is defined by formula (8) as the number of pixels in B_seg, the set of boundary pixels of the superpixel seg, i.e. the pixels s ∈ seg whose four-neighborhood N_4(s) contains a pixel of a different superpixel seg' or lies outside the image, where M, N denote the horizontal and vertical resolution of the RGB scene image respectively.
The area-to-perimeter ratio of the superpixel, given by formula (9), is R_seg = A_seg / P_seg.
The three second-order Hu moments are computed from the x coordinate s_x of pixel s, from the y coordinate s_y, and from the product of the x and y coordinates, respectively, and are defined by formulas (10), (11) and (12) as the second-order central moments of the pixel coordinates within the superpixel.
The x coordinate mean, the y coordinate mean, the x coordinate mean square and the y coordinate mean square of the pixels contained in the superpixel, which appear in formulas (10)-(12), are defined by formula (13). Width and Height denote the image width and height respectively, and the moments are computed from pixel coordinate values normalized by Width and Height.
The mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values and the variance of the depth values D_var are defined by formula (14).
D_miss denotes the proportion of pixels in the superpixel that lack depth information and is defined by formula (15).
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where the principal normal vector of the point cloud is estimated by principal component analysis (PCA).
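To illustrate these superpixel-level statistics, the following NumPy sketch computes a subset of the quantities in formulas (7)-(15): area, perimeter, area-to-perimeter ratio, second-order moments of normalized coordinates, depth variance and missing-depth ratio. It assumes that missing depth is encoded as zero and omits the PCA-estimated normal N_seg; the function name and these conventions are assumptions of the sketch.

```python
import numpy as np

def geometric_features(labels, depth, k):
    """Sketch of a subset of the geometric features in formulas (7)-(15) for superpixel k.
    Assumes missing depth is stored as 0; the PCA point-cloud normal (N_seg) is omitted."""
    H, W = labels.shape
    mask = labels == k
    ys, xs = np.nonzero(mask)
    area = int(mask.sum())                                  # A_seg, formula (7)
    # boundary pixels: inside the superpixel but with a 4-neighbor outside it (formula (8))
    pad = np.pad(mask, 1, constant_values=False)
    nb_outside = (~pad[:-2, 1:-1] | ~pad[2:, 1:-1] | ~pad[1:-1, :-2] | ~pad[1:-1, 2:])
    perimeter = int((mask & nb_outside).sum())              # P_seg = |B_seg|
    ratio = area / max(perimeter, 1)                        # R_seg, formula (9)
    # second-order central moments of coordinates normalized by Width / Height, formulas (10)-(13)
    x, y = xs / W, ys / H
    moments = np.array([((x - x.mean()) ** 2).mean(),
                        ((y - y.mean()) ** 2).mean(),
                        ((x - x.mean()) * (y - y.mean())).mean()])
    # depth statistics, formulas (14)-(15)
    d = depth[ys, xs].astype(float)
    valid = d > 0
    d_var = d[valid].var() if valid.any() else 0.0          # D_var
    d_miss = 1.0 - valid.mean()                             # D_miss: fraction lacking depth
    return np.concatenate([[area, perimeter, ratio], moments, [d_var, d_miss]])
```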
The depth gradient feature of step (2.1) is computed as follows:
A Patch in the depth image is denoted Z_d. For each Z_d the depth gradient feature F_g_d is computed, the value of whose t-th component is defined by formula (1): it projects, with the mapping coefficient of the t-th principal component, the sum over the pixels of the Patch of the gradient magnitude multiplied by the Kronecker product of the depth gradient kernel responses and the position kernel responses. Here z ∈ Z_d denotes the relative two-dimensional coordinate position of pixel z in the depth Patch; θ(z) and m(z) denote the depth gradient direction and the gradient magnitude of pixel z respectively; the depth gradient basis vectors and the position basis vectors are predefined values, and d_g and d_s denote the number of depth gradient basis vectors and the number of position basis vectors respectively; the mapping coefficient of the t-th principal component is obtained by applying kernel principal component analysis (KPCA); ⊗ denotes the Kronecker product; the depth gradient Gaussian kernel function and the position Gaussian kernel function have their corresponding Gaussian kernel parameters. The depth gradient feature is then transformed with the EMK (efficient match kernel) algorithm, and the transformed feature vector is still denoted F_g_d.
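As an illustration of how a gradient kernel descriptor of this form can be evaluated for a single Patch, the sketch below follows the general match-kernel construction of Bo et al. that the feature builds on: Gaussian kernel responses of the gradient orientation and of the pixel position to predefined basis vectors are weighted by the gradient magnitude, summed over the Patch, and projected with precomputed KPCA coefficients. The basis vectors, coefficient matrix `alpha` and kernel parameters are assumed to be given, the orientation is encoded as (sin θ, cos θ), and the EMK transform is not included; this is a sketch of the idea, not the exact formula (1) of the patent.

```python
import numpy as np

def gradient_kernel_descriptor(mag, theta, alpha, grad_basis, pos_basis,
                               gamma_g=5.0, gamma_s=3.0):
    """Sketch of a (depth) gradient kernel descriptor for one Patch.
    mag, theta: gradient magnitude / orientation per pixel of the Patch;
    grad_basis (d_g, 2), pos_basis (d_s, 2) and alpha (T, d_g * d_s) are assumed precomputed."""
    h, w = mag.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel() / h, xs.ravel() / w], axis=1)      # normalized positions z
    ori = np.stack([np.sin(theta).ravel(), np.cos(theta).ravel()], axis=1)
    m = mag.ravel() / max(np.linalg.norm(mag), 1e-8)              # normalized magnitude m(z)
    # Gaussian kernel responses to the predefined basis vectors
    k_g = np.exp(-gamma_g * ((ori[:, None, :] - grad_basis[None]) ** 2).sum(-1))  # (P, d_g)
    k_s = np.exp(-gamma_s * ((pos[:, None, :] - pos_basis[None]) ** 2).sum(-1))   # (P, d_s)
    # sum over pixels of m(z) * (gradient responses ⊗ position responses)
    feat = np.einsum('p,pi,pj->ij', m, k_g, k_s).ravel()          # (d_g * d_s,)
    return alpha @ feat                                           # project with KPCA coefficients
```

The color gradient, color and texture kernel descriptors described next follow the same pattern with different per-pixel attributes and basis vectors.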
The color gradient feature of step (2.1) is computed as follows:
A Patch in the color image is denoted Z_c. For each Z_c the color gradient feature F_g_c is computed, the value of whose t-th component is defined by formula (2), which has the same form as formula (1) with the depth gradient replaced by the image gradient of the color Patch. In formula (2), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color image Patch; θ(z) and m(z) denote the gradient direction and the gradient magnitude of pixel z respectively; the color gradient basis vectors and the position basis vectors are predefined values, and c_g and c_s denote the number of color gradient basis vectors and the number of position basis vectors respectively; the mapping coefficient of the t-th principal component is obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; the color gradient Gaussian kernel function and the position Gaussian kernel function have their corresponding Gaussian kernel parameters. The color gradient feature is transformed with the EMK (efficient match kernel) algorithm, and the transformed feature vector is still denoted F_g_c.
The color feature of step (2.1) is computed as follows:
A Patch in the color image is denoted Z_c. For each Z_c the color feature F_col is computed, the value of whose t-th component is defined by formula (3). In formula (3), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color image Patch; r(z) is a three-dimensional vector holding the RGB value of pixel z; the color basis vectors and the position basis vectors are predefined values, and c_c and c_s denote the number of color basis vectors and the number of position basis vectors respectively; the mapping coefficient of the t-th principal component is obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; the color Gaussian kernel function and the position Gaussian kernel function have their corresponding Gaussian kernel parameters. The color feature is transformed with the EMK (efficient match kernel) algorithm, and the transformed feature vector is still denoted F_col.
The texture feature of step (2.1) is computed as follows:
The RGB scene image is first converted into a gray-scale image, and a Patch in the gray-scale image is denoted Z_g. For each Z_g the texture feature F_tex is computed, the value of whose t-th component is defined by formula (4). In formula (4), z ∈ Z_g denotes the relative two-dimensional coordinate position of pixel z in the Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; LBP(z) is the local binary pattern (LBP) feature of pixel z; the local binary pattern basis vectors and the position basis vectors are predefined values, and g_b and g_s denote the number of local binary pattern basis vectors and the number of position basis vectors respectively; the mapping coefficient of the t-th principal component is obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; the local binary pattern Gaussian kernel function and the position Gaussian kernel function have their corresponding Gaussian kernel parameters. The texture feature is transformed with the EMK (efficient match kernel) algorithm, and the transformed feature vector is still denoted F_tex.
In step (3), the superpixel set obtained by SLIC segmentation of the indoor scene image is denoted Im = {seg_1, seg_2, …, seg_K}, where seg_k denotes the k-th superpixel, regarded as the set of pixels it contains. For any superpixel seg_t ∈ Im, if seg_t shares a common boundary with seg_k, then seg_t is called an adjacent superpixel of seg_k; all adjacent superpixels of seg_k form a set, the adjacent superpixels of all superpixels in that set form a second set, and the two sets together constitute the neighborhood superpixel set NS(seg_k) of superpixel seg_k.
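A minimal sketch of this neighborhood construction from a superpixel label map is given below. It assumes 4-connected adjacency and that NS(seg_k) excludes seg_k itself; the function name and these conventions are choices of the sketch, not taken from the patent.

```python
import numpy as np

def neighborhood_superpixels(labels):
    """Build NS(seg_k): first- and second-order neighbors of every superpixel,
    where adjacency means sharing a 4-connected boundary (a sketch of step (3))."""
    n = int(labels.max()) + 1
    adj = [set() for _ in range(n)]
    # pairs of horizontally / vertically adjacent pixels with different labels
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        diff = a != b
        for u, v in zip(a[diff].ravel(), b[diff].ravel()):
            adj[u].add(int(v))
            adj[v].add(int(u))
    neigh = []
    for k in range(n):
        first = adj[k]                                            # adjacent superpixels
        second = set().union(*(adj[t] for t in first)) if first else set()
        neigh.append((first | second) - {k})                      # NS(seg_k) without seg_k
    return neigh
```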
As shown in fig. 2, in step (4) the input of the superpixel deep network SuperPixelNet is each superpixel seg_k obtained by segmenting the indoor scene image together with its neighborhood superpixels NS(seg_k), and the output is the score of superpixel seg_k for each semantic category; the network comprises three sub-networks: a multi-modal fusion learning sub-network, a superpixel neighborhood information fusion sub-network, and a superpixel classification sub-network.
In step (4):
The multi-modal fusion learning sub-network comprises 7 branches B_i (i = 1, …, 7), whose inputs are, respectively, the superpixel depth gradient feature F_seg^{g_d}, the superpixel color gradient feature F_seg^{g_c}, the superpixel color feature F_seg^{col}, the superpixel texture feature F_seg^{tex} and the three groups of superpixel geometric features. Each branch receives the corresponding feature of the N superpixels formed by superpixel seg_k together with its N-1 neighborhood superpixels. The four kernel descriptor features are each 200-dimensional, and the three groups of geometric features are 3-dimensional, 7-dimensional and 5-dimensional respectively. The first four network branches B_i (i = 1, …, 4) have the same structure: one convolution layer (conv-64) with kernel size 1 × 1, convolution stride 1 and an output channel size of 64 dimensions. The last three network branches B_i (i = 5, 6, 7) also have the same structure: one convolution layer (conv-32) with kernel size 1 × 1, convolution stride 1 and an output channel size of 32 dimensions. The outputs of these three branches are then concatenated and fused by a convolution layer (conv-64) with kernel size 1 × 3, convolution stride 1 and an output channel size of 64 dimensions. Finally, the outputs of the first four branches are concatenated with the feature obtained by fusing the last three branches and passed through a convolution layer (conv-1024) with kernel size 1 × 5, convolution stride 1 and an output channel size of 1024 dimensions, which yields the multi-modal fusion feature of the superpixel;
the multi-modal fusion features of the N superpixels are used as input, enter a superpixel neighborhood information fusion sub-network, and are subjected to a layer of average pooling operation to obtain fusion features of the N superpixels; averaging the output of the pooling operation, and obtaining final neighborhood characteristics through two full-connection layers (FC-256 and FC-128) with 256 and 128 output dimensions respectively; associating neighborhood features with superpixels segkThe 1024-dimensional multi-modal fusion features are connected, so that the super-pixel features with neighborhood information are obtained;
the super-pixel classification sub-network consists of three convolution layers, the sizes of the convolution kernels are all 1 multiplied by 1, the convolution step sizes are all 1, the output dimensions are respectively 512, 256 and 13(conv-512, conv-256, con-13), a dropout layer is arranged between conv-256 and conv-13, and the dropout probability is 0.5. Finally outputting the super pixel segkScores of the categories to which they belong; the invention adopts NYU V1 data sets collected and sorted by Silberman, Fergus and the like to carry out experiments, wherein the data sets have 13 semantic categories (Bel, Blind, Bookshelf, barrel, inspecting, Floor, Picture, Sofa, Table, TV, Wall, Window, Background) and 7 scenes in total; the data set comprises 2284 frames of color images (RGB) and 2284 frames of Depth images (Depth), wherein the color images correspond to the Depth images one by one, and the resolution of each image is 480 multiplied by 640; according to the traditional partitioning method, the invention selects 60% of the data set for trainingRefining, 40% for testing; based on an NYU V1 data set, a comparison experiment between the method provided by the invention and the method provided by 5 people, such as Silberman, Ren, Salman H.Khan, Anran, Heng and the like, is carried out, and the experimental result is shown in table 1 (class average accuracy rate), so that the method provided by the invention can be seen to obtain an excellent labeling effect in indoor scene semantic labeling; in the invention, the value of N is 10, the network hyper-parameter batch size is set to be 16, the learning rate is set to be 5e-6, the initialization of all parameters in the network uses an Xavier initialization method, the rest convolutional layers and the full-link layer use Relu as an activation function except the last layer which does not use an activation function, the full-link layers FC-256 and FC-128 use a parameter of 0.01 as an L2 regularization parameter, and batch normalization is added to all the convolutional layers.
Table 1 shows the comparison of the present invention with other methods on the NYU v1 data set, from which it can be seen that the present invention is greatly superior to other methods.
TABLE 1
(The table of class average accuracies is reproduced as an image in the original publication.)
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (8)

1. A super-pixel level indoor scene semantic annotation method is characterized in that: the method comprises the following steps:
(1) performing super-pixel segmentation on the color image of the indoor scene by using a simple linear iterative clustering segmentation algorithm;
(2) extracting superpixel kernel descriptor features by combining the superpixels obtained in the step (1) with the indoor scene depth image;
(3) constructing a neighborhood of the superpixel;
(4) constructing a superpixel deep network SuperPixelNet and learning superpixel multi-modal features; giving a superpixel-level semantic annotation to the RGB-D image of the indoor scene by combining the multi-modal features of the superpixel and its neighborhood superpixels;
in step (4), the input of the superpixel deep network SuperPixelNet is each superpixel seg_k obtained by segmenting the indoor scene image together with its neighborhood superpixels NS(seg_k), and the output is the score of superpixel seg_k for each semantic category, the score being the basis for determining the final semantic label of the superpixel; the network comprises three sub-networks: a multi-modal fusion learning sub-network, a superpixel neighborhood information fusion sub-network and a superpixel classification sub-network;
the multi-modal fusion learning sub-network comprises 7 branches B_i (i = 1, …, 7), whose inputs are, respectively, the superpixel depth gradient feature F_seg^{g_d}, the superpixel color gradient feature F_seg^{g_c}, the superpixel color feature F_seg^{col}, the superpixel texture feature F_seg^{tex} and the three groups of superpixel geometric features; each branch receives the corresponding feature of the N superpixels formed by superpixel seg_k together with its N-1 neighborhood superpixels; the four kernel descriptor features are each 200-dimensional, and the three groups of geometric features are 3-dimensional, 7-dimensional and 5-dimensional respectively; the first four network branches B_i (i = 1, …, 4) have the same structure, namely one convolution layer conv-64 with kernel size 1 × 1, convolution stride 1 and an output channel size of 64 dimensions; the last three network branches B_i (i = 5, 6, 7) have the same structure, namely one convolution layer conv-32 with kernel size 1 × 1, convolution stride 1 and an output channel size of 32 dimensions; the outputs of these three branches are then concatenated and fused by a convolution layer conv-64 with kernel size 1 × 3, convolution stride 1 and an output channel size of 64 dimensions; finally, the outputs of the first four branches are concatenated with the feature obtained by fusing the last three branches and passed through a convolution layer conv-1024 with kernel size 1 × 5, convolution stride 1 and an output channel size of 1024 dimensions, which yields the multi-modal fusion feature of the superpixel;
the multi-modal fusion features of the N superpixels are then fed into the superpixel neighborhood information fusion sub-network: a layer of average pooling produces the fused feature of the N superpixels, the pooled output is averaged, and the final neighborhood feature is obtained through two fully connected layers FC-256 and FC-128 with output dimensions 256 and 128 respectively; the neighborhood feature is concatenated with the 1024-dimensional multi-modal fusion feature of superpixel seg_k, yielding the superpixel feature with neighborhood information;
the superpixel classification sub-network consists of three convolution layers, all with kernel size 1 × 1 and convolution stride 1 and with output dimensions of 512, 256 and 13 respectively; a dropout layer with dropout probability 0.5 is placed between conv-256 and conv-13, and the network finally outputs the score of superpixel seg_k for each semantic category; the value of N is 10, the network hyper-parameter batch size is set to 16 and the learning rate to 5e-6, all parameters in the network are initialized with the Xavier initialization method, all convolution layers and fully connected layers use ReLU as the activation function except the last layer, which uses no activation function, the fully connected layers FC-256 and FC-128 use 0.01 as the L2 regularization parameter, and batch normalization is added to all convolution layers.
2. The method for semantic labeling of indoor scenes at the superpixel level according to claim 1, characterized in that: the step (1) comprises the following sub-steps:
(1.1) converting the image from an RGB color space to an LAB color space;
(1.2) firstly, determining a parameter K, namely the number of the super pixels obtained by segmentation;
(1.3) compute the grid step S = sqrt(N_P / K), where N_P is the number of pixels contained in the image; using S as the step length, uniformly initialize K cluster centers c_j, 1 ≤ j ≤ K, and set the cluster center labels L(c_j) = j;
(1.4) for each cluster center c_j and every pixel q ∈ Nb_3(c_j) = {(x_q, y_q) | x_j - 2 ≤ x_q ≤ x_j + 2, y_j - 2 ≤ y_q ≤ y_j + 2} in its 3 × 3 neighborhood, compute the LAB color gradient CD(q); if a pixel c_k in the neighborhood has the minimum color gradient value, i.e. CD(c_k) ≤ CD(q) for all q, move the cluster center to that pixel: x_j = x_q, y_j = y_q;
(1.5) for every pixel i in the image other than the cluster centers, with coordinates (x_i, y_i), set the label L(i) = -1 and the distance d(i) = ∞;
(1.6) for each cluster center c_j and every pixel i ∈ Nb_2S(c_j) = {(x_i, y_i) | x_j - 2S - 1 ≤ x_i ≤ x_j + 2S + 1, y_j - 2S - 1 ≤ y_i ≤ y_j + 2S + 1} in its 2S × 2S neighborhood, compute the distance between i and c_j, D(i, c_j) = sqrt(d_c^2 + (m / S)^2 · d_s^2), where d_c = sqrt((l_i - l_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2) is the color distance and d_s = sqrt((x_i - x_j)^2 + (y_i - y_j)^2) is the spatial distance; (x_i, y_i) and (l_i, a_i, b_i) are the coordinates of pixel i and its color value in the LAB color space, and (x_j, y_j) and (l_j, a_j, b_j) are those of the cluster center c_j; the variable m balances the influence of the color distance and the spatial distance on pixel similarity: the larger m is, the greater the influence of the spatial distance and the more compact the superpixels; the smaller m is, the greater the influence of the color distance and the better the superpixels adhere to image edges;
(1.7) if D(i, c_j) < d(i), set L(i) = L(c_j) = j and d(i) = D(i, c_j);
(1.8) repeat steps (1.6)-(1.7) until all cluster centers c_j have been traversed;
(1.9) all pixels with label value j form the j-th superpixel SP_j, SP_j = {(x_i, y_i) | L(i) = j, 1 ≤ j ≤ K}; compute the center of gravity c'_j(x'_j, y'_j) of superpixel SP_j, where x'_j and y'_j are the means of the x and y coordinates of the pixels in SP_j, and define c'_j as the new cluster center of SP_j; the color value (l'_j, a'_j, b'_j) of the new cluster center c'_j in the LAB color space is the mean color of the pixels in the superpixel;
(1.10) accumulate the Euclidean distances between all new cluster centers and the corresponding old cluster centers, E = Σ_{j=1}^{K} ||c'_j - c_j||;
(1.11) if E is greater than a given threshold, repeat steps (1.6)-(1.10); otherwise the algorithm terminates and K superpixels are obtained.
3. The method for semantic labeling of indoor scenes at the superpixel level according to claim 2, characterized in that: the step (2) comprises the following sub-steps:
(2.1) Patch feature calculation:
Patch is defined as a 16 × 16 grid that slides to the right and downward from the upper-left corner of the color image RGB and of the depth image Depth in steps of n pixels, finally forming a dense grid on the color image and on the depth image; four types of features are calculated for each Patch: depth gradient features, color gradient features, color features and texture features;
(2.2) obtaining superpixel features:
The superpixel feature F_seg is given by formula (5) and is composed of the superpixel-level kernel descriptor features F_seg^{g_d}, F_seg^{g_c}, F_seg^{col} and F_seg^{tex}, which denote the superpixel depth gradient feature, the superpixel color gradient feature, the superpixel color feature and the superpixel texture feature respectively. Each of them is obtained by formula (6) as the mean of the corresponding Patch features whose centers fall within the superpixel, e.g. F_seg^{g_d} = (1/n) Σ_{i=1}^{n} F_g_d(i), where F_g_d(i), F_g_c(i), F_col(i), F_tex(i) denote the features of the i-th Patch whose center position falls within the superpixel seg, and n denotes the number of Patches whose center position falls within the superpixel seg.
The superpixel geometric features are defined by formula (7) and are composed of the quantities introduced below.
The superpixel area is A_seg = Σ_{s∈seg} 1, where s ranges over the pixels within the superpixel seg. The superpixel perimeter P_seg is defined by formula (8) as the number of pixels in B_seg, the set of boundary pixels of the superpixel seg, i.e. the pixels s ∈ seg whose four-neighborhood N_4(s) contains a pixel of a different superpixel seg' or lies outside the image, where M, N denote the horizontal and vertical resolution of the RGB scene image respectively.
The area-to-perimeter ratio of the superpixel, given by formula (9), is R_seg = A_seg / P_seg.
The three second-order Hu moments are computed from the x coordinate s_x of pixel s, from the y coordinate s_y, and from the product of the x and y coordinates, respectively, and are defined by formulas (10), (11) and (12) as the second-order central moments of the pixel coordinates within the superpixel.
The x coordinate mean, the y coordinate mean, the x coordinate mean square and the y coordinate mean square of the pixels contained in the superpixel, which appear in formulas (10)-(12), are defined by formula (13). Width and Height denote the image width and height respectively, and the moments are computed from pixel coordinate values normalized by Width and Height.
The mean of the depth values s_d of the pixels s within the superpixel seg, the mean of the squared depth values and the variance of the depth values D_var are defined by formula (14).
D_miss denotes the proportion of pixels in the superpixel that lack depth information and is defined by formula (15).
N_seg is the modulus of the principal normal vector of the point cloud corresponding to the superpixel, where the principal normal vector of the point cloud is estimated by principal component analysis (PCA).
4. The method of claim 3, wherein the indoor scene semantic labeling at a superpixel level comprises:
the depth gradient feature of step (2.1) is computed as follows: a Patch in the depth image is denoted Z_d; for each Z_d the depth gradient feature F_g_d is computed, the value of whose t-th component is defined by formula (1); in formula (1), z ∈ Z_d denotes the relative two-dimensional coordinate position of pixel z in the depth Patch; θ(z) and m(z) denote the depth gradient direction and the gradient magnitude of pixel z respectively; the depth gradient basis vectors and the position basis vectors are predefined values, and d_g and d_s denote the number of depth gradient basis vectors and the number of position basis vectors respectively; the mapping coefficient of the t-th principal component is obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; the depth gradient Gaussian kernel function and the position Gaussian kernel function have their corresponding Gaussian kernel parameters; the depth gradient feature is transformed with the EMK (efficient match kernel) algorithm, and the transformed feature vector is still denoted F_g_d.
5. The method of claim 4, wherein the indoor scene semantic labeling at a superpixel level comprises: the color gradient characteristic of the step (2.1) is as follows:
a Patch in the color image is denoted Z_c; for each Z_c the color gradient feature F_g_c is computed, the value of whose t-th component is defined by formula (2); in formula (2), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color image Patch; θ(z) and m(z) denote the gradient direction and the gradient magnitude of pixel z respectively; the color gradient basis vectors and the position basis vectors are predefined values, and c_g and c_s denote the number of color gradient basis vectors and the number of position basis vectors respectively; the mapping coefficient of the t-th principal component is obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; the color gradient Gaussian kernel function and the position Gaussian kernel function have their corresponding Gaussian kernel parameters; the color gradient feature is transformed with the EMK (efficient match kernel) algorithm, and the transformed feature vector is still denoted F_g_c.
6. The method of claim 5, wherein the indoor scene semantic labeling at a superpixel level comprises: the color characteristics of the step (2.1) are as follows:
a Patch in the color image is denoted Z_c; for each Z_c the color feature F_col is computed, the value of whose t-th component is defined by formula (3); in formula (3), z ∈ Z_c denotes the relative two-dimensional coordinate position of pixel z in the color image Patch; r(z) is a three-dimensional vector holding the RGB value of pixel z; the color basis vectors and the position basis vectors are predefined values, and c_c and c_s denote the number of color basis vectors and the number of position basis vectors respectively; the mapping coefficient of the t-th principal component is obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; the color Gaussian kernel function and the position Gaussian kernel function have their corresponding Gaussian kernel parameters; the color feature is transformed with the EMK (efficient match kernel) algorithm, and the transformed feature vector is still denoted F_col.
7. The method of claim 6, wherein the indoor scene semantic labeling at a superpixel level comprises: the texture characteristics of the step (2.1) are as follows:
the RGB scene image is first converted into a gray-scale image, and a Patch in the gray-scale image is denoted Z_g; for each Z_g the texture feature F_tex is computed, the value of whose t-th component is defined by formula (4); in formula (4), z ∈ Z_g denotes the relative two-dimensional coordinate position of pixel z in the Patch; s(z) denotes the standard deviation of the pixel gray values in the 3 × 3 region centered on pixel z; LBP(z) is the local binary pattern (LBP) feature of pixel z; the local binary pattern basis vectors and the position basis vectors are predefined values, and g_b and g_s denote the number of local binary pattern basis vectors and the number of position basis vectors respectively; the mapping coefficient of the t-th principal component is obtained by applying kernel principal component analysis (KPCA), and ⊗ denotes the Kronecker product; the local binary pattern Gaussian kernel function and the position Gaussian kernel function have their corresponding Gaussian kernel parameters; the texture feature is transformed with the EMK (efficient match kernel) algorithm, and the transformed feature vector is still denoted F_tex.
8. The method of claim 7, wherein the indoor scene semantic labeling at a superpixel level comprises: in step (3), the superpixel set obtained by SLIC segmentation of the indoor scene image is denoted Im = {seg_1, seg_2, …, seg_K}, where seg_k denotes the k-th superpixel, regarded as the set of pixels it contains; for any superpixel seg_t ∈ Im, if seg_t shares a common boundary with seg_k, then seg_t is called an adjacent superpixel of seg_k; all adjacent superpixels of seg_k form a set, the adjacent superpixels of all superpixels in that set form a second set, and the two sets together constitute the neighborhood superpixel set NS(seg_k) of superpixel seg_k.
CN201910269599.1A 2019-04-04 2019-04-04 Indoor scene semantic annotation method at super-pixel level Active CN110096961B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910269599.1A CN110096961B (en) 2019-04-04 2019-04-04 Indoor scene semantic annotation method at super-pixel level

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910269599.1A CN110096961B (en) 2019-04-04 2019-04-04 Indoor scene semantic annotation method at super-pixel level

Publications (2)

Publication Number Publication Date
CN110096961A CN110096961A (en) 2019-08-06
CN110096961B true CN110096961B (en) 2021-03-02

Family

ID=67444356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910269599.1A Active CN110096961B (en) 2019-04-04 2019-04-04 Indoor scene semantic annotation method at super-pixel level

Country Status (1)

Country Link
CN (1) CN110096961B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110634142B (en) * 2019-08-20 2024-02-02 长安大学 Complex vehicle road image boundary optimization method
CN112036466A (en) * 2020-08-26 2020-12-04 长安大学 Mixed terrain classification method
CN112241965A (en) * 2020-09-23 2021-01-19 天津大学 Method for generating superpixels and segmenting images based on deep learning
CN112669355B (en) * 2021-01-05 2023-07-25 北京信息科技大学 Method and system for splicing and fusing focusing stack data based on RGB-D super pixel segmentation
CN114239756B (en) * 2022-02-25 2022-05-17 科大天工智能装备技术(天津)有限公司 Insect pest detection method and system
CN115273645B (en) * 2022-08-09 2024-04-09 南京大学 Map making method for automatically clustering indoor surface elements
CN117137374B (en) * 2023-10-27 2024-01-26 张家港极客嘉智能科技研发有限公司 Sweeping robot recharging method based on computer vision

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903257A (en) * 2014-02-27 2014-07-02 西安电子科技大学 Image segmentation method based on geometric block spacing symbiotic characteristics and semantic information
CN104809187A (en) * 2015-04-20 2015-07-29 南京邮电大学 Indoor scene semantic annotation method based on RGB-D data
EP2980754A1 (en) * 2014-07-28 2016-02-03 Thomson Licensing Method and apparatus for generating temporally consistent superpixels
WO2016016033A1 (en) * 2014-07-31 2016-02-04 Thomson Licensing Method and apparatus for interactive video segmentation
CN105513070A (en) * 2015-12-07 2016-04-20 天津大学 RGB-D salient object detection method based on foreground and background optimization
CN106022353A (en) * 2016-05-05 2016-10-12 浙江大学 Image semantic annotation method based on super pixel segmentation
CN107256399A (en) * 2017-06-14 2017-10-17 大连海事大学 A kind of SAR image coastline Detection Method algorithms based on Gamma distribution super-pixel algorithms and based on super-pixel TMF
CN107274419A (en) * 2017-07-10 2017-10-20 北京工业大学 A kind of deep learning conspicuousness detection method based on global priori and local context
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN107944428A (en) * 2017-12-15 2018-04-20 北京工业大学 A kind of indoor scene semanteme marking method based on super-pixel collection
CN109345536A (en) * 2018-08-16 2019-02-15 广州视源电子科技股份有限公司 A kind of image superpixel dividing method and its device
CN109345549A (en) * 2018-10-26 2019-02-15 南京览众智能科技有限公司 A kind of natural scene image dividing method based on adaptive compound neighbour's figure

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2011265383A1 (en) * 2011-12-20 2013-07-04 Canon Kabushiki Kaisha Geodesic superpixel segmentation

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903257A (en) * 2014-02-27 2014-07-02 西安电子科技大学 Image segmentation method based on geometric block spacing symbiotic characteristics and semantic information
EP2980754A1 (en) * 2014-07-28 2016-02-03 Thomson Licensing Method and apparatus for generating temporally consistent superpixels
WO2016016033A1 (en) * 2014-07-31 2016-02-04 Thomson Licensing Method and apparatus for interactive video segmentation
CN104809187A (en) * 2015-04-20 2015-07-29 南京邮电大学 Indoor scene semantic annotation method based on RGB-D data
CN105513070A (en) * 2015-12-07 2016-04-20 天津大学 RGB-D salient object detection method based on foreground and background optimization
CN106022353A (en) * 2016-05-05 2016-10-12 浙江大学 Image semantic annotation method based on super pixel segmentation
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN107256399A (en) * 2017-06-14 2017-10-17 大连海事大学 A kind of SAR image coastline Detection Method algorithms based on Gamma distribution super-pixel algorithms and based on super-pixel TMF
CN107274419A (en) * 2017-07-10 2017-10-20 北京工业大学 A kind of deep learning conspicuousness detection method based on global priori and local context
CN107944428A (en) * 2017-12-15 2018-04-20 北京工业大学 A kind of indoor scene semanteme marking method based on super-pixel collection
CN109345536A (en) * 2018-08-16 2019-02-15 广州视源电子科技股份有限公司 A kind of image superpixel dividing method and its device
CN109345549A (en) * 2018-10-26 2019-02-15 南京览众智能科技有限公司 A kind of natural scene image dividing method based on adaptive compound neighbour's figure

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation; Charles R. Qi et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-11-09; pp. 77-85 *
RGB-(D) Scene Labeling: Features and Algorithms; Xiaofeng Ren et al.; CVPR 2012; 2012; pp. 2759-2766 *
SLIC Superpixels Compared to State-of-the-Art Superpixel Methods; Radhakrishna Achanta et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2012-11; vol. 34, no. 11; pp. 2274-2281 *
Semantic segmentation of indoor 3D point cloud models based on 2D-3D semantic transfer; Xiong Hanjiang et al.; Geomatics and Information Science of Wuhan University; 2018-12; vol. 43, no. 12; pp. 2303-2309 *
Indoor scene semantic segmentation method based on RGB-D images; Feng Xilong; China Master's Theses Full-text Database, Information Science and Technology; 2016-05-15; vol. 2016, no. 5; pp. I138-1246 *
Non-parametric RGB-D scene understanding; Fei Tingting; China Master's Theses Full-text Database, Information Science and Technology; 2017-06-15; vol. 2017, no. 6; pp. I138-1368 *

Also Published As

Publication number Publication date
CN110096961A (en) 2019-08-06

Similar Documents

Publication Publication Date Title
CN110096961B (en) Indoor scene semantic annotation method at super-pixel level
CN109829449B (en) RGB-D indoor scene labeling method based on super-pixel space-time context
Lei et al. A universal framework for salient object detection
Kohli et al. Simultaneous segmentation and pose estimation of humans using dynamic graph cuts
Xiao et al. Multiple view semantic segmentation for street view images
CN110111338B (en) Visual tracking method based on superpixel space-time saliency segmentation
CN109086777B (en) Saliency map refining method based on global pixel characteristics
CN107944428B (en) Indoor scene semantic annotation method based on super-pixel set
Xie et al. Object detection and tracking under occlusion for object-level RGB-D video segmentation
Nedović et al. Stages as models of scene geometry
Wang et al. Object instance detection with pruned Alexnet and extended training data
CN103729885A (en) Hand-drawn scene three-dimensional modeling method combining multi-perspective projection with three-dimensional registration
Couprie et al. Convolutional nets and watershed cuts for real-time semantic labeling of rgbd videos
CN110517270B (en) Indoor scene semantic segmentation method based on super-pixel depth network
Choi et al. A contour tracking method of large motion object using optical flow and active contour model
Zhang et al. Deep salient object detection by integrating multi-level cues
Pan et al. Multi-stage feature pyramid stereo network-based disparity estimation approach for two to three-dimensional video conversion
Cai et al. Rgb-d scene classification via multi-modal feature learning
CN109191485B (en) Multi-video target collaborative segmentation method based on multilayer hypergraph model
Zhang et al. Planeseg: Building a plug-in for boosting planar region segmentation
Li et al. Spatiotemporal road scene reconstruction using superpixel-based Markov random field
Couprie et al. Toward real-time indoor semantic segmentation using depth information
Chiu et al. See the difference: Direct pre-image reconstruction and pose estimation by differentiating hog
CN114049531A (en) Pedestrian re-identification method based on weak supervision human body collaborative segmentation
CN110070626B (en) Three-dimensional object retrieval method based on multi-view classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant