WO2021232609A1 - Semantic segmentation method, system, medium and electronic device for RGB-D images - Google Patents

Semantic segmentation method, system, medium and electronic device for RGB-D images

Info

Publication number
WO2021232609A1
WO2021232609A1 · PCT/CN2020/112278 · CN2020112278W
Authority
WO
WIPO (PCT)
Prior art keywords
rgb
image
semantic segmentation
image block
geometric
Prior art date
Application number
PCT/CN2020/112278
Other languages
English (en)
French (fr)
Inventor
屠长河
曹金明
冷汉超
李扬彦
陈颖
里奇斯·达尼
科恩·奥尔·丹尼尔
Original Assignee
山东大学 (Shandong University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东大学 (Shandong University)
Publication of WO2021232609A1 publication Critical patent/WO2021232609A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular to a method, system, medium and electronic device for semantic segmentation of RGB-D images.
  • as a basic task, semantic segmentation has many applications in computer vision. In recent years, the widespread use of depth sensors has significantly improved the availability of RGB-D data, and as a result many semantic segmentation methods for RGB-D data have emerged. The vigorous development of convolutional neural networks (CNNs) has greatly improved the accuracy of semantic segmentation of RGB images.
  • for RGB-D data, the natural idea is therefore to exploit depth information for semantic segmentation with CNN-based methods. Most of these methods process RGB and depth information symmetrically: the depth information is either concatenated to the RGB channels as an additional channel and fed into a single CNN, or the depth and RGB information are processed by two independent CNN streams whose outputs are then concatenated for further processing.
  • the inventors of the present disclosure found that the convolution operation assumes that its input is locally correlated: when a sliding window extracts an image block from the image as the unit of operation, the pixels within each block are assumed to be highly correlated.
  • however, although the pixels of an image block are close in the image plane, they are not necessarily (geometrically) coherent in 3D space. Such pixels may have little correlation and violate the local-consistency assumption, which makes directly applying convolution to them less effective.
  • other methods convert the RGB-D image (RGB-Depth Map) into a 3D voxel or point cloud format and apply 3D convolutions or point-cloud network structures, but such methods tend to have complex network frameworks and huge memory and computation requirements.
  • the present disclosure provides a method, system, medium and electronic device for semantic segmentation of RGB-D images, which learn pixel-wise geometric weights for each image block from the three-dimensional geometric structure corresponding to the block and then convolve the weighted block, so that pixels of different classes can be better distinguished, greatly improving the accuracy of semantic segmentation.
  • the first aspect of the present disclosure provides a semantic segmentation method for RGB-D images.
  • a method for semantic segmentation of RGB-D images, including the following steps: acquiring the RGB-D image to be processed; processing the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result;
  • wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
  • the second aspect of the present disclosure provides a semantic segmentation system for RGB-D images.
  • a semantic segmentation system for RGB-D images including:
  • the data acquisition module is configured to: acquire the RGB-D image to be processed;
  • the semantic segmentation module is configured to: use a preset convolutional neural network to process the obtained RGB-D image to obtain a semantic segmentation result;
  • the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
  • a third aspect of the present disclosure provides a medium on which a program is stored, and when the program is executed by a processor, the steps in the method for semantic segmentation of RGB-D images as described in the first aspect of the present disclosure are realized.
  • the fourth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a program stored in the memory and capable of running on the processor; when the processor executes the program, the steps in the method for semantic segmentation of RGB-D images described in the first aspect of the present disclosure are implemented.
  • the semantic segmentation method, system, medium and electronic device learn the pixel-wise weights of each image block from the three-dimensional geometric structure corresponding to the block and then convolve the weighted block; even if the original color appearance of image blocks is similar, adding geometric awareness of the blocks allows pixels of different classes to be better distinguished, greatly improving the accuracy of semantic segmentation.
  • the method processes RGB and geometric information asymmetrically because they are semantically different in nature: RGB values capture appearance attributes in the projected image space, while D (the depth channel) is a geometric attribute. Combining these two types of information in a multiplicative manner enriches the discriminative power of local image blocks and gives the convolution stronger geometric awareness during learning.
  • the semantic segmentation method, system, medium, and electronic device provided by the present disclosure only add a component that dynamically reweights the local pixel intensity values of an image block before the block is fed into a standard encoder-decoder CNN.
  • the reweighting is performed by a simple multi-layer perceptron, which learns the weights from the depth channel.
  • FIG. 1 is a schematic flowchart of a method for semantic segmentation of an RGB-D image provided in Embodiment 1 of the present disclosure.
  • FIG. 2 is a flowchart of a common convolution for RGB-D data format provided by Embodiment 1 of the disclosure.
  • FIG. 3 is a flowchart of geometric weighted convolution for RGB-D data format provided by Embodiment 1 of the disclosure.
  • FIG. 4 shows visualized semantic segmentation results on the NYU-Dv2 dataset provided by Embodiment 1 of the disclosure.
  • Embodiment 1 of the present disclosure provides a method for semantic segmentation of RGB-D images, as shown in FIG. 1, including the following steps: acquiring the RGB-D image to be processed; processing the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result;
  • wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
  • the original input for RGB-D semantic segmentation is an RGB-D image I with RGB channel I_RGB and depth channel I_D.
  • in practice, however, the HHA encoding I_HHA derived from I_D represents geometric information in the network more effectively than the raw depth channel I_D, so it is widely used.
  • in addition, the 3D coordinates I_xyz corresponding to the pixels are also input; like I_HHA, I_xyz is derived from the depth channel. Geometrically weighted convolution operates on the triples (P_RGB, P_HHA, P_xyz) of these image blocks, where P_xyz holds the coordinates of the points in 3D space. The 3D coordinates of the point corresponding to the center pixel of each image block are denoted p_xyz.
  • the ordinary convolution of the RGB image block P_RGB can be expressed as: f = Conv(K, P_RGB)  (1)
  • where K represents the learnable kernel in the convolutional layer and f represents the feature extracted from the image block.
  • the method in FIG. 2 can be expressed as: f = Conv(K, [P_RGB, P_HHA])  (2)
  • where [·,·] represents concatenation along the channel dimension, and P_RGB and P_HHA are both tensors of shape k_1 × k_2 × 3.
  • formula (2) enables only additive interactions between the color information (stored in the RGB channels) and the geometric information (stored in the HHA channels); more precisely, it forms only linear combinations of the RGB and HHA channels (the corresponding channels are directly concatenated), and nonlinear activation is applied to these combinations.
  • the geometric weighted convolution proposed in this embodiment is: f = Conv(K, [P_RGB · W_geo, P_HHA])  (3)
  • where W_geo is the geometric weight learned from P_xyz (a tensor of shape k_1 × k_2), and · denotes the product over spatial positions; the weighted RGB color block is denoted P̃_RGB: P̃_RGB(i, j, c) = P_RGB(i, j, c) · W_geo(i, j)
  • where i, j, c are the index subscripts of the elements in the corresponding tensors; for example, W_geo(i, j) represents the element in the i-th row and j-th column of W_geo.
  • the only difference between formulas (2) and (3) is the geometric weighting obtained by multiplying with W_geo; the weighted color block P̃_RGB is more discriminative than the original P_RGB block.
  • W_geo aims to reflect the local geometric correlation within each image block. Therefore, in this embodiment, P_xyz is converted to the local coordinate system centered at p_xyz, giving P̃_xyz = P_xyz − p_xyz, and W_geo is learned from P̃_xyz instead of from P_xyz, where: W_geo = MLP([P̃_xyz, P̃²_xyz])
  • here P̃²_xyz is the element-wise square of P̃_xyz, and MLP(·) is a multi-layer perceptron. Concatenating P̃²_xyz to the input used to learn W_geo improves performance, because P̃²_xyz is an important element representing geometric information such as the L_2 distance, and feeding it into the MLP makes it more aware of higher-order geometric structures so as to generate more effective weights.
  • P_xyz (used to learn W_geo) and P_HHA both come from the depth channel, but they are used in very different and complementary ways in geometrically weighted convolution.
  • P_HHA is more like a representation of the semantic scene layout; in particular, one of its channels encodes the height relative to the horizontal ground.
  • P_xyz, although directly computable from the depth information, focuses more on the local geometric information carried by spatial positions, and W_geo focuses on this local geometry to resolve the fine details of semantic segmentation.
  • the geometric weighted convolution proposed in this embodiment is a simple and lightweight module that can learn the weights of RGB image blocks through geometric information.
  • the geometrically weighted RGB image block can replace the RGB image block in the original convolution. Therefore, theoretically, the geometrically weighted convolutional layer can be easily inserted into any existing CNN structure to replace the ordinary convolutional layer with RGB image blocks as input.
  • in this embodiment, the geometric weighted convolution is inserted into a network of the style shown in FIG. 2 to demonstrate the effectiveness of the proposed module.
  • the network structure after inserting geometric weighted convolution is shown in FIG. 3, and this embodiment uses U-Net and DeepLab series architecture to construct this style of RGB-D segmentation network.
  • the NYU-Dv2 data set contains 1449 RGB-D scene images, using the provided 40 classification settings, 795 images for training, and 654 images for testing.
  • the SUN-RGBD data set consists of 10355 RGB-D images, with per-pixel labels over 37 categories. Following the standard setting, the data set is divided into a training set of 5285 images and a test set of 5050 images.
  • assuming there are K+1 classes in total, N_ij represents the number of pixels that belong to class i and are predicted as class j in the test set (i and j may be equal).
  • this embodiment also considers the number of network parameters and multiply and accumulate (MACC) operations, because they are actually closely related to memory and computing usage.
  • Figure 4 shows a qualitative comparison of the NYU-Dv2 test set.
  • geometric information can be used to extract the features of the object, especially the boundary details of the object.
  • the color of the pillow is very similar to the color of the sofa, especially in the case of poor lighting conditions.
  • the table legs are in shadow, which is almost indistinguishable from the RGB image.
  • in these cases, even if the HHA channels are concatenated with the RGB channels and used in an additive manner, it is difficult to determine the correct pixel labels. Detailed structures such as the horizontal bar of the chair in FIG. 4(c) are often difficult to segment; they tend to be "smoothed" by neighboring regions and classified with the same labels.
  • the weights learned from the geometric information can effectively redistribute weights to the RGB image blocks to make them more geometrically aware, thereby solving the problems in these difficult situations.
  • the gradually changing colors on the box on the table in (a) in Fig. 4 and the cabinet in (b) in Fig. 4 make it more difficult to accurately segment the boundary.
  • the weights learned from geometry help the network learn these characteristics and make precise cuts according to geometric hints.
  • Embodiment 2 of the present disclosure provides a semantic segmentation system for RGB-D images, including:
  • the data acquisition module is configured to: acquire the to-be-processed RGB-D image;
  • the semantic segmentation module is configured to: use a preset convolutional neural network to process the obtained RGB-D image to obtain a semantic segmentation result;
  • the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
  • the working method of the system is the same as the semantic segmentation method of the RGB-D image in Embodiment 1, and will not be repeated here.
  • Embodiment 3 of the present disclosure provides a medium on which a program is stored, and when the program is executed by a processor, the steps in the method for semantic segmentation of RGB-D images as described in Embodiment 1 of the present disclosure are realized.
  • Embodiment 4 of the present disclosure provides an electronic device, including a memory, a processor, and a program stored in the memory and capable of running on the processor; when the processor executes the program, the steps in the method for semantic segmentation of RGB-D images described in Embodiment 1 of the present disclosure are implemented.
  • the embodiments of the present disclosure can be provided as a method, a system, or a computer program product. Therefore, the present disclosure may adopt the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the program can be stored in a computer readable storage medium. During execution, it may include the procedures of the above-mentioned method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A semantic segmentation method, system, medium, and electronic device for RGB-D images, belonging to the technical field of image processing, comprising the following steps: acquiring an RGB-D image to be processed; processing the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result; wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks. By learning the pixel-wise weights of each image block from the three-dimensional geometric structure corresponding to the block and then convolving the weighted block, pixels of different classes can be better distinguished.

Description

Semantic segmentation method, system, medium and electronic device for RGB-D images
Technical Field
The present disclosure relates to the technical field of image processing, and in particular to a semantic segmentation method, system, medium, and electronic device for RGB-D images.
Background
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
As a basic task, semantic segmentation has many applications in computer vision. In recent years, the widespread use of depth sensors has significantly improved the availability of RGB-D data, and as a result many semantic segmentation methods for RGB-D data have emerged. Since the vigorous development of convolutional neural networks (CNNs) has greatly improved the accuracy of semantic segmentation of RGB images, for RGB-D data the natural idea is to use depth information for semantic segmentation with CNN-based methods. Most of these methods process RGB and depth information symmetrically: the depth information is either concatenated to the RGB channels as an additional channel and fed into a single CNN, or the depth and RGB information are processed by two independent CNN streams whose outputs are then concatenated for further processing.
The inventors of the present disclosure found that the convolution operation assumes that its input is locally correlated: when a sliding window extracts an image block from the image as the unit of operation, the pixels within each block are assumed to be highly correlated. However, although the pixels of an image block are close in the image plane, they are not necessarily (geometrically) coherent in 3D space; such pixels may have little correlation and violate the local-consistency assumption, which makes directly applying convolution to them less effective. Averaging a set of uncorrelated values with weights meant for averaging a set of correlated values is clearly suboptimal. Other methods convert the RGB-D image (RGB-Depth Map) into a 3D voxel or point cloud format and then apply 3D convolutions or point-cloud network structures on the new data format, but such methods tend to have complex network frameworks and huge memory and computation requirements.
Summary
To remedy the deficiencies of the prior art, the present disclosure provides a semantic segmentation method, system, medium, and electronic device for RGB-D images, which learn pixel-wise weights for each image block from the three-dimensional geometric structure corresponding to the block and then convolve the weighted block, so that pixels of different classes can be better distinguished, greatly improving the accuracy of semantic segmentation.
To achieve the above object, the present disclosure adopts the following technical solutions:
A first aspect of the present disclosure provides a semantic segmentation method for RGB-D images.
A semantic segmentation method for RGB-D images, comprising the following steps:
acquiring an RGB-D image to be processed;
processing the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result;
wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
A second aspect of the present disclosure provides a semantic segmentation system for RGB-D images.
A semantic segmentation system for RGB-D images, comprising:
a data acquisition module configured to acquire an RGB-D image to be processed;
a semantic segmentation module configured to process the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result;
wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
A third aspect of the present disclosure provides a medium on which a program is stored; when the program is executed by a processor, the steps in the semantic segmentation method for RGB-D images according to the first aspect of the present disclosure are implemented.
A fourth aspect of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored in the memory and runnable on the processor; when the processor executes the program, the steps in the semantic segmentation method for RGB-D images according to the first aspect of the present disclosure are implemented.
Compared with the prior art, the beneficial effects of the present disclosure are:
1. The semantic segmentation method, system, medium, and electronic device provided by the present disclosure learn pixel-wise weights for each image block from the three-dimensional geometric structure corresponding to the block and then convolve the weighted block; even when the original color appearance of image blocks is similar, adding geometric awareness of the blocks allows pixels of different classes to be better distinguished, greatly improving the accuracy of semantic segmentation.
2. The semantic segmentation method, system, medium, and electronic device provided by the present disclosure process RGB and geometric information asymmetrically, because they are semantically different in nature: RGB values capture appearance attributes in the projected image space, while D (the depth channel) is a geometric attribute. Fusing these two types of information multiplicatively enriches the discriminative power of local image blocks and gives convolution stronger geometric awareness during learning.
3. The semantic segmentation method, system, medium, and electronic device provided by the present disclosure merely add a component that dynamically reweights the local pixel intensity values of an image block before the block is fed into a standard encoder-decoder CNN; the reweighting is performed by a simple multi-layer perceptron that learns the weights from the depth channel.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the present disclosure, are provided for further understanding of the disclosure; the exemplary embodiments of the present disclosure and their descriptions are used to explain the disclosure and do not constitute an undue limitation on it.
FIG. 1 is a schematic flowchart of the semantic segmentation method for RGB-D images provided in Embodiment 1 of the present disclosure.
FIG. 2 is a flowchart of ordinary convolution for the RGB-D data format provided in Embodiment 1 of the present disclosure.
FIG. 3 is a flowchart of geometrically weighted convolution for the RGB-D data format provided in Embodiment 1 of the present disclosure.
FIG. 4 shows visualized semantic segmentation results on the NYU-Dv2 dataset provided in Embodiment 1 of the present disclosure.
Detailed Description
The present disclosure is further described below with reference to the accompanying drawings and embodiments.
It should be pointed out that the following detailed descriptions are all exemplary and intended to provide further explanation of the present disclosure. Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present disclosure belongs.
It should be noted that the terms used here are only for describing specific embodiments and are not intended to limit exemplary embodiments according to the present disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
The embodiments of the present disclosure and the features in the embodiments may be combined with each other in the absence of conflict.
Embodiment 1:
Embodiment 1 of the present disclosure provides a semantic segmentation method for RGB-D images, as shown in FIG. 1, comprising the following steps:
acquiring an RGB-D image to be processed;
processing the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result;
wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
The specific implementation is divided into the following parts:
(1) Network input
The original input for RGB-D semantic segmentation is an RGB-D image I with RGB channel I_RGB and depth channel I_D. In practice, however, the HHA encoding I_HHA derived from I_D represents geometric information in the network more effectively than the raw depth channel I_D, so it is widely used.
In addition, the 3D coordinates I_xyz corresponding to the pixels are also input; like I_HHA, I_xyz is derived from the depth channel. Geometrically weighted convolution operates on the triples (P_RGB, P_HHA, P_xyz) of these image blocks, where P_xyz holds the coordinates of the points in 3D space; the 3D coordinates of the point corresponding to the center pixel of each image block are denoted p_xyz.
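The document does not spell out how I_xyz is derived from the depth channel; a common derivation (an assumption here, not taken from this document) is pinhole back-projection using camera intrinsics fx, fy, cx, cy:

```python
# Back-project a depth map into per-pixel 3D coordinates — one common way
# to derive I_xyz from I_D. The intrinsics fx, fy, cx, cy below are
# illustrative values, not parameters from the patent.
def depth_to_xyz(depth, fx, fy, cx, cy):
    """depth: H x W nested list of depth values. Returns H x W list of (x, y, z)."""
    xyz = []
    for v, row in enumerate(depth):
        xyz_row = []
        for u, z in enumerate(row):
            x = (u - cx) * z / fx   # horizontal offset from the principal point
            y = (v - cy) * z / fy   # vertical offset from the principal point
            xyz_row.append((x, y, z))
        xyz.append(xyz_row)
    return xyz

coords = depth_to_xyz([[1.0, 2.0], [1.5, 0.5]], fx=500.0, fy=500.0, cx=0.5, cy=0.5)
```

Each pixel (u, v) with depth z maps to ((u − cx)·z/fx, (v − cy)·z/fy, z), so coordinates are expressed in the same metric units as the depth values.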
(2) Geometrically weighted convolution
Ordinary convolution of an RGB image block P_RGB can be expressed as:
f = Conv(K, P_RGB)  (1)
where K denotes the learnable kernel in the convolutional layer and f denotes the feature extracted from the image block. The method in FIG. 2 can be expressed as:
f = Conv(K, [P_RGB, P_HHA])  (2)
where [·,·] denotes concatenation along the channel dimension, and P_RGB and P_HHA are both tensors of shape k_1 × k_2 × 3. This formula enables only additive interactions between the color information (stored in the RGB channels) and the geometric information (stored in the HHA channels); more precisely, it forms only linear combinations of the RGB and HHA channels (the corresponding channels are directly concatenated), and nonlinear activation is applied to these combinations.
The geometrically weighted convolution proposed in this embodiment, as shown in FIG. 3, is:
f = Conv(K, [P_RGB · W_geo, P_HHA])  (3)
where W_geo is the geometric weight learned from P_xyz (a tensor of shape k_1 × k_2) and · denotes the product over spatial positions. The weighted RGB color block is denoted P̃_RGB. More precisely, the multiplication over spatial positions can be expressed as:
P̃_RGB(i, j, c) = P_RGB(i, j, c) · W_geo(i, j)
where i, j, c are the index subscripts of the elements in the corresponding tensors; for example, W_geo(i, j) denotes the element in row i and column j of W_geo. The only difference between formulas (2) and (3) is the geometric weighting obtained by multiplying with W_geo; the weighted color block P̃_RGB is more discriminative than the original P_RGB block.
In formula (3), both the additive and the multiplicative interactions between RGB and geometric information are modeled: P_RGB · W_geo is multiplicative modeling, since W_geo is learned from geometric information, while [P_RGB · W_geo, P_HHA] applies additive modeling after the multiplicative modeling.
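The product over spatial positions can be sketched in plain Python (a minimal illustration; the block values and weights below are made-up numbers, not learned quantities):

```python
# Apply a k1 x k2 geometric weight map to a k1 x k2 x 3 RGB block:
# every channel value at spatial position (i, j) is scaled by w_geo[i][j].
def weight_block(p_rgb, w_geo):
    return [
        [[p_rgb[i][j][c] * w_geo[i][j] for c in range(len(p_rgb[i][j]))]
         for j in range(len(p_rgb[i]))]
        for i in range(len(p_rgb))
    ]

p_rgb = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [10, 11, 12]]]   # a 2 x 2 x 3 image block
w_geo = [[0.5, 1.0],
         [0.0, 2.0]]                  # a 2 x 2 geometric weight map
weighted = weight_block(p_rgb, w_geo)
```

Note that all three channels at one spatial position share the same weight, which is exactly the broadcast in P̃_RGB(i, j, c) = P_RGB(i, j, c) · W_geo(i, j).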
W_geo is intended to reflect the local geometric correlation within each image block. Therefore, in this embodiment P_xyz is converted to the local coordinate system centered at p_xyz, giving P̃_xyz = P_xyz − p_xyz. This embodiment learns W_geo from P̃_xyz rather than from P_xyz, where:
W_geo = MLP([P̃_xyz, P̃²_xyz])
Here P̃²_xyz is the element-wise square of P̃_xyz, and MLP(·) is a multi-layer perceptron. Concatenating P̃²_xyz to the input used to learn W_geo improves performance, because P̃²_xyz is an important element representing geometric information such as the L_2 distance, and feeding it into the MLP makes the MLP more aware of higher-order geometric structures so as to produce more effective weights.
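A sketch of how W_geo could be produced from the relative coordinates: the document fixes neither the depth, the width, nor the activations of the MLP, so the tiny fixed-parameter network below is purely illustrative:

```python
import math

def mlp_weights(p_xyz, center):
    """For each 3D point in a flattened patch, build the feature
    [p - center, (p - center)^2] and map it through a tiny MLP
    (a single tanh unit here) to one scalar weight per position.
    All MLP parameters are arbitrary fixed numbers, not learned values."""
    w1 = [0.5, -0.25, 0.1, 0.3, 0.2, -0.1]   # 6 inputs -> 1 hidden unit
    weights = []
    for p in p_xyz:
        rel = [p[k] - center[k] for k in range(3)]      # P~_xyz
        feat = rel + [r * r for r in rel]               # [P~_xyz, P~_xyz squared]
        h = math.tanh(sum(w * x for w, x in zip(w1, feat)))
        weights.append(h)
    return weights

patch = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 3.0)]
w_geo = mlp_weights(patch, center=(0.0, 0.0, 1.0))
```

Points far from the patch center in 3D (such as the third point above, 2 m deeper) receive weights very different from nearby points, which is the intended geometry-aware reweighting.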
Both P_xyz (used to learn W_geo) and P_HHA come from the depth channel, but they are used in very different and complementary ways in geometrically weighted convolution. P_HHA is more like a representation of the semantic scene layout; in particular, one of its channels encodes the height relative to the horizontal ground. P_xyz, although directly computable from the depth information, focuses more on the local geometric information carried by spatial positions, and W_geo focuses on this local geometry to resolve the fine details of semantic segmentation.
(3) Network architecture
The geometrically weighted convolution proposed in this embodiment is a simple, lightweight module that learns weights for RGB image blocks from geometric information. The geometrically weighted RGB image block can replace the RGB image block in the original convolution; therefore, in theory, the geometrically weighted convolutional layer can easily be inserted into any existing CNN structure to replace an ordinary convolutional layer that takes RGB image blocks as input.
In this embodiment, the geometrically weighted convolution is inserted into a network of the style shown in FIG. 2 to demonstrate the effectiveness of the proposed module. The network structure after inserting the geometrically weighted convolution is shown in FIG. 3; this embodiment uses the U-Net and DeepLab series architectures to build RGB-D segmentation networks of this style.
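As a minimal illustration of how the weighted block enters an ordinary convolution, the following toy computes one output feature of f = Conv(K, [P_RGB · W_geo, P_HHA]) at a single spatial location (all values are made up; a real layer has many learned kernels and runs over the whole feature map):

```python
# One output feature at one location: weight the RGB block, concatenate the
# HHA block along the channel axis, then take the dot product with a
# k1 x k2 x 6 kernel.
def gw_conv_single(p_rgb, p_hha, w_geo, kernel):
    acc = 0.0
    for i in range(len(p_rgb)):
        for j in range(len(p_rgb[i])):
            # [P_RGB · W_geo, P_HHA]: 3 weighted RGB channels + 3 HHA channels
            channels = [v * w_geo[i][j] for v in p_rgb[i][j]] + list(p_hha[i][j])
            acc += sum(k * x for k, x in zip(kernel[i][j], channels))
    return acc

p_rgb = [[[1.0, 0.0, 0.0]]]                      # 1 x 1 patch for readability
p_hha = [[[0.0, 1.0, 0.0]]]
w_geo = [[2.0]]
kernel = [[[1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]]      # all-ones kernel
f = gw_conv_single(p_rgb, p_hha, w_geo, kernel)
```

With the all-ones kernel, f is simply the sum of the weighted RGB values (2.0) and the HHA values (1.0), making the multiplicative-then-additive structure of formula (3) visible.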
To verify the effectiveness of the proposed method, extensive experiments were conducted on two standard RGB-D datasets: NYU-Dv2 and SUN-RGBD. The NYU-Dv2 dataset contains 1449 RGB-D scene images; using the provided 40-class setting, 795 images are used for training and 654 for testing. The SUN-RGBD dataset consists of 10355 RGB-D images with per-pixel labels over 37 categories; following the standard setting, it is divided into a training set of 5285 images and a test set of 5050 images.
Evaluation metrics: assuming there are K+1 classes in total, N_ij denotes the number of pixels that belong to class i and are predicted as class j on the test set (i and j may be equal).
In this embodiment, the performance of the method is evaluated with three commonly used metrics:
Pixel Acc. = Σ_i N_ii / Σ_i Σ_j N_ij
Mean Acc. = (1 / (K+1)) · Σ_i ( N_ii / Σ_j N_ij )
Mean IoU = (1 / (K+1)) · Σ_i ( N_ii / ( Σ_j N_ij + Σ_j N_ji − N_ii ) )
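The three metrics can be computed directly from the confusion matrix N (a small self-contained sketch with a made-up 2-class matrix):

```python
def seg_metrics(n):
    """n[i][j]: number of pixels of class i predicted as class j."""
    k = len(n)
    total = sum(sum(row) for row in n)
    correct = sum(n[i][i] for i in range(k))
    pixel_acc = correct / total
    mean_acc = sum(n[i][i] / sum(n[i]) for i in range(k)) / k
    # IoU denominator: ground-truth pixels + predicted pixels - intersection
    mean_iou = sum(
        n[i][i] / (sum(n[i]) + sum(n[j][i] for j in range(k)) - n[i][i])
        for i in range(k)
    ) / k
    return pixel_acc, mean_acc, mean_iou

n = [[8, 2],
     [1, 9]]          # toy 2-class confusion matrix
pa, ma, miou = seg_metrics(n)
```

(The sketch assumes every class appears in the ground truth; absent classes would need a zero-division guard.)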
In addition to the above performance-related metrics, this embodiment also considers the number of network parameters and the number of multiply-accumulate (MACC) operations, because they are closely related to actual memory and computation usage.
Experimental results on the NYU-Dv2 dataset: the results of GWConv on the NYU-Dv2 dataset are shown in Table 1 and compared with several state-of-the-art methods.
FIG. 4 shows a qualitative comparison on the NYU-Dv2 test set. As shown in FIG. 4, geometrically weighted convolution makes good use of geometric information to extract object features, especially the boundary details of objects. For example, in FIG. 4(d) the color of the pillow is very similar to that of the sofa, especially under poor lighting conditions. A similar situation appears in FIG. 4(e), where the table legs are in shadow and almost indistinguishable in the RGB image. In these cases, even if the HHA channels are concatenated with the RGB channels and used additively, it is difficult to determine the correct pixel labels. Detailed structures such as the horizontal bar of the chair in FIG. 4(c) are usually hard to segment: they tend to be "smoothed" by neighboring regions and classified with the same labels. In the GWConv method of this embodiment, the weights learned from geometric information effectively reweight the RGB image blocks and make them more geometry-aware, thereby resolving these difficult cases. The gradually changing colors on the box on the table in FIG. 4(a) and on the cabinet in FIG. 4(b) make accurate boundary segmentation harder; the weights learned from geometry help the network learn these characteristics and make precise cuts according to geometric hints.
Table 1: Comparison of GWConv and other methods on the NYU-Dv2 dataset

Method               Pixel Acc.(%)  Mean Acc.(%)  Mean IoU(%)  #Param.(M)  MACC(B)
FCN                  65.4           46.1          34.0         275.4       352.2
Multi-Scale Conv     65.6           45.1          34.1         52.2        151.4
HHA-3DGNN            -              55.2          42.0         47.3        115.7
CFN(VGG16)           -              -             41.7         20.48       73.5
CFN(RefineNet-152)   -              -             47.7         135.4       249.5
Gate Fusion          71.9           60.7          45.9         48.3        425.5
Depth-Aware          -              56.3          43.9         47.0        115.7
Baseline             74.1           60.8          47.8         40.8        49.4
GWConv               74.1+0.9       60.8+0.9      47.8+0.9     40.8+0.001  49.4+0.07
Experimental results on the SUN-RGBD dataset: the results of GWConv on the SUN-RGBD dataset are shown in Table 2 and compared with several state-of-the-art methods. Again, GWConv brings a significant improvement over the baseline method (+0.6 Pixel Acc. and Mean Acc., and +1.2 Mean IoU).
[Table 2: Comparison of GWConv and other methods on the SUN-RGBD dataset; the table content appears only as images in the original document.]
Embodiment 2:
Embodiment 2 of the present disclosure provides a semantic segmentation system for RGB-D images, comprising:
a data acquisition module configured to acquire an RGB-D image to be processed;
a semantic segmentation module configured to process the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result;
wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
The working method of the system is the same as the semantic segmentation method for RGB-D images in Embodiment 1 and is not repeated here.
Embodiment 3:
Embodiment 3 of the present disclosure provides a medium on which a program is stored; when the program is executed by a processor, the steps in the semantic segmentation method for RGB-D images described in Embodiment 1 of the present disclosure are implemented.
Embodiment 4:
Embodiment 4 of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored in the memory and runnable on the processor; when the processor executes the program, the steps in the semantic segmentation method for RGB-D images described in Embodiment 1 of the present disclosure are implemented.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, and the like) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Those of ordinary skill in the art can understand that all or part of the flows in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are merely preferred embodiments of the present disclosure and are not intended to limit it; for those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within its protection scope.

Claims (10)

  1. A semantic segmentation method for RGB-D images, characterized by comprising the following steps:
    acquiring an RGB-D image to be processed;
    processing the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result;
    wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
  2. The semantic segmentation method for RGB-D images according to claim 1, characterized in that the convolutional layer of the preset convolutional neural network is specifically:
    f = Conv(K, [P_RGB · W_geo, P_HHA])
    wherein [·,·] denotes concatenation along the channel dimension, P_RGB is the RGB channel tensor of the image block, P_HHA is the HHA channel tensor of the image block, K denotes the learnable kernel in the convolutional layer, f denotes the feature extracted from the image block, and W_geo is the learned geometric weight.
  3. The semantic segmentation method for RGB-D images according to claim 1, characterized in that the weighted image block is the product, over spatial positions, of the RGB channel tensor of the image block and the learned geometric weights, specifically:
    P̃_RGB(i, j, c) = P_RGB(i, j, c) · W_geo(i, j)
    wherein P_RGB is the RGB channel tensor of the image block, W_geo is the learned geometric weight, and i, j, c are the indices of the elements in the tensors.
  4. The semantic segmentation method for RGB-D images according to claim 1, characterized in that the pixel-wise geometric weights reflect the local geometric correlation within each image block, and the geometric weights are computed from the coordinates of the points of the image block in 3D space.
  5. The semantic segmentation method for RGB-D images according to claim 4, characterized in that the geometric weights are computed as:
    W_geo = MLP([P̃_xyz, P̃²_xyz])
    wherein MLP(·) is a multi-layer perceptron, and P̃_xyz is the difference between the coordinates of the points of the image block in 3D space and the 3D coordinates of the point corresponding to the center pixel of the image block.
  6. The semantic segmentation method for RGB-D images according to claim 4, characterized in that the coordinates of the points of the image block in 3D space and the HHA channels are both obtained from the depth channel of the RGB-D image.
  7. The semantic segmentation method for RGB-D images according to claim 1, characterized in that the acquired RGB-D image is an image having RGB channels and a depth channel.
  8. A semantic segmentation system for RGB-D images, characterized by comprising:
    a data acquisition module configured to acquire an RGB-D image to be processed;
    a semantic segmentation module configured to process the obtained RGB-D image with a preset convolutional neural network to obtain a semantic segmentation result;
    wherein the convolutional layer of the preset convolutional neural network learns pixel-wise geometric weights for each image block in the RGB-D image and then convolves the weighted image blocks.
  9. A medium on which a program is stored, characterized in that, when the program is executed by a processor, the steps in the semantic segmentation method for RGB-D images according to any one of claims 1-7 are implemented.
  10. An electronic device comprising a memory, a processor, and a program stored in the memory and runnable on the processor, characterized in that, when the processor executes the program, the steps in the semantic segmentation method for RGB-D images according to any one of claims 1-7 are implemented.
PCT/CN2020/112278 2020-05-20 2020-08-28 Semantic segmentation method, system, medium and electronic device for RGB-D images WO2021232609A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010431203.1 2020-05-20
CN202010431203.1A CN111738265B (zh) 2020-05-20 2020-05-20 Semantic segmentation method, system, medium and electronic device for RGB-D images

Publications (1)

Publication Number Publication Date
WO2021232609A1 true WO2021232609A1 (zh) 2021-11-25

Family

ID=72647472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112278 WO2021232609A1 (zh) 2020-05-20 2020-08-28 Semantic segmentation method, system, medium and electronic device for RGB-D images

Country Status (2)

Country Link
CN (1) CN111738265B (zh)
WO (1) WO2021232609A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627505A (zh) * 2022-03-18 2022-06-14 中国农业大学 Automatic dairy-cow cleanliness scoring method, system, storage medium and device
CN114638842A (zh) * 2022-03-15 2022-06-17 桂林电子科技大学 MLP-based medical image segmentation method
CN117617888A (zh) * 2024-01-26 2024-03-01 湖南火眼医疗科技有限公司 Myopia diopter prediction system and method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673531B (zh) * 2021-08-23 2023-09-22 山东大学 RGB-D image semantic segmentation method and system based on shape-aware convolution
CN116907677B (zh) * 2023-09-15 2023-11-21 山东省科学院激光研究所 Distributed optical fiber temperature sensing system for concrete structures and measurement method thereof
CN117333635B (zh) * 2023-10-23 2024-04-26 中国传媒大学 Interactive two-hand three-dimensional reconstruction method and system based on a single RGB image

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794733A (zh) * 2014-01-20 2015-07-22 株式会社理光 Object tracking method and device
CN105513033A (zh) * 2015-12-07 2016-04-20 天津大学 Super-resolution reconstruction method based on non-local joint sparse representation
US20180330207A1 (en) * 2016-01-08 2018-11-15 Siemens Healthcare Gmbh Deep Image-to-Image Network Learning for Medical Image Analysis
CN110033483A (zh) * 2019-04-03 2019-07-19 北京清微智能科技有限公司 Depth map generation method and system based on DCNN

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664974A (zh) * 2018-04-03 2018-10-16 华南理工大学 Semantic segmentation method based on RGBD images and a fully residual network
CN108829826B (zh) * 2018-06-14 2020-08-07 清华大学深圳研究生院 Image retrieval method based on deep learning and semantic segmentation
CN109271990A (zh) * 2018-09-03 2019-01-25 北京邮电大学 Semantic segmentation method and device for RGB-D images
CN109447923A (zh) * 2018-09-27 2019-03-08 中国科学院计算技术研究所 Semantic scene completion system and method
CN109711413B (zh) * 2018-12-30 2023-04-07 陕西师范大学 Image semantic segmentation method based on deep learning


Also Published As

Publication number Publication date
CN111738265B (zh) 2022-11-08
CN111738265A (zh) 2020-10-02

Legal Events

Date Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20936456; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122  Ep: pct application non-entry in european phase (Ref document number: 20936456; Country of ref document: EP; Kind code of ref document: A1)