CN117058380B - Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention - Google Patents

Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention

Info

Publication number
CN117058380B
CN117058380B (application number CN202311022399.9A)
Authority
CN
China
Prior art keywords
feature map
convolution
processing
size
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311022399.9A
Other languages
Chinese (zh)
Other versions
CN117058380A (en)
Inventor
张新钰
谢涛
王力
李效宇
刘德东
郭世纯
李志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xuetuling Education Technology Co ltd
Original Assignee
Beijing Xuetuling Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xuetuling Education Technology Co ltd filed Critical Beijing Xuetuling Education Technology Co ltd
Priority to CN202311022399.9A priority Critical patent/CN117058380B/en
Publication of CN117058380A publication Critical patent/CN117058380A/en
Application granted granted Critical
Publication of CN117058380B publication Critical patent/CN117058380B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0495Quantised networks; Sparse networks; Compressed networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention, which relate to the technical field of automatic driving and comprise the following steps: processing a two-dimensional image by using a pre-trained multi-scale cavity convolution model to obtain a first feature map; processing the two-dimensional image by using a pre-trained width dimension downsampling model to obtain a second feature map, a third feature map and a fourth feature map; processing the two-dimensional image, the second feature map, the third feature map and the fourth feature map by using a pre-trained spatial attention model to obtain a fifth feature map; processing the fourth feature map and the fifth feature map by using a pre-trained width dimension up-sampling model to obtain a sixth feature map; and processing the sixth feature map by using a pre-trained channel attention model to obtain a point cloud segmentation result. The method and the device can extract the salient features of large and small targets simultaneously, with a small amount of computation.

Description

Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention
Technical Field
The application relates to the technical field of automatic driving, in particular to a multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention.
Background
At present, there are two common semantic segmentation methods for point cloud data: the first processes the point cloud data directly, feeding it into a neural network for learning through the PointNet framework; the second voxelizes the point cloud data. Because point cloud data are sparse and huge, both methods require an enormous computation cost and are not suitable for real-time applications.
In addition, in the prior art the 3D point cloud data can be converted into 2D image data through spherical projection, and the target features are then extracted with efficient convolution and deconvolution operations. This achieves remarkable performance on large objects (such as cars), but performs poorly on small objects (such as pedestrians), because such a method cannot extract the salient features of large and small objects at the same time.
Disclosure of Invention
In view of the above, the present application provides a multi-scale lightweight three-dimensional point cloud segmentation method and apparatus based on self-attention, so as to solve the above technical problems.
In a first aspect, an embodiment of the present application provides a multi-scale lightweight three-dimensional point cloud segmentation method based on self-attention, including:
converting the original three-dimensional point cloud data into a two-dimensional image through spherical transformation;
processing a two-dimensional image by using a multi-scale cavity convolution model which is trained in advance to obtain a first feature map;
processing the first feature map by using a pre-trained width dimension downsampling model to obtain a second feature map, a third feature map and a fourth feature map;
processing the two-dimensional image, the second feature map, the third feature map and the fourth feature map by utilizing the pre-trained spatial attention model to obtain a fifth feature map;
processing the fourth feature map and the fifth feature map by utilizing a pre-trained width dimension up-sampling model to obtain a sixth feature map;
and processing the sixth feature map by using the channel attention model which is trained in advance to obtain a point cloud segmentation result.
Further, converting the original three-dimensional point cloud data into a two-dimensional image through spherical transformation; comprising the following steps:
acquiring three-dimensional coordinates (x, y, z) of each point in the three-dimensional point cloud data;
calculating the zenith angle α and the azimuth angle β of each point according to the spherical transformation formula;
calculating the row pixel index and the column pixel index of each point on the two-dimensional image according to the zenith angle α and the azimuth angle β of the point and the resolutions Δα and Δβ, wherein Δα and Δβ represent the row resolution and the column resolution of the discretized point cloud;
thereby obtaining a two-dimensional image X_input with a size of H×W×C, where H, W and C represent the height, width and number of channels of the two-dimensional image, respectively.
Further, the multi-scale cavity convolution model includes: a first convolution layer with a 3×3 convolution kernel, a multi-channel cavity convolution unit and a global average pooling layer arranged in parallel, and a first adder; the multi-channel cavity convolution unit comprises four parallel branches, namely a first cavity convolution branch, a second cavity convolution branch, a third cavity convolution branch and a fourth cavity convolution branch, and a splicing unit; the first cavity convolution branch comprises a connected second convolution layer with a convolution kernel size of 1×1 and a first cavity convolution layer with a convolution kernel size of 3×3 and rate=1; the second cavity convolution branch comprises a connected first 3×3 average pooling layer and a second cavity convolution layer with a convolution kernel size of 3×3 and rate=12; the third cavity convolution branch comprises a connected second 5×5 average pooling layer and a third cavity convolution layer with a convolution kernel size of 3×3 and rate=24; the fourth cavity convolution branch comprises a connected third 7×7 average pooling layer and a fourth cavity convolution layer with a convolution kernel size of 3×3 and rate=36;
processing a two-dimensional image by using a multi-scale cavity convolution model which is trained in advance to obtain a first feature map; comprising the following steps:
processing the two-dimensional image X_input by using the first convolution layer to obtain a feature map X with a size of H×W×C;
processing the feature map X by using the first cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the second cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the third cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the fourth cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
splicing the four branch feature maps in the channel dimension by using the splicing unit to obtain a spliced feature map with a size of H×W×C;
processing the feature map X by using the global average pooling layer to obtain a feature map with a size of 1×1×C, and expanding it through a broadcasting mechanism into a broadcast feature map with a size of H×W×C;
adding the feature map X, the spliced feature map and the broadcast feature map by using the first adder to obtain a first feature map Y1 with a size of H×W×C.
Further, the width dimension downsampling model comprises a first Fire module, a second Fire module, a third convolution layer of a 1×1 convolution kernel, a third Fire module, a fourth convolution layer of the 1×1 convolution kernel, a fifth Fire module and a sixth Fire module which are sequentially connected;
processing the first feature map by using a pre-trained width dimension downsampling model to obtain a second feature map, a third feature map and a fourth feature map; comprising the following steps:
processing the first feature map Y1 by using the first Fire module, and processing the output result of the first Fire module by using the second Fire module to obtain a second feature map Y2 with a size of H×W×C;
processing the second feature map Y2 by using the third convolution layer with a 1×1 convolution kernel to obtain a feature map with a size of H×(W/2)×C;
processing this feature map by using the third Fire module, and processing the output result of the third Fire module by using the fourth Fire module to obtain a third feature map Y3 with a size of H×(W/2)×C;
processing the third feature map Y3 by using the fourth convolution layer with a 1×1 convolution kernel to obtain a feature map with a size of H×(W/4)×C;
processing this feature map by using the fifth Fire module, and processing the output result of the fifth Fire module by using the sixth Fire module to obtain a fourth feature map Y4 with a size of H×(W/4)×C.
Further, the spatial attention model includes: four parallel convolution layers with 1×1 convolution kernels, namely a fifth convolution layer, a sixth convolution layer, a seventh convolution layer and an eighth convolution layer, a second adder and a spatial attention module;
processing the two-dimensional image, the second feature map, the third feature map and the fourth feature map by utilizing the pre-trained spatial attention model to obtain a fifth feature map; comprising the following steps:
processing the two-dimensional image X_input by using the fifth convolution layer to obtain a feature map Z1 with a size of H×(W/4)×C;
processing the second feature map Y2 by using the sixth convolution layer to obtain a feature map Z2 with a size of H×(W/4)×C;
processing the third feature map Y3 by using the seventh convolution layer to obtain a feature map Z3 with a size of H×(W/4)×C;
processing the fourth feature map Y4 by using the eighth convolution layer to obtain a feature map Z4 with a size of H×(W/4)×C;
adding the feature map Z1, the feature map Z2, the feature map Z3 and the feature map Z4 by using the second adder to obtain a feature map Z5 with a size of H×(W/4)×C;
processing the feature map Z5 by using the spatial attention module to obtain a fifth feature map Z with a size of H×(W/4)×C.
Further, the width dimension up-sampling model includes: a double up-sampling layer, a quadruple up-sampling layer and a deconvolution branch arranged in parallel; the deconvolution branch comprises a third adder, a first Fire deconvolution layer, a fourth adder, a second Fire deconvolution layer, a fifth adder, a third Fire deconvolution layer and a ninth convolution layer with a 1×1 convolution kernel which are sequentially connected;
processing the fourth feature map and the fifth feature map by utilizing a pre-trained width dimension up-sampling model to obtain a sixth feature map; comprising the following steps:
adding the fourth feature map Y4 and the fifth feature map Z by using the third adder to obtain a feature map Q1;
processing the feature map Q1 by using the first Fire deconvolution layer to obtain a feature map Q2 with a size of H×(W/2)×C;
processing the fifth feature map Z by using the double up-sampling layer to obtain a feature map Q3 with a size of H×(W/2)×C;
adding the feature map Q2 and the feature map Q3 by using the fourth adder to obtain a feature map Q4 with a size of H×(W/2)×C;
processing the feature map Q4 by using the second Fire deconvolution layer to obtain a feature map Q5 with a size of H×W×C;
processing the fifth feature map Z by using the quadruple up-sampling layer to obtain a feature map Q6 with a size of H×W×C;
adding the feature map Q5 and the feature map Q6 by using the fifth adder to obtain a feature map Q7 with a size of H×W×C;
processing the feature map Q7 by using the third Fire deconvolution layer to obtain a feature map Q8 with a size of H×2W×C;
processing the feature map Q8 by using the ninth convolution layer with a 1×1 convolution kernel to obtain a sixth feature map Q with a size of H×W×K, where K represents the number of classes of the segmentation targets.
Further, the method further comprises: and performing joint training on the multi-scale cavity convolution model, the width dimension downsampling model, the spatial attention model, the width dimension upsampling model and the channel attention model.
In a second aspect, embodiments of the present application provide a multi-scale lightweight three-dimensional point cloud segmentation apparatus based on self-attention, including:
the preprocessing unit is used for converting the original three-dimensional point cloud data into a two-dimensional image through spherical transformation;
the first processing unit is used for processing the two-dimensional image by utilizing the multi-scale cavity convolution model which is trained in advance to obtain a first feature map;
the downsampling unit is used for processing the first feature map by utilizing a width dimension downsampling model which is trained in advance to obtain a second feature map, a third feature map and a fourth feature map;
the second processing unit is used for processing the two-dimensional image, the second feature map, the third feature map and the fourth feature map by utilizing the pre-trained spatial attention model to obtain a fifth feature map;
the up-sampling unit is used for processing the fourth feature map and the fifth feature map by utilizing a pre-trained width dimension up-sampling model to obtain a sixth feature map;
and the point cloud segmentation unit is used for processing the sixth feature map by utilizing the channel attention model which is trained in advance to obtain a point cloud segmentation result.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the methods of the embodiments of the present application when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed by a processor, implement a method of embodiments of the present application.
The method adopts a combination of a spatial attention mechanism and a channel attention mechanism to extract semantic segmentation features of targets of different sizes, and uses multi-scale cavity convolution to obtain the context information of the whole target at multiple scales, so that the salient features of large and small objects are extracted simultaneously. To reduce the parameters and the computation cost, FireModule and FireDeconv (convolution and deconvolution modules) are adopted to realize a lightweight network.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a multi-scale lightweight three-dimensional point cloud segmentation method based on self-attention according to an embodiment of the present application;
FIG. 2 is a block diagram of a multi-scale cavity convolution model, a width dimension downsampling model, a spatial attention model, a width dimension upsampling model, and a channel attention model provided in an embodiment of the present application;
fig. 3 is a functional block diagram of a multi-scale light three-dimensional point cloud segmentation apparatus based on self-attention according to an embodiment of the present application;
fig. 4 is a functional block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
First, the design concept of the embodiment of the present application will be briefly described.
In automatic driving, a camera can only capture the appearance information of a scene, and its spatial information cannot be estimated directly; even for a binocular depth camera, the positioning accuracy is far lower than that of a lidar. Moreover, detection based on camera data is greatly influenced by the external environment (such as extreme weather), so the robustness of the segmentation system cannot be guaranteed.
At present, there are two common methods for realizing semantic segmentation with point cloud data: the first processes the point cloud data directly, feeding it into a neural network for learning through the PointNet framework; the second voxelizes the point cloud data. Because point cloud data are sparse and huge, both methods require an enormous computation cost and are not suitable for real-time applications. In addition, in the prior art the 3D point cloud data can be converted into 2D image data through spherical projection, and the target features are then extracted with efficient convolution and deconvolution operations. This achieves remarkable performance on large objects (such as cars), but performs poorly on small objects (such as pedestrians), because such a method cannot extract the salient features of large and small objects at the same time.
In order to solve the above problems, the application provides a multi-scale lightweight point cloud segmentation method based on an attention mechanism, which adopts a combination of a spatial attention mechanism and a channel attention mechanism to extract semantic segmentation features of targets of different sizes, and uses multi-scale cavity convolution to obtain the context information of the whole target at multiple scales, so that the salient features of large and small objects are extracted simultaneously. To reduce the parameters and the computation cost, FireModule and FireDeconv (convolution and deconvolution modules) are adopted to realize a lightweight network.
After the application scenario and the design idea of the embodiment of the present application are introduced, the technical solution provided by the embodiment of the present application is described below.
As shown in fig. 1, an embodiment of the present application provides a multi-scale lightweight three-dimensional point cloud segmentation method based on self-attention, including:
step 101: converting the original three-dimensional point cloud data into a two-dimensional image through spherical transformation;
in order to efficiently process the point cloud data, the three-dimensional point cloud data is converted into two-dimensional picture data through spherical transformation.
Specifically, the method comprises the following steps:
acquiring three-dimensional coordinates (x, y, z) of each point in the three-dimensional point cloud data;
calculating the zenith angle α and the azimuth angle β of each point according to the spherical transformation formula;
calculating the row pixel index and the column pixel index of each point on the two-dimensional image according to the zenith angle α and the azimuth angle β of the point and the resolutions Δα and Δβ, wherein Δα and Δβ represent the row resolution and the column resolution of the discretized point cloud;
thereby obtaining a two-dimensional image X_input with a size of H×W×C, where H, W and C represent the height, width and number of channels of the two-dimensional image, respectively.
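The exact spherical transformation formulas are shown as figures in the original publication and are not reproduced in this text. As a hedged illustration only, the sketch below follows the commonly used SqueezeSeg-style range-image projection, which matches the quantities α, β, Δα and Δβ described above; the function name spherical_project, the arcsin-based angle definitions and the default image size are assumptions, not taken from the patent.

```python
import numpy as np

def spherical_project(points, H=64, W=512, C=5):
    """Project an (N, 4) LiDAR point cloud (x, y, z, intensity) onto an
    H x W x C image using a spherical (range-image) transformation.

    Assumption: zenith and azimuth use the arcsin convention common in
    SqueezeSeg-style methods; the patent's exact formulas may differ.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2) + 1e-8          # range of each point

    alpha = np.arcsin(z / r)                               # zenith angle
    beta = np.arcsin(y / (np.sqrt(x ** 2 + y ** 2) + 1e-8))  # azimuth angle

    # Row/column resolution of the discretized point cloud (Δα, Δβ).
    d_alpha = (alpha.max() - alpha.min()) / H
    d_beta = (beta.max() - beta.min()) / W

    # Row and column pixel indices obtained by discretizing the angles.
    u = np.clip(((alpha - alpha.min()) / d_alpha).astype(int), 0, H - 1)
    v = np.clip(((beta - beta.min()) / d_beta).astype(int), 0, W - 1)

    image = np.zeros((H, W, C), dtype=np.float32)
    image[u, v, 0] = x
    image[u, v, 1] = y
    image[u, v, 2] = z
    image[u, v, 3] = points[:, 3]                          # intensity channel
    image[u, v, 4] = r                                     # range channel
    return image
```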
Step 102: processing a two-dimensional image by using a multi-scale cavity convolution model which is trained in advance to obtain a first feature map;
As shown in fig. 2, the multi-scale cavity convolution model includes: a first convolution layer with a 3×3 convolution kernel, a multi-channel cavity convolution unit and a global average pooling layer arranged in parallel, and a first adder; the multi-channel cavity convolution unit comprises four parallel branches, namely a first cavity convolution branch, a second cavity convolution branch, a third cavity convolution branch and a fourth cavity convolution branch, and a splicing unit; the first cavity convolution branch comprises a connected second convolution layer with a convolution kernel size of 1×1 and a first cavity convolution layer (Dilated Convolution) with a convolution kernel size of 3×3 and rate=1; the second cavity convolution branch comprises a connected first 3×3 average pooling layer and a second cavity convolution layer with a convolution kernel size of 3×3 and rate=12; the third cavity convolution branch comprises a connected second 5×5 average pooling layer and a third cavity convolution layer with a convolution kernel size of 3×3 and rate=24; the fourth cavity convolution branch comprises a connected third 7×7 average pooling layer and a fourth cavity convolution layer with a convolution kernel size of 3×3 and rate=36;
the method specifically comprises the following steps:
processing the two-dimensional image X_input by using the first convolution layer to obtain a feature map X with a size of H×W×C;
specifically, the preprocessed input X_input ∈ R^(H×W×C) is passed through the convolution layer with a convolution kernel size of 3×3 to output X ∈ R^(H×W×C), expressed as X = Conv_3×3(X_input);
processing the feature map X by using the first cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the second cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the third cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the fourth cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
splicing the four branch feature maps in the channel dimension by using the splicing unit to obtain a spliced feature map with a size of H×W×C;
processing the feature map X by using the global average pooling layer to obtain a feature map with a size of 1×1×C, and expanding it through a broadcasting mechanism into a broadcast feature map with a size of H×W×C; the broadcasting mechanism copies the single value in each channel into H×W values;
adding the feature map X, the spliced feature map and the broadcast feature map by using the first adder to obtain a first feature map Y1 with a size of H×W×C.
Step 103: processing the first feature map by using a pre-trained width dimension downsampling model to obtain a second feature map, a third feature map and a fourth feature map;
the width dimension downsampling model comprises a first Fire module, a second Fire module, a third convolution layer of a 1 multiplied by 1 convolution kernel, a third Fire module, a fourth convolution layer of the 1 multiplied by 1 convolution kernel, a fifth Fire module and a sixth Fire module which are connected in sequence;
processing the first feature map by using a pre-trained width dimension downsampling model to obtain a second feature map, a third feature map and a fourth feature map; comprising the following steps:
processing the first feature map Y1 by using the first Fire module (FireModule), and processing the output result of the first Fire module by using the second Fire module to obtain a second feature map Y2 with a size of H×W×C;
processing the second feature map Y2 by using the third convolution layer with a 1×1 convolution kernel (lateral step size set to 2, longitudinal step size set to 1) to obtain a feature map with a size of H×(W/2)×C;
processing this feature map by using the third Fire module, and processing the output result of the third Fire module by using the fourth Fire module to obtain a third feature map Y3 with a size of H×(W/2)×C;
processing the third feature map Y3 by using the fourth convolution layer with a 1×1 convolution kernel to obtain a feature map with a size of H×(W/4)×C;
processing this feature map by using the fifth Fire module, and processing the output result of the fifth Fire module by using the sixth Fire module to obtain a fourth feature map Y4 with a size of H×(W/4)×C.
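The Fire modules referenced in step 103 follow the SqueezeNet/SqueezeSeg design, i.e. a squeeze 1×1 convolution feeding parallel 1×1 and 3×3 expand convolutions. A minimal PyTorch sketch is given below; the squeeze/expand channel counts and the example width-only strided convolution are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet-style Fire module: squeeze with a 1x1 conv, then expand
    with parallel 1x1 and 3x3 convs whose outputs are concatenated."""

    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch // 2, 1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch // 2, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.act(self.squeeze(x))
        return self.act(torch.cat([self.expand1x1(s), self.expand3x3(s)], dim=1))

# Width-only downsampling between Fire pairs can be done with a 1x1 conv whose
# lateral stride is 2 and longitudinal stride is 1, as described above
# (channel count 64 is an illustrative assumption).
width_downsample = nn.Conv2d(64, 64, kernel_size=1, stride=(1, 2))
```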
Step 104: processing the two-dimensional image, the second feature map, the third feature map and the fourth feature map by utilizing the pre-trained spatial attention model to obtain a fifth feature map;
As shown in fig. 2, the spatial attention model includes: four parallel convolution layers with 1×1 convolution kernels, namely a fifth convolution layer, a sixth convolution layer, a seventh convolution layer and an eighth convolution layer, a second adder and a spatial attention module (Spatial Attention Module);
in this embodiment, the steps include:
processing the two-dimensional image X_input by using the fifth convolution layer to obtain a feature map Z1 with a size of H×(W/4)×C;
processing the second feature map Y2 by using the sixth convolution layer to obtain a feature map Z2 with a size of H×(W/4)×C;
processing the third feature map Y3 by using the seventh convolution layer to obtain a feature map Z3 with a size of H×(W/4)×C;
processing the fourth feature map Y4 by using the eighth convolution layer to obtain a feature map Z4 with a size of H×(W/4)×C;
adding the feature map Z1, the feature map Z2, the feature map Z3 and the feature map Z4 by using the second adder to obtain a feature map Z5 with a size of H×(W/4)×C;
processing the feature map Z5 by using the spatial attention module to obtain a fifth feature map Z with a size of H×(W/4)×C.
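The patent text does not spell out the internals of the spatial attention module used in step 104. The sketch below assumes a CBAM-style spatial attention (channel-wise average and max maps, a convolution and a sigmoid gate), which is one common realisation, and uses illustrative channel counts and width-only strides on the 1×1 projections to bring all inputs to the common H×(W/4) resolution:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Assumed CBAM-style spatial attention: pool over the channel
    dimension, convolve, and gate the input with a sigmoid mask."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, z5):
        avg_map = z5.mean(dim=1, keepdim=True)
        max_map = z5.max(dim=1, keepdim=True).values
        mask = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return z5 * mask                                   # fifth feature map Z

# Illustrative 1x1 projections bringing X_input, Y2, Y3 and Y4 to the same
# width before the element-wise sum (channel counts and strides are assumptions).
proj_x  = nn.Conv2d(5, 64, 1, stride=(1, 4))   # from H x W
proj_y2 = nn.Conv2d(64, 64, 1, stride=(1, 4))  # from H x W
proj_y3 = nn.Conv2d(64, 64, 1, stride=(1, 2))  # from H x W/2
proj_y4 = nn.Conv2d(64, 64, 1)                 # already H x W/4
```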
Step 105: processing the fourth feature map and the fifth feature map by utilizing a pre-trained width dimension up-sampling model to obtain a sixth feature map;
As shown in fig. 2, the width dimension up-sampling model includes: a double up-sampling layer, a quadruple up-sampling layer and a deconvolution branch arranged in parallel; the deconvolution branch comprises a third adder, a first Fire deconvolution layer (FireDeconv), a fourth adder, a second Fire deconvolution layer, a fifth adder, a third Fire deconvolution layer and a ninth convolution layer with a 1×1 convolution kernel which are sequentially connected;
processing the fourth feature map and the fifth feature map by utilizing a pre-trained width dimension up-sampling model to obtain a sixth feature map; comprising the following steps:
adding the fourth feature map Y4 and the fifth feature map Z by using the third adder to obtain a feature map Q1;
processing the feature map Q1 by using the first Fire deconvolution layer to obtain a feature map Q2 with a size of H×(W/2)×C;
processing the fifth feature map Z by using the double up-sampling layer to obtain a feature map Q3 with a size of H×(W/2)×C;
adding the feature map Q2 and the feature map Q3 by using the fourth adder to obtain a feature map Q4 with a size of H×(W/2)×C;
processing the feature map Q4 by using the second Fire deconvolution layer to obtain a feature map Q5 with a size of H×W×C;
processing the fifth feature map Z by using the quadruple up-sampling layer to obtain a feature map Q6 with a size of H×W×C;
adding the feature map Q5 and the feature map Q6 by using the fifth adder to obtain a feature map Q7 with a size of H×W×C;
processing the feature map Q7 by using the third Fire deconvolution layer to obtain a feature map Q8 with a size of H×2W×C;
processing the feature map Q8 by using the ninth convolution layer with a 1×1 convolution kernel to obtain a sixth feature map Q with a size of H×W×K, where K represents the number of classes of the segmentation targets.
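A Fire deconvolution (FireDeconv) layer inserts a transposed convolution between the squeeze and expand stages so that the width is doubled while the parameter count stays small. The following is a hedged sketch along the lines of the SqueezeSeg design; the kernel sizes, channel counts and the width-only stride are assumptions:

```python
import torch
import torch.nn as nn

class FireDeconv(nn.Module):
    """SqueezeSeg-style Fire deconvolution: squeeze 1x1 conv, a transposed
    conv that doubles the width only, then parallel 1x1/3x3 expand convs."""

    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        # Transposed conv with stride (1, 2): height unchanged, width doubled.
        self.deconv = nn.ConvTranspose2d(squeeze_ch, squeeze_ch,
                                         kernel_size=(1, 4), stride=(1, 2),
                                         padding=(0, 1))
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch // 2, 1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch // 2, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.act(self.deconv(self.act(self.squeeze(x))))
        return self.act(torch.cat([self.expand1x1(s), self.expand3x3(s)], dim=1))

# The two skip paths of step 105 can reuse plain interpolation, for example:
double_up = nn.Upsample(scale_factor=(1, 2), mode='bilinear', align_corners=False)
quad_up   = nn.Upsample(scale_factor=(1, 4), mode='bilinear', align_corners=False)
```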
Step 106: and processing the sixth feature map by using a pre-trained channel attention model (Channel Attention Model) to obtain a point cloud segmentation result.
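The channel attention model of step 106 is not detailed in this text. One plausible reading, sketched below purely for illustration, is a squeeze-and-excitation style channel attention over the K class channels of the sixth feature map followed by a per-pixel class decision; the class name ChannelAttentionHead, the reduction ratio and the final argmax are assumptions:

```python
import torch
import torch.nn as nn

class ChannelAttentionHead(nn.Module):
    """Assumed SE-style channel attention over the K class channels of the
    sixth feature map, followed by a per-pixel class decision that yields
    the point cloud segmentation result."""

    def __init__(self, num_classes, reduction=4):
        super().__init__()
        hidden = max(num_classes // reduction, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(num_classes, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, 1), nn.Sigmoid())

    def forward(self, q):                      # q: (B, K, H, W) sixth feature map
        weighted = q * self.gate(q)            # re-weight the class channels
        return weighted.argmax(dim=1)          # per-pixel class labels (B, H, W)
```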
Based on the foregoing embodiments, the embodiment of the present application provides a multi-scale lightweight three-dimensional point cloud segmentation apparatus based on self-attention, and referring to fig. 3, the multi-scale lightweight three-dimensional point cloud segmentation apparatus 200 based on self-attention provided in the embodiment of the present application at least includes:
a preprocessing unit 201, configured to convert original three-dimensional point cloud data into a two-dimensional image through spherical transformation;
a first processing unit 202, configured to process the two-dimensional image by using a multi-scale cavity convolution model that is trained in advance, so as to obtain a first feature map;
a downsampling unit 203, configured to process the first feature map by using a pre-trained width dimension downsampling model, so as to obtain a second feature map, a third feature map and a fourth feature map;
a second processing unit 204, configured to process the two-dimensional image, the second feature map, the third feature map, and the fourth feature map by using the pre-trained spatial attention model, so as to obtain a fifth feature map;
an up-sampling unit 205, configured to process the fourth feature map and the fifth feature map by using a pre-trained width dimension up-sampling model, so as to obtain a sixth feature map;
the point cloud segmentation unit 206 is configured to process the sixth feature map by using the pre-trained channel attention model, so as to obtain a point cloud segmentation result.
It should be noted that, the principle of solving the technical problem of the multi-scale light three-dimensional point cloud segmentation apparatus 200 based on self-attention provided in the embodiment of the present application is similar to that of the method provided in the embodiment of the present application, so that the implementation of the multi-scale light three-dimensional point cloud segmentation apparatus 200 based on self-attention provided in the embodiment of the present application can be referred to the implementation of the method provided in the embodiment of the present application, and the repetition is omitted.
Based on the foregoing embodiments, the embodiment of the present application further provides an electronic device. As shown in fig. 4, the electronic device 300 provided in the embodiment of the present application at least includes a processor 301, a memory 302, and a computer program stored on the memory 302 and executable on the processor 301; the multi-scale lightweight three-dimensional point cloud segmentation method based on self-attention provided by the embodiments of the present application is implemented when the processor 301 executes the computer program.
The electronic device 300 provided by the embodiments of the present application may also include a bus 303 that connects the different components, including the processor 301 and the memory 302. Bus 303 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and so forth.
The Memory 302 may include readable media in the form of volatile Memory, such as random access Memory (Random Access Memory, RAM) 3021 and/or cache Memory 3022, and may further include Read Only Memory (ROM) 3023.
The memory 302 may also include a program tool 3025 having a set (at least one) of program modules 3024, the program modules 3024 including, but not limited to: an operating subsystem, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The electronic device 300 may also communicate with one or more external devices 304 (e.g., keyboard, remote control, etc.), one or more devices that enable a user to interact with the electronic device 300 (e.g., cell phone, computer, etc.), and/or any device that enables the electronic device 300 to communicate with one or more other electronic devices 300 (e.g., router, modem, etc.). Such communication may occur through an Input/Output (I/O) interface 305. Also, electronic device 300 may communicate with one or more networks such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), and/or a public network such as the internet via network adapter 306. As shown in fig. 4, the network adapter 306 communicates with other modules of the electronic device 300 over the bus 303. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in connection with electronic device 300, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, disk array (Redundant Arrays of Independent Disks, RAID) subsystems, tape drives, data backup storage subsystems, and the like.
It should be noted that the electronic device 300 shown in fig. 4 is only an example, and should not impose any limitation on the functions and application scope of the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, which stores computer instructions that when executed by a processor realize the multi-scale lightweight three-dimensional point cloud segmentation method based on self-attention. Specifically, the executable program may be built into or installed in the electronic device 300, so that the electronic device 300 may implement the multi-scale lightweight three-dimensional point cloud segmentation method based on self-attention provided in the embodiments of the present application by executing the built-in or installed executable program.
The self-attention-based multi-scale lightweight three-dimensional point cloud segmentation method provided by the embodiments of the present application may also be implemented as a program product comprising program code for causing the electronic device 300 to perform the self-attention-based multi-scale lightweight three-dimensional point cloud segmentation method provided by the embodiments of the present application when the program product is executable on the electronic device 300.
The program product provided by the embodiments of the present application may employ any combination of one or more readable media, where the readable media may be a readable signal medium or a readable storage medium, and the readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof, and more specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), an optical fiber, a portable compact disk read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product provided by the embodiments of the present application may be implemented as a CD-ROM and include program code that may also be run on a computing device. However, the program product provided by the embodiments of the present application is not limited thereto, and in the embodiments of the present application, the readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this is not required to or suggested that these operations must be performed in this particular order or that all of the illustrated operations must be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solution of the present application and not limiting. Although the present application has been described in detail with reference to the embodiments, it should be understood by those skilled in the art that the modifications and equivalents may be made to the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application, and all such modifications and equivalents are intended to be encompassed in the scope of the claims of the present application.

Claims (9)

1. A multi-scale lightweight three-dimensional point cloud segmentation method based on self-attention is characterized by comprising the following steps:
converting the original three-dimensional point cloud data into a two-dimensional image through spherical transformation;
processing a two-dimensional image by using a multi-scale cavity convolution model which is trained in advance to obtain a first feature map;
processing the first feature map by utilizing a pre-trained width dimension downsampling model to obtain a second feature map, a third feature map and a fourth feature map;
processing the two-dimensional image, the second feature map, the third feature map and the fourth feature map by utilizing the pre-trained spatial attention model to obtain a fifth feature map;
processing the fourth feature map and the fifth feature map by utilizing a pre-trained width dimension up-sampling model to obtain a sixth feature map;
processing the sixth feature map by using a channel attention model which is trained in advance to obtain a point cloud segmentation result;
the multi-scale cavity convolution model includes: a first convolution layer with a 3×3 convolution kernel, a multi-channel cavity convolution unit and a global average pooling layer arranged in parallel, and a first adder; the multi-channel cavity convolution unit comprises four parallel branches, namely a first cavity convolution branch, a second cavity convolution branch, a third cavity convolution branch and a fourth cavity convolution branch, and a splicing unit; the first cavity convolution branch comprises a connected second convolution layer with a convolution kernel size of 1×1 and a first cavity convolution layer with a convolution kernel size of 3×3 and rate=1; the second cavity convolution branch comprises a connected first 3×3 average pooling layer and a second cavity convolution layer with a convolution kernel size of 3×3 and rate=12; the third cavity convolution branch comprises a connected second 5×5 average pooling layer and a third cavity convolution layer with a convolution kernel size of 3×3 and rate=24; the fourth cavity convolution branch comprises a connected third 7×7 average pooling layer and a fourth cavity convolution layer with a convolution kernel size of 3×3 and rate=36;
processing a two-dimensional image by using a multi-scale cavity convolution model which is trained in advance to obtain a first feature map; comprising the following steps:
processing the two-dimensional image X_input by using the first convolution layer to obtain a feature map X with a size of H×W×C;
processing the feature map X by using the first cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the second cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the third cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
processing the feature map X by using the fourth cavity convolution branch to obtain a feature map with a size of H×W×(C/4);
splicing the four branch feature maps in the channel dimension by using the splicing unit to obtain a spliced feature map with a size of H×W×C;
processing the feature map X by using the global average pooling layer to obtain a feature map with a size of 1×1×C, and expanding it through a broadcasting mechanism into a broadcast feature map with a size of H×W×C;
adding the feature map X, the spliced feature map and the broadcast feature map by using the first adder to obtain a first feature map Y1 with a size of H×W×C.
2. The method of claim 1, wherein the original three-dimensional point cloud data is converted into a two-dimensional image by spherical transformation; comprising the following steps:
acquiring three-dimensional coordinates (x, y, z) of each point in the three-dimensional point cloud data;
calculating the zenith angle α and the azimuth angle β of each point according to the spherical transformation formula;
calculating the row pixel index and the column pixel index of each point on the two-dimensional image according to the zenith angle α and the azimuth angle β of the point and the resolutions Δα and Δβ, wherein Δα and Δβ represent the row resolution and the column resolution of the discretized point cloud;
thereby obtaining a two-dimensional image X_input with a size of H×W×C, where H, W and C represent the height, width and number of channels of the two-dimensional image, respectively.
3. The method of claim 2, wherein the width dimension downsampling model comprises a first Fire module, a second Fire module, a third convolution layer of a 1 x 1 convolution kernel, a third Fire module, a fourth convolution layer of a 1 x 1 convolution kernel, a fifth Fire module, and a sixth Fire module connected in sequence;
processing the first feature map by utilizing a pre-trained width dimension downsampling model to obtain a second feature map, a third feature map and a fourth feature map; comprising the following steps:
processing the first feature map Y1 by using the first Fire module, and processing the output result of the first Fire module by using the second Fire module to obtain a second feature map Y2 with a size of H×W×C;
processing the second feature map Y2 by using the third convolution layer with a 1×1 convolution kernel to obtain a feature map with a size of H×(W/2)×C;
processing this feature map by using the third Fire module, and processing the output result of the third Fire module by using the fourth Fire module to obtain a third feature map Y3 with a size of H×(W/2)×C;
processing the third feature map Y3 by using the fourth convolution layer with a 1×1 convolution kernel to obtain a feature map with a size of H×(W/4)×C;
processing this feature map by using the fifth Fire module, and processing the output result of the fifth Fire module by using the sixth Fire module to obtain a fourth feature map Y4 with a size of H×(W/4)×C.
4. A method according to claim 3, wherein the spatial attention model comprises: four parallel convolution layers with 1×1 convolution kernels, namely a fifth convolution layer, a sixth convolution layer, a seventh convolution layer and an eighth convolution layer, a second adder and a spatial attention module;
processing the two-dimensional image, the second feature map, the third feature map and the fourth feature map by utilizing the pre-trained spatial attention model to obtain the fifth feature map; comprising the following steps:
processing the two-dimensional image X_input by using the fifth convolution layer to obtain a feature map Z1 with a size of H×(W/4)×C;
processing the second feature map Y2 by using the sixth convolution layer to obtain a feature map Z2 with a size of H×(W/4)×C;
processing the third feature map Y3 by using the seventh convolution layer to obtain a feature map Z3 with a size of H×(W/4)×C;
processing the fourth feature map Y4 by using the eighth convolution layer to obtain a feature map Z4 with a size of H×(W/4)×C;
adding the feature map Z1, the feature map Z2, the feature map Z3 and the feature map Z4 by using the second adder to obtain a feature map Z5 with a size of H×(W/4)×C;
processing the feature map Z5 by using the spatial attention module to obtain a fifth feature map Z with a size of H×(W/4)×C.
5. The method of claim 4, wherein the width dimension up-sampling model comprises: a double up-sampling layer, a quadruple up-sampling layer and a deconvolution branch arranged in parallel; the deconvolution branch comprises a third adder, a first Fire deconvolution layer, a fourth adder, a second Fire deconvolution layer, a fifth adder, a third Fire deconvolution layer and a ninth convolution layer with a 1×1 convolution kernel which are sequentially connected;
processing the fourth feature map and the fifth feature map by utilizing a pre-trained width dimension up-sampling model to obtain a sixth feature map; comprising the following steps:
adding the fourth feature map Y4 and the fifth feature map Z by using the third adder to obtain a feature map Q1;
processing the feature map Q1 by using the first Fire deconvolution layer to obtain a feature map Q2 with a size of H×(W/2)×C;
processing the fifth feature map Z by using the double up-sampling layer to obtain a feature map Q3 with a size of H×(W/2)×C;
adding the feature map Q2 and the feature map Q3 by using the fourth adder to obtain a feature map Q4 with a size of H×(W/2)×C;
processing the feature map Q4 by using the second Fire deconvolution layer to obtain a feature map Q5 with a size of H×W×C;
processing the fifth feature map Z by using the quadruple up-sampling layer to obtain a feature map Q6 with a size of H×W×C;
adding the feature map Q5 and the feature map Q6 by using the fifth adder to obtain a feature map Q7 with a size of H×W×C;
processing the feature map Q7 by using the third Fire deconvolution layer to obtain a feature map Q8 with a size of H×2W×C;
processing the feature map Q8 by using the ninth convolution layer with a 1×1 convolution kernel to obtain a sixth feature map Q with a size of H×W×K, where K represents the number of classes of the segmentation targets.
6. The method of claim 5, wherein the method further comprises: and performing joint training on the multi-scale cavity convolution model, the width dimension downsampling model, the spatial attention model, the width dimension upsampling model and the channel attention model.
7. A multi-scale lightweight three-dimensional point cloud segmentation apparatus based on self-attention, comprising:
the preprocessing unit is used for converting the original three-dimensional point cloud data into a two-dimensional image through spherical transformation;
the first processing unit is used for processing the two-dimensional image by utilizing the multi-scale cavity convolution model which is trained in advance to obtain a first feature map;
the downsampling unit is used for processing the first feature map by utilizing a pre-trained width dimension downsampling model to obtain a second feature map, a third feature map and a fourth feature map;
the second processing unit is used for processing the two-dimensional image, the second feature map, the third feature map and the fourth feature map by utilizing the pre-trained spatial attention model to obtain a fifth feature map;
the up-sampling unit is used for processing the fourth characteristic diagram and the fifth characteristic diagram by utilizing a pre-trained width dimension up-sampling model to obtain a sixth characteristic diagram;
the point cloud segmentation unit is used for processing the sixth feature map by utilizing the channel attention model which is trained in advance to obtain a point cloud segmentation result;
the multi-scale cavity convolution model includes: a first convolution layer with a 3×3 convolution kernel, a multi-channel cavity convolution unit and a global average pooling layer arranged in parallel, and a first adder; the multi-channel cavity convolution unit comprises four parallel branches, namely a first cavity convolution branch, a second cavity convolution branch, a third cavity convolution branch and a fourth cavity convolution branch, and a splicing unit; the first cavity convolution branch comprises a connected second convolution layer with a convolution kernel size of 1×1 and a first cavity convolution layer with a convolution kernel size of 3×3 and rate=1; the second cavity convolution branch comprises a connected first 3×3 average pooling layer and a second cavity convolution layer with a convolution kernel size of 3×3 and rate=12; the third cavity convolution branch comprises a connected second 5×5 average pooling layer and a third cavity convolution layer with a convolution kernel size of 3×3 and rate=24; the fourth cavity convolution branch comprises a connected third 7×7 average pooling layer and a fourth cavity convolution layer with a convolution kernel size of 3×3 and rate=36;
the first processing unit is specifically configured to:
processing the two-dimensional image Xinput using the first convolution layer to obtain a feature map X of size H×W×C;
processing the feature map X using the first hole convolution branch to obtain a first branch feature map;
processing the feature map X using the second hole convolution branch to obtain a second branch feature map;
processing the feature map X using the third hole convolution branch to obtain a third branch feature map;
processing the feature map X using the fourth hole convolution branch to obtain a fourth branch feature map;
splicing the first branch feature map, the second branch feature map, the third branch feature map, and the fourth branch feature map in the channel dimension using the splicing unit to obtain a spliced feature map of size H×W×C;
processing the feature map X using the global average pooling layer to obtain a feature map of size 1×C, and expanding it through a broadcasting mechanism into a broadcast feature map of size H×W×C;
performing an addition operation on the feature map X, the spliced feature map, and the broadcast feature map using the first adder to obtain a first feature map Y1 of size H×W×C.
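As a rough, non-authoritative PyTorch sketch of the module recited above, the four branches, the channel-dimension splice, the broadcast global pooling path, and the three-way addition could be written as follows; the input channel count, the per-branch width C/4 (chosen only so that the splice returns C channels), and the paddings used to keep the H×W resolution are assumptions:

import torch
import torch.nn as nn

class MultiScaleHoleConv(nn.Module):
    # Sketch of the multi-scale hole (dilated) convolution model described in the claim.
    def __init__(self, in_ch=5, ch=64):
        super().__init__()
        b = ch // 4                                       # assumed per-branch channel width
        self.conv1 = nn.Conv2d(in_ch, ch, 3, padding=1)   # first convolution layer (3x3)
        self.branch1 = nn.Sequential(                     # 1x1 conv + 3x3 conv, rate=1
            nn.Conv2d(ch, b, 1),
            nn.Conv2d(b, b, 3, padding=1, dilation=1))
        self.branch2 = nn.Sequential(                     # 3x3 avg pool + 3x3 conv, rate=12
            nn.AvgPool2d(3, stride=1, padding=1),
            nn.Conv2d(ch, b, 3, padding=12, dilation=12))
        self.branch3 = nn.Sequential(                     # 5x5 avg pool + 3x3 conv, rate=24
            nn.AvgPool2d(5, stride=1, padding=2),
            nn.Conv2d(ch, b, 3, padding=24, dilation=24))
        self.branch4 = nn.Sequential(                     # 7x7 avg pool + 3x3 conv, rate=36
            nn.AvgPool2d(7, stride=1, padding=3),
            nn.Conv2d(ch, b, 3, padding=36, dilation=36))
        self.gap = nn.AdaptiveAvgPool2d(1)                # global average pooling

    def forward(self, x_input):
        x = self.conv1(x_input)                           # feature map X (H x W x C)
        spliced = torch.cat([self.branch1(x), self.branch2(x),
                             self.branch3(x), self.branch4(x)], dim=1)  # splice on channels
        pooled = self.gap(x).expand_as(x)                 # broadcast back to H x W x C
        return x + spliced + pooled                       # first adder -> first feature map Y1

# e.g.: y1 = MultiScaleHoleConv()(torch.randn(1, 5, 64, 512))

The average pooling placed in front of each dilated convolution and the dilation rates are taken directly from the branch definitions above; activation functions and normalization are omitted for brevity.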
8. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1-6 when executing the computer program.
9. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-6.
CN202311022399.9A 2023-08-15 2023-08-15 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention Active CN117058380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311022399.9A CN117058380B (en) 2023-08-15 2023-08-15 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311022399.9A CN117058380B (en) 2023-08-15 2023-08-15 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention

Publications (2)

Publication Number Publication Date
CN117058380A CN117058380A (en) 2023-11-14
CN117058380B true CN117058380B (en) 2024-03-26

Family

ID=88652933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311022399.9A Active CN117058380B (en) 2023-08-15 2023-08-15 Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention

Country Status (1)

Country Link
CN (1) CN117058380B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259983A (en) * 2020-02-13 2020-06-09 电子科技大学 Image semantic segmentation method based on deep learning and storage medium
CN111950467A (en) * 2020-08-14 2020-11-17 清华大学 Fusion network lane line detection method based on attention mechanism and terminal equipment
CN112232391A (en) * 2020-09-29 2021-01-15 河海大学 Dam crack detection method based on U-net network and SC-SAM attention mechanism
CN113128348A (en) * 2021-03-25 2021-07-16 西安电子科技大学 Laser radar target detection method and system fusing semantic information
CN113592794A (en) * 2021-07-16 2021-11-02 华中科技大学 Spine image segmentation method of 2D convolutional neural network based on mixed attention mechanism
CN114119635A (en) * 2021-11-23 2022-03-01 电子科技大学成都学院 Fatty liver CT image segmentation method based on cavity convolution
CN114155265A (en) * 2021-12-01 2022-03-08 南京林业大学 Three-dimensional laser radar road point cloud segmentation method based on YOLACT
CN114743007A (en) * 2022-04-20 2022-07-12 湘潭大学 Three-dimensional semantic segmentation method based on channel attention and multi-scale fusion
CN115294075A (en) * 2022-08-11 2022-11-04 重庆师范大学 OCTA image retinal vessel segmentation method based on attention mechanism
CN115457498A (en) * 2022-09-22 2022-12-09 合肥工业大学 Urban road semantic segmentation method based on double attention and dense connection
CN116189131A (en) * 2023-03-03 2023-05-30 清华大学 Multi-scale feature fusion complex environment real-time target detection method and device
CN116310349A (en) * 2023-05-25 2023-06-23 西南交通大学 Large-scale point cloud segmentation method, device, equipment and medium based on deep learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Multiscale Location Attention Network for Building and Water Segmentation of Remote Sensing Image; Xin Dai et al.; IEEE Transactions on Geoscience and Remote Sensing; 2023-05-16; Vol. 61; pp. 2-7 *
SSCA-Net: Simultaneous Self- and Channel-Attention Neural Network for Multiscale Structure-Preserving Vessel Segmentation; Jiajia Ni et al.; BioMed Research International; 2021-03-31; Vol. 2021; pp. 3-15 *
Road segmentation of remote sensing images based on CA-TransUNet; Gong Xuan et al.; Computer and Modernization; 2023-07-15 (No. 7); pp. 113-116 *
LiDAR point cloud target segmentation based on convolutional neural network; Zhang Qing et al.; Communications Technology; 2021-07-10; Vol. 54 (No. 7); pp. 1635-1639 *
Research on abdominal artery segmentation based on attention mechanism and dilated convolution network; Ji Lingyu; China Master's Theses Full-text Database, Medicine and Health Sciences; 2023-01-15 (No. 1); pp. E060-433 *

Also Published As

Publication number Publication date
CN117058380A (en) 2023-11-14

Similar Documents

Publication Publication Date Title
JP6745328B2 (en) Method and apparatus for recovering point cloud data
US9916679B2 (en) Deepstereo: learning to predict new views from real world imagery
WO2019223382A1 (en) Method for estimating monocular depth, apparatus and device therefor, and storage medium
US20190080455A1 (en) Method and device for three-dimensional feature-embedded image object component-level semantic segmentation
US12125247B2 (en) Processing images using self-attention based neural networks
JP7166388B2 (en) License plate recognition method, license plate recognition model training method and apparatus
CN112699806B (en) Three-dimensional point cloud target detection method and device based on three-dimensional heat map
DE102019106123A1 (en) Three-dimensional (3D) pose estimation from the side of a monocular camera
JP7273129B2 (en) Lane detection method, device, electronic device, storage medium and vehicle
WO2021027692A1 (en) Visual feature library construction method and apparatus, visual positioning method and apparatus, and storage medium
CN110827341A (en) Picture depth estimation method and device and storage medium
US20220351495A1 (en) Method for matching image feature point, electronic device and storage medium
JP2023095806A (en) Three-dimensional data augmentation, model training and detection method, device, and autonomous vehicle
CN117058380B (en) Multi-scale lightweight three-dimensional point cloud segmentation method and device based on self-attention
WO2024104365A1 (en) Device temperature measurement method and related device
CN112085842B (en) Depth value determining method and device, electronic equipment and storage medium
CN116977959A (en) All-day-time multi-mode fusion method and device based on information entropy
CN113610856B (en) Method and device for training image segmentation model and image segmentation
JP2023027227A (en) Image processing method and device, electronic apparatus, storage medium and computer program
CN113537359A (en) Training data generation method and device, computer readable medium and electronic equipment
CN116612129B (en) Low-power consumption automatic driving point cloud segmentation method and device suitable for severe environment
CN115984583B (en) Data processing method, apparatus, computer device, storage medium, and program product
CN113628190B (en) Depth map denoising method and device, electronic equipment and medium
CN116682088A (en) Automatic driving 3D target detection method and device based on object imaging method
CN116739903A (en) Target tracking method, device and readable medium combining classification reinforcement and refinement fine tuning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Xie Tao

Inventor after: Wang Li

Inventor after: Li Xiaoyu

Inventor after: Liu Dedong

Inventor after: Guo Shichun

Inventor after: Li Zhiwei

Inventor before: Zhang Xinyu

Inventor before: Xie Tao

Inventor before: Wang Li

Inventor before: Li Xiaoyu

Inventor before: Liu Dedong

Inventor before: Guo Shichun

Inventor before: Li Zhiwei

CB03 Change of inventor or designer information