CN114565754A - Point cloud semantic segmentation method, device, equipment and medium based on attention mechanism - Google Patents

Point cloud semantic segmentation method, device, equipment and medium based on attention mechanism

Info

Publication number
CN114565754A
CN114565754A
Authority
CN
China
Prior art keywords
feature
module
point cloud
mapping
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111537405.5A
Other languages
Chinese (zh)
Inventor
刘一澄
张锲石
程俊
马宁
康宇航
高向阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202111537405.5A
Publication of CN114565754A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides a point cloud semantic segmentation method, device, equipment and medium. The method includes: acquiring point cloud data and extracting the features of each point in the point cloud data through a first MLP layer; down-sampling the point cloud data with a preset number of coding layers to obtain the intermediate feature mapping corresponding to each coding layer; up-sampling the intermediate feature mapping with a preset number of decoding layers to obtain the up-sampling feature mapping corresponding to each decoding layer; connecting each up-sampling feature mapping with the intermediate feature mapping generated by the corresponding coding layer to obtain a summarized feature mapping; and mapping the summarized features to the final result with a second MLP layer. The scheme ensures the effectiveness of sampling while also improving the sampling rate.

Description

Point cloud semantic segmentation method, device, equipment and medium based on attention mechanism
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a point cloud semantic segmentation method, device, equipment and medium based on an attention mechanism.
Background
Scene semantic segmentation has important application value in fields such as autonomous driving, for example in path planning and autonomous navigation. Existing 3D point cloud segmentation methods achieve good results on small-scale point clouds, but leave considerable room for improvement on larger-scale point clouds (for example, millions of points). More efficient segmentation of large-scene point clouds is therefore of great significance to the development of autonomous driving.
Large-scale point clouds carry far more data than small-scale ones, so they are more complex to process and place higher demands on hardware. Farthest point sampling, which is used for down-sampling small-scale point clouds, is not suitable for large-scale point clouds and leads to a slow segmentation rate.
Disclosure of Invention
The embodiment of the specification aims to provide a point cloud semantic segmentation method, a point cloud semantic segmentation device, point cloud semantic segmentation equipment and a point cloud semantic segmentation medium based on an attention mechanism.
In order to solve the above technical problem, the embodiments of the present application are implemented as follows:
in a first aspect, the present application provides a point cloud semantic segmentation method based on an attention mechanism, including:
acquiring point cloud data, and extracting the characteristics of each point in the point cloud data through a first MLP layer;
adopting a preset number of coding layers to perform down-sampling on the point cloud data to obtain the intermediate feature mapping corresponding to each coding layer; the coding layer comprises a first KNN local feature extraction module, a first residual attention module and a random down-sampling module;
adopting a preset number of decoding layers to perform up-sampling on the intermediate feature mapping to obtain the up-sampling feature mapping corresponding to each decoding layer; the decoding layer comprises a second KNN local feature extraction module, a second residual attention module and an up-sampling module;
connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer for summarizing to obtain summarized feature mapping;
and mapping the summarized features to a final result by adopting a second MLP layer.
In one embodiment, downsampling the point cloud data in each coding layer to obtain an intermediate feature map corresponding to each coding layer includes:
obtaining a first enhanced feature vector through a first KNN local feature extraction module;
transmitting the first enhanced feature vector to a first residual attention module to obtain a first residual feature vector;
and randomly downsampling the first residual feature vector through the random downsampling module to obtain the intermediate feature mapping.
In one embodiment, obtaining, by the first KNN local feature extraction module, a first enhanced feature vector includes:
collecting K adjacent points of the first query point through a first KNN local feature extraction module;
carrying out position coding on the K adjacent points;
and connecting the K position codes and the corresponding K adjacent point features in series to obtain a corresponding first enhanced feature vector.
In one embodiment, the passing the first enhanced feature vector to the first residual attention module to obtain a first residual feature vector includes:
transmitting the first enhanced feature vector to the first residual attention module, and aggregating the features of the adjacent points to obtain an aggregated feature vector;
and subtracting the aggregated feature vector from the first enhanced feature vector to obtain the first residual feature vector.
In one embodiment, upsampling the intermediate feature mapping by each decoding layer to obtain the upsampled feature mapping corresponding to each decoding layer includes:
obtaining a second enhanced feature vector through the second KNN local feature extraction module;
transmitting the second enhanced feature vector to the second residual attention module to obtain a second residual feature vector;
and upsampling the second residual feature vector through adjacent interpolation to obtain the upsampled feature mapping.
In one embodiment, the connecting the upsampled feature map and the intermediate feature map generated by the corresponding coding layer for summarizing to obtain a summarized feature map includes:
and connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer through jump connection for summarizing to obtain summarized feature mapping.
In a second aspect, the present application provides an attention-based point cloud semantic segmentation apparatus, including:
the acquisition module is used for acquiring point cloud data and extracting the characteristics of each point in the point cloud data through the first MLP layer;
the down-sampling module is used for down-sampling the point cloud data by adopting a preset number of coding layers to obtain the intermediate feature mapping corresponding to each coding layer; the coding layer comprises a first KNN local feature extraction module, a first residual attention module and a random down-sampling module;
the up-sampling module is used for up-sampling the intermediate feature mapping by adopting a preset number of decoding layers to obtain the up-sampling feature mapping corresponding to each decoding layer; the decoding layer comprises a second KNN local feature extraction module, a second residual attention module and an up-sampling module;
the summarizing module is used for connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer for summarizing to obtain summarized feature mapping;
and the mapping module is used for mapping the summarized features to a final result by adopting a second MLP layer.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the point cloud semantic segmentation method based on attention mechanism as in the first aspect.
In a fourth aspect, the present application provides a readable storage medium, on which a computer program is stored, which when executed by a processor, implements the attention-based point cloud semantic segmentation method according to the first aspect.
As can be seen from the technical solutions provided in the embodiments of the present specification, by combining a random down-sampling module, a KNN local feature extraction module and a residual attention module, and aggregating the corresponding features of the nearest K points through the KNN algorithm and the residual attention mechanism, the scheme avoids the loss of large amounts of important point information that random sampling could otherwise cause, ensuring the effectiveness of sampling while improving the sampling rate.
The attention-mechanism-based point cloud semantic segmentation method can also improve the accuracy of vehicle trajectory prediction and the flexibility of the model, and enhance the robustness of the algorithm.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present specification, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a point cloud semantic segmentation method based on an attention mechanism according to the present disclosure;
FIG. 2 is a schematic flow chart of a point cloud semantic segmentation method based on an attention mechanism according to the present disclosure;
FIG. 3 is a schematic diagram of a residual attention module provided herein;
FIG. 4 is a schematic structural diagram of a point cloud semantic segmentation apparatus based on an attention mechanism according to the present application;
fig. 5 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the accompanying drawings. It is obvious that the described embodiments are only a part of the embodiments of the present specification, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments described herein without departing from the scope or spirit of the application. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The specification and examples are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to fig. 1, a schematic flow chart of a point cloud semantic segmentation method based on an attention mechanism provided in an embodiment of the present application is shown.
As shown in fig. 1, a point cloud semantic segmentation method based on an attention mechanism may include:
s110, point cloud data are obtained, and the features of each point in the point cloud data are extracted through the first MLP layer.
Specifically, the point cloud data may include large-scale point cloud data and small-scale point cloud data.
Illustratively, the size of the point cloud data is N × d_in, where N is the number of points in the point cloud data and d_in is the feature dimension of each point.
S120, down-sampling the point cloud data by adopting a preset number of coding layers to obtain intermediate feature mapping corresponding to each coding layer; the coding layer comprises a first KNN (K-nearest neighbor) local feature extraction module, a first residual attention module and a random down-sampling module.
Specifically, each coding layer consists of a random sampling operation and an attention module. Since random sampling may discard the features of many useful points, the random down-sampling module is combined with the first KNN local feature extraction module and the first residual attention module to solve this problem and complete effective down-sampling of the point cloud.
The preset number may be set according to actual requirements; for example, a preset number of 4 means that four coding layers are adopted to gradually reduce the number of points while correspondingly increasing the feature dimension of each point. The following embodiments use four coding layers as an example.
Each coding layer down-samples by a factor of four. For example, as shown in fig. 2, if the input point cloud has N points with feature dimension 32, the numbers of points output by the successive layers are (N/4, N/16, N/64, N/256), while the per-point feature dimension increases layer by layer through (64, 128, 256, 512).
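As a quick sanity check on the layer sizes above, the following sketch (illustrative only, not from the patent text) computes the point count and feature dimension after each of the four coding layers, assuming a fixed four-fold down-sampling rate and the stated feature dimensions (64, 128, 256, 512):

```python
def encoder_schedule(n_points, n_layers=4, rate=4, dims=(64, 128, 256, 512)):
    """Return (point_count, feature_dim) after each coding layer."""
    schedule = []
    pts = n_points
    for layer in range(n_layers):
        pts //= rate                       # four-fold random down-sampling
        schedule.append((pts, dims[layer]))
    return schedule

# For an input of N = 1024 points this yields
# (256, 64), (64, 128), (16, 256), (4, 512).
print(encoder_schedule(1024))
```

For a million-point cloud the same schedule leaves only a few thousand points at the deepest layer, which is what makes the subsequent attention computation tractable.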
With continued reference to fig. 2, in an embodiment, downsampling the point cloud data for each coding layer to obtain an intermediate feature map corresponding to each coding layer includes:
obtaining a first enhanced feature vector through a first KNN local feature extraction module;
transmitting the first enhanced feature vector to a first residual attention module to obtain a first residual feature vector;
and carrying out random downsampling on the first residual error feature vector through a random downsampling module to obtain intermediate feature mapping.
Wherein obtaining the first enhanced feature vector through the first KNN local feature extraction module includes:
collecting K adjacent points of the first query point through a first KNN local feature extraction module;
carrying out position coding on K adjacent points;
and connecting the K position codes and the corresponding K adjacent point features in series to obtain a corresponding first enhanced feature vector.
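The three steps above can be sketched as follows (an illustrative NumPy version; the patent does not give the exact position-encoding formula, so the relative offset to the query point is used here as an assumed encoding):

```python
import numpy as np

def knn_local_features(points, features, k):
    """Gather the k nearest neighbours of every point, position-encode them
    (here: relative offsets, an assumption), and concatenate the codes with
    the neighbour features to form the enhanced feature vectors."""
    # pairwise squared distances, shape (n, n)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]        # k nearest neighbours (incl. self)
    neigh_pts = points[idx]                    # (n, k, 3)
    neigh_feats = features[idx]                # (n, k, d)
    pos_enc = neigh_pts - points[:, None, :]   # relative-position codes
    # series connection of position codes and neighbour features
    return np.concatenate([pos_enc, neigh_feats], axis=-1)  # (n, k, 3 + d)
```

The brute-force pairwise distance matrix is only for clarity; a real implementation of a large-scale pipeline would use a spatial index such as a KD-tree.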
Wherein, transmitting the first enhanced feature vector to the first residual attention module to obtain a first residual feature vector, comprising:
transmitting the first enhanced feature vector to a first residual attention module, and gathering features of adjacent points to obtain a gathered feature vector;
and subtracting the aggregation characteristic vector from the first enhancement characteristic vector to obtain a first residual characteristic vector.
Specifically, the first KNN local feature extraction module collects the K adjacent points of the first query point P_i, performs position coding on them, and connects the position codes in series with the corresponding adjacent-point features to obtain the first enhanced feature vector X_i. The series of first enhanced feature vectors X_i is transmitted to the first residual attention module, which aggregates the features of the adjacent points to obtain an aggregated feature vector; finally, the aggregated feature vector is subtracted from the input first enhanced feature vector to obtain the first residual feature vector. Because the first residual feature vector is relative, it yields more stable segmentation than the directly obtained aggregated feature vector. At this point, the first residual feature vector of each point has aggregated the important features of its K surrounding adjacent points, so randomly downsampling the point cloud now improves the computation rate while retaining more local information, avoiding the loss of important points that random sampling could otherwise cause.
Specifically, as shown in fig. 3:
Q_i = W_q · X_i
K_i = W_k · X_i
V_i = W_v · X_i
where Q_i, K_i and V_i are respectively the query vector, key vector and value vector of the first enhanced feature vector X_i, and W_q, W_k and W_v are all shared learnable linear transformations.
A_m = Q_i · K_i^T
A_f = A_m · V_i
X_i^out = X_i - A_f
where A_m is the attention map, A_f is the attention feature (i.e. the aggregated feature vector element), and X_i^out is the first residual feature vector element.
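A minimal NumPy rendering of these formulas (illustrative; a softmax normalization of the attention map is common in attention mechanisms but is not written in the formulas above, so it is omitted here to mirror the text):

```python
import numpy as np

def residual_attention(X, W_q, W_k, W_v):
    """Compute X_out = X - A_f, where A_f = (X W_q)(X W_k)^T (X W_v):
    the residual of the input against the aggregated attention feature.
    X has shape (k, d): the k enhanced neighbour feature vectors."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v   # shared learnable linear maps
    A_m = Q @ K.T                         # attention map
    A_f = A_m @ V                         # aggregated (attention) feature
    return X - A_f                        # residual feature vectors
```

Note the matrix convention here places the weight on the right (X @ W), the transpose of the W_q · X_i form in the text; the computed quantity is the same.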
After the coding layers, a U-Net architecture is adopted; that is, the encoder is symmetrically coupled with the decoder.
S130, upsampling the intermediate feature mapping by adopting a preset number of decoding layers to obtain upsampled feature mapping corresponding to each decoding layer; the decoding layer comprises a second KNN local feature extraction module, a second residual attention module and an up-sampling module.
Specifically, the number of decoding layers is the same as the number of encoding layers. In the examples of this application, 4 layers are used.
With continued reference to fig. 2, in an embodiment, upsampling the intermediate feature map by each decoding layer to obtain an upsampled feature map corresponding to each decoding layer includes:
obtaining a second enhanced feature vector through a second KNN local feature extraction module;
transmitting the second enhanced feature vector to the second residual attention module to obtain a second residual feature vector;
and upsampling the second residual error feature vector through adjacent interpolation to obtain upsampled feature mapping.
Specifically, each input point feature is processed by a linear layer, followed by batch normalization and ReLU, and is then mapped onto the higher-resolution point set through adjacent interpolation to obtain the up-sampling feature mapping.
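The adjacent-interpolation step can be sketched as nearest-neighbour feature propagation (an illustrative assumption about what "adjacent interpolation" means here; the linear layer, batch normalization and ReLU are omitted for brevity):

```python
import numpy as np

def nearest_upsample(sparse_pts, sparse_feats, dense_pts):
    """Give every point in the higher-resolution set the feature of its
    single nearest neighbour in the down-sampled (sparse) set."""
    d2 = ((dense_pts[:, None, :] - sparse_pts[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argmin(d2, axis=1)   # index of nearest sparse point
    return sparse_feats[nearest]      # shape (n_dense, d)
```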
S140, connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer for summarizing to obtain summarized feature mapping, wherein the summarized feature mapping comprises the following steps:
and connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer through jump connection for summarizing to obtain the summarized feature mapping.
And S150, mapping the summarized features to a final result by adopting a second MLP layer, and finishing point cloud semantic segmentation.
According to the attention-mechanism-based point cloud semantic segmentation method, by combining the random down-sampling module, the KNN local feature extraction module and the residual attention module, and aggregating the corresponding features of the nearest K points through the KNN algorithm and the residual attention mechanism, the loss of large amounts of important point information that random sampling could otherwise cause is avoided, sampling effectiveness is ensured, and the sampling rate is improved.
The method can also improve the accuracy of vehicle trajectory prediction and the flexibility of the model, and enhance the robustness of the algorithm.
Referring to fig. 4, a schematic structural diagram of a point cloud semantic segmentation apparatus based on an attention mechanism according to an embodiment of the present application is shown.
As shown in fig. 4, the point cloud semantic segmentation apparatus 400 based on attention mechanism may include:
an obtaining module 410, configured to obtain point cloud data, and extract features of each point in the point cloud data through a first MLP layer;
the down-sampling module 420 is configured to down-sample the point cloud data by using a preset number of coding layers to obtain the intermediate feature mapping corresponding to each coding layer; the coding layer comprises a first KNN local feature extraction module, a first residual attention module and a random down-sampling module;
the upsampling module 430 is configured to upsample the intermediate feature mapping by using a preset number of decoding layers to obtain the upsampling feature mapping corresponding to each decoding layer; the decoding layer comprises a second KNN local feature extraction module, a second residual attention module and an up-sampling module;
a summarizing module 440, configured to connect the upsampled feature map and the intermediate feature map generated by the corresponding coding layer for summarizing to obtain a summarized feature map;
and a mapping module 450, configured to map the summarized features to a final result by using the second MLP layer.
Optionally, the down-sampling module 420 is further configured to:
obtaining a first enhanced feature vector through a first KNN local feature extraction module;
transmitting the first enhanced feature vector to a first residual attention module to obtain a first residual feature vector;
and randomly downsampling the first residual feature vector through the random downsampling module to obtain the intermediate feature mapping.
Optionally, the down-sampling module 420 is further configured to:
collecting K adjacent points of the first query point through a first KNN local feature extraction module;
carrying out position coding on the K adjacent points;
and connecting the K position codes and the corresponding K adjacent point features in series to obtain a corresponding first enhanced feature vector.
Optionally, the down-sampling module 420 is further configured to:
transmitting the first enhanced feature vector to the first residual attention module, and aggregating the features of the adjacent points to obtain an aggregated feature vector;
and subtracting the aggregated feature vector from the first enhanced feature vector to obtain the first residual feature vector.
Optionally, the upsampling module 430 is further configured to:
obtaining a second enhanced feature vector through a second KNN local feature extraction module;
transmitting the second enhanced feature vector to the second residual attention module to obtain a second residual feature vector;
and upsampling the second residual feature vector through adjacent interpolation to obtain the upsampled feature mapping.
Optionally, the summarizing module 440 is further configured to:
and connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer through jump connection for summarizing to obtain the summarized feature mapping.
The point cloud semantic segmentation apparatus based on the attention mechanism provided by this embodiment may implement the above method embodiments, and the implementation principle and technical effect thereof are similar, and are not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, a schematic structural diagram of an electronic device 300 suitable for implementing the embodiments of the present application is shown.
As shown in fig. 5, the electronic apparatus 300 includes a Central Processing Unit (CPU)301 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the apparatus 300 are also stored. The CPU 301, ROM 302, and RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
The following components are connected to the I/O interface 305: an input portion 306 including a keyboard, a mouse, and the like; an output section 307 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card or a modem. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 310 as necessary, so that a computer program read from it can be installed into the storage section 308 as needed.
In particular, the process described above with reference to fig. 1 may be implemented as a computer software program, according to an embodiment of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described attention-based point cloud semantic segmentation method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 309, and/or installed from the removable medium 311.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor. The names of these units or modules do not in some cases constitute a limitation on the units or modules themselves.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a mobile phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
As another aspect, the present application also provides a storage medium, which may be the storage medium contained in the foregoing device in the above embodiment; or may be a storage medium that exists separately and is not assembled into the device. The storage medium stores one or more programs that are used by one or more processors to perform the attention-based point cloud semantic segmentation method described herein.
Storage media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding parts of the description of the method embodiment.

Claims (9)

1. A point cloud semantic segmentation method based on an attention mechanism is characterized by comprising the following steps:
acquiring point cloud data, and extracting a feature of each point in the point cloud data through a first MLP layer;
adopting a preset number of coding layers to perform downsampling on the point cloud data to obtain intermediate feature mapping corresponding to each coding layer; the coding layer comprises a first KNN local feature extraction module, a first residual attention module and a random down-sampling module;
adopting a preset number of decoding layers to perform up-sampling on the intermediate feature mapping to obtain an up-sampling feature mapping corresponding to each decoding layer; the decoding layer comprises a second KNN local feature extraction module, a second residual attention module and an up-sampling module;
connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer for summarizing to obtain summarized feature mapping;
and mapping the summarized features to a final result by adopting a second MLP layer.
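By way of non-limiting illustration, the overall encoder-decoder flow of claim 1 can be sketched as follows. This is a toy NumPy sketch, not the claimed implementation: the KNN local feature extraction and residual attention steps inside each coding layer are elided, `mlp`, the layer count, and all tensor shapes are hypothetical choices, and real systems would use learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w):
    """Shared per-point MLP stand-in: one linear map plus ReLU (hypothetical)."""
    return np.maximum(x @ w, 0.0)

def random_downsample(x, ratio=2):
    """Random down-sampling module: keep a random subset of the points."""
    idx = rng.choice(x.shape[0], x.shape[0] // ratio, replace=False)
    return x[np.sort(idx)]

def upsample(x, n_out):
    """Index-based nearest-neighbour interpolation back to n_out points."""
    idx = np.linspace(0, x.shape[0] - 1, n_out).round().astype(int)
    return x[idx]

# Toy cloud: 16 points with 3 input coordinates; 2 coding / 2 decoding layers.
points = rng.normal(size=(16, 3))
w_in = rng.normal(size=(3, 8))
feats = mlp(points, w_in)                    # first MLP layer: per-point features

# Encoder: each coding layer would apply KNN local feature extraction and
# residual attention before random down-sampling (both elided here).
skips, x = [], feats
for _ in range(2):
    skips.append(x)                          # intermediate feature mapping
    x = random_downsample(x)

# Decoder: up-sample, then connect with the matching encoder output.
for skip in reversed(skips):
    x = upsample(x, skip.shape[0])
    x = np.concatenate([x, skip], axis=1)    # summarized feature mapping

w_out = rng.normal(size=(x.shape[1], 4))
logits = mlp(x, w_out)                       # second MLP layer: per-point scores
print(logits.shape)                          # (16, 4)
```

The concatenation along the feature axis restores per-point detail lost by the random down-sampling, which is why the decoder output recovers one score vector per original point.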
2. The method of claim 1, wherein the down-sampling the point cloud data by the preset number of coding layers to obtain the intermediate feature mapping corresponding to each coding layer comprises:
obtaining a first enhanced feature vector through the first KNN local feature extraction module;
transmitting the first enhanced feature vector to the first residual attention module to obtain a first residual feature vector;
and carrying out random down-sampling on the first residual feature vector through the random down-sampling module to obtain the intermediate feature mapping.
3. The method according to claim 2, wherein said obtaining, by said first KNN local feature extraction module, a first enhanced feature vector comprises:
collecting K adjacent points of a first query point through the first KNN local feature extraction module;
performing position encoding on the K adjacent points;
and concatenating the K position codes with the features of the corresponding K adjacent points to obtain the corresponding first enhanced feature vector.
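The KNN local feature extraction of claim 3 admits, by way of example, the following sketch. The relative-offset position encoding and the feature dimensions are assumptions for illustration only; the claim does not fix a particular encoding.

```python
import numpy as np

def knn_enhanced_features(points, feats, q, k=4):
    """Gather the K nearest neighbours of query point q, encode their positions
    relative to q, and concatenate the codes with the neighbours' features."""
    d = np.linalg.norm(points - points[q], axis=1)
    nbr = np.argsort(d)[:k]                  # K adjacent points (q itself first)
    rel = points[nbr] - points[q]            # relative position encoding
    return np.concatenate([rel, feats[nbr]], axis=1)   # (K, 3 + C)

rng = np.random.default_rng(1)
pts = rng.normal(size=(10, 3))
fts = rng.normal(size=(10, 5))
enh = knn_enhanced_features(pts, fts, q=0, k=4)
print(enh.shape)  # (4, 8)
```

Because the query point is its own nearest neighbour, the first row carries a zero offset, so the concatenated vector retains both absolute local geometry and the original point features.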
4. The method of claim 2, wherein the transmitting the first enhanced feature vector to the first residual attention module to obtain a first residual feature vector comprises:
transmitting the first enhanced feature vector to the first residual attention module, and aggregating features of adjacent points to obtain an aggregated feature vector;
and subtracting the aggregation characteristic vector from the first enhancement characteristic vector to obtain the first residual characteristic vector.
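One plausible reading of claim 4 (aggregate neighbour features with attention weights, then subtract the aggregate from each enhanced vector) is sketched below. The softmax scoring against the mean feature is a hypothetical stand-in for the learned attention of the residual attention module.

```python
import numpy as np

def residual_attention(enh):
    """Attention-weighted aggregation over K neighbour features, followed by
    subtraction from each enhanced feature vector (sketch of claim 4)."""
    scores = enh @ enh.mean(axis=0)          # similarity of each row to the mean
    w = np.exp(scores - scores.max())
    w /= w.sum()                             # softmax attention weights over K
    agg = w @ enh                            # aggregated feature vector
    return enh - agg                         # residual feature vectors

rng = np.random.default_rng(2)
enh = rng.normal(size=(4, 8))                # K=4 enhanced vectors of width 8
res = residual_attention(enh)
print(res.shape)  # (4, 8)
```

The subtraction keeps, for each neighbour, only its deviation from the attention-weighted local context, which is the "residual" the claim names.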
5. The method of any one of claims 1-4, wherein the up-sampling the intermediate feature mapping by the preset number of decoding layers to obtain the up-sampling feature mapping corresponding to each decoding layer comprises:
obtaining a second enhanced feature vector through the second KNN local feature extraction module;
transmitting the second enhanced feature vector to the second residual attention module to obtain a second residual feature vector;
and up-sampling the second residual feature vector through neighbour interpolation to obtain the up-sampling feature mapping.
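The neighbour-interpolation up-sampling named in claim 5 can be illustrated, under the assumption of a 1-nearest-neighbour variant, as copying each dense point's feature from its closest retained coarse point:

```python
import numpy as np

def nearest_upsample(coarse_pts, coarse_feats, dense_pts):
    """Assign each dense point the feature of its nearest coarse point
    (a 1-NN sketch of neighbour-interpolation up-sampling)."""
    d = np.linalg.norm(dense_pts[:, None, :] - coarse_pts[None, :, :], axis=2)
    nearest = d.argmin(axis=1)               # index of the closest coarse point
    return coarse_feats[nearest]

rng = np.random.default_rng(3)
dense = rng.normal(size=(8, 3))              # original point positions
coarse_idx = np.array([0, 3, 5])             # points kept by down-sampling
coarse = dense[coarse_idx]
cf = rng.normal(size=(3, 6))                 # coarse residual feature vectors
up = nearest_upsample(coarse, cf, dense)
print(up.shape)  # (8, 6)
```

A retained point recovers its own feature exactly (its nearest coarse point is itself), while dropped points inherit features from their closest surviving neighbour.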
6. The method according to any one of claims 1-4, wherein the connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer for summarizing to obtain the summarized feature mapping comprises:
and connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer through jump connection for summarizing to obtain summarized feature mapping.
7. An attention mechanism-based point cloud semantic segmentation apparatus, the apparatus comprising:
the acquisition module is used for acquiring point cloud data and extracting the characteristics of each point in the point cloud data through a first MLP layer;
the down-sampling module is used for down-sampling the point cloud data by adopting a preset number of coding layers to obtain the intermediate feature mapping corresponding to each coding layer; the coding layer comprises a first KNN local feature extraction module, a first residual attention module and a random down-sampling module;
the up-sampling module is used for up-sampling the intermediate feature mapping by adopting a preset number of decoding layers to obtain the up-sampling feature mapping corresponding to each decoding layer; the decoding layer comprises a second KNN local feature extraction module, a second residual attention module and an up-sampling module;
the summarizing module is used for connecting the up-sampling feature mapping and the intermediate feature mapping generated by the corresponding coding layer for summarizing to obtain summarized feature mapping;
and the mapping module is used for mapping the summarized features to a final result by adopting a second MLP layer.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the attention mechanism based point cloud semantic segmentation method of any one of claims 1-6.
9. A readable storage medium on which a computer program is stored which, when being executed by a processor, implements the method for semantic segmentation of point clouds based on an attention mechanism as claimed in any one of claims 1 to 6.
CN202111537405.5A 2021-12-15 2021-12-15 Point cloud semantic segmentation method, device, equipment and medium based on attention mechanism Pending CN114565754A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111537405.5A CN114565754A (en) 2021-12-15 2021-12-15 Point cloud semantic segmentation method, device, equipment and medium based on attention mechanism


Publications (1)

Publication Number Publication Date
CN114565754A true CN114565754A (en) 2022-05-31

Family

ID=81711260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111537405.5A Pending CN114565754A (en) 2021-12-15 2021-12-15 Point cloud semantic segmentation method, device, equipment and medium based on attention mechanism

Country Status (1)

Country Link
CN (1) CN114565754A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937644A (en) * 2022-12-15 2023-04-07 清华大学 Point cloud feature extraction method and device based on global and local fusion
CN115937644B (en) * 2022-12-15 2024-01-02 清华大学 Point cloud feature extraction method and device based on global and local fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination