CN116128941A - Point cloud registration method based on jumping attention mechanism - Google Patents


Info

Publication number
CN116128941A
CN116128941A (application CN202310094361.6A)
Authority
CN
China
Prior art keywords
point
point cloud
cloud
origin
destination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310094361.6A
Other languages
Chinese (zh)
Inventor
武越
胡西道
马文萍
公茂果
苗启广
谢飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310094361.6A priority Critical patent/CN116128941A/en
Publication of CN116128941A publication Critical patent/CN116128941A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention discloses a point cloud registration method based on a jumping attention mechanism, which comprises the following steps: inputting an origin point cloud and a destination point cloud into an encoding-decoding network, and obtaining through the network a point correspondence matrix of the origin point cloud and the destination point cloud together with confidence weights in one-to-one correspondence with the initial matching point pairs; and performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result. The encoding-decoding network comprises an encoder and a decoder; the encoder is configured to output L levels of point cloud interaction features and L attention matrices through L cascaded encoding units based on the origin point cloud and the destination point cloud; the decoder is configured to output the confidence weights in one-to-one correspondence with the initial matching point pairs through L decoding units. The invention solves the problem that the jump connection mechanism adopted in existing deep-learning-based three-dimensional point cloud registration methods easily introduces a large amount of redundant information and thereby limits the learning capacity of the whole network; it can complete the point cloud registration task well and improve the efficiency of point cloud registration.

Description

Point cloud registration method based on jumping attention mechanism
Technical Field
The invention belongs to the field of computer three-dimensional vision, and particularly relates to a point cloud registration method based on a jumping attention mechanism.
Background
With the rapid development of high-precision sensors such as LiDAR and Kinect, the point cloud has become a major data format representing the 3D world. Since the sensor can only capture scans over its limited field of view, a registration algorithm is required to generate a large 3D scene. Point cloud registration is a problem of estimating a transformation matrix between two point cloud scans. Applying the transformation matrix, partial scans of the same 3D scene or object may be merged into one complete 3D point cloud. The value of point cloud registration is its unique and critical role in many computer vision applications.
First, point cloud registration may be used for three-dimensional reconstruction. Generating a complete 3D scene is a fundamental and important technology for various computer vision applications, including high-precision 3D map reconstruction in autonomous driving, 3D environment reconstruction in robotics, and 3D reconstruction for real-time monitoring of underground mining. For example, registration may build a 3D environment for route planning and decision making in robotic applications. Another example is large 3D scene reconstruction in underground mining spaces to accurately monitor mining safety.
Second, point cloud registration may be used for 3D positioning. Locating the position of an agent in a 3D environment is particularly important for robots. For example, an unmanned car estimates its position on a map and its distance from a road boundary line. The point cloud registration may accurately match the current real-time 3D view to the 3D environment to which it belongs to provide a high-precision positioning service. This application shows that registration provides a solution for autonomous agents (e.g. robots or unmanned vehicles) to interact with a 3D environment.
Furthermore, point cloud registration may also be used for pose estimation. Aligning point cloud A (a 3D real-time view) with another point cloud B (the 3D environment) yields the pose of point cloud A relative to point cloud B. The pose information may be used for decision making by a robot. For example, registration may obtain the pose of a robotic arm to decide where to move in order to grasp an object accurately. The pose estimation application shows that registration also provides a solution for an agent to learn information about its environment.
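Before turning to existing methods, the following minimal sketch (not part of the patent; the function and variable names are illustrative assumptions) shows how an estimated transformation matrix is applied to merge two partial scans into one point cloud, as described above.

import numpy as np

def merge_scans(source: np.ndarray, target: np.ndarray,
                R: np.ndarray, t: np.ndarray) -> np.ndarray:
    # source: (M, 3) scan, target: (N, 3) scan, R: (3, 3) rotation, t: (3,) translation
    aligned_source = source @ R.T + t                         # apply the estimated rigid transform
    return np.concatenate([aligned_source, target], axis=0)   # merged 3D scene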
In the prior art, conventional point cloud registration schemes typically employ optimization-based methods. The most commonly used optimization-based method is the Iterative Closest Point (ICP) algorithm (P. J. Besl and N. D. McKay, "A method for registration of 3-D shapes," in Sensor Fusion IV: Control Paradigms and Data Structures, 1992, pp. 586-606). The ICP algorithm requires that the two point clouds to be registered have a good initial position, i.e., that the two point clouds be approximately aligned. The basic idea is to select the nearest points in the two point clouds as corresponding points, solve the rotation and translation transformation matrix through all the matching point pairs, and make the error between the two point clouds smaller and smaller by continuous iteration until a preset threshold or number of iterations is reached. The disadvantage of ICP is that it easily falls into a locally optimal solution: the matching point pairs found in each iteration are only local to the point clouds, so each iteration is performed only on an overlapping portion of the two point cloud frames, and that portion may not be the true overlapping portion corresponding to the two frames. Many improvements have also been built on the ICP algorithm, such as Go-ICP (J. Yang, H. Li, D. Campbell, and Y. Jia, "Go-ICP: A globally optimal solution to 3D ICP point-set registration," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2241-2254, 2015), which adds a Gaussian probability model to the cost function of ICP, the remainder being unchanged, to reduce complexity and improve real-time performance.
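As a reference for the discussion above, the following is a minimal ICP sketch (illustrative only, not the patent's method): nearest points are taken as correspondences, the rotation and translation are solved with a closed-form SVD step, and the process iterates until the error change falls below a tolerance. It assumes roughly pre-aligned NumPy arrays and uses SciPy's k-d tree for the nearest-point search.

import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, max_iters=50, tol=1e-6):
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = source @ R.T + t
        dist, idx = tree.query(moved)            # nearest point in target as correspondence
        matched = target[idx]
        mu_s, mu_t = moved.mean(0), matched.mean(0)
        H = (moved - mu_s).T @ (matched - mu_t)  # cross-covariance of centred point sets
        U, _, Vt = np.linalg.svd(H)
        if np.linalg.det(Vt.T @ U.T) < 0:        # avoid reflections
            Vt[-1] *= -1
        R_step = Vt.T @ U.T
        t_step = mu_t - R_step @ mu_s
        R, t = R_step @ R, R_step @ t + t_step   # accumulate the incremental transform
        err = dist.mean()
        if abs(prev_err - err) < tol:            # stop when the error no longer decreases
            break
        prev_err = err
    return R, t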
However, conventional optimization-based methods have several limitations in point cloud registration: they can only register small-scale point clouds, do not perform well when facing large-scale point clouds, and are very time-consuming and inefficient. In addition, optimization-based methods easily fall into locally optimal solutions and usually constitute a fine registration process, so a coarse registration method must be used beforehand to obtain a reasonably good coarse registration result.
At present, research on point cloud registration has gradually turned to learning features through deep networks and constructing correspondences between two point clouds from the extracted features by different methods. Deep-learning-based point cloud registration roughly comprises three steps: the first step is to extract discriminative matching features of the origin point cloud and the destination point cloud with a network and then use the extracted features to construct an initial correspondence between the point clouds; the second step is to filter out wrong or low-confidence correspondences between the point clouds and retain high-quality correspondences in the overlapping region; and the third step is to perform rigid transformation estimation using the retained correspondences to obtain the registration result.
In the prior art, some methods employ encoder-decoder networks to implement the first two steps. The encoder is intended to extract discriminative features for subsequent modules and construct an initial point-wise matching map, and the decoder is intended to filter the correspondences between the point clouds using the features output by the encoder. However, using only the high-level features extracted by the encoder may result in the loss of some critical information, mainly the detailed structural information of local point cloud regions, especially the information of the point cloud overlapping region, which is critical for detecting correct correspondences. Moreover, for the task of partially overlapping point cloud registration, not all encoder features at every layer are conducive to correspondence filtering.
Some approaches use a skip connection mechanism similar to U-Net to solve this problem. However, revisiting features through a skip connection structure may introduce information that is detrimental to correspondence filtering and limit the feature learning capability of the overall network. Furthermore, a skip connection structure must follow a fixed point-feature connection order and therefore cannot be applied to the registration of unordered point clouds.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a point cloud registration method based on a jumping attention mechanism.
The technical problems to be solved by the invention are realized by the following technical scheme:
a point cloud registration method based on a jumping attention mechanism, comprising:
acquiring an origin point cloud and a destination point cloud to be registered;
respectively inputting the origin point cloud and the destination point cloud into a pre-trained encoding-decoding network, and acquiring, through the encoding-decoding network, the point correspondence matrix of the origin point cloud and the destination point cloud and confidence weights in one-to-one correspondence with initial matching point pairs; the initial matching point pairs are obtained based on the point correspondence matrix;
performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result;
Wherein the codec network comprises an encoder and a decoder;
the encoder is configured to output L levels of point cloud interaction features and L attention matrices through L cascaded encoding units based on the origin point cloud and the destination point cloud; the point correspondence matrix is obtained by point-wise multiplication of the L attention matrices;
the decoder is configured to output the confidence weights in one-to-one correspondence with the initial matching point pairs through L decoding units; the point features output by the 1st-level decoding unit are obtained, through a jumping attention mechanism and a multi-layer perceptron, based on the point cloud interaction features output by the L-th level encoding unit;
the point features output by the l-th level decoding unit (l = 2, ..., L) are obtained based on the point cloud interaction features output by the (L-l+1)-th level encoding unit and the point features output by the (l-1)-th level decoding unit; and the confidence weights in one-to-one correspondence with the initial matching point pairs are the point features output by the L-th level decoding unit.
Optionally, the encoding unit includes: a feature extraction module, a feature interaction module and an attention module;
the feature extraction module is used for respectively extracting origin point cloud features and destination point cloud features;
the attention module is used for calculating an attention matrix based on the origin point cloud features and the destination point cloud features;
and the feature interaction module is used for calculating point cloud interaction features based on the origin point cloud features and the destination point cloud features.
Optionally, the feature extraction module is:

c^l : R^(O×d_(l-1)) → R^(O×(d_l/2))

wherein c^l denotes the feature extraction module of the l-th level encoding unit, the input feature of c^l is a matrix of dimension O×d_(l-1), the output feature of c^l is a matrix of dimension O×(d_l/2), O is the number of points of the origin point cloud or the destination point cloud, and d_(l-1) is the feature dimension of the input of c^l.
Optionally, the attention module calculating an attention matrix based on the origin point cloud features and the destination point cloud features includes:

calculating the attention score between the i-th point of the origin point cloud and the j-th point of the destination point cloud as

a^l_ij = ((f^P_i)^T f^Q_j) / (||f^P_i||_2 · ||f^Q_j||_2);

and, for registration from the origin point cloud to the destination point cloud, forming the first attention matrix A^(P→Q)_l from the scores a^l_ij, or, for registration from the destination point cloud to the origin point cloud, forming the second attention matrix A^(Q→P)_l from the scores a^l_ji;

wherein f^P_i denotes the feature of the i-th point of the origin point cloud, f^Q_j denotes the feature of the j-th point of the destination point cloud, the superscript T denotes the matrix transpose, ||·||_2 denotes the two-norm operation, A^(P→Q)_l is the first attention matrix from the origin point cloud to the destination point cloud, A^(Q→P)_l is the second attention matrix from the destination point cloud to the origin point cloud, an element of A^(P→Q)_l is the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, and an element of A^(Q→P)_l is the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud.
Optionally, the feature interaction module calculating point cloud interaction features based on the origin point cloud features and the destination point cloud features includes:

calculating the origin point cloud global feature g^P_l = cat(maxpool(F^P_l), avgpool(F^P_l));

calculating the destination point cloud global feature g^Q_l = cat(maxpool(F^Q_l), avgpool(F^Q_l));

calculating, for registration from the origin point cloud to the destination point cloud, the point cloud interaction features Φ^P_l = F^P_l + α_l · I^P_l with I^P_l = F^P_l B_l;

or calculating, for registration from the destination point cloud to the origin point cloud, the point cloud interaction features Φ^Q_l = F^Q_l + β_l · I^Q_l with I^Q_l = F^Q_l B_l^T;

wherein F^P_l denotes the origin point cloud features, F^Q_l denotes the destination point cloud features, maxpool(·) denotes maximum pooling, avgpool(·) denotes average pooling, cat(·) denotes splicing, h_l denotes the multi-layer perceptron in the encoding unit used to obtain the interaction matrix B_l from the spliced global features, the superscript T denotes the matrix transpose, g^P_l denotes the origin point cloud global feature, g^Q_l denotes the destination point cloud global feature, I^P_l denotes the origin point cloud global interaction embedding, I^Q_l denotes the destination point cloud global interaction embedding, and α_l and β_l are learnable parameters.
Optionally, the method for obtaining the initial matching point pair based on the point correspondence matrix includes:
and obtaining the initial matching point pair by adopting a sparse mapping method or a soft mapping method based on the point corresponding matrix.
Optionally, the decoding unit is specifically configured to:
searching, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs with similar local structures in the origin point cloud and the destination point cloud by using a jumping attention mechanism, so as to output point features from which low-confidence matching point pairs have been filtered.
Optionally, the jumping attention mechanism includes: a learnable jumping attention mechanism or a jumping attention mechanism based on cosine similarity.
Optionally, the decoding unit searches, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs with similar local structures in the origin point cloud and the destination point cloud by using the learnable jumping attention mechanism, so as to output point features from which low-confidence matching point pairs have been filtered, through the following formulas:

b_ji = softmax_i((ω_h Ψ^(l-1)_j)^T (ω_t Φ^(L-l+1)_i)),
Ψ^l_j = m^l_θ(Ψ^(l-1)_j + Σ_{i=1..M} b_ji · (ω_v Φ^(L-l+1)_i));

or,

b_ij = softmax_j((ω_h Ψ^(l-1)_i)^T (ω_t Φ^(L-l+1)_j)),
Ψ^l_i = m^l_φ(Ψ^(l-1)_i + Σ_{j=1..M} b_ij · (ω_v Φ^(L-l+1)_j));

wherein Φ^(L-l+1) is the point cloud interaction feature from the (L-l+1)-th level encoding unit that is input to the l-th level decoding unit, Ψ^(l-1) is the point feature input to the l-th level decoding unit and output by the (l-1)-th level decoding unit, P corresponds to the origin point cloud and Q corresponds to the destination point cloud; m^l_θ and m^l_φ both denote multi-layer perceptrons with learnable parameters, b_ji denotes the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, b_ij denotes the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud, M is the number of points of the origin point cloud, N is the number of points of the destination point cloud, and ω_h, ω_t and ω_v are learnable parameters of the multi-layer perceptron.
Optionally, performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result includes:

solving the rigid transformation by using a preset rigid transformation estimation formula based on the initial matching point pairs and the confidence weights, so as to obtain the point cloud registration result;

the rigid transformation estimation formula includes:

(R_est, t_est) = argmin_(R,t) Σ_{i=1..M} Π(w_{p_i}) · ||R p_i + t − q̂_{p_i}||_2^2;

or,

(R_est, t_est) = argmin_(R,t) Σ_{j=1..N} Π(w_{q_j}) · ||R q_j + t − p̂_{q_j}||_2^2;

wherein (R, t) denotes the rigid transformation; q̂_{p_i} denotes the point in the destination point cloud having an initial correspondence with the i-th point p_i of the origin point cloud, and w_{p_i} is the confidence weight corresponding to that point; p̂_{q_j} denotes the point in the origin point cloud having an initial correspondence with the j-th point q_j of the destination point cloud, and w_{q_j} is the confidence weight corresponding to that point; M is the number of points of the origin point cloud, and N is the number of points of the destination point cloud; ||·||_2 denotes the two-norm operation; Π denotes the continuous multiplication operation; R_est and t_est are, respectively, the rotation and translation in Euclidean space of the solved rigid transformation.
In the point cloud registration method based on the jumping attention mechanism, the extraction of the initial matching points is realized by using the coding and decoding network, and the confidence weight of the initial matching point pairs is obtained. The codec network includes an encoder and a decoder; l-level point cloud interaction characteristics are respectively output through L cascaded coding units in the encoder, so that global information is shared, interaction between the origin point cloud and the destination point cloud is comprehensively learned, and the encoder can construct better matching characteristics by utilizing low-level geometric information and high-level context sensing information of the point cloud, so that an original point-by-point matching diagram is remarkably enhanced. The decoder bridges the point cloud interaction characteristics in the encoder through the jumping attention structure so as to output confidence scores of the matching point pairs, and therefore the initial matching point pairs are filtered; connecting local point cloud interaction region characteristics in the encoder and point characteristics in the decoder through a jumping attention mechanism between the encoder and the decoder, so that the local region information and global interaction information of each layer of the encoder are fused into the decoder, and the decoder is guided to search for correct matching point pairs with similar local structures; a decoder incorporating a skip attention mechanism can fully exploit and preserve the structural details of the local region, which enables it to efficiently extract high quality correspondence within the overlapping region. Therefore, the invention solves the problem that a large amount of redundant information is easily introduced by adopting a jump connection mechanism in the existing three-dimensional point cloud registration method based on deep learning so as to limit the learning capacity of the whole network, provides attention selection for revisiting the characteristics of the registration network under different resolutions, enables the network to selectively combine the characteristics of expected geometric information codes, and avoids the problem of information redundancy, so that the invention can well complete the point cloud registration task and can well improve the completion efficiency of point cloud registration.
In addition, the jump attention mechanism has no pre-requirement on the sequence of the input features, so the invention can be popularized to unordered point clouds.
The present invention will be described in further detail with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart of a point cloud registration method based on a jumping attention mechanism provided by an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a codec network used in performing point cloud registration in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a feature extraction module FE in the network shown in fig. 2;
FIG. 4 is a comparison of quantitative experimental results between an embodiment of the present invention and several other existing methods on the 3DMatch dataset;
FIG. 5 shows visual registration results of an embodiment of the present invention and several other existing methods on the 3DMatch dataset;
FIG. 6 is a comparison of quantitative experimental results between an embodiment of the present invention and several other existing methods on the KITTI dataset;
FIG. 7 shows visual registration results of an embodiment of the present invention on the KITTI dataset;
FIG. 8 is a comparison of quantitative experimental results between an embodiment of the present invention and several other existing methods on the ModelNet40 dataset;
FIG. 9 shows visual registration results of an embodiment of the present invention on the ModelNet40 dataset.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
In order to solve the problem that a large amount of redundant information is easily introduced by adopting a jump connection mechanism in the existing three-dimensional point cloud registration method based on deep learning so as to limit the learning capacity of the whole network, the embodiment of the invention provides a point cloud registration method based on a jump attention mechanism. Referring to fig. 1, the method comprises the steps of:
s10: and acquiring an origin point cloud and a destination point cloud to be registered.
Here, the origin point cloud is expressed as P = {p_i ∈ R^3 | i = 1, ..., M}, where M is the number of points of the origin point cloud; the destination point cloud is expressed as Q = {q_j ∈ R^3 | j = 1, ..., N}, where N is the number of points of the destination point cloud.
S20: respectively inputting an origin cloud and a destination point cloud into a pre-trained encoding and decoding network, and acquiring point corresponding matrixes of the origin cloud and the destination point cloud and confidence degree weights corresponding to the initial matching point pairs one by one through the encoding and decoding network; the initial matching point pairs are obtained based on the point correspondence matrix.
The structure of the codec network is shown in fig. 2, and includes an Encoder and a Decoder.
An encoder, configured to output, based on the origin point cloud and the destination point cloud, L levels of point cloud interaction features (encoder features) Φ^l and L attention matrices A^(P→Q)_l or A^(Q→P)_l through L cascaded encoding units. Here, A^(P→Q)_l denotes the attention matrix when registering from the origin point cloud to the destination point cloud, A^(Q→P)_l denotes the attention matrix when registering from the destination point cloud to the origin point cloud, and l = 1, ..., L. The above point correspondence matrix is obtained by point-wise multiplication of the L attention matrices and is denoted C^(P→Q) = A^(P→Q)_1 ⊙ ... ⊙ A^(P→Q)_L or C^(Q→P) = A^(Q→P)_1 ⊙ ... ⊙ A^(Q→P)_L, where ⊙ denotes point-wise multiplication.
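As an illustration of the point-wise product described above, the following PyTorch sketch (names and shapes are assumptions, not the patent's code) accumulates the L per-level attention matrices into the point correspondence matrix.

import torch

def correspondence_matrix(attention_mats):
    # attention_mats: list of L tensors of shape (M, N), one per encoding unit
    corr = torch.ones_like(attention_mats[0])
    for A in attention_mats:
        corr = corr * A          # point-wise (Hadamard) product accumulates agreement
    return corr                  # (M, N) point correspondence matrix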
Wherein the encoding unit includes: the feature extraction module FE, the feature interaction module FI and the attention module (not shown in fig. 2).
The feature extraction module FE in the encoding unit is used for respectively extracting origin point cloud features and destination point cloud features. Specifically, the two point clouds share the same feature extraction module: inputting the origin point cloud into the feature extraction module FE yields the origin point cloud features, and inputting the destination point cloud into the feature extraction module FE yields the destination point cloud features.
The feature extraction module is expressed as:

c^l : R^(O×d_(l-1)) → R^(O×(d_l/2))

wherein c^l denotes the feature extraction module of the l-th level encoding unit, the input feature of c^l is a matrix of dimension O×d_(l-1), the output feature of c^l is a matrix of dimension O×(d_l/2), O is the number of points of the origin point cloud or the destination point cloud, and d_(l-1) is the feature dimension of the input of c^l. It can be seen that the entire encoder extracts interaction features at L different resolution levels.
For example, the feature extraction module may comprise three residual blocks, each residual block comprising two fkcondv layers, three IN (InstanceNorm) layers, two ReLU layers and one convolutional layer (Conv1d), as shown in fig. 3, wherein Addition denotes an element-wise addition operation.
Through the feature extraction module, the origin point cloud P and the destination point cloud Q can be embedded into a common feature space, which facilitates building the point correspondence matrix between the two point clouds.
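A hedged PyTorch sketch of one such residual block follows; the exact layer composition and widths are assumptions (plain point-wise Conv1d layers are used here), so it only illustrates the Conv1d + InstanceNorm + ReLU structure with the Addition shortcut described above.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(in_dim, out_dim, kernel_size=1),
            nn.InstanceNorm1d(out_dim),
            nn.ReLU(inplace=True),
            nn.Conv1d(out_dim, out_dim, kernel_size=1),
            nn.InstanceNorm1d(out_dim),
        )
        # 1x1 convolution on the shortcut so the element-wise Addition matches shapes
        self.shortcut = nn.Conv1d(in_dim, out_dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim, num_points) per-point features
        return torch.relu(self.body(x) + self.shortcut(x))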
In the related art, predicting correspondences between point clouds using features that contain only local information is prone to false matches, especially when outliers are present. Local features contain neither the structural information of the larger-scale point cloud nor the association information between the point clouds, and thus cannot provide discriminative features for the subsequent matching process to resolve ambiguity.
In order to solve the problem, a feature interaction module is introduced in the embodiment of the invention to share global information and comprehensively learn interaction between a source point cloud and a destination point cloud. Specifically, the feature interaction module is used for calculating point cloud interaction features based on the origin cloud features and the destination point cloud features; the specific calculation mode is as follows:
(1) Calculate the origin point cloud global feature g^P_l = cat(maxpool(F^P_l), avgpool(F^P_l));

(2) calculate the destination point cloud global feature g^Q_l = cat(maxpool(F^Q_l), avgpool(F^Q_l));

(3) in the case of registration from the origin point cloud to the destination point cloud, calculate the origin point cloud global interaction embedding I^P_l = F^P_l B_l and the point cloud interaction features Φ^P_l = F^P_l + α_l · I^P_l;

(3') in the case of registration from the destination point cloud to the origin point cloud, calculate the destination point cloud global interaction embedding I^Q_l = F^Q_l B_l^T and the point cloud interaction features Φ^Q_l = F^Q_l + β_l · I^Q_l;

wherein F^P_l denotes the origin point cloud features extracted by the feature extraction module, F^Q_l denotes the destination point cloud features extracted by the feature extraction module, maxpool(·) denotes maximum pooling, avgpool(·) denotes average pooling, cat(·) denotes splicing, h_l denotes the multi-layer perceptron in the encoding unit, g^P_l denotes the origin point cloud global feature, g^Q_l denotes the destination point cloud global feature, B_l denotes the interaction matrix, the superscript T denotes the matrix transpose, I^P_l denotes the origin point cloud global interaction embedding, I^Q_l denotes the destination point cloud global interaction embedding, and α_l and β_l are learnable parameters.

In this calculation, pooling operations are first applied to F^P_l and F^Q_l to obtain the origin point cloud global feature g^P_l and the destination point cloud global feature g^Q_l; the two are then spliced and refined by the multi-layer perceptron h_l shared by the origin point cloud and the destination point cloud. Next, the interaction matrix B_l is constructed, each element of which explicitly models a possible interaction between the origin point cloud global feature g^P_l and the destination point cloud global feature g^Q_l. To project the information contained in the interaction matrix into the feature of each point, F^P_l is multiplied by the interaction matrix B_l and F^Q_l is multiplied by B_l^T, giving the origin point cloud global interaction embedding I^P_l and the destination point cloud global interaction embedding I^Q_l. Finally, I^P_l and I^Q_l are taken as residual terms and connected to the original origin point cloud features and destination point cloud features through the learnable parameters α_l and β_l, yielding the point cloud interaction features.
It can be understood that, compared with performing no information interaction between the origin point cloud and the destination point cloud, or interacting only at the deepest layer of the encoder, introducing feature interaction modules at different resolution levels into each encoding unit, as in the embodiment of the present invention, correlates the information of the origin point cloud and the destination point cloud well, makes the features of the two point clouds interdependent, and gives the extracted features more discriminative power and task relevance.
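The following PyTorch sketch illustrates one possible reading of the feature interaction module described above; the sizes, the form of the interaction matrix and the pooling details are assumptions rather than the patent's implementation.

import torch
import torch.nn as nn

class FeatureInteraction(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # shared MLP refining the spliced global features into a dim x dim interaction matrix
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, dim * dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim * dim, dim * dim),
        )
        self.alpha = nn.Parameter(torch.zeros(1))   # learnable residual scale for P
        self.beta = nn.Parameter(torch.zeros(1))    # learnable residual scale for Q

    def forward(self, feat_p: torch.Tensor, feat_q: torch.Tensor):
        # feat_p: (M, dim) origin point cloud features, feat_q: (N, dim) destination features
        g_p = torch.cat([feat_p.max(0).values, feat_p.mean(0)])        # (2*dim,) global feature of P
        g_q = torch.cat([feat_q.max(0).values, feat_q.mean(0)])        # (2*dim,) global feature of Q
        B = self.mlp(torch.cat([g_p, g_q])).view(feat_p.shape[1], -1)  # (dim, dim) interaction matrix
        inter_p = feat_p @ B        # origin point cloud global interaction embedding
        inter_q = feat_q @ B.T      # destination point cloud global interaction embedding
        return feat_p + self.alpha * inter_p, feat_q + self.beta * inter_q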
The attention module in the encoding unit is used for calculating an attention matrix based on the origin point cloud features and the destination point cloud features. Specifically, the attention score between the i-th point of the origin point cloud and the j-th point of the destination point cloud is first calculated as

a^l_ij = ((f^P_i)^T f^Q_j) / (||f^P_i||_2 · ||f^Q_j||_2).

Then, in the case of registration from the origin point cloud to the destination point cloud, the attention matrix A^(P→Q)_l is formed from the scores a^l_ij; in the case of registration from the destination point cloud to the origin point cloud, the attention matrix A^(Q→P)_l is formed from the scores a^l_ji;

wherein f^P_i denotes the feature of the i-th point of the origin point cloud, f^Q_j denotes the feature of the j-th point of the destination point cloud, the superscript T denotes the matrix transpose, ||·||_2 denotes the two-norm operation, A^(P→Q)_l is the first attention matrix from the origin point cloud to the destination point cloud, A^(Q→P)_l is the second attention matrix from the destination point cloud to the origin point cloud, an element of A^(P→Q)_l is the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, and an element of A^(Q→P)_l is the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud.
In addition, after the point correspondence matrix C^(P→Q) or C^(Q→P) has been calculated, the method for obtaining the initial matching point pairs based on the point correspondence matrix is: obtaining the initial matching point pairs by adopting a sparse mapping method or a soft mapping method based on the point correspondence matrix.

For example, based on the point correspondence matrix C^(P→Q), the sparse mapping method takes, for the i-th point p_i of the origin point cloud, the destination point cloud point with the highest correspondence score as its corresponding point:

q̂_{p_i} = q_{j*}, with j* = argmax_j C^(P→Q)_ij;

here, q̂_{p_i} denotes the point of the destination point cloud having an initial correspondence with the i-th point p_i of the origin point cloud.

Alternatively, based on the point correspondence matrix C^(P→Q), the soft mapping method takes the correspondence-weighted combination of the destination point cloud points:

q̂_{p_i} = Σ_j (C^(P→Q)_ij / Σ_k C^(P→Q)_ik) · q_j.

Thus, for the case of registration from the origin point cloud to the destination point cloud, the resulting initial matching point pairs are expressed as (p_i, q̂_{p_i}), together with their corresponding point cloud interaction features output at the end of the encoder; for the case of registration from the destination point cloud to the origin point cloud, the resulting initial matching point pairs are expressed as (q_j, p̂_{q_j}), together with their corresponding point cloud interaction features output at the end of the encoder.
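The following PyTorch sketch (illustrative; names and the normalization used for the soft mapping are assumptions) shows a cosine-similarity attention matrix between the two feature sets and the two ways of deriving initial corresponding points mentioned above.

import torch
import torch.nn.functional as F

def attention_matrix(feat_p, feat_q):
    # feat_p: (M, d), feat_q: (N, d) per-point features of the two clouds
    return F.normalize(feat_p, dim=1) @ F.normalize(feat_q, dim=1).T   # (M, N) cosine scores

def sparse_mapping(corr, target_pts):
    # for each origin point, pick the destination point with the highest score
    return target_pts[corr.argmax(dim=1)]                              # (M, 3)

def soft_mapping(corr, target_pts):
    # for each origin point, take a score-weighted combination of destination points
    weights = corr / corr.sum(dim=1, keepdim=True).clamp(min=1e-12)
    return weights @ target_pts                                        # (M, 3)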
A decoder, configured to output the confidence weights in one-to-one correspondence with the initial matching point pairs through L decoding units. The point features (decoder features) output by the 1st-level decoding unit are obtained, through a jumping attention mechanism and a multi-layer perceptron, based on the point cloud interaction features output by the L-th level encoding unit. The point features output by the l-th level decoding unit (l = 2, ..., L) are obtained based on the point cloud interaction features output by the (L-l+1)-th level encoding unit and the point features output by the (l-1)-th level decoding unit. The confidence weights in one-to-one correspondence with the initial matching point pairs are the point features output by the L-th level decoding unit.

Here, the decoder bridges the point cloud interaction features in the encoder through the skip attention mechanism to output the confidence scores of the initial matching point pairs, thereby filtering low-confidence initial matching point pairs. It will be appreciated that, when processing partially overlapping point cloud registration tasks, only subsets of P and Q match each other, so many incorrect matching pairs need to be filtered out. Since the encoder extracts point cloud interaction features at different resolution levels, it is natural for the decoder to generate point features in the same way; the decoder in the embodiment of the present invention therefore comprises a plurality of decoding units in one-to-one correspondence with the encoding units of the encoder but at the opposite resolution levels, i.e., the l-th level encoding unit of the encoder corresponds to the (L-l+1)-th level decoding unit of the decoder. Through the jumping attention mechanism, the point cloud interaction features extracted by the encoder establish layer-to-layer connections with the point features generated in the decoder, so that the confidence score w_{p_i} or w_{q_j} of each matching point pair (p_i, q̂_{p_i}) or (q_j, p̂_{q_j}) is output at the end of the decoder.
Referring to fig. 2, each level of decoding unit of the decoder includes a skip attention structure SA to convey the point cloud interaction features from the corresponding level of encoding unit, and a multi-layer perceptron block (denoted by M) to reduce the feature dimension. Based on such a structure, the decoding unit is specifically configured to: search, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs with similar local structures in the origin point cloud and the destination point cloud by using a jumping attention mechanism, thereby outputting point features from which low-confidence matching point pairs have been filtered.
The skip attention mechanism refers to an attention-based feature pipeline, and the skip attention mechanism in the decoding unit may be a learnable skip attention mechanism or a skip attention mechanism based on cosine similarity. The smoothed learnable attention can retain more information from the original point features, while the non-smooth cosine attention introduces more information from the preceding encoder network and can establish a strong link between the point features in the decoder and the interaction features in the encoder.
The decoding unit searches, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs with similar local structures in the origin point cloud and the destination point cloud by using the learnable jumping attention mechanism, so as to output point features from which low-confidence matching point pairs have been filtered, through the following formulas:

b_ji = softmax_i((ω_h Ψ^(l-1)_j)^T (ω_t Φ^(L-l+1)_i)),
Ψ^l_j = m^l_θ(Ψ^(l-1)_j + Σ_{i=1..M} b_ji · (ω_v Φ^(L-l+1)_i)).

These formulas correspond to the case of registration from the origin point cloud to the destination point cloud, in which the point cloud interaction features Φ^(L-l+1) in the encoder are fused by element-wise addition into the point features Ψ^(l-1).
For the case of registration from the destination point cloud to the origin point cloud, the calculation formulas of the decoding unit are:

b_ij = softmax_j((ω_h Ψ^(l-1)_i)^T (ω_t Φ^(L-l+1)_j)),
Ψ^l_i = m^l_φ(Ψ^(l-1)_i + Σ_{j=1..M} b_ij · (ω_v Φ^(L-l+1)_j));

wherein Φ^(L-l+1) is the point cloud interaction feature from the (L-l+1)-th level encoding unit that is input to the l-th level decoding unit, Ψ^(l-1) is the point feature input to the l-th level decoding unit and output by the (l-1)-th level decoding unit, P corresponds to the origin point cloud and Q corresponds to the destination point cloud; m^l_θ and m^l_φ both denote multi-layer perceptrons with learnable parameters, b_ji denotes the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, b_ij denotes the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud, M is the number of points of the origin point cloud, N is the number of points of the destination point cloud, and ω_h, ω_t and ω_v are learnable parameters of the multi-layer perceptron.
In addition, the attention score under the jumping attention mechanism based on cosine similarity is calculated as:

b_ij = ((Ψ^(l-1)_i)^T Φ^(L-l+1)_j) / (||Ψ^(l-1)_i||_2 · ||Φ^(L-l+1)_j||_2);

this corresponds to the case of registration from the destination point cloud to the origin point cloud; the formula for the case of registration from the origin point cloud to the destination point cloud is analogous and will not be described in detail.
It will be appreciated that the skip attention structure SA acts as a pipeline connecting the global point cloud interaction features extracted by the encoder with the point features generated by the decoder, and it can fuse the local region information and global interaction information of each layer of the encoder into the decoder, thereby guiding the decoder to search for correct matching point pairs with similar local structures. The semantic relationship between features in the decoder and the encoder is measured by the attention score: a higher score indicates more significant pattern similarity. The attention-weighted point cloud interaction features are then fused with the point features output by the previous-stage decoding unit, so that fused point features are output and, finally, correct matching point pairs in the overlapping region are predicted.
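The following PyTorch sketch illustrates one decoding unit with a skip attention pipeline in the cosine-similarity flavour described above; names, shapes and the fusion details are assumptions rather than the patent's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipAttentionDecodeUnit(nn.Module):
    def __init__(self, enc_dim: int, dec_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(enc_dim, dec_dim)   # align encoder feature width with decoder
        self.mlp = nn.Sequential(nn.Linear(dec_dim, out_dim), nn.ReLU(inplace=True))

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        # enc_feat: (N, enc_dim) interaction features from the paired encoding unit
        # dec_feat: (M, dec_dim) point features from the previous-stage decoding unit
        enc_proj = self.proj(enc_feat)                                           # (N, dec_dim)
        scores = F.normalize(dec_feat, dim=1) @ F.normalize(enc_proj, dim=1).T   # cosine scores
        attn = torch.softmax(scores, dim=1)                                      # (M, N)
        fused = dec_feat + attn @ enc_proj    # element-wise addition fuses attended encoder info
        return self.mlp(fused)                                                   # (M, out_dim)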
S30: and performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result.
Specifically, based on the initial matching point pairs and the confidence coefficient weights, a preset rigid transformation estimation formula is utilized to solve rigid transformation, and a point cloud registration result is obtained.
Wherein, for the case of registration from the origin point cloud to the destination point cloud, the rigid transformation estimation formula is:

(R_est, t_est) = argmin_(R,t) Σ_{i=1..M} Π(w_{p_i}) · ||R p_i + t − q̂_{p_i}||_2^2;

for the case of registration from the destination point cloud to the origin point cloud, the rigid transformation estimation formula is:

(R_est, t_est) = argmin_(R,t) Σ_{j=1..N} Π(w_{q_j}) · ||R q_j + t − p̂_{q_j}||_2^2;

wherein (R, t) denotes the rigid transformation; q̂_{p_i} denotes the point in the destination point cloud having an initial correspondence with the i-th point p_i of the origin point cloud, and w_{p_i} is the confidence weight corresponding to that point; p̂_{q_j} denotes the point in the origin point cloud having an initial correspondence with the j-th point q_j of the destination point cloud, and w_{q_j} is the confidence weight corresponding to that point; M is the number of points of the origin point cloud, and N is the number of points of the destination point cloud; ||·||_2 denotes the two-norm operation; Π denotes the continuous multiplication operation, which realizes a hard elimination process, namely assigning 0 weight to the part of the correspondences with low confidence scores; R_est and t_est are, respectively, the rotation and translation in Euclidean space of the solved rigid transformation, obtained by singular value decomposition.
It will be appreciated that the decoder extracts high-quality correspondences by providing a corresponding weight for each matching point pair, so that the high-confidence matching point pairs retained after low-confidence filtering are used to estimate the final rigid transformation. Compared with the original singular value decomposition (SVD) method using equal weights, weighted Procrustes can scale to dense correspondence sets by passing gradients through the weights rather than through the positions, and can assign 0 weight to the part of the correspondences with low confidence scores.
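A hedged NumPy sketch of the weighted rigid-transform estimation (weighted Procrustes with SVD) described above follows; the hard-threshold handling and the names are assumptions used only for illustration.

import numpy as np

def weighted_procrustes(src, dst, w, hard_threshold=None):
    # src, dst: (K, 3) matched points; w: (K,) confidence weights from the decoder
    if hard_threshold is not None:
        w = np.where(w >= hard_threshold, w, 0.0)   # hard elimination of low-confidence pairs
    w = w / (w.sum() + 1e-12)
    mu_s, mu_d = w @ src, w @ dst                   # weighted centroids
    H = (src - mu_s).T @ np.diag(w) @ (dst - mu_d)  # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                              # rotation without reflection
    t = mu_d - R @ mu_s
    return R, t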
In the point cloud registration method based on the jumping attention mechanism, provided by the embodiment of the invention, the extraction of the initial matching points is realized by using the coding and decoding network, and the confidence weight of the initial matching point pair is obtained. The codec network includes an encoder and a decoder; l-level point cloud interaction characteristics are respectively output through L cascaded coding units in the encoder, so that global information is shared, interaction between the origin point cloud and the destination point cloud is comprehensively learned, and the encoder can construct better matching characteristics by utilizing low-level geometric information and high-level context sensing information of the point cloud, so that an original point-by-point matching diagram is remarkably enhanced. The decoder bridges the point cloud interaction characteristics in the encoder through the jumping attention structure so as to output confidence scores of the matching point pairs, and therefore the initial matching point pairs are filtered; connecting local point cloud interaction region characteristics in the encoder and point characteristics in the decoder through a jumping attention mechanism between the encoder and the decoder, so that the local region information and global interaction information of each layer of the encoder are fused into the decoder, and the decoder is guided to search for correct matching point pairs with similar local structures; a decoder incorporating a skip attention mechanism can fully exploit and preserve the structural details of the local region, which enables it to efficiently extract high quality correspondence within the overlapping region. Therefore, the embodiment of the invention solves the problem that a large amount of redundant information is easily introduced by adopting a jump connection mechanism in the existing three-dimensional point cloud registration method based on deep learning so as to limit the learning capacity of the whole network, provides attention selection for revisiting the characteristics of the registration network under different resolutions, enables the network to selectively combine the characteristics of expected geometric information codes, and avoids the problem of information redundancy, so that the invention can well complete the point cloud registration task and can well improve the completion efficiency of point cloud registration.
The following describes the training procedure of the codec network used in the implementation of the present invention.
With respect to the construction of the dataset, since training on all points in every forward and backward pass is inefficient and unnecessary, embodiments of the present invention use random sampling to extract points from the origin point cloud and the destination point cloud for training; P denotes the sample set of the origin point cloud, Q denotes the sample set of the destination point cloud, p_i denotes the i-th point in P, and q_j denotes the j-th point in Q.
Regarding the model loss, the encoder is trained under the supervision of a loss L_enc, and the decoder is trained under the supervision of a loss L_dec; the total loss is the sum of the two losses:

L = L_enc + L_dec,

wherein L_enc is a standard cross-entropy loss used to supervise the attention matrices A^(P→Q)_l and A^(Q→P)_l. L_enc is used to ensure that a good correspondence can be established between P and Q: for the i-th point of P, the target is the index j* of the point in Q closest to it under the ground-truth transformation, and a pair is regarded as a correspondence only when the distance between the two points does not exceed a threshold ε > 0 that controls the minimum radius at which two points are considered corresponding points; the point of the destination point cloud with the highest probability of corresponding to p_i is the one found from the correspondence matrix and is denoted q̂_{p_i}. The loss term for the other registration direction is defined in the same manner and is not repeated here.

L_dec is the correspondence filtering loss of the decoder, which is determined by measuring the distance between an input point and the corresponding point found for it. In practice, this loss assigns a positive label 1 to a point whose correspondence is found correctly and a negative label 0 to a point whose correspondence is not found. As training proceeds, those correspondences that are likely to be correct matches obtain higher confidence scores.
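As an illustration of the decoder's filtering loss described above, the following sketch assigns label 1 to points whose found correspondence lies within a radius of the ground-truth correspondence and label 0 otherwise, and supervises the confidence weights with binary cross-entropy; the use of BCE and the radius value are assumptions, not taken from the patent.

import torch
import torch.nn.functional as F

def filtering_loss(pred_corr, gt_corr, confidence, radius=0.12):
    # pred_corr, gt_corr: (M, 3) found / ground-truth corresponding points
    # confidence: (M,) confidence weights output by the decoder
    labels = (torch.norm(pred_corr - gt_corr, dim=1) < radius).float()   # 1 = correct match
    return F.binary_cross_entropy(confidence.clamp(1e-6, 1 - 1e-6), labels)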
the effectiveness and superiority of the embodiments of the present invention are illustrated by experimental data and results.
In order to verify the effectiveness and superiority of the encoding-decoding network in point cloud registration, comparison experiments were conducted on three datasets: the 3DMatch dataset, the KITTI dataset and the ModelNet40 dataset. The network was trained with PyTorch and optimized with the AdamW optimizer, with a weight decay of 0.001 and a learning rate of 0.001. The network was trained for 100 epochs on 3DMatch and KITTI and for 20 epochs on ModelNet40. All experimental results were obtained using the model trained in the last epoch. Due to memory limitations, the batch size used in all experiments was 1. All experiments were performed on a single RTX 3090 Ti graphics card.
The measurement indexes comprise: Rotation Error (RE), Translation Error (TE), and Registration Recall (RR). RR refers to the proportion of successfully aligned point cloud pairs, that is, the proportion of registered point cloud pairs whose rotation error and translation error are both smaller than predefined thresholds among all point cloud pairs; TE is evaluated by the mean square error; RE is defined as:

RE = arccos((tr(R_gt^T R_est) − 1) / 2)

wherein R_gt denotes the ground-truth rotation matrix, R_est denotes the predicted rotation matrix, and tr(·) denotes the trace of a matrix.
In addition, the average rotation error RE_all and translation error TE_all of all registration pairs, as well as the average rotation error RE_suc and translation error TE_suc of the successfully registered point cloud pairs, were recorded during the experiments; for ModelNet40, the mean absolute error (MAE) and root mean square error (RMSE) between the ground-truth and predicted transformations were also measured. The registration recall was computed on the 3DMatch dataset using 0.3 m and 15° as thresholds, and on the KITTI dataset using 0.6 m and 5° as thresholds with the confidence-score hard threshold set to 0.6.
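The following sketch computes these metrics for a set of registration pairs; the threshold values shown correspond to the 3DMatch setting mentioned above and are otherwise illustrative assumptions.

import numpy as np

def rotation_error_deg(R_gt, R_est):
    cos_val = np.clip((np.trace(R_gt.T @ R_est) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_val))          # RE in degrees

def translation_error(t_gt, t_est):
    return np.linalg.norm(t_gt - t_est)            # TE

def registration_recall(errors, re_thresh_deg=15.0, te_thresh_m=0.3):
    # errors: list of (RE in degrees, TE in metres) per registration pair
    ok = [re < re_thresh_deg and te < te_thresh_m for re, te in errors]
    return sum(ok) / max(len(ok), 1)               # RR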
Experiment 1: the 3DMatch dataset was used as an indoor scene assessment dataset to generate datasets for training and testing. 3DMatch is a set of 62 real world scenes, 46 scenes are used for training in the experimental process, 8 scenes are verified, 8 scenes are tested, and the point cloud overlapping in all scenes is greater than 30%.
Since each point cloud in 3DMatch may have a different number of points, M = N = 4096 points were randomly sampled from a voxelized point cloud with a voxel size of 5 cm. The network was trained with L set to 6 layers in both the encoder and the decoder and with the inlier threshold λ set to 0.12. During training, data augmentation was performed by applying a random rotation around a random axis within [-180°, 180°] and a random scaling from 0.8 to 1.2.
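The following sketch shows such an augmentation step (a random rotation about a random axis within [-180°, 180°] plus random scaling in [0.8, 1.2]); it uses SciPy's rotation utilities and is only an illustrative reading of the setting above.

import numpy as np
from scipy.spatial.transform import Rotation

def augment(points: np.ndarray) -> np.ndarray:
    # points: (N, 3) point cloud
    axis = np.random.randn(3)
    axis /= np.linalg.norm(axis)                     # random rotation axis
    angle = np.random.uniform(-np.pi, np.pi)         # [-180°, 180°]
    R = Rotation.from_rotvec(angle * axis).as_matrix()
    scale = np.random.uniform(0.8, 1.2)              # random scaling
    return scale * (points @ R.T)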
The embodiment of the present invention was compared with existing methods such as DGR (Deep Global Registration), PCAM (Product of Cross-Attention Matrices) and OMNet, all of which avoid RANSAC (Random Sample Consensus). For a fair comparison, DGR and PCAM use the same post-processing steps step by step, including filtering correspondences with low correspondence scores, employing optimization measures for fine-grained pose estimation, and using protection measures to preserve a fixed number of hypothetical correspondences. OMNet was trained on 3DMatch for 2000 epochs using 1024 random points and a batch size of 32.
Quantitative experimental results are shown in fig. 4, in which SACF-Net-Soft and SACF-Net-Sparse denote the embodiment of the present invention using the soft mapping and sparse mapping methods, respectively; both use the learnable jumping attention mechanism. Here ζ denotes the hard threshold on the confidence score, opt denotes the fine-grained pose adjustment in DGR, and saf denotes the registration safeguard measure in DGR.
As can be seen from fig. 4, the best results were obtained with SACF-Net-Sparse of the present invention, whose registration recall is 4.7 percentage points higher than that of the nearest competitor. In addition, SACF-Net-Sparse of the present invention achieves the smallest rotation and translation errors over all registration pairs and over the successfully registered pairs.
The visual registration results and the correspondence results between the point clouds are shown in fig. 5, where the top 512 correspondences with the highest confidence scores are visualized: (a) shows the overlapping, unregistered input point clouds (blue and yellow); (b) is the registration result under the ground-truth transformation; (c) is the FCGF (Fully-Convolutional Geometric Features) registration result; (d) is the DGR registration result; (e) is the PCAM registration result; (f) is the SACF-Net-Sparse registration result of the present invention; and (g) shows the top 256 matching point pairs with the highest confidence screened out by the method of the embodiment of the present invention.
By contrast, the embodiment of the present invention can automatically detect more discriminative correspondences in the overlapping region (such as the table, chair and sofa in the scene). The experimental results show that the jumping-attention-based mechanism can selectively transfer the information of the point cloud overlapping region from the encoder to the decoder, so that the decoder can screen out correct and discriminative matching point pairs.
Experiment 2: the KITTI dataset was used as an outdoor assessment dataset containing 21 outdoor scenes, where only the first 11 scenes recorded a rigid transformation of ground truth. Scenes 0 to 5 were used for training, scenes 6 to 7 were verified, and scenes 8 to 10 were tested during the experiment. A GPS-IMU is used to create a scan point cloud pair at least 10 meters apart and the true conversion is calculated by GPS and ICP. In the training process, the point clouds of all experiments were downsampled, the voxel size was 0.3 meters, the interior point threshold was set to 0.6, and m=n=2048 points were randomly sampled.
Embodiments of the present invention were compared with other algorithms, including FGR, FPFH, FCGF, DGR and PCAM. For a better comparison with PCAM and DGR, results were also reported using the same ICP refinement operation as theirs. The quantitative comparison results are shown in fig. 6. It can be seen that SACF-Net-Sparse and SACF-Net-Soft of the embodiments of the present invention outperform DGR and PCAM on all metrics without any additional operations. When combined with ICP, SACF-Net-Soft gives better results than the other methods on essentially all metrics. Like PCAM, the embodiments of the present invention achieve better results than DGR without using a hard threshold on the confidence scores or fine-grained pose adjustment.
The visualization of an embodiment of the present invention on the registration and correspondence filtering tasks on the KITTI dataset is shown in fig. 7, where (a) is the overlapped, unregistered input point clouds; (b) is the SACF-Net-Soft registration result of the embodiment of the present invention; and (c) is the top 128 matching point pairs with the highest confidence screened out by the embodiment of the present invention. The results show that the embodiment of the present invention detects more discriminative corresponding points in the overlapping area around the camera, thereby obtaining excellent registration results.
Experiment 3: to evaluate the composite object, a ModelNet40 dataset containing composite CAD models was used, containing 12311 CAD models of 40 categories. On this dataset, visible category, invisible category and noise experiments were performed, respectively. And randomly sampling the point cloud from the grid surface of the CAD model according to the data setting in the DCP, cutting and carrying out secondary sampling. The results obtained by the ModelNet40 on the test dataset settings with invisible objects, invisible categories, and invisible objects with noise are provided in FIG. 8. The model is first trained and tested on the same class of training and testing sets, and then the ModelNet40 is evenly divided into training and testing sets by class. The first 20 classes of models were trained and the remaining 20 invisible classes of test sets were tested.
The test results show that SACF-Net-Sparse of the embodiment of the present invention achieves better results than the other methods. In addition, the robustness of the model to noise, which is always present in real-world point clouds, was also evaluated. SACF-Net-Soft of the embodiment of the present invention achieves the best performance on all metrics. An example result on noisy data is shown in fig. 9, where (a) is the overlapped, unregistered input point clouds; (b) is the SACF-Net-Soft registration result of the present invention; and (c) is the top 128 matching point pairs with the highest confidence screened out by the method of the embodiment of the present invention.
In summary, the embodiment of the present invention can effectively complete both fully overlapping and partially overlapping point cloud registration tasks, and improves the efficiency of point cloud registration.
The method provided by the embodiment of the present invention can be applied to electronic equipment. Specifically, the electronic device may be a desktop computer, a portable computer, an intelligent mobile terminal, a server, or the like. Without limitation, any electronic device capable of implementing the present invention falls within the protection scope of the present invention.
The invention also provides a computer readable storage medium. The computer readable storage medium stores a computer program which, when executed by a processor, implements any of the method steps described above for the point cloud registration method based on the jumping attention mechanism.
Alternatively, the computer readable storage medium may be a Non-Volatile Memory (NVM), such as at least one disk Memory.
Optionally, the computer readable memory may also be at least one memory device located remotely from the aforementioned processor.
In a further embodiment of the invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method steps of any of the above described point cloud registration methods based on a jump attention mechanism is also provided.
It should be noted that, for the storage medium/computer program product embodiments, the description is relatively simple, as it is substantially similar to the method embodiments, and reference should be made to the description of the method embodiments for relevant points.
It should be noted that the terms "first," "second," and the like are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Further, one skilled in the art can engage and combine the different embodiments or examples described in this specification.
Although the invention is described herein in connection with various embodiments, other variations of the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings and the disclosure. In the description of the present invention, the word "comprising" does not exclude other elements or steps, "a" or "an" does not exclude a plurality, and "a plurality" means two or more, unless specifically defined otherwise. Moreover, the mere fact that some measures are described in mutually different embodiments does not mean that these measures cannot be combined to advantage.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, an apparatus (device), or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, which may all generally be referred to herein as a "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein. A computer program may be stored or distributed on a suitable medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (10)

1. A point cloud registration method based on a jumping attention mechanism, comprising:
acquiring an origin point cloud and a destination point cloud to be registered;
respectively inputting the origin point cloud and the destination point cloud into a pre-trained encoding and decoding network, and acquiring, through the encoding and decoding network, a point correspondence matrix of the origin point cloud and the destination point cloud and confidence weights corresponding one-to-one to initial matching point pairs; the initial matching point pairs are obtained based on the point correspondence matrix;
performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result;
wherein the codec network comprises an encoder and a decoder;
the encoder is configured to output L levels of point cloud interaction features and L attention matrices through L cascaded encoding units based on the origin point cloud and the destination point cloud; the point correspondence matrix is obtained by multiplying the L attention matrices point by point;
the decoder is configured to output, through L decoding units, the confidence weights corresponding one-to-one to the initial matching point pairs; the point features output by the 1st-stage decoding unit are obtained, through a jumping attention mechanism and a multi-layer perceptron, based on the point cloud interaction features output by the L-th-stage encoding unit; the point features output by the l-th-stage decoding unit are obtained based on the point cloud interaction features output by the (L-l+1)-th-stage encoding unit and the point features output by the (l-1)-th-stage decoding unit; and the confidence weights corresponding one-to-one to the initial matching point pairs are the point features output by the L-th-stage decoding unit.
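To make the data flow recited in claim 1 easier to follow, the PyTorch-style skeleton below is a minimal sketch of the encoder-decoder wiring only: L cascaded encoding units, the point-wise product of their attention matrices, and decoding units that read the encoder features in reverse order. The module internals (simple MLPs and a dot-product attention) are placeholders of our own and do not reproduce the embodiment's actual feature extraction, interaction, or jumping attention computations.

```python
import torch
import torch.nn as nn

class EncodeUnit(nn.Module):
    """Placeholder encoding unit: returns origin/destination features and an M x N attention matrix."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
    def forward(self, feat_p, feat_q):
        fp, fq = self.mlp(feat_p), self.mlp(feat_q)
        attn = torch.softmax(fp @ fq.transpose(-1, -2), dim=-1)
        return fp, fq, attn

class DecodeUnit(nn.Module):
    """Placeholder decoding unit: fuses skip features from the encoder with the previous decoder output."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
    def forward(self, skip_feat, prev_feat):
        return self.mlp(torch.cat([skip_feat, prev_feat], dim=-1))

class CodecSkeleton(nn.Module):
    def __init__(self, num_levels=6, dim=64):
        super().__init__()
        self.embed = nn.Linear(3, dim)
        self.encoders = nn.ModuleList([EncodeUnit(dim) for _ in range(num_levels)])
        self.decoders = nn.ModuleList([DecodeUnit(dim) for _ in range(num_levels)])
    def forward(self, points_p, points_q):
        fp, fq = self.embed(points_p), self.embed(points_q)
        feats, attns = [], []
        for enc in self.encoders:                  # L cascaded encoding units
            fp, fq, a = enc(fp, fq)
            feats.append(fp)
            attns.append(a)
        corr = torch.stack(attns).prod(dim=0)      # point-wise product of the L attention matrices
        dec = feats[-1]                            # 1st decoding unit starts from the L-th encoder output
        for l, d in enumerate(self.decoders):      # l-th decoder reads the (L-l+1)-th encoder's features
            dec = d(feats[-(l + 1)], dec)
        weights = torch.sigmoid(dec.mean(dim=-1))  # per-point confidence weights
        return corr, weights

# example: corr, w = CodecSkeleton()(torch.rand(4096, 3), torch.rand(4096, 3))
```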
2. The point cloud registration method based on a jumping attention mechanism according to claim 1, wherein the encoding unit includes: the device comprises a feature extraction module, a feature interaction module and an attention module;
the feature extraction module is used for respectively extracting origin point cloud features and destination point cloud features;
the attention module is used for calculating an attention matrix based on the origin cloud characteristics and the destination point cloud characteristics;
and the feature interaction module is used for calculating point cloud interaction features based on the origin cloud features and the destination point cloud features.
3. The point cloud registration method based on a jumping attention mechanism of claim 2, wherein the feature extraction module is:
c^l : R^(O × d_(l-1)) → R^(O × (d_l/2)),
wherein c^l represents the feature extraction module of the l-th-stage encoding unit, the input feature of c^l is a matrix of dimension O × d_(l-1), the output feature of c^l is a matrix of dimension O × (d_l/2), O is the number of points of the origin point cloud or the destination point cloud, and d_(l-1) is the column (feature) dimension of the input feature of c^l.
4. The point cloud registration method based on a jumping attention mechanism of claim 2, wherein the attention module calculates an attention matrix based on the origin point cloud features and the destination point cloud features, comprising:
calculating a first attention matrix from the origin point cloud to the destination point cloud, and calculating a second attention matrix from the destination point cloud to the origin point cloud, according to one of two alternative formulas (the formulas are provided as images in the original publication);
wherein the (i, j)-th entry of the first attention matrix is the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, the (i, j)-th entry of the second attention matrix is the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud, the scores are computed from the feature of the i-th point of the origin point cloud and the feature of the j-th point of the destination point cloud, the superscript T denotes the matrix transpose, and ||·||_2 denotes the two-norm operation.
5. The point cloud registration method based on the jumping attention mechanism of claim 2, wherein the feature interaction module calculates the point cloud interaction features based on the origin point cloud features and the destination point cloud features, comprising:
calculating an origin point cloud global feature and a destination point cloud global feature;
calculating an origin point cloud global interaction embedding and a destination point cloud global interaction embedding;
and calculating the point cloud interaction features according to one of two alternative formulas (the formulas are provided as images in the original publication);
wherein maxpool(·) denotes maximum pooling, avgpool(·) denotes average pooling, cat(·) denotes concatenation, a multi-layer perceptron in the encoding unit is applied to the pooled features, the superscript T denotes the matrix transpose, and α_l and β_l are learnable parameters.
6. The point cloud registration method based on a jumping attention mechanism of claim 1, wherein the manner of obtaining the initial matching point pair based on the point correspondence matrix includes:
and obtaining the initial matching point pair by adopting a sparse mapping method or a soft mapping method based on the point corresponding matrix.
7. The point cloud registration method based on a jumping attention mechanism according to claim 1, wherein the decoding unit is specifically configured to:
and searching, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs whose local structures are similar in both the origin point cloud and the destination point cloud by using the jumping attention mechanism, thereby outputting point features in which matching point pairs with low confidence are filtered out.
8. The point cloud registration method based on a jumping attention mechanism of claim 7, wherein the jumping attention mechanism comprises: a learnable jumping attention mechanism or a jumping attention mechanism based on cosine similarity.
9. The point cloud registration method based on the jumping attention mechanism as set forth in claim 8, wherein the decoding unit, using the learnable jumping attention mechanism and based on the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, searches for matching point pairs whose local structures are similar in both the origin point cloud and the destination point cloud, thereby outputting point features in which matching point pairs with low confidence are filtered out, according to one of two alternative groups of formulas (the formulas are provided as images in the original publication);
wherein the point cloud interaction features input to the l-th-stage decoding unit come from the (L-l+1)-th-stage encoding unit, the point features input to the l-th-stage decoding unit are those output by the (l-1)-th-stage decoding unit, P corresponds to the origin point cloud and Q corresponds to the destination point cloud, the multi-layer perceptrons involved have learnable parameters, b_ji denotes the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, b_ij denotes the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud, M is the number of points of the origin point cloud, N is the number of points of the destination point cloud, and ω_h, ω_t and ω_v are learnable parameters of the multi-layer perceptrons.
10. The point cloud registration method based on a jumping attention mechanism according to claim 1, wherein performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result comprises:
based on the initial matching point pairs and the confidence weights, solving the rigid transformation by using a preset rigid transformation estimation formula to obtain the point cloud registration result;
the rigid transformation estimation formula takes one of two alternative forms (the formulas are provided as images in the original publication);
wherein (R, t) denotes the rigid transformation; for the i-th point p_i of the origin point cloud, the corresponding point is the point in the destination point cloud having an initial correspondence with p_i, and w_{p_i} is the confidence weight corresponding to that point; for the j-th point q_j of the destination point cloud, the corresponding point is the point in the origin point cloud having an initial correspondence with q_j, and w_{q_j} is the confidence weight corresponding to that point; M is the number of points of the origin point cloud and N is the number of points of the destination point cloud; ||·||_2 denotes the two-norm operation; Π denotes a continued multiplication operation; and R_est and t_est are respectively the rotation and translation in Euclidean space of the solved rigid transformation.
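The weighted rigid transformation solve referenced in claim 10 is commonly carried out in closed form with a confidence-weighted Kabsch/SVD procedure; the sketch below illustrates that generic procedure only, since the claim's exact formula is provided as an image in the original publication, and the function name is an assumption.

```python
import numpy as np

def weighted_rigid_transform(src, dst, w):
    """Estimate (R, t) minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2 (weighted Kabsch)."""
    w = w / (w.sum() + 1e-12)
    src_c = (w[:, None] * src).sum(axis=0)               # weighted centroids
    dst_c = (w[:, None] * dst).sum(axis=0)
    H = (w[:, None] * (src - src_c)).T @ (dst - dst_c)   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ S @ U.T
    t = dst_c - R @ src_c
    return R, t

# usage: R_est, t_est = weighted_rigid_transform(origin_points, matched_destination_points, confidence_weights)
```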
CN202310094361.6A 2023-02-08 2023-02-08 Point cloud registration method based on jumping attention mechanism Pending CN116128941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310094361.6A CN116128941A (en) 2023-02-08 2023-02-08 Point cloud registration method based on jumping attention mechanism

Publications (1)

Publication Number Publication Date
CN116128941A true CN116128941A (en) 2023-05-16

Family

ID=86307950

Country Status (1)

Country Link
CN (1) CN116128941A (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3693922A1 (en) * 2019-02-11 2020-08-12 Siemens Aktiengesellschaft An apparatus and a method for performing a data driven pairwise registration of three-dimensional point clouds
US20210350245A1 (en) * 2020-05-11 2021-11-11 Research & Business Foundation Sungkyunkwan University Point autoencoder, dual autoencoder, and dimensional transformation method of point cloud using same
US20220164566A1 (en) * 2020-11-20 2022-05-26 Shenzhen Deeproute.Ai Co., Ltd Methods for encoding point cloud feature
CN112837356A (en) * 2021-02-06 2021-05-25 湖南大学 WGAN-based unsupervised multi-view three-dimensional point cloud joint registration method
EP4047565A1 (en) * 2021-02-19 2022-08-24 Teraki GmbH Low level sensor fusion based on lightweight semantic segmentation of 3d point clouds
CN113139996A (en) * 2021-05-06 2021-07-20 南京大学 Point cloud registration method and system based on three-dimensional point cloud geometric feature learning
US20220414821A1 (en) * 2021-06-29 2022-12-29 The Regents Of The University Of Michigan Systems and methods for point cloud registration
CN113724340A (en) * 2021-07-09 2021-11-30 北京工业大学 Guiding type face image editing method and system based on jump connection attention
CN114332175A (en) * 2021-12-16 2022-04-12 广东工业大学 Attention mechanism-based low-overlap 3D dynamic point cloud registration method and system
CN114627170A (en) * 2022-03-11 2022-06-14 平安科技(深圳)有限公司 Three-dimensional point cloud registration method and device, computer equipment and storage medium
CN114708315A (en) * 2022-04-15 2022-07-05 云南大学 Point cloud registration method and system based on depth virtual corresponding point generation
CN115186804A (en) * 2022-08-04 2022-10-14 吉林大学 Encoder-decoder network structure and point cloud data classification and segmentation method adopting same
CN115375910A (en) * 2022-09-14 2022-11-22 清华大学 Point cloud feature extraction method and device based on attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUE WU ET AL.: "SACF-Net: Skip-Attention Based Correspondence Filtering Network for Point Cloud Registration", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》, pages 1 - 11 *
ZHENGHUA ZHANG ET AL.: "DDRNet: Fast point cloud registration network for large-scale scenes", 《ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING》, vol. 175, pages 184 - 198 *
HAI Linqi et al.: "Robust registration of Terracotta Warrior point clouds based on a dynamic graph attention mechanism", 《Optics and Precision Engineering》, vol. 30, no. 24, pages 3210 - 3224 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination