CN116128941A - Point cloud registration method based on jumping attention mechanism - Google Patents


Info

Publication number
CN116128941A
CN116128941A (application CN202310094361.6A)
Authority
CN
China
Prior art keywords
point
point cloud
cloud
origin
destination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310094361.6A
Other languages
Chinese (zh)
Inventor
武越
胡西道
马文萍
公茂果
苗启广
谢飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310094361.6A priority Critical patent/CN116128941A/en
Publication of CN116128941A publication Critical patent/CN116128941A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention discloses a point cloud registration method based on a jumping attention mechanism, which comprises the following steps: inputting an origin point cloud and a destination point cloud into an encoding-decoding network, and obtaining through the network a point correspondence matrix of the origin point cloud and the destination point cloud together with confidence weights in one-to-one correspondence with the initial matching point pairs; and performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result. The encoding-decoding network comprises an encoder and a decoder; the encoder is configured to output L levels of point cloud interaction features and L attention matrices through L cascaded encoding units based on the origin point cloud and the destination point cloud; the decoder is configured to output the confidence weights in one-to-one correspondence with the initial matching point pairs through L decoding units. The invention solves the problem that the jump connection mechanism adopted in existing deep-learning-based three-dimensional point cloud registration methods easily introduces a large amount of redundant information and thereby limits the learning capacity of the whole network; it can complete the point cloud registration task well and improve the efficiency of point cloud registration.

Description

Point cloud registration method based on jumping attention mechanism
Technical Field
The invention belongs to the field of computer three-dimensional vision, and particularly relates to a point cloud registration method based on a jumping attention mechanism.
Background
With the rapid development of high-precision sensors such as LiDAR and Kinect, the point cloud has become a major data format representing the 3D world. Since the sensor can only capture scans over its limited field of view, a registration algorithm is required to generate a large 3D scene. Point cloud registration is a problem of estimating a transformation matrix between two point cloud scans. Applying the transformation matrix, partial scans of the same 3D scene or object may be merged into one complete 3D point cloud. The value of point cloud registration is its unique and critical role in many computer vision applications.
First, point cloud registration may be used for three-dimensional reconstruction. Generating a complete 3D scene is a fundamental and important technology for various computer vision applications, including high-precision 3D map reconstruction in autonomous driving, 3D environment reconstruction in robotics, and 3D reconstruction for real-time monitoring of underground mining. For example, registration may build a 3D environment for route planning and decision making in robotic applications. Another example is large 3D scene reconstruction in underground mining spaces to accurately monitor mining safety.
Second, point cloud registration may be used for 3D positioning. Locating the position of an agent in a 3D environment is particularly important for robots. For example, an unmanned car estimates its position on a map and its distance from a road boundary line. The point cloud registration may accurately match the current real-time 3D view to the 3D environment to which it belongs to provide a high-precision positioning service. This application shows that registration provides a solution for autonomous agents (e.g. robots or unmanned vehicles) to interact with a 3D environment.
Furthermore, point cloud registration may also be used for pose estimation. Aligning point cloud A (a 3D real-time view) with another point cloud B (the 3D environment) yields the pose of point cloud A relative to point cloud B. The pose information may be used for decision making by a robot. For example, registration may obtain the pose of a robotic arm to decide where to move in order to grasp an object accurately. The pose estimation application shows that registration also provides a solution for an agent to learn information about its environment.
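Before turning to existing methods, the following minimal sketch (not part of the patent; the function and variable names are illustrative assumptions) shows how an estimated transformation matrix is applied to merge two partial scans into one point cloud, as described above.

import numpy as np

def merge_scans(source: np.ndarray, target: np.ndarray,
                R: np.ndarray, t: np.ndarray) -> np.ndarray:
    # source: (M, 3) scan, target: (N, 3) scan, R: (3, 3) rotation, t: (3,) translation
    aligned_source = source @ R.T + t                         # apply the estimated rigid transform
    return np.concatenate([aligned_source, target], axis=0)   # merged 3D scene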
In the prior art, conventional point cloud registration schemes typically employ optimization-based methods. The most commonly used optimization-based method is the Iterative Closest Point (ICP) algorithm (P. J. Besl and N. D. McKay, "A method for registration of 3-D shapes," in Sensor Fusion IV: Control Paradigms and Data Structures, 1992, pp. 586-606). The ICP algorithm requires that the two point clouds to be registered have a good initial position, i.e., that the two point clouds be approximately aligned. The basic idea is to select the nearest points in the two point clouds as corresponding points, solve the rotation and translation transformation matrix through all the matching point pairs, and make the error between the two point clouds smaller and smaller by continuous iteration until a preset threshold or number of iterations is reached. The disadvantage of ICP is that it easily falls into a locally optimal solution: the matching point pairs found in each iteration are only local to the point clouds, so each iteration is performed only on an overlapping portion of the two point cloud frames, and that portion may not be the true overlapping portion corresponding to the two frames. Many improvements have also been built on the ICP algorithm, such as Go-ICP (J. Yang, H. Li, D. Campbell, and Y. Jia, "Go-ICP: A globally optimal solution to 3D ICP point-set registration," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2241-2254, 2015), which adds a Gaussian probability model to the cost function of ICP, the remainder being unchanged, to reduce complexity and improve real-time performance.
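As a reference for the discussion above, the following is a minimal ICP sketch (illustrative only, not the patent's method): nearest points are taken as correspondences, the rotation and translation are solved with a closed-form SVD step, and the process iterates until the error change falls below a tolerance. It assumes roughly pre-aligned NumPy arrays and uses SciPy's k-d tree for the nearest-point search.

import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, max_iters=50, tol=1e-6):
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(max_iters):
        moved = source @ R.T + t
        dist, idx = tree.query(moved)            # nearest point in target as correspondence
        matched = target[idx]
        mu_s, mu_t = moved.mean(0), matched.mean(0)
        H = (moved - mu_s).T @ (matched - mu_t)  # cross-covariance of centred point sets
        U, _, Vt = np.linalg.svd(H)
        if np.linalg.det(Vt.T @ U.T) < 0:        # avoid reflections
            Vt[-1] *= -1
        R_step = Vt.T @ U.T
        t_step = mu_t - R_step @ mu_s
        R, t = R_step @ R, R_step @ t + t_step   # accumulate the incremental transform
        err = dist.mean()
        if abs(prev_err - err) < tol:            # stop when the error no longer decreases
            break
        prev_err = err
    return R, t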
However, conventional optimization-based methods have several limitations in point cloud registration: they can only register small-scale point clouds, do not perform well when facing large-scale point clouds, and are very time-consuming and inefficient. In addition, optimization-based methods easily fall into locally optimal solutions and usually constitute a fine registration process, so a coarse registration method must be used beforehand to obtain a reasonably good coarse registration result.
At present, research on point cloud registration has gradually turned to learning features through deep networks and constructing correspondences between two point clouds from the extracted features by different methods. Deep-learning-based point cloud registration roughly comprises three steps: the first step is to extract discriminative matching features of the origin point cloud and the destination point cloud with a network and then use the extracted features to construct an initial correspondence between the point clouds; the second step is to filter out wrong or low-confidence correspondences between the point clouds and retain high-quality correspondences in the overlapping region; and the third step is to perform rigid transformation estimation using the retained correspondences to obtain the registration result.
In the prior art, some methods employ encoder-decoder networks to implement the first two steps. The encoder is intended to extract discriminative features for subsequent modules and construct an initial point-wise matching map, and the decoder is intended to filter the correspondences between the point clouds using the features output by the encoder. However, using only the high-level features extracted by the encoder may result in the loss of some critical information, mainly the detailed structural information of local point cloud regions, especially the information of the point cloud overlapping region, which is critical for detecting correct correspondences. Moreover, for the task of partially overlapping point cloud registration, not all encoder features at every layer are conducive to correspondence filtering.
Some approaches use a skip connection mechanism similar to U-Net to solve this problem. However, revisiting features through a skip connection structure may introduce information that is detrimental to correspondence filtering and limit the feature learning capability of the overall network. Furthermore, a skip connection structure must follow a fixed point-feature connection order and therefore cannot be applied to the registration of unordered point clouds.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a point cloud registration method based on a jumping attention mechanism.
The technical problems to be solved by the invention are realized by the following technical scheme:
a point cloud registration method based on a jumping attention mechanism, comprising:
acquiring an origin point cloud and a destination point cloud to be registered;
respectively inputting the origin point cloud and the destination point cloud into a pre-trained encoding-decoding network, and acquiring, through the encoding-decoding network, the point correspondence matrix of the origin point cloud and the destination point cloud and confidence weights in one-to-one correspondence with initial matching point pairs; the initial matching point pairs are obtained based on the point correspondence matrix;
performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result;
Wherein the codec network comprises an encoder and a decoder;
the encoder is configured to output L levels of point cloud interaction features and L attention matrices through L cascaded encoding units based on the origin point cloud and the destination point cloud; the point correspondence matrix is obtained by point-wise multiplication of the L attention matrices;
the decoder is configured to output the confidence weights in one-to-one correspondence with the initial matching point pairs through L decoding units; the point features output by the 1st-level decoding unit are obtained, through a jumping attention mechanism and a multi-layer perceptron, based on the point cloud interaction features output by the L-th level encoding unit;
the point features output by the l-th level decoding unit (l = 2, ..., L) are obtained based on the point cloud interaction features output by the (L-l+1)-th level encoding unit and the point features output by the (l-1)-th level decoding unit; and the confidence weights in one-to-one correspondence with the initial matching point pairs are the point features output by the L-th level decoding unit.
Optionally, the encoding unit includes: a feature extraction module, a feature interaction module and an attention module;
the feature extraction module is used for respectively extracting origin point cloud features and destination point cloud features;
the attention module is used for calculating an attention matrix based on the origin point cloud features and the destination point cloud features;
and the feature interaction module is used for calculating point cloud interaction features based on the origin point cloud features and the destination point cloud features.
Optionally, the feature extraction module is:

c^l : R^(O×d_(l-1)) → R^(O×(d_l/2))

wherein c^l denotes the feature extraction module of the l-th level encoding unit, the input feature of c^l is a matrix of dimension O×d_(l-1), the output feature of c^l is a matrix of dimension O×(d_l/2), O is the number of points of the origin point cloud or the destination point cloud, and d_(l-1) is the feature dimension of the input of c^l.
Optionally, the attention module calculating an attention matrix based on the origin point cloud features and the destination point cloud features includes:

calculating the attention score between the i-th point of the origin point cloud and the j-th point of the destination point cloud as

a^l_ij = ((f^P_i)^T f^Q_j) / (||f^P_i||_2 · ||f^Q_j||_2);

and, for registration from the origin point cloud to the destination point cloud, forming the first attention matrix A^(P→Q)_l from the scores a^l_ij, or, for registration from the destination point cloud to the origin point cloud, forming the second attention matrix A^(Q→P)_l from the scores a^l_ji;

wherein f^P_i denotes the feature of the i-th point of the origin point cloud, f^Q_j denotes the feature of the j-th point of the destination point cloud, the superscript T denotes the matrix transpose, ||·||_2 denotes the two-norm operation, A^(P→Q)_l is the first attention matrix from the origin point cloud to the destination point cloud, A^(Q→P)_l is the second attention matrix from the destination point cloud to the origin point cloud, an element of A^(P→Q)_l is the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, and an element of A^(Q→P)_l is the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud.
Optionally, the feature interaction module calculating point cloud interaction features based on the origin point cloud features and the destination point cloud features includes:

calculating the origin point cloud global feature g^P_l = cat(maxpool(F^P_l), avgpool(F^P_l));

calculating the destination point cloud global feature g^Q_l = cat(maxpool(F^Q_l), avgpool(F^Q_l));

calculating, for registration from the origin point cloud to the destination point cloud, the point cloud interaction features Φ^P_l = F^P_l + α_l · I^P_l with I^P_l = F^P_l B_l;

or calculating, for registration from the destination point cloud to the origin point cloud, the point cloud interaction features Φ^Q_l = F^Q_l + β_l · I^Q_l with I^Q_l = F^Q_l B_l^T;

wherein F^P_l denotes the origin point cloud features, F^Q_l denotes the destination point cloud features, maxpool(·) denotes maximum pooling, avgpool(·) denotes average pooling, cat(·) denotes splicing, h_l denotes the multi-layer perceptron in the encoding unit used to obtain the interaction matrix B_l from the spliced global features, the superscript T denotes the matrix transpose, g^P_l denotes the origin point cloud global feature, g^Q_l denotes the destination point cloud global feature, I^P_l denotes the origin point cloud global interaction embedding, I^Q_l denotes the destination point cloud global interaction embedding, and α_l and β_l are learnable parameters.
Optionally, the method for obtaining the initial matching point pair based on the point correspondence matrix includes:
and obtaining the initial matching point pair by adopting a sparse mapping method or a soft mapping method based on the point corresponding matrix.
Optionally, the decoding unit is specifically configured to:
searching, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs with similar local structures in the origin point cloud and the destination point cloud by using a jumping attention mechanism, so as to output point features from which low-confidence matching point pairs have been filtered.
Optionally, the jumping attention mechanism includes: a learnable jumping attention mechanism or a jumping attention mechanism based on cosine similarity.
Optionally, the decoding unit searches, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs with similar local structures in the origin point cloud and the destination point cloud by using the learnable jumping attention mechanism, so as to output point features from which low-confidence matching point pairs have been filtered, through the following formulas:

b_ji = softmax_i((ω_h Ψ^(l-1)_j)^T (ω_t Φ^(L-l+1)_i)),
Ψ^l_j = m^l_θ(Ψ^(l-1)_j + Σ_{i=1..M} b_ji · (ω_v Φ^(L-l+1)_i));

or,

b_ij = softmax_j((ω_h Ψ^(l-1)_i)^T (ω_t Φ^(L-l+1)_j)),
Ψ^l_i = m^l_φ(Ψ^(l-1)_i + Σ_{j=1..M} b_ij · (ω_v Φ^(L-l+1)_j));

wherein Φ^(L-l+1) is the point cloud interaction feature from the (L-l+1)-th level encoding unit that is input to the l-th level decoding unit, Ψ^(l-1) is the point feature input to the l-th level decoding unit and output by the (l-1)-th level decoding unit, P corresponds to the origin point cloud and Q corresponds to the destination point cloud; m^l_θ and m^l_φ both denote multi-layer perceptrons with learnable parameters, b_ji denotes the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, b_ij denotes the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud, M is the number of points of the origin point cloud, N is the number of points of the destination point cloud, and ω_h, ω_t and ω_v are learnable parameters of the multi-layer perceptron.
Optionally, performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result includes:

solving the rigid transformation by using a preset rigid transformation estimation formula based on the initial matching point pairs and the confidence weights, so as to obtain the point cloud registration result;

the rigid transformation estimation formula includes:

(R_est, t_est) = argmin_(R,t) Σ_{i=1..M} Π(w_{p_i}) · ||R p_i + t − q̂_{p_i}||_2^2;

or,

(R_est, t_est) = argmin_(R,t) Σ_{j=1..N} Π(w_{q_j}) · ||R q_j + t − p̂_{q_j}||_2^2;

wherein (R, t) denotes the rigid transformation; q̂_{p_i} denotes the point in the destination point cloud having an initial correspondence with the i-th point p_i of the origin point cloud, and w_{p_i} is the confidence weight corresponding to that point; p̂_{q_j} denotes the point in the origin point cloud having an initial correspondence with the j-th point q_j of the destination point cloud, and w_{q_j} is the confidence weight corresponding to that point; M is the number of points of the origin point cloud, and N is the number of points of the destination point cloud; ||·||_2 denotes the two-norm operation; Π denotes the continuous multiplication operation; R_est and t_est are, respectively, the rotation and translation in Euclidean space of the solved rigid transformation.
In the point cloud registration method based on the jumping attention mechanism, the extraction of the initial matching points is realized by using the coding and decoding network, and the confidence weight of the initial matching point pairs is obtained. The codec network includes an encoder and a decoder; l-level point cloud interaction characteristics are respectively output through L cascaded coding units in the encoder, so that global information is shared, interaction between the origin point cloud and the destination point cloud is comprehensively learned, and the encoder can construct better matching characteristics by utilizing low-level geometric information and high-level context sensing information of the point cloud, so that an original point-by-point matching diagram is remarkably enhanced. The decoder bridges the point cloud interaction characteristics in the encoder through the jumping attention structure so as to output confidence scores of the matching point pairs, and therefore the initial matching point pairs are filtered; connecting local point cloud interaction region characteristics in the encoder and point characteristics in the decoder through a jumping attention mechanism between the encoder and the decoder, so that the local region information and global interaction information of each layer of the encoder are fused into the decoder, and the decoder is guided to search for correct matching point pairs with similar local structures; a decoder incorporating a skip attention mechanism can fully exploit and preserve the structural details of the local region, which enables it to efficiently extract high quality correspondence within the overlapping region. Therefore, the invention solves the problem that a large amount of redundant information is easily introduced by adopting a jump connection mechanism in the existing three-dimensional point cloud registration method based on deep learning so as to limit the learning capacity of the whole network, provides attention selection for revisiting the characteristics of the registration network under different resolutions, enables the network to selectively combine the characteristics of expected geometric information codes, and avoids the problem of information redundancy, so that the invention can well complete the point cloud registration task and can well improve the completion efficiency of point cloud registration.
In addition, the jump attention mechanism has no pre-requirement on the sequence of the input features, so the invention can be popularized to unordered point clouds.
The present invention will be described in further detail with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart of a point cloud registration method based on a jumping attention mechanism provided by an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a codec network used in performing point cloud registration in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a feature extraction module FE in the network shown in fig. 2;
FIG. 4 is a comparison of quantitative experimental results between an embodiment of the present invention and several other existing methods on the 3DMatch dataset;
FIG. 5 shows visual registration results of an embodiment of the present invention and several other existing methods on the 3DMatch dataset;
FIG. 6 is a comparison of quantitative experimental results between an embodiment of the present invention and several other existing methods on the KITTI dataset;
FIG. 7 shows visual registration results of an embodiment of the present invention on the KITTI dataset;
FIG. 8 is a comparison of quantitative experimental results between an embodiment of the present invention and several other existing methods on the ModelNet40 dataset;
FIG. 9 shows visual registration results of an embodiment of the present invention on the ModelNet40 dataset.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
In order to solve the problem that a large amount of redundant information is easily introduced by adopting a jump connection mechanism in the existing three-dimensional point cloud registration method based on deep learning so as to limit the learning capacity of the whole network, the embodiment of the invention provides a point cloud registration method based on a jump attention mechanism. Referring to fig. 1, the method comprises the steps of:
s10: and acquiring an origin point cloud and a destination point cloud to be registered.
Here, the origin point cloud is expressed as P = {p_i ∈ R^3 | i = 1, ..., M}, where M is the number of points of the origin point cloud; the destination point cloud is expressed as Q = {q_j ∈ R^3 | j = 1, ..., N}, where N is the number of points of the destination point cloud.
S20: respectively inputting an origin cloud and a destination point cloud into a pre-trained encoding and decoding network, and acquiring point corresponding matrixes of the origin cloud and the destination point cloud and confidence degree weights corresponding to the initial matching point pairs one by one through the encoding and decoding network; the initial matching point pairs are obtained based on the point correspondence matrix.
The structure of the codec network is shown in fig. 2, and includes an Encoder and a Decoder.
An encoder, configured to output, based on the origin point cloud and the destination point cloud, L levels of point cloud interaction features (encoder features) Φ^l and L attention matrices A^(P→Q)_l or A^(Q→P)_l through L cascaded encoding units. Here, A^(P→Q)_l denotes the attention matrix when registering from the origin point cloud to the destination point cloud, A^(Q→P)_l denotes the attention matrix when registering from the destination point cloud to the origin point cloud, and l = 1, ..., L. The above point correspondence matrix is obtained by point-wise multiplication of the L attention matrices and is denoted C^(P→Q) = A^(P→Q)_1 ⊙ ... ⊙ A^(P→Q)_L or C^(Q→P) = A^(Q→P)_1 ⊙ ... ⊙ A^(Q→P)_L, where ⊙ denotes point-wise multiplication.
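As an illustration of the point-wise product described above, the following PyTorch sketch (names and shapes are assumptions, not the patent's code) accumulates the L per-level attention matrices into the point correspondence matrix.

import torch

def correspondence_matrix(attention_mats):
    # attention_mats: list of L tensors of shape (M, N), one per encoding unit
    corr = torch.ones_like(attention_mats[0])
    for A in attention_mats:
        corr = corr * A          # point-wise (Hadamard) product accumulates agreement
    return corr                  # (M, N) point correspondence matrix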
Wherein the encoding unit includes: the feature extraction module FE, the feature interaction module FI and the attention module (not shown in fig. 2).
The feature extraction module FE in the encoding unit is used for respectively extracting origin point cloud features and destination point cloud features. Specifically, the two point clouds share the same feature extraction module: inputting the origin point cloud into the feature extraction module FE yields the origin point cloud features, and inputting the destination point cloud into the feature extraction module FE yields the destination point cloud features.
The feature extraction module is expressed as:

c^l : R^(O×d_(l-1)) → R^(O×(d_l/2))

wherein c^l denotes the feature extraction module of the l-th level encoding unit, the input feature of c^l is a matrix of dimension O×d_(l-1), the output feature of c^l is a matrix of dimension O×(d_l/2), O is the number of points of the origin point cloud or the destination point cloud, and d_(l-1) is the feature dimension of the input of c^l. It can be seen that the entire encoder extracts interaction features at L different resolution levels.
For example, the feature extraction module may comprise three residual blocks, each residual block comprising two fkcondv layers, three IN (InstanceNorm) layers, two ReLU layers and one convolutional layer (Conv1d), as shown in fig. 3, wherein Addition denotes an element-wise addition operation.
Through the feature extraction module, the origin point cloud P and the destination point cloud Q can be embedded into a common feature space, which facilitates building the point correspondence matrix between the two point clouds.
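A hedged PyTorch sketch of one such residual block follows; the exact layer composition and widths are assumptions (plain point-wise Conv1d layers are used here), so it only illustrates the Conv1d + InstanceNorm + ReLU structure with the Addition shortcut described above.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(in_dim, out_dim, kernel_size=1),
            nn.InstanceNorm1d(out_dim),
            nn.ReLU(inplace=True),
            nn.Conv1d(out_dim, out_dim, kernel_size=1),
            nn.InstanceNorm1d(out_dim),
        )
        # 1x1 convolution on the shortcut so the element-wise Addition matches shapes
        self.shortcut = nn.Conv1d(in_dim, out_dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim, num_points) per-point features
        return torch.relu(self.body(x) + self.shortcut(x))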
In the related art, predicting correspondences between point clouds using features that contain only local information is prone to false matches, especially when outliers are present. Local features contain neither the structural information of the larger-scale point cloud nor the association information between the point clouds, and thus cannot provide discriminative features for the subsequent matching process to resolve ambiguity.
In order to solve the problem, a feature interaction module is introduced in the embodiment of the invention to share global information and comprehensively learn interaction between a source point cloud and a destination point cloud. Specifically, the feature interaction module is used for calculating point cloud interaction features based on the origin cloud features and the destination point cloud features; the specific calculation mode is as follows:
(1) Calculate the origin point cloud global feature g^P_l = cat(maxpool(F^P_l), avgpool(F^P_l));

(2) calculate the destination point cloud global feature g^Q_l = cat(maxpool(F^Q_l), avgpool(F^Q_l));

(3) in the case of registration from the origin point cloud to the destination point cloud, calculate the origin point cloud global interaction embedding I^P_l = F^P_l B_l and the point cloud interaction features Φ^P_l = F^P_l + α_l · I^P_l;

(3') in the case of registration from the destination point cloud to the origin point cloud, calculate the destination point cloud global interaction embedding I^Q_l = F^Q_l B_l^T and the point cloud interaction features Φ^Q_l = F^Q_l + β_l · I^Q_l;

wherein F^P_l denotes the origin point cloud features extracted by the feature extraction module, F^Q_l denotes the destination point cloud features extracted by the feature extraction module, maxpool(·) denotes maximum pooling, avgpool(·) denotes average pooling, cat(·) denotes splicing, h_l denotes the multi-layer perceptron in the encoding unit, g^P_l denotes the origin point cloud global feature, g^Q_l denotes the destination point cloud global feature, B_l denotes the interaction matrix, the superscript T denotes the matrix transpose, I^P_l denotes the origin point cloud global interaction embedding, I^Q_l denotes the destination point cloud global interaction embedding, and α_l and β_l are learnable parameters.

In this calculation, pooling operations are first applied to F^P_l and F^Q_l to obtain the origin point cloud global feature g^P_l and the destination point cloud global feature g^Q_l; the two are then spliced and refined by the multi-layer perceptron h_l shared by the origin point cloud and the destination point cloud. Next, the interaction matrix B_l is constructed, each element of which explicitly models a possible interaction between the origin point cloud global feature g^P_l and the destination point cloud global feature g^Q_l. To project the information contained in the interaction matrix into the feature of each point, F^P_l is multiplied by the interaction matrix B_l and F^Q_l is multiplied by B_l^T, giving the origin point cloud global interaction embedding I^P_l and the destination point cloud global interaction embedding I^Q_l. Finally, I^P_l and I^Q_l are taken as residual terms and connected to the original origin point cloud features and destination point cloud features through the learnable parameters α_l and β_l, yielding the point cloud interaction features.
It can be understood that, compared with performing no information interaction between the origin point cloud and the destination point cloud, or interacting only at the deepest layer of the encoder, introducing feature interaction modules at different resolution levels into each encoding unit, as in the embodiment of the present invention, correlates the information of the origin point cloud and the destination point cloud well, makes the features of the two point clouds interdependent, and gives the extracted features more discriminative power and task relevance.
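The following PyTorch sketch illustrates one possible reading of the feature interaction module described above; the sizes, the form of the interaction matrix and the pooling details are assumptions rather than the patent's implementation.

import torch
import torch.nn as nn

class FeatureInteraction(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # shared MLP refining the spliced global features into a dim x dim interaction matrix
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, dim * dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim * dim, dim * dim),
        )
        self.alpha = nn.Parameter(torch.zeros(1))   # learnable residual scale for P
        self.beta = nn.Parameter(torch.zeros(1))    # learnable residual scale for Q

    def forward(self, feat_p: torch.Tensor, feat_q: torch.Tensor):
        # feat_p: (M, dim) origin point cloud features, feat_q: (N, dim) destination features
        g_p = torch.cat([feat_p.max(0).values, feat_p.mean(0)])        # (2*dim,) global feature of P
        g_q = torch.cat([feat_q.max(0).values, feat_q.mean(0)])        # (2*dim,) global feature of Q
        B = self.mlp(torch.cat([g_p, g_q])).view(feat_p.shape[1], -1)  # (dim, dim) interaction matrix
        inter_p = feat_p @ B        # origin point cloud global interaction embedding
        inter_q = feat_q @ B.T      # destination point cloud global interaction embedding
        return feat_p + self.alpha * inter_p, feat_q + self.beta * inter_q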
The attention module in the encoding unit is used for calculating an attention matrix based on the origin point cloud features and the destination point cloud features. Specifically, the attention score between the i-th point of the origin point cloud and the j-th point of the destination point cloud is first calculated as

a^l_ij = ((f^P_i)^T f^Q_j) / (||f^P_i||_2 · ||f^Q_j||_2).

Then, in the case of registration from the origin point cloud to the destination point cloud, the attention matrix A^(P→Q)_l is formed from the scores a^l_ij; in the case of registration from the destination point cloud to the origin point cloud, the attention matrix A^(Q→P)_l is formed from the scores a^l_ji;

wherein f^P_i denotes the feature of the i-th point of the origin point cloud, f^Q_j denotes the feature of the j-th point of the destination point cloud, the superscript T denotes the matrix transpose, ||·||_2 denotes the two-norm operation, A^(P→Q)_l is the first attention matrix from the origin point cloud to the destination point cloud, A^(Q→P)_l is the second attention matrix from the destination point cloud to the origin point cloud, an element of A^(P→Q)_l is the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, and an element of A^(Q→P)_l is the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud.
In addition, after the point correspondence matrix C^(P→Q) or C^(Q→P) has been calculated, the method for obtaining the initial matching point pairs based on the point correspondence matrix is: obtaining the initial matching point pairs by adopting a sparse mapping method or a soft mapping method based on the point correspondence matrix.

For example, based on the point correspondence matrix C^(P→Q), the sparse mapping method takes, for the i-th point p_i of the origin point cloud, the destination point cloud point with the highest correspondence score as its corresponding point:

q̂_{p_i} = q_{j*}, with j* = argmax_j C^(P→Q)_ij;

here, q̂_{p_i} denotes the point of the destination point cloud having an initial correspondence with the i-th point p_i of the origin point cloud.

Alternatively, based on the point correspondence matrix C^(P→Q), the soft mapping method takes the correspondence-weighted combination of the destination point cloud points:

q̂_{p_i} = Σ_j (C^(P→Q)_ij / Σ_k C^(P→Q)_ik) · q_j.

Thus, for the case of registration from the origin point cloud to the destination point cloud, the resulting initial matching point pairs are expressed as (p_i, q̂_{p_i}), together with their corresponding point cloud interaction features output at the end of the encoder; for the case of registration from the destination point cloud to the origin point cloud, the resulting initial matching point pairs are expressed as (q_j, p̂_{q_j}), together with their corresponding point cloud interaction features output at the end of the encoder.
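The following PyTorch sketch (illustrative; names and the normalization used for the soft mapping are assumptions) shows a cosine-similarity attention matrix between the two feature sets and the two ways of deriving initial corresponding points mentioned above.

import torch
import torch.nn.functional as F

def attention_matrix(feat_p, feat_q):
    # feat_p: (M, d), feat_q: (N, d) per-point features of the two clouds
    return F.normalize(feat_p, dim=1) @ F.normalize(feat_q, dim=1).T   # (M, N) cosine scores

def sparse_mapping(corr, target_pts):
    # for each origin point, pick the destination point with the highest score
    return target_pts[corr.argmax(dim=1)]                              # (M, 3)

def soft_mapping(corr, target_pts):
    # for each origin point, take a score-weighted combination of destination points
    weights = corr / corr.sum(dim=1, keepdim=True).clamp(min=1e-12)
    return weights @ target_pts                                        # (M, 3)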
A decoder, configured to output the confidence weights in one-to-one correspondence with the initial matching point pairs through L decoding units. The point features (decoder features) output by the 1st-level decoding unit are obtained, through a jumping attention mechanism and a multi-layer perceptron, based on the point cloud interaction features output by the L-th level encoding unit. The point features output by the l-th level decoding unit (l = 2, ..., L) are obtained based on the point cloud interaction features output by the (L-l+1)-th level encoding unit and the point features output by the (l-1)-th level decoding unit. The confidence weights in one-to-one correspondence with the initial matching point pairs are the point features output by the L-th level decoding unit.

Here, the decoder bridges the point cloud interaction features in the encoder through the skip attention mechanism to output the confidence scores of the initial matching point pairs, thereby filtering low-confidence initial matching point pairs. It will be appreciated that, when processing partially overlapping point cloud registration tasks, only subsets of P and Q match each other, so many incorrect matching pairs need to be filtered out. Since the encoder extracts point cloud interaction features at different resolution levels, it is natural for the decoder to generate point features in the same way; the decoder in the embodiment of the present invention therefore comprises a plurality of decoding units in one-to-one correspondence with the encoding units of the encoder but at the opposite resolution levels, i.e., the l-th level encoding unit of the encoder corresponds to the (L-l+1)-th level decoding unit of the decoder. Through the jumping attention mechanism, the point cloud interaction features extracted by the encoder establish layer-to-layer connections with the point features generated in the decoder, so that the confidence score w_{p_i} or w_{q_j} of each matching point pair (p_i, q̂_{p_i}) or (q_j, p̂_{q_j}) is output at the end of the decoder.
Referring to fig. 2, each level of decoding unit of the decoder includes a skip attention structure SA to convey the point cloud interaction features from the corresponding level of encoding unit, and a multi-layer perceptron block (denoted by M) to reduce the feature dimension. Based on such a structure, the decoding unit is specifically configured to: search, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs with similar local structures in the origin point cloud and the destination point cloud by using a jumping attention mechanism, thereby outputting point features from which low-confidence matching point pairs have been filtered.
The skip attention mechanism refers to an attention-based feature pipeline, and the skip attention mechanism in the decoding unit may be a learnable skip attention mechanism or a skip attention mechanism based on cosine similarity. The smoothed learnable attention can retain more information from the original point features, while the non-smooth cosine attention introduces more information from the preceding encoder network and can establish a strong link between the point features in the decoder and the interaction features in the encoder.
The decoding unit searches, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs with similar local structures in the origin point cloud and the destination point cloud by using the learnable jumping attention mechanism, so as to output point features from which low-confidence matching point pairs have been filtered, through the following formulas:

b_ji = softmax_i((ω_h Ψ^(l-1)_j)^T (ω_t Φ^(L-l+1)_i)),
Ψ^l_j = m^l_θ(Ψ^(l-1)_j + Σ_{i=1..M} b_ji · (ω_v Φ^(L-l+1)_i)).

These formulas correspond to the case of registration from the origin point cloud to the destination point cloud, in which the point cloud interaction features Φ^(L-l+1) in the encoder are fused by element-wise addition into the point features Ψ^(l-1).
For the case of registration from the destination point cloud to the origin point cloud, the calculation formulas of the decoding unit are:

b_ij = softmax_j((ω_h Ψ^(l-1)_i)^T (ω_t Φ^(L-l+1)_j)),
Ψ^l_i = m^l_φ(Ψ^(l-1)_i + Σ_{j=1..M} b_ij · (ω_v Φ^(L-l+1)_j));

wherein Φ^(L-l+1) is the point cloud interaction feature from the (L-l+1)-th level encoding unit that is input to the l-th level decoding unit, Ψ^(l-1) is the point feature input to the l-th level decoding unit and output by the (l-1)-th level decoding unit, P corresponds to the origin point cloud and Q corresponds to the destination point cloud; m^l_θ and m^l_φ both denote multi-layer perceptrons with learnable parameters, b_ji denotes the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, b_ij denotes the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud, M is the number of points of the origin point cloud, N is the number of points of the destination point cloud, and ω_h, ω_t and ω_v are learnable parameters of the multi-layer perceptron.
In addition, the attention score under the jumping attention mechanism based on cosine similarity is calculated as:

b_ij = ((Ψ^(l-1)_i)^T Φ^(L-l+1)_j) / (||Ψ^(l-1)_i||_2 · ||Φ^(L-l+1)_j||_2);

this corresponds to the case of registration from the destination point cloud to the origin point cloud; the formula for the case of registration from the origin point cloud to the destination point cloud is analogous and will not be described in detail.
It will be appreciated that the skip attention structure SA acts as a pipeline connecting the global point cloud interaction features extracted by the encoder with the point features generated by the decoder, and it can fuse the local region information and global interaction information of each layer of the encoder into the decoder, thereby guiding the decoder to search for correct matching point pairs with similar local structures. The semantic relationship between features in the decoder and the encoder is measured by the attention score: a higher score indicates more significant pattern similarity. The attention-weighted point cloud interaction features are then fused with the point features output by the previous-stage decoding unit, so that fused point features are output and, finally, correct matching point pairs in the overlapping region are predicted.
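The following PyTorch sketch illustrates one decoding unit with a skip attention pipeline in the cosine-similarity flavour described above; names, shapes and the fusion details are assumptions rather than the patent's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipAttentionDecodeUnit(nn.Module):
    def __init__(self, enc_dim: int, dec_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(enc_dim, dec_dim)   # align encoder feature width with decoder
        self.mlp = nn.Sequential(nn.Linear(dec_dim, out_dim), nn.ReLU(inplace=True))

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        # enc_feat: (N, enc_dim) interaction features from the paired encoding unit
        # dec_feat: (M, dec_dim) point features from the previous-stage decoding unit
        enc_proj = self.proj(enc_feat)                                           # (N, dec_dim)
        scores = F.normalize(dec_feat, dim=1) @ F.normalize(enc_proj, dim=1).T   # cosine scores
        attn = torch.softmax(scores, dim=1)                                      # (M, N)
        fused = dec_feat + attn @ enc_proj    # element-wise addition fuses attended encoder info
        return self.mlp(fused)                                                   # (M, out_dim)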
S30: and performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result.
Specifically, based on the initial matching point pairs and the confidence coefficient weights, a preset rigid transformation estimation formula is utilized to solve rigid transformation, and a point cloud registration result is obtained.
Wherein, for the case of registration from the origin point cloud to the destination point cloud, the rigid transformation estimation formula is:

(R_est, t_est) = argmin_(R,t) Σ_{i=1..M} Π(w_{p_i}) · ||R p_i + t − q̂_{p_i}||_2^2;

for the case of registration from the destination point cloud to the origin point cloud, the rigid transformation estimation formula is:

(R_est, t_est) = argmin_(R,t) Σ_{j=1..N} Π(w_{q_j}) · ||R q_j + t − p̂_{q_j}||_2^2;

wherein (R, t) denotes the rigid transformation; q̂_{p_i} denotes the point in the destination point cloud having an initial correspondence with the i-th point p_i of the origin point cloud, and w_{p_i} is the confidence weight corresponding to that point; p̂_{q_j} denotes the point in the origin point cloud having an initial correspondence with the j-th point q_j of the destination point cloud, and w_{q_j} is the confidence weight corresponding to that point; M is the number of points of the origin point cloud, and N is the number of points of the destination point cloud; ||·||_2 denotes the two-norm operation; Π denotes the continuous multiplication operation, which realizes a hard elimination process, namely assigning 0 weight to the part of the correspondences with low confidence scores; R_est and t_est are, respectively, the rotation and translation in Euclidean space of the solved rigid transformation, obtained by singular value decomposition.
It will be appreciated that the decoder extracts high-quality correspondences by providing a corresponding weight for each matching point pair, so that the high-confidence matching point pairs retained after low-confidence filtering are used to estimate the final rigid transformation. Compared with the original singular value decomposition (SVD) method using equal weights, weighted Procrustes can scale to dense correspondence sets by passing gradients through the weights rather than through the positions, and can assign 0 weight to the part of the correspondences with low confidence scores.
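A hedged NumPy sketch of the weighted rigid-transform estimation (weighted Procrustes with SVD) described above follows; the hard-threshold handling and the names are assumptions used only for illustration.

import numpy as np

def weighted_procrustes(src, dst, w, hard_threshold=None):
    # src, dst: (K, 3) matched points; w: (K,) confidence weights from the decoder
    if hard_threshold is not None:
        w = np.where(w >= hard_threshold, w, 0.0)   # hard elimination of low-confidence pairs
    w = w / (w.sum() + 1e-12)
    mu_s, mu_d = w @ src, w @ dst                   # weighted centroids
    H = (src - mu_s).T @ np.diag(w) @ (dst - mu_d)  # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                              # rotation without reflection
    t = mu_d - R @ mu_s
    return R, t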
In the point cloud registration method based on the jumping attention mechanism, provided by the embodiment of the invention, the extraction of the initial matching points is realized by using the coding and decoding network, and the confidence weight of the initial matching point pair is obtained. The codec network includes an encoder and a decoder; l-level point cloud interaction characteristics are respectively output through L cascaded coding units in the encoder, so that global information is shared, interaction between the origin point cloud and the destination point cloud is comprehensively learned, and the encoder can construct better matching characteristics by utilizing low-level geometric information and high-level context sensing information of the point cloud, so that an original point-by-point matching diagram is remarkably enhanced. The decoder bridges the point cloud interaction characteristics in the encoder through the jumping attention structure so as to output confidence scores of the matching point pairs, and therefore the initial matching point pairs are filtered; connecting local point cloud interaction region characteristics in the encoder and point characteristics in the decoder through a jumping attention mechanism between the encoder and the decoder, so that the local region information and global interaction information of each layer of the encoder are fused into the decoder, and the decoder is guided to search for correct matching point pairs with similar local structures; a decoder incorporating a skip attention mechanism can fully exploit and preserve the structural details of the local region, which enables it to efficiently extract high quality correspondence within the overlapping region. Therefore, the embodiment of the invention solves the problem that a large amount of redundant information is easily introduced by adopting a jump connection mechanism in the existing three-dimensional point cloud registration method based on deep learning so as to limit the learning capacity of the whole network, provides attention selection for revisiting the characteristics of the registration network under different resolutions, enables the network to selectively combine the characteristics of expected geometric information codes, and avoids the problem of information redundancy, so that the invention can well complete the point cloud registration task and can well improve the completion efficiency of point cloud registration.
The following describes the training procedure of the codec network used in the implementation of the present invention.
With respect to the construction of the dataset, since training on all points in every forward and backward pass is inefficient and unnecessary, embodiments of the present invention use random sampling to extract points from the origin point cloud and the destination point cloud for training; P denotes the sample set of the origin point cloud, Q denotes the sample set of the destination point cloud, p_i denotes the i-th point in P, and q_j denotes the j-th point in Q.
Regarding the model loss, the encoder is trained under the supervision of a loss L_enc, and the decoder is trained under the supervision of a loss L_dec; the total loss is the sum of the two losses:

L = L_enc + L_dec,

wherein L_enc is a standard cross-entropy loss used to supervise the attention matrices A^(P→Q)_l and A^(Q→P)_l. L_enc is used to ensure that a good correspondence can be established between P and Q: for the i-th point of P, the target is the index j* of the point in Q closest to it under the ground-truth transformation, and a pair is regarded as a correspondence only when the distance between the two points does not exceed a threshold ε > 0 that controls the minimum radius at which two points are considered corresponding points; the point of the destination point cloud with the highest probability of corresponding to p_i is the one found from the correspondence matrix and is denoted q̂_{p_i}. The loss term for the other registration direction is defined in the same manner and is not repeated here.

L_dec is the correspondence filtering loss of the decoder, which is determined by measuring the distance between an input point and the corresponding point found for it. In practice, this loss assigns a positive label 1 to a point whose correspondence is found correctly and a negative label 0 to a point whose correspondence is not found. As training proceeds, those correspondences that are likely to be correct matches obtain higher confidence scores.
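As an illustration of the decoder's filtering loss described above, the following sketch assigns label 1 to points whose found correspondence lies within a radius of the ground-truth correspondence and label 0 otherwise, and supervises the confidence weights with binary cross-entropy; the use of BCE and the radius value are assumptions, not taken from the patent.

import torch
import torch.nn.functional as F

def filtering_loss(pred_corr, gt_corr, confidence, radius=0.12):
    # pred_corr, gt_corr: (M, 3) found / ground-truth corresponding points
    # confidence: (M,) confidence weights output by the decoder
    labels = (torch.norm(pred_corr - gt_corr, dim=1) < radius).float()   # 1 = correct match
    return F.binary_cross_entropy(confidence.clamp(1e-6, 1 - 1e-6), labels)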
the effectiveness and superiority of the embodiments of the present invention are illustrated by experimental data and results.
In order to verify the effectiveness and superiority of the encoding-decoding network in point cloud registration, comparison experiments were conducted on three datasets: the 3DMatch dataset, the KITTI dataset and the ModelNet40 dataset. The network was trained with PyTorch and optimized with the AdamW optimizer, with a weight decay of 0.001 and a learning rate of 0.001. The network was trained for 100 epochs on 3DMatch and KITTI and for 20 epochs on ModelNet40. All experimental results were obtained using the model trained in the last epoch. Due to memory limitations, the batch size used in all experiments was 1. All experiments were performed on a single RTX 3090 Ti graphics card.
The measurement indexes comprise: Rotation Error (RE), Translation Error (TE), and Registration Recall (RR). RR refers to the proportion of successfully aligned point cloud pairs, that is, the proportion of registered point cloud pairs whose rotation error and translation error are both smaller than predefined thresholds among all point cloud pairs; TE is evaluated by the mean square error; RE is defined as:

RE = arccos((tr(R_gt^T R_est) − 1) / 2)

wherein R_gt denotes the ground-truth rotation matrix, R_est denotes the predicted rotation matrix, and tr(·) denotes the trace of a matrix.
In addition, the average rotation error RE_all and translation error TE_all of all registration pairs, as well as the average rotation error RE_suc and translation error TE_suc of the successfully registered point cloud pairs, were recorded during the experiments; for ModelNet40, the mean absolute error (MAE) and root mean square error (RMSE) between the ground-truth and predicted transformations were also measured. The registration recall was computed on the 3DMatch dataset using 0.3 m and 15° as thresholds, and on the KITTI dataset using 0.6 m and 5° as thresholds with the confidence-score hard threshold set to 0.6.
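The following sketch computes these metrics for a set of registration pairs; the threshold values shown correspond to the 3DMatch setting mentioned above and are otherwise illustrative assumptions.

import numpy as np

def rotation_error_deg(R_gt, R_est):
    cos_val = np.clip((np.trace(R_gt.T @ R_est) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_val))          # RE in degrees

def translation_error(t_gt, t_est):
    return np.linalg.norm(t_gt - t_est)            # TE

def registration_recall(errors, re_thresh_deg=15.0, te_thresh_m=0.3):
    # errors: list of (RE in degrees, TE in metres) per registration pair
    ok = [re < re_thresh_deg and te < te_thresh_m for re, te in errors]
    return sum(ok) / max(len(ok), 1)               # RR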
Experiment 1: the 3DMatch dataset was used as an indoor scene assessment dataset to generate datasets for training and testing. 3DMatch is a set of 62 real world scenes, 46 scenes are used for training in the experimental process, 8 scenes are verified, 8 scenes are tested, and the point cloud overlapping in all scenes is greater than 30%.
Since each point cloud in 3DMatch may have a different number of points, M = N = 4096 points were randomly sampled from a voxelized point cloud with a voxel size of 5 cm. The network was trained with L set to 6 layers in both the encoder and the decoder and with the inlier threshold λ set to 0.12. During training, data augmentation was performed by applying a random rotation around a random axis within [-180°, 180°] and a random scaling from 0.8 to 1.2.
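The following sketch shows such an augmentation step (a random rotation about a random axis within [-180°, 180°] plus random scaling in [0.8, 1.2]); it uses SciPy's rotation utilities and is only an illustrative reading of the setting above.

import numpy as np
from scipy.spatial.transform import Rotation

def augment(points: np.ndarray) -> np.ndarray:
    # points: (N, 3) point cloud
    axis = np.random.randn(3)
    axis /= np.linalg.norm(axis)                     # random rotation axis
    angle = np.random.uniform(-np.pi, np.pi)         # [-180°, 180°]
    R = Rotation.from_rotvec(angle * axis).as_matrix()
    scale = np.random.uniform(0.8, 1.2)              # random scaling
    return scale * (points @ R.T)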
The embodiment of the present invention was compared with existing methods such as DGR (Deep Global Registration), PCAM (Product of Cross-Attention Matrices) and OMNet, all of which avoid RANSAC (Random Sample Consensus). For a fair comparison, DGR and PCAM use the same post-processing steps step by step, including filtering correspondences with low correspondence scores, employing optimization measures for fine-grained pose estimation, and using protection measures to preserve a fixed number of hypothetical correspondences. OMNet was trained on 3DMatch for 2000 epochs using 1024 random points and a batch size of 32.
Quantitative experimental results are shown in fig. 4, in which SACF-Net-Soft and SACF-Net-Sparse denote the embodiment of the present invention using the soft mapping and sparse mapping methods, respectively; both use the learnable jumping attention mechanism. Here ζ denotes the hard threshold on the confidence score, opt denotes the fine-grained pose adjustment in DGR, and saf denotes the registration safeguard measure in DGR.
As can be seen from fig. 4, the best results were obtained with SACF-Net-Sparse of the present invention, whose registration recall is 4.7 percentage points higher than that of the nearest competitor. In addition, SACF-Net-Sparse of the present invention achieves the smallest rotation and translation errors over all registration pairs and over the successfully registered pairs.
The visual registration results and the correspondence results between the point clouds are shown in fig. 5, where the top 512 correspondences with the highest confidence scores are visualized: (a) shows the overlapping, unregistered input point clouds (blue and yellow); (b) is the registration result under the ground-truth transformation; (c) is the FCGF (Fully-Convolutional Geometric Features) registration result; (d) is the DGR registration result; (e) is the PCAM registration result; (f) is the SACF-Net-Sparse registration result of the present invention; and (g) shows the top 256 matching point pairs with the highest confidence screened out by the method of the embodiment of the present invention.
By contrast, the embodiment of the present invention can automatically detect more discriminative correspondences in the overlapping region (such as the table, chair and sofa in the scene). The experimental results show that the jumping-attention-based mechanism can selectively transfer the information of the point cloud overlapping region from the encoder to the decoder, so that the decoder can screen out correct and discriminative matching point pairs.
Experiment 2: the KITTI dataset was used as an outdoor assessment dataset containing 21 outdoor scenes, where only the first 11 scenes recorded a rigid transformation of ground truth. Scenes 0 to 5 were used for training, scenes 6 to 7 were verified, and scenes 8 to 10 were tested during the experiment. A GPS-IMU is used to create a scan point cloud pair at least 10 meters apart and the true conversion is calculated by GPS and ICP. In the training process, the point clouds of all experiments were downsampled, the voxel size was 0.3 meters, the interior point threshold was set to 0.6, and m=n=2048 points were randomly sampled.
Embodiments of the present invention were compared with other algorithms, including FGR, FPFH, FCGF, DGR and PCAM. For a better comparison with PCAM and DGR, results were also reported using the same ICP refinement operation as theirs. The quantitative comparison results are shown in fig. 6. It can be seen that SACF-Net-Sparse and SACF-Net-Soft of the embodiments of the present invention outperform DGR and PCAM on all metrics without any additional operations. When combined with ICP, SACF-Net-Soft gives better results than the other methods on essentially all metrics. Like PCAM, the embodiments of the present invention achieve better results than DGR without using a hard threshold on the confidence scores or fine-grained pose adjustment.
The visualization of an embodiment of the present invention on the registration and correspondence filtering tasks on the KITTI dataset is shown in fig. 7, where (a) is the overlapped, unregistered input point clouds; (b) is the SACF-Net-Soft registration result of the embodiment of the present invention; and (c) is the top 128 matching point pairs with the highest confidence screened out by the embodiment of the present invention. The results show that the embodiment of the present invention detects more discriminative corresponding points in the overlapping area around the camera, thereby obtaining excellent registration results.
Experiment 3: to evaluate the composite object, a ModelNet40 dataset containing composite CAD models was used, containing 12311 CAD models of 40 categories. On this dataset, visible category, invisible category and noise experiments were performed, respectively. And randomly sampling the point cloud from the grid surface of the CAD model according to the data setting in the DCP, cutting and carrying out secondary sampling. The results obtained by the ModelNet40 on the test dataset settings with invisible objects, invisible categories, and invisible objects with noise are provided in FIG. 8. The model is first trained and tested on the same class of training and testing sets, and then the ModelNet40 is evenly divided into training and testing sets by class. The first 20 classes of models were trained and the remaining 20 invisible classes of test sets were tested.
The test results show that SACF-Net-Sparse of the embodiment of the present invention achieves better results than the other methods. In addition, the robustness of the model to noise, which is always present in real-world point clouds, was also evaluated. SACF-Net-Soft of the embodiment of the present invention achieves the best performance on all metrics. An example result on noisy data is shown in fig. 9, where (a) is the overlapped, unregistered input point clouds; (b) is the SACF-Net-Soft registration result of the present invention; and (c) is the top 128 matching point pairs with the highest confidence screened out by the method of the embodiment of the present invention.
In summary, the embodiment of the present invention can effectively complete both fully overlapping and partially overlapping point cloud registration tasks, and improves the efficiency of point cloud registration.
The method provided by the embodiment of the present invention can be applied to electronic equipment. Specifically, the electronic device may be a desktop computer, a portable computer, an intelligent mobile terminal, a server, or the like. Without limitation, any electronic device capable of implementing the present invention falls within the protection scope of the present invention.
The invention also provides a computer readable storage medium. The computer readable storage medium stores a computer program which, when executed by a processor, implements any of the method steps described above for the point cloud registration method based on the jumping attention mechanism.
Alternatively, the computer readable storage medium may be a Non-Volatile Memory (NVM), such as at least one disk Memory.
Optionally, the computer readable memory may also be at least one memory device located remotely from the aforementioned processor.
In a further embodiment of the invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method steps of any of the above described point cloud registration methods based on a jump attention mechanism is also provided.
It should be noted that, for the storage medium/computer program product embodiments, the description is relatively simple, as it is substantially similar to the method embodiments, and reference should be made to the description of the method embodiments for relevant points.
It should be noted that the terms "first," "second," and the like are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Further, one skilled in the art can engage and combine the different embodiments or examples described in this specification.
Although the invention is described herein in connection with various embodiments, other variations of the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings and the disclosure. In the description of the present invention, the word "comprising" does not exclude other elements or steps, "a" or "an" does not exclude a plurality, and "a plurality" means two or more, unless specifically defined otherwise. Moreover, the mere fact that some measures are described in mutually different embodiments does not mean that these measures cannot be combined to advantage.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, an apparatus (device), or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, which may all generally be referred to herein as a "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein. A computer program may be stored or distributed on a suitable medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (10)

1. A point cloud registration method based on a jumping attention mechanism, comprising:
acquiring an origin point cloud and a destination point cloud to be registered;
respectively inputting the origin point cloud and the destination point cloud into a pre-trained encoding and decoding network, and acquiring, through the encoding and decoding network, a point correspondence matrix of the origin point cloud and the destination point cloud and confidence weights corresponding one-to-one to initial matching point pairs; the initial matching point pairs are obtained based on the point correspondence matrix;
performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result;
wherein the codec network comprises an encoder and a decoder;
the encoder is configured to output L levels of point cloud interaction features and L attention matrices through L cascaded encoding units based on the origin point cloud and the destination point cloud; the point correspondence matrix is obtained by multiplying the L attention matrices point by point;
the decoder is configured to output, through L decoding units, the confidence weights corresponding one-to-one to the initial matching point pairs; the point features output by the 1st-stage decoding unit are obtained, through a jumping attention mechanism and a multi-layer perceptron, based on the point cloud interaction features output by the L-th-stage encoding unit; the point features output by the l-th-stage decoding unit are obtained based on the point cloud interaction features output by the (L-l+1)-th-stage encoding unit and the point features output by the (l-1)-th-stage decoding unit; and the confidence weights corresponding one-to-one to the initial matching point pairs are the point features output by the L-th-stage decoding unit.
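To make the data flow recited in claim 1 easier to follow, the PyTorch-style skeleton below is a minimal sketch of the encoder-decoder wiring only: L cascaded encoding units, the point-wise product of their attention matrices, and decoding units that read the encoder features in reverse order. The module internals (simple MLPs and a dot-product attention) are placeholders of our own and do not reproduce the embodiment's actual feature extraction, interaction, or jumping attention computations.

```python
import torch
import torch.nn as nn

class EncodeUnit(nn.Module):
    """Placeholder encoding unit: returns origin/destination features and an M x N attention matrix."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
    def forward(self, feat_p, feat_q):
        fp, fq = self.mlp(feat_p), self.mlp(feat_q)
        attn = torch.softmax(fp @ fq.transpose(-1, -2), dim=-1)
        return fp, fq, attn

class DecodeUnit(nn.Module):
    """Placeholder decoding unit: fuses skip features from the encoder with the previous decoder output."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
    def forward(self, skip_feat, prev_feat):
        return self.mlp(torch.cat([skip_feat, prev_feat], dim=-1))

class CodecSkeleton(nn.Module):
    def __init__(self, num_levels=6, dim=64):
        super().__init__()
        self.embed = nn.Linear(3, dim)
        self.encoders = nn.ModuleList([EncodeUnit(dim) for _ in range(num_levels)])
        self.decoders = nn.ModuleList([DecodeUnit(dim) for _ in range(num_levels)])
    def forward(self, points_p, points_q):
        fp, fq = self.embed(points_p), self.embed(points_q)
        feats, attns = [], []
        for enc in self.encoders:                  # L cascaded encoding units
            fp, fq, a = enc(fp, fq)
            feats.append(fp)
            attns.append(a)
        corr = torch.stack(attns).prod(dim=0)      # point-wise product of the L attention matrices
        dec = feats[-1]                            # 1st decoding unit starts from the L-th encoder output
        for l, d in enumerate(self.decoders):      # l-th decoder reads the (L-l+1)-th encoder's features
            dec = d(feats[-(l + 1)], dec)
        weights = torch.sigmoid(dec.mean(dim=-1))  # per-point confidence weights
        return corr, weights

# example: corr, w = CodecSkeleton()(torch.rand(4096, 3), torch.rand(4096, 3))
```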
2. The point cloud registration method based on a jumping attention mechanism according to claim 1, wherein the encoding unit includes: the device comprises a feature extraction module, a feature interaction module and an attention module;
the feature extraction module is used for respectively extracting origin point cloud features and destination point cloud features;
the attention module is used for calculating an attention matrix based on the origin cloud characteristics and the destination point cloud characteristics;
and the feature interaction module is used for calculating point cloud interaction features based on the origin cloud features and the destination point cloud features.
3. The point cloud registration method based on a jumping attention mechanism of claim 2, wherein the feature extraction module is:
c^l : R^(O × d_(l-1)) → R^(O × (d_l/2)),
wherein c^l represents the feature extraction module of the l-th-stage encoding unit, the input feature of c^l is a matrix of dimension O × d_(l-1), the output feature of c^l is a matrix of dimension O × (d_l/2), O is the number of points of the origin point cloud or the destination point cloud, and d_(l-1) is the column (feature) dimension of the input feature of c^l.
4. The point cloud registration method based on a jumping attention mechanism of claim 2, wherein the attention module calculates an attention matrix based on the origin point cloud features and the destination point cloud features, comprising:
calculating a first attention matrix from the origin point cloud to the destination point cloud, and calculating a second attention matrix from the destination point cloud to the origin point cloud, according to one of two alternative formulas (the formulas are provided as images in the original publication);
wherein the (i, j)-th entry of the first attention matrix is the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, the (i, j)-th entry of the second attention matrix is the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud, the scores are computed from the feature of the i-th point of the origin point cloud and the feature of the j-th point of the destination point cloud, the superscript T denotes the matrix transpose, and ||·||_2 denotes the two-norm operation.
5. The point cloud registration method based on the jumping attention mechanism of claim 2, wherein the feature interaction module calculates the point cloud interaction features based on the origin point cloud features and the destination point cloud features, comprising:
calculating an origin point cloud global feature and a destination point cloud global feature;
calculating an origin point cloud global interaction embedding and a destination point cloud global interaction embedding;
and calculating the point cloud interaction features according to one of two alternative formulas (the formulas are provided as images in the original publication);
wherein maxpool(·) denotes maximum pooling, avgpool(·) denotes average pooling, cat(·) denotes concatenation, a multi-layer perceptron in the encoding unit is applied to the pooled features, the superscript T denotes the matrix transpose, and α_l and β_l are learnable parameters.
6. The point cloud registration method based on a jumping attention mechanism of claim 1, wherein the manner of obtaining the initial matching point pair based on the point correspondence matrix includes:
and obtaining the initial matching point pair by adopting a sparse mapping method or a soft mapping method based on the point corresponding matrix.
7. The point cloud registration method based on a jumping attention mechanism according to claim 1, wherein the decoding unit is specifically configured to:
and searching, according to the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, for matching point pairs whose local structures are similar in both the origin point cloud and the destination point cloud by using the jumping attention mechanism, thereby outputting point features in which matching point pairs with low confidence are filtered out.
8. The point cloud registration method based on a jumping attention mechanism of claim 7, wherein the jumping attention mechanism comprises: a learnable jumping attention mechanism or a jumping attention mechanism based on cosine similarity.
9. The point cloud registration method based on the jumping attention mechanism as set forth in claim 8, wherein the decoding unit, using the learnable jumping attention mechanism and based on the point cloud interaction features from the encoding unit and the point features output by the previous-stage decoding unit, searches for matching point pairs whose local structures are similar in both the origin point cloud and the destination point cloud, thereby outputting point features in which matching point pairs with low confidence are filtered out, according to one of two alternative groups of formulas (the formulas are provided as images in the original publication);
wherein the point cloud interaction features input to the l-th-stage decoding unit come from the (L-l+1)-th-stage encoding unit, the point features input to the l-th-stage decoding unit are those output by the (l-1)-th-stage decoding unit, P corresponds to the origin point cloud and Q corresponds to the destination point cloud, the multi-layer perceptrons involved have learnable parameters, b_ji denotes the attention score of the i-th point of the origin point cloud and the j-th point of the destination point cloud, b_ij denotes the attention score of the i-th point of the destination point cloud and the j-th point of the origin point cloud, M is the number of points of the origin point cloud, N is the number of points of the destination point cloud, and ω_h, ω_t and ω_v are learnable parameters of the multi-layer perceptrons.
10. The point cloud registration method based on a jumping attention mechanism according to claim 1, wherein performing rigid transformation estimation based on the initial matching point pairs and the confidence weights to obtain a point cloud registration result comprises:
based on the initial matching point pairs and the confidence weights, solving the rigid transformation by using a preset rigid transformation estimation formula to obtain the point cloud registration result;
the rigid transformation estimation formula takes one of two alternative forms (the formulas are provided as images in the original publication);
wherein (R, t) denotes the rigid transformation; for the i-th point p_i of the origin point cloud, the corresponding point is the point in the destination point cloud having an initial correspondence with p_i, and w_{p_i} is the confidence weight corresponding to that point; for the j-th point q_j of the destination point cloud, the corresponding point is the point in the origin point cloud having an initial correspondence with q_j, and w_{q_j} is the confidence weight corresponding to that point; M is the number of points of the origin point cloud and N is the number of points of the destination point cloud; ||·||_2 denotes the two-norm operation; Π denotes a continued multiplication operation; and R_est and t_est are respectively the rotation and translation in Euclidean space of the solved rigid transformation.
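The weighted rigid transformation solve referenced in claim 10 is commonly carried out in closed form with a confidence-weighted Kabsch/SVD procedure; the sketch below illustrates that generic procedure only, since the claim's exact formula is provided as an image in the original publication, and the function name is an assumption.

```python
import numpy as np

def weighted_rigid_transform(src, dst, w):
    """Estimate (R, t) minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2 (weighted Kabsch)."""
    w = w / (w.sum() + 1e-12)
    src_c = (w[:, None] * src).sum(axis=0)               # weighted centroids
    dst_c = (w[:, None] * dst).sum(axis=0)
    H = (w[:, None] * (src - src_c)).T @ (dst - dst_c)   # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ S @ U.T
    t = dst_c - R @ src_c
    return R, t

# usage: R_est, t_est = weighted_rigid_transform(origin_points, matched_destination_points, confidence_weights)
```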
CN202310094361.6A 2023-02-08 2023-02-08 Point cloud registration method based on jumping attention mechanism Pending CN116128941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310094361.6A CN116128941A (en) 2023-02-08 2023-02-08 Point cloud registration method based on jumping attention mechanism

Publications (1)

Publication Number Publication Date
CN116128941A true CN116128941A (en) 2023-05-16

Family

ID=86307950

Country Status (1)

Country Link
CN (1) CN116128941A (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3693922A1 (en) * 2019-02-11 2020-08-12 Siemens Aktiengesellschaft An apparatus and a method for performing a data driven pairwise registration of three-dimensional point clouds
US20210350245A1 (en) * 2020-05-11 2021-11-11 Research & Business Foundation Sungkyunkwan University Point autoencoder, dual autoencoder, and dimensional transformation method of point cloud using same
US20220164566A1 (en) * 2020-11-20 2022-05-26 Shenzhen Deeproute.Ai Co., Ltd Methods for encoding point cloud feature
CN112837356A (en) * 2021-02-06 2021-05-25 湖南大学 WGAN-based unsupervised multi-view three-dimensional point cloud joint registration method
EP4047565A1 (en) * 2021-02-19 2022-08-24 Teraki GmbH Low level sensor fusion based on lightweight semantic segmentation of 3d point clouds
CN113139996A (en) * 2021-05-06 2021-07-20 南京大学 Point cloud registration method and system based on three-dimensional point cloud geometric feature learning
US20220414821A1 (en) * 2021-06-29 2022-12-29 The Regents Of The University Of Michigan Systems and methods for point cloud registration
CN113724340A (en) * 2021-07-09 2021-11-30 北京工业大学 Guiding type face image editing method and system based on jump connection attention
CN114332175A (en) * 2021-12-16 2022-04-12 广东工业大学 Attention mechanism-based low-overlap 3D dynamic point cloud registration method and system
CN114627170A (en) * 2022-03-11 2022-06-14 平安科技(深圳)有限公司 Three-dimensional point cloud registration method and device, computer equipment and storage medium
CN114708315A (en) * 2022-04-15 2022-07-05 云南大学 Point cloud registration method and system based on depth virtual corresponding point generation
CN115186804A (en) * 2022-08-04 2022-10-14 吉林大学 Encoder-decoder network structure and point cloud data classification and segmentation method adopting same
CN115375910A (en) * 2022-09-14 2022-11-22 清华大学 Point cloud feature extraction method and device based on attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUE WU ET AL.: "SACF-Net: Skip-Attention Based Correspondence Filtering Network for Point Cloud Registration", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》, pages 1 - 11 *
ZHENGHUA ZHANG ET AL.: "DDRNet: Fast point cloud registration network for large-scale scenes", 《ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING》, vol. 175, pages 184 - 198 *
HAI Linqi et al.: "Robust registration of Terracotta Warrior point clouds based on a dynamic graph attention mechanism", 《Optics and Precision Engineering》, vol. 30, no. 24, pages 3210 - 3224 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination