CN114066987B - Camera pose estimation method, device, equipment and storage medium - Google Patents

Camera pose estimation method, device, equipment and storage medium

Info

Publication number
CN114066987B
CN114066987B CN202210029300.7A
Authority
CN
China
Prior art keywords
image block
matching
images
frames
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210029300.7A
Other languages
Chinese (zh)
Other versions
CN114066987A (en)
Inventor
程飞洋
刘国清
杨广
王启程
郑伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Youjia Innovation Technology Co.,Ltd.
Original Assignee
Shenzhen Minieye Innovation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Minieye Innovation Technology Co Ltd filed Critical Shenzhen Minieye Innovation Technology Co Ltd
Priority to CN202210029300.7A priority Critical patent/CN114066987B/en
Publication of CN114066987A publication Critical patent/CN114066987A/en
Application granted granted Critical
Publication of CN114066987B publication Critical patent/CN114066987B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Abstract

The invention discloses a camera pose estimation method, apparatus, device and storage medium. Two adjacent frames of images are acquired and the feature points in the two frames are extracted, so that the two frames are divided into image blocks and the image block sequence corresponding to each frame is generated and extracted; the image block sequences are input into a trained neural network model so that the neural network model outputs an estimated basic matrix, and the basic matrix is decomposed to obtain the camera pose estimation result. Compared with the prior art, pose solving is carried out by a trained neural network model, the camera pose solving problem is unified into a single joint optimization problem, the complex calculation flow of traditional methods is avoided, the computational cost of the model is reduced, the efficiency of camera pose acquisition is improved, and the method has strong practicability.

Description

Camera pose estimation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer vision technologies, and in particular, to a method, an apparatus, a device, and a storage medium for estimating a camera pose.
Background
Camera pose estimation refers to calculating the motion parameters of a camera. Existing approaches include vision-based methods that rely only on sequences of images, fusion methods that rely on lidar point clouds, and direct measurement methods that rely on high-precision pose sensors, among others. Traditional vision-based methods suffer from problems such as feature points being difficult to define, feature point detection lacking repeatability, feature point matching requiring strongly distinctive features, and a tendency to produce many mismatches.
In the prior art, deep-learning-based methods have made some progress in feature point detection, feature point descriptor learning, feature point screening and pose optimization, but these methods still follow the traditional camera pose estimation pipeline: they only improve the accuracy of one or a few individual steps and fail to treat camera pose solving as a single, joint optimization problem.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a camera pose estimation method, apparatus, device and storage medium in which pose solving is carried out by a trained neural network model, the camera pose solving problem is unified into a joint optimization problem, the complex calculation flow of traditional methods is avoided, the computational cost of the model is reduced, the efficiency of camera pose acquisition is improved, and strong practicability is achieved.
In order to solve the above technical problem, the present invention provides a camera pose estimation method, including:
acquiring two adjacent frames of images, and performing feature point detection on the two frames of images to generate feature point sets corresponding to the two frames of images;
respectively carrying out image block division on the two frames of images by taking each feature point in the feature point set as a center to generate image block sequences corresponding to the two frames of images;
inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result;
the neural network model acquires a feature vector of each image block in the image block sequence through a neural network encoder, acquires a position code of each image block according to a preset position coding rule, and acquires a coding feature of each image block by combining the feature vector and the position code;
inputting the coding features into a global coding module to obtain global coding features of each image block;
and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
Further, the optimization solution is performed on the basis matrix according to the matching point pairs and the matching probability, and specifically includes:
constructing a linear equation according to the matching point pairs to generate a linear equation set;
and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
Further, the coding features are input into a global coding module to obtain global coding features of each image block; and finally, inputting the global coding characteristics into a matching classifier, acquiring a matching image block index corresponding to each image block, and obtaining a matching point pair and a matching probability between the two frames of images, wherein the specific steps are as follows:
acquiring coding features of image block sequences corresponding to the two frames of images, and inputting the coding features of each image block into the global coding module so that the coding features of each image block traverse the coding features of all the image blocks to acquire the global coding features corresponding to each image block;
and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
Further, the present invention provides a camera pose estimation device, including: the system comprises an extraction module, a camera pose acquisition module and a neural network training module;
the extraction module is used for acquiring two adjacent frames of images, detecting feature points of the two frames of images, generating feature point sets corresponding to the two frames of images, and respectively dividing image blocks of the two frames of images by taking each feature point in the feature point sets as a center to generate image block sequences corresponding to the two frames of images;
the camera pose acquisition module is used for inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result;
the neural network training module is used for pre-training the neural network model, acquiring a feature vector of each image block in the image block sequence through a neural network encoder, acquiring a position code of each image block according to a preset position coding rule, and acquiring a coding feature of each image block by combining the feature vector and the position code;
inputting the coding features into a global coding module to obtain global coding features of each image block;
and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
Further, the neural network training module is configured to perform optimization solution on a basis matrix according to the matching point pairs and the matching probability, specifically:
constructing a linear equation according to the matching point pairs to generate a linear equation set;
and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
Further, the neural network training module is used for inputting the coding features into a global coding module to obtain global coding features of each image block; and finally, inputting the global coding characteristics into a matching classifier, acquiring a matching image block index corresponding to each image block, and obtaining a matching point pair and a matching probability between the two frames of images, wherein the specific steps are as follows: acquiring coding features of image block sequences corresponding to the two frames of images, and inputting the coding features of each image block into the global coding module so that the coding features of each image block traverse the coding features of all the image blocks to acquire the global coding features corresponding to each image block;
and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
Further, the present invention also provides a terminal device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the camera pose estimation method as described in any one of the above when executing the computer program.
Further, the present invention also provides a computer-readable storage medium including a stored computer program, wherein when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the camera pose estimation method according to any one of the above.
Compared with the prior art, the camera pose estimation method, the camera pose estimation device, the camera pose estimation equipment and the storage medium have the following beneficial effects:
the image blocks of the two acquired frames of images are divided, the image block sequence is used as the input of the pre-trained neural network model, the problem that the calculated amount of the whole image is too large when the whole image is coded is avoided, the lightweight of the model can be ensured, the pose solving is directly carried out on the basis of the estimated basic matrix output by the trained neural network model, the camera pose solving problem is unified into a joint optimization problem, and the complex calculating process in the traditional method is avoided. And the neural network model is pre-trained, based on the integrated neural network encoder and the matching classifier, a large amount of image data can be used for training the neural network model, the convergence of model training is ensured, an optimized basic matrix is output, and the accuracy of subsequent camera pose solving is improved. Compared with the prior art, the method has the advantages that the position and pose solution is carried out by training a neural network model, the camera position and pose solution problems are unified into a joint optimization problem, the complex calculation process in the traditional method is avoided, the calculation amount of the model is reduced, the acquisition efficiency of the camera position and pose is improved, and the method has strong practicability.
Drawings
Fig. 1 is a schematic flowchart of an embodiment of a camera pose estimation method provided in the present invention;
FIG. 2 is a schematic structural diagram of a neural network model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an embodiment of a camera pose estimation apparatus provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, fig. 1 is a schematic flowchart of an embodiment of a camera pose estimation method provided by the present invention, and as shown in fig. 1, the method includes steps 101 to 102, specifically as follows:
step 101: acquiring two adjacent frames of images, and extracting feature points in the two frames of images to divide the two frames of images into image blocks, and generating and extracting an image block sequence corresponding to each frame of image.
In this embodiment, two adjacent frames of images captured by a camera are obtained, and feature point detection is performed on the two frames. Specifically, any existing feature point detection method can be adopted to detect feature points in the two acquired frames, including but not limited to traditional FAST and SIFT feature point detection and deep-learning-based feature point detection; in this embodiment, several of these feature point detection methods may also be applied to the same frame image to obtain more feature points, thereby increasing the probability that a detected feature point has a matching feature point in the other frame.
In this embodiment, after feature point detection is performed, the feature point sets corresponding to the two frames of images are generated. Specifically, the two frames comprise a first frame image and a second frame image. After the feature points of the two frames are detected by the above feature point detection method, the feature points detected in the first frame image are collected and recorded as the feature point set P = (p1, p2, ..., pN1) of the first frame image, containing N1 feature points in total; similarly, the feature points detected in the second frame image are collected and recorded as the feature point set Q = (q1, q2, ..., qN2) of the second frame image, containing N2 feature points in total.
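As an illustration of this step, the following minimal sketch detects feature points in the two frames with OpenCV's FAST and SIFT detectors and collects their pixel coordinates into the sets P and Q; the file names and the particular combination of detectors are assumptions for illustration, not fixed by this embodiment.

```python
import cv2

def detect_keypoints(img_gray):
    """Detect feature points with FAST and SIFT and merge the results."""
    fast = cv2.FastFeatureDetector_create()
    sift = cv2.SIFT_create()
    keypoints = list(fast.detect(img_gray, None)) + list(sift.detect(img_gray, None))
    # Only the pixel coordinates are needed; the image blocks are cropped around them later.
    return [kp.pt for kp in keypoints]

frame1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file names
frame2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
P = detect_keypoints(frame1)  # feature point set of the first frame
Q = detect_keypoints(frame2)  # feature point set of the second frame
```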
In this embodiment, image block division is performed on the two frames of images with each feature point in the feature point sets as a center, thereby generating the image block sequences corresponding to the two frames, as sketched below. Specifically, taking each feature point in the feature point set P of the first frame image and in the feature point set Q of the second frame image as a center, image blocks of the same size are cropped from the corresponding frame image; as an example, this embodiment crops 16 × 16 image blocks. Sorting and aggregating the image blocks cropped from each frame yields the image block sequence of the first frame image and the image block sequence of the second frame image, where each sequence contains all the image blocks partitioned from that frame.
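The 16 × 16 block division around each feature point can be sketched as follows; crop_patches is a hypothetical helper, and the clamping of points near the image border is an assumption rather than something specified by this embodiment.

```python
import numpy as np

def crop_patches(img_gray, points, size=16):
    """Crop a size x size image block centered on each feature point."""
    h, w = img_gray.shape
    half = size // 2
    patches = []
    for (x, y) in points:
        x, y = int(round(x)), int(round(y))
        # Clamp the center so the block stays fully inside the image (assumed border handling).
        x = min(max(x, half), w - half)
        y = min(max(y, half), h - half)
        patches.append(img_gray[y - half:y + half, x - half:x + half])
    return np.stack(patches)  # shape: (number of feature points, 16, 16)

patches1 = crop_patches(np.asarray(frame1), P)  # image block sequence of the first frame
patches2 = crop_patches(np.asarray(frame2), Q)  # image block sequence of the second frame
```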
In this embodiment, all image blocks in the image block sequence corresponding to the acquired first frame image and second frame image are used as input data of a neural network model for subsequent pre-training.
Step 102: and inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result.
In this embodiment, the image block sequences corresponding to the first frame image and the second frame image acquired in step 101 are input into the trained neural network model, and the neural network model outputs an estimated basic matrix (i.e., the fundamental matrix); this matrix encodes the camera pose between the two acquired frames, namely the camera rotation matrix and the displacement. The output basic matrix is then decomposed to obtain the camera pose estimation result.
In this embodiment, using the image block sequences as the input of the neural network model avoids encoding the whole image, which greatly reduces the computational cost of the model, allows the model to run on an embedded platform, and therefore gives the method high practicability.
In this embodiment, a relatively mature method may be directly adopted for decomposing the basic matrix; for example, the essential matrix can be computed from the basic matrix and the camera intrinsics, and the essential matrix can then be decomposed with the decomposeEssentialMat function of OpenCV to solve the pose parameters of the camera.
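A minimal sketch of this decomposition with OpenCV is given below; it assumes the basic matrix F and the camera intrinsic matrix K are already available, and the final disambiguation of the returned pose candidates is only indicated in a comment.

```python
import cv2
import numpy as np

def pose_from_fundamental(F, K):
    """Recover rotation/translation candidates from a 3x3 basic (fundamental) matrix."""
    E = K.T @ F @ K                      # essential matrix from the fundamental matrix
    R1, R2, t = cv2.decomposeEssentialMat(E)
    # decomposeEssentialMat returns two rotation candidates and a translation direction;
    # the physically valid (R, t) is normally selected by a cheirality check on point pairs.
    return R1, R2, t
```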
As a preferred scheme in this embodiment, referring to fig. 2, which is a schematic structural diagram of the neural network model: a neural network encoder, a Transformer global encoding module, a matching classifier and a basic matrix calculation module are integrated in the pre-trained neural network model, so that the trained neural network model performs feature point matching on the two frames of images, calculates the feature point matching probabilities and solves the basic matrix. Direct, end-to-end calculation of the camera pose between two frames is thus realized, and the complex calculation flow of conventional methods is avoided.
In this embodiment, the neural network encoder, the Transformer global encoding module and the matching classifier designed in the neural network model need to be pre-trained. First, the feature vector of each image block in the image block sequence is acquired through the neural network encoder. Specifically, each image block in the image block sequences of the two frames, extracted according to the method in step 101, is taken as an input of the neural network model, and each image block is encoded into a d-dimensional feature vector by the same neural network encoder; the two frames therefore produce N = N1 + N2 feature vectors in total, where d is a preset number of dimensions, for example d = 128, i.e., each image block is encoded into a 128-dimensional feature vector.
In this embodiment, the position code of each image block is further obtained according to a preset position coding rule, and the coding feature of each image block is obtained by combining the feature vector and the position code. Specifically, the preset position coding rule generates the corresponding N position codes for the one-dimensional coordinates [0, 1, ..., N-1]; the position code is defined as:

PE(t, 2i) = sin(t / 10000^(2i/d)),  PE(t, 2i+1) = cos(t / 10000^(2i/d)),

where d is the input feature dimension, PE(t, ·) is the position code of the t-th input, and i is the index of the feature dimension.
The N position codes are added to the feature vectors of the N input image blocks respectively to serve as the coding feature of each image block, so that each coding feature contains the index information of its image block, and together the coding features constitute the input of the subsequent Transformer global coding module.
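A minimal sketch of this position coding and feature combination, assuming the standard sinusoidal form reproduced above:

```python
import numpy as np

def positional_encoding(num_positions, d):
    """Sinusoidal position codes for inputs t = 0 .. num_positions-1 (assumed standard form)."""
    pe = np.zeros((num_positions, d))
    t = np.arange(num_positions)[:, None]     # index of the input image block
    i = np.arange(d // 2)[None, :]             # index of the feature dimension
    angle = t / np.power(10000.0, 2.0 * i / d)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

d = 128
block_features = np.random.randn(300, d)       # stand-in for encoder outputs of N = 300 blocks
coding_features = block_features + positional_encoding(300, d)  # coding feature of each block
```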
In this embodiment, after the coding features of all image blocks in the two image block sequences are input into the Transformer global coding module, the module propagates global information between the image block sequences of the two frames through the multi-head attention mechanism, so that the global coding feature finally output for each image block's input contains the required global information. The number of input coding features and the number of output global coding features of the Transformer global coding module are both equal to the number N of image blocks divided from the two frames, i.e., each image block corresponds to one global coding feature.
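A minimal sketch of this global coding step, assuming a standard PyTorch Transformer encoder stands in for the global coding module (the layer count and head count are illustrative, not taken from this embodiment):

```python
import torch
import torch.nn as nn

d = 128
global_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=4,
)

# Coding features of all N = N1 + N2 image blocks from both frames, as one sequence.
coding_features = torch.randn(1, 300, d)                 # (batch, N, d)
global_features = global_encoder(coding_features)        # one global coding feature per block
```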
In this embodiment, the global coding features output by the Transformer global coding module are input into the matching classifier to obtain the matching image block index corresponding to each image block, and thus the matching point pairs and matching probabilities between the two frames of images. Specifically, the global coding feature of each image block output by the Transformer global coding module is taken as the input of the matching classifier, and the classifier computes a matching probability vector from it; this vector expresses the degree of matching between the feature point represented by that global coding feature and the candidate feature points of the other frame, the feature point with the largest matching degree is taken as the best match, and the two matched feature points form a matching point pair between the two frames. As an example in this embodiment, for the first frame image, each feature point corresponds to an (N2 + 1)-dimensional classification probability, representing the probabilities that this feature point matches each of the N2 feature points in the second frame image together with the probability that it has no matching point. Similarly, for the second frame image, each feature point corresponds to an (N1 + 1)-dimensional classification probability, representing the probabilities that this feature point matches each of the N1 feature points in the first frame image together with the probability that it has no matching point.
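A minimal sketch of this matching step is given below; this embodiment does not spell out the internal form of the matching classifier, so the (N2 + 1)-way classification for each first-frame block is illustrated here, as an assumption, by a softmax over similarities to the N2 second-frame blocks plus a learned "no match" score.

```python
import torch
import torch.nn as nn

d, N1, N2 = 128, 150, 170
g1 = torch.randn(N1, d)                  # global coding features of the first-frame blocks
g2 = torch.randn(N2, d)                  # global coding features of the second-frame blocks
no_match = nn.Parameter(torch.zeros(d))  # learned embedding representing "no matching point"

# (N1, N2+1): similarity to every second-frame block, plus the "no match" score.
scores = torch.cat([g1 @ g2.T, (g1 * no_match).sum(-1, keepdim=True)], dim=1)
probs = scores.softmax(dim=-1)                       # matching probability vectors
match_prob, match_idx = probs[:, :N2].max(dim=-1)    # matching image block index + probability
```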
As a preferred scheme in this embodiment, the matching point pairs required for pre-training can be obtained by directly matching SIFT feature descriptors; this helps the parameters of the pre-trained neural network encoder, Transformer module and matching classifier converge to the vicinity of their optimal values and reduces the time and difficulty of the subsequent supervised training of the neural network model. In this pre-training process no data labeling of the images or of the divided image blocks is needed, so a large amount of image data can be used for training directly and there is no data bottleneck, which improves the output accuracy of the neural network model and ensures the convergence speed of the subsequent joint optimization training of the model.
In this embodiment, the basic matrix is optimized and solved according to the matching point pairs. Specifically, based on the obtained matching point pairs of the first frame image and the second frame image, the basic matrix F is related to the camera pose parameters through the epipolar constraint and satisfies:

q_i^T F p_i = 0,
F = K^(-T) [t]_x R K^(-1),

where K is the known camera intrinsic matrix, R is the camera rotation matrix in the camera pose parameters, t is the displacement in the camera pose parameters ([t]_x denotes its skew-symmetric cross-product matrix), p_i is a matching point in the first frame image, q_i is a matching point in the second frame image, and (p_i, q_i) is a matching point pair of the two frames. Since these equations describe the relationship between the basic matrix F and the camera pose parameters R and t, estimating the camera pose thus reduces to the problem of solving the basic matrix F from the matched feature point pairs, as sketched below.
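The relationship F = K^(-T) [t]_x R K^(-1) can be sketched numerically as follows; the intrinsic matrix, rotation and displacement below are illustrative stand-in values only.

```python
import numpy as np

def skew(t):
    """Skew-symmetric cross-product matrix [t]_x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

K = np.array([[800.0, 0.0, 320.0],      # stand-in camera intrinsic matrix
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                            # stand-in rotation between the two frames
t = np.array([0.1, 0.0, 1.0])            # stand-in displacement

F = np.linalg.inv(K).T @ skew(t) @ R @ np.linalg.inv(K)
# A true matching pair (p, q) in homogeneous pixel coordinates then satisfies q.T @ F @ p ≈ 0.
```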
In this embodiment, F is a 3 × 3 matrix with 8 degrees of freedom (it is only defined up to scale), so at least 8 matching point pairs are needed to solve for F; however, because the basic matrix F is solved from the matched feature point pairs, possible errors in the matching point pairs mean that the solved basic matrix F is not accurate enough. Therefore, in this embodiment an end-to-end neural network model is constructed through the basic matrix calculation module, and the matching point pairs, the matching probabilities and the solved basic matrix F are treated as one joint optimization process for joint optimization training. The training ground truth of the basic matrix F can be obtained through a high-precision pose sensor or through globally optimized structure-from-motion software, which is relatively easy to acquire and avoids the difficulty of obtaining ground-truth feature point matches.
In this embodiment, based on the obtained matching point pairs and their matching probabilities, the optimal solution of F can be obtained through the basic matrix calculation module. Specifically, the parameters of the 3 × 3 basic matrix F are expanded into a parameter vector θ = (f11, f12, f13, f21, f22, f23, f31, f32, f33)^T, and a matching point pair p = (u', v') in the first frame and q = (u, v) in the second frame satisfies the linear equation

(u·u', u·v', u, v·u', v·v', v, u', v', 1) · θ = 0.

The N matching point pairs therefore give N such linear equations, which can be assembled into the weighted linear system

W A θ = 0,

where A is the N × 9 coefficient matrix whose rows are the vectors above and W is a diagonal matrix in which each diagonal element is the matching probability of the corresponding matching point pair. The optimal solution of this linear system is taken as the optimal solution of the basic matrix F, and it can be obtained through a singular value decomposition algorithm (the right singular vector of W·A associated with the smallest singular value, under the constraint ||θ|| = 1), as sketched below.
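A minimal sketch of this weighted linear solve, assuming pts1 and pts2 hold the matched pixel coordinates of the first and second frame and w holds the corresponding matching probabilities:

```python
import numpy as np

def solve_basic_matrix(pts1, pts2, w):
    """Weighted least-squares solve of the basic matrix from matched points."""
    u_p, v_p = pts1[:, 0], pts1[:, 1]                  # (u', v') in the first frame
    u, v = pts2[:, 0], pts2[:, 1]                      # (u, v) in the second frame
    A = np.stack([u * u_p, u * v_p, u,
                  v * u_p, v * v_p, v,
                  u_p, v_p, np.ones_like(u)], axis=1)  # N x 9 coefficient matrix
    WA = w[:, None] * A                                # weight each equation by its matching probability
    _, _, Vt = np.linalg.svd(WA)
    theta = Vt[-1]                                     # right singular vector of the smallest singular value
    return theta.reshape(3, 3)
```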
As a preferred scheme in this embodiment, since the neural network encoder, the Transformer global coding module, the matching classifier and the basic matrix calculation module in the entire neural network model are all differentiable, all parameters can be jointly optimized with a gradient-descent based optimization algorithm, i.e., supervised training is performed with the real pose. Specifically, the loss function used to train and evaluate the accuracy of the basic matrix F is the symmetric epipolar distance, defined as

L(F) = Σ_(p,q) (q^T F p)^2 · ( 1 / ((F p)_1^2 + (F p)_2^2) + 1 / ((F^T q)_1^2 + (F^T q)_2^2) ),

where p and q are virtual matching point pairs generated from the real basic matrix F_gt, i.e., pairs that satisfy the real epipolar geometric constraint

q^T F_gt p = 0.
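A minimal sketch of this loss, assuming p and q are homogeneous coordinate tensors of shape (N, 3) and F is the predicted basic matrix (the small eps term is an assumption added for numerical stability):

```python
import torch

def symmetric_epipolar_loss(F, p, q, eps=1e-8):
    """Mean symmetric epipolar distance of the pairs (p, q) with respect to F."""
    Fp = p @ F.T                # rows are (F p_i)^T: epipolar lines in the second image
    Ftq = q @ F                 # rows are (F^T q_i)^T: epipolar lines in the first image
    algebraic = (q * Fp).sum(dim=1)                  # q_i^T F p_i for each pair
    d = algebraic ** 2 * (1.0 / (Fp[:, 0] ** 2 + Fp[:, 1] ** 2 + eps)
                          + 1.0 / (Ftq[:, 0] ** 2 + Ftq[:, 1] ** 2 + eps))
    return d.mean()
```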
referring to fig. 3, fig. 3 is a schematic structural diagram of an embodiment of the camera pose estimation apparatus provided by the present invention, as shown in fig. 3, the apparatus includes an extraction module 301, a camera pose acquisition module 302, and a neural network training module 3021, which are as follows:
an extraction module 301, configured to acquire two adjacent frames of images, and extract feature points in the two frames of images, so as to perform image block division on the two frames of images, generate and extract an image block sequence corresponding to each frame of image;
a camera pose acquisition module 302, configured to input the image block sequence into a trained neural network model, so that the neural network model outputs an estimated basic matrix, and decompose the basic matrix to obtain a camera pose estimation result;
the neural network training module 3021 is configured to pre-train the neural network model, obtain a feature vector of each image block in the image block sequence through a neural network encoder, obtain a position code of each image block according to a preset position coding rule, and obtain a coding feature of each image block by combining the feature vector and the position code; inputting the coding features into a global coding module to obtain global coding features of each image block; and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
In this embodiment, the extraction module 301 is configured to obtain two adjacent frames of images, and extract feature points in the two frames of images, so as to perform image block division on the two frames of images, generate and extract an image block sequence corresponding to each frame of image; specifically, two adjacent frames of images are obtained, and feature point detection is performed on the two frames of images, so that feature point sets corresponding to the two frames of images are generated; and respectively carrying out image block division on the two frames of images by taking each feature point in the feature point set as a center to generate image block sequences corresponding to the two frames of images.
In this embodiment, the neural network training module 3021 is configured to perform optimization solution on the basis matrix according to the matching point pairs and the matching probabilities; specifically, a linear equation is constructed according to the matching point pairs, and a linear equation set is generated; and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
In this embodiment, the neural network training module 3021 is configured to input the coding features into the global coding module to obtain global coding features of each image block; finally, inputting the global coding characteristics into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images; specifically, the encoding features of the image block sequences corresponding to the two frames of images are obtained, and the encoding features of each image block are input into the global encoding module, so that the encoding features of each image block traverse the encoding features of all the image blocks, and the global encoding features corresponding to each image block are obtained; and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
It should be noted that the above-mentioned embodiments of the camera pose estimation apparatus are merely illustrative, where the modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical units, that is, may be located in one place, or may also be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
On the basis of the above-described embodiment of the camera pose estimation method, another embodiment of the present invention provides a camera pose estimation terminal device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the camera pose estimation method according to any one of the embodiments of the present invention when executing the computer program.
Illustratively, the computer program may be partitioned in this embodiment into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of instruction segments of a computer program capable of performing a specific function, the instruction segments describing an execution process of the computer program in the camera pose estimation terminal device.
The camera pose estimation terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer and a cloud server. The camera pose estimation terminal apparatus can include, but is not limited to, a processor, a memory.
The Processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the camera pose estimation terminal device and connects the various parts of the entire device through various interfaces and lines.
The memory may be configured to store the computer program and/or the modules, and the processor implements the various functions of the camera pose estimation terminal device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the device, and the like. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card, at least one magnetic disk storage device, a Flash memory device, or another non-volatile solid-state storage device.
On the basis of the above-described embodiment of the camera pose estimation method, another embodiment of the present invention provides a storage medium including a stored computer program, wherein when the computer program runs, an apparatus on which the storage medium is controlled performs the camera pose estimation method according to any one of the embodiments of the present invention.
In this embodiment, the storage medium is a computer-readable storage medium, and the computer program includes computer program code, which may be in source code form, object code form, executable file or some intermediate form, and so on. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
In summary, according to the camera pose estimation method, apparatus, device and storage medium provided by the present invention, the two adjacent frames of images are obtained, and the feature points in the two frames of images are extracted, so that the two frames of images are divided into image blocks, and an image block sequence corresponding to each frame of image is generated and extracted; inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result; the neural network model acquires a feature vector of each image block in the image block sequence through a neural network encoder, acquires a position code of each image block according to a preset position coding rule, and acquires a coding feature of each image block by combining the feature vector and the position code; inputting the coding features into a global coding module to obtain global coding features of each image block; and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability. Compared with the prior art, the pose solving is carried out by training a neural network model, the camera pose solving problems are unified into a joint optimization problem, the complex calculation flow in the traditional method is avoided, the calculation amount of the model is reduced, the acquisition efficiency of the camera pose is improved, and the method has strong practicability.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and substitutions can be made without departing from the technical principle of the present invention, and these modifications and substitutions should also be regarded as the protection scope of the present invention.

Claims (8)

1. A camera pose estimation method is characterized by comprising the following steps:
acquiring two adjacent frames of images, and performing feature point detection on the two frames of images to generate feature point sets corresponding to the two frames of images;
taking each feature point in the feature point set as a center, respectively carrying out image block division on the two frames of images, and generating and extracting image block sequences corresponding to the two frames of images;
inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result;
the neural network model acquires a feature vector of each image block in the image block sequence through a neural network encoder, acquires a position code of each image block according to a preset position coding rule, and acquires a coding feature of each image block by combining the feature vector and the position code;
inputting the coding features into a global coding module to obtain global coding features of each image block;
and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
2. The method according to claim 1, wherein the optimal solution is performed on the basis matrix according to the matching point pairs and the matching probabilities, and specifically comprises:
constructing a linear equation according to the matching point pairs to generate a linear equation set;
and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
3. The method according to claim 1, wherein the coding features are input into a global coding module to obtain global coding features of each image block; and finally, inputting the global coding characteristics into a matching classifier, acquiring a matching image block index corresponding to each image block, and obtaining a matching point pair and a matching probability between the two frames of images, wherein the specific steps are as follows:
acquiring coding features of image block sequences corresponding to the two frames of images, and inputting the coding features of each image block into the global coding module so that the coding features of each image block traverse the coding features of all the image blocks to acquire the global coding features corresponding to each image block;
and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
4. A camera pose estimation device, comprising: the system comprises an extraction module, a camera pose acquisition module and a neural network training module;
the extraction module is used for acquiring two adjacent frames of images, detecting feature points of the two frames of images, generating feature point sets corresponding to the two frames of images, and respectively dividing image blocks of the two frames of images by taking each feature point in the feature point sets as a center to generate image block sequences corresponding to the two frames of images;
the camera pose acquisition module is used for inputting the image block sequence into a trained neural network model so that the neural network model outputs an estimated basic matrix, and decomposing the basic matrix to obtain a camera pose estimation result;
the neural network training module is used for pre-training the neural network model, acquiring a feature vector of each image block in the image block sequence through a neural network encoder, acquiring a position code of each image block according to a preset position coding rule, and acquiring a coding feature of each image block by combining the feature vector and the position code; inputting the coding features into a global coding module to obtain global coding features of each image block;
and inputting the global coding features into a matching classifier, acquiring and obtaining a matching point pair and a matching probability between the two frames of images according to a matching image block index corresponding to each image block, and performing optimization solution on a basic matrix according to the matching point pair and the matching probability.
5. The camera pose estimation apparatus according to claim 4, wherein the neural network training module is configured to perform an optimal solution on a basis matrix according to the matching point pairs and the matching probability, and specifically:
constructing a linear equation according to the matching point pairs to generate a linear equation set;
and obtaining and solving the optimal solution of the linear equation according to the matching probability of the matching point pair, and taking the optimal solution of the linear equation as the optimal solution of the basic matrix so as to optimally solve the basic matrix.
6. The camera pose estimation apparatus according to claim 4, wherein the neural network training module is configured to input the coding features into a global coding module to obtain global coding features of each image block; and finally, inputting the global coding characteristics into a matching classifier, acquiring a matching image block index corresponding to each image block, and obtaining a matching point pair and a matching probability between the two frames of images, wherein the specific steps are as follows:
acquiring coding features of image block sequences corresponding to the two frames of images, and inputting the coding features of each image block into the global coding module so that the coding features of each image block traverse the coding features of all the image blocks to acquire the global coding features corresponding to each image block;
and inputting the global coding features corresponding to each image block into a matching classifier, and acquiring a matching image block index corresponding to each image block to obtain a matching point pair and a matching probability between the two frames of images.
7. A terminal device characterized by comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the camera pose estimation method according to any one of claims 1 to 3 when executing the computer program.
8. A computer-readable storage medium characterized by comprising a stored computer program, wherein the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform the camera pose estimation method according to any one of claims 1 to 3.
CN202210029300.7A 2022-01-12 2022-01-12 Camera pose estimation method, device, equipment and storage medium Active CN114066987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210029300.7A CN114066987B (en) 2022-01-12 2022-01-12 Camera pose estimation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210029300.7A CN114066987B (en) 2022-01-12 2022-01-12 Camera pose estimation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114066987A (en) 2022-02-18
CN114066987B (en) 2022-04-26

Family

ID=80230808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210029300.7A Active CN114066987B (en) 2022-01-12 2022-01-12 Camera pose estimation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114066987B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359192B (en) * 2022-10-14 2023-03-28 阿里巴巴(中国)有限公司 Three-dimensional reconstruction and commodity information processing method, device, equipment and storage medium
CN115641559B (en) * 2022-12-23 2023-06-02 深圳佑驾创新科技有限公司 Target matching method, device and storage medium for looking-around camera group
CN115661780A (en) * 2022-12-23 2023-01-31 深圳佑驾创新科技有限公司 Camera target matching method and device under cross view angle and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473254A (en) * 2019-08-20 2019-11-19 北京邮电大学 A kind of position and orientation estimation method and device based on deep neural network
CN113160375A (en) * 2021-05-26 2021-07-23 郑健青 Three-dimensional reconstruction and camera pose estimation method based on multi-task learning algorithm
CN113643365A (en) * 2021-07-07 2021-11-12 紫东信息科技(苏州)有限公司 Camera pose estimation method, device, equipment and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298884B (en) * 2019-05-27 2023-05-30 重庆高开清芯科技产业发展有限公司 Pose estimation method suitable for monocular vision camera in dynamic environment
CN110490928B (en) * 2019-07-05 2023-08-15 天津大学 Camera attitude estimation method based on deep neural network
CN111783582A (en) * 2020-06-22 2020-10-16 东南大学 Unsupervised monocular depth estimation algorithm based on deep learning
CN113378973B (en) * 2021-06-29 2023-08-08 沈阳雅译网络技术有限公司 Image classification method based on self-attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473254A (en) * 2019-08-20 2019-11-19 北京邮电大学 A kind of position and orientation estimation method and device based on deep neural network
CN113160375A (en) * 2021-05-26 2021-07-23 郑健青 Three-dimensional reconstruction and camera pose estimation method based on multi-task learning algorithm
CN113643365A (en) * 2021-07-07 2021-11-12 紫东信息科技(苏州)有限公司 Camera pose estimation method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN114066987A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN114066987B (en) Camera pose estimation method, device, equipment and storage medium
Shao et al. Real-time and accurate UAV pedestrian detection for social distancing monitoring in COVID-19 pandemic
Zhao et al. Fusion of 3D LIDAR and camera data for object detection in autonomous vehicle applications
Lu et al. Monocular semantic occupancy grid mapping with convolutional variational encoder–decoder networks
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN108549873B (en) Three-dimensional face recognition method and three-dimensional face recognition system
Paisitkriangkrai et al. Pedestrian detection with spatially pooled features and structured ensemble learning
CN109753885B (en) Target detection method and device and pedestrian detection method and system
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
EP3907660A1 (en) Method, apparatus, electronic device, and storage medium for recognizing license plate
CN112766229B (en) Human face point cloud image intelligent identification system and method based on attention mechanism
CN104574401A (en) Image registration method based on parallel line matching
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
Shao et al. ClusterNet: 3D instance segmentation in RGB-D images
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
CN111353429A (en) Interest degree method and system based on eyeball turning
CN111488810A (en) Face recognition method and device, terminal equipment and computer readable medium
Qin et al. PointSkelCNN: Deep Learning‐Based 3D Human Skeleton Extraction from Point Clouds
CN113435432B (en) Video anomaly detection model training method, video anomaly detection method and device
CN112507992B (en) Method, device, equipment and medium for determining shooting distance between road images
CN116246119A (en) 3D target detection method, electronic device and storage medium
Cao et al. Stable image matching for 3D reconstruction in outdoor
CN112084874B (en) Object detection method and device and terminal equipment
CN104268531A (en) Face feature data obtaining system
Zhang et al. RETRACTED: Cross-camera multi-person tracking by leveraging fast graph mining algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Floor 25, Block A, Zhongzhou Binhai Commercial Center Phase II, No. 9285, Binhe Boulevard, Shangsha Community, Shatou Street, Futian District, Shenzhen, Guangdong 518000

Patentee after: Shenzhen Youjia Innovation Technology Co.,Ltd.

Address before: 518051 401, building 1, Shenzhen new generation industrial park, No. 136, Zhongkang Road, Meidu community, Meilin street, Futian District, Shenzhen, Guangdong Province

Patentee before: SHENZHEN MINIEYE INNOVATION TECHNOLOGY Co.,Ltd.